Extending biblatex to support multiple scripts

As regular readers will know, I’ve taken an interest in biblatex since it was first developed. Since the original author disappeared, I’ve been at least formally involved in maintain the code. So far, that’s been limited to tackling a few tricky low-level TeX issues, but there are some bigger issues to think about.

Philip Kime, lead Biber and biblatex developer, is keen to extend the LaTeX end to supporting multiple scripts. The Biber end is already done (in the ‘burning edge’ version), and writes to the .bbl file in the format:

 \field{form=original,lang=default}{labeltitle}{Title}
 \list{form=original,lang=default}{location}{1}{%
   {Москва}%
 }
 \list{form=romanised,lang=default}{location}{1}{%
   {Moskva}%
 }

However, that presents a big issue: how to do that without breaking every existing style. Supporting scripts means we need an additional argument for a very large number of commands: some of them need to have two optional arguments, and some of them need to be expandable:

\iffieldundef[form=original,lang=default]{....}

Reading (two) optional arguments and working through keyval options expandably is tricky, which is where I come in. The natural way for me to solve the first problem is to use LaTeX3, and the xparse package. However, that’s a big change for biblatex, so before I (and the rest of the biblatex team) go for this I though it would be worth raising the issue and looking for opinions. The alternative is to write the code into biblatex directly, but it’s complicated and as I’ve already done the job once I’m reluctant to do this!

So, what I want to know is ‘What do users think?’ Is it reasonable to require xparse as part of `biblate

LaTeX3 and document environments

There was a question on the {TeX} Q&A site recently about compiling a document reading

\documentclass{minimal}
\document
a
\enddocument

This doesn’t work: I’ve explained why as my answer to the question. I also mentioned that in LaTeX3 I’d expect this won’t work. Here, I want to look at that in a bit more detail.

First, the background. When LaTeX2e creates an environment foo, what actually happens is two macros called \foo and \endfoo are defined (hence the original question). When you do

\begin{foo}
...
\end{foo}

LaTeX looks for the \foo macro, and if it finds it starts a group and inserts \foo. At the end of the environment, LaTeX will use \endfoo if it exists, but this is not required.

There are some problems with this scheme. First, it’s possible to abuse the system, as something like

\begin{emph}
...
\end{emph}

will not raise an error even though \emph was never intended to be used as the start of an environment. (LaTeX2e does not require that \endemph exists here: it’s only the start of an environment which must be defined.) Secondly, due to this mixing it’s not possible to have a foo environment and independent \foo macro: the two are tied together. Finally, as \endfoo is ‘special’ \newcommand won’t let you create macros which start \end....

For LaTeX3, we want the approach to be different. Document commands and document environments should be independent of one another, and so xparse defines a pair of special internal macros when you use

\NewDocumentEnvironment{foo}...

This is totally independent of a macro called \foo, so the plan is that you’ll be able to have an environment called foo and a separate \foo command, or indeed one called \endfoo.

At present, most people will be using xparse with LaTeX2e. That means we still need to follow what LaTeX2e does. So at the moment you’re not going to see the change: messing with the LaTeX2e mechanism looks like a very bad idea. However, for a native LaTeX3 format the clash will disappear.

From \newcommand to \NewDocumentCommand

Following on from my last post, I thought it might be useful to give some simple example of how the xparse package works and why it’s useful. I want to do this to show end users of LaTeX how it can replace \newcommand, so the example will not involve anything too complex, code-wise.

First, why would you want to use xparse’s \NewDocumentCommand in place of \newcommand? First, \NewDocumentCommand can make macros that take a mixture of arguments that \newcommand cannot. With \newcommand, you can make a macro that takes a number of mandatory arguments, or ones where the first argument is option and in square brackets, but that is it. Anything else then needs the use of TeX programming or internal LaTeX macros: not really helpful for end users. The second thing is that \newcommand macros are not ‘robust’. This shows up where you need to \protect things, which can be very confusing. Macros created with \NewDocumentCommand are robust, and this means that they work more reliably.

I’m going to illustrate moving from \newcommand to \NewDocumentCommand with a series of simple examples. For all of them, you need to load the xparse package:

\usepackage{xparse}

Macros with no arguments

The simplest type of macro is one with no arguments at all. This isn’t going to show off xparse very much but is is a starting point. The traditional method to do this is

\newcommand\NoArgs{Text to insert}

which becomes

\NewDocumentCommand\NoArgs{}{Text to insert}

That does not look too bad, I hope. Notice that I’ve got an empty argument in the xparse case: this is where the arguments are listed, and with \NewDocumentCommand there always has to be a list of arguments, even if it is empty. That’s a contrast with the \newcommand approach, where we only need to mention arguments when there are any.

One or more mandatory arguments

The most common type of argument for a macro is a mandatory one. With \newcommand, we’d give a number of arguments to use:

\newcommand\OneArg[1]{Text using #1}
\newcommand\TwoArgs[2]{Text using #1 and #2}

\NewDocumentCommand is a bit different. Since it can work with different types of argument, each one is give separately as a letter. A mandatory argument is ‘m’, so we’d need

\NewDocumentCommand\OneArg{m}{Text using #1}
\NewDocumentCommand\TwoArgs{mm}{Text using #1 and #2}

This is still pretty similar to \newcommand: the useful stuff starts when life gets a little more complicated.

One of more optional (square brackets) arguments

To really get something clever out of xparse, the arguments need to be a little more varied than I’ve show so far. Let’s look at optional arguments, which LaTeX puts in square brackets. If I want the first argument to be optional, then LaTeX can help me

\newcomand\OneOptOfTwo[2][]{Text with #2 and perhaps #1}
\newcomand\OneOptOfThree[3][]{Text with #2, #3 and perhaps #1}

If I want anything else, I’m on my own (so no more \newcommand examples!). First, let’s do the two example using xparse. An optional argument in square brackets, which works like a \newcommand one, is ‘O’ followed by {}:

\NewDocumentCommand\OneOptOfTwo{O{}m}%
  {Text with #2 and perhaps #1}
\NewDocumentCommand\OneOptOfTwo{O{}mm}%
{Text with #2, #3 and perhaps #1}

How about two optional arguments? It’s pretty obvious:

\NewDocumentCommand\TwoOptOfThree{O{}O{}m}%
  {Text with #3 and perhaps #1 and #2}

What if we want something as a default value for the optional argument? With \newcommand, that would be

\newcommand\OneOptWithDefault[2][default]%
  {Text using #1 (could be the default) and #2}

which would become

\NewDocumentCommand\OneOptWithDefault{O{default}m}%
  {Text using #1 (could be the default) and #2}

The same idea applies to each optional argument: whatever is in the braces after the O is the default value.

More complex optional arguments

You might be wondering why we need the ‘{}c after ‘O’ when there is no default value: why not just ‘o’? Well, there is ‘o’ as well. Unlike \newcommand, \NewDocumentCommand can tell the difference between an option argument that is not given and one that is empty. To do that, it provides a test to see if the argument is empty:

\NewDocumentCommand\OneOptOfTwoWithTest{om}{%
  \IfNoValueTF{#1}
    {Do stuff with #2 only}
    {Do stuff with #1 and #2}%
}

Don’t worry if you forget to do the test: the special marker that is used here will simply print ‘-NoValue-’ as a reminder!

Two types of optional argument

Sometimes you might want two different optional arguments, and be able to tell which is which. This can be done by using something other than square brackets, often using angle brackets (‘<’ and ‘>’). We can do that using the letter ‘d’ (or ‘D’ if we give a default).

\NewDocumentCommand\TwoTypesOfOpt{D<>{}O{}m}%
  {Text using #1, #2 and #3}

What input syntax does this make? Let’s look at some examples

\TwoTypesOfOpt{text}             % One mandatory
\TwoTypesOfOpt[text]{text}       % A normal optional
\TwoTypesOfOpt<text>{text}       % A special optional
\TwoTypesOfOpt<text>[text]{text} % Both optionals

How did that work? The first two characters after the ‘D’ are used to find the optional argument, so in this case ‘<’ and ‘>’.

Finding stars or other special markers

Another common idea in LaTeX is to use a star to indicate some special version of a macro. Creating those with \newcommand is difficult, but it is easy with \NewDocumentCommand

\NewDocumentCommand\StarThenArg{sm}{%
  \IfBooleanTF#1
    {Use #2 with a star}
    {Use #2 without a star}%
}

Here, ‘s’ represents a star argument. You’ll see that it ends up as #1, while the mandatory argument is #2. You’ll also see that there needs to be a test to see if there is a star (\IfBooleanTF). This doesn’t mention stars as the test can be used for other things.

Summing up

There is more to xparse than I’ve mentioned here, but I hope that this is a useful flavour of what it can be used for. To get more flexibility there is a bit more to think about compared to \newcommand, but the overall consistency is hopefully worth it.

Promoting xparse

I’ve had a few chances in recent weeks to promote the xparse package. Regular readers will know that xparse is part of the efforts of the LaTeX3 Project, and is meant to provide a replacement for \newcommand, etc., for creating document macros. The promotions have come up where there are things that are hard to do with \newcommand but easy with xparse. One example was creating an optional argument for a macro but giving it in normal braces. With xparse, this is pretty straight forward

\documentclass{article}
\usepackage{xparse}
\NewDocumentCommand\en{g}{%
  \IfNoValueTF{#1}{\epsilon}{\epsilon_{#1}}%
}
\begin{document}
\( \en \) and \( \en{stuff} \).
\end{document}

This is clear, and does not need any knowledge of the internals of the LaTeX3 approach. So I’m going to keep promoting xparse for day to day LaTeX users, even if they aren’t using much code. Hopefully this will make life easier for them, and for me, and will be a real benefit now from the LaTeX3 work.

LaTeX3: xparse

The next step for LaTeX3 development is to revise the two existing “xpackages” which are available to make a link between the code level and the user: xparse and template. Of the two, the xparse package is by far the easier to understand.

In LaTeX2e, you can either use the LaTeX method to create new commands:

\newcommand*\mycommand[2][]{%
  code goes here, using #1 and #2
}

or you can do things yourself using a mixture of TeX primitives and LaTeX internal functions (and ignoring the issue of default value):

\def\mycommand{%
  \@ifnextchar[{%
    \@mycommand
  }{%
    \@mycommand[]%
  }%
}
\def\@mycommand[#1]#2{%
  code goes here, using #1 and #2
}

This does not make for code which is easy to alter to reflect different input syntax (for example XML), or to change internal functions without knowing how the user input works. The idea of xparse is to separate out the user syntax from the internal code. Currently, exactly what tools need to be available is still being decided. The current version of xparse would expect the following syntax:

\DeclareDocumentCommand \mycommand { o m } {
  code goes here, using #1 and #2
}

The idea is that each argument is represented by a letter, for example o for an optional argument, m for a mandatory one or s for an optional star. This makes it possible it see immediately from the definition how many arguments are needed (one for each letter). It also means that the internal functions, which implement things, can be separated totally from the user part of the system. In that way, internal functions can have a fixed number of arguments, and leave xparse to supply them. So if the internal function needs to be changed, it does not matter how it is used, or vice versa.

That all sounds very good, but there are some outstanding issues. For example, handling verbatim arguments is not straight-forward. There are a couple of possible approaches to this. Either stick to a simple system, and accept that not everything can be done in the same way, or make the system more flexible but complicated. I’m currently in favour of the first approach: almost every user function is simple (especially as we have the ε-TeX extensions and so can \scantokens our way out of a lot of problems). I’d be interested to hear what other people think would be useful.