Too much information!

A recent announcement on comp.text.tex for a new package, silence, coincides with some discussion in the LaTeX3 team about how to handle messages for the user. The LaTeX3 stuff is currently looking at the low-level side of the area, whereas silence is dealing with things for the user.

The problem of “too much information” is clearly one that has attracted attention. An awful lot of what LaTeX prints is not interesting, most of the time. I’d say that a better model would be less, more targeted information as standard. A “developer” mode, printing more detail, is still needed, but to be honest, even then do many people care about some of the stuff that gets logged? Most of the time, I’d say no.

The silence approach (filtering on a per-package basis and also based on specific text in messages) is clearly very powerful but somewhat complicated. I’d hope that the LaTeX3 team will provide some filtering along with properly named messages (rather than the current situation where messages just appear directly in the code). Perhaps not quite as powerful as silence, but a lot better than at the moment.
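
As a flavour of the per-package filtering, here is a minimal sketch using silence’s \WarningFilter command (the package name and message text are purely illustrative; the message is matched by its opening text):

\usepackage{silence}
% Drop hyperref warnings whose text begins with the given string
\WarningFilter{hyperref}{Token not allowed}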

LaTeX3 as a low-level language

There is quite a lot going on with the low-level code for LaTeX3 at the moment. The number of commits to the code is ticking over nicely, as the code is revised and Will Robertson gets the test system written. Will has taken on a thankless task with this job, and I think he is owed a debt of gratitude by everyone interested in LaTeX3.

One thing that is clear is that LaTeX3 (at the low level) is a programming language in itself, distinct from TeX. This is something of a risk, as it means that you cannot simply take what is done currently and convert it to the new system without thinking. On the other hand, the idea is to provide a system which makes programming easier and clearer, with some of the “features” of TeX hidden underneath a working LaTeX3 layer. The aim is that expansion and the somewhat odd methods for assignment to low-level TeX variables become something only the kernel team need to worry about.

It seems to me that there is a “window of opportunity” for the kernel team to show that something can be delivered, and that LaTeX3 as a programming language is the way to do this. The arrival of LuaTeX will bring real programming to TeX: Karl Berry has pointed out that this will bring programmers to the TeX world who would never consider writing serious (La)TeX code. So I think that LaTeX3 needs to show that the language is viable before widespread take-up of LuaTeX means that no-one is interested.

With the LaTeX3 low-level language approaching stability, I’d suggest the team need to take the next step and move on to document design and user macros. There are lots of ways this could be done, but an obvious one would be to produce a “microkernel”. By this, I mean something which can do the same as:

\documentclass{minimal}
\begin{document}
Hello World!
\end{document}

without using LaTeX2ε at all. I’d expect the user syntax to change a bit, but in essence the minimal LaTeX document is not going to change (at least, not if LaTeX3 is to succeed).

The experts will know that there is quite a lot of work to get to the test file I’ve suggested, and so going from where LaTeX3 is now to a working microkernel is non-trivial. However, it would be a good opportunity to demonstrate that real (usable) progress is happening, and would also avoid the problems associated with trying to build everything from the bottom up with no top level in sight. A microkernel would show that delivery is possible, and, I hope, raise interest in LaTeX3 from the current and potential development community.

Of course, a microkernel will still leave a lot to do. However, with something to build on I’d expect interest and ideas to accumulate and the new system to grow reasonably fast.

Low-level definition changes

The current LaTeX3 refactor is examining a number of different parts of the LaTeX3 code base. Although the code ideas generally work well, they have built up over some time, which means that not everything is consistent. At the same time, issues that have arisen with LaTeX2ε are helping to inform ideas about what is needed for LaTeX as a programming language.

Two somewhat related issues to do with definitions are currently being revised. The first concerns the TeX \long definition concept. At a user level, restricting function input so that an error occurs with a \par token is normally a good idea. Often, trying to send a \par token to a user function indicates a missing closing brace, which is not a good thing. However, for a programming language (expl3) things are different. The general programming tools in expl3 need to handle any input, with validation on the boundary between user and internal functions. For that reason, many of the functions provided by expl3 are \long. As part of the refactor, the standard method for creating new functions will be to make them \long: you have to specifically ask for restricted arguments. In many ways, this is the same as the \newcommand versus \newcommand* situation in LaTeX2ε: a bit more typing is needed if you don’t want to accept paragraphs.
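
The LaTeX2ε analogy in two lines (the command names here are invented for illustration):

\newcommand{\longfoo}[1]{#1}   % \long by default: #1 may contain \par
\newcommand*{\shortfoo}[1]{#1} % not \long: a \par in #1 raises an error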

One area being improved is the “module prefixes”. A lot of these are quite logical (for example, everything to do with comma-separated lists starts with \clist), but some of the more basic parts of the language are more variable. The basic TeX definition primitives \let, \def and \edef were originally simply given argument specifiers, becoming \let:NwN, \def:Npn and \def:Npx, respectively (and so on for related primitives). None of these names carries a module prefix at all: not really good for consistency. The team are now moving to a radically different idea, dropping terms such as “let” and “def” entirely. All of the functions are given the module prefix \cs, and by analogy with other parts of LaTeX3, these functions can all be regarded as setting something. This leads to names such as:

  • \cs_set_eq:NwN (\let)
  • \cs_set:Npn (\long\def)
  • \cs_set:Npx (\long\edef)
  • \cs_set_nopar:Npn (\def)
  • \cs_set_protected:Npn (\protected\long\def)
  • \cs_set_protected_nopar:Npn (\protected\def)

Globally setting is simply a case of replacing set by gset:

  • \cs_gset_eq:NwN (\global\let)
  • \cs_gset:Npn (\global\long\def)
  • \cs_gset:Npx (\global\long\edef)
  • \cs_gset_nopar:Npn (\global\def)
  • \cs_gset_protected:Npn (\global\protected\long\def)

For creating new functions (where a check is first made in the hash table that the name is not already in use), the set (or gset) term is replaced by new (or gnew), again following the pattern elsewhere in LaTeX3. The above is all illustrated with the basic argument specifiers Npn and Npx, but the full range of variants is of course still available. Notice how the shorter definition names are \long: to get a restricted definition, the nopar term is needed.
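
A short sketch of how the scheme looks in practice (the \my_... names are invented for illustration):

\cs_new:Npn \my_wrap:n #1 { (#1) }     % checked creation; \long by default
\cs_set:Npn \my_wrap:n #1 { [#1] }     % local redefinition, no check
\cs_gset:Npn \my_wrap:n #1 { <#1> }    % as above, but global
\cs_set_nopar:Npn \my_tag:n #1 { #1 }  % restricted: \par in #1 is an error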

Overall, this seems quite a logical change for LaTeX3. As a self-consistent programming language, it all adds up (although it will take a bit of getting used to!).

V is for variable

There is currently a lot of activity going on with the LaTeX3 code base, as the team work through various issues about the code. One of the ongoing changes is to the argument specifiers used in the code, where some rationalisation is taking place. Perhaps the most interesting new idea being implemented is the v/V specifier for variables.

The idea of the two new specifiers is that the day-to-day LaTeX programmer should not need to worry about complex runs of \expandafter primitives, or how variables are stored at a TeX level. LaTeX3 has some variables which are TeX primitive types (such as _toks or _int) and others which are stored as macros (_tlp and _clist, for example). Using the V specifier, you get the content of a variable, independent of the storage method and without needing to think about expansion. So \foo:V \l_variable_type is equivalent to \foo:n {content of l_variable_type}. We also have the v specifier, which first constructs a csname before getting the content: \foo:v {l_variable_type}.
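
As a sketch, take a hypothetical \foo:n which typesets its argument in brackets, and assume its V and v variants exist:

\cs_set:Npn \foo:n #1 { [#1] }  % hypothetical function
\tlp_set:Nn \l_my_tlp { stored~text }
\foo:V \l_my_tlp                % same as \foo:n { stored~text }
\foo:v { l_my_tlp }             % builds \l_my_tlp from the name, then as above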

This is quite a departure from the TeX or LaTeX2ε way of thinking about things, but makes LaTeX3 variables more similar to those of other languages. As yet, the new specifiers have not been fully deployed in the expl3 code. However, as they are, I’d expect the clarity this idea brings to be very welcome. I’m looking forward to trying them out in my own LaTeX3 experiments.

LaTeX3 argument specifiers improvements

One of the key ideas of LaTeX3 is argument specifiers. These are the part of a function’s name which tells you both how many arguments it needs and what happens to them. Each argument gets a single letter to describe how it is processed. One of the most useful things this does is make expansion much easier, as \expandafter runs can be avoided. It also makes it easier to have a family of similar functions which take subtly different arguments. So we might have \foo:N, which takes a macro name as an argument, and \foo:c, which creates a csname from its argument.

There have been a lot of ideas about what argument specifiers to use. This has led to a rather extended set of letters in use at the moment. The team have been reviewing them, as there are clearly too many. It looks like most of the ideas are now sorted: my personal interpretation of the plan is laid out below. The letters are best thought of in a few different “classes”, and all stand for something in English.

First, there is the D specifier, which means do not use. All of the TeX primitives are initially \let to a D name, and some are then given a second name. Only the kernel team should use anything with a D! Currently, a few primitives that you might need only have a D name in LaTeX3, but this should (hopefully) be sorted soon.

Next, there are two specifiers for no manipulation (pass exactly as given). For a single token, the specifier is N, whereas for one or more tokens in braces the specifier is n. Usually, if you use a single token for an n argument, all will be well. So, for example, \foo:Nn \ArgumentOne {ArgumentTwo} will process “\ArgumentOne” and “ArgumentTwo”.

Next, and deserving a class of its own, is the c specifier for csname. A c argument will be turned into a csname before being used. So \foo:c {ArgumentOne} will act in the same way as \foo:N \ArgumentOne.

Related to n and N are the v and V specifiers, which mean value of variable. A variable in LaTeX3 can be a primitive TeX construct (such as a count, toks, muskip, etc.) or a TeX macro used to store a value (a “tlp” in LaTeX3 terminology). TeX lets us store unexpanded content in a macro using \def, or fully expanded using \edef. The v and V specifiers are used to get the content of a variable without needing to worry about how many expansions to use, which depends on whether the variable is a TeX count, a macro, a toks, etc. A V argument will be a single token (similar to N), so we might have \foo:V \MyVariable; on the other hand, using v a csname is constructed first, and then the value is expanded, for example \foo:v {MyVariable}. The key point here is that depending on the underlying nature of the variable (count, toks, macro, etc.) the number of expansions may vary, but at the LaTeX3 level the programmer does not need to worry about this.
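
To illustrate the storage-independence, a sketch with a hypothetical \foo:n and its V variant: the calling syntax is identical whatever the underlying storage.

\tlp_set:Nn  \l_my_tlp  { macro~storage }  % stored as a macro
\toks_set:Nn \l_my_toks { toks~storage }   % stored in a toks register
\foo:V \l_my_tlp   % one expansion happens internally
\foo:V \l_my_toks  % \the is used internally; same syntax for the programmer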

There will still be places where some control of expansion is needed. Rather than needing to count \expandafter primitives, LaTeX3 provides the argument specifiers o, d, f and x. Here, o means one expansion, d double expansion and x exhaustive expansion (\edef). The f specifier stands for full expansion, and in contrast to x stops at the first non-expandable item without trying to execute it. The difference between x and f is subtle but allows some clever expansion tricks at a low level in expl3. In all cases, the argument is expanded before being passed to the underlying function. So \foo:o \SomeInput will expand \SomeInput once and pass the result to \foo:n, whereas \foo:x \SomeInput will \edef \SomeInput before sending that result to \foo:n. Thus expansion becomes a matter of a single letter change.
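
For example, with a hypothetical \foo:n and its o and x variants:

\cs_set:Npn \inner: { world }
\cs_set:Npn \outer: { hello~\inner: }
\foo:o { \outer: }  % one expansion:      \foo:n { hello~\inner: }
\foo:x { \outer: }  % exhaustive (\edef): \foo:n { hello~world }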

For logic tests, there are the branch specifiers T (true) and F (false). For numerical tests, there is also the C specifier which means comparison. This should be something which can be tested numerically, for example { \MyCount > 10 }. All three specifiers treat the input in the same way as n (no change), but make the logic much easier to see: \foo:CTF { \MyCount > 10 } { Bigger } { Not-bigger }.
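
Laid out over several lines, the example above reads almost like an if/else from other languages:

\foo:CTF
  { \MyCount > 10 } % C: a numerical comparison
  { Bigger }        % T: used when the test is true
  { Not-bigger }    % F: used when the test is false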

The letter p is used for primitive TeX arguments (or parameters). This means whatever you might put after \def (which is given the name \def:Npn in LaTeX3). This could be as simple as \def:Npn \foo:N #1 { Some-code }, or can even be entirely blank: \def:Npn \foo: { Code-here }.

Finally, there is the w specifier for weird arguments. This covers everything else, but mainly applies to delimited values (where the argument must be terminated by some arbitrary string). For example \def:Npn \foo:w #1 \stop { Some-code } needs to have \stop somewhere in the input, and is therefore weird.

There are quite a lot of letters there, but there is also a logic and it soon becomes very easy to see which one you need. All of the letters have names reflecting what they mean, so hopefully the pattern soon becomes clear.

Is creating a unit a “user” function?

Working on siunitx version 2, I’m confronted with a slight problem. Currently, the macros used to manipulate units are called

  • \newunit
  • \renewunit
  • \provideunit

which follows the LaTeX2ε \newcommand naming pattern; the same is true for prefixes and so forth. These names were probably not the best choice: for example, biblatex also has a \newunit macro (although, luckily, there is not a clash).

I’ve been thinking of better names, but I’m not sure whether these should be document level (all lower-case), or design level (mixed upper-case and lower-case). Some ideas:

  1. \DeclarePhysicalUnit (create without checks)
  2. \NewPhysicalUnit (create with checks: would require \RenewPhysicalUnit, etc.)
  3. \NewUnit (as 2 but shorter)
  4. \newphysicalunit (as 2 but document level)
  5. \createphysicalunit (is “create” better than “new”?)
  6. \createunit (avoids the confusion with biblatex)

You’ll see that I’ve not included “SI” in any of the above: it looks odd, and of course you can create non-SI units anyway. How do other people see this?