Programming LaTeX3: More on expansion

In the last post, I looked at the idea of expandability, and how we can use x-type expansion to exhaustively expand an argument. I also said that there was more to this, and hinted at two other argument specifications, f- and o-type expansion. We need these because TeX is a macro-expansion language, and while LaTeX3 coding does hide some of this detail it certainly does not get rid of all of it. Both of these forms of expansion are somewhat specialised, but both are also necessary!

Full (or forced) expansion

The f-type (‘full’ or ‘force’) expansion argument is in some ways similar to the x-type concept, as it’s about trying to expand as much as possible. So

\tl_set:Nn \l_tmpa_tl { foo }
\tl_set:Nn \l_tmpb_tl { \l_tmpa_tl }
\tl_set:Nx \l_tmpc_tl { \l_tmpb_tl }
\tl_show:N \l_tmpc_tl

and

\tl_set:Nn \l_tmpa_tl { foo }
\tl_set:Nn \l_tmpb_tl { \l_tmpa_tl }
\tl_set:Nf \l_tmpc_tl { \l_tmpb_tl }
\tl_show:N \l_tmpc_tl

give the same result: everything is expanded, and \l_tmpc_tl contains ‘foo’. There are two crucial differences, however. First, x-type variants are not expandable, even if their parent function was. On the other hand, f-type expansion is itself expandable. So something like

\cs_new:Npn \my_function:n #1 { \tl_length:n {#1} }
\int_eval:n { \exp_args:Nf \my_function:n { \l_tmpa_tl } + 1 }

will work as we’d want: \l_tmpa_tl will be expanded, then processed by \my_function:n and the result will be evaluated as an integer. Try that with \exp_args:Nx and it will fail.

The second difference is what happens when we hit a non-expandable token. With x-type expansion, TeX simply leaves that token in place and moves on to the next thing in the input, and so tries to expand everything in the input (hence ‘exhaustive’). On the other hand, f-type expansion stops completely when the first non-expandable token is found. So

\tl_set:Nn \l_tmpa_tl { foo }
\tl_set:Nn \l_tmpb_tl { bar }
\tl_set:Nf \l_tmpc_tl { \l_tmpa_tl \l_tmpb_tl }
\tl_show:N \l_tmpc_tl

will show

foo\l_tmpb_tl

That happens because the f in foo is not expandable. So f-type expansion stops before it gets to \l_tmpb_tl, while x-type expansion would keep going.

The second point is important, as it means that some functions will not give the expected result if used inside an f-type expansion. We show this in the code documentation for LaTeX3: functions which expand fully inside both f- and x-type expansions are marked with a filled star, while those that only give the correct result inside an x-type expansion are marked with a hollow star.

As you might pick up, f-type expansion is somewhat specialised. It’s useful when creating expandable commands, but for non-expandable ones x-type expansion is usually more appropriate.
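If you find yourself needing f-type expansion of the same argument repeatedly, you can also set up a dedicated variant rather than writing \exp_args:Nf each time. A minimal sketch, reusing the illustrative \my_function:n defined above (variant creation with \cs_generate_variant:Nn is covered elsewhere in this series):

\cs_generate_variant:Nn \my_function:n { f }
\int_eval:n { \my_function:f { \l_tmpa_tl } + 1 }

As the f-type variant is built from expandable parts, it stays usable inside expansion contexts such as \int_eval:n.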

Expanding just the once

As TeX is a macro expansion language, there are some tasks that are best carried out, or even only doable, using an exact number of expansions. To allow a single expansion, the argument specification o (‘once’) is available. To use this, you need to know what will happen after exactly one expansion. Functions which may be useful in this way have information about their expansion behaviour included in the documentation, while token list variables also expand to their content after exactly one expansion. Examples using o-type expansion tend to be low-level: perhaps the best example is dropping the first token from some input, so

\tl_set:No \l_tmpa_tl { \use_none:n tokens }
\tl_show:N \l_tmpa_tl

will show ‘okens’.

As with f-type expansion, expanding just once is something of a specialist tool, but one that is needed. It also completes the types of argument specification we can use, so we’re now in a position to do some more serious LaTeX3 programming.

Programming LaTeX3: Expandability

In the last part, I looked at integer expressions, and how they can be used to calculate integer values. What I did not do was say exactly what can go inside an integer expression. That’s because it links in to a wider concept, and one that is very familiar to TeX programmers: expandability.

What is expandability?

To understand expandability, we need to think about what TeX does when we use functions. TeX is a macro expansion language, and as I’ve already said that means that LaTeX3 is too. When we use a function in a place where TeX can execute all of the built-in commands (‘primitives’), we don’t really need to worry about that too much. However, there are places where life is more complicated, as TeX will only execute some of the primitives. These places are ‘expansion contexts’. In these places, only some functions will work as expected, and so it’s important to know what will and will not work.

LaTeX3 and expandability

For traditional TeX programmers, understanding expandability means knowing the rules that TeX applies to decide what can and cannot be expanded. For LaTeX3, life is different as the documentation includes details of what will and will not work. If you read the documentation, you will see that some functions are marked with a star: those are expandable. For the moment, we won’t worry about the non-starred functions, other than to note that we can’t use them when we need expandability.

So sticking with our starting point, integer expressions, if we look at a function like \int_eval:n, this leads to the conclusion that the argument can only contain

  • Numbers, either given directly or stored inside variables
  • The symbols +, -, *, /, ( and )
  • Functions which are marked with a star in the LaTeX3 documentation, plus any arguments these themselves need.

That hopefully makes some sense: \int_eval:n is supposed to produce a number, and so what we put in should make sense for turning into a number. The same idea applies to all of the other integer expression functions we saw last time.
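For example, \tl_length:n is expandable and yields an integer, so it can be used directly inside the argument; a quick illustration:

\int_eval:n { \tl_length:n { Hello } + 2 } % => '7'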

Expansion in our own functions

Integer expression functions make a good example for expandability, but if that was the only area where expansion applied it would not be that significant. However, there are lots of other places where we want to carry out expansion, and this takes us back to the idea of the argument specification for functions. There are three expansion-related argument specifiers: f, o and x. Here, I’m going to deal just with x-type expansion, and will talk about f- and o-type expansion next time!

So what is x-type expansion? It’s exhaustive: expanding everything until only non-expandable content remains. We’ve already seen the idea that for example \tl_put_right:NV is related to \tl_put_right:Nn, so that we can access the value of a variable. So it should not be too much of a leap to see a relationship between \tl_set:Nn and \tl_set:Nx. So when we do

\tl_set:Nn \l_tmpa_tl { foo }
\tl_set:Nn \l_tmpb_tl { \l_tmpa_tl }
\tl_set:Nx \l_tmpc_tl { \l_tmpb_tl }
\tl_show:N \l_tmpc_tl

TeX exhaustively expands: it expands \l_tmpb_tl, and finds \l_tmpa_tl, then expands \l_tmpa_tl to foo, then stops as letters are not expandable. Inside an x-type expansion, TeX just keeps going! So we don’t need to know much about the content we are expanding.

With a function such as \int_eval:n, the argument specification is just n, so you might wonder why. The reason is that x-type functions are (almost) always defined as variants of an n-type parent. So functions that have to expand their material (there is no choice) just take an n argument. (There are also a few things that \int_eval:n will expand that an x-type argument will not, but in general that’s not an issue, as it only shows up if you make a mistake.)

Functions that can be expanded

I’ve said that functions that can be expanded fully are marked with a star in the documentation, but how do you make your own functions that can be expanded? It’s not too complicated: a function is expandable if it uses only expandable functions. So if all of the functions you use are marked with a star, then yours would be too.

There is a bit more to this, though. We have two (common) ways of creating new functions

  • \cs_new:Npn
  • \cs_new_protected:Npn

The difference is expandability. Protected functions are not expandable, whereas ones created with \cs_new:Npn should be. So the rule is to use \cs_new_protected:Npn unless you are sure that your function is expandable.
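As a hedged sketch (the function and variable names here are invented for illustration, reusing the integer functions from the last part): a wrapper which only uses expandable functions can be created with \cs_new:Npn, while one which makes an assignment has to be protected.

\int_new:N \l_mypkg_len_int

% Expandable: \int_eval:n and \tl_length:n are both 'starred' functions
\cs_new:Npn \mypkg_twice_length:n #1
  { \int_eval:n { 2 * \tl_length:n {#1} } }

% Not expandable: an assignment cannot be carried out expandably,
% so this function must be created as protected
\cs_new_protected:Npn \mypkg_store_length:n #1
  { \int_set:Nn \l_mypkg_len_int { \tl_length:n {#1} } }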

LaTeX2e hackers note: What about \protected@edef?

Experienced TeX hackers will recognise that x-type expansion is built around TeX’s \edef primitive. If you’ve worked a lot with LaTeX2e, you’ll know that with any user input you should use \protected@edef, rather than \edef, so that LaTeX2e robust commands are handled safely.

If you are working with LaTeX2e input, you’ll still need to use \protected@edef to be safe when expanding arbitrary user input. All LaTeX3 functions are either fully-expandable or engine-protected, so don’t need the LaTeX2e mechanism, but of course that is not true for LaTeX2e commands.

Programming LaTeX3: Integers and integer expressions

In the last entry, I talked about token list variables. As we’ve seen, these can be used to hold basically anything, but at the cost that there is no internal structure. I’ve also hinted that LaTeX3 provides a number of richer data types. One that we will need sooner rather than later is the int type for storing integers. At the same time, we can look more widely at what are called integer expressions: calculations which work with whole numbers.

Storing integers

Based on what we have already seen with token lists, it should be no surprise that we can create and set int variables with function names you might be able to guess:

\int_new:N \l_my_a_int
\int_set:Nn \l_my_a_int { 1 + 1 }
\int_show:N \l_my_a_int % => '2'

Creating and setting the variable should seem easy enough, but you might wonder about the result of showing the content: it’s not what we put in. That’s because LaTeX3 treats the second argument of \int_set:Nn as an integer expression: something to be evaluated to give an integer.

Integer expressions

All LaTeX3 functions which work with integers are set up to evaluate integer expressions, so it’s important to understand what they do. Expressions can use the standard arithmetic operations +, -, * (times) and /, plus parentheses. There are also functions for more complicated mathematical operations (for example \int_mod:nn to calculate the remainder on division).
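For instance, reusing \l_my_a_int from above:

\int_set:Nn \l_my_a_int { \int_mod:nn { 9 } { 4 } } % => '1'
\int_set:Nn \l_my_a_int { ( 1 + 2 ) * 3 - 7 }       % => '2'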

More significantly, we can include other functions which themselves yield integers. For example, we’ve seen that it’s possible to work out the length of a token list, which is an integer:

\int_set:Nn \l_my_a_int { \tl_length:n { Hello } * 2 } % => 10

We can’t use any function here: there are some restrictions. Clearly we need to get an integer out, but the functions also need to be expandable: that will be the topic of the next post!

Integer conditionals

A key use of integers is in conditionals. Earlier, we saw that conditionals in LaTeX3 are defined so that we have distinct true and false branches to follow. That applies to integer conditionals in exactly the same way as anything else

\int_new:N \l_my_b_int
\int_set:Nn \l_my_b_int { 7 }
\int_compare:nTF { 1 = \l_my_a_int }
  { TRUE }
  { FALSE }
\int_compare:nNnTF { \l_my_a_int } = { \l_my_b_int }
  { TRUE }
  { FALSE }

You might wonder what is going on here: there are two different conditionals, both of which do a comparison. Well, there are two types of integer conditional. The first type works out for itself where the comparison symbol is inside a single braced argument, and so needs only three arguments. The second type has to be given the two integer expressions to compare, plus the comparison symbol, as separate arguments. It’s a bit more awkward to read, but the latter version is faster (it’s closer to the underlying TeX). You can pick whichever one you prefer: as I work on low-level code, I go for speed!

Closely related to conditionals are loops, and again these come pre-defined.

\int_zero:N \l_my_a_int % Hopefully obvious!
\int_while_do:nn { \l_my_a_int < 10 }
  {
    \int_use:N \l_my_a_int \\
    \int_incr:N \l_my_a_int
  }

Hopefully most of this code is clear: we zero the counter, then loop while it is less than 10. For each pass through the loop, I’ve printed (used) the value directly, then incremented it by one. (There is a whole family of these functions, with do_while versions in addition to while_do, and nNn versions as for conditionals.)
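As a small sketch (assuming the same \l_my_a_int as above), the do_while form places the code before the test is first checked; with this starting value the printed output is the same:

\int_zero:N \l_my_a_int
\int_do_while:nn { \l_my_a_int < 10 }
  {
    \int_use:N \l_my_a_int \\
    \int_incr:N \l_my_a_int
  }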

Integer expressions beyond \int_ functions

Integer expressions are not limited to \int_ functions. Indeed, we’ve already seen one in \prg_replicate:nn. This illustrates a general point: anywhere that LaTeX3 expects an integer, it’s coded to accept integer expressions.

One function that I can’t miss out here is \int_eval:n, which just works out the value of the expression and leaves it in the input. It underlies a lot of the higher-level use of integer expressions, and we are certain to meet it later.
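As a quick illustration of that:

\int_eval:n { 5 * ( 3 + 4 ) } % Leaves '35' in the input stream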

Programming LaTeX3: More on token list variables

In my previous post, I introduced the idea of a token list variable, the LaTeX3 term for a macro used to store ‘stuff’. Token list variables (tl vars) are the basis of many of the higher-level data types in LaTeX3, and they can hold arbitrary content. As a result, there are a lot of generic functions to do things with tl vars.

Adding content, changing content

A very common thing to do with stored material is to add to it, which we can do on either the left or the right. The LaTeX3 functions to do this are called \tl_put_left:Nn and \tl_put_right:Nn (and so on), which makes it easy to build up complicated material quite quickly. So

\tl_new:N \l_my_a_tl
\tl_set:Nn \l_my_a_tl { stuff }
\tl_put_right:Nn \l_my_a_tl { ~here }
\tl_put_left:Nn \l_my_a_tl { My~ }
\tl_use:N \l_my_a_tl

will print ‘My stuff here’. That’s easy enough to do without LaTeX3 coding, but find-and-replace is a bit more involved. So the functions \tl_replace_once:Nnn and \tl_replace_all:Nnn are working a little harder:

\tl_set:Nn \l_my_a_tl { stuff~to~change }
\tl_replace_once:Nnn \l_my_a_tl { change } { alter }
\tl_use:N \l_my_a_tl % 'stuff to alter'
\tl_replace_all:Nnn \l_my_a_tl { t } { q }
\tl_use:N \l_my_a_tl % 'squff qo alqer'

Adding one tl var to another

So far, I’ve added literal input to tl vars. That’s useful, but a very common task is to combine two or more variables. To do that, we need a way to access the content of a variable. First, what doesn’t work is doing

\tl_new:N \l_my_b_tl
\tl_set:Nn \l_my_a_tl { stuff }
\tl_set:Nn \l_my_b_tl { ~more~stuff }
\tl_put_right:Nn \l_my_a_tl { \l_my_b_tl }

as what ends up inside \l_my_a_tl is stuff\l_my_b_tl. This is where LaTeX3’s expansion control comes into play. So far, we’ve seen arguments of type N and n, but there are a number of other types; here I want to introduce just one: V. A V-type argument will pass the value of a variable, rather than its name. So the correct way to add the content of one token list variable to another is

\tl_set:Nn \l_my_a_tl { stuff }
\tl_set:Nn \l_my_b_tl { ~more~stuff }
\tl_put_right:NV \l_my_a_tl \l_my_b_tl

which results in \l_my_a_tl containing stuff more stuff. Now, the LaTeX3 kernel does not provide every possible combination of argument types (although it does provide \tl_put_right:NV). That’s not a problem, as they can easily be created:

\cs_generate_variant:Nn \tl_put_right:Nn { NV }

This is a ‘soft’ process: if the variant requested already exists, nothing happens, but otherwise the variant is created. So provided the base function exists, you can always create any variants you need.

Mappings

Another key idea when working with tl vars is the ability to map over each token they contain. For that, there are again a couple of useful functions, \tl_map_function:NN and \tl_map_inline:Nn. The two differ mainly in expandability, a concept we’ve not covered just yet! I’ll be coming back to that in a later post, so for the moment I’ll just use \tl_map_inline:Nn. What does a mapping do? Try

\tl_set:Nn \l_my_a_tl { stuff }
\tl_map_inline:Nn \l_my_a_tl { I~saw~'#1'. \\ }

and you should get a listing of each separate token in the tl var:

I saw ‘s’. I saw ‘t’. I saw ‘u’. I saw ‘f’. I saw ‘f’.

As you can hopefully see, within the second argument of \tl_map_inline:Nn the placeholder #1 is used to insert a single token from the tl var. For a more complicated tl var

\tl_set:Nn \l_my_a_tl { { stuff } ~ { which } ~ is ~ { complicated } }
\tl_map_inline:Nn \l_my_a_tl { I~saw~'#1'. \\ }

we get

I saw ‘stuff’. I saw ‘which’. I saw ‘i’. I saw ‘s’. I saw ‘complicated’.

So you’ll see that spaces are ignored by the mapping, and that a brace group counts as a single item. I’ve not covered every token list and token list variable function, but hopefully the basic concepts are now laid out. In the next post, I’ll move on to some other concepts, so that we can begin to put more structures together.

Programming LaTeX3: Token list variables

In the last post, I talked about the concept of a token list and some general functions which act on token lists. That’s fine if you just want to take some input and ‘do stuff’, but a very common requirement when programming is storing input, and for that we need variables. LaTeX3 provides a number of different types of variable: we’ll start with perhaps the most general of all, the token list variable.

Token list variables

So what is a token list variable (‘tl’)? You might well guess from the name that it’s a way of storing a token list! As such, a tl can be used to hold just about anything, and indeed this means that several of the other variable types we’ll meet later are tls with a special internal structure.

Before we can save anything in a tl, we need to create the variable: this is a general principle of programming LaTeX3. We can then store something inside the variable by setting it:

\tl_new:N \l_mypkg_name_tl
\tl_set:Nn \l_mypkg_name_tl { Fred }

Hopefully, the analysis of this code is not too hard. First, \tl_new:N creates a new token list variable which I’ve called \l_mypkg_name_tl. (I’ll explain how the naming works in a little while.) The second line will set the new tl to contain the text Fred. Assuming that the surrounding code has done nothing strange, we’ve stored four letter tokens in \l_mypkg_name_tl.

As I said, a tl can contain anything: we are not limited to letters. So

\tl_new:N \l_mypkg_other_tl
\tl_set:Nn \l_mypkg_other_tl { \ERROR ^ _ # $ ! }

is also perfectly valid content for a token list variable (although whether we’ll be able to use it safely is a different matter).

Variable naming and TeX’s grouping system

From the earlier discussion of the way that functions are named in LaTeX3, it might be obvious that there is also a system to how variables are named. Skipping over the initial \l_, what we’ve got is a module name (mypkg), some further description of the nature of the variable (in this case name), and finally the variable type (tl), divided up by _ in exactly the same way we did for functions. We’ll see that other variables follow the same scheme.

So what’s the leading \l_ about? This tells us about the scope that we should use when setting the variable. As TeX is a macro expansion language, variables are not local to functions. However, they can be local to TeX groups, which are created in LaTeX3 using

\group_begin:
% Code here
\group_end:

Setting a variable locally means that any changes stay within a group

\tl_new:N \l_mypkg_name_tl
\tl_set:Nn \l_mypkg_name_tl { Fred }
\group_begin:
  \tl_set:Nn \l_mypkg_name_tl { Ginger }
\group_end:
% \l_mypkg_name_tl reverts to 'Fred'

On the other hand, we sometimes need global variables which ignore any groups

\tl_new:N \g_mypkg_name_tl
\tl_gset:Nn \g_mypkg_name_tl { Fred }
\group_begin:
  \tl_gset:Nn \g_mypkg_name_tl { Ginger }
\group_end:
% \g_mypkg_name_tl still 'Ginger'

So the \l_ or \g_ tells you what scope the variable contents have, and whether you should set or gset it. (You can probably work out that gset means ‘set globally’.)

Using the content of token list variables

Okay, putting stuff into token list variables is all very well and good, but unless we can do something with the content then it’s not really that useful. Of course, we can do things with the content of variables. The most basic thing to do is simply to insert the content of the tl into the input that TeX is working with

\tl_use:N \l_mypkg_name_tl

That’s very handy, but we can also examine the content of a token list variable. For example, we saw before that \tl_length:n will produce the length of a token list, and we can do the same for a token list variable using \tl_length:N.

\tl_set:Nn \l_mypkg_name_tl { Fred }
\tl_length:N \l_mypkg_name_tl % '4'

There’s a lot more we can do with token list variables, but this post is already long enough, so I’ll come back to more that we can do with them in the next post.

Tips for TeX programmers: the internals of token list variables

Experienced TeX programmers have probably wondered about token list variables, and in particular exactly what the underlying TeX structure is. A tl is just a macro that we are using as a variable rather than as a function. That should not be too much of a surprise, as storing tokens in macros is very much basic TeX programming. So \tl_set:Nn is almost the same as the \def primitive.

What might worry you slightly is that I said

\tl_new:N \l_mypkg_other_tl
\tl_set:Nn \l_mypkg_other_tl { \ERROR ^ _ # $ ! }

will work. That won’t work with \def, and you’d normally expect to need a token register (toks) for this. However, we don’t use toks for LaTeX3 programming at all, and that’s because we require e-TeX. So

\tl_set:Nn \l_mypkg_other_tl { \ERROR ^ _ # $ ! }

is actually the same as

\edef \l_mypkg_other_tl { \unexpanded { \ERROR ^ _ # $ ! } }

which will allow us to put any tokens inside a macro.

The other thing you might notice is that I’ve said that tls have to be declared, even though at a TeX level this is not the case. This is a principle of good LaTeX3 programming, and although it’s not enforced as standard, any non-declared token list variables are coding errors. You can test for this using

\usepackage[check-declarations]{expl3}

which uses some slow checking code to make sure that all variables are declared before they are used.

Programming LaTeX3: Category codes, tokens and token lists

Understanding LaTeX3 programming relies on understanding TeX concepts, and one we need to get to grips with is how TeX deals with tokens. Experienced TeX programmers will probably find the first part of this post very straightforward, so they might want to skim-read the start!

Category codes and tokens

When TeX reads input, it is not only the characters that are there that are important. Each character has an associated category code: a way to interpret that character. The combination of a character and its category code then sets how TeX will deal with the input. For example, when TeX reads ‘a’ it finds that it’s (normally) a letter, and so tokenizes the input as ‘a, letter’. This seems pretty obvious: ‘a’ is a letter, after all. But this is not fixed, at least for TeX. I’ve already mentioned that within the LaTeX3 programming environment : and _ can be part of function names: that’s because they are ‘letters’ while we are programming! What’s of most importance now is that a control sequence (something like \emph or \cs_new:Npn) is stored as a single token. So most of the time these can’t be divided up into their component characters: they act as a single item.

Token lists

The fact that TeX works with tokens means that most of the time we carry out operations on a token-by-token basis, rather than as strings. In LaTeX3 terminology, an arbitrary set of tokens is called a token list, which has both a defined content and a defined order. To get a better feel for how token lists work, we’ll apply a few basic token list functions to some simple input:

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\cs_new:Npn \demo:n #1
  {
    \tl_count:n {#1} ;
    \tl_if_empty:nT {#1} { Empty! }
    \tl_if_blank:nTF {#1}
      { Blank! }
      {
        Head = \tl_head:n {#1} ;
        Tail = \tl_tail:n {#1} ;
        End
      }
  }
\cs_new_eq:NN \demo \demo:n
\ExplSyntaxOff
\newcommand*{\hello}{hello}
\begin{document}
\demo{Hello world}

\demo{ }

\demo{}

\demo{\hello}
\end{document}

Okay, what’s going on here? Well, as we saw last time, I’ve created a new function, in this case called \demo:n, which contains the code I want to use. In contrast to the last post, I’ve not used it directly but have instead used \cs_new_eq:NN to make a copy of this function but with a document-level name. This is a general LaTeX3 idea: the internals of your code should be defined separately from the interface (indeed, we’ll see later that there is a more formalised way of creating a document-level function). You can probably work out that \cs_new_eq:NN needs two arguments: the new function to create and the old one to copy. (For experienced TeX programmers, it will be no surprise that this is a wrapper around the \let primitive.)

Moving on to what \demo:n is doing, the first thing to see is that I’ve defined it with one argument, agreeing with the :n part of its name. I’ve then done some simple tests on the argument. The first is \tl_count:n, which will count how many tokens are in the input and simply output the result. You’ll notice that it’s ignored the space in Hello world: it’s a common feature of TeX that spaces are often skipped over. You can also see the space-skipping behaviour in the line where I feed \demo a space: the result has a ‘length’ of zero. Also notice that, as promised, \hello is only a single token. (There is an experimental function in LaTeX3 to count the length of a token list including the spaces. Most of the time, we’ll actually want to ignore them, so we won’t worry about that here!)

We then have two conditionals, \tl_if_empty:nT and \tl_if_blank:nTF. First, we’ll look at what a conditional does in general, then at these two in particular. The LaTeX3 approach to conditionals is to accept either one or two branch arguments, reflected by a T, F or TF at the end of the function name, so in general there are always three related functions:

\foo_if_something:nT
\foo_if_something:nF
\foo_if_something:nTF

The test is always the same for the three related versions, with the T and F parts telling us what code is used depending on the result of the test. So if we do a test and it’s true, the T code will be used if it’s there, and the F code will be skipped entirely, while if there is no T code then nothing happens. It’s of course the other way around when the test is false!

So what’s happening with \tl_if_empty:nT and \tl_if_blank:nTF? In the first test, we only print ‘Empty!’ if there is nothing at all in the argument to \demo:n. If the argument is not empty, then this test does nothing at all. On the other hand, the \tl_if_blank:nTF test will print ‘Blank!’ if the argument is either entirely empty or is only made up of spaces (so it looks blank). However, if it’s not blank then we apply two more functions. The functions \tl_head:n and \tl_tail:n find the very first token and everything but the very first token, respectively. So \tl_head:n finds just the H of Hello world while \tl_tail:n finds ello world. I’ve only used them if the entire argument is not blank, as they are not really designed to deal with cases where there is nothing to split up!

You might wonder about the last test, where \demo{\hello} has hello as the head part and nothing as the tail. That happens because what is tested here is \hello, a single token, which is then turned into the text we see by TeX during typesetting. That can be avoided, but at this stage we’ll not worry too much!
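If it helps, the head and tail functions can also be used directly inside the programming environment (where a space has to be given as ~); a quick sketch matching the behaviour described above:

\tl_head:n { Hello~world } % Leaves 'H' in the input
\tl_tail:n { Hello~world } % Leaves 'ello world' in the input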

Programming LaTeX3: Creating functions

Teaching a programming language traditionally starts with a method to print ‘Hello World’. For programming LaTeX3, we can’t quite start there as

\documentclass{article}
\begin{document}
Hello world
\end{document}

will happily do that without needing any programming. So I’ll start by printing ‘Hello World’ lots of times!

Our first function

LaTeX3 has a built-in method for creating multiple copies of text, which we could use directly. However, that would mean using a code-level macro in the document itself, and so I’ll create a wrapper macro. For this first example, I’ll include all of the document:

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\cs_new:Npn \SayHello #1
  { \prg_replicate:nn {#1} { Hello~World!~ } }
\ExplSyntaxOff
\begin{document}
\SayHello{100}
\end{document}

This will give you, as promised, 100 copies of ‘Hello World!’.

So what is going on here? As you might work out, I’ve defined a new command called \SayHello which prints as many copies of ‘Hello World!’ as requested. Later on we’ll see that this is usually not how I’d choose to create a ‘document command’, but for the moment I’ll pass over that point so we can get some basics established.

The structure of function names

Getting down to detail, I’ve introduced two LaTeX3 functions here: \cs_new:Npn and \prg_replicate:nn. As promised, these use : and _ as ‘letters’ in their names. But what do they do? As you might guess from the names, \cs_new:Npn is used to create a new control sequence, while \prg_replicate:nn makes lots of copies of something (it replicates stuff). The naming convention for LaTeX3 is that the first part of the name (\cs_… or \prg_…) refers to the module the function comes from. So \cs_new:Npn is from the module for control sequences, which we abbreviate as cs, while \prg_replicate:nn is from the general programming utilities module, which is abbreviated as prg. For programmers working outside of the LaTeX3 kernel, a module is probably going to be the same as a LaTeX2e package. So the module part of the name is used to divide up code into related blocks: each module should use a unique prefix, and I’ll tend to use \mypkg… for demonstration purposes.

Up to the :, the rest of the name is up to the programmer and should help you understand what a function does. So \cs_new:Npn tells us that the function makes a new control sequence, and so is pretty similar to LaTeX2e’s \newcommand. We can have multiple parts to the name divided by _ for ease of reading. For example \cs_new_nopar:Npn is available for creating new functions which will give an error if they pick up a \par: this is similar to \newcommand*. You can probably work out the analysis of \prg_replicate:nn yourself!

The part of the name after the : is perhaps one of the most confusing ideas for new LaTeX3 programmers, especially if they are used to other languages. It’s called the argument specification or signature of the function, and tells us about the number and type of arguments a function takes. If you have experience in other programming languages, you’re probably wondering why we include this information in the function name. As we’ll see as we look in more detail at LaTeX3, this approach works as it reflects how TeX works.

So what do the different letters mean? Each letter (usually) represents one argument for a function. So \prg_replicate:nn with two letters after the : needs two arguments. (For those of you who haven’t come across arguments before, something like \maketitle takes no arguments, \emph needs one argument, \setlength takes two arguments, and so on.) The letter itself then tells us about the type of argument: n means tokens in braces (a ‘normal’ argument).  In \cs_new:Npn, the n-type argument is the code which we are creating. An N means that the argument has to be a single token without any braces: in our current case this will be the name of the new function. The p is a bit more complicated: it means that the second argument here is a parameter specification. Here, we can use #1, #2, etc., to represent the arguments for the new function, in exactly the same way we do in the code. So when we use \SayHello, it will expect to find one argument, and will insert that into the place marked as #1 in the code part.
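To make the link between the p argument and the signature a little more concrete, here is a hedged sketch (the function name is invented for illustration) of a definition taking two arguments, where the parameter text #1#2 matches the two n letters after the colon:

\cs_new:Npn \mypkg_repeat:nn #1#2
  { \prg_replicate:nn {#1} {#2} }
% e.g. \mypkg_repeat:nn { 3 } { Ho!~ } gives 'Ho! Ho! Ho! '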

Analysing \prg_replicate:nn

The same analysis applies to \prg_replicate:nn, which we can now see needs two arguments, both in braces. The first one is the number of times to repeat, and the second argument is what to repeat. So in \SayHello the number of repetitions is supplied by the user (this will replace #1), but the text is fixed by the programmer.

The reference for finding out what functions are available, and what arguments they take, is interface3. I’ll only be covering a selection of what is available, so over time you’ll need to get familiar with the formal documentation to find out what you can do. If you take a look there, you’ll see that the first argument for \prg_replicate:nn is an integer expression. That means that we don’t have to use a number directly here, but can also use something that will result in a number once TeX has worked it out. That will carry through to our user function, so

\SayHello{ 10 - 3 + 4 }

will be valid input.

Functions or macros?

Experienced TeX programmers will probably be worried that I’m talking about ‘functions’ and not about ‘macros’. TeX is a macro expansion language, which means that when it reads \SayHello, it replaces it by the code we’ve defined as the meaning of \SayHello, then reads the start of the inserted code, replaces it as necessary and so on until it has something to typeset (such as a letter) or execute (a ‘primitive’). That means that programming TeX is very different from programming using true functions.

The LaTeX3 programming approach allows us to treat many macros as if they were functions, but there are places where we’ll need to think about macros being expanded. Throughout the LaTeX3 documentation, programming is described in terms of functions, and so I’ll stick to that approach. Bear in mind that underlying everything is a set of macros, and that this will show up from time to time.

Programming LaTeX3: The programming environment

In the previous post, I mentioned that programming LaTeX3 today really means programming using LaTeX3 ideas but on top of LaTeX2e. To do that, we are going to need to load the appropriate code, and then access the LaTeX3 programming environment. The exact detail depends on whether we are programming in the preamble of a LaTeX document or creating a package. I’ll look at both of these before taking a closer look at the LaTeX3 programming environment in general.  What you should notice is that the use of a separate programming environment very much separates out the process of creating code from creating documents: that is quite deliberate and is something that we’ll see again in the series.

In the preamble of a document

The LaTeX3 programming code usable with LaTeX2e is available as a package called expl3 (which for various reasons is distributed as part of l3kernel). This is loaded in the usual way

\documentclass{article}
\usepackage{expl3}

That loads the code, but does not get us into the programming environment. To do that, we need to use a couple of new macros

\ExplSyntaxOn
% Code goes here
\ExplSyntaxOff

In some ways, this is similar to the LaTeX2e \makeatletter … \makeatother idea, but as we’ll see it’s a bit more advanced.

In a LaTeX2e package

In exactly the same way as in a document, the first stage in using LaTeX3 programming in a package is to load the code.

\RequirePackage{expl3}

Once again, that loads the code but does not switch the syntax on. We could use \ExplSyntaxOn here, but for packages a more flexible alternative is to declare the package as being LaTeX3-based:

\ProvidesExplPackage
  {mypkg}               % Package name
  {2011-12-11}          % Release date
  {1.0}                 % Release version
  {Some things I wrote} % Description

This is a special version of the standard \ProvidesPackage macro, which will automatically turn on LaTeX3 programming syntax and more importantly turn it off at the end of the package. It also deals properly with nested package loading, and so is the recommended way to use LaTeX3 syntax inside LaTeX2e packages.

The coding environment

Whether you’re using LaTeX3 syntax in a document or a package, the basic ideas are the same. The first thing to notice is that white space (spaces, tabs and new lines) is ignored inside the programming environment. This means we can use it to lay out our code more clearly, but you might wonder how to actually include a space. This is handled by defining ~ as a ‘normal’ space, rather than as the usual non-breaking version.
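A tiny illustration which you can try in the body of a test document:

\ExplSyntaxOn
Hello World % Typesets 'HelloWorld': the space is ignored
Hello~World % Typesets 'Hello World': ~ gives a real space
\ExplSyntaxOff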

The programming environment also makes it possible to use : and _ inside the names of commands, which are more formally called control sequences. TeX decides what is a valid control sequence name based on something called the category code of each character. I’ll be explaining more about category codes as we go along, but for the moment the key is to understand that a control sequence is \ followed either by exactly one non-‘letter’ or by one or more ‘letters’. Inside the code environment : and _ are treated as letters by TeX: this is the same idea as using @ as an extra ‘letter’ in LaTeX2e code.

Not only are : and _ available for use in control sequences but they are required by the conventions of LaTeX3 programming. In contrast to LaTeX2e’s sometimes haphazard use of @ in names, there are guidelines for applying both : and _ in LaTeX3 names. Rather than give a formal list now, I’ll bring in the system in the next couple of posts using some examples.

One difference between programming in a document and in a package is the status of @. LaTeX2e automatically makes it a letter in package code, but in a document this does not happen. LaTeX3 does not assign any special meaning to @, and so this difference is not affected by loading LaTeX3 support.

A standard document

As we’ll be needing the basics here for everything from now on, I’ll assume that you are using a short testing document for LaTeX3 programming:

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Code will go here
\ExplSyntaxOff
\begin{document}
\end{document}

Programming LaTeX3: Background

Before the series on programming LaTeX3 can really get started, it’s going to be important to establish some background, basic concepts and indeed what the aims are. So in this post I’m going to cover some of these issues: we won’t be seeing any code just yet! The approach I’m aiming to take is to bring in concepts as they are needed: this may mean a few simplifications in the beginning to allow ideas to be developed.

LaTeX3: What is available now?

The very first thing to cover is what the current status of LaTeX3 is, and what the aim of this series is. Anyone following LaTeX3 development will know that at the moment it’s not ready for creating documents independent of LaTeX2e. What is available now is a programming layer: l3kernel. At the same time, one of the aims with LaTeX3 is to clearly separate out programming, design decisions and actually using LaTeX. So what I will be covering here is programming. At the same time, I’ll aim to highlight concepts which are not necessarily tied to LaTeX3 programming but which the LaTeX3 Project feel are part of the overall aims of LaTeX3 development.

The target audience

I have two distinct audiences in mind in writing this series. The first is experienced (La)TeX programmers who want to see what ideas LaTeX3 introduces. These people will be familiar with many basic TeX concepts, and will want to see the relationship between what they are used to and the ‘LaTeX3 way’. The second group is experienced LaTeX2e users who want to learn to program LaTeX, and have decided to miss out learning to program TeX first. It’s important that the latter group are included: another key aim for LaTeX3 is to provide a complete set of documentation and support without having to say ‘read The TeXbook’ as a requirement to make progress.

What both of these groups have in common is lots of experience with LaTeX2e. So I’m going to expect familiarity with LaTeX2e’s user syntax, concepts and so on. So that will very much be the baseline: I do hope that the more experienced LaTeX programmers will bear with me.

Requirements

As I’ve indicated, programming LaTeX3 currently means working on top of LaTeX2e. So to get started you need a LaTeX2e installation, which for most people means either TeX Live or MiKTeX. Most of the code in the programming layer of LaTeX3 has been moving to a stable situation for some time, but there are refinements going on all of the time. As a result, I’ll be assuming that readers have the latest CTAN releases of l3kernel and l3packages installed. That can be done by downloading them from CTAN directly, or using the package managers in TeX Live 2011 or MiKTeX 2.9.

Programming LaTeX3: Introduction

Development of LaTeX3 has attracted interest from other TeX programmers for a while. One of the big barriers to new entrants is that programming LaTeX3 is distinct from programming LaTeX2e or plain TeX. So what is needed is a ‘Programming LaTeX3’ guide. The problem is getting one written: these things take time, and what to write is also something of a challenge.

To make a start on tackling this, I thought it would be useful to write a series of short blog posts, taking one area of LaTeX3 at a time and looking at it from the point of view of a beginner in programming LaTeX3. The idea is that by keeping things short I can divide the problem into manageable chunks (both for readers and for me), and get feedback on each part before taking on the next one. If I make decent progress, I’ll then have some material to edit into something like an article for TUGboat.

Now, to do a reasonable job I will have to cover some things I’ve looked at before: sorry if it turns out to be repetitive in places. I’m planning to start by looking at how you can actually start programming LaTeX3 today, covering the idea of ‘LaTeX3 in 2e’, for example. Then it will be on to the basics of the language, before we even get to creating any macros. Ideas for topics to cover are very welcome!