# Some TeX Developments

Coding in the TeX world

## Registering expl3 modules

Namespace management in TeX is a co-operative affair: we all share one space, so conventions such as \my@clever@macro are important. For LaTeX2e work, this has always been done on a very informal basis: look around and find a space! For LaTeX3, it seems like a good idea to make things a little bit more ordered. We’ve therefore set up a simple flat-file prefix register, which will track all of the prefixes in use in expl3 code (provided people tell us, of course).

Written by Joseph Wright

November 4th, 2012 at 6:32 pm

Posted in LaTeX3

## Fixing problems the rapid way

The latest l3kernel update included a ‘breaking change’: something we know alters behaviour, but which is needed for the long term. Of course, despite the fact that the team try to pick up what these things will break, we missed one, and there was an issue with lualatex-math as a result, which showed up for people using unicode-math (also reported on TeX-sx). Luckily, those packages all use GitHub, as does the LaTeX3 team, so it was easy to quickly fork the code and create a fix. That’s the big advantage of having code available using one of the distributed version control systems (GitHub and BitBucket are the two obvious places): sending in a fix is a two-minute job, even if it’s someone else’s project. So I’d encourage everyone developing open code that goes to CTAN to consider using one of these services: it really does make fixing bugs easier. From report to fix and CTAN update in less than 24 hours, which I’d say is pretty good!

Written by Joseph Wright

August 24th, 2012 at 9:27 am

Posted in LaTeX3


## LaTeX3 gets a new FPU

For a while now we’ve been promising to update the floating point functions in LaTeX3 to provide an expandable approach to calculations. We are now making good progress in swapping the old code for the new material. There are still a few things to be decided, but the general plan looks pretty clear.

The new code lets you parse floating point expressions in the same way you can for integers. So you can write

\fp_eval:n { 10 / 3  + 0.35 }

and get the output

3.683333333333333

without needing to worry about working with dimensions. Even more usefully, there is a range of built-in mathematical functions, so something like

\fp_eval:n { sin ( pi / 3 ) }

gives

0.8660254037844386

as you’d want.

You might pick up from the above output that the new FPU works to a higher accuracy than the old one: 16 significant figures. That’s been done without a serious performance hit: great credit goes to Bruno Le Floch for his work on this. Of course, you don’t have to have 16-figure output: the code includes rounding functions and various display options, so for example

\fp_eval:n { round ( sin ( pi / 3 ) , 3 ) }

gives just 0.866.

At the moment you’ll have to grab the latest development code to get this new version as part of expl3, but it will be in the next CTAN snapshot some time in June. You might also find that not every function you want is available: for example, there are currently no inverse trigonometric functions. Those are all on the to-do list, but they’re not needed for typesetting, so may take a little while.

Written by Joseph Wright

May 15th, 2012 at 3:58 pm

Posted in LaTeX3


## Programming LaTeX3: More on expansion

In the last post, I looked at the idea of expandability, and how we can use x-type expansion to exhaustively expand an argument. I also said that there was more to this, and hinted at two other argument specifications, f- and o-type expansion. We need these because TeX is a macro-expansion language, and while LaTeX3 coding does hide some of this detail it certainly does not get rid of all of it. Both of these forms of expansion are somewhat specialised, but both are also necessary!

## Full (or forced) expansion

The f-type (‘full’ or ‘force’) expansion argument is in some ways similar to the x-type concept, as it’s about trying to expand as much as possible. So

\tl_set:Nn \l_tmpa_tl { foo }
\tl_set:Nn \l_tmpb_tl { \l_tmpa_tl }
\tl_set:Nx \l_tmpc_tl { \l_tmpb_tl }
\tl_show:N \l_tmpc_tl

and

\tl_set:Nn \l_tmpa_tl { foo }
\tl_set:Nn \l_tmpb_tl { \l_tmpa_tl }
\tl_set:Nf \l_tmpc_tl { \l_tmpb_tl }
\tl_show:N \l_tmpc_tl

give the same result: everything is expanded, and \l_tmpc_tl contains ‘foo’. There are two crucial differences, however. First, x-type variants are not expandable, even if their parent function was. On the other hand, f-type expansion is itself expandable. So something like

\cs_new:Npn \my_function:n #1 { \tl_length:n {#1} }
\int_eval:n { \exp_args:Nf \my_function:n { \l_tmpa_tl } + 1 }

will work as we’d want: \l_tmpa_tl will be expanded, then processed by \my_function:n and the result will be evaluated as an integer. Try that with \exp_args:Nx and it will fail.

The second difference is what happens when we hit a non-expandable token. With x-type expansion, TeX will look at the next thing in the input, and so tries to expand everything in the input (hence ‘exhaustive’). On the other hand, f-type expansion stops when the first non-expandable token is found. So

\tl_set:Nn \l_tmpa_tl { foo }
\tl_set:Nn \l_tmpb_tl { bar }
\tl_set:Nf \l_tmpc_tl { \l_tmpa_tl \l_tmpb_tl }
\tl_show:N \l_tmpc_tl

will show

foo\l_tmpb_tl

That happens because the f in foo is not expandable. So f-type expansion stops before it gets to \l_tmpb_tl, while x-type expansion would keep going.

The second point is important, as it means that some functions will not give the expected result if used inside an f-type expansion. We show this in the code documentation for LaTeX3: functions which expand fully inside both f- and x-type expansions are shown with a filled star, while those that only work inside an x-type expansion are shown with a hollow star.

As you might pick up, f-type expansion is somewhat specialised. It’s useful when creating expandable commands, but for non-expandable ones x-type expansion is usually more appropriate.

## Expanding just the once

As TeX is a macro expansion language, there are some tasks that are best carried out, or even only doable, using an exact number of expansions. To allow a single expansion, the argument specification o (‘once’) is available. To use this, you need to know what will happen after exactly one expansion. Functions which may be useful in this way have information about their expansion behaviour included in the documentation, while token list variables also expand to their content after exactly one expansion. Examples using o-type expansion tend to be low-level: perhaps the best example is dropping the first token from some input, so

\tl_set:No \l_tmpa_tl { \use_none:n tokens }
\tl_show:N \l_tmpa_tl

will show ‘okens’.

As with f-type expansion, expanding just once is something of a specialist tool, but one that is needed. It also completes the types of argument specification we can use, so we’re now in a position to do some more serious LaTeX3 programming.

Written by Joseph Wright

April 29th, 2012 at 12:24 pm

Posted in LaTeX3


## Programming LaTeX3: Expandability


In the last part, I looked at integer expressions, and how they can be used to calculate integer values. What I did not do was say exactly what can go inside an integer expression. That’s because it links in to a wider concept, and one that is very familiar to TeX programmers: expandability.

## What is expandability?

To understand expandability, we need to think about what TeX does when we use functions. TeX is a macro expansion language, and as I’ve already said that means that LaTeX3 is too. When we use a function in a place where TeX can execute all of the built-in commands (‘primitives’), we don’t really need to worry about that too much. However, there are places where life is more complicated, as TeX will only execute some of the primitives. These places are ‘expansion contexts’. In these places, only some functions will work as expected, and so it’s important to know what will and will not work.

## LaTeX3 and expandability

For traditional TeX programmers, understanding expandability means knowing the rules that TeX applies to decide what can and cannot be expanded. For LaTeX3, life is different as the documentation includes details of what will and will not work. If you read the documentation, you will see that some functions are marked with a star: those are expandable. For the moment, we won’t worry about the non-starred functions, other than to note that we can’t use them when we need expandability.

So sticking with our starting point, integer expressions, if we look at a function like \int_eval:n, this leads to the conclusion that the argument can only contain

• Numbers, either given directly or stored inside variables
• The symbols +, -, *, /, ( and )
• Functions which are marked with a star in the LaTeX3 documentation, plus any arguments these themselves need.

That hopefully makes some sense: \int_eval:n is supposed to produce a number, and so what we put in should make sense for turning into a number. The same idea applies to all of the other integer expression functions we saw last time.
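Putting those rules together, here is a minimal sketch (the numbers are arbitrary; \tl_length:n is expandable and yields an integer, so it is allowed inside the expression):

\int_eval:n { 2 * \tl_length:n { abc } + 1 } % Leaves '7' in the input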

## Expansion in our own functions

Integer expression functions make a good example for expandability, but if that was the only area that expansion applied it would not be that significant. However, there are lots of other places where we want to carry out expansion, and this takes us back to the idea of the argument specification for functions. There are three expansion related argument specifiers: f, o and x. Here, I’m going to deal just with x-type expansion, and will talk about f- and o-type expansion next time!

So what is x-type expansion? It’s exhaustive: expanding everything until only non-expandable content remains. We’ve already seen the idea that for example \tl_put_right:NV is related to \tl_put_right:Nn, so that we can access the value of a variable. So it should not be too much of a leap to see a relationship between \tl_set:Nn and \tl_set:Nx. So when we do

\tl_set:Nn \l_tmpa_tl { foo }
\tl_set:Nn \l_tmpb_tl { \l_tmpa_tl }
\tl_set:Nx \l_tmpc_tl { \l_tmpb_tl }
\tl_show:N \l_tmpc_tl

TeX exhaustively expands: it expands \l_tmpb_tl, and finds \l_tmpa_tl, then expands \l_tmpa_tl to foo, then stops as letters are not expandable. Inside an x-type expansion, TeX just keeps going! So we don’t need to know much about the content we are expanding.

With a function such as \int_eval:n, the argument specification is just n, so you might wonder why. The reason is that x-type functions are (almost) always defined as variants of an n-type parent. So functions that have to expand material (there is no choice) just take an n. (There are also a few things that \int_eval:n will expand that an x-type argument will not, but in general that’s not an issue, as it only shows up if you make a mistake.)

## Functions that can be expanded

I’ve said that functions that can be expanded fully are marked with a star in the documentation, but how do you make your own functions that can be expanded? It’s not too complicated: a function is expandable if it uses only expandable functions. So if all of the functions you use are marked with a star, then yours would be too.

There is a bit more to this, though. We have two (common) ways of creating new functions

• \cs_new:Npn
• \cs_new_protected:Npn

The difference is expandability. Protected functions are not expandable, whereas ones created with \cs_new:Npn should be. So the rule is to use \cs_new_protected:Npn unless you are sure that your function is expandable.
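As a sketch of the distinction (the function names here are invented for illustration): a function built only from starred, expandable pieces can use \cs_new:Npn, while anything performing an assignment should be protected.

% Expandable: uses only expandable functions internally
\cs_new:Npn \my_twice:n #1
  { \int_eval:n { 2 * (#1) } }
% Not expandable: \tl_set:Nn is an assignment
\cs_new_protected:Npn \my_store:n #1
  { \tl_set:Nn \l_tmpa_tl {#1} }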

## LaTeX2e hackers note: What about \protected@edef?

Experienced TeX hackers will recognise that x-type expansion is built around TeX’s \edef primitive. If you’ve worked a lot with LaTeX2e, you’ll know that with any user input you should use \protected@edef, rather than \edef, so that LaTeX2e robust commands are handled safely.

If you are working with LaTeX2e input, you’ll still need to use \protected@edef to be safe when expanding arbitrary user input. All LaTeX3 functions are either fully-expandable or engine-protected, so don’t need the LaTeX2e mechanism, but of course that is not true for LaTeX2e commands.

Written by Joseph Wright

April 21st, 2012 at 8:02 pm

Posted in LaTeX3


## A LaTeX format beyond LaTeX2e

The question of why LaTeX3 development is not focussed on LuaTeX came up yesterday on the TeX-sx site. I’ve added an answer there covering some of the issues, but I thought that something a bit more open-ended might also be useful on the same topic.

Before I look at the approaches that are available, it’s worth asking why a format is needed beyond LaTeX2e. There are a few reasons I feel it’s needed, but a few stand out.

The first, strangely, is stability. LaTeX2e is stable: there will be no changes other than bug fixes. That means that a document written 10 or more years ago should still give the same output when typeset today. That sounds great, but there is an issue here. While the kernel is stable, packages are not, and the limitations of the kernel mean that there are a lot of packages. So for a lot of real documents, stability in the kernel does not mean that they will still work after many years, at least without some effort. So we need a kernel which provides a lot more of the basics, and perhaps new approaches to providing stable code.

Secondly, and related, is the fact that most real documents need a lot of packages, and that is a barrier to new users. Again, stability is great but not if it means we don’t continue to attract new people to the LaTeX world. I think that the LaTeX approach is a good one, so that is important to me. So I feel that we need a format which works well and provides a lot more functionality as standard.

Thirdly, there are some fundamental issues which are hard to address, such as inter-paragraph spacing, the placement of floats and better separation of design from input. These all need big changes in LaTeX, and it’s not realistic to hope to bolt such changes on to LaTeX2e and have everything continue to work.

All of that tells me we need a new kernel. So the question is how to achieve that. There are at least four programming approaches I’ve thought about.

Two are closely related: stick with TeX macro programming and cross-engine working, but make things more systematic. Perhaps the simplest way to do this is to adopt an approach similar to the etoolbox package, and to essentially add to the structures already available. The more radical approach in the same area is to do what the LaTeX3 Project have to date, and define a new programming language from the ground up using TeX macros.  There are arguments in favour of both of these approaches: I’ve done some experiments with a more etoolbox-like method for creating a format. My take here is that if you really want something more systematic than LaTeX2e then you do have to go to something like the LaTeX3 method: dealing with expansion with names like \csletcs gets too unwieldy as you try to construct an entire format.

Moving to a LuaTeX-only solution, and doing a lot of the programming in Lua, is the method that the ConTeXt team has decided on. This brings in a proper programming language without any direct effort, but leaves open some issues. Using Lua does not automatically solve the challenges in writing a better format, and using LuaTeX does not mean that there is no TeX programming to do. So a LuaTeX-only approach would still need some TeX work.

Finally, there is the argument for parsing LaTeX-like input in an entirely new way. In this model, you don’t use TeX at all to read the user’s input: that’s done by another language, and TeX is only involved when you do the typesetting. That sounds challenging, and the big issue here is finding someone who has the necessary programming skills (I certainly do not).

Of the four approaches, it seems to me that from where we are now, the LaTeX3 approach is not so bad. If you were starting today with no code at all, and no background in programming expl3 or Lua, you might pick the LuaTeX method. That’s not, however, where we are: there is experience of expl3 available, and there is also code written (but in need of revision). Of course, the proof of that will be in delivering a working LaTeX3 format: on that, back to work!

Written by Joseph Wright

February 21st, 2012 at 9:17 am

Posted in LaTeX,LaTeX3


## Programming LaTeX3: Integers and integer expressions


In the last entry, I talked about token list variables. As we’ve seen, these can be used to hold basically anything, but at the cost that there is no internal structure. I’ve also hinted that LaTeX3 provides a number of richer data types. One that we will need sooner rather than later is the int type for storing integers. At the same time, we can look more widely at what are called integer expressions: calculations which work with whole numbers.

## Storing integers

Based on what we have already seen with token lists, it should be no surprise that we can create and set int variables with function names you might be able to guess:

\int_new:N \l_my_a_int
\int_set:Nn \l_my_a_int { 1 + 1 }
\int_show:N \l_my_a_int % => '2'

Creating and setting the variable should seem easy enough here, but you might wonder about the result of showing the content here: it’s not what we put in. That’s because LaTeX3 treats the second argument of \int_set:Nn as an integer expression: something to be evaluated to give an integer.

## Integer expressions

All LaTeX3 functions which work with integers are set up to evaluate integer expressions, so it’s important to understand what they do. Expressions can use the standard arithmetic operations +, -, * (times) and /, plus parentheses. There are also functions available for more complicated mathematical operations (for example \int_mod:nn to calculate the remainder on division).

More significantly, we can include other functions which themselves yield integers. For example, we’ve seen that it’s possible to work out the length of a token list, which is an integer:

\int_set:Nn \l_my_a_int { \tl_length:n { Hello } * 2 } % => 10

We can’t use any function here: there are some restrictions. Clearly we need to get an integer out, but the functions also need to be expandable: that will be the topic of the next post!

## Integer conditionals

A key use of integers is in conditionals. Earlier, we saw that conditionals in LaTeX3 are defined so that we have distinct true and false branches to follow. That applies to integer conditionals in exactly the same way as anything else:

\int_new:N \l_my_b_int
\int_set:Nn \l_my_b_int { 7 }
\int_compare:nTF { 1 = \l_my_a_int }
{ TRUE }
{ FALSE }
\int_compare:nNnTF { \l_my_a_int } = { \l_my_b_int }
{ TRUE }
{ FALSE }

You might wonder what is going on here: there are two different conditionals, both of which do a comparison. Well, there are two types of integer conditionals. The first type works out where the comparator is, and so only requires three arguments. The second type has to be given the two integer expressions to compare separately. It’s a bit more awkward to read, but the latter version is faster (it’s closer to the underlying TeX). You can pick whichever one you prefer: as I work on low-level code, I go for speed!

Closely related to conditionals are loops, and again these come pre-defined.

\int_zero:N \l_my_a_int % Hopefully obvious!
\int_while_do:nn { \l_my_a_int < 10 }
{
\int_use:N \l_my_a_int \\
\int_incr:N \l_my_a_int
}

Hopefully most of this code is clear: we zero the counter, then loop until it reaches 10. For each loop, I’ve printed (used) the value directly, then incremented it by one. (There are a whole family of these functions, with do_while in addition to while_do and nNn versions as for conditionals.)

## Integer expressions beyond \int_ functions

Integer expressions are not limited to \int_ functions. Indeed, we’ve already seen one in \prg_replicate:nn. This illustrates a general point: anywhere that LaTeX3 expects an integer, it’s coded to accept integer expressions.

One function that I can’t miss out here is \int_eval:n, which just works out the value of the expression and leaves it in the input. It underlies a lot of the higher-level use of integer expressions, and we are certain to meet it later.
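As a quick sketch of \int_eval:n in action (the expression here is arbitrary):

\int_eval:n { ( 3 + 4 ) * 2 } % Leaves '14' in the input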

Written by Joseph Wright

February 7th, 2012 at 8:07 pm

Posted in LaTeX3


## Programming LaTeX3: More on token list variables

In my previous post, I introduced the idea of a token list variable, the LaTeX3 term for a macro used to store ‘stuff’. Token list variables (tl vars) are the basis of many of the higher level data types in LaTeX3, and they also have arbitrary contents. As a result, there are a lot of generic functions to do things with tl vars.

A very common thing to do with stored material is to add to it, which we can do on either the left or the right. The LaTeX3 functions to do this are called \tl_put_left:Nn and \tl_put_right:Nn (and so on), which makes it easy to build up complicated material quite quickly. So

\tl_new:N \l_my_a_tl
\tl_set:Nn \l_my_a_tl { stuff }
\tl_put_right:Nn \l_my_a_tl { ~here }
\tl_put_left:Nn \l_my_a_tl { My~ }
\tl_use:N \l_my_a_tl

will print ‘My stuff here’.

That’s easy enough to do without LaTeX3 coding, but find-and-replace is a bit more involved. So the functions \tl_replace_once:Nnn and \tl_replace_all:Nnn are working a little harder:

\tl_set:Nn \l_my_a_tl { stuff~to~change }
\tl_replace_once:Nnn \l_my_a_tl { change } { alter }
\tl_use:N \l_my_a_tl % 'stuff to alter'
\tl_replace_all:Nnn \l_my_a_tl { t } { q }
\tl_use:N \l_my_a_tl % 'squff qo alqer'

## Adding one tl var to another

So far, I’ve added literal input to tl vars. That’s useful, but a very common task is to combine two or more variables together. To do that, we need a way to access the content of a variable. First, what doesn’t work is doing

\tl_new:N \l_my_b_tl
\tl_set:Nn \l_my_a_tl { stuff }
\tl_set:Nn \l_my_b_tl { ~more~stuff }
\tl_put_right:Nn \l_my_a_tl { \l_my_b_tl }

as what ends up inside \l_my_a_tl is stuff\l_my_b_tl.

This is where LaTeX3’s expansion control comes into play. So far, we’ve seen arguments of type N and n, but there are a number of other types. Here I want to introduce just one: V. A V-type argument will pass the value of a variable, rather than its name. So the correct way to add the content of one token list variable to another is

\tl_set:Nn \l_my_a_tl { stuff }
\tl_set:Nn \l_my_b_tl { ~more~stuff }
\tl_put_right:NV \l_my_a_tl \l_my_b_tl

which results in \l_my_a_tl containing stuff more stuff.

Now, the LaTeX3 kernel does not provide every possible combination of argument types (although it does provide \tl_put_right:NV). That’s not a problem, as they can easily be created:

\cs_generate_variant:Nn \tl_put_right:Nn { NV }

This is a ‘soft’ process: if the variant requested already exists, nothing happens, but otherwise the variant is created. So provided the base function exists, you can always create any variants you need.
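For example, if we also wanted to left-append the value of one tl var to another, a sketch using the same mechanism (assuming both variables have already been created and set):

\cs_generate_variant:Nn \tl_put_left:Nn { NV }
\tl_put_left:NV \l_my_a_tl \l_my_b_tl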

## Mappings

Another key idea when working with tl vars is the ability to map to each token they contain. For that, there are again a couple of useful functions, \tl_map_function:NN and \tl_map_inline:Nn. The two differ mainly in expandability, a concept we’ve not covered just yet! I’ll be coming back to that in a later post, so for the moment I’ll just use \tl_map_inline:Nn.

What does a mapping do? Try

\tl_set:Nn \l_my_a_tl { stuff }
\tl_map_inline:Nn \l_my_a_tl { I~saw~'#1'. \\ }

and you should get a listing of each separate token in the tl var:

I saw ‘s’.
I saw ‘t’.
I saw ‘u’.
I saw ‘f’.
I saw ‘f’.

As you can hopefully see, within the second argument of \tl_map_inline:Nn the placeholder #1 is used to insert a single token from the tl var. For a more complicated tl var

\tl_set:Nn \l_my_a_tl { { stuff } ~ { which } ~ is ~ { complicated } }
\tl_map_inline:Nn \l_my_a_tl { I~saw~'#1'. \\ }

we get

I saw ‘stuff’.
I saw ‘which’.
I saw ‘i’.
I saw ‘s’.
I saw ‘complicated’.

So you’ll see that spaces are ignored by the mapping, and that a brace group counts as a single item.

I’ve not covered every token list and token list variable function, but hopefully the basic concepts are now laid out. In the next post, I’ll move on to some other concepts, so that we can begin to put more structures together.

Written by Joseph Wright

January 22nd, 2012 at 10:34 am

Posted in LaTeX3


## Programming LaTeX3: Token list variables


In the last post, I talked about the concept of a token list and some general functions which act on token lists. That’s fine if you just want to take some input and ‘do stuff’, but a very common requirement when programming is storing input, and for that we need variables. LaTeX3 provides a number of different types of variable: we’ll start with perhaps the most general of all, the token list variable.

## Token list variables

So what is a token list variable (‘tl’)? You might well guess from the name that it’s a way of storing a token list! As such, a tl can be used to hold just about anything, and indeed this means that several of the other variable types we’ll meet later are tls with a special internal structure.

Before we can save anything in a tl, we need to create the variable: this is a general principle of programming LaTeX3. We can then store something inside the variable by setting it:

\tl_new:N \l_mypkg_name_tl
\tl_set:Nn \l_mypkg_name_tl { Fred }

Hopefully, the analysis of this code is not too hard. First, \tl_new:N creates a new token list variable which I’ve called \l_mypkg_name_tl. (I’ll explain how the naming works in a little while.) The second line will set the new tl to contain the text Fred. Assuming that the surrounding code has done nothing strange, we’ve stored four letter tokens in \l_mypkg_name_tl.

As I said, a tl can contain anything: we are not limited to letters. So

\tl_new:N \l_mypkg_other_tl
\tl_set:Nn \l_mypkg_other_tl { \ERROR ^ _ # $ ! }

is also perfectly valid content for a token list variable (although whether we’ll be able to use it safely is a different matter).

## Variable naming and TeX’s grouping system

From the earlier discussion of the way that functions are named in LaTeX3, it might be obvious that there is also a system to how variables are named. Skipping over the initial \l_, what we’ve got is a module name (mypkg), some further description of the nature of the variable (in this case name), and finally the variable type (tl), divided up by _ in exactly the same way as for functions. We’ll see that other variables follow the same scheme.

So what’s the leading \l_ about? This tells us about the scope that we should use when setting the variable. As TeX is a macro expansion language, variables are not local to functions. However, they can be local to TeX groups, which are created in LaTeX3 using

\group_begin:
% Code here
\group_end:

Setting a variable locally means that any changes stay within a group

\tl_new:N \l_mypkg_name_tl
\tl_set:Nn \l_mypkg_name_tl { Fred }
\group_begin:
\tl_set:Nn \l_mypkg_name_tl { Ginger }
\group_end:
% \l_mypkg_name_tl reverts to 'Fred'

On the other hand, we sometimes need global variables, which ignore any groups

\tl_new:N \g_mypkg_name_tl
\tl_gset:Nn \g_mypkg_name_tl { Fred }
\group_begin:
\tl_gset:Nn \g_mypkg_name_tl { Ginger }
\group_end:
% \g_mypkg_name_tl still 'Ginger'

So the \l_ or \g_ tells you what scope the variable contents have, and whether you should set or gset it. (You can probably work out that gset means ‘set globally’.)

## Using the content of token list variables

Okay, putting stuff into token list variables is all very well and good, but unless we can do something with the content then it’s not really that useful. Of course, we can do things with the content of variables.
The most basic thing to do is simply to insert the content of the tl into the input that TeX is working with

\tl_use:N \l_mypkg_name_tl

That’s very handy, but we can also examine the content of a token list variable. For example, we saw before that \tl_length:n will produce the length of a token list, and we can do the same for a token list variable using \tl_length:N.

\tl_set:Nn \l_mypkg_name_tl { Fred }
\tl_length:N \l_mypkg_name_tl % '4'

There’s a lot more we can do with token list variables, but this post is already long enough, so I’ll come back to more that we can do with them in the next post.

## Tips for TeX programmers: the internals of token list variables

Experienced TeX programmers have probably wondered about token list variables, and in particular exactly what the underlying TeX structure is. A tl is just a macro that we are using as a variable rather than a function. That should not be too much of a surprise, as storing tokens in macros is very much basic TeX programming. So \tl_set:Nn is almost the same as the \def primitive. What might worry you slightly is that I said

\tl_new:N \l_mypkg_other_tl
\tl_set:Nn \l_mypkg_other_tl { \ERROR ^ _ # $ ! }

will work. That won’t work with \def, and you’d normally expect to need a token register (toks) for this. However, we don’t use toks for LaTeX3 programming at all, and that’s because we require e-TeX. So

\tl_set:Nn \l_mypkg_other_tl { \ERROR ^ _ # $ ! }

is actually the same as

\edef \l_mypkg_other_tl { \unexpanded { \ERROR ^ _ # $ ! } }

which will allow us to put any tokens inside a macro.

The other thing you might notice is that I’ve said that tls have to be declared, even though at a TeX level this is not the case. This is a principle of good LaTeX3 programming: although it’s not enforced as standard, any non-declared token list variables are coding errors. You can test for this using

\usepackage[check-declarations]{expl3}

which uses some slow checking code to make sure that all variables are declared before they are used.

Written by Joseph Wright

December 26th, 2011 at 12:19 pm

Posted in LaTeX3


## Programming LaTeX3: Category codes, tokens and token lists

Understanding LaTeX3 programming relies on understanding TeX concepts, and one we need to get to grips with is how TeX deals with tokens. Experienced TeX programmers will probably find the first part of this post very straightforward, so might want to skim-read the start!

## Category codes and tokens

When TeX reads input, it is not only the characters that are there that are important. Each character has an associated category code: a way to interpret that character. The combination of a character and its category code then sets how TeX will deal with the input. For example, when TeX reads ‘a’ it finds that it’s (normally) a letter, and so tokenizes the input as ‘a, letter’. This seems pretty obvious: ‘a’ is a letter, after all. But this is not fixed, at least for TeX. I’ve already mentioned that within the LaTeX3 programming environment : and _ can be part of function names: that’s because they are ‘letters’ while we are programming! What’s of most importance now is that a control sequence (something like \emph or \cs_new:Npn) is stored as a single token. So most of the time these can’t be divided up into their component characters: they act as a single item.

## Token lists

The fact that TeX works with tokens means that most of the time we carry out operations on a token-by-token basis, rather than on strings. In LaTeX3 terminology, an arbitrary set of tokens is called a token list, which has both a defined content and a defined order. To get a better feel for how token lists work, we’ll apply a few basic token list functions to some simple input:

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\cs_new:Npn \demo:n #1
{
\tl_count:n {#1} ;
\tl_if_empty:nT {#1} { Empty! }
\tl_if_blank:nTF {#1}
{ Blank! }
{
Tail = \tl_tail:n {#1} ;
End
}
}
\cs_new_eq:NN \demo \demo:n
\ExplSyntaxOff
\newcommand*{\hello}{hello}
\begin{document}
\demo{Hello world}

\demo{ }

\demo{}

\demo{\hello}
\end{document}

Okay, what’s going on here? Well, as we saw last time, I’ve created a new function, in this case called \demo:n, which contains the code I want to use. In contrast to the last post, I’ve not used it directly but have instead used \cs_new_eq:NN to make a copy of this function with a document-level name. This is a general LaTeX3 idea: the internals of your code should be defined separately from the interface (indeed, we’ll see later that there is a more formalised way of creating a document-level function). You can probably work out that \cs_new_eq:NN needs two arguments: the new function to create and the old one to copy. (For experienced TeX programmers, it will be no surprise that this is a wrapper around the \let primitive.)

Moving on to what \demo:n is doing, the first thing to see is that I’ve defined it with one argument, agreeing with the :n part of its name. I’ve then done some simple tests on the argument. The first is \tl_count:n, which will count how many tokens are in the input and simply output the result. You’ll notice that it’s ignored the space in Hello world: it’s a common feature of TeX that spaces are often skipped over. You can also see the space-skipping behaviour in the line where I feed \demo a space: the result has a ‘length’ of zero. Also notice that, as promised, \hello is only a single token. (There is an experimental function in LaTeX3 to count the length of a token list including the spaces. Most of the time we’ll actually want to ignore them, so we won’t worry about that here!)

We then have two conditionals, \tl_if_empty:nT and \tl_if_blank:nTF. First, we’ll look at what a conditional does in general, then at these two in particular. The LaTeX3 approach to conditionals is to accept either one or two extra arguments, with signatures ending T, F or TF, so in general there are always three related functions:

\foo_if_something:nT
\foo_if_something:nF
\foo_if_something:nTF

The test is always the same for the three related versions; the T and F parts tell us what code is used depending on the result of the test. So if we do a test and it’s true, the T code will be used if it’s there, and the F code will be skipped entirely, while if there is no T code then nothing happens. It’s of course the other way around when the test is false!

So what’s happening with \tl_if_empty:nT and \tl_if_blank:nTF? In the first test, we only print { Empty! } if there is nothing at all in the argument to \demo:n. If the argument is not empty, then this test does nothing at all. On the other hand, the \tl_if_blank:nTF test will print { Blank! } if the argument is either entirely empty or is made up only of spaces (so it looks blank). However, if it’s not blank then we apply two more functions.

The functions \tl_head:n and \tl_tail:n find the very first token and everything but the very first token, respectively. So \tl_head:n finds just the H of Hello world, while \tl_tail:n finds ello world. I’ve only used them when the entire argument is not blank, as they are not really designed to deal with cases where there is nothing to split up! You might wonder about the last test, where \demo{\hello} has \hello as the head part and nothing as the tail. That happens because what is tested here is \hello, a single token, which is only turned into the text we see by TeX during typesetting. That can be avoided, but at this stage we’ll not worry too much!

Written by Joseph Wright

December 21st, 2011 at 11:01 pm

Posted in LaTeX3
