Some TeX Developments

Coding in the TeX world

Archive for the ‘LaTeX3’ Category

A LaTeX format beyond LaTeX2e

The question of why LaTeX3 development is not focussed on LuaTeX came up yesterday on the TeX-sx site. I’ve added an answer there covering some of the issues, but I thought that something a bit more open-ended might also be useful on the same topic.

Before I look at the approaches that are available, it’s worth asking why a format is needed beyond LaTeX2e. There are several reasons I feel it’s needed, but three stand out.

The first, strangely, is stability. LaTeX2e is stable: there will be no changes other than bug fixes. That means that a document written 10 or more years ago should still give the same output when typeset today. That sounds great, but there is an issue here. While the kernel is stable, packages are not, and the limitations of the kernel mean that there are a lot of packages. So for a lot of real documents, stability in the kernel does not mean that they will still work after many years, at least without some effort. So we need a kernel which provides a lot more of the basics, and perhaps new approaches to providing stable code.

Secondly, and related, is the fact that most real documents need a lot of packages, and that is a barrier to new users. Again, stability is great but not if it means we don’t continue to attract new people to the LaTeX world. I think that the LaTeX approach is a good one, so that is important to me. So I feel that we need a format which works well and provides a lot more functionality as standard.

Thirdly, there are some fundamental issues which are hard to address, such as inter-paragraph spacing, the placement of floats and better separation of design from input. These all need big changes in LaTeX, and it’s not realistic to hope to bolt such changes on to LaTeX2e and have everything continue to work.

All of that tells me we need a new kernel. So the question is how to achieve that. There are at least four programming approaches I’ve thought about.

Two are closely related: stick with TeX macro programming and cross-engine working, but make things more systematic. Perhaps the simplest way to do this is to adopt an approach similar to the etoolbox package, and essentially add to the structures already available. The more radical approach in the same area is to do what the LaTeX3 Project has done to date, and define a new programming language from the ground up using TeX macros. There are arguments in favour of both of these approaches, and I’ve done some experiments with a more etoolbox-like method for creating a format. My take here is that if you really want something more systematic than LaTeX2e then you do have to go to something like the LaTeX3 method: dealing with expansion with names like \csletcs gets too unwieldy as you try to construct an entire format.
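
To make the naming point concrete, here is a rough sketch of the same operation (giving one macro the meaning of another, with both looked up by name) written both ways. It assumes a current TeX distribution with etoolbox and expl3 installed, and the mypkg names are placeholders. The etoolbox route needs a dedicated macro for each combination, whereas expl3 builds the variant from a base function plus an argument signature, with c meaning ‘construct the control sequence from this name’ (the :Nc and :cN forms follow the same pattern).

\documentclass{article}
\usepackage{etoolbox}
\usepackage{expl3}
% etoolbox style: a name-based \let needs its own dedicated macro
\csdef{mypkg@greeting}{Hello}
\csletcs{mypkg@copy}{mypkg@greeting}
\ExplSyntaxOn
% expl3 style: base function \cs_set_eq:NN plus a 'cc' signature
\cs_new:Npn \mypkg_greeting: { Hello }
\cs_set_eq:cc { mypkg_copy: } { mypkg_greeting: }
\ExplSyntaxOff
\begin{document}
\csuse{mypkg@copy} % typesets 'Hello'
\end{document}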

Moving to a LuaTeX-only solution, and doing a lot of the programming in Lua, is the method that the ConTeXt team has decided on. This brings in a proper programming language without any direct effort, but leaves some issues open. Using Lua does not automatically solve the challenges in writing a better format, and using LuaTeX does not mean that there is no TeX programming to do. So a LuaTeX-only approach would still need some TeX work.

Finally, there is the argument for parsing LaTeX-like input in an entirely new way. In this model, you don’t use TeX at all to read the user’s input: that’s done by another language, and TeX is only involved when you do the typesetting. That sounds challenging, and the big issue here is finding someone who has the necessary programming skills (I certainly do not).

Of the four approaches, it seems to me that from where we are now, the LaTeX3 approach is not so bad. If you were starting today with no code at all, and no background in programming expl3 or Lua, you might pick the LuaTeX method. That’s not, however, where we are: there is experience of expl3 available, and there is also code written (but in need of revision). Of course, the proof of that will be in delivering a working LaTeX3 format: on that, back to work!

Written by Joseph Wright

February 21st, 2012 at 9:17 am

Posted in LaTeX,LaTeX3

Programming LaTeX3: Integers and integer expressions

In the last entry, I talked about token list variables. As we’ve seen, these can be used to hold basically anything, but at the cost that there is no internal structure. I’ve also hinted that LaTeX3 provides a number of richer data types. One that we will need sooner rather than later is the int type for storing integers. At the same time, we can look more widely at what are called integer expressions: calculations which work with whole numbers.

Storing integers

Based on what we have already seen with token lists, it should be no surprise that we can create and set int variables with function names you might be able to guess:

\int_new:N \l_my_a_int
\int_set:Nn \l_my_a_int { 1 + 1 }
\int_show:N \l_my_a_int % => '2'

Creating and setting the variable should seem easy enough here, but you might wonder about the result of showing the content here: it’s not what we put in. That’s because LaTeX3 treats the second argument of \int_set:Nn as an integer expression: something to be evaluated to give an integer.

Integer expressions

All LaTeX3 functions which work with integers are set up to evaluate integer expressions, so it’s important to understand what they do. Expressions can use the standard arithmetic operations +, -, * (times) and / (divide), plus parentheses. There are also functions for more complicated mathematical operations (for example \int_mod:nn to calculate the remainder on division).
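
A small test of the operators, as a sketch for a current expl3 (\l_my_a_int is just the variable used above). One detail worth knowing: / rounds to the nearest integer rather than truncating, a behaviour inherited from e-TeX’s \numexpr; recent expl3 releases also provide \int_div_truncate:nn when truncation is what you want.

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\int_new:N \l_my_a_int
% * and / bind more tightly than + and -, as usual
\int_set:Nn \l_my_a_int { ( 7 + 3 ) * 2 - 4 / 2 }
\int_show:N \l_my_a_int % => '18'
% Remainder on division
\int_set:Nn \l_my_a_int { \int_mod:nn { 17 } { 5 } }
\int_show:N \l_my_a_int % => '2'
% Division rounds (3.5 becomes 4) rather than truncating
\int_set:Nn \l_my_a_int { 7 / 2 }
\int_show:N \l_my_a_int % => '4'
\ExplSyntaxOff
\begin{document}
\end{document}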

More significantly, we can include other functions which themselves yield integers. For example, we’ve seen that it’s possible to work out the length of a token list, which is an integer:

\int_set:Nn \l_my_a_int { \tl_length:n { Hello } * 2 } % => 10

We can’t use just any function here: there are some restrictions. Clearly we need to get an integer out, but the functions also need to be expandable: that will be the topic of the next post!

Integer conditionals

A key use of integers is in conditionals. Earlier, we saw that conditionals in LaTeX3 are defined so that we have distinct true and false branches to follow. That applies to integer conditionals in exactly the same way as anything else:

\int_new:N \l_my_b_int
\int_set:Nn \l_my_b_int { 7 }
\int_compare:nTF { 1 = \l_my_a_int }
  { TRUE }
  { FALSE }
\int_compare:nNnTF { \l_my_a_int } = { \l_my_b_int }
  { TRUE }
  { FALSE }

You might wonder what is going on here: there are two different conditionals, both of which do a comparison. Well, there are two types of integer conditional. The first type works out for itself where the comparator is, and so needs only one argument for the test (plus the true and false branches). The second type has to be given the two integer expressions to compare separately, with the comparator between them. It’s a bit more awkward to read, but the latter version is faster (it’s closer to the underlying TeX). You can pick whichever one you prefer: as I work on low-level code, I go for speed!
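
As a quick check that the two forms agree, this sketch runs the same test both ways; \CompareDemo is a made-up name, and each branch simply typesets a word.

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\cs_new:Npn \CompareDemo
  {
    % One argument: the comparator is found inside the expression
    \int_compare:nTF { 3 * 4 > 10 } { bigger } { not~bigger } ;~
    % Three arguments: expression, comparator token, expression
    \int_compare:nNnTF { 3 * 4 } > { 10 } { bigger } { not~bigger }
  }
\ExplSyntaxOff
\begin{document}
\CompareDemo % typesets 'bigger; bigger'
\end{document}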

Closely related to conditionals are loops, and again these come pre-defined.

\int_zero:N \l_my_a_int % Hopefully obvious!
\int_while_do:nn { \l_my_a_int < 10 }
  {
    \int_use:N \l_my_a_int \\
    \int_incr:N \l_my_a_int
  }

Hopefully most of this code is clear: we zero the counter, then loop until it reaches 10. On each pass through the loop, I’ve printed (used) the value directly, then incremented it by one. (There is a whole family of these functions, with do_while in addition to while_do, and nNn versions as for conditionals.)
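
The difference between while_do and do_while is when the test happens. In this sketch (\LoopDemo is a made-up name) the counter starts at 10, so the test fails straight away: the while_do loop tests first and never runs its body, whereas the do_while loop runs its body once before testing.

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\int_new:N \l_my_a_int
\cs_new:Npn \LoopDemo
  {
    % while_do: test first, so the body never runs here
    \int_set:Nn \l_my_a_int { 10 }
    \int_while_do:nNnn { \l_my_a_int } < { 10 }
      { [while~do:~\int_use:N \l_my_a_int] ~ \int_incr:N \l_my_a_int }
    % do_while: body first, so it runs once even though 10 < 10 is false
    \int_set:Nn \l_my_a_int { 10 }
    \int_do_while:nNnn { \l_my_a_int } < { 10 }
      { [do~while:~\int_use:N \l_my_a_int] ~ \int_incr:N \l_my_a_int }
  }
\ExplSyntaxOff
\begin{document}
\LoopDemo % typesets only '[do while: 10]'
\end{document}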

Integer expressions beyond \int_ functions

Integer expressions are not limited to \int_ functions. Indeed, we’ve already seen one in \prg_replicate:nn. This illustrates a general point: anywhere that LaTeX3 expects an integer, it’s coded to accept integer expressions.

One function that I can’t miss out here is \int_eval:n, which just works out the value of the expression and leaves it in the input. It underlies a lot of the higher-level use of integer expressions, and we are certain to meet it later.
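
A minimal illustration, assuming a current expl3 (\Answer is a hypothetical document command). Because \int_eval:n is expandable, it works both in typeset text and inside the x-type (fully expanded) argument of \tl_set:Nx, where only the result is stored rather than the expression. \l_tmpa_tl is a scratch variable provided by expl3.

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Hypothetical document command: multiply two integer expressions
\cs_new:Npn \Answer #1#2 { \int_eval:n { ( #1 ) * ( #2 ) } }
% Only the result is stored, because x-type arguments are fully expanded
\tl_set:Nx \l_tmpa_tl { \int_eval:n { 6 * 7 } }
\tl_show:N \l_tmpa_tl % => '42'
\ExplSyntaxOff
\begin{document}
Six times seven is \Answer{6}{7}. % typesets 'Six times seven is 42.'
\end{document}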

Written by Joseph Wright

February 7th, 2012 at 8:07 pm

Posted in LaTeX3

Programming LaTeX3: More on token list variables

In my previous post, I introduced the idea of a token list variable, the LaTeX3 term for a macro used to store ‘stuff’. Token list variables (tl vars) are the basis of many of the higher level data types in LaTeX3, and they also have arbitrary contents. As a result, there are a lot of generic functions to do things with tl vars.

Adding content, changing content

A very common thing to do with stored material is to add to it, which we can do on either the left or the right. The LaTeX3 functions to do this are called \tl_put_left:Nn and \tl_put_right:Nn (and so on), which makes it easy to build up complicated material quite quickly. So

\tl_new:N \l_my_a_tl
\tl_set:Nn \l_my_a_tl { stuff }
\tl_put_right:Nn \l_my_a_tl { ~here }
\tl_put_left:Nn \l_my_a_tl { My~ }
\tl_use:N \l_my_a_tl

will print ‘My stuff here’.

That’s easy enough to do without LaTeX3 coding, but find-and-replace is a bit more involved, so the functions \tl_replace_once:Nnn and \tl_replace_all:Nnn have to work a little harder:

\tl_set:Nn \l_my_a_tl { stuff~to~change }
\tl_replace_once:Nnn \l_my_a_tl { change } { alter }
\tl_use:N \l_my_a_tl % 'stuff to alter'
\tl_replace_all:Nnn \l_my_a_tl { t } { q }
\tl_use:N \l_my_a_tl % 'squff qo alqer'

Adding one tl var to another

So far, I’ve added literal input to tl vars. That’s useful, but a very common task is to combine two or more variables. To do that, we need a way to access the content of a variable. First, what doesn’t work is doing

\tl_new:N \l_my_b_tl
\tl_set:Nn \l_my_a_tl { stuff }
\tl_set:Nn \l_my_b_tl { ~more~stuff }
\tl_put_right:Nn \l_my_a_tl { \l_my_b_tl }

as what ends up inside \l_my_a_tl is stuff\l_my_b_tl.

This is where LaTeX3’s expansion control comes into play. So far, we’ve seen arguments of type N and n, but there are a number of other types. Here I want to introduce just one: V. A V-type argument will pass the value of a variable, rather than its name. So the correct way to add the content of one token list variable to another is

\tl_set:Nn \l_my_a_tl { stuff }
\tl_set:Nn \l_my_b_tl { ~more~stuff }
\tl_put_right:NV \l_my_a_tl \l_my_b_tl

which results in \l_my_a_tl containing stuff more stuff.

Now, the LaTeX3 kernel does not provide every possible combination of argument types (although it does provide \tl_put_right:NV). That’s not a problem, as they can easily be created:

\cs_generate_variant:Nn \tl_put_right:Nn { NV }

This is a ‘soft’ process: if the variant requested already exists, nothing happens, but otherwise the variant is created. So provided the base function exists, you can always create any variants you need.
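
Variants are not only for kernel functions: you can generate them for your own code too. In this sketch \mypkg_greet:n is a hypothetical function; once its V variant exists, it can be handed a tl var directly and receives the value rather than the name.

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\cs_new:Npn \mypkg_greet:n #1 { Hello,~#1! }
% Creates \mypkg_greet:V from the n-type base function
\cs_generate_variant:Nn \mypkg_greet:n { V }
\tl_new:N \l_mypkg_name_tl
\tl_set:Nn \l_mypkg_name_tl { Fred }
\cs_new:Npn \GreetStored { \mypkg_greet:V \l_mypkg_name_tl }
\ExplSyntaxOff
\begin{document}
\GreetStored % typesets 'Hello, Fred!'
\end{document}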

Mappings

Another key idea when working with tl vars is the ability to map to each token they contain. For that, there are again a couple of useful functions, \tl_map_function:NN and \tl_map_inline:Nn. The two differ mainly in expandability, a concept we’ve not covered just yet! I’ll be coming back to that in a later post, so for the moment I’ll just use \tl_map_inline:Nn.

What does a mapping do? Try

\tl_set:Nn \l_my_a_tl { stuff }
\tl_map_inline:Nn \l_my_a_tl { I~saw~'#1'. \\ }

and you should get a listing of each separate token in the tl var:

I saw ‘s’.
I saw ‘t’.
I saw ‘u’.
I saw ‘f’.
I saw ‘f’.

As you can hopefully see, within the second argument of \tl_map_inline:Nn the placeholder #1 is used to insert a single token from the tl var. For a more complicated tl var

\tl_set:Nn \l_my_a_tl { { stuff } ~ { which } ~ is ~ { complicated } }
\tl_map_inline:Nn \l_my_a_tl { I~saw~'#1'. \\ }

we get

I saw ‘stuff’.
I saw ‘which’.
I saw ‘i’.
I saw ‘s’.
I saw ‘complicated’.

So you’ll see that spaces are ignored by the mapping, and that a brace group counts as a single item.
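
For comparison, \tl_map_function:NN applies a one-argument function to each item in turn rather than taking inline code. In this sketch \my_bracket:n is a made-up helper; with the same tl var as above, the spaces are again skipped and each brace group is passed along as a single item.

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\tl_new:N \l_my_a_tl
\tl_set:Nn \l_my_a_tl { { stuff } ~ { which } ~ is ~ { complicated } }
% A one-argument function applied to each item
\cs_new:Npn \my_bracket:n #1 { [#1] }
\cs_new:Npn \MapDemo { \tl_map_function:NN \l_my_a_tl \my_bracket:n }
\ExplSyntaxOff
\begin{document}
\MapDemo % typesets '[stuff][which][i][s][complicated]'
\end{document}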

I’ve not covered every token list and token list variable function, but hopefully the basic concepts are now laid out. In the next post, I’ll move on to some other concepts, so that we can begin to put more structures together.

Written by Joseph Wright

January 22nd, 2012 at 10:34 am

Posted in LaTeX3

Programming LaTeX3: Token list variables

In the last post, I talked about the concept of a token list and some general functions which act on token lists. That’s fine if you just want to take some input and ‘do stuff’, but a very common requirement when programming is storing input, and for that we need variables. LaTeX3 provides a number of different types of variable: we’ll start with perhaps the most general of all, the token list variable.

Token list variables

So what is a token list variable (‘tl’)? You might well guess from the name that it’s a way of storing a token list! As such, a tl can be used to hold just about anything, and indeed several of the other variable types we’ll meet later are tls with a special internal structure.

Before we can save anything in a tl, we need to create the variable: this is a general principle of programming LaTeX3. We can then store something inside the variable by setting it:

\tl_new:N \l_mypkg_name_tl
\tl_set:Nn \l_mypkg_name_tl { Fred }

Hopefully, the analysis of this code is not too hard. First, \tl_new:N creates a new token list variable which I’ve called \l_mypkg_name_tl. (I’ll explain how the naming works in a little while.) The second line will set the new tl to contain the text Fred. Assuming that the surrounding code has done nothing strange, we’ve stored four letter tokens in \l_mypkg_name_tl.

As I said, a tl can contain anything: we are not limited to letters. So

\tl_new:N \l_mypkg_other_tl
\tl_set:Nn \l_mypkg_other_tl { \ERROR ^ _ # $ ! }

is also perfectly valid content for a token list variable (although whether we’ll be able to use it safely is a different matter).

Variable naming and TeX’s grouping system

From the earlier discussion of the way that functions are named in LaTeX3, it might be obvious that there is also a system to how variables are named. Skipping over the initial \l_, what we’ve got is a module name (mypkg), some further description of the nature of the variable (in this case name), and finally the variable type (tl), divided up by _ in exactly the same way we did for functions. We’ll see that other variables follow the same scheme.

So what’s the leading \l_ about? This tells us about the scope that we should use when setting the variable. As TeX is a macro expansion language, variables are not local to functions. However, they can be local to TeX groups, which are created in LaTeX3 using

\group_begin:
% Code here
\group_end:

Setting a variable locally means that any changes stay within a group

\tl_new:N \l_mypkg_name_tl
\tl_set:Nn \l_mypkg_name_tl { Fred }
\group_begin:
  \tl_set:Nn \l_mypkg_name_tl { Ginger }
\group_end:
% \l_mypkg_name_tl reverts to 'Fred'

On the other hand, we sometimes need global variables which ignore any groups

\tl_new:N \g_mypkg_name_tl
\tl_gset:Nn \g_mypkg_name_tl { Fred }
\group_begin:
  \tl_gset:Nn \g_mypkg_name_tl { Ginger }
\group_end:
% \g_mypkg_name_tl still 'Ginger'

So the \l_ or \g_ tells you what scope the variable contents have, and whether you should set or gset it. (You can probably work out that gset means ‘set globally’.)
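
Putting the two cases side by side, this sketch uses \tl_show:N (which writes the content to the terminal and log) to check what survives the group; the variable names are the ones used above.

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\tl_new:N \l_mypkg_name_tl
\tl_new:N \g_mypkg_name_tl
\tl_set:Nn  \l_mypkg_name_tl { Fred }
\tl_gset:Nn \g_mypkg_name_tl { Fred }
\group_begin:
  \tl_set:Nn  \l_mypkg_name_tl { Ginger } % local: undone by \group_end:
  \tl_gset:Nn \g_mypkg_name_tl { Ginger } % global: survives the group
\group_end:
\tl_show:N \l_mypkg_name_tl % => 'Fred'
\tl_show:N \g_mypkg_name_tl % => 'Ginger'
\ExplSyntaxOff
\begin{document}
\end{document}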

Using the content of token list variables

Okay, putting stuff into token list variables is all very well and good, but unless we can do something with the content then it’s not really that useful. Of course, we can do things with the content of variables. The most basic thing to do is simply to insert the content of the tl into the input that TeX is working with

\tl_use:N \l_mypkg_name_tl

That’s very handy, but we can also examine the content of a token list variable. For example, we saw before that \tl_length:n will produce the length of a token list, and we can do the same for a token list variable using \tl_length:N.

\tl_set:Nn \l_mypkg_name_tl { Fred }
\tl_length:N \l_mypkg_name_tl % '4'

There’s a lot more we can do with token list variables, but this post is already long enough, so I’ll come back to more that we can do with them in the next post.

Tips for TeX programmers: the internals of token list variables

Experienced TeX programmers are probably wondering about token list variables, and in particular exactly what the underlying TeX structure is. A tl is just a macro that we are using as a variable rather than as a function. That should not be too much of a surprise, as storing tokens in macros is very much basic TeX programming. So \tl_set:Nn is almost the same as the \def primitive.

What might worry you slightly is that I said

\tl_new:N \l_mypkg_other_tl
\tl_set:Nn \l_mypkg_other_tl { \ERROR ^ _ # $ ! }

will work. That won’t work with \def, and you’d normally expect to need a token register (toks) for this. However, we don’t use toks for LaTeX3 programming at all, and that’s because we require e-TeX. So

\tl_set:Nn \l_mypkg_other_tl { \ERROR ^ _ # $ ! }

is actually the same as

\edef \l_mypkg_other_tl { \unexpanded { \ERROR ^ _ # $ ! } }

which will allow us to put any tokens inside a macro.

The other thing you might notice is that I’ve said that tls have to be declared, even though at a TeX level this is not the case. This is a principle of good LaTeX3 programming, and although it’s not enforced as standard, any non-declared token list variables are coding errors. You can test for this using

\usepackage[check-declarations]{expl3}

which uses some slow checking code to make sure that all variables are declared before they are used.

Written by Joseph Wright

December 26th, 2011 at 12:19 pm

Posted in LaTeX3

Programming LaTeX3: Category codes, tokens and token lists

Understanding LaTeX3 programming relies on understanding TeX concepts, and one we need to get to grips with is how TeX deals with tokens. Experienced TeX programmers will probably find the first part of this post very straightforward, so might want to skim the start!

Category codes and tokens

When TeX reads input, it is not only the characters that are there that are important. Each character has an associated category code: a way to interpret that character. The combination of a character and its category code then determines how TeX will deal with the input. For example, when TeX reads ‘a’ it finds that it’s (normally) a letter, and so tokenizes the input as ‘a, letter’. This seems pretty obvious: ‘a’ is a letter, after all. But this is not fixed, at least for TeX. I’ve already mentioned that within the LaTeX3 programming environment : and _ can be part of function names: that’s because they are ‘letters’ while we are programming!
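
You can ask TeX directly which category code a character currently has. This sketch uses the \showthe primitive outside the programming environment and \char_show_value_catcode:n (from a current expl3) inside it; both report a number in the terminal and log, where 11 means ‘letter’ and 8 is the usual code for _ (subscript).

\documentclass{article}
\usepackage{expl3}
% Outside the programming environment, _ is the subscript character
\showthe\catcode`\_ \relax % should report 8
\ExplSyntaxOn
% Inside it, _ and : are both letters
\char_show_value_catcode:n { `\_ } % should report 11
\char_show_value_catcode:n { `\: } % should report 11
\ExplSyntaxOff
\begin{document}
\end{document}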

What’s of most importance now is that a control sequence (something like \emph or \cs_new:Npn) is stored as a single token. So most of the time these can’t be divided up into their component characters: they act as a single item.

Token lists

The fact that TeX works with tokens means that most of the time we carry out operations on a token-by-token basis, rather than on strings. In LaTeX3 terminology, an arbitrary set of tokens is called a token list, which has both defined content and a defined order.

To get a better feel for how token lists work, we’ll apply a few basic token list functions to some simple input:

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\cs_new:Npn \demo:n #1
  {
    \tl_length:n {#1} ;
    \tl_if_empty:nT {#1} { Empty! }
    \tl_if_blank:nTF {#1}
      { Blank! }
      {
        Head = \tl_head:n {#1} ;
        Tail = \tl_tail:n {#1} ;
        End
      }
  }
\cs_new_eq:NN \demo \demo:n
\ExplSyntaxOff
\newcommand*{\hello}{hello}
\begin{document}
\demo{Hello world}

\demo{ }

\demo{}

\demo{\hello}
\end{document}

Okay, what’s going on here? Well, as we saw last time I’ve created a new function, in this case called \demo:n, which contains the code I want to use. In contrast to the last post, I’ve not used it directly but have instead used \cs_new_eq:NN to make a copy of this function but with a document-level name. This is a general LaTeX3 idea: the internals of your code should be defined separately from the interface (indeed, we’ll see later that there is a more formalised way of creating a document-level function). You can probably work out that \cs_new_eq:NN needs two arguments: the new function to create and the old one to copy. (For experienced TeX programmers, it will be no surprise that this is a wrapper around the \let primitive.)

Moving on to what \demo:n is doing, the first thing to see is that I’ve defined it with one argument, agreeing with the :n part of its name. I’ve then done some simple tests on the argument. The first is \tl_length:n, which will count how many tokens are in the input and simply output the result. You’ll notice that it’s ignored the space in Hello world: it’s a common feature of TeX that spaces are often skipped over. You can also see the space-skipping behaviour in the line where I feed \demo a space: the result has a ‘length’ of zero. Also notice that as promised \hello is only a single token. (There is an experimental function in LaTeX3 to count the length of a token list including the spaces. Most of the time, we’ll actually want to ignore them so we won’t worry about that here!)

We then have two conditionals, \tl_if_empty:nT and \tl_if_blank:nTF. First, we’ll look at what a conditional does in general, then at these two in particular. The LaTeX3 approach to conditionals is to accept either one or two branch arguments, shown in the name as T, F or TF, so in general there are always three related functions:

  \foo_if_something:nT
  \foo_if_something:nF
  \foo_if_something:nTF

The test is always the same for the three related versions, while the T and F parts tell us what code is used depending on the result of the test. So if a test is true, the T code will be used if it’s there and the F code will be skipped entirely; if there is no T code, then nothing happens. It’s of course the other way around when the test is false!
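
A small sketch of the three forms, using \tl_if_empty:n as the test (\BranchDemo is a made-up name):

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\cs_new:Npn \BranchDemo
  {
    \tl_if_empty:nTF { } { empty } { not~empty } ;~ % true: T code used
    \tl_if_empty:nT { abc } { never~printed }       % false, no F code: nothing
    \tl_if_empty:nF { abc } { not~empty }           % false: F code used
  }
\ExplSyntaxOff
\begin{document}
\BranchDemo % typesets 'empty; not empty'
\end{document}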

So what’s happening with \tl_if_empty:nT and \tl_if_blank:nTF? In the first test, we only print { Empty! } if there is nothing at all in the argument to \demo:n. If the argument is not empty, then this test does nothing at all. On the other hand, the \tl_if_blank:nTF test will print { Blank! } if the argument is either entirely empty or is only made up of spaces (so it looks blank). However, if it’s not blank then we apply two more functions.

The functions \tl_head:n and \tl_tail:n find the very first token and everything but the very first token, respectively. So \tl_head:n finds just the H of Hello world while \tl_tail:n finds ello world. I’ve only used them if the entire argument is not blank as they are not really designed to deal with cases where there is nothing to split up! You might wonder about the last test, where \demo{\hello} has hello as the head part and nothing as the tail. That happens because what is tested here is \hello, a single token, which is then turned into the text we see by TeX during typesetting. That can be avoided, but at this stage we’ll not worry too much!

Written by Joseph Wright

December 21st, 2011 at 11:01 pm

Posted in LaTeX3

Programming LaTeX3: Creating functions

Teaching a programming language traditionally starts with a method to print ‘Hello World’. For programming LaTeX3, we can’t quite start there as

\documentclass{article}
\begin{document}
Hello world
\end{document}

will happily do that without needing any programming. So I’ll start by printing ‘Hello World’ lots of times!

Our first function

LaTeX3 has a built-in method for creating multiple copies of text, which we could use directly. However, that would mean using a code-level macro in the document itself, and so I’ll create a wrapper macro. For this first example, I’ll include all of the document:

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\cs_new:Npn \SayHello #1
  { \prg_replicate:nn {#1} { Hello~World!~ } }
\ExplSyntaxOff
\begin{document}
\SayHello{100}
\end{document}

This will give you, as promised, 100 copies of ‘Hello World!’.

So what is going on here? As you might work out, I’ve defined a new command called \SayHello which prints as many copies of ‘Hello World!’ as requested. Later on we’ll see that this is usually not how I’d choose to create a ‘document command’, but for the moment I’ll pass over that point so we can get some basics established.

The structure of function names

Getting down to detail, I’ve introduced two LaTeX3 functions here: \cs_new:Npn and \prg_replicate:nn. As promised, these use : and _ as ‘letters’ in their names. But what do they do? As you might guess from the names, \cs_new:Npn is used to create a new control sequence, while \prg_replicate:nn makes lots of copies of something (it replicates stuff). The naming convention for LaTeX3 is that the first part of the name (\cs_… or \prg_…) refers to the module the function comes from. So \cs_new:Npn is from the module for control sequences, which we abbreviate as cs, while \prg_replicate:nn is from the general programming utilities module, which is abbreviated as prg. For programmers working outside of the LaTeX3 kernel, a module is probably going to be the same as a LaTeX2e package. So the module part of the name is used to divide up code into related blocks: each module should use a unique prefix, and I’ll tend to use \mypkg… for demonstration purposes.

Up to the :, the rest of the name is up to the programmer and should help you understand what a function does. So \cs_new:Npn tells us that the function makes a new control sequence, and so is pretty similar to LaTeX2e’s \newcommand. We can have multiple parts to the name divided by _ for ease of reading. For example \cs_new_nopar:Npn is available for creating new functions which will give an error if they pick up a \par: this is similar to \newcommand*. You can probably work out the analysis of \prg_replicate:nn yourself!

The part of the name after the : is perhaps one of the most confusing ideas for new LaTeX3 programmers, especially if they are used to other languages. It’s called the argument specification or signature of the function, and tells us about the number and type of arguments a function takes. If you have experience in other programming languages, you’re probably wondering why we include this information in the function name. As we’ll see as we look in more detail at LaTeX3, this approach works as it reflects how TeX works.

So what do the different letters mean? Each letter (usually) represents one argument for a function. So \prg_replicate:nn with two letters after the : needs two arguments. (For those of you who haven’t come across arguments before, something like \maketitle takes no arguments, \emph needs one argument, \setlength takes two arguments, and so on.) The letter itself then tells us about the type of argument: n means tokens in braces (a ‘normal’ argument).  In \cs_new:Npn, the n-type argument is the code which we are creating. An N means that the argument has to be a single token without any braces: in our current case this will be the name of the new function. The p is a bit more complicated: it means that the second argument here is a parameter specification. Here, we can use #1, #2, etc., to represent the arguments for the new function, in exactly the same way we do in the code. So when we use \SayHello, it will expect to find one argument, and will insert that into the place marked as #1 in the code part.
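
The same pieces extend naturally to two arguments. In this sketch the hypothetical \mypkg_say:nn has two letters after the :, so the parameter specification has #1#2; \cs_new_eq:NN then gives it a separate document-level name.

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% :nn signature, so the parameter specification has two parameters
\cs_new:Npn \mypkg_say:nn #1#2
  { \prg_replicate:nn {#1} { #2 ~ } }
% Document-level name kept separate from the code-level one
\cs_new_eq:NN \Say \mypkg_say:nn
\ExplSyntaxOff
\begin{document}
\Say{3}{Hi!} % typesets 'Hi! Hi! Hi!'
\end{document}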

Analysing \prg_replicate:nn

The same analysis applies to \prg_replicate:nn, which we can now see needs two arguments, both in braces. The first one is the number of times to repeat, and the second argument is what to repeat. So in \SayHello the number of repetitions is supplied by the user (this will replace #1), but the text is fixed by the programmer.

The reference for finding out what functions are available, and what arguments they take, is interface3. I’ll only be covering a selection of what is available, so over time you’ll need to get familiar with the formal documentation to find out what you can do. If you take a look there, you’ll see that the first argument for \prg_replicate:nn is an integer expression. That means that we don’t have to use a number directly here, but can also use something that will result in a number once TeX has worked it out. That will carry through to our user function, so

\SayHello{ 10 - 3 + 4 }

will be valid input.

Functions or macros?

Experienced TeX programmers will probably be worried that I’m talking about ‘functions’ and not about ‘macros’. TeX is a macro expansion language, which means that when it reads \SayHello, it replaces it by the code we’ve defined as the meaning of \SayHello, then reads the start of the inserted code, replaces it as necessary and so on until it has something to typeset (such as a letter) or execute (a ‘primitive’). That means that programming TeX is very different from programming using true functions.

The LaTeX3 programming approach allows us to treat many macros as if they were functions, but there are places where we’ll need to think about macros being expanded. Throughout the LaTeX3 documentation, programming is described in terms of functions, and so I’ll stick to that approach. Bear in mind that underlying everything is a set of macros, and that this will show up from time to time.

Written by Joseph Wright

December 14th, 2011 at 9:38 pm

Posted in LaTeX3

Programming LaTeX3: The programming environment

In the previous post, I mentioned that programming LaTeX3 today really means programming using LaTeX3 ideas but on top of LaTeX2e. To do that, we are going to need to load the appropriate code, and then access the LaTeX3 programming environment. The exact details depend on whether we are programming in the preamble of a LaTeX document or creating a package. I’ll look at both of these before taking a closer look at the LaTeX3 programming environment in general. What you should notice is that the use of a separate programming environment very much separates out the process of creating code from creating documents: that is quite deliberate and is something that we’ll see again in the series.

In the preamble of a document

The LaTeX3 programming code usable with LaTeX2e is available as a package called expl3 (which for various reasons is distributed as part of l3kernel). This is loaded in the usual way

\documentclass{article}
\usepackage{expl3}

That loads the code, but does not get us into the programming environment. To do that, we need to use a couple of new macros

\ExplSyntaxOn
% Code goes here
\ExplSyntaxOff

In some ways, this is similar to the LaTeX2e \makeatletter … \makeatother idea, but as we’ll see it’s a bit more advanced.

In a LaTeX2e package

In exactly the same way as in a document, the first stage in using LaTeX3 programming in a package is to load the code.

\RequirePackage{expl3}

Once again, that loads the code but does not switch the syntax on. We could use \ExplSyntaxOn here, but for packages a more flexible alternative is to declare the package as being LaTeX3-based:

\ProvidesExplPackage
  {mypkg}               % Package name
  {2011-12-11}          % Release date
  {1.0}                 % Release version
  {Some things I wrote} % Description

This is a special version of the standard \ProvidesPackage macro, which will automatically turn on LaTeX3 programming syntax and more importantly turn it off at the end of the package. It also deals properly with nested package loading, and so is the recommended way to use LaTeX3 syntax inside LaTeX2e packages.
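
Putting the package pieces together, here is a sketch of a complete (if tiny) package reusing the \SayHello idea. To keep it to a single runnable file it uses the filecontents* environment to write mypkg.sty; the [overwrite] option assumes a LaTeX release from 2019 or later, and the package name and date are the placeholders from above.

\begin{filecontents*}[overwrite]{mypkg.sty}
\RequirePackage{expl3}
\ProvidesExplPackage
  {mypkg}               % Package name
  {2011-12-11}          % Release date
  {1.0}                 % Release version
  {Some things I wrote} % Description
% Syntax is already on: spaces are ignored and ~ is a space
\cs_new:Npn \mypkg_hello:n #1
  { \prg_replicate:nn {#1} { Hello~World!~ } }
\cs_new_eq:NN \SayHello \mypkg_hello:n
% No \ExplSyntaxOff needed: it is switched off at the end of the package
\end{filecontents*}
\documentclass{article}
\usepackage{mypkg}
\begin{document}
\SayHello{3}
\end{document}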

The coding environment

Whether you’re using LaTeX3 syntax in a document or a package, the basic ideas are the same. The first thing to notice is that white space (spaces, tabs and new lines) is ignored inside the programming environment. This means we can use it to lay out our code more clearly, but you might wonder how to actually include a space. This is handled by making ~ a ‘normal’ space, rather than the usual non-breaking version.
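A small test document shows both effects. In this sketch (the command name \SpaceDemo is just for illustration), the spaces and line breaks inside the definition are discarded, while each ~ produces an ordinary space.

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% The spaces between "Hello" and "world", and the line ends,
% are all ignored; each ~ becomes a normal space
\newcommand \SpaceDemo
  {
    Hello   world
    ~and~goodbye
  }
\ExplSyntaxOff
\begin{document}
\SpaceDemo % typesets: Helloworld and goodbye
\end{document}
```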

The programming environment also makes it possible to use : and _ inside the names of commands, which are more formally called control sequences. TeX decides what is a valid control sequence name based on something called the category code of each character. I’ll be explaining more about category codes as we go along, but for the moment the key is to understand that a control sequence is \ followed either by exactly one non-‘letter’ or by one or more ‘letters’. Inside the code environment, : and _ are treated as letters by TeX: this is the same idea as using @ as an extra ‘letter’ in LaTeX2e code.
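One way to see that this is purely a matter of category codes is the sketch below (the name \mypkg_hello:n is hypothetical). Inside the programming environment the name is read as a single control sequence. Outside it, the same characters would be split up, but \csname ... \endcsname builds a name from characters whatever their category codes, so it can still reach the function.

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Here : and _ are letters, so \mypkg_hello:n is ONE control sequence
\cs_new:Npn \mypkg_hello:n #1 { Hello,~#1! }
\ExplSyntaxOff
\begin{document}
% Here : and _ are not letters: typing \mypkg_hello:n would give
% \mypkg followed by the characters _hello:n (an error). The
% name itself is unchanged, so \csname can still build it:
\csname mypkg_hello:n\endcsname{world}
\end{document}
```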

Not only are : and _ available for use in control sequences but they are required by the conventions of LaTeX3 programming. In contrast to LaTeX2e’s sometimes haphazard use of @ in names, there are guidelines for applying both : and _ in LaTeX3 names. Rather than give a formal list now, I’ll bring in the system in the next couple of posts using some examples.

One difference between programming in a document and in a package is the status of @. LaTeX2e automatically makes it a letter in package code, but in a document this does not happen. LaTeX3 does not assign any special meaning to @, and so these differences are not affected by loading LaTeX3 support.
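As an illustration of the document case, the sketch below uses the LaTeX2e internal \@title inside LaTeX3 code in a preamble. Because \ExplSyntaxOn leaves @ alone, \makeatletter is still needed here, whereas in a package it would not be. The names \mydoc_title: and \ShowTitle are hypothetical.

```latex
\documentclass{article}
\usepackage{expl3}
\title{A test document}
\makeatletter % @ is not a letter in a document unless requested
\ExplSyntaxOn % changes : and _ but leaves @ as it is
\cs_new:Npn \mydoc_title: { \@title }
\newcommand \ShowTitle { \mydoc_title: }
\ExplSyntaxOff
\makeatother
\begin{document}
\ShowTitle % typesets the text given to \title
\end{document}
```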

A standard document

As we’ll be needing the basics here for everything from now on, I’ll assume that you are using a short testing document for LaTeX3 programming:

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Code will go here
\ExplSyntaxOff
\begin{document}
\end{document}

Written by Joseph Wright

December 11th, 2011 at 6:01 pm

Posted in LaTeX3


Programming LaTeX3: Background


Before the series on programming LaTeX3 can really get started, it’s going to be important to establish some background, basic concepts and indeed what the aims are. So in this post I’m going to cover some of these issues: we won’t be seeing any code just yet! The approach I’m aiming to take is to bring in concepts as they are needed: this may mean a few simplifications in the beginning to allow ideas to be developed.

LaTeX3: What is available now?

The very first thing to cover is the current status of LaTeX3, and what the aim of this series is. Anyone following LaTeX3 development will know that at the moment it’s not ready for creating documents independent of LaTeX2e. What is available now is a programming layer: l3kernel. At the same time, one of the aims with LaTeX3 is to clearly separate out programming, design decisions and actually using LaTeX. So what I will be covering here is programming. Along the way, I’ll aim to highlight concepts which are not necessarily tied to LaTeX3 programming but which the LaTeX3 Project feel are part of the overall aims of LaTeX3 development.

The target audience

I have two distinct audiences in mind in writing this series. The first is experienced (La)TeX programmers who want to see what ideas LaTeX3 introduces. These people will be familiar with many basic TeX concepts, and will want to see the relationship between what they are used to and the ‘LaTeX3 way’. The second group is experienced LaTeX2e users who want to learn to program LaTeX, and have decided to skip learning to program TeX first. It’s important that the latter group are included: another key aim for LaTeX3 is to provide a complete set of documentation and support without having to say ‘read The TeXbook’ as a requirement to make progress.

What both of these groups have in common is lots of experience with LaTeX2e, so I’m going to expect familiarity with LaTeX2e’s user syntax, concepts and so on. That will very much be the baseline: I do hope that the more experienced LaTeX programmers will bear with me.

Requirements

As I’ve indicated, programming LaTeX3 currently means working on top of LaTeX2e. So to get started you need a LaTeX2e installation, which for most people means either TeX Live or MiKTeX. Most of the code in the programming layer of LaTeX3 has been moving towards stability for some time, but refinements are going on all of the time. As a result, I’ll be assuming that readers have the latest CTAN releases of l3kernel and l3packages installed. These can be obtained by downloading them from CTAN directly, or by using the package managers in TeX Live 2011 or MiKTeX 2.9.
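Since the series assumes recent releases, it can be useful to check what a given document actually loads. The sketch below uses two standard LaTeX2e features: \listfiles, which writes a list of the loaded files with their dates and versions at the end of the log, and the optional date argument to \usepackage, which gives a warning if the installed package is older than requested. The date used here is an arbitrary example, not a recommended minimum.

```latex
\listfiles % report every loaded file, with date and version, in the log
\documentclass{article}
% Warn if the installed expl3 is older than the given date
\usepackage{expl3}[2011/12/01]
\begin{document}
Test
\end{document}
```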

Written by Joseph Wright

December 7th, 2011 at 10:14 pm

Posted in LaTeX3


Programming LaTeX3: Introduction


Development of LaTeX3 has attracted interest from other TeX programmers for a while. One of the big barriers to new entrants is that programming LaTeX3 is distinct from programming LaTeX2e or plain TeX. So what is needed is a ‘Programming LaTeX3’ guide. The problem is getting one written: these things take time, and what to write is also something of a challenge.

To make a start on tackling this, I thought it would be useful to write a series of short blog posts, taking one area of LaTeX3 at a time and looking at it from the point of view of a beginner in programming LaTeX3. The idea is that by keeping things short I can divide the problem into manageable chunks (both for readers and for me), and get feedback on each part before taking on the next one. If I make decent progress, I’ll then have some material to edit into something like an article for TUGboat.

Now, to do a reasonable job I will have to cover some things I’ve looked at before: sorry if it turns out to be repetitive in places. I’m planning to start by looking at how you can actually start programming LaTeX3 today, covering the idea of ‘LaTeX3 in 2e’, for example. Then it will be on to the basics of the language, before we even get to creating any macros. Ideas for topics to cover are very welcome!

Written by Joseph Wright

December 6th, 2011 at 10:00 pm

Posted in LaTeX3


Font schemes and LaTeX3


There was a question recently on the TeX.sx site about font selection and LaTeX3. At the moment, there is not a LaTeX3 font system set up, and there are issues outstanding, so this is not something with a single answer. What I can do, though, is look at what seems likely and what some of the areas to consider are.

(New) Font Selection Scheme

TeX’s font mechanism is pretty basic. There is no relationship between one text font and another: they are all simply set up using the \font primitive. So with plain TeX

{\bf Some {\it text}}

will have ‘Some’ in bold, but ‘text’ in mid-weight italics. LaTeX2e made the ‘New Font Selection Scheme’ (NFSS) standard, providing a method for managing fonts in a way that is likely to be more logical for the user. Thus

{\bfseries Some {\itshape text}}

will have the inner text both bold and italic. At the same time, the NFSS provides a system for loading font files in an organised way and substituting fonts when a particular shape combination is unavailable.
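The reason the nesting works is that the NFSS treats series (weight and width) and shape as separate, independent settings. The sketch below makes the same selection twice: once with the user-level commands and once with the lower-level \fontseries, \fontshape and \selectfont, using the standard LaTeX2e defaults bx (bold extended) for \bfseries and it for \itshape.

```latex
\documentclass{article}
\begin{document}
% \bfseries changes only the series, \itshape only the shape,
% so the inner text ends up bold AND italic
{\bfseries Some {\itshape text}}

% The same selection written with the low-level NFSS commands
{\fontseries{bx}\selectfont Some {\fontshape{it}\selectfont text}}
\end{document}
```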

Overall, the NFSS is one of the key successes of LaTeX2e compared with LaTeX2.09. There are also a lot of existing .fd files about for using fonts with LaTeX2e, and supporting those is important. So something like the NFSS is definitely needed: the ‘New’ is rather anachronistic nowadays, so the working title is just FSS.
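For readers who have not looked inside one, an .fd file is a short list of declarations telling the NFSS which font files provide each combination of encoding, family, series and shape, and what to substitute when a combination is missing. The sketch below is for a hypothetical family myf in the T1 encoding; the family name and the TFM file names are invented for illustration.

```latex
% t1myf.fd: font definitions for a hypothetical family 'myf'
\ProvidesFile{t1myf.fd}[2011/12/11 Hypothetical font definitions]
\DeclareFontFamily{T1}{myf}{}
% Medium weight, upright and italic (TFM names are hypothetical)
\DeclareFontShape{T1}{myf}{m}{n}{<-> myf-regular-t1}{}
\DeclareFontShape{T1}{myf}{m}{it}{<-> myf-italic-t1}{}
% Bold, upright
\DeclareFontShape{T1}{myf}{b}{n}{<-> myf-bold-t1}{}
% Requests for bold extended silently get bold instead
\DeclareFontShape{T1}{myf}{bx}{n}{<-> ssub * myf/b/n}{}
```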

The NFSS is not perfect, and so LaTeX3’s FSS cannot simply be a clone of the NFSS. Perhaps the most common complaint about the NFSS is that small caps (\scshape, or \textsc) are treated as a shape, which makes it impossible to combine them with \itshape to get italic small caps. Other areas which need addressing include flexible sizing and proportional/fixed-width numbers for tables. This is all evolutionary, and so the plan is to port the existing NFSS first, tidy it up to fit better with LaTeX3 coding approaches, and then add new abilities.
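The small caps complaint is easy to reproduce. In the sketch below, with the NFSS as described here, the last shape command simply replaces the previous one, so the final line comes out in plain italic. More recent LaTeX2e releases have since added rules that try to combine the two shapes where a suitable font exists, so the result may differ on a current system.

```latex
\documentclass{article}
\begin{document}
{\scshape Small caps}  % shape = sc

{\itshape Italic}      % shape = it

% Small caps and italic are two values of ONE axis (shape),
% so the second command overrides the first: shape = it only
{\scshape\itshape Italic small caps?}
\end{document}
```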

Font face loading

The second area to think about is loading fonts in the first place. The traditional LaTeX2e approach is to set up a small(ish) package to select a font family, for example lmodern or mathptmx, which then uses the NFSS to load the appropriate TeX font files. For users of XeTeX or LuaTeX, the standard method is to use the fontspec package, which provides an interface between the extended \font primitive in these engines and the NFSS.
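To make the two routes concrete, here is a minimal sketch of a document that picks one of them depending on the engine. It assumes the iftex package for the engine test, uses mathptmx (as mentioned above) for pdfTeX, and assumes the OpenType font TeX Gyre Termes is installed and can be found by XeTeX or LuaTeX; any other installed font name could be used instead.

```latex
\documentclass{article}
\usepackage{iftex} % provides \ifPDFTeX
\ifPDFTeX
  % pdfTeX: a small package selects a family of TFM-based fonts,
  % which the NFSS then loads via .fd files
  \usepackage[T1]{fontenc}
  \usepackage{mathptmx}
\else
  % XeTeX or LuaTeX: fontspec loads an OpenType font by name
  % and connects it to the NFSS
  \usepackage{fontspec}
  \setmainfont{TeX Gyre Termes}
\fi
\begin{document}
Some \textbf{bold} and \textit{italic} text.
\end{document}
```

Having to branch like this in each document is the kind of duplication a single user-level font-loading interface would aim to hide.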

There are a few things to think about here. First, while XeTeX and LuaTeX can load system fonts directly, pdfTeX cannot. Secondly, even if you are using XeTeX or LuaTeX, access to traditional TeX fonts cannot be ignored. There is a lot of Metafont material on CTAN which is not available in any other format, so simply dropping support for these fonts is not an option.

What I feel we need is a single font-loading interface at the user level which is capable of dealing with these requirements. Clearly, fontspec is going to provide inspiration on how to proceed, but some mechanism for working with pdfTeX will also be needed. My personal take on this is that we’ll need a mapping layer, so that at the user level you choose a font by name (as you would in a GUI application), and the mapping layer then does the appropriate translation to the engine layer.

There are also math mode fonts to worry about. OpenType maths fonts are very much in development, but that doesn’t help with pdfTeX and again does not cover all cases. So again we need to continue to support TeX’s traditional math mode fonts. That will probably be the last part of this particular jigsaw to be tackled, simply because it’s the one with the least clear path at present.

Written by Joseph Wright

November 27th, 2011 at 11:48 am

Posted in LaTeX3
