% !TEX TS-program = pdflatex % !TEX encoding = UTF-8 Unicode % \documentclass{ltugboat} \usepackage[T1]{fontenc} \usepackage{url,booktabs,underscore} \usepackage[final]{microtype} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Some (very simple) new commands are defined. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \providecommand*{\pkg}[1]{\textsf{#1}} \providecommand*{\opt}[1]{\texttt{#1}} \providecommand*{\file}[1]{\texttt{#1}} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Meta-data for this paper %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \title{\LaTeX3: An outsider's overview} \author{Joseph Wright} \address{% Morning Star\\ 2, Dowthorpe End\\ Earls Barton\\ Northampton NN6 0NH\\ United Kingdom} \netaddress{joseph.wright@morningstar2.co.uk} \begin{document} \maketitle \begin{abstract} The current experimental \LaTeX3 packages provide a new, documented programming interface for \TeX. The key ideas implemented in this new interface are highlighted in this article. \end{abstract} \section{Introduction} Modifying the behaviour of \LaTeXe\ often requires a combination of user macros, internal \LaTeX\ macro and \TeX\ primitives. This makes even trial modifications of document layout potentially difficult, even for the experienced \LaTeX\ user. The differing syntax used by \TeX\ primitives and the \LaTeX\ kernel only add to the confusion here. The first step to develop a new \LaTeX\ kernel is therefore to address how the underlying system is programmed. Rather than the current mix of \LaTeX\ and \TeX\ macros, the experimental \LaTeX3 system provides its own consistent interface to all of the functions needed to control \TeX. A key part of this work is to ensure that everything is documented, so that \LaTeX\ users can work efficiently without needing to be familiar with the internal nature of the kernel or with plain \TeX. The current kernel also suffers from the mixing of design commands with structural code. Thus changing a layout element often requires modifying a kernel code block (or loading a package which provides an interface to achieve this). The second challenge for \LaTeX3 is therefore separation of the basic tools of the kernel from the design of documents. This short overview article highlights the key developments to date in \LaTeX3. It is based on my own experience working with the new tools for writing packages, and a talk given recently to the UK \TeX\ Users Group. \section{The components of \LaTeX3} Currently, the experimental \LaTeX3 packages are designed to be used ``on top of'' \LaTeXe. This avoids needing to wait for the entire kernel to be finished before testing what is written. The most developed part of the code is the \pkg{expl3} bundle, the core of the new kernel providing the new programming interface. The new language is fully documented in the file \file{source3.pdf}, which contains some notes for the experienced \AllTeX\ programmer. Built on top of \pkg{expl3} is the \pkg{xparse} package. This is meant to be a ``bridge'' between the internal and user parts of the new kernel. The \pkg{xparse} package is used to create new user macros, in a much more controlled way than is possible using \cs{newcommand}. More experimental than \pkg{xparse} are various other ``\pkg{xpackages}''. These are designed to explore new approaches to layout and document design for \LaTeX3. The most complete part of \LaTeX3 is the \pkg{expl3} bundle. The rest of this article is focussed mainly on the new internal syntax introduced in \pkg{expl3}. \section{A new internal syntax} \LaTeX3 does not use \texttt{@} as a ``letter'' for defining internal macros. Instead, the symbols \texttt{_} and \texttt{:} are used in internal macro names to provide structure. In contrast to the plain \TeX\ format and the \LaTeXe\ kernel, these extra letters are used only between parts of a macro name (no strange vowel replacement). \LaTeX3 separates macros which do something (functions) from ones which only store data. The general form of an internal function in \LaTeX3 is \cs{_:}. The \meta{module} prefix is applied to almost all macros. For a package, it will typically be the package name; the kernel is split into a number of modules, each with its own name. The name of the \meta{function} should give a good description of what it does: this may contain one or more \texttt{_} characters to divide the name into logical units. The concept of the \meta{arg-spec} is potentially confusing to existing \AllTeX\ programmers. This \emph{argument specifier} describes the arguments expected by the function. In most cases, each argument is represented by a single letter. The letter and its case then conveys information about the type of argument required. The use of the \meta{arg-spec} is illustrated later in this article. \subsection{Primitives renamed} All of the \TeX\ primitives are given new names by \pkg{expl3}, although many are not intended to be used outside of the \LaTeX3 kernel. Instead, a number of \LaTeX\ wrappers for primitives are provided, so that the argument syntax is consistent. At the most basic level, the \cs{fi} primitive becomes \cs{fi:}, indicating that no arguments are required. A more complex example is \cs{ifx}, which becomes \cs{if_meaning:wN}. \begin{verbatim} \if_meaning:wN \Macro_One \Macro_Two % Do Stuff \fi \end{verbatim} Here, the \meta{arg-spec} contains two letters, showing that two arguments are required. The first argument here is \texttt{w}, meaning ``weird'', as there are rather exacting requirements for the first token in the comparison. The second argument is of type \texttt{N}, meaning that it should be single token \emph{not} surrounded by braces. \subsection{Example kernel functions} Renaming primitives helps to keep the new syntax consistent, but does not show why the argument specifier is useful. This is perhaps best seen by looking at some of the functions provided by \pkg{expl3}. By using the argument specifier, the new kernel provides families of related functions which avoid the need for complex \cs{expandafter} runs. For example, the \TeX\ primitive \cs{let} can only be used with two macro names. In \LaTeX3, the family of \cs{let}-like macros contains: \begin{verbatim} \cs_set_eq:NN \Macro_One \Macro_Two \cs_set_eq:Nc \Macro_One {Macro_Two} \cs_set_eq:cN {Macro_One} \Macro_Two \cs_set_eq:cc {Macro_One} {Macro_Two} \end{verbatim} where the argument specified as \texttt{c} be given in braces and should expand to a csname. This is much clearer than the equivalent plain \TeX\ constructions; taking \cs{cs_set_eq:Nc} as an example: \begin{verbatim} \expandafter\let\expandafter\Macro_One \csname Macro_Two\endcsname. \end{verbatim} The specifiers \texttt{n} (no expansion), \texttt{o} (expand once) and \texttt{x} (\cs{edef}-like expansion) allow large families of related functions to be created easily, so that using the results is easy. Thus we can create a macro \cs{Macro_One:nn}, then create \cs{Macro_One:no}, \cs{Macro_One:xn} and so on very rapidly. Later, we will see how the \texttt{v} and \texttt{V} argument specifiers add even more power to this concept. The argument specifier concept also makes testing much easier. As an example, the new kernels provides three tests related to the \cs{@ifundefined} macro: \begin{verbatim} \cs_if_exist:cT {csname} {true} \cs_if_exist:cF {csname} {false} \cs_if_exist:cTF {csname} {true} {false} \end{verbatim} In all three cases, the first argument will be converted to a csname (the \texttt{c} specifier). The first two functions then require one more argument, either \texttt{T} or \texttt{F}. As might be expected, these are executed if the test is true or false, respectively. The third function (ending \texttt{:cTF)} has both a true and false branch. By providing tests with the choice of \texttt{T}, \texttt{F} and \texttt{TF} arguments, empty groups in code can be avoided and meaning is much more obvious. \section{Data storage} In \LaTeX3, macros which carry out some process are called functions, and all contain an argument specifier. Macros used for storage are handled separately, to help to make code cleaner and easier to read. To further aid the programmer, \pkg{expl3} defines several new data types: \begin{itemize} \item Token list pointers (\texttt{tlp}); \item Comma lists (\texttt{clist}); \item Property lists (\texttt{prop}); \item Sequences (\texttt{seq}). \end{itemize} in addition to the existing types, which are renamed: \begin{itemize} \item Boolean switches (\texttt{bool}); \item Counters (\texttt{int}); \item Skips (\texttt{skip}); \end{itemize} and so on. The name ``token list pointer'' may cause confusion, and so some background is useful. \TeX\ works with tokens and lists of tokens, rather than characters. It provides two ways to store these token lists, within macros and as token registers (toks). \LaTeX3 retains the name ``toks'' for the later, and adopts the name token list pointer for macros used to store tokens. In most circumstances, the tlp data type is more convenient for storing token lists. The other new variable types are all essentially lists of items separated by a special token. The nature of the separator determines the type of variable and what functions apply. For example, a comma list is, rather obviously, a set of tokens separated by commas. These are all created explicitly as either local or global. For example, a tlp may be named \begin{verbatim} \l___tlp \end{verbatim} (local) or \begin{verbatim} \g___tlp \end{verbatim} (global). The other variable types follow the same pattern, with the appropriate type identified in the variable name. As well as the new data types, \pkg{expl3} provides a range of functions for manipulating data. Often, these have to have been coded by hand when using \LaTeXe. For example, \cs{tlp_elt_count:N} is available, to count the number of elements (often characters) in a tlp. \section{Expanding variables} When coding in \AllTeX, the need to access data in variables is made more complicated by the different possibilities for recovering information later. For example, if three macros are defined as \begin{verbatim} \def\tempa{Some text} \def\tempb{\tempa} \def\tempc{\tempb} \end{verbatim} then there are two likely scenarios for using the information in \cs{tempc}: \begin{itemize} \item Use of the value that \cs{tempc} contains (in this case \cs{tempb}); \item Exhaustive expansion of \cs{tempc} to use the unexpanable token list it represents (in this case ``Some text''). \end{itemize} The situation is further complicated as macros do not need an accessor function, whereas other \TeX\ variables (toks, counts, skips) do. This leads to the need for carefully-constructed \cs{expandafter} runs in \AllTeX, in order to get the content the programmer intents. To avoid this, \LaTeX3 provides two argument specifiers which will always return the content of a variable. The \texttt{V} specifier requires the name of a variable, and returns the content. For example, if we define two variables \begin{verbatim} \toks_set:Nn \l_my_toks { Text \mymacro } \tlp_set:Nn \l_my_tlp { Text \mymacro } \end{verbatim} and pass them to some function \cs{foo_bar:V} \begin{verbatim} \foo_bar:V \l_my_toks \foo_bar:V \l_my_tlp \end{verbatim} both sets of input will result in ``\texttt{Text \cs{mymacro}}'' being passed as the argument to the underlying function \cs{foo_bar:n}. The \texttt{V} specifier also works in the same way with other \LaTeX3 variables. The second ``variable'' specifier is \texttt{v}. This converts its argument to a csname, then recovers the content of the resulting variable and passes the content. Thus we might have \cs{foo_bar:v} and use it as \begin{verbatim} \foo_bar:v { l_my_toks } \foo_bar:v { l_my_tlp } \end{verbatim} with the same result as the previous example. The two variable specifiers are very powerful. By using them, the programmer can avoid almost entirely the need to worry about the order of expansion when using stored information. \section{Other key features} The new kernel will require the \eTeX\ extensions. This means that the new primitives are definitely available when working with \LaTeX3. For example, \cs{unexpanded} is part of the expansion module, as \cs{exp_not:n}. Boolean switches in \TeX\ and \LaTeXe\ use the \cs{iftrue} and \cs{iffalse} primitives. This can lead to problems with nesting (\texttt{Incomplete \cs{if}\ldots}). to avoid this, \LaTeX3 does not create switches in the same way. This means that all of the switches use exclusively \LaTeX\ syntax, and require an ``access'' function. \begin{verbatim} \bool_if:NT \l_example_bool { true code } \bool_if:NF \l_example_bool { false code } \bool_if:NTF \l_example_bool { true code } { false code } \end{verbatim} One of the most useful features of the new coding syntax is the treatment of white space. The literal space character (~) is ignored inside code block, meaning that the text can be laid out to aid ease of reading. When a space is required in the output, the hard space (\verb|~|) is used. The ability to finish lines without needing \verb|%| is highly welcome! \section{Conclusions} The current \LaTeX3 modules provide a new and powerful programming language for \TeX. The full details of the language are collected in one place, and the language is much more logical than the current mix of \TeX\ and \LaTeXe. \LaTeX3 is therefore ready for serious use by \AllTeX\ programmers. At this stage, the document level of \LaTeX3 is much less defined. It seems likely that good separation of programming and document design will be made available. The new code syntax means that a number of ideas currently implemented as independent packages will need to be re-implemented either in the new kernel or as supported tools. My own experience with \LaTeX3 convinces me that the kernel team need outsiders to use the code. The team have done a very good job so far, but everyone will bring new approaches using to the code. With the involvement of the wider \TeX\ community, \LaTeX3 has the potential to be a major step forward for \LaTeX. \makesignature \end{document}