% !TEX TS-program = pdflatex
% !TEX encoding = UTF-8 Unicode
%
\documentclass{ltugboat}
\usepackage[T1]{fontenc}
\usepackage{url,booktabs,underscore}
\usepackage[final]{microtype}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Some (very simple) new commands are defined.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\providecommand*{\pkg}[1]{\textsf{#1}}
\providecommand*{\opt}[1]{\texttt{#1}}
\providecommand*{\file}[1]{\texttt{#1}}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Meta-data for this paper
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\title{\LaTeX3: An outsider's overview}
\author{Joseph Wright}
\address{%
  Morning Star\\
  2, Dowthorpe End\\
  Earls Barton\\
  Northampton NN6 0NH\\
  United Kingdom}
\netaddress{joseph.wright@morningstar2.co.uk}

\begin{document}
\maketitle

\begin{abstract}
The current experimental \LaTeX3 packages provide a new, documented
programming interface for \TeX. The key ideas implemented in this new
interface are highlighted in this article.
\end{abstract}

\section{Introduction}

Modifying the behaviour of \LaTeXe\ often requires a combination of
user macros, internal \LaTeX\ macros and \TeX\ primitives. This makes
even trial modifications of document layout potentially difficult,
even for the experienced \LaTeX\ user. The differing syntax used by
\TeX\ primitives and the \LaTeX\ kernel only adds to the confusion
here.

The first step in developing a new \LaTeX\ kernel is therefore to
address how the underlying system is programmed. Rather than the
current mix of \LaTeX\ and \TeX\ macros, the experimental \LaTeX3
system provides its own consistent interface to all of the functions
needed to control \TeX. A key part of this work is to ensure that
everything is documented, so that \LaTeX\ users can work efficiently
without needing to be familiar with the internal nature of the kernel
or with plain \TeX.

The current kernel also suffers from the mixing of design commands
with structural code. Thus changing a layout element often requires
modifying a kernel code block (or loading a package which provides an
interface to achieve this). The second challenge for \LaTeX3 is
therefore separation of the basic tools of the kernel from the design
of documents.

This short overview article highlights the key developments to date in
\LaTeX3. It is based on my own experience working with the new tools
for writing packages, and a talk given recently to the UK \TeX\ Users
Group.

\section{The components of \LaTeX3}

Currently, the experimental \LaTeX3 packages are designed to be used
``on top of'' \LaTeXe. This avoids needing to wait for the entire
kernel to be finished before testing what is written.

The most developed part of the code is the \pkg{expl3} bundle, the
core of the new kernel providing the new programming interface. The
new language is fully documented in the file \file{source3.pdf}, which
contains some notes for the experienced \AllTeX\ programmer.

Built on top of \pkg{expl3} is the \pkg{xparse} package. This is meant
to be a ``bridge'' between the internal and user parts of the new
kernel. The \pkg{xparse} package is used to create new user macros, in
a much more controlled way than is possible using \cs{newcommand}.

More experimental than \pkg{xparse} are various other
``\pkg{xpackages}''. These are designed to explore new approaches to
layout and document design for \LaTeX3.

The most complete part of \LaTeX3 is the \pkg{expl3} bundle. The rest
of this article is focussed mainly on the new internal syntax
introduced in \pkg{expl3}.

\section{A new internal syntax}

\LaTeX3 does not use \texttt{@} as a ``letter'' for defining internal
macros. Instead, the symbols \texttt{_} and \texttt{:} are used in
internal macro names to provide structure.
In contrast to the plain \TeX\ format and the \LaTeXe\ kernel, these
extra letters are used only between parts of a macro name (no strange
vowel replacement).

\LaTeX3 separates macros which do something (functions) from ones
which only store data. The general form of an internal function in
\LaTeX3 is \cs{\meta{module}_\meta{function}:\meta{arg-spec}}. The
\meta{module} prefix is applied to almost all macros. For a package,
it will typically be the package name; the kernel is split into a
number of modules, each with its own name. The name of the
\meta{function} should give a good description of what it does: this
may contain one or more \texttt{_} characters to divide the name into
logical units.

The concept of the \meta{arg-spec} is potentially confusing to
existing \AllTeX\ programmers. This \emph{argument specifier}
describes the arguments expected by the function. In most cases, each
argument is represented by a single letter. The letter and its case
then convey information about the type of argument required. The use
of the \meta{arg-spec} is illustrated later in this article.

\subsection{Primitives renamed}

All of the \TeX\ primitives are given new names by \pkg{expl3}. Many
are also given new \LaTeX-like wrappers, so that the argument syntax
is consistent. Many basic primitives have names which are little
altered from the \TeX\ original. At the most basic level, the \cs{fi}
primitive becomes \cs{fi:}, indicating that no arguments are required.
A more complex example is \cs{ifx}, which becomes
\cs{if_meaning:NN}.
\begin{verbatim}
\if_meaning:NN \Macro_One \Macro_Two
  % Do Stuff
\fi:
\end{verbatim}
Here, the \meta{arg-spec} contains two letters, showing that two
arguments are required. Both arguments are shown to be of type
\texttt{N}, meaning that they should be single tokens \emph{not}
surrounded by braces.

\subsection{Example kernel functions}

Renaming primitives helps to keep the new syntax consistent, but does
not show why the argument specifier is useful.
This is perhaps best seen by looking at some of the functions provided
by \pkg{expl3}.

By using the argument specifier, the new kernel provides families of
related functions which avoid the need for complex \cs{expandafter}
runs. For example, the \TeX\ primitive \cs{let} can only be used with
two macro names. In \LaTeX3, the family of \cs{let} macros contains:
\begin{verbatim}
\let:NN \Macro_One \Macro_Two
\let:Nc \Macro_One {Macro_Two}
\let:cN {Macro_One} \Macro_Two
\let:cc {Macro_One} {Macro_Two}
\end{verbatim}
where an argument specified as \texttt{c} must be given in braces and
should expand to a csname. This is much clearer than the equivalent
plain \TeX\ constructions; taking \cs{let:Nc} as an example:
\begin{verbatim}
\expandafter\let\expandafter\Macro_One
  \csname Macro_Two\endcsname
\end{verbatim}

The specifiers \texttt{n} (no expansion), \texttt{o} (expand once) and
\texttt{x} (\cs{edef}-like expansion) allow large families of related
functions to be created easily, and make the resulting functions
simple to use. Thus we can create a macro \cs{Macro_One:nn}, then
create \cs{Macro_One:no}, \cs{Macro_One:xn} and so on very rapidly.

The argument specifier concept also makes testing much easier. As an
example, the new kernel provides three tests related to the
\cs{@ifundefined} macro:
\begin{verbatim}
\cs_if_free:cT {csname} {true}
\cs_if_free:cF {csname} {false}
\cs_if_free:cTF {csname} {true} {false}
\end{verbatim}
In all three cases, the first argument will be converted to a csname
(the \texttt{c} specifier). The first two functions then require one
more argument, either \texttt{T} or \texttt{F}. As might be expected,
these are executed if the test is true or false, respectively. The
third function (ending \texttt{:cTF}) has both a true and a false
branch. By providing tests with the choice of \texttt{T}, \texttt{F}
and \texttt{TF} arguments, empty groups in code can be avoided and the
meaning is much more obvious.
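
The ideas in this section can be sketched in a few lines. What follows
is only an illustration under stated assumptions: the \texttt{demo}
module and everything named after it are hypothetical, and the kernel
function names used (\cs{cs_new:Npn}, \cs{cs_generate_variant:Nn},
\cs{tl_set:Nn}, \cs{iow_term:n}) are those of more recent \pkg{expl3}
releases, which may differ from the names in the release described
here.
\begin{verbatim}
% Assumes expl3 is available (\usepackage{expl3} on older kernels).
\ExplSyntaxOn
% Hypothetical base function: two braced, unexpanded arguments.
\cs_new:Npn \demo_pair:nn #1#2 { (#1 ,~ #2) }
% Ask the kernel to build the :no and :xn variants.
\cs_generate_variant:Nn \demo_pair:nn { no , xn }
% A storage macro whose contents are to be passed on.
\tl_new:N \l_demo_tmp_tl
\tl_set:Nn \l_demo_tmp_tl { second }
% In the document body, the :no variant expands its second
% argument once before the base function sees it:
%   \demo_pair:no { first } { \l_demo_tmp_tl }
% c-type test: the function name is built from the braced text.
\cs_if_free:cTF { demo_pair:nn }
  { \iow_term:n { free } }
  { \iow_term:n { already~defined } }
\ExplSyntaxOff
\end{verbatim}
The variant is produced without writing a single \cs{expandafter} by
hand, and the test takes the false branch here because the function
was defined a few lines earlier.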

\section{Data storage}

In \LaTeX3, macros which carry out some process are called functions,
and all contain an argument specifier. Macros used for storage are
handled separately, to help to make code cleaner and easier to read.
To further aid the programmer, \pkg{expl3} defines several new data
types:
\begin{itemize}
  \item Token list pointers (\texttt{tlp});
  \item Comma lists (\texttt{clist});
  \item Property lists (\texttt{plist});
  \item Sequences (\texttt{seq}).
\end{itemize}
in addition to the existing types, which are renamed:
\begin{itemize}
  \item Boolean switches (\texttt{bool});
  \item Counters (\texttt{int});
  \item Skips (\texttt{skip});
\end{itemize}
and so on.

The name ``token list pointer'' may cause confusion, and so some
background is useful. \TeX\ works with tokens and lists of tokens,
rather than characters. It provides two ways to store these token
lists: within macros and as token registers (toks). \LaTeX3 retains
the name ``toks'' for the latter, and adopts the name token list
pointer for macros used to store tokens. In most circumstances, the
tlp data type is more convenient for storing token lists.

The other new variable types are all essentially lists of items
separated by a special token. The nature of the separator determines
the type of variable and what functions apply. For example, a comma
list is, rather obviously, a set of tokens separated by commas.

All variables are created explicitly as either local or global. For
example, a tlp may be named
\begin{verbatim}
\l_<module>_<name>_tlp
\end{verbatim}
(local) or
\begin{verbatim}
\g_<module>_<name>_tlp
\end{verbatim}
(global). The other variable types follow the same pattern, with the
appropriate type identified in the variable name.

As well as the new data types, \pkg{expl3} provides a range of
functions for manipulating data. Often, these would have to be coded
by hand when using \LaTeXe. For example, \cs{tlp_elt_count:N} is
available, to count the number of elements (usually letters) in a tlp.
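
A short sketch may make the naming scheme and the data functions more
concrete. It rests on some assumptions: the \texttt{demo} module and
its variables are hypothetical, and the sketch follows the naming of
more recent \pkg{expl3} releases, in which (as far as I am aware) the
token list type is abbreviated \texttt{tl} rather than \texttt{tlp}
and the counting function is \cs{tl_count:N}.
\begin{verbatim}
% Assumes expl3 is available (\usepackage{expl3} on older kernels).
\ExplSyntaxOn
% Hypothetical variables for a module called "demo".
\tl_new:N    \l_demo_word_tl      % local token list
\clist_new:N \g_demo_items_clist  % global comma list
\int_new:N   \l_demo_count_int    % local integer
\tl_set:Nn     \l_demo_word_tl     { LaTeX }
\clist_gset:Nn \g_demo_items_clist { one , two , three }
% Store the number of items in the token list (L, a, T, e, X).
\int_set:Nn \l_demo_count_int
  { \tl_count:N \l_demo_word_tl }
% Write both counts to the terminal and the log file.
\iow_term:x
  { \int_use:N \l_demo_count_int ~ items }
\iow_term:x
  { \clist_count:N \g_demo_items_clist ~ entries }
\ExplSyntaxOff
\end{verbatim}
The \texttt{l} and \texttt{g} prefixes match the setting functions
used: \cs{tl_set:Nn} assigns locally, while \cs{clist_gset:Nn} assigns
globally.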

\section{Other key features}

The new kernel will require the \eTeX\ extensions. This means that the
new primitives are definitely available when working with \LaTeX3. For
example, \cs{unexpanded} is part of the expansion module, as
\cs{exp_not:n}.

Boolean switches in \TeX\ and \LaTeXe\ use the \cs{iftrue} and
\cs{iffalse} primitives. This can lead to problems with nesting
(\texttt{Incomplete \cs{if}\ldots}). To avoid this, \LaTeX3 does not
create switches in the same way. This means that all of the switches
use exclusively \LaTeX\ syntax, and require an ``access'' function.
\begin{verbatim}
\bool_if:NT \l_example_bool {true code}
\bool_if:NF \l_example_bool {false code}
\bool_if:NTF \l_example_bool {true code} {false code}
\end{verbatim}

One of the most useful features of the new coding syntax is the
treatment of white space. The literal space character (\verb*| |) is
ignored inside code blocks, meaning that the text can be laid out to
aid ease of reading. When a space is required in the output, the hard
space (\verb|~|) is used. The ability to finish lines without needing
\verb|%| is highly welcome!

\section{Conclusions}

The current \LaTeX3 modules provide a new and powerful programming
language for \TeX. The full details of the language are collected in
one place, and the language is much more logical than the current mix
of \TeX\ and \LaTeXe. \LaTeX3 is therefore ready for serious use by
\AllTeX\ programmers.

At this stage, the document level of \LaTeX3 is much less defined. It
seems likely that good separation of programming and document design
will be made available. The new code syntax means that a number of
ideas currently implemented as independent packages will need to be
re-implemented either in the new kernel or as supported tools.

My own experience with \LaTeX3 convinces me that the kernel team need
outsiders to use the code. The team have done a very good job so far,
but everyone will bring new approaches to using the code.
With the involvement of the wider \TeX\ community, \LaTeX3 has the
potential to be a major step forward for \LaTeX.

\makesignature
\end{document}