A LaTeX format beyond LaTeX2e

The question of why LaTeX3 development is not focussed on LuaTeX came up yesterday on the TeX-sx site. I’ve added an answer there covering some of the issues, but I thought that something a bit more open-ended might also be useful on the same topic.

Before I look at the approaches that are available, it’s worth asking why a format is needed beyond LaTeX2e. There are several reasons I feel it’s needed, but three stand out.

The first, strangely, is stability. LaTeX2e is stable: there will be no changes other than bug fixes. That means that a document written 10 or more years ago should still give the same output when typeset today. That sounds great, but there is an issue here. While the kernel is stable, packages are not, and the limitations of the kernel mean that there are a lot of packages. So for a lot of real documents, stability in the kernel does not mean that they will still work after many years, at least without some effort. So we need a kernel which provides a lot more of the basics, and perhaps new approaches to providing stable code.

Secondly, and related, is the fact that most real documents need a lot of packages, and that is a barrier to new users. Again, stability is great but not if it means we don’t continue to attract new people to the LaTeX world. I think that the LaTeX approach is a good one, so that is important to me. So I feel that we need a format which works well and provides a lot more functionality as standard.

Thirdly, there are some fundamental issues which are hard to address, such as inter-paragraph spacing, the placement of floats and better separation of design from input. These all need big changes in LaTeX, and it’s not realistic to hope to bolt such changes on to LaTeX2e and have everything continue to work.

All of that tells me we need a new kernel. So the question is how to achieve that. There are at least four programming approaches I’ve thought about.

Two are closely related: stick with TeX macro programming and cross-engine working, but make things more systematic. Perhaps the simplest way to do this is to adopt an approach similar to the etoolbox package, and essentially to add to the structures already available. The more radical approach in the same area is to do what the LaTeX3 Project have done to date, and define a new programming language from the ground up using TeX macros. There are arguments in favour of both of these approaches: I’ve done some experiments with a more etoolbox-like method for creating a format. My take here is that if you really want something more systematic than LaTeX2e then you do have to go to something like the LaTeX3 method: dealing with expansion with names like \csletcs gets too unwieldy as you try to construct an entire format.
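To make the naming point concrete, here is a minimal sketch that does the same job both ways: making one macro a copy of another when both are given by name. The macro names (\demoOriginal, demoCopyA, demoCopyB) are made up for illustration; \csletcs is the etoolbox command and \cs_set_eq:cc is the corresponding expl3 function.

```latex
\documentclass{article}
\usepackage{etoolbox}
\usepackage{expl3} % already in the kernel of recent LaTeX releases; harmless to load

\newcommand*\demoOriginal{original text} % hypothetical macro to copy

% etoolbox: the "cs" prefix and suffix in the name signal that
% both arguments are control sequence names, not tokens.
\csletcs{demoCopyA}{demoOriginal}

\ExplSyntaxOn
% expl3: the signature after the colon carries the same information.
% ":cc" means both arguments are names; ":NN" would take two tokens.
\cs_set_eq:cc { demoCopyB } { demoOriginal }
\ExplSyntaxOff

\begin{document}
\demoCopyA\ and \demoCopyB % both typeset "original text"
\end{document}
```

With one operation the two styles look much alike. The difference shows up across a whole format: in expl3 the argument handling is always read from the signature, whereas in the etoolbox style each combination needs its own specially named command.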

Moving to a LuaTeX-only solution, and doing a lot of the programming in Lua, is the method that the ConTeXt team has decided on. This brings in a proper programming language without any direct effort, but leaves some issues open. Using Lua does not automatically solve the challenges in writing a better format, and using LuaTeX does not mean that there is no TeX programming to do. So a LuaTeX-only approach would still need some TeX work.
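As a small illustration of why some TeX work remains, here is a sketch (to be compiled with lualatex) in which the calculation is done in Lua but the user-facing command is still a TeX macro. The name \demoSquare is hypothetical; \directlua and tex.sprint are the standard LuaTeX primitive and Lua function.

```latex
% Compile with lualatex: \directlua is not available in pdfTeX or XeTeX.
\documentclass{article}

% The arithmetic happens in Lua, but the interface is a TeX macro:
% #1 is substituted by TeX before Lua ever sees the code, and the
% result is passed back to TeX as ordinary input by tex.sprint.
\newcommand*\demoSquare[1]{\directlua{tex.sprint(#1 * #1)}}

\begin{document}
The square of 12 is \demoSquare{12}.
\end{document}
```

Even in this tiny case the argument passes through TeX’s expansion rules on its way into Lua, so the author of the command still has to understand both sides.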

Finally, there is the argument for parsing LaTeX-like input in an entirely new way. In this model, you don’t use TeX at all to read the user’s input: that’s done by another language, and TeX is only involved at all when you do the typesetting. That sounds challenging, and the big issue here is finding someone who has the necessary programming skills (I certainly do not).

Of the four approaches, it seems to me that from where we are now, the LaTeX3 approach is not so bad. If you were starting today with no code at all, and no background in programming expl3 or Lua, you might pick the LuaTeX method. That’s not, however, where we are: there is experience of expl3 available, and there is also code written (but in need of revision). Of course, the proof of that will be in delivering a working LaTeX3 format: on that, back to work!

3 thoughts on “A LaTeX format beyond LaTeX2e”

  1. Joseph, well said. There is also one additional method of parsing LaTeX-like input: do some work to standardize the author interface as much as possible (LaTeX3 or similar can open up a programmer’s API), and let programmers use any other language to preprocess and massage input before it is sent to a middle layer. This way there is no need to parse the full LaTeX input but only interface commands.

  2. Certainly there is a need to think carefully about the user interface. Things like xparse provide at least a guide here, as there is a defined interface that is then separate from the implementation, and can be read in one go. That ties into the idea that, as you say, the code ‘layer’ should be independent of the interface ‘layer’, as far as possible.

  3. Just a few thoughts:

    1) As you said, ∼etoolbox (i.e. etoolbox and similar) and LaTeX3 are closely related,
    and I consider ∼etoolbox ⊂ LaTeX3 from the point of view of functionality;

    2) With LuaTeX I believe it’s possible to have only a Lua
    (i.e. a luc) format. The user input can be in Lua and/or in TeX;

    3) Parsing TeX input requires a TeX-like program. I think that LuaTeX is the best choice here, so basically it’s a sub-case of 2).

    In general we think of LaTeX3 or ConTeXt as something “complete” (both at least for the macro developer, ConTeXt also for the end user);
    as a consequence they tend to be “big”.

    But it’s also reasonable to imagine that one day we will have libluatex (we already have mplib),
    so that an HTML5 engine can use it as a plugin to translate the HTML into a PDF (maybe PDF/A-2, or only a tagged PDF).
    In this case a kind of API is required. We already have a low-level API, and that is enough. A mid/high-level API can then be the ConTeXt format, or the full LaTeX3 format, or something in between like ∼etoolbox or a bunch of LaTeX3 packages, but it should be an application decision,
    maybe made at runtime: if the content consists only of many tables (e.g. product catalogs), maybe just a few plain TeX macros are enough
    and the gain in speed is not negligible. On the other hand, content with many layers can be rendered with ConTeXt, and a “LaTeX-look-alike book” with LaTeX.

    On the other hand, while ConTeXt MkIV is rooted in LuaTeX, LaTeX3 uses an e-TeX-compliant engine, and maybe this can be a penalty because
    e-TeX ⊂ LuaTeX. I’m inclined to think that a high-level macro format influences the engine: that’s what happened with e-TeX and still happens with MkIV today.
