Additional primitives for XeTeX

XeTeX has, over the past few years, made using TeX with multiple fonts and UTF-8 input easy. The work-flow using XeTeX is very much more accessible than the routes needed using pdfTeX or TeX82. So I’m sure that many people, like me, use XeTeX whenever they want to use arbitrary fonts or to write anything which doesn’t use western European characters.

XeTeX is based on ε-TeX, which means it has a number of primitives which were not present in TeX82, but are present in ε-TeX itself or in pdfTeX (which also includes the ε-TeX primitives). However, ε-TeX was finalised over ten years ago, and since then the pdfTeX team have added a number of new primitives, many related directly to PDF output. At the same time, XeTeX includes its own new or extended primitive functions, in this case focussed on UTF input. For the most part this does not concern people as things work fine.

Recently, there has been some testing of the current LaTeX3 code with XeTeX (and older versions of pdfTeX, which don’t have all of the newer primitives). LaTeX3 requires the ε-TeX extensions, which are as I said available with any modern TeX engine. However, when it’s available LaTeX3 also uses the \pdstrcmp primitive: this is only present in newer versions of pdfTeX. For those people not familiar with \pdstrcmp, it allows you to do string comparisons of text (not token comparisons), and in an expandable manner. This is very useful, and much better than doing things without it; with no \pdfstrcmp, comparisons are not expandable. It became clear that there is a danger of some things working when using newer versions of pdfTeX, but failing with older ones or with XeTeX. Older versions of pdfTeX is one thing (the advice can simply be “sorry, you’ll have to update your pdfTeX”), but failing with XeTeX is simply no acceptable. After a bit of discussion, the best solution seemed to be to talk to Jonathan Kew about getting a very small number of “new” pdfTeX primitives into XeTeX.

At the moment, things are still under discussion, but the list of additional primitives is going to be small (somewhere between 2 and 5 seems likely). I think it’s giving nothing away to say that \pdfstrcmp is one that really is needed (although the name might be an issue!). Another likely candidate is \ifincsname, which looks handy and also not too complex to implement. There are a few other suggestions, but I’m not sure just yet what will be really needed, as opposed to nice to have. What is clear is that this is a one-off request. Once these small gaps are filled, LaTeX3 will not be using other primitives for general functions. I’m not sure how long it will take to finalise things, both for the team to agree on what is needed and for Jonathan Kew to do the hard work, but I’d imagine weeks not longer.

A XeTeX snag

Testing the current LaTeX3 code means worrying about what primitives are available. Recent releases of pdfTeX have included a number of new primitives, and the expl3 system currently uses \pdfstrcmp if it finds it. This has already caused one issue, as things were broken if it was not available. So I’ve been modifying the test system a little, to run the tests with pdfLaTeX and XeLaTeX. The “new” primitives are not available with XeTeX, so this is a good test of our code: it’s supposed to work with both. I’ve found one odd thing going on with the LateX3 code, but also one thing that seems to be a XeTeX snag.

The snag can be tracked down (thanks to Morten Høgholm) to a minimal case:

\message{\ifcat\par\relax T\else F\fi}

If you try this in TeX or pdfTeX, you get T in the log (as The TeXbook says you should). With XeTeX, you get F. Whether this is a bug or a feature, I don’t know. I’m sure Jonathan Kew will elaborate, but it reminds me of the need to test using different set ups. You can imagine some very odd errors appearing with this type of thing!

LuaTeX: new primitives, new names

There has been quite a bit of dicussion recently on the LuaTeX mailing list about primitive names. The basic set of primitive names provided by TeX has already been extended by e-TeX, pdfTeX and XeTeX. While e-TeX primitives have names which express only the function (such as \detokenize), both pdfTeX and XeTeX have tended to add the engine name to all new primitives. So for example pdfTeX provides the \pdfprimitive function, which always carries out the orginial function of a name, even if it has later been redefined:

% \everypar{Some stuff} % Gives an error
\pdfprimitive\everypar{Some stuff}

will, for example, add material to \everypar despite the redefinition.

In LuaTeX, the idea of adding an engine-specific prefix has been abandoned. The example above shows one reason why: \pdfprimitive makes more sense with the name \primitive (which is what LuaTeX calls it). The problem with this is that name clashes are marginally more likely without a prefix (for example, \expanded is used in ConTeXt as a macro, but this is planned as a primitive for pdfTeX 1.50).

On the other hand, LuaTeX is a fundamental break with previous engines, as the internal changes mean that it may give different line breaks and so on to earlier systems. Thus the argument for new, logical, primitive names is that moving to LuaTeX is not the same as moving from TeX to pdfTeX or pdfTeX to XeTeX. There will be changes, and the user (or package developer) will need to think about them before diving in.

Overall, I think the LuaTeX idea is the best in the long term. Primitives which have well specified names help everyone, and some times change is necessary. I’ll be interest to see if the \directlua primitive for actually using Lua ends up with a shorter name (just \lua, anyone?).

Improving LaTeX for the user

I’ve been discussing some points about the future of (La)TeX with various people, and some key issues come to mind. Most LaTeX users do not want to meddle with the internal parts of LaTeX or TeX. In an ideal world, I suspect most users would like to need little beyond the correct document class to get things “just right” in their layout. Perhaps a few simple settings, but really little more than that.

With the correct packages and class loaded, you can do many things in LaTeX. However, you really shouldn’t need to load specific support for hyperlinks, T1 encoding, basic font changing, creating new float types and so on in 2009 (let alone 2010, 2011, etc.). The efforts of the LaTeX3 team have to date focussed on programming and to a lesser extent document design. How things will work at the user level is much less clear.

I’d suggest that a real focus on getting something for users would be the best way forward. This might mean less improvement internally, but I’d think that a LaTeX kernel which could do everything in The LaTeX Companion would be pretty successful, even with few changes “under the hood”. This would mainly be a re-coding excercise from existing packages, which in a way is similar to what I’ve tried to do with siunitx. Much of the basics in siunitx are taken from other packages (at least in terms of user interface), but it brings several ideas together in one place. The same idea could easily be applied to the kernel. Of course, this might leave some of the clever ideas for LaTeX3 out of the code at this stage, but I’d hope would get momentum behind a more regularly updated system.

One particular area to think about is fonts. With both XeTeX and LuaTeX able to handle system fonts directly, the basic LaTeX system seems very antiquated. At present, LaTeX3 only requires e-TeX, not LuaTeX (in contrast to ConTeXt Mark IV). Should the LaTeX team say something like:

For current testing purposes, only the e-TeX extensions are needed, but this is likely to change. XeTeX or LuaTeX will be required to run the release version of LaTeX3 with full functionality.

I’d say yes, as I think that it’s time to move on from complex font installation and usage restrictions. I’d also be very tempted to say that LaTeX3 will assume UTF-8 input unless otherwise specified (as both XeTeX and LuaTeX are native UTF-8 systems).

This type of approach will make LaTeX easier to use, and I’d hope to see it arrive! After all, users are the TeX community.

XeTeX and LuaTeX: Directions

The XeTeX and LuaTeX engines both offer exciting ideas to the TeX user and the (La)TeX programmer. Using XeTeX, UTF-8 input is easy: no more awkward escape sequences. Font handling in XeTeX also makes it trivial to use any font installed on your system. On the other side of the equation, LuaTeX is going to bring a lot of useful programming tools to the TeX world. This should make some things a lot easier (handling floating point manipulation seems an obvious one), and make other things possible that are not currently.

The issue for me is the gaps between the two. XeTeX doesn’t have things like microtypography (try loading the microtype package and doing a XeLaTeX run), and it’s not clear to me how LuaTeX will work out with font handling. There’s also the DVI versus PDF issue: will we see a system where EPS, PNG, JPEG and PDF graphics can all be included without worrying about the engine in use? In many ways, I suppose these questions are more for the LuaTeX team to address: XeTeX has essentially delivered what it set out to do (a Unicode TeX which can use system fonts natively).