Standard font loading in LaTeX2e with XeTeX and LuaTeX

The LaTeX Project have been making efforts over the past few years to update support in the LaTeX2e kernel for XeTeX and LuaTeX. These Unicode-enabled engines provide new features (and challenges) compared to the ‘classical’ 8-bit TeX engines (probably pdfTeX for most users). Over recent releases, the team have made the core of LaTeX ‘engine-aware’ and pulled a reasonable amount of basic Unicode data directly into the kernel. The next area we are addressing is font loading, or rather the question of what the out-of-the-box (text) font should be.

To date, the LaTeX kernel has loaded Knuth’s Computer Modern font in his original ‘OT1’ encoding for all engines. Whilst there are good reasons to load at least the T1-encoded version rather than the OT1 version, with an 8-bit engine using the OT1 version can be justified: it’s a question of stability, and nothing is actually out-and-out wrong.
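For pdfLaTeX users, moving from OT1 to T1 is a small preamble change. The following is a minimal sketch rather than a recommended preamble; the lmodern line is an optional extra I've assumed here to get scalable fonts in T1 (without it, the T1 Computer Modern fonts are used, which are only scalable if cm-super is installed).

```latex
% Compile with pdflatex
\documentclass{article}
\usepackage[utf8]{inputenc} % UTF-8 input (the default in newer LaTeX releases)
\usepackage[T1]{fontenc}    % switch the text encoding from OT1 to T1
\usepackage{lmodern}        % optional: scalable Latin Modern fonts in T1
\begin{document}
Words such as façade and naïve now use single precomposed T1 glyphs,
so the hyphenation patterns can see them as ordinary letters.
\end{document}
```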

Things are different with the Unicode engines: some of the basic assumptions change. In particular, there are some characters in the upper half of the 8-bit range for T1 that are not in the same place in Unicode. That means that hyphenation will be wrong for words using some characters unless you load a Unicode font. At the same time, both LuaTeX and XeTeX have changed a lot over recent years: stability in the pdfTeX sense isn’t there. Finally, almost all ‘real’ documents using Unicode engines will be loading the excellent fontspec package to allow system font access. Under these circumstances, it’s appropriate to look again at the standard font loading.

After careful consideration, the team have therefore decided that as of the next (2017) LaTeX2e release, the standard text font loaded when XeTeX and LuaTeX are in use will be Latin Modern as a Unicode-encoded OpenType font. (This is the font chosen by fontspec, so for almost all users there will be no change in output.) No changes are being made to the macro interfaces for fonts, so users wanting anything other than Latin Modern will continue to be best served by loading fontspec. (Some adjustments are being made to the package to be ready for this.)
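A minimal sketch of the fontspec route, for XeLaTeX or LuaLaTeX: loading fontspec on its own already gives Latin Modern, and \setmainfont then picks something else. TeX Gyre Pagella is only an example choice, and the sketch assumes it is installed somewhere the engine can find it by name.

```latex
% Compile with xelatex or lualatex
\documentclass{article}
\usepackage{fontspec}            % Unicode font loading; default is Latin Modern
\setmainfont{TeX Gyre Pagella}   % example only: any installed OpenType font
\begin{document}
This paragraph is set in TeX Gyre Pagella rather than Latin Modern.
\end{document}
```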

It’s important to add that no change is being made in math mode: the Unicode maths font situation is not anything like as clear as the text mode case.

There are still some details being finalised, but the general approach is clear and should make life easier for end users.

XeTeX 0.9999: Moving to HarfBuzz (and lots of other goodies)

Khaled Hosny has announced on the XeTeX mailing list that XeTeX 0.9999 has just been released. The list of changes is pretty long, as XeTeX has had quite a backlog of issues. Probably the biggest single change is:

Port OpenType layout from ICU LayoutEngine to HarfBuzz. HarfBuzz is actively maintained and generally have much wider support for OpenType spec, the switch fixes a number of OpenType bugs:

  • Support version 2 OpenType Indic specs.
  • Many other Indic OpenType bugs, and support for the latest additions to OpenType spec.
  • Incorrect application of contextual features.
  • Incorrect kerning in fonts that has both old “kern” table and new GPOS “kern” feature.
  • Allow suppressing Latin ligatures with ZWNJ.
  • Support for variation selectors.
  • Support for user-specified features with complex scripts.
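As an illustration of the ZWNJ item, the sketch below should show the difference with a HarfBuzz-based XeTeX. It assumes fontspec's default Latin Modern font (which has the standard ‘fi’ ligature) and uses XeTeX's ^^^^ notation to enter U+200C ZERO WIDTH NON-JOINER, so the invisible character does not have to be typed into the source.

```latex
% Compile with xelatex (0.9999 or later, i.e. HarfBuzz-based)
\documentclass{article}
\usepackage{fontspec} % default font: Latin Modern, with the standard fi ligature
\begin{document}
fish          % f and i should be set as the fi ligature

f^^^^200cish  % ZWNJ (U+200C) between f and i should suppress the ligature
\end{document}
```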

If you are familiar with layout engines, you’ll know that while ICU has worked very well for XeTeX from day one, its LayoutEngine is no longer being developed, whereas HarfBuzz is under active development. More importantly, HarfBuzz is supported by the open source community well beyond the TeX world, so by moving in this direction XeTeX gets the benefits of the efforts of many other people: part of the point of open source software. I’m sure it’s been a big effort making this change: I’m looking forward to testing it out.

The other headline change, at least for Mac users, is moving to Core Text rather than ATS/ATSUI. Apple have dropped support for the latter, so there was a worry about building XeTeX on the Mac in the future. That’s now sorted, and means XeTeX should work as a 64-bit application on the Mac in future.

If you read the full announcement you’ll see there are lots of other changes and bug fixes. Congratulations to Khaled on this: it’s great to see that XeTeX continues to develop, and that several features needed to make working across XeTeX and LuaTeX seamless are now in place.

Clipping support in XeTeX

Clipping boxes is something that TeX does not do: it simply places them on the page. That means that clipping graphics (a pretty common requirement) is actually done by the driver rather than by TeX. The LaTeX graphics package and the driver support that comes with it cover quite a lot of cases, and over the years support for a number of other drivers has been written based on the same ideas. However, things are still not 100% identical across all back-ends. A particular gap at the moment is that the XeTeX support code does not offer clipping, because the XeTeX engine does not do this itself (pdfTeX and LuaTeX both do). Users of pgf might have noticed that it manages to do clipping perfectly happily with XeTeX (or rather, they might have wondered why graphics doesn’t when pgf does). Martin Scharrer and I looked at this a while ago for his adjustbox package, and worked out what is actually needed: some PostScript specials in an xdvipdfmx wrapper. The same basic idea is now being integrated into xetex.def, the driver support code used by graphics. This will go to CTAN soon, but some testing first would be good. The updated file is available now, so I’d encourage intrepid readers to download and test it!
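Once an xetex.def with clipping support is in place, the standard graphicx keys should behave under XeTeX as they do with pdfTeX. A minimal test file might look like this; example-image is the placeholder graphic from the mwe package, assumed to be installed. With a driver file lacking clipping support, the trimmed-off parts would not be hidden.

```latex
% Compile with xelatex, using the updated xetex.def
\documentclass{article}
\usepackage{graphicx}
\begin{document}
% trim = left bottom right top; clip hides the trimmed-off parts
\includegraphics[trim=1cm 1cm 1cm 1cm, clip, width=5cm]{example-image}
\end{document}
```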

XeTeX, chemstyle and chemscheme

There have been a few queries recently about using my chemstyle package with XeTeX. The problems arise when people attempt to use the \schemeref macro, which is defined by the ‘low-level’ chemscheme package that chemstyle loads. For those of you not familiar with it, \schemeref is for putting compound reference numbers into chemistry graphics (which are usually called ‘schemes’) automatically.

The underlying issue is that chemstyle ultimately relies on psfrag for the graphic manipulations. To the best of my knowledge, psfrag can’t be made to work with XeTeX, which means that neither can my \schemeref macro. I’ve just uploaded a version of chemstyle to CTAN which says this, and issues a warning if used with XeTeX. I do hope that helps a little.

siunitx: Getting the micro symbol right

I get a few e-mails about siunitx and the micro symbol. People tend to be surprised that the symbol ‘sticks’ to something that looks very much like Computer Modern. The reason is that picking a proper upright (not italic) μ is not so easy in TeX. Computer Modern doesn’t include one, so siunitx takes one from the TS1 (text companion) set in the absence of a better plan. I’ve set up some auto-detection for a few obvious alternatives (such as the upgreek package), but that doesn’t really work for XeTeX users.

XeTeX users are likely to load system fonts and, I’d hope, to be using UTF-8 input. That makes it hard to auto-detect what they are doing, but should make it easier for them to get things right themselves. A lot of the more comprehensive fonts include Greek letters in the main font, so getting the μ right is simple:

  \sisetup{
    mathsmu = \text{μ}, % \text comes from amstext/amsmath
    textmu  = μ
  }

or for people testing version 2 of siunitx:

  \sisetup{
    maths-micro = \text{μ}, % \text comes from amstext/amsmath
    text-micro  = μ
  }

There may be a bit of testing required: this will not work if, for example, you are using the Latin Modern font.
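Putting the version 2 settings into a complete file gives something like the following sketch. It assumes a font with Greek coverage is installed (Linux Libertine O is used purely as an example) and loads amsmath to provide \text.

```latex
% Compile with xelatex; uses the siunitx version 2 option names
\documentclass{article}
\usepackage{amsmath}               % provides \text for the maths setting
\usepackage{fontspec}
\setmainfont{Linux Libertine O}    % example only: any font with Greek letters
\usepackage{siunitx}
\sisetup{
  maths-micro = \text{μ},          % upright mu taken from the text font
  text-micro  = μ
}
\begin{document}
Text: \SI{10}{\micro\metre}, maths: $\SI{10}{\micro\metre}$
\end{document}
```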

Testing MiKTeX 2.8 and TeX Live 2009

Both MiKTeX and TeX Live have new versions in the offing. I’ve been testing out both MiKTeX 2.8 and TeX Live 2009, to keep up to date with what is happening. In the past, I’ve tended to stick with MiKTeX as it is designed for Windows, and so can make some platform-specific decisions and be more focussed. However, the TeX Live team have done a lot of work to make TeX Live usable across platforms, and there are advantages to that approach.

Looking through the feature lists, a lot of the new features are common to the two systems, for example:

  • TeXworks installed as a distribution-maintained editor.
  • XeTeX version 0.9995 (which includes the new primitives that the LaTeX3 team asked for).
  • A restricted \write18 mode, which lets a list of “safe” programs run without turning on full \write18 support.
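To see which \write18 mode a run actually has, pdfTeX provides the \pdfshellescape primitive (0 for disabled, 1 for full, 2 for restricted). This is a minimal sketch for pdfLaTeX only, as it relies on that pdfTeX primitive.

```latex
% Compile with pdflatex; compare runs with and without -shell-escape
\documentclass{article}
\begin{document}
\ifcase\pdfshellescape
  Shell escape is disabled.
\or
  Full shell escape is enabled.
\or
  Restricted shell escape (``safe'' programs only) is enabled.
\fi
\end{document}
```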

There are, of course, also differences. For example, only TeX Live includes LuaTeX at present. I also notice that MiKTeX 2.8 is adding the full path of files to the log, whereas in the past you got the relative path. I’m not so sure this is a good idea: it makes things rather wordy, and also the log will vary between systems: not so great. On the other hand, MiKTeX 2.8 does provide user-specific texmf directories. For multi-user systems, this is a real bonus: you can use the auto-install system without needing to be the Administrator.

As I said, I’ve tended to use MiKTeX to date as it’s been the best “fit” on Windows. The latest version of TeX Live makes this a pretty tight call, I think. If you are happy to install a full TeX system (as I am), then there is very little in it. MiKTeX still has the edge for small installations, as the auto-install system really pays off there.

Regular expressions

Regular expressions are very popular as a quick and powerful way to carry out searches and replacements in text of all sorts. Traditionally, TeX handles tokens and not strings or characters. This means that doing regex searches using TeX82 is pretty much impossible. To solve this, recent versions of pdfTeX add the \pdfmatch primitive to allow real string matching inside TeX. The LuaTeX team have decided not to take all of the existing “new” primitives forward from pdfTeX, and as I understand it \pdfmatch will not be implemented in LuaTeX. However, Lua itself has pattern matching (not quite full regular expressions, but close), so the functionality will still be around.
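For comparison, here is \pdfmatch in action, as a minimal sketch for pdfLaTeX. As I understand the pdfTeX documentation, the pattern is a POSIX-style extended regular expression, \pdfmatch expands to 1, 0 or -1 (match, no match, invalid pattern), and \pdflastmatch0 gives back the whole match prefixed by its position.

```latex
% Compile with pdflatex (\pdfmatch and \pdflastmatch are pdfTeX primitives)
\documentclass{article}
\begin{document}
% Look for a run of digits inside the string "abc123"
\ifnum\pdfmatch{[0-9]+}{abc123}=1
  Found digits: \texttt{\pdflastmatch0\relax}% shown as position->match
\else
  No digits found.
\fi
\end{document}
```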

I’ve recently talked about adding new primitives to XeTeX, and you’ll see that \pdfmatch was not on the list for adding to XeTeX. The reason is that a XeTeX implementation would have to be slightly different from pdfTeX, as it is natively UTF-8, but also would be different to LuaTeX, as it would still be a TeX primitive and not a Lua function. So here “the prize wasn’t worth the winning”, in my opinion. As it is, use of \pdfmatch is not widespread, and having three different regex methods inside TeX didn’t seem like a great idea!

Talking of regex implementations, I’ve been reading Programming in Lua, and also working with TeXworks to try to get syntax highlighting the way I like it. The two systems differ slightly from each other, and both seem to differ from the Perl implementation. It seems that every time you want to use a regex system you have to read the manual to see which things are different from every other implementation!

More on XeTeX primitives

There has been a bit more work on the idea of adding primitives to XeTeX to match those available in pdfTeX. The list of pdfTeX primitives that look interesting has grown slightly, and now reads:

  • \ifincsname
  • \ifpdfprimitive
  • \pdfprimitive
  • \pdfshellescape
  • \pdfstrcmp
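Since \pdfstrcmp is the one most needed for LaTeX3 work, here is a minimal sketch of what it does in pdfTeX: it fully expands both arguments, compares them as strings, and expands to -1, 0 or 1. The \comparewords macro is a hypothetical helper written just for this example.

```latex
% Compile with pdflatex (\pdfstrcmp is a pdfTeX primitive)
\documentclass{article}
% \pdfstrcmp{a}{b} expands to -1 (a < b), 0 (a = b) or 1 (a > b);
% adding 1 maps these onto the cases 0, 1, 2 of \ifcase
\newcommand*\comparewords[2]{%
  \ifcase\numexpr\pdfstrcmp{#1}{#2}+1\relax
    ``#1'' sorts before ``#2''%
  \or
    ``#1'' and ``#2'' are identical%
  \or
    ``#1'' sorts after ``#2''%
  \fi
}
\begin{document}
\comparewords{apple}{banana}

\comparewords{TeX}{TeX}
\end{document}
```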

At the same time, it would be useful to include the “extended” version of \vadjust which pdfTeX makes available (\vadjust pre, which places material above the current line rather than below it). This is something that has been asked about before, and as with the rest of the changes the main issue is not the idea of doing it but the time for actual implementation.
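To show what the extended form does, here is a minimal pdfLaTeX sketch: an ordinary \vadjust puts its material after the line in which it appears, while pdfTeX's \vadjust pre puts it before that line.

```latex
% Compile with pdflatex (the pre keyword is a pdfTeX extension)
\documentclass{article}
\begin{document}
\noindent This line uses an ordinary adjustment,\vadjust{\hrule}
so the rule appears below it.

\noindent This line uses a pre adjustment,\vadjust pre{\hrule}
so the rule appears above it instead.
\end{document}
```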

The real need to have \pdfstrcmp available for LaTeX3 work means that some effort has actually gone into this. I’ve got no experience with either Pascal or the WEB format, but I’ve managed by dint of determination to get something passable to Jonathan Kew. There will need to be some adjustments, as XeTeX works with UTF-8 internally, which pdfTeX does not do. However, I’m hopeful that we will see new primitives in XeTeX soon.

Quite how the primitives will be named is still to be decided. The existing \pdf... naming does not really make sense with these non-PDF related functions. So they could end up as \XeTeX... or may just be given generic names. I’m leaving that to Jonathan!