Fixing LaTeX2e

When LaTeX2e was first released in 1994, a lot of work had been done to avoid breaking existing LaTeX 2.09 documents while still allowing changes such as the new package and font selection systems. The stability that approach demonstrated is one reason LaTeX has been a success. However, there is also a need to allow for change: the world does not stand still. While the LaTeX2e kernel is not about to alter radically, the team are looking to address some areas where today’s needs mean that change (or at least adaptation) is the right approach. David Carlisle talked about this at the UK-TUG meeting in November: here I’m going to look at the same issues in my own way. An important note before I start: the fixes I’m talking about here are all important, but they are not about to change LaTeX2e into something else!

Kernel modifications

Over the years, various bugs and issues have come up in the LaTeX2e kernel. Out-and-out bugs get fixed, but issues which are more about ‘code design’ are trickier. There’s a tension between sorting these out and keeping the kernel ‘stable’, i.e. not altering existing documents at all. The approach the team have taken to date is a package called fixltx2e. It contains fixes that really should go into the kernel but haven’t, as they might alter existing documents. The idea is that most people should load these fixes in the form

\RequirePackage{fixltx2e}
\documentclass...

The problem: most people don’t do that, or they load fixltx2e half-way through a preamble, or use it with packages that were never tested both with and without the fixes. That’s not a great position.

What we are looking at now is moving to a situation where the fixes are in the kernel as standard, but with a mechanism to back them out. The details still need to be finalised, but the general plan is that once we make the change people will get the fixes without needing to take any action. If a document really has to be completely unchanged, we’ll provide an ‘undo’ package with a way of setting the date that the kernel should be rolled back to: that way you’ll be able to say ‘I always want the kernel as it was on …, even if any fixes at all are made later’. We hope that will be a good balance.
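
The exact interface still has to be settled, but usage might look something like the following sketch (the package name and option syntax here are purely illustrative, not a final design):

\RequirePackage[2014/01/01]{undofixes} % hypothetical name and date syntax
\documentclass...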

Register allocation

Classical TeX provides 256 registers of each type. That limit was raised by the e-TeX extensions, which were finalised in 1999 and give us 32768 registers of each of the main types (more on that nuance in a bit). While the team have used the extensions for many years in some packages, the LaTeX kernel itself still uses the classical TeX allocation system. That means that you can run into the

No room for a new ....

error even though there is lots of space. Loading the etex package

\RequirePackage{etex}
\documentclass...

modifies the allocation system to use those extra registers, but a lot of non-expert users don’t know this. So again we have a situation where a change in the kernel is the best plan.

What we are looking at here is the obvious solution: extending the register allocators in the LaTeX2e kernel ‘out of the box’ whenever the e-TeX extensions are available. That should be a transparent change for almost everyone, and will still allow etex to be loaded.
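
A quick way to see the difference is a test loop such as the one below (a sketch for experimentation, nothing more): it allocates 300 count registers, which fails with ‘No room for a new \count’ under classical allocation but runs to completion with e-TeX-based allocation such as the etex package provides.

\newcount\testindex
\loop
  % build a name such as \testcount42 and allocate it
  \expandafter\newcount\csname testcount\the\testindex\endcsname
  \advance\testindex by 1
\ifnum\testindex<300 \repeat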

One minor wrinkle is inserts. e-TeX doesn’t extend how many inserts TeX has: there are still only 256. LaTeX2e doesn’t actually need many inserts, as floats are handled without them (or at least without needing one insert per float), but at present the code for making floats does allocate inserts. The best solution here is to change what the kernel does so it no longer uses \newinsert to make floats: that will let us provide more float storage at essentially no ‘cost’.
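
To see why this helps: \newinsert reserves a box, a count, a dimen and a skip register all at one (scarce) insert number, whereas a float ‘slot’ really only needs somewhere to store material. A minimal sketch of the idea (this is not the real kernel code):

% current approach: each float slot consumes one of the 256 inserts
\newinsert\myfloatslot
% sketched alternative: a plain box register stores the same material
\newbox\myfloatbox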

Unicode Engines

The Unicode engines XeTeX and LuaTeX have been with us for a few years now, and quite a lot of what they need to do at the format level is well-established. At the moment, the format-building routines make some changes ‘around’ the core latex.ltx file to accommodate these requirements: the code supplied by the team doesn’t ‘know’ about these newer engines. We’re therefore looking to address that by adding some conditional code.

The first area to tackle overlaps with the point above: LuaTeX extends the register allocation again beyond e-TeX, while XeTeX needs an allocator for \XeTeXinterchartoks. Both of these can readily be added to an updated allocation system.
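
To give a flavour of what an allocator means here, a minimal sketch (assuming XeTeX; the macro name and the number of predefined classes are my assumptions, and this is not the code that will actually be used) of allocating inter-character classes for use with \XeTeXinterchartoks:

\newcount\allocatedclasses
\allocatedclasses=3 % assumption: classes 0-3 are predefined by XeTeX
\def\newintercharclass#1{%
  \global\advance\allocatedclasses by 1
  \global\chardef#1=\allocatedclasses}
% usage: \newintercharclass\mypunct, then \XeTeXcharclass`\!=\mypunct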

The bigger impact of Unicode engines is that they have different requirements from 8-bit engines in setting up the codes TeX uses for case changing. The LaTeX2e kernel sets up the \lccode and \uccode values for the 8-bit range, assuming the T1 encoding. With the newer engines that’s not really appropriate, as they use Unicode code points and (almost certainly) Unicode (EU1/EU2) encodings. The format builders currently alter these assumptions using something of a hack, so we are looking to add the appropriate conditionals to the format itself. For end users that won’t really show, but it will mean that the format itself is ‘in control’ here: something we are keen to work on.
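
The sort of conditional involved might look a little like this sketch (the detection test and the particular settings are illustrative only, not the planned kernel code):

\ifdefined\Umathcode
  % Unicode engine (XeTeX or LuaTeX): case codes over the full range,
  % for example U+0141/U+0142 (L/l with stroke)
  \lccode"0141="0142
  \uccode"0142="0141
\else
  % 8-bit engine: T1-based settings, e.g. slot 192 (A-grave) -> 224
  \lccode192=224
\fi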

LuaTeX extras

As well as the issues it shares with XeTeX, LuaTeX introduces ideas such as Lua callbacks and \attribute allocation. These areas are still somewhat ‘in flux’: the team currently feel that we need to get some consensus from the community (particularly active package authors) before adding anything here. However, it’s important that we get people thinking.
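
For readers who haven’t met these ideas, the Lua snippet below gives a flavour of what a callback is (a purely illustrative sketch, assuming the luatexbase package’s callback management is loaded; run it via \directlua or from a Lua file):

luatexbase.add_to_callback("pre_linebreak_filter",
  function (head)
    -- simply log that a paragraph is about to be line-broken;
    -- returning true leaves the node list untouched
    texio.write_nl("demo: paragraph reached the line breaker")
    return true
  end,
  "demo.paragraph.logger")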

Conclusions

The changes we are looking at for LaTeX2e should help keep things ‘ticking over’ in the kernel, keep things working and offer some new abilities to end users. At the same time, they should move more of the kernel that people see ‘in the wild’ back into the control of the team: something we are keen on, as we need to be able to fix the bugs. We’re hoping to check in the code for these changes soon: expect requests for testing!

TUG Membership

While TeX and all of the supporting ideas are free (both in monetary terms and intellectually), behind that freedom lies a lot of effort from a range of volunteers, plus hard cash for parts of the infrastructure. A key component of making all of that work is TUG: the worldwide TeX user group. TUG is the central point for co-ordinating a range of activities: running the TUG conference series, supporting TeX development, producing TeX Live and hosting mailing lists, to name a few.

Those of us in TUG have recently had a mail from the President pointing to a slightly concerning trend: a slow but perceptible drop in membership. That doesn’t mean there are fewer TeX users about: the accessibility of modern TeX systems means that there are a lot of TeX users (see for example the popularity of the TeX StackExchange site). That accessibility means that users don’t need to join a user group to use TeX, so there is something of a challenge.

To encourage people to take up membership, and of course take advantage of the benefits, TUG have launched a membership campaign. The aim is to encourage existing members to look out for new recruits, and of course to remind us that TUG is only as strong as its membership. So if you are a member, remind your fellow TeX users to join TUG, and if you are not in TUG: why not?

River Valley videos on the move

Many readers will be familiar with River Valley, a typesetting company with a long-standing interest in TeX and related technologies. One of the things they do is great work videoing meetings in the area of publishing, technology, XML and all kinds of related things. I had an e-mail a couple of days ago from Kaveh Bazargan, the head of River Valley, to let me know that the videos are ‘on the move’ to a new site: http://river-valley.zeeba.tv/. I’ll be altering my links in the blog.

LuaTeX: Manipulating UTF-8 text using Lua

Both the XeTeX and LuaTeX engines are natively UTF-8, which makes input of non-ASCII text a lot easier than with pdfTeX (certainly for the programmer: inputenc hides a lot of complexity for the end user!). With LuaTeX, there is the potential to script in Lua as well as program in TeX macros, and that of course means that you might well want to do manipulation of that UTF-8 input in Lua. What might then catch you out is that it’s not quite as simple as all that!

Lua itself can pass around arbitrary bytes, so input in UTF-8 won’t get mangled. However, the basic string functions provided by Lua are not UTF-8 aware. The LuaTeX manual cautions

The string library functions len, lower, sub, etc. are not UNICODE-aware.

As a result, applying these functions to anything outside the ASCII range is not a good idea. At best you might get unexpected output, so

tex.print (string.lower ("Ł"))

simply prints Ł (with the right font set up). Worse, you might get an error, as for example

tex.print (string.match ("Ł","[Ł]"))

results in

! String contains an invalid utf-8 sequence.

which is not what you want!

Instead of using the string library, the current correct approach here is to use slnunicode. Again, the LuaTeX manual has some advice:

For strings in the UTF-8 encoding, i.e., strings containing characters above code point 127, the corresponding functions from the slnunicode library can be used, e.g., unicode.utf8.len, unicode.utf8.lower, etc.

and indeed

tex.print(unicode.utf8.lower("Ł"))

does print ł. There are still a few things to watch, though. The LuaTeX manual warns that unicode.utf8.find returns a byte range and that unicode.utf8.match and unicode.utf8.gmatch fall back on non-Unicode behaviour when an empty capture (()) is used. Both of those can be allowed for, of course: they should not be big issues.
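
The find caveat is easy to demonstrate (the snippet below is illustrative; any Lua-side logging would do). Here ‘d’ is the third character of the string but the match is reported as starting at byte five, since Ł and ó each occupy two bytes in UTF-8:

local first, last = unicode.utf8.find("Łódź", "d")
texio.write_nl("match at bytes " .. first .. "-" .. last)
-- prints 'match at bytes 5-5'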

There’s still a bit of complexity, for two reasons. First, there’s not really much documentation on the slnunicode library, so beyond trying examples it’s not so easy to know what ‘should’ happen. For example, case-changing in Unicode is more complex than a simple one-to-one mapping, and can have language dependencies. I’ll probably return to that in another post for a TeX (or at least XeTeX/LuaTeX) take on the problem, but in the Lua context the issue is that it’s not so clear quite what’s available! In a way, the second point links to this: the LuaTeX manual tells us

The slnunicode library will be replaced by an internal UNICODE library in a future LuaTeX version.

which of course should lead to better documentation but at the price of having to keep an eye on the situation.

Overall, provided you are aware that you have to think, using Lua with Unicode works well: it’s just not quite as obvious as you might expect!

A place for (PGF) plot examples

Not content with running The LaTeX Community, TeXample and texdoc.net, blogging on TeX matters and being a moderator on TeX Stack Exchange, Stefan Kottwitz has now started a new site: pgfplots.net. The idea for the new site is simple: it’s a place to collect great examples of plots, (primarily) made using the excellent pgfplots package. Why do this? Plots are just graphics, but they are a very special form of graphic with particular requirements. As a working scientist, I really appreciate the need for well-presented, carefully-constructed plots: they can make (or break) a paper.

At the moment, the selection of plots is of course quite small: the site is new and the room for ‘artistic’ work is perhaps a little more limited than in the TeXample gallery. I’m sure it will soon grow, and we can all pick up a trick or two! (Don’t worry: there will certainly be a few plots for chemists. Indeed, you might already spot some.)

Presenting visual material

I’ve recently been thinking about how best to present ‘visual’ material: presentations, lectures and academic posters. As a ‘LaTeX regular’ I tend to do that in LaTeX (using beamer), but in many ways what I’ve been thinking about is entirely independent of the tool I use. Looking around the TeX world, I found a couple of very interesting articles in the PracTeX journal, one on posters and one on presentations. As you might expect, they contain some ‘technical’ advice but are worth a read whatever tool you use to make your visual material. (Many people who use LaTeX for articles prefer more visual tools for posters in particular.)

Presentations

The core idea in the paper on presentations is, I guess, ‘rolling your own’ when producing your slides. One of the authors is Markus Kohm, so it’s no surprise that there is a strong focus on using KOMA-Script as a general-purpose class, making design changes to suit the screen rather than print. There are a couple of reasons suggested for doing this. The first is that, like any ‘pre-built’ approach, it’s easy to just use the defaults of, say, beamer or powerdot and make a presentation that is very similar to lots of other ones. If you do the design yourself as a ‘one-off’, that’s much less likely to be a concern. The other argument is that in some ways dedicated presentation classes make it too easy to use lots of ‘effects’ (such as the beamer overlay concept I looked at recently).

I’m not sure I’d want to make my own slides from scratch every time I give a presentation: there are good reasons why dedicated classes exist. However, I’d say the points about design and ‘effects’ are both valid. I increasingly use pretty ‘basic’ slides, which don’t have much in the way of ‘fancy’ design or dynamic content, and find that they work just as well if not better than more complex ones. Overlays and the like do have a use, and I use them when they make good sense, but that’s actually pretty rare.

Posters

The message in the article on posters is in some ways the same as the presentation one: the standard designs don’t work that well. Academic posters tend to be very text-heavy, and a multi-column design with a few small graphics is one you see repeated a lot. The article suggests a radically different approach: essentially no words, just graphical elements. That’s not necessarily LaTeX’s strength, but the authors do a good job using TikZ to showcase their argument.

I’ve never quite had the nerve to make a poster with essentially no text. However, I do see the point that mainly graphical posters in many ways work better than putting your entire paper on the wall. There’s always the worry that once a poster goes up, you can’t be sure you’ll be there to talk to anyone interested, and so a few words are in some ways a ‘safety net’.

Conclusion

Both articles give you something to think about. Even if you do all of your slides and posters in visual tools (PowerPoint, KeyNote, Illustrator, etc.), the core messages are still valid. I’d say we can all learn a little here: worth a read!

The LaTeX Companion as an eBook

Many long-term LaTeX users have on their bookcase a copy of The LaTeX Companion, an excellent guide to ways to tackle a wide variety of problems in LaTeX. Having it available electronically has been something that many people have wanted, so I was very pleased when I heard from the lead author, Frank Mittelbach, that this was in the offing. The electronic version, as a PDF or in eBook format (ePub and Mobi), is now available from InformIT, the publisher’s online store.

The price is very reasonable: $23.99, with a discounted price ($14.99) available until the end of the year using code LATEXT2013. For that, you get all three formats in DRM-free form: the PDF is watermarked but otherwise identical to the current print version (the 2nd edition). It’s not a new edition: the (excellent) text is that written by Frank and the rest of the team in 2004. For many purposes, that makes very little difference as LaTeX is generally very stable, but if you are interested in biblatex, TikZ, LuaTeX or other ‘new’ developments in the LaTeX world then perhaps it’s not the book for you.

As the PDF is identical to the print version, it works best on bigger screens where you can give it the full width and size it needs. The eBook forms work better on dedicated readers, but at the cost that the code examples are inserted there as pictures. There’s a good reason for that: only in the PDF is the typography done by TeX, so pictures are the only way to show the real results in the eBook forms. You get all the internal links you’d expect in all of the formats: the table of contents to chapters, references to the bibliography and so on. Having all three formats for one price means you can both take advantage of the flexibility of eBooks and have a copy with high-quality typography available wherever you go. Being electronic, you can also search the text (only the PDF lets you search the examples, as only there are they not pictures).

There’s very little downside to the electronic copy: the cost is good, the restrictions are minimal and the text itself is of course excellent.

LaTeX Tutorial videos from ShareLaTeX

Learning LaTeX without a ‘local guide’ can be a challenge: it’s one of the reasons I’m involved in running training courses for UK-TUG. The people behind ShareLaTeX have decided to make a series of videos aimed at newer LaTeX users, covering the basics of LaTeX use, writing a thesis and also some more advanced topics (TikZ and beamer, for example).

There are currently about 25 videos, and I’ve watched all of the ‘basic’ ones (the LaTeX beginners series and the thesis series). The quality and presentation are pretty good: as well as well-produced videos there are also transcripts for all of them, and of course the demos are available on ShareLaTeX. There are, naturally, a few things I’d tackle differently, but the overall picture is pretty impressive. They’ve put a lot of work into the videos, and if you work through them carefully (and take time to try the demos yourself) then I think you’ll get a good grounding in using LaTeX.

TeX Live 2013 released

Browsing the TeX Live site today, I see that TeX Live 2013 has been released. There are as usual a few changes to note. My highlights:

  • XeTeX now uses the HarfBuzz shaper rather than the older ICU engine (which is no longer being developed): see my earlier post about this change
  • LuaTeX is updated to Lua 5.2 (the latest Lua release)
  • Microtype now supports protrusion in XeTeX and LuaTeX (see the snippet below)
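
If I understand the microtype defaults correctly (an assumption worth checking against the package manual), no special options are needed to benefit:

% with XeTeX or LuaTeX on TeX Live 2013
\usepackage{microtype} % protrusion is active by default where supported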

I’ve been using the pretest version of TeX Live for a while, and am very happy that all seems to be working just fine. Of course, many people will want the DVD version, which will take a while to arrive, but downloaders can grab it now.

TeX Welt: A new (German) TeX blog

If you are involved with (La)TeX for any length of time, you notice that TeX is very popular in German-speaking countries. DANTE, the German-speaking TeX user group, is big, and there are several German-language TeX websites out there. They’ve now been joined by a new German-language TeX blog, TeX Welt. This has popped up following some discussion on the TeX-sx chat system, and is being hosted by Stefan Kottwitz (a man with a server farm in his house!). Of course, we don’t all speak fluent German, but I’ll certainly be keeping an eye on the new site: always more to learn!