Reworking and exposing siunitx internals

I’ve been talking for a while about working on a new major version of siunitx. I’ve got plans to add some new features which are difficult or impossible to deliver using the v2 set up, but here I want to look at perhaps what’s more important: the back end, programming set up and related matters.

I’ve now made a start on the new code, working first on what I always think of as the core of siunitx: the unit processor. If you take a look at the new material and compare it with the existing release, the first thing that should be obvious is that I’ve finally made a start on splitting everything up into different sub-parts. There are at least a couple of reasons for this. First, the monolithic .dtx for v2 is simply too big to work with comfortably. More importantly, though, the package contains a lot of different ideas and some of them are quite useful beyond my own work. To ensure that these are available to other people, it would seem best to make the boundaries clear, and separate sources help with that.

That leads onto the bigger picture change that I’m aiming for. As regular readers will know, I wrote the first version of siunitx somewhat by accident and in an ad hoc fashion. Working on v2, I decided to make things more organised and also to use expl3, which I’d not really looked at before. So the process of writing the second version was something of a learning experience. At the same time, expl3 itself has firmed up a lot over the time I’ve been working with it. As such, the current release of siunitx has rather a lot of rough edges. In the new code, I’m working from a much firmer foundation in terms of conventions, coding ideas and testing implementations. So for v3 I’m aiming to do several things. A key one for prospective expl3 programmers is the idea of defined interfaces. Rather than making everything internal, this time I’m documenting code-level access to the system. That means doing some work to have clearly defined paths for information to pass between sub-modules, but that’s overall a good thing. I’m also using the LaTeX3 team’s new testing suite, l3build, to start setting up proper code tests: these are already proving handy.

The net result of the work should be a better package for end users but also extremely solid code that can be used by other people. I’m also hopeful that the ideas will be usable with little change in a ‘pure’ LaTeX3 context. Documenting how things work might even have a knock-on effect in emulating siunitx in say MathJax. Beyond that, I’ve viewed siunitx as something of a sales pitch for expl3, and providing a really top-class piece of code is an important part of that. If I can get the code level documentation and interfaces up to the standard of the user level ones, and improve the user experience at the same time, I think I’ll be doing my job there.

Biblatex: more versatile shorthand lists

One of the useful features of biblatex is shorthands, which can be defined in a BibTeX database and listed in the bibliography. A long-standing request has been to make this even more powerful by allowing several different kinds of shorthand, for example for journal abbreviations, general abbreviations, etc. This ability has now been added to the development version of the package, by generalising shorthands to ‘biblists’. Of course, new features always need testing, so it would be great if interested users would grab the code and try it out!

Work on siunitx v3

I recently posted a few ‘notes to myself’ about future directions in siunitx development. With them down in print, I’ve been giving them some serious thought and have made a proper start on work on version 3 of the package. I’m starting where I’m happiest: the unit parser and related code, and am working on proper separation of different parts of the code. That’s not easy work, but I think it should give me a good platform to build on. I’m also working hard to make the new code show ‘best practice’ in LaTeX3 coding: the plan is to have much richer documentation and some test material to go with the new code. Looking forward, that should make creating a ‘pure’ LaTeX3 units module pretty easy: it will be a minor set of edits from what I’m working on now.

I’ve got a good idea of the amount of work I need to do: there are about 17k lines in the current siunitx.dtx, which comes out to around 7.5k lines of code. That sounds like a lot, but as much of what I need to do is more editing than writing from scratch, I’m hoping for an alpha build of version 3 some time this summer.

siunitx development: Notes for my future self

Working on siunitx over the years, I’ve learnt a lot about units and how people want to typeset them. I’ve also realised that I’ve made a few questionable choices in how I’ve tackled various problems. With one eye to future LaTeX3 work, and another to what I might still improve in siunitx, I thought it would be interesting to make a few notes about what I’ve picked up.

  1. Sticking to math mode only would be a good idea. The flexibility that the package offers in terms of using either math or text mode is very popular, but it makes my life very tricky and actually makes some features impossible (TeX only allows us to reliably escape from math mode by using a box). It’s also got some performance implications, and the more I’ve thought about it, the more I realise that it was probably not the best plan to allow both choices.

  2. A different approach to the \boldmath issue would be sensible. Currently, one of the reasons I use a switch from text to math mode internally is that it allows ‘escaping’ from \boldmath. However, that’s probably not the best plan, as it again introduces some restrictions and performance hits, and I think is very unlikely to actually be helpful!

  3. The default number parser should be simple and quick: complex parsing should be an option. As it stands, the parser in siunitx is quite tricky as it does lots of things. A better approach would be to only deal with digit separation ‘out of the box’ (so not really parsing at all), and to allow things like uncertainties, complex numbers and the like as add-ons.

  4. Tables really need a separate subpackage. Dealing with tables of numbers was never really my aim, and I think a much clearer tack would be to have some of the internals of the number parser accessible ‘publicly’, then build the table functionality up separately.

  5. The unit parser works very well: don’t change it! Although people ask me mainly about numbers and tables, the real business end of siunitx is the unit parser/formatter. It’s basically spot-on, with only really minor tune-ups needed.

Probably most of this has to wait for a ‘real’ LaTeX3 numbers/units bundle: I can’t break existing documents. However, I’ve got a few ideas which can be implemented when I get the time: watch this space.

The +-overlay syntax and \pause in beamer

In a recent post I looked at how to use the + syntax to create flexible overlays in beamer. The key concept of that syntax is to allow dynamic slides to be created without having to hard-code slide numbers. The classic example is to reveal a list an item at a time:

\begin{frame}
  \begin{itemize}
    \item<+-> This is on the first and all following slides
    \item<+-> This is on the second and all following slides
    \item<+-> This is on the third and all following slides
    ...
  \end{itemize}
\end{frame}

As I discussed in the earlier post, this is a very powerful way to create overlays (dynamic slides from the same frame source). However, a classic problem people have is combining this with the \pause command. For example, the following creates five slides, one more than you’d probably expect:

\begin{frame}
  \begin{itemize}
    \item<+-> This is on the first and all following slides
    \item<+-> This is on the second and all following slides
    \item<+-> This is on the third and all following slides
    ...
  \end{itemize}
  \pause
  Text after the list
\end{frame}

Why? If you read the beamer manual, it’s all about the value of the beamerpauses counter, but if we skip to the key point, you should not use \pause in the same frame as <+-> (or similar).
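
For anyone who does want the detail, here is a sketch of the frame annotated with the slide numbers I’d expect each specification to resolve to. It assumes the behaviour described in the beamer manual: each + uses the current value of the beamerpauses counter and then steps it, while \pause steps the counter first and then uses the new value. The comments are my annotations, not output from beamer.

\begin{frame}
  \begin{itemize}
    \item<+-> First item   % resolves to <1->, counter is then 2
    \item<+-> Second item  % resolves to <2->, counter is then 3
    \item<+-> Third item   % resolves to <3->, counter is then 4
  \end{itemize}
  \pause                   % steps the counter to 5, then acts like \onslide<5->
  Text after the list      % so slide 4 is just a repeat of slide 3
\end{frame}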

Beyond the power of \pause: \onslide

I think the reason people get into trouble is that they imagine \pause as the best way to break ‘running text’ in a frame into overlays. However, \pause is really just the most basic way of breaking up frames and is meant only for the simplest cases:

\begin{frame}
  Some content
  \pause
  Some more content
\end{frame}

The moment you introduce other dynamic behaviour, you need more control than \pause offers. Indeed, this is pretty clear in the beamer manual: what people are actually looking for is \onslide.

Unlike \pause, which only knows some basic stuff about slide numbers, \onslide works with the full power of the flexible overlay specification (indeed, an overlay specification is required). So to get text after a list, what is needed is

\begin{frame}
  \begin{itemize}
    \item<+-> This is on the first and all following slides
    \item<+-> This is on the second and all following slides
    \item<+-> This is on the third and all following slides
    ...
  \end{itemize}
  \onslide<+->
  Text after the list
\end{frame}

As we are then using the special + syntax for all of the overlays, everything is properly tied together and will give the (probably) expected result: four slides, one for each of the three list items shown plus one for the text after the list.

The beamer manual covers other more complex effects using \only, \uncover, \alt and so on, but using \onslide you can do everything you think you can do with \pause but actually have it work when using the + syntax on the slide too!
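
As a minimal sketch of that last point, the simple two-part \pause frame from earlier can be written using only \onslide and the + syntax. The first specification should resolve to <1-> and the second to <2->, giving two slides as before:

\begin{frame}
  \onslide<+->
  Some content
  \onslide<+->
  Some more content
\end{frame}

Written this way, adding a list with <+-> items before, between or after the two blocks of text should not need any other change.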

beamer development

As many readers will know, I’m a member of the two-man team in charge of maintaining the beamer class (the other team member is Vedran Miletić). Vedran and I took over looking after beamer when it was unmaintained and some important bugs cropped up: lots of people rely on it, so fixes were important. There was a question on the TeX StackExchange site recently asking about the status of maintenance. It’s a tricky one to tackle in a Q&A, but it does make a good topic for the blog!

Vedran and I are committed to keeping beamer working, and that means fixing bugs as and when we can. At the same time, we are not likely to add much in the way of new features: small changes over time only. There are a few reasons for that, the single biggest one of which is stability. The beamer class is very widely used, and does a lot of stuff. Making significant changes is therefore tricky, particularly as we don’t have any automated tests. The internal beamer structure contributes a bit here: it’s a complex set up, partly due to some issues in LaTeX2e (why I work on LaTeX3), partly because it has to be and partly as an ‘overhaul’ might have been useful at some stage. (It’s far too late for the latter idea now: any big change would break too many documents.)

The second issue is of course time: both Vedran and I are busy, in my case not only with ‘real life’ but also with other (La)TeX projects! Then of course there is trying to stick to what beamer does: the original design quite deliberately doesn’t do some things, such as ‘auto-flowing’ text.

If you watch the BitBucket site for beamer development, you will see changes, both to fix bugs and (slowly) add new features. That’s not about to change: small changes, ‘little and (relatively) often’, are the order of the day here. Of course, if you have a patch you really want applying, we are always happy to take a look!

The beamer slide overlay concept

There was a question recently on the TeX StackExchange site about the details of how slide overlays work in the beamer class. The question itself was about a particular input syntax, but it prompted me to think that a slightly more complete look at overlays would be useful.

A word of warning before I start: don’t overdo overlays! Having text or graphics appear or disappear on a slide can be useful but is easy to over-use. I’m going to focus on the mechanics here, but that doesn’t mean that they should be used in every beamer frame you create.

Overlay basics

Before we get into the detail of how beamer deals with overlays, I’ll first give a bit of background to what they are. The beamer class is built around the idea of frames:

\begin{frame}
  \frametitle{A title}
  % Frame content
\end{frame}

which can produce one or more slides: pages of output that will appear on the screen. These separate slides within a frame are created using overlays, which is the way the beamer manual describes the idea of having the content of individual slides varying. Overlays are ‘contained’ within a single frame: when we start a new frame, any overlays from the previous one stop applying.

The most basic way to create overlays is to explicitly set up individual items to appear on a particular slide within the frame. That’s done using the (optional) overlay argument that beamer enables for many document components: this overlay specification is given in angle brackets. The classic example is a list, where the items can be made to appear one at a time.

\begin{frame}
  \begin{itemize}
    \item<1-> This is on the first and all following slides
    \item<2-> This is on the second and all following slides
    \item<3-> This is on the third and all following slides
    ...
  \end{itemize}
\end{frame}

As you can see, the overlay specification here is simply the first slide number we want the item to be on followed by a - to indicate ‘and following slides’. We can make things more specific by giving only a single slide number, giving an ending slide number and so on.

\begin{frame}
  \begin{itemize}
    \item<1> This is on the first only
    \item<-3> This is on the first three slides
    \item<2-4,6> This is on the second to fourth slides and the sixth slide
  \end{itemize}
\end{frame}
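
The same kinds of specification are accepted by many commands other than \item. As a small sketch, here are a few that the beamer manual lists as overlay-aware; the choice of \textbf, \alert, \only and \uncover is just for illustration.

\begin{frame}
  \textbf<2>{Bold on the second slide only}

  \alert<2-3>{Alerted on the second and third slides}

  \only<1>{Typeset on the first slide only, taking up no space elsewhere}

  \uncover<3->{Shown from the third slide on, but always taking up space}
\end{frame}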

The syntax is quite powerful, but there are at least a couple of issues. First, the slide numbers are hard-coded. That means that if I want to add something else in before the first item I’ve got to renumber everything. Secondly, I’m having to repeat myself. Luckily, beamer offers a way to address both of these concerns.

Auto-incrementing the overlay

The first tool beamer offers is the special symbol + in overlay specifications. This is used as a place holder for the ‘current overlay’, and is automatically incremented by the class. To see it in action, I’ll rewrite the first overlay example without any fixed numbers.

\begin{frame}
  \begin{itemize}
    \item<+-> This is on the first and all following slides
    \item<+-> This is on the second and all following slides
    \item<+-> This is on the third and all following slides
    ...
  \end{itemize}
\end{frame}

What’s happening here? Each time beamer finds an overlay specification, it automatically replaces all of the + symbols with the current overlay number. It then advances the overlay number by 1. So in the above example, the first + is replaced by a 1, the second by a 2 and the third by a 3. So we get the same behaviour as in the hard-coded case, but this time if I add another item at the start of the list I don’t have to renumber everything.

There are of course a few things to notice. The first overlay in a frame is number 1, and that’s what beamer sets the counter to at the start of each frame. To get the second item in the list to appear on slide 2, we still require an overlay specification for the first item: although I used one, I could have skipped the <1-> in the hard-coded example and nothing would have changed. The second point is that every + in an overlay specification gets replaced by the same value. We’ll see later there are places you might accidentally add a + to mean ‘advance by 1’: don’t do that!
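
The fact that every + in one specification gets the same value can be put to good use. The following sketch is modelled on an example in the beamer manual and uses the alert@ action syntax, which is not covered in this post: each item appears from ‘its’ slide onward and is highlighted only on that slide.

\begin{frame}
  \begin{itemize}
    \item<+-| alert@+> Shown from slide 1, alerted on slide 1 only
    \item<+-| alert@+> Shown from slide 2, alerted on slide 2 only
    \item<+-| alert@+> Shown from slide 3, alerted on slide 3 only
  \end{itemize}
\end{frame}

Both + symbols in each specification are replaced by the same number, and the overlay number advances only once per specification.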

Reducing redundancy

Using the + approach has made our overlays flexible, but I still have to be repetitive. Handily, beamer helps out there too by adding an optional argument to the list which inserts an overlay specification for each line:

\begin{frame}
  \begin{itemize}[<+->]
    \item This is on the first and all following slides
    \item This is on the second and all following slides
    \item This is on the third and all following slides
    ...
  \end{itemize}
\end{frame}

Notice that this needs to be inside the ‘normal’ [ ... ] set up for an optional argument. Applying an overlay to every item might not be exactly what you want: you can still override individual lines in the standard way.

\begin{frame}
  \begin{itemize}[<+->]
    \item This is on the first and all following slides
    \item This is on the second and all following slides
    \item This is on the third and all following slides
    \item<1-> This is on the first and all following slides
    ...
  \end{itemize}
\end{frame}

Remember not to overdo this effect: just because it’s easy to reveal every list line by line doesn’t mean you should!

Repeating the overlay number

The + syntax is powerful, but as it always increments the overlay number it doesn’t allow us to remove the hard-coded numbers from a case such as

\begin{frame}
  \begin{itemize}
    \item<1-> This is on the first and all following slides
    \item<1-> This is also on the first and all following slides
    \item<2-> This is on the second and all following slides
    \item<2-> This is also on the second and all following slides
    ...
  \end{itemize}
\end{frame}

For this case, beamer offers another special symbol, the dot (.):

\begin{frame}
  \begin{itemize}
    \item<+-> This is on the first and all following slides
    \item<.-> This is also on the first and all following slides
    \item<+-> This is on the second and all following slides
    \item<.-> This is also on the second and all following slides
    ...
  \end{itemize}
\end{frame}

What happens here is that . can be read as ‘repeat the overlay number of the last +’. So the two + overlay specifications create two slides, while the two lines using . in the specification ‘pick up’ the overlay number of the preceding +. (The beamer manual describes the way this is actually done, but I suspect that’s less clear than thinking of this as a repetition!)
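
For those who like to see the arithmetic, here is the same list with annotations. This is only a sketch of my reading of the manual, in which + uses the current value of the beamerpauses counter and then steps it, while . uses that value minus one and leaves the counter alone:

\begin{frame}
  \begin{itemize}
    \item<+-> First   % + gives 1; counter goes from 1 to 2
    \item<.-> Second  % . gives 2 - 1 = 1; counter stays at 2
    \item<+-> Third   % + gives 2; counter goes from 2 to 3
    \item<.-> Fourth  % . gives 3 - 1 = 2; counter stays at 3
  \end{itemize}
\end{frame}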

Depending on the exact use case, you might want to combine this with the ‘reducing repeated code’ optional argument, with <.-> as an override.

\begin{frame}
  \begin{itemize}[<+->]
    \item This is on the first and all following slides
    \item<.-> This is also on the first and all following slides
    \item This is on the second and all following slides
    \item<.-> This is also on the second and all following slides
    ...
  \end{itemize}
\end{frame}

Offsets

A combination of + and . can be used to convert many ‘hard-coded’ overlay set ups into ‘relative’ ones, where the slide numbers are generated by beamer without you having to work them out in advance. However, there are cases it does not cover. To allow even more flexibility, beamer has the concept of an ‘offset’: an adjustment to the number that is automatically inserted. Offset values are given in parentheses after the + or . symbol they apply to, for example

\begin{frame}
  \begin{itemize}
    \item<+(1)-> This is on the second and all following slides
    \item<+(1)-> This is on the third and all following slides
    \item<+-> This is also on the third and all following slides
  \end{itemize}
\end{frame}

Notice that this adjustment only applies to the substitution, so both the second and third lines above end up as <3-> after the automatic replacement. If you try the demo, you’ll also notice that none of the items appear on the first slide!

Perhaps a more realistic example for where an offset is useful is the case of revealing items ‘out of order’, where the full list makes sense in some other way. With hard-coded numbers this might read

\begin{frame}
  \begin{itemize}
    \item<1-> This is on the first and all following slides
    \item<2-> This is on the second and all following slides
    \item<1-> This is on the first and all following slides
    \item<2-> This is on the second and all following slides
    ...
  \end{itemize}
\end{frame}

which can be made ‘flexible’ with a set up such as

\begin{frame}
  \begin{itemize}
    \item<+-> This is on the first and all following slides
    \item<+-> This is on the second and all following slides
    \item<.(-1)-> This is on the first and all following slides
    \item<.-> This is on the second and all following slides
    ...
  \end{itemize}
\end{frame}

or the equivalent

\begin{frame}
  \begin{itemize}
    \item<+-> This is on the first and all following slides
    \item<.(1)-> This is on the second and all following slides
    \item<.-> This is on the first and all following slides
    \item<+-> This is on the second and all following slides
    ...
  \end{itemize}
\end{frame}

As shown, we can use both positive and negative offsets, and these work equally well for + and . auto-generated values. You have to be slightly careful with negative offsets, as while beamer will add additional slides for positive offsets, if you offset below a final value of 0 then errors will crop up. With this rather advanced set up, which version is easiest for you to follow will be down to personal preference.

Notice that positive offsets do not include a + sign: remember what I said earlier about all + symbols being replaced. If you try something like <+(+1)>, your presentation will compile but you’ll have a lot of slides!

Summary

The beamer overlay specification can help you set up complex and flexible overlays to generate slides with dynamic content. By using the tools carefully, you can make your input easier to read and maintain.

Beamer and \subsubsection

I’m hoping to address a few bugs in beamer over the next few days. One category that is always tricky is things linked to using \subsubsection. If you’ve read the beamer manual carefully, you’ll know that the original author of the class really didn’t want people to use \subsubsection in talks. However, he also didn’t ban it entirely, leaving me with a tricky situation. The problem is that while \subsubsection works, many of the things you might expect to happen from the relationship between \section and \subsection fail with \subsubsection, and from the code that may well be ‘by design’. Of course, I can change the ‘rules’, but beamer has been around a long time and it’s also somewhat complex code. As such, I’m always having to make judgements on how to deal with these bugs. My advice: don’t use \subsubsection in beamer documents! Certainly don’t be surprised if odd things happen when you ignore that advice.

Extending biblatex to support multiple scripts

As regular readers will know, I’ve taken an interest in biblatex since it was first developed. Since the original author disappeared, I’ve been at least formally involved in maintaining the code. So far, that’s been limited to tackling a few tricky low-level TeX issues, but there are some bigger issues to think about.

Philip Kime, lead Biber and biblatex developer, is keen to extend the LaTeX end to support multiple scripts. The Biber end is already done (in the ‘burning edge’ version), and writes to the .bbl file in the format:

 \field{form=original,lang=default}{labeltitle}{Title}
 \list{form=original,lang=default}{location}{1}{%
   {Москва}%
 }
 \list{form=romanised,lang=default}{location}{1}{%
   {Moskva}%
 }

However, that presents a big issue: how to do that without breaking every existing style. Supporting scripts means we need an additional argument for a very large number of commands: some of them need to have two optional arguments, and some of them need to be expandable:

\iffieldundef[form=original,lang=default]{....}

Reading (two) optional arguments and working through keyval options expandably is tricky, which is where I come in. The natural way for me to solve the first problem is to use LaTeX3, and the xparse package. However, that’s a big change for biblatex, so before I (and the rest of the biblatex team) go for this I thought it would be worth raising the issue and looking for opinions. The alternative is to write the code into biblatex directly, but it’s complicated and as I’ve already done the job once I’m reluctant to do this!

So, what I want to know is ‘What do users think?’ Is it reasonable to require xparse as part of `biblate