siunitx performance

I had an e-mail today about using siunitx when there are a lot of calls to the package. As you might expect, things can get a bit slow, and the person who contacted me felt that things get rather too slow. There are differences between the current release version of siunitx and the development code (version 2), and I’ve also added a few features to version 2 to help speed things up where appropriate. So I thought I’d put a bit of information on the comparison in the public domain.

First, a baseline is not to use siunitx at all, and to simply test everything by hand. For that, I tried the simple test file:

\documentclass{article}
\usepackage{siunitx,xparse}
\ExplSyntaxOn
\DeclareDocumentCommand \repeated { m m }{
  \prg_replicate:nn {#1} {#2}
}
\ExplSyntaxOff
\begin{document}

\repeated{10000}{$1.23\,\text{m}$ }

\end{document}

This repeats the same text 10 000 times: boring but handy for testing. Using the command-line time program, I get an overall time of 1.714 s for this.

A very slight change of the file lets me test with siunitx version 1:

\documentclass{article}
\usepackage{siunitx,xparse}
\ExplSyntaxOn
\DeclareDocumentCommand \repeated { m m }{
  \prg_replicate:nn {#1} {#2}
}
\ExplSyntaxOff
\begin{document}

\repeated{10000}{\SI{1.23}{\metre} }

\end{document}

With the latest release version of siunitx (v1.3g), I get a time of 80.878 s for this on the same system.

In siunitx version 2, I’ve recoded all of the loops and parsing code, and so things are faster using the standard settings: 58.944 s. With the very latest code (SVN 243), I’ve added two options, parse-numbers and parse-units, which can be turned off to make things move faster. Of course, turning the parsers off does mean that you get less of the power of siunitx, but for many people this might be useful. Turning both parsing systems off, the time needed for the test file drops to 14.975 s (turning off just the number parser gives a time of 18.803 s).
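As a sketch, turning off the two parsers with the new version 2 options would look something like the following (assuming the keys take the usual boolean values):

\sisetup{
  parse-numbers = false,
  parse-units   = false
}

With both set to false, \num and \SI pass their arguments through with much less processing, which is where the time saving comes from.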

I may take another look at trying to improve the performance of the number parser. The problem is at least in part that making the code faster will either mean making some of it less powerful or, more likely, a lot harder to read and maintain. I hope that for most people, most of the time, the performance is acceptable. Of course, at some point I’ll try to do some Lua-based code for the parsers, at least. But that won’t help for most users now.

Ultimately, there is a limit to how fast things can work. Whether the performance hit of using siunitx is worthwhile is something that is down to users. I think it’s worth it, as the better logic in the mark-up more than makes up for the extra time required. But then I would say that!

siunitx: Getting the micro symbol right

I get a few e-mails about siunitx and the micro symbol. People tend to be surprised that the symbol ‘sticks’ to a look very much like Computer Modern, whatever fonts they load. The reason is that picking a proper upright (not italic) μ is not so easy in TeX. You don’t get one in Computer Modern, so siunitx takes one from the TS1 (text support) set in the absence of a better plan. I’ve set up some auto-detection for a few obvious alternatives (such as the upgreek package), but that doesn’t really work for XeTeX users.

XeTeX users are likely to load system fonts, and I’d hope be using UTF-8 input. That makes it hard to auto-detect what they are doing, but should make life easier for them to get things right. A lot of more comprehensive fonts include Greek letters in the main font, so getting the μ right is simple:

\sisetup{
  mathsmu = \text{μ},
  textmu  = μ
}

or for people testing version 2 of siunitx:

\sisetup{
  maths-micro = \text{μ},
  text-micro  = μ
}

There may be a bit of testing required: this will not work if, for example, you are using the Latin Modern font.
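To put that in context, a minimal XeLaTeX set-up might look something like the following. The font name here (Linux Libertine O) is just an illustration: any system font that actually contains the Greek letters will do.

% Compile with XeLaTeX
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Linux Libertine O} % assumption: a font with Greek letters
\usepackage{siunitx}
\sisetup{
  mathsmu = \text{μ}, % version 1 option names, as above
  textmu  = μ
}
\begin{document}
\SI{10}{\micro\metre}
\end{document}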

TeXworks v0.3 snapshot

Jonathan Kew has posted new ‘snapshots’ for the experimental (v0.3) trunk of TeXworks. As usual, these are for Windows and the Mac, with Linux users left to compile for themselves from the SVN (not usually difficult). Looking through the change list, it looks like mainly small refinements rather than any big changes. Everything looks like it’s working, which is the main thing!

MathTime Lite fonts for free

A while ago, I had some interaction with the people at PCTeX. They let me have a copy of MathTime Lite for testing, which was very good of them. I now see that they’ve extended this to the wider world: MathTime Lite is now available for free. I’m sure there is good commercial sense to this decision (you still have to pay for the Pro version), but for people who want maths fonts beyond Computer Modern it’s certainly worth knowing.

siunitx version 2: snapshot four

I’m continuing to work on version 2 of siunitx, and the code has now reached the point where the basic macros (\num, \SI, \si, \numrange and \SIrange) work at least as well as the version one code. There are probably still some bugs, but I’m using the new code for my own work and at the moment all seems good. The internal improvements mean that, while there are still things to add, this should not be too hard.
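For anyone who has not tried the range commands, usage is along the lines of:

\numrange{1}{10}        % typically gives ‘1 to 10’
\SIrange{1}{10}{\metre} % typically gives ‘1 m to 10 m’

The exact output depends on the settings in force (for example, the phrase used between the two values is configurable).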

If you want to try things out, as before you can grab things here:

What is on the list next is tackling the tables issues. That is going to be hard work, as there are some complicated things to sort out. So I will probably add a few bits and pieces to the rest of the code at the same time.

As always, feedback by e-mail or to the BerliOS site is very welcome.

Active characters again

A while ago I wrote about avoiding active characters. There was a question on the LaTeX3 mailing list recently where this came up again, so I thought I’d revisit the topic here.

ε-TeX provides the primitive \scantokens, which can be used to re-assign the category codes of (most) input. This can be used to make some tokens in the input active, and then swap them for something else. For example:

\begingroup
  \catcode`\:=13\relax
  \gdef\example#1{%
    \begingroup
      \catcode`\:=13\relax
      \everyeof{\noexpand}% avoid an end-of-file error inside \xdef
      \def:{[colon]}%
      \xdef\temp{\scantokens{#1}}%
    \endgroup
    \temp
  }
\endgroup

This will replace every “:” in #1 with “[colon]”. As this is done by the engine, it is pretty fast. With the characters only made active locally, it also looks safe. However, I’ve found that this does not necessarily follow. For example, in siunitx (version 1), there is a problem using htlatex under some circumstances because both want to make ^ active in this way. The other problem is that making characters active in this way makes it impossible to “protect” them from replacement.
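With the definition above in place, usage would be along the lines of:

\example{a:b:c} % gives ‘a[colon]b[colon]c’

Note that the colons in the argument are rescanned inside \example, so they pick up the active catcode there even though they are ordinary characters at the point of use.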

The alternative is to look through the input for each “:” and replace it one at a time: this is done in LaTeX3 using \tl_replace_all_in:Nnn. At first sight, this does not look desirable as it is never going to be as fast as using TeX primitives. However, if the code is well written (and \tl_replace_all_in:Nnn certainly is), then there is no need to loop over every token to do the replacement. Whatever code is used for the replacement, the key advantage is that there is no chance of a clash with different packages doing the same thing. It also leaves open the possibility of protecting some tokens from being changed. So I’d always favour avoiding active characters, if at all possible.
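As a sketch of this second approach (the names here are illustrative, and \tl_replace_all_in:Nnn was the name at the time this was written; in current expl3 the same function is \tl_replace_all:Nnn; I use ‘x’ rather than ‘:’ to side-step the fact that the colon is a letter inside \ExplSyntaxOn):

\ExplSyntaxOn
\tl_new:N \l_my_demo_tl
\tl_set:Nn \l_my_demo_tl { a x b x c }
\tl_replace_all_in:Nnn \l_my_demo_tl { x } { [x] }
\tl_use:N \l_my_demo_tl % gives ‘a [x] b [x] c’
\ExplSyntaxOff

No catcodes are changed here at all: the replacement works purely on the tokens already stored in the token list, which is exactly why it cannot clash with another package.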