Reworking and exposing siunitx internals

I’ve been talking for a while about working on a new major version of siunitx. I’ve got plans to add some new features which are difficult or impossible to deliver using the v2 set up, but here I want to look at perhaps what’s more important: the back end, programming set up and related matters.

I’ve now made a start on the new code, working first on what I always think of as the core of siunitx: the unit processor. If you take a look at the new material and compare it with the existing release, the first thing that should be obvious is that I’ve finally made a start on splitting everything up into different sub-parts. There are at least a couple of reasons for this. First, the monolithic .dtx for v2 is simply too big to work with comfortably. More importantly, though, the package contains a lot of different ideas and some of them are quite useful beyond my own work. To ensure that these are available to other people, it would seem best to make the boundaries clear, and separate sources help with that.

That leads on to the bigger-picture change that I’m aiming for. As regular readers will know, I wrote the first version of siunitx somewhat by accident and in an ad hoc fashion. Working on v2, I decided to make things more organised and also to use expl3, which I’d not really looked at before. So the process of writing the second version was something of a learning experience. At the same time, expl3 itself has firmed up a lot over the time I’ve been working with it. As such, the current release of siunitx has rather a lot of rough edges. In the new code, I’m working from a much firmer foundation in terms of conventions, coding ideas and testing implementations. So for v3 I’m aiming to do several things. A key one for prospective expl3 programmers is the idea of defined interfaces. Rather than making everything internal, this time I’m documenting code-level access to the system. That means doing some work to have clearly defined paths for information to pass between sub-modules, but that’s overall a good thing. I’m also using the LaTeX3 team’s new testing suite, l3build, to start setting up proper code tests: these are already proving handy.

The net result of the work should be a better package for end users but also extremely solid code that can be used by other people. I’m also hopeful that the ideas will be usable with little change in a ‘pure’ LaTeX3 context. Documenting how things work might even have a knock-on effect in emulating siunitx in say MathJax. Beyond that, I’ve viewed siunitx as something of a sales pitch for expl3, and providing a really top-class piece of code is an important part of that. If I can get the code level documentation and interfaces up to the standard of the user level ones, and improve the user experience at the same time, I think I’ll be doing my job there.

Work on siunitx v3

I recently posted a few ‘notes to myself’ about future directions in siunitx development. With them down in print, I’ve been giving them some serious thought and have made a proper start on work on version 3 of the package. I’m starting where I’m happiest: the unit parser and related code, and am working on proper separation of different parts of the code. That’s not easy work, but I think it should give me a good platform to build on. I’m also working hard to make the new code show ‘best practice’ in LaTeX3 coding: the plan is to have much richer documentation and some test material to go with the new code. Looking forward, that should make creating a ‘pure’ LaTeX3 units module pretty easy: it will be a minor set of edits from what I’m working on now.

I’ve got a good idea of the amount of work I need to do: there are about 17k lines in the current siunitx.dtx, which comes out to around 7.5k lines of code. That sounds like a lot, but as much of what I need to do is more editing than writing from scratch, I’m hoping for an alpha build of version 3 some time this summer.

siunitx development: Notes for my future self

Working on siunitx over the years, I’ve learnt a lot about units and how people want to typeset them. I’ve also realised that I’ve made a few questionable choices in how I’ve tackled various problems. With one eye to future LaTeX3 work, and another to what I might still improve in siunitx, I thought it would be interesting to make a few notes about what I’ve picked up.

  1. Sticking to math mode only would be a good idea. The flexibility that the package offers in terms of using either math or text mode is very popular, but it makes my life very tricky and actually makes some features impossible (TeX only allows us to reliably escape from math mode by using a box). It’s also got some performance implications, and the more I’ve thought about it, the more I realise that it was probably not the best plan to allow both choices.

  2. A different approach to the \boldmath issue would be sensible. Currently, one of the reasons I use a switch from text to math mode internally is that it allows ‘escaping’ from \boldmath. However, that’s probably not the best plan, as it again introduces some restrictions and performance hits, and I think is very unlikely to actually be helpful!

  3. The default number parser should be simple and quick: complex parsing should be an option. As it stands, the parser in siunitx is quite tricky as it does lots of things. A better approach would be to only deal with digit separation ‘out of the box’ (so not really parsing at all), and to allow things like uncertainties, complex numbers and the like as add-ons.

  4. Tables really need a separate subpackage. Dealing with tables of numbers was never really my aim, and I think a much clearer tack would be to have some of the internals of the number parser accessible ‘publicly’, then build the table functionality up separately.

  5. The unit parser works very well: don’t change it! Although people ask me mainly about numbers and tables, the real business end of siunitx is the unit parser/formatter. It’s basically spot-on, with only really minor tune-ups needed.
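For reference, the math/text flexibility discussed in item 1 is what the v2 mode option controls. The following is only a small sketch of the user-level switch, assuming the v2 interface; the values are arbitrary and it says nothing about how the internals handle the change of mode.

```latex
\documentclass{article}
\usepackage{siunitx}
\begin{document}
% Default behaviour: numbers and units are typeset using math mode
\SI{10}{\metre\per\second}

% Per-use switch to text mode: the flexibility that item 1 suggests dropping
\SI[mode = text]{10}{\metre\per\second}
\end{document}
```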

Probably most of this has to wait for a ‘real’ LaTeX3 numbers/units bundle: I can’t break existing documents. However, I’ve got a few ideas which can be implemented when I get the time: watch this space.

siunitx: v2.5 and beyond

Anyone who watches the BitBucket site for siunitx development will have noticed that I’ve been adding a few new features. As I’ve done for every release in the 2.x series, new options mean a new minor revision, and so these will all be in v2.5. I’ve also revised some of the behaviour concerning math mode, so there are now very few options which automatically assume math mode.

Looking beyond v2.5, I have some bigger changes I’d like to make to siunitx. When I initially developed the package, it was very much a mixture of things that seemed like a good idea. The work for version 2 meant a lot of changes, and a lot more order. However, I’ve learnt more about units, LaTeX and programming since then, and that means that there are more changes to think about.

The internal structure is quite good, but I need to work on some parts of the code again. For users, of course, that won’t show up, but it is important to me. It’s also not so straight-forward: the .dtx is about 17 000 lines long! However, there are also some issues at the user level. In particular, I think I’ve offered too many options in some areas, for example font selection. Revising those will alter behaviour, but it will also improve performance and the clarity of some edge cases. However, that is not such easy work and will take a while. I’ve got lots of other TeX commitments (plus of course a life beyond LaTeX), so these changes will wait a while yet. So once v2.5 is finalised I’d expect to have little change in siunitx for some time: probably until at least the autumn, and quite possibly the end of the year.

siunitx v2.4 beta

Development of the next release of siunitx has gone quite smoothly: I’ve added a few new features, and there is now nothing outstanding for v2.4. So it is time to ask for some volunteers to test the code.

In terms of new features, I have added a choice of rounding modes and the ability to compress exponents in ranges and lists, both long-standing feature requests. In response to a recent TeX.sx question, siunitx can now also turn exponents into unit prefixes. At a lower level, I’ve also altered some of the options internally so that fewer of them assume math mode.
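As a rough illustration of the exponent-to-prefix conversion, here is a minimal sketch. It assumes the option is called exponent-to-prefix, as in the released package documentation; the value and unit are arbitrary, and the exact output may depend on the other number-formatting settings in force.

```latex
\documentclass{article}
\usepackage{siunitx}
\begin{document}
% Without the option, the exponent is printed as a power of ten
\SI{1.5e3}{\metre}

% With the option, the exponent is absorbed into a unit prefix where possible
\SI[exponent-to-prefix = true]{1.5e3}{\metre}
\end{document}
```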

To test, please download the ready-to-install TDS-style .zip file and install it locally. You should then be good to go. Feedback as a bug report or by e-mail is welcome, as always. Assuming there are no problems, I’d expect to upload to CTAN by the end of the month.

Which siunitx options to set globally?

On the TeX.SX site recently, there was some discussion about locally over-riding the round-mode = places setting in my siunitx package. One thing this highlights for me is the need to think about which settings to apply globally.

Some siunitx settings are about consistency of appearance, and seem to apply naturally to entire documents. A classic example would be output-decimal-marker: if you are using , as a decimal marker, it should apply everywhere!
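A minimal sketch of that kind of document-wide setting, using the v2 \sisetup interface (the numbers are just placeholders):

```latex
\documentclass{article}
\usepackage{siunitx}
% Appearance setting made once in the preamble, so it applies to the whole document
\sisetup{output-decimal-marker = {,}}
\begin{document}
% Both are printed with a comma as the decimal marker
\num{1.23} and \SI{4.56}{\metre}
\end{document}
```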

However, this is not so clear-cut for many of the options related to number-manipulation. The rounding options in particular are really intended for the case where you have some auto-generated data (say a long list from an instrument), and the real accuracy is not as great as the apparent precision. Instruments are great at providing lots of numbers, but it takes a bit of human thought to decide how many of these are really relevant. So for these cases, setting an appropriate rounding scheme is perfectly sensible.

On the other hand, for a number you’ve typed in yourself I’d hope that you’ve done the thinking part when the number is typed, so rounding by the computer is not needed. That suggests to me that most of the time rounding should not be set as a global option.

Of course, it will depend on the exact nature of the document in question. If all of the data in a document is in tables, all of which need rounding, then there is a performance gain from setting the rounding once globally. So the best I can say, guidance-wise, is ‘think about your document’!
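To make the distinction concrete, here is a small sketch of keeping rounding local rather than global. The values and the column heading are made up for illustration; round-mode and round-precision are the standard v2 options.

```latex
\documentclass{article}
\usepackage{siunitx}
\begin{document}
% Hand-typed value: no rounding requested, so it is printed as entered
\num{1.2345}

% Instrument output: round only this one number via the optional argument
\num[round-mode = places, round-precision = 2]{1.23456789}

% A table of raw data: keep the rounding inside a group so it does not leak out
\begingroup
  \sisetup{round-mode = places, round-precision = 2}
  \begin{tabular}{S[table-format = 1.2]}
    {Reading} \\
    1.23456   \\
    2.34567   \\
  \end{tabular}
\endgroup
\end{document}
```

If every number in the document really does need the same treatment, the same \sisetup line can simply move to the preamble.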

A roadmap for siunitx

My siunitx package continues to attract feature requests, even though for me and many people it is essentially ‘feature complete’. Many of these requests are as a result quite complex, and probably somewhat esoteric, but I do aim to be as accommodating as I can. At the same time, I want to avoid breaking things for the majority. So my approach is to take a few issues at a time and to work on them for each 2.x release, plus of course the regular bug fixes. At the same time, I’m keen to review my own work and to tighten up on some parts of the code. That’s particularly important in adding new features, as there are places where it turns out a bit more structure is needed.

For version 2.3, I’m focussing on the mechanics of tabular alignment. This entire part (around 1500 lines of code) is going to be rewritten, hopefully making things more reliable, improving performance and also easing maintenance. For version 2.4, I’ll probably look again at how tabular material is collected up by siunitx, and also at the parser for numbers: I’ve had some ideas to improve performance. Beyond that, I have a few general thoughts, for example the idea of ‘multiple error’ parsing: that looks tricky!

Another area to work on is the option interface. Some of the options are not perfect, and v2.3 will see some revisions. I have further revisions already in mind, but don’t want to do too much at once so again have some ideas for v2.4. At the same time, I think there are some performance enhancements available by recoding parts of the option system. I’ll be doing that as part of the general review, and so again we should see evolution in that area. I’ll return to this general issue in another post.

That is quite a list, and so I’ll certainly be kept busy. I have other things to do as well, both in the TeX world and outside, so at the moment my thinking is v2.3 in July, with v2.4 in the autumn (perhaps October or November) and v2.5 vaguely pencilled in for ‘first half of 2012’. Hopefully that will keep everyone happy.

siunitx v2.3: consolidation

I’m making a start on the next release of siunitx: v2.3. There are a number of issues in the database targeting this release, and these are mainly about dealing with things behind the scenes. Some options need revision, and I need to improve the table code somewhat. However, I doubt that there will be much to excite users. That’s not necessarily a bad thing: there seem to be a lot of siunitx users, and I don’t want to break the code! Of course, if there is a particular issue that needs addressing then the usual rule applies: make a case to me and I’ll see what I can do.

siunitx v2.2 released

As I detailed a little while ago, I’ve been working on v2.2 of siunitx, and I’ve now released it to CTAN. There are a number of small changes and new features, and I thought I would highlight a few.

A long-standing feature request has been to be able to use the cancel package to show how units cancel out. This is useful for teaching, although it’s not of course part of the usual typesetting of units for publication. It turns out not to be too hard to allow this, so that you can now use input such as

\si[per-mode = fraction]{\cancel\kg\m\per\s\cancel\kg}

and have it come out properly. At the same time, I’ve made it possible to highlight particular units

\si{\highlight{green}\square\metre\candela\second}

again for teaching-related purposes.

A second long-standing request is to be able to parse uncertainties given in the form

\num{1.23 +- 0.15}

which was rather more of a challenge, but again is now working properly. So you can get the same output from the above and from

\num{1.23(15)}.

A final highlight is the new \tablenum macro. This is needed for aligning numbers inside \multicolumn and \multirow, where alignment otherwise does not work. (At a technical level, both \multicolumn and \multirow use the \omit primitive, and so the code inserted by the S column is not used. The \tablenum macro effectively makes the same approach available as a stand-alone function.)
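A short sketch of how that might look in practice; the table contents and the table-format values are invented for illustration.

```latex
\documentclass{article}
\usepackage{siunitx}
\begin{document}
\begin{tabular}{S[table-format = 2.2] S[table-format = 2.2]}
  {A}   & {B}   \\
  1.23  & 45.6  \\
  % \multicolumn skips the code inserted by the S column,
  % so the number is formatted explicitly with \tablenum
  \multicolumn{2}{c}{\tablenum[table-format = 2.2]{12.34}} \\
\end{tabular}
\end{document}
```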