LaTeX2e kernel development moves to GitHub

The LaTeX team have two big jobs to do: maintaining LaTeX2e and working on LaTeX3 (currently as new packages on top of LaTeX2e). For quite a while now the LaTeX3 code has been available on GitHub as a mirror of the master repository. At the same time, the core LaTeX2e code was also available publicly using Subversion (SVN) via the team website. At least in the web view, the latter has always been a bit ‘Spartan’, both in appearance and in features (only the most recent revision could be seen).

Coupled to viewing the code for any project is tracking the issues. For LaTeX2e, the team have used GNATS for over twenty years. GNATS has served the team well but, like the web view of Subversion, is showing its age.

We’ve now decided that the time is right to make a change. Eagle-eyed users will already have spotted the new LaTeX2e GitHub page, which is now the master repo for the LaTeX kernel. We’ve not yet frozen the existing GNATS database, but new bugs should be reported on GitHub. (For technical reasons, the existing GNATS bugs list is unlikely to be migrated to GitHub.)

Frank Mittelbach (LaTeX team lead developer) has written a short article on the new approach, which will be appearing in TUGboat soon. As Frank says, we hope that most users don’t run into bugs in the kernel (it is pretty stable and the code has been pushed pretty hard over the years), but this new approach will make reporting that bit easier and clearer.

Accompanying the move of LaTeX2e to GitHub, the LaTeX3 Subversion repository has also been retired: the master location for this is also now on GitHub. So everything is in a sense ‘sorted’: all in one place.

Of course, the team maintain only a very small amount of the LaTeX ‘ecosystem’: there are over 5000 packages on CTAN. To help users know whether a bug should be reported to the team or not, we have created the latexbug package. An example using it:

\RequirePackage{latexbug}
\documentclass{article}
\begin{document}
Problems here
\end{document}

will give a warning if there is any code that isn’t covered by the team (and so should be reported elsewhere). We hope this helps bugs get to the right places as easily as possible.

I handled most of the conversion from Subversion to Git, and I’d like to acknowledge SubGit from TMate Software for making the process (largely) painless. As LaTeX is an open source project, we were able to use this tool for free. We used SubGit for the ‘live’ mirroring of LaTeX3 to GitHub for several years, and it worked flawlessly. The same was true for the trickier task of moving LaTeX2e: the repo history had a few wrinkles that were slightly more difficult to map to Git, but we got there.

Moving from Mercurial to Git

Over the years of working with LaTeX, I’ve picked up a bit about version control systems for code: this post is more about general programming than TeX.

I started out with Subversion, then moved to Mercurial when I got involved in beamer maintenance. The idea is the same whatever system you are using: by keeping track of changes in the code you help yourself in the long term, and make it easier for other people to help too. Mercurial is one of several ‘distributed’ version control systems (DVCS) that have been developed over the last few years. The idea is that each ‘repository’ (copy of the code) has the history with it, and so is independent of any server. You can still send your changes to a server, and that is very popular, but you don’t have to. Sending code to a public server makes it easy to let other people get involved, report issues and so on, and there are lots of sites that will help you do this.

I picked Mercurial over the other leader, Git, mainly because the other guy involved in looking after beamer went this way and put the code on BitBucket. At the time, BitBucket did Mercurial while GitHub did Git. BitBucket changed hands a little while ago now, and they brought in Git support. They’ve now moved to make Git the ‘standard’ repository type. That tells me that Git is likely to ‘win’ as the most popular DVCS (it’s looked that way for a while), and so it’s time to reconsider my use of Mercurial.

It turns out that moving from Mercurial to Git is pretty easy: there is a script called fast-export that does the job. Converting the code itself is therefore easy: run the script (you need a Unix system, so on Windows I’m using a virtual Ubuntu machine with VirtualBox). Life gets a bit more interesting, though, if you want to keep your issues database. BitBucket does offer issue import and export, but no easy way to convert from Mercurial to Git. At the same time, the way that the two systems refer to individual commits means that if you don’t edit your issues, any links to the commits will be messed up. That means that it’s as easy to move to GitHub as it is to stay on BitBucket. So that’s what I’ve decided to do (GitHub is pretty popular with other LaTeX developers). I’m working through my repositories, converting to Git and uploading to GitHub, then copying the issue information by hand and doing minor edits. That includes making sure that I keep the links which show how I fixed things. Apart from siunitx, my packages don’t have a lot of issues (no more than a dozen each), so I can do that by hand without too much work. I’m a bit surprised no-one has written a script to do this, but at least it will not take too long. I’d expect everything except siunitx to be moved by the weekend, and even this ‘big job’ to be done within a couple of weeks.
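To illustrate the commit-link problem, here is a minimal sketch in Python of the kind of edit involved. It is not a tool I actually used: the function name, the regular expression and the mapping between Mercurial and Git hashes are all assumptions for illustration, and the mapping would have to be built from whatever record the conversion leaves behind.

```python
import re

# Hypothetical lookup table: full Mercurial changeset hash -> full Git commit hash.
# The hashes below are made up purely for illustration.
HG_TO_GIT = {
    "0123456789abcdef0123456789abcdef01234567":
        "fedcba9876543210fedcba9876543210fedcba98",
}

# Issue text usually quotes abbreviated hashes, so accept 7 to 40 hex digits.
HEX_RUN = re.compile(r"\b[0-9a-f]{7,40}\b")


def remap_commit_refs(text, hg_to_git):
    """Swap Mercurial hashes found in issue text for their Git equivalents.

    A reference is rewritten only if it is a prefix of exactly one known
    Mercurial hash; anything unknown or ambiguous is left untouched.
    """
    def swap(match):
        token = match.group(0)
        hits = [git for hg, git in hg_to_git.items() if hg.startswith(token)]
        if len(hits) != 1:
            return token
        # Keep the abbreviation length the issue text originally used.
        return hits[0][:len(token)]

    return HEX_RUN.sub(swap, text)


issue_text = "Fixed in 0123456789ab, see also deadbeef1234."
print(remap_commit_refs(issue_text, HG_TO_GIT))
# prints: Fixed in fedcba987654, see also deadbeef1234.
```

Leaving unknown or ambiguous references alone is deliberate: a stale link is easier to spot and fix by hand than one silently pointed at the wrong commit.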

Moving code to BitBucket

As the number of packages I’ve written has grown, keeping track of everything has got more complex for me. Not having a background in programming, I’ve very much had to learn things ‘on the job’. One thing I’ve been doing for a while now is using a version control system for the new version of siunitx and working as part of the LaTeX3 Project. The LaTeX3 Project uses the Subversion system (also known as ‘SVN’), and so I’ve been using the same system for siunitx version 2 (hosted by BerliOS). I’ve now decided to get a bit more systematic, using the service provided by BitBucket.


There are a few different things that I wanted to get sorted out with all of my packages. First, I think it is useful to make the code (and code changes) publicly available in one place. That is what version control systems provide: you get a list of changes, with hopefully some notes on what was going on. That also makes it possible for other people to easily suggest patches (if they want to, of course!). Second, tracking bugs and feature requests really requires some kind of structure. I currently have a long list of e-mails that list things to think about: making these both publicly available and organised is a good idea. That helps me, and also lets users see what has already been logged. Third, it is useful if there is a way of having on-line documentation, for example using a wiki.

Moving to BitBucket

I had a look at various approaches to doing all of that. I’ve had a few issues with BerliOS, and as I’ve used it I’ve realised that the interface is rather awkward. Two services which look rather more helpful are BitBucket and GitHub. Both of these are based around distributed version control systems: Mercurial and Git, respectively. I’m not going to go into the details of either of these (or the differences between them), but Mercurial seemed a bit easier to use to me so I decided to go that way. BitBucket includes all of the bug tracking and wiki features on my list of ideas, and so far I’ve found the interface clear and powerful.

Over the last couple of days I’ve uploaded basically all of the current versions of my packages to BitBucket. As most of these were written without formal version control, I’ve had to ‘reconstruct’ the historical changes from my archive. I’ve aimed for a balance between providing information and my time: the history goes back to the first version of the current releases (for example, from v3.0 for achemso, which is currently on v3.4f). Of course, any new versions will appear on BitBucket. I’m now working on moving all of the issues into the bug databases. I’ll also look at the wiki side of things: I’ll try to put some basic installation and use information, and perhaps some frequently asked questions. BitBucket includes RSS feeds, so anyone interested can follow what is going on.

The beamer connection

I should add that one of the reasons for looking at all of this was the recent news that the beamer package has a new maintainer. I’ve taken a bit of interest in this and, as the code has been moved to BitBucket, I took a look at the facilities on offer there. As a result, I’ve been given access to the new repository. I have a few vague ideas about areas to look at, but at the moment nothing definite!

Get involved

Anyone interested is of course free to contribute. Adding any bugs, enhancements or ideas to the databases is one of the easiest ways to do that. If anyone wants access to edit the wikis or add code, drop me a line.