siunitx 2: more numbers, more tables, SVN?

The issue of input formats for siunitx and numbers has been mentioned. It seems every time I think I’ve sorted broadly what I need to do, other ideas come up. I’ve thought about scientific notation before, but haven’t had a go at coding anything. The idea of “compressing” error input such as 1.23\pm 0.02 into 1.23(2) was mentioned to me before, but that one slipped off my radar. I guess I should re-visit both of these areas and see what I can do. Of course, that will delay me with other things, but I’d really like siunitx 2 to cover a lot of things that version 1 does less well.
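
To make the two input forms concrete, here is a rough sketch of the sort of thing I have in mind. It assumes a parser that accepts a +- form inside \num as well as the bracketed form; both the +- syntax and the separate-uncertainty option name are assumptions for illustration, not a settled version 2 interface.

\documentclass{article}
\usepackage{siunitx}
\begin{document}
% Compact form: the uncertainty applies to the last digit given
\num{1.23(2)}

% Long form (assumed syntax): should be parsed as the same number
\num{1.23 +- 0.02}

% Assumed option: print the uncertainty as a separate +/- part
\num[separate-uncertainty = true]{1.23(2)}
\end{document}

The point is that the input form and the printed form become independent: either way of typing the number can give either style of output.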

Tables have also been raised (again). I’ve had a quick look at pgfplotstable in the past, but I suppose I should read things in detail. I mention it a lot, but tables were not on the original “manifesto” for siunitx, and have rather crept up on me. It seems that this is a key area for users, and so I’ll also need to look at this area again. I’ve already said I need to work on more table ideas, so this is not really that much of a surprise. Overall, my aims in siunitx are probably slightly different to pgfplotstable, but I’ll see what I can learn.

All of this leads me to worrying about public information and releases. As a developer, I’d love to keep things simple for users, and only release working material. On the other hand, advanced users often provide good feedback well before things are done. I think I need some kind of public repository for the version 2 code, and as I currently use BerliOS for my LaTeX3 ideas, I’ll look at setting something up there over the weekend. They seem to be quite happy with the LPPL, which not every free open-source hosting service is.

Work on siunitx version 2

Progress on siunitx 2 was slower over Christmas than I’d hoped. However, I should still manage a first snap-shot of what I’m doing some time this month.

I’m working on three different targets at once:

  1. Making the existing code clearer. This means improving the internal naming of functions, making each section more independent of the others and trying to use some LaTeX3 ideas in the internal code. This is the fastest part of the job.
  2. Improving the efficiency of the current code. LaTeX3 has lots of good ideas about loops and expandable tests, and I’m using these to make the existing code more efficient. I’ve removed a lot of loops from the current number parsing system, which makes for a faster package. Once everything works properly, I’ll try to get some testing done on this to see what difference it makes.
  3. Adding new features. This is the slowest job, unsurprisingly. There are various things I’m still working out how to do with numbers, before I even look at some of the unit-based problems.
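
As an illustration of the second point, here is a minimal sketch of the mapping style that LaTeX3 offers in place of a hand-written loop. It is not the siunitx parser: \CountDigits and \l_demo_digits_int are made-up names, and the function names are those of current expl3 as I understand it, which may differ between releases.

\documentclass{article}
\usepackage{xparse} % loads expl3; both are built in on recent LaTeX kernels
\ExplSyntaxOn
\int_new:N \l_demo_digits_int
% Hypothetical command: count the digits in the input, ignoring
% everything else (decimal marker, exponent marker, signs)
\NewDocumentCommand \CountDigits { m }
  {
    \int_zero:N \l_demo_digits_int
    % Visit each token of the input once; no hand-written recursion
    \tl_map_inline:nn {#1}
      {
        \tl_if_in:nnT { 0123456789 } {##1}
          { \int_incr:N \l_demo_digits_int }
      }
    \int_use:N \l_demo_digits_int
  }
\ExplSyntaxOff
\begin{document}
\CountDigits{1.23e4}
\end{document}

The mapping function hides the recursion, so the package code only has to say what happens to each token.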

Each of the areas takes time, and I have other things to do as well! I’m still hoping to get something done in time for TeX Live 2009: it’s always best to have some kind of target.

Universal UTF-8

The existence of editors such as TeXworks makes it very easy to work with UTF-8 source documents. However, there are still a number of issues to think about before deciding to use UTF-8 for all of your work.

First, there is the issue of other users. If you are writing things that will not need to be edited by others, then the choice is down to you. The moment you have collaborators, you need to ensure that they are also okay with UTF-8. They might be using an editor where this is not going to work (WinEdt, for example). If you are preparing stuff for a publisher, you have to be even more careful, as they may have quite a “traditional” TeX system. I know that the American Chemical Society don’t even have the e-TeX extensions, for example.

Then there are more technical issues. If you are a LaTeX user, you might well also use BibTeX. BibTeX is old, and as yet there is no real UTF-8 aware replacement. So at least in a database of references you may have to stick with escape sequences or some other encoding.
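
For example, the document itself can be typed in UTF-8 while the .bib file stays in plain ASCII. This is only a sketch: it assumes pdfLaTeX with inputenc, and the database entry is entirely invented for illustration.

% In the document preamble: UTF-8 input for the .tex file itself
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}

% In the .bib file: the accent is written as an escape sequence and
% wrapped in braces so that BibTeX treats it as a single letter
@article{invented-example,
  author  = {M{\"u}ller, A.},
  title   = {A made-up title},
  journal = {Journal of Invented Examples},
  year    = {2009},
}

The escaped form works with any engine and any editor, which is exactly why it is hard to give up.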

There are also choices to make about the engine you use. XeTeX is the obvious choice for UTF-8 documents, but that means missing out on the pdfTeX extensions to TeX, for example micro-typography. LuaTeX might help here, but if you are a MiKTeX user this is still some way off being available.

All in all, UTF-8 input is not quite the universal standard for TeX, just yet. New editors and engines mean that things are almost there, but a few awkward issues remain.

The LaTeX3 “template” concept

The “template” concept, implemented in the LaTeX3 template module, has confused me for a while. I’ve had a few attempts at reading the documentation, but have consistently failed to fully understand things. A recent post on the LaTeX3 mailing list has prompted me to have another go at understanding things: I think that this time I might have got it.

The idea of templates is to separate the design decisions about document elements from both the user interface and the underlying coding. A template is a generalised description of a type of document element; a specific instance of a template is created for each specific use.

For example, if we have a template “SectionHead” as a generalised section-starting function, then specific instances might be Plain, Indented, Fancy, and so on. Notice that this is not the same as the user interface (where we’d expect to see \section, \subsection, etc.: this mapping is handled separately).

The current method for implementing templates uses a key–value method. So in our example, there will be keys for things like the font weight of the text, the surrounding whitespace and so on. When an instance of a template is created, some of these values are set, so that the instance will work more rapidly. Other values are left until run-time, and so can be set by the user.
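
To make the “SectionHead” example a bit more concrete, here is a sketch of how the pieces might fit together. Note the assumptions: it uses the interface of the xtemplate package (a reimplementation of the idea) as I understand it, not the template module itself, and the key names and the \l_demo_... variables are made up for illustration.

\documentclass{article}
\usepackage{xtemplate}

% The object type: a section head taking one argument (the title)
\DeclareObjectType { SectionHead } { 1 }

% The designer-facing interface: keys, their types and defaults
\DeclareTemplateInterface { SectionHead } { basic } { 1 }
  {
    font        : tokenlist = \bfseries ,
    before-skip : skip      = 12pt
  }

\ExplSyntaxOn
\tl_new:N   \l_demo_head_font_tl
\skip_new:N \l_demo_head_before_skip
% The code: bind each key to a variable, then use the variables
\DeclareTemplateCode { SectionHead } { basic } { 1 }
  {
    font        = \l_demo_head_font_tl ,
    before-skip = \l_demo_head_before_skip
  }
  {
    \AssignTemplateKeys
    \par \vspace { \l_demo_head_before_skip }
    \noindent { \l_demo_head_font_tl #1 } \par
  }
\ExplSyntaxOff

% Two instances of the same template, differing only in their settings
\DeclareInstance { SectionHead } { Plain } { basic } { }
\DeclareInstance { SectionHead } { Fancy } { basic }
  { font = \itshape , before-skip = 24pt }

\begin{document}
\UseInstance { SectionHead } { Plain } { A plain heading }
\UseInstance { SectionHead } { Fancy } { A fancy heading }
\end{document}

The interface, the code and the instances are three separate declarations, which is where the independence of design and code comes from. Mapping \section and friends onto particular instances would be a fourth, separate step.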

Once you understand the concept, this looks like a very clever way of keeping design and code more-or-less independent. Of course, this does depend on how many template settings are available to the designer. I’m not too sure about the method for creating keys of different types, as it is not quite classical keyval and is therefore something I’m not used to (but perhaps as I’m a big fan of pgfkeys I would say that). Perhaps it will grow on me.

TeX and namespaces

A question on the LaTeX3 mailing list has got me thinking about namespaces. Plain TeX users tend to have their own set of macros, plus those from the plain format, and so are pretty much in control of everything. On the other hand, ConTeXt users can rely on the small focussed development team to keep naming sensible. That leaves LaTeX, where things are complicated.

The current LaTeX situation is rather a hodge-podge of approaches. Internal macros follow the plain TeX conventions, and include one or more @ symbols. However, this is all rather a mess, as there is no real system: \@tf@r, from the kernel, for example. What is it for (no peeking)? User macros are little better, with some including package names, some with capitals, others defined only in certain places, etc. This means developing a new package is something of a risk: it is very easy to end up getting e-mails saying

Your package XXX clashes with package YYY because both define \SomeObscureMacroName. Sort it out!

or words to that effect.

LaTeX3 approaches

The LaTeX3 “module” concept helps to some extent. This formalises what many package authors do in LaTeX2e, so that all internal macros for a module (a package for LaTeX3) start with \module (or some fixed abbreviation for the module name). However, this leaves two issues:

  1. How are module names managed?
  2. What about user macros?

If we imagine that LaTeX3 will eventually have a large community of developers, in the same way as LaTeX2e does, then this needs to be addressed. The ConTeXt approach of a small, focussed team doesn’t really apply in the LaTeX world.

Ideas

I’d suggest a two-part solution to the issue, first at the LaTeX level and then pre-emptively using a database. At the LaTeX level, I’d suggest that each module should include two special functions, one to reserve the module name and a second to reserve user macro names. Something like:

\module_details:n { % Note _ and : are "letters"
  prefix     = <prefix>,
  full~name  = <Long macro name>, % ~ is a space
  version    = <version>,
  date       = <date>,
  maintainer = <Whoever>,
  e-mail     = <contact e-mail>, % and so on
}
\module_reserve_names:n  {
  <function-name-1>,
  <function-name-2>, % etc.
}

You’d do this at the start of the module, and a check could then be made with other modules that had already been loaded. In the event of a clash, LaTeX could then give a useful error message including the name of the clashing module, and hopefully contact details.
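
The clash check itself need not be complicated. Here is a minimal sketch using a property list to record who owns each name. To keep it short I’ve used a made-up two-argument helper, \demo_reserve_names:nn, which takes the module prefix directly rather than picking it up from \module_details:n; all of the names here are hypothetical, and the function names are those of current expl3 as I understand it.

\documentclass{article}
\usepackage{expl3} % built in on recent LaTeX kernels
\ExplSyntaxOn
% Global record of reserved names: key = name, value = owning module
\prop_new:N \g_demo_reserved_prop
\tl_new:N \l_demo_owner_tl
\msg_new:nnn { demo } { clash }
  { Name~'#1'~requested~by~'#2'~is~already~reserved~by~'#3'. }
% Harmless if the variant already exists
\cs_generate_variant:Nn \msg_error:nnnnn { nnnnV }
% #1 = module prefix, #2 = comma list of names to reserve
\cs_new_protected:Npn \demo_reserve_names:nn #1#2
  {
    \clist_map_inline:nn {#2}
      {
        \prop_get:NnNTF \g_demo_reserved_prop {##1} \l_demo_owner_tl
          { \msg_error:nnnnV { demo } { clash } {##1} {#1} \l_demo_owner_tl }
          { \prop_gput:Nnn \g_demo_reserved_prop {##1} {#1} }
      }
  }
% Module xxx reserves two names; module yyy then asks for one of them,
% which raises the clash error naming xxx as the owner
\demo_reserve_names:nn { xxx } { FooMacro , BarMacro }
\demo_reserve_names:nn { yyy } { FooMacro }
\ExplSyntaxOff
\begin{document}
\end{document}

A real version would store the maintainer and e-mail details alongside the prefix, so that the error message could say who to contact.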

The second part of the system would be to encourage people to submit this information to a central database, so that developers can check in advance of writing anything. I’d imagine you’d put the details above in a separate file, and upload only this data (let’s call it a .mod file). It should then be relatively easy to parse the information out into a MySQL database, and hopefully some PHP would produce a simple interface for checking. Two methods would be available: checking against the database (hopefully early on in the process of writing a module) and submitting to the database for inclusion. I’d hope most of this could be automated (he says with no experience at all!).

Conclusion

TeX doesn’t help out much in keeping a large namespace in order. So you either have to have a very small team (the ConTeXt approach), keep the namespace small (the plain TeX approach) or set up a system of your own (where LaTeX3 is going). LaTeX3 doesn’t quite cover everything at the moment, but the potential is there.

Using LaTeX with WordPress

A meander around Google blogsearch took me to http://sixthform.info/steve/wordpress/, which has some very interesting details about using LaTeX with WordPress. I’ll be looking at this myself (if nothing else, it would be cool), but it really looks useful for people running larger sites (most obviously the LaTeX Project itself). I’d not seen this blog before, so I wonder if others were aware of it.

Section numbers in achemso

When I wrote the achemso class for submissions to American Chemical Society journals, I did my best to get the style of each journal correct. Of course, this is not easy as there are a lot of journals and they are not necessarily consistent in applying the style rules! One issue that comes up a lot is section numbering. Most of the journals do not number sections, most of the time. However, sometimes authors want to include section numbers. I need to look at this again for version 3.2, but in version 3.1 you need to do:

\makeatletter
\acs@restsecnums
\makeatother

somewhere in the preamble to restore numbering.

TeX Questions

There are three main places to ask (La)TeX-related questions in English: the LaTeX Community forums, the comp.text.tex newsgroup and the texhax mailing list.

Each has a different mix of people, and I wonder how much cross-over there is. The LaTeX Community forums seem best for the newer user, as there is a lot less complex information than in the newsgroup or on texhax. I’d say that the newsgroup is the most active of the three, with texhax a relatively quiet list. Of course, there are also ConTeXt-specific places to talk. I wonder how much regulars of each “place” know about the other ones?