File operation limitations

In my previous post, I looked at the concept of register allocation in TeX. What I did not talk about there was how many registers are available, as this is a little more complex. When Knuth wrote TeX, he gave 256 registers of each type for storing data. So you can use \count0 to \count255, \toks0 to \toks255, etc. Some of these are ‘special’, so the number available for general use is slightly lower, but the general concept is clear. Subsequent engines have extended the number of these registers a lot: the e-TeX extensions give us 32768 registers of each type! Using these requires a modified allocation system, which for LaTeX is provided by the etex package.
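
As a minimal sketch of the difference (not part of the original discussion), the lines below address registers directly by number. The choice of register 300 is arbitrary, and the group means both assignments are undone again at the end.

\begingroup
  \count255=1   % highest count register in Knuth's TeX
  \count300=2   % only valid with the e-TeX extensions
  \message{\the\count255, \the\count300}
\endgroup

With Knuth's TeX the second assignment stops with a ‘Bad register code’ error, while an engine with the e-TeX extensions active accepts it.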

You’ll notice I’ve not mentioned file reading and writing in the above, and that is because they are different. TeX provides only 16 read streams and 16 write streams, and this limit is retained in later engines. The reasoning is that having lots of files open is not generally regarded as a good idea.

Having only 16 file streams available is clearly a bit limiting, particularly because the TeX \newwrite allocation never frees anything back up. Once a file stream has been used for one file it is never available for anything else. Many LaTeX users will be familiar with the potential result of this:

  No room for a new \write

(The limitation on reading files does not show up so often in real cases, but it is also there.)

I’ve had a couple of requests recently asking what can be done about this. First, a bit more background. When TeX opens a file for writing, it wipes out any previous content. So you cannot keep opening and closing a file you need to add to: once it’s open, you don’t want to close it until you are done. This sounds awkward, but there are alternative approaches.
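
To make the behaviour concrete, here is a minimal sketch of the classical pattern. The stream name \myout and the file extension .demo are made up for illustration.

\newwrite\myout                              % claims one of the 16 write streams for good
\immediate\openout\myout=\jobname.demo\relax % any existing content of the file is lost here
\immediate\write\myout{First line}
\immediate\write\myout{Second line}
\immediate\closeout\myout                    % opening the same file again would wipe it

Both lines end up in the file only because the stream stays open between the two \write operations.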

In LaTeX3, what we’ve implemented is a pool system. The idea is that rather than permanently allocating a stream, the programmer only uses one while it is needed, and then closes it. We’ve also strongly suggested that most file operations can happen in ‘one shot’, with all of the information saved in TeX’s memory until the end of the document. This can be done for anything which is written using \immediate\write, but for stuff written at \shipout then the file does have to be open. I’m hoping that most file writing operations fall into the first category! The biggest problem with this change is that it will only work with rewritten code, which does not help with existing packages now.
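
A rough sketch of the ‘one shot’ idea, written with the current expl3 file interface, might look like the following. The names \g_demo_iow, \g_demo_lines_seq, \demo_add_line:n and \demo_write_all: and the .demo extension are hypothetical, and the sketch assumes a LaTeX release with expl3 preloaded (otherwise the expl3 package has to be loaded first). It illustrates the approach and is not a definitive implementation.

\ExplSyntaxOn
\iow_new:N \g_demo_iow                    % stream variable, not yet tied to a file
\seq_new:N \g_demo_lines_seq              % lines are held in memory until the end
\cs_new_protected:Npn \demo_add_line:n #1
  { \seq_gput_right:Nn \g_demo_lines_seq {#1} }
\cs_new_protected:Npn \demo_write_all:
  {
    \iow_open:Nn \g_demo_iow { \c_sys_jobname_str .demo }
    \seq_map_inline:Nn \g_demo_lines_seq
      { \iow_now:Nn \g_demo_iow {##1} }   % immediate, unexpanded write
    \iow_close:N \g_demo_iow              % the stream goes back to the pool
  }
\AtEndDocument { \demo_write_all: }
\ExplSyntaxOff

A stream is only held between \iow_open:Nn and \iow_close:N, so it is free for other code for the rest of the run. Within expl3 code, \demo_add_line:n { Some~text } queues one line.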

An alternative approach is implemented by the rvwrite package. It uses a single file, which is marked up in different parts for different uses. After the LaTeX run, this special file is then reprocessed to split out all of the separate parts. So there is no limit on how many files can be created. However, existing code still needs to be rewritten to use this new mechanism, and of course the user has to do an extra step.

One obvious question is ‘what about LuaTeX?’. It keeps the 16 file limit at the TeX end, but Lua has no hard limit and so can easily get around this. However, the same issue of needing to rewrite code applies as with any other solution.
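
As a hedged sketch of what that might look like, the following uses Lua's standard io library from \directlua. The .demo extension is an arbitrary choice, and append mode ("a") is used simply to show something the TeX-side streams cannot do.

\directlua{
  local f = assert(io.open(tex.jobname .. ".demo", "a")) % no TeX write stream involved
  f:write("A line written from Lua", string.char(10))    % string.char(10) is a newline
  f:close()
}

The comments here are TeX comments, removed before Lua sees the code, which avoids trouble with Lua's own comment syntax inside \directlua.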

So it seems we are stuck with a 16 file limit for a while yet. All of the improved approaches need new code to be written, as just changing the way \newwrite works would break existing packages. However, there are better alternatives, and if people know about them then hopefully things will improve.

Local register allocation

There have been a few occasions in recent weeks where the question of locally-allocated registers has come up: the most recent is here. I thought it might be useful to look at this issue: we’ve recently looked at this for LaTeX3 and have decided against it, at least for the moment.

TeX registers

First, a quick bit of background on variables in TeX. At the engine level, TeX provides a number of different register types, for example counts, token registers (toks), dimensions, etc. These can be referred to by number, for example

\count100=123
\toks100={Tokens}

However, this would be pretty awkward with any significant number of registers. To help out, the engine provides a number of primitives to give these registers more useful names:

\countdef\mycount=100
\toksdef\mytoks=100
\mycount=123
\mytoks={Tokens}

This is clearly better, as at the point of use the register is used by name rather than by number.

There is still a need to know which number to allocate in the first place: we don’t want to give the same register two (or more) names, and accidentally overwrite it. To solve this, the plain TeX format, LaTeX, etc., set up an allocation system, which keeps track of the numbers already used. This is wrapped up inside \newcount, \newtoks and so on.
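
A minimal sketch of the allocator in use, reusing the names from above. Which register numbers are actually chosen depends on the format and on what has been loaded already, so none is assumed here.

\newcount\mycount              % the format picks a free count register
\newtoks\mytoks                % likewise for a token register
\mycount=123
\mytoks={Tokens}
\message{\meaning\mycount}     % reports which \count register was assigned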

Allocating locally

Okay, how does this relate to local variables? Well, this higher-level tracking mechanism works globally, so once a register is marked as used, it never becomes available again for re-use. So if I do

\begingroup
  \newcount\mycount
  ...
\endgroup

the register number allocated to \mycount does not become available again at the end of the group. As there are only 256 registers of each type in Knuth’s TeX, this could soon lead to a serious issue.

The etex package provides for both global and local allocation of registers. This means that you can do

\usepackage{etex}
\begingroup
  \loccount\mycount
  ...
\endgroup

and have the register free up as you would expect.

What does local mean?

So does that mean that local registers are a good idea? I’d say probably not, because of what is meant here by local. In most languages, a local variable is local to some function, and nested functions have their own independent local variables. In TeX, things are different, as it is a macro language and only grouping makes things local. So something like

\def\BadIdea{%
  \loccount\mycount
  ...
}

will not destroy \mycount at the end of the material inserted by \BadIdea. On the other hand, things will work within a group, so doing

\def\BetterIdea{%
  \begingroup
    \loccount\mycount
    ....
  \endgroup
}

will destroy \mycount as expected.

For me, this is still not enough to mean that local allocation is a good way to work. There is always the need to track grouping, and there is not really a great gain over

\newcount\mycount
\def\BetterIdea{%
  \begingroup
    ....
  \endgroup
}

as the TeX group is still keeping the allocation of \mycount local.
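
A short sketch shows the effect of the group on a globally-allocated register. The values used are arbitrary.

\newcount\mycount
\mycount=1
\begingroup
  \mycount=42                          % local assignment
  \message{Inside: \the\mycount}       % Inside: 42
\endgroup
\message{Outside: \the\mycount}        % Outside: 1

The assignment inside the group is undone at \endgroup, so the register behaves like a local variable even though it was allocated once, globally.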

As I said at the start, we’ve examined this for LaTeX3, and decided that the danger of misleading people is too much to put up with, despite some gains in code clarity. So while it’s an interesting area to look at, I think local allocation of registers does not really make TeX coding any easier.

Installing achemso and siunitx

A question that comes up from time to time is how to install one or other of my packages, usually either achemso or siunitx. While both are essentially standard LaTeX packages (no weird files or binaries needed), there are still some stumbling blocks that cause issues. So I thought a few notes might be useful here.

Installing as part of an up to date TeX system

By far the easiest way to install my LaTeX packages is to get them as part of an up-to-date TeX system. Both MiKTeX 2.9 and TeX Live 2010 include all of my general packages. MiKTeX is of course Windows-only, but TeX Live can be installed on Windows, Mac OS X and Linux. After installation, doing an on-line update should grab all of the latest packages from CTAN. Both MiKTeX and TeX Live include graphical update programs, so this is not such a difficult process nowadays.

Mac users may well prefer MacTeX over plain TeX Live, but MacTeX is built on top of TeX Live and so the same ideas apply. You can install either TeX Live or MacTeX and get the same basic functionality.

For Linux users, it’s worth noting that popular Linux distributions tend to include old versions of TeX Live (or even teTeX), rather than TeX Live 2010. So if you want an up-to-date TeX system you’ll be better off ignoring your Linux package manager and grabbing TeX Live directly.

One thing to do if you update your TeX system is to check any locally-installed files you might have (see the next section for more about local installation). These will be in ~/texmf on Linux, ~/Library/texmf on a Mac and (probably) %USERPROFILE%\texmf on Windows. One problem I see from time to time is that users of achemso have installed some of the BibTeX styles locally, then update the main package and all sorts of things go wrong. So do check carefully on any local files: they might be outdated by a new TeX system.

Installing using the TDS zip files

The method above is fine if you are happy installing an entirely new TeX system, but if all you need is access to one of my packages then it is probably over-kill. For these users, I provide ready-to-install zip files on CTAN. For achemso, you need achemso.tds.zip, while for siunitx users you probably need

The idea with these files is that I have set them up with documentation, ready to use LaTeX styles and all of the support files. All that needs to happen with them is to unzip them inside your local TeX directory and tell TeX about them.

Where the files should go depends a little on your operating system. The local directory (folder) is usually ~/texmf on Linux, ~/Library/texmf on a Mac and (probably) %USERPROFILE%\texmf on Windows. Here, ~ and %USERPROFILE% represent your home directory (folder). So on my Windows 7 PC, I have a folder

C:\Users\joseph\texmf

while on my Mac there is one at

/Users/joseph/Library/texmf

Whichever system you use, copy the appropriate zip files there and unzip. The result should be a structure which looks like

texmf/tex/latex/achemso/achemso.sty
...
texmf/tex/latex/siunitx/siunitx.sty

and so on. Of course, the exact structure will depend on which packages you install! What is important for installing siunitx is to also install expl3 and xpackages. If the versions do not match then trouble will not be far away.
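
One way to check what is actually being loaded is the standard \listfiles command, which makes LaTeX print the name, date and version of every file used at the end of the log. A minimal test document might look like the following (the body text is just a placeholder).

\listfiles
\documentclass{article}
\usepackage{siunitx}
\begin{document}
Test
\end{document}

The file list in the log then shows the date of siunitx.sty alongside those of the support packages it loaded, which makes a mismatch easier to spot.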

To tell TeX about the new files, you need to run the program texhash. There is a graphical interface for this in both MiKTeX (Update File Name Database) and TeX Live. I find it easiest just to start a Command Prompt/Terminal and type

texhash

(For users with recent versions of TeX Live (2009 and 2010, I think), running texhash is actually not needed. However, it will not do any harm so you may as well run it.)

Installing from the dtx file

The traditional method to install a package is to unpack it from the dtx source. I’ve got to say that I only recommend this for experienced LaTeX users. While both achemso and siunitx are designed to be easy to unpack, life is more complex for expl3 and xpackages. So I’d strongly recommend using the TDS zip files unless you know a bit more about LaTeX!

In praise of TeX by Topic

I’ve been hard at work on various LaTeX3 questions (more on which in another post), and little things about TeX come up all of the time. These are often rather technical, and so what I need is a good reference work which includes all of the detail. Now, The TeXbook is an obvious place to look, but it’s not available electronically. I do a lot of my coding when on the move (on the train, at work, various places around the house), and so carrying a book about is not always so easy. There’s also a lot more to The TeXbook than just a reference work, which doesn’t always make it quite so easy to quickly look up a particular primitive.

On the other hand, TeX by Topic is available electronically (and for free), and is a focussed reference work. As well as being able to download it from the author’s website, it’s probably installed with your TeX system:

texdoc texbytopic

at the command line should open it up. I always find the content in TeX by Topic to be excellent: enough to help me out, but not so much that I get lost. For a TeX programmer, I think there is no better resource for those fiddly questions. Being available for free is of course an added bonus, although I would strongly encourage making a contribution to the author (I certainly have). For those people who want a printed copy, TeX by Topic is out of print but is available from Lulu. So all round it’s an excellent choice: great work!