The TDS and TDS zip files

In my post about working with dtx files, I made the somewhat throw-away statement

The idea of a TDS-ready zip is also popular …

I’ve been pulled up on that by e-mail, so I thought I’d consider this a bit more before looking at automating working with dtx sources.

First, what does ‘TDS’ mean? The short answer is TeX Directory Structure, but that does not really help. The idea of the TDS is to have a standard layout for the location of TeX files, so for example LaTeX .sty files go inside tex/latex/ (or more likely tex/latex/<package>/), while BibTeX style files (.bst) go inside bibtex/bst.

That’s important as both TeX Live and MiKTeX use a database to locate files, and only scan the ‘correct’ parts of the TeX installation when building the database. So if you do a local file installation and put the file in the wrong place TeX won’t find it. The result is that you need to lay things out properly if you’re installing TeX files, which can be confusing if you’re not familiar with the idea.

The TDS zip idea is very simple: rather than having just the source files in a zip, set up the documentation, extracted package, source and so on in the correct layout. The end user can then just unzip the TDS zip inside their local TeX directory to install a package (of course, they’ll still need to run texhash). As a package author, sending a user a file they can just unzip and use is much faster in the long run than sending them a bunch of files with a list of installation instructions on where everything needs to go.

The downside of the system is that the TDS zip has to be built in the first place, by the package author. With a batch script or Makefile for doing releases, that’s not an issue, but it is if everything is being done by hand. Mistakes can be a problem (even with a script, if you get it wrong!).

One argument against needing TDS zips is that both TeX Live and MiKTeX update very regularly, and so it’s easy to install a package from CTAN using the package manager of your TeX system. However, not everyone is using a burning edge system. Many Linux systems are still on TeX Live 2007, and on multi-user systems the TeX system is probably fixed by the administrator. So doing local installations is still a reality for a lot of people.

Over all, I think the idea has merit, but as a package author you do need to put in some effort to get things ‘right’. I’ll be talking about some scripting in my next post, and hopefully the ideas there will show how things can be done reliably. The overall aim is to make life easier for non-experts, and if that means a bit more effort for the more experienced TeX users then I think the trade-off is not too bad. I don’t expect everyone to agree with me!

Working with dtx files

I’ve talked a bit about the dtx file format and given an example of a skeleton dtx file. I thought I’d talk next about working with sources, which will be a bit of a scattered post but will hopefully be interesting!

Editing dtx files

As with any TeX source, you don’t have to have a special editor to work with dtx files, but it can be helpful. Many people say that Emacs (using AUC-TeX) has the best dtx editing mode of all: I’ve only tried it briefly, but the AUC-TeX homepage has the details. On Windows, WinEdt has a pretty strong DTX Submode which does similar things. As I mainly use TeXworks, I’ve made a few modifications to get something similar to those two ‘leaders’: at the moment I use the following settings for syntax highlighting:

[LaTeX DTX]

# comments
red        Y    \^\^A.*

# Guards
orange        N    %<(?:[A-Za-z0-9!\|]+|.)>
limegreen    N    %<\*(?:[A-Za-z0-9!\|]+|.)>
crimson        N    %</(?:[A-Za-z0-9!\|]+|.)>

# special characters
darkred        N    \^\^\^\^\^[0-9a-z]{5}
darkred        N    \^\^\^\^[0-9a-z]{4}
darkred        N    \^\^\^[0-9a-z]{3}
darkred        N    \^\^[0-9a-z]{2}
darkred        N    [$#^_{}&]
gray        N    ^%%.*
gray        N    ^%

# Macrocode
green        N    \\(?:begin|end)\{macrocode\}

# LaTeX environments
darkgreen    N    \\(?:begin|end)\s*\{[^}]*\}

# control sequences
blue        N    \\(?:[A-Za-z@:_]+|.)

plus a few special auto-complete entries. Not quite up with the best just yet, but it’s still early days for TeXworks.

What to document

Getting documentation right is not easy. My general approach is to try to include lots of examples, so I always load the package being talked about for the documentation. That means I can use the package ‘in place’. Unfortunately, ltxdoc does not have a built-in example environment. I use the listings package, and although it’s a bit complex looking, the following code works well:

%\lst@RequireAspects{writefile}
%\newsavebox{\LaTeXdemo@box}
%\lstnewenvironment{LaTeXdemo}[1][code and example]{^^A
%  \global\let\lst@intname\@empty
%  \expandafter\let\expandafter\LaTeXdemo@end
%    \csname LaTeXdemo@#1@end\endcsname
%  \@nameuse{LaTeXdemo@#1}^^A
%}{^^A
%  \LaTeXdemo@end
%}
%\newcommand*\LaTeXdemo@new[3]{^^A
%  \expandafter\newcommand\expandafter*\expandafter
%    {\csname LaTeXdemo@#1\endcsname}{#2}^^A
%  \expandafter\newcommand\expandafter*\expandafter
%    {\csname LaTeXdemo@#1@end\endcsname}{#3}^^A
%}
%\newcommand*\LaTeXdemo@common{^^A
%  \setkeys{lst}{
%    basicstyle   = \small\ttfamily,
%    basewidth    = 0.51em,
%    gobble       = 3,
%    keywordstyle = \color{blue},
%    language     = [LaTeX]{TeX},
%    moretexcs    = {
%      examplemacro,
%      ^^A Add you command names here!
%    }
%  }^^A
%}
%\newcommand*\LaTeXdemo@input{^^A
%  \MakePercentComment
%  \catcode`\^^M=10\relax
%  \small
%  \begingroup
%    \setkeys{lst}{
%      SelectCharTable=\lst@ReplaceInput{\^\^I}{\lst@ProcessTabulator}
%    }^^A
%    \leavevmode
%      \input{\jobname.tmp}^^A
%  \endgroup
%  \MakePercentIgnore
%}
%\LaTeXdemo@new{code and example}{^^A
%  \setbox\LaTeXdemo@box=\hbox\bgroup
%    \lst@BeginAlsoWriteFile{\jobname.tmp}^^A
%    \LaTeXdemo@common
%}{^^A
%    \lst@EndWriteFile
%  \egroup
%  \begin{center}
%    \ifdim\wd\LaTeXdemo@box>0.48\linewidth\relax
%      \hbox to\linewidth{\box\LaTeXdemo@box\hss}^^A
%        \begin{minipage}{\linewidth}
%          \LaTeXdemo@input
%        \end{minipage}
%    \else
%      \begin{minipage}{0.48\linewidth}
%        \LaTeXdemo@input
%      \end{minipage}
%      \hfill
%      \begin{minipage}{0.48\linewidth}
%        \hbox to\linewidth{\box\LaTeXdemo@box\hss}^^A
%      \end{minipage}
%    \fi
%  \end{center}
%}
%\LaTeXdemo@new{code only}{^^A
%  \LaTeXdemo@common
%}{^^A
%}

This gets pasted in at the start of the document (after the driver): not great coding style, but not too bad for this job. The idea is that I can then write something like:

%\begin{LaTeXdemo}
%  Some clever demo here
%\end{LaTeXdemo}

and the demonstration will be both typeset as code and actually used. So the user can see the input and the result of whatever I’ve designed.
Of course, if you are writing LaTeX classes or the like, then this won’t work. For my achemso class, I include a demo document in the dtx. This then gets extracted as a separate file (achemso-demo.tex), which includes lots of hints in the text and demonstrates as much as possible about the class. Again, the idea is to show how things are done by example: much better than trying to explain in the abstract.

Releasing stuff to CTAN

In my next post, I’m hoping to talk about automating the process of getting stuff ready for CTAN. So here I’ll just mention a few general ideas. One is that users shouldn’t need to read the code to use a LaTeX package (unless of course you are aiming at supporting other package authors). So I tend to typeset only the user part of the documentation. Normally, it’s a good idea to also include an index and list of changes in the pdf documentation, so a typical ‘recipe’ would be

pdflatex -draftmode "\AtBeginDocument{\OnlyDescription} \input demopkg.dtx"
makeindex -s gglo.ist -o demopkg.gls demopkg.glo
makeindex --s gind.ist -o demopkg.ind demopkg.idx
pdflatex "\AtBeginDocument{\OnlyDescription} \input demopkg.dtx"
pdflatex "\AtBeginDocument{\OnlyDescription} \input demopkg.dtx"

The idea of all this is to only typeset what is needed (hence the -draftmode in the first line), to miss out the code (\OnlyDescription), and to index both the changes and the user functions (the two makeindex lines). This recipe will turn up in the next post as the basis for doing things without needing to remember everything yourself!

CTAN like to have a zip file containing the source, documentation and ins file. They also prefer Unix line endings, so those of us working on Windows have to use something like Info-ZIP or the Swiss File Knife to alter the line endings. The idea of a TDS-ready zip is also popular, so there are normally two files to get ready. All of that is a bit awkward, especially if like me you keep having to do bug fix releases. So I always do this using an automated tool. I’ll talk about this in the next post.

A model dtx file

In my previous post, I’ve tried to give a very general overview of how the dtx file format comes about, from a combination of the syntax of DocStrip and ltxdoc. The problem with the bald details is that there are still lots of way to actually use the ideas to construct a dtx. So here I’m going to detail a model dtx, which is ready to be filled in with real code and documentation. The entire file is available here as demopkg.dtx: get it now if you are impatient!

The idea of constructing a dtx file in the way I’ll describe is that it lets us achieve several things in one go:

  • All of the files for a package can be derived from a single source (unless you need a binary, of course).
  • The README is included in the dtx, with this useful information at the start.
  • The ins file is included in the dtx, so the file is self-extracting.
  • Running (pdf)tex <name>.dtx extracts the code and associated files (ins, README, etc.).
  • Running (pdf)latex <name>.dtx does the extraction then typesets the documentation. This way, the documentation always has the latest code available, and users don’t need to worry about which method they use to get stuff extracted.

Most of the ideas here are not mine: Will Robertson came up with a lot of this. I’m just going to give some details of what is going on. I’m going to present the source in order, with a section of the source followed by some comments explaining what is going on. I’m going to call the demonstration package ‘demopkg’: something easy for search and replace. Where ever possible, \jobname is used in the source so that the file name changes automatically when moving from one package to another.

% \iffalse meta-comment
% !TEX program  = pdfLaTeX

The file starts off with an \iffalse which will mean that ltxdoc will skip all of this code when typesetting the document. I use TeXworks as my editor, so I include the special !TEX program comment so that it defaults to pdfLaTeX with all of my files: this does no harm so may as well be there. The same comment is also recognised by TeXShop.

%<*internal>
\iffalse
%</internal>

There is then a guard called ‘internal’: this is never extracted out, but lets us have an uncommented \iffalse in the code. which will mean that the next section will be ignored by TeX initially. The idea here is that we are going to have some text (the README), that TeX would otherwise try to typeset. We don’t want that, so need to skip it at the moment.

%<*readme>
----------------------------------------------------------------
demopkg --- description text
E-mail: you@your.domain
Released under the LaTeX Project Public License v1.3c or later
See http://www.latex-project.org/lppl.txt
----------------------------------------------------------------

Some text about the package: probably the same as the abstract.
%</readme>

This part is pretty obvious: the README file for the package, inside guards called ‘readme’. As you might expect, this will get extracted out later as the README file. In the initial TeX run, this text will be skipped (because of the \iffalse), but when DocStrip runs it will show up (as DocStrip will ignore the \iffalse, which is in a different set of guards).

%<*internal>
\fi
\def\nameofplainTeX{plain}
\ifx\fmtname\nameofplainTeX\else
  \expandafter\begingroup
\fi
%</internal>

Back with the special ‘internal’ guards, the \iffalse is ended and a check is made on the current format. For LaTeX, a group needs to be begun so that DocStrip can be loaded without later problems. For plain TeX, only the extraction is going to happen, so that is not an issue.

%<*install>
\input docstrip.tex
\keepsilent
\askforoverwritefalse

The next section, inside ‘install’ guards, is the instructions for extracting the code out of the dtx. Later, this will also turn into a stand-alone ins file. DocStrip gets loaded, then we tell it to do its job without asking for any conformation or printing too much stuff.

\preamble
----------------------------------------------------------------
demopkg --- description text
E-mail: you@your.domain
Released under the LaTeX Project Public License v1.3c or later
See http://www.latex-project.org/lppl.txt
----------------------------------------------------------------

\endpreamble
\postamble

Copyright (C) 2009 by You <you@your.domain>

This work may be distributed and/or modified under the
conditions of the LaTeX Project Public License (LPPL), either
version 1.3c of this license or (at your option) any later
version.  The latest version of this license is in the file:

http://www.latex-project.org/lppl.txt

This work is "maintained" (as per LPPL maintenance status) by
You.

This work consists of the file  demopkg.dtx
and the derived files           demopkg.ins,
                                demopkg.pdf and
                                demopkg.sty.

\endpostamble

Some simple boiler-plate text, that DocStrip will add to the start and end of each extracted file. Of course, this can say what you like.

\usedir{tex/latex/demopkg}
\generate{
  \file{\jobname.sty}{\from{\jobname.dtx}{package}}
}

This section is the instruction to actually extract the LaTeX package file from the dtx. Each file to be extracted needs a line saying how to create it, so if there is a class to extract there would be a line for that, and so on. The \usedir instruction can be used to tell DocStrip how to lay files out: it is best to include it as some people use this. Normally, it will just specify tex/latex/<package>, but might change if there are lots of files to lay out in a structured way. For example, cfg files are often put in tex/latex/<package>/config.

%</install>
%<install>\endbatchfile

That ends what will get extracted into the ins file, so the install guard is closed. The second line is needed as the ins file needs to include \endbatchfile (for DocStrip), but we don’t want the same effect when the dtx is doing the extracting.

%<*internal>
\usedir{source/latex/demopkg}
\generate{
  \file{\jobname.ins}{\from{\jobname.dtx}{install}}
}
\nopreamble\nopostamble
\usedir{doc/latex/demopkg}
\generate{
  \file{README.txt}{\from{\jobname.dtx}{readme}}
}
\ifx\fmtname\nameofplainTeX
  \expandafter\endbatchfile
\else
  \expandafter\endgroup
\fi
%</internal>

When extracting the dtx (with TeX or LaTeX), we need to generate the ins file and the README, which is done here. The ins file is quite simple: the the same process as the sty file. However, there are a couple of points about the README. First, we don’t want DocStrip to add any extra text, hence \nopreamble and \nopostamble. Second, DocStrip can only make files with extensions, so the file has to be called README.txt. (It can be renamed later: hopefully there is no loss of clarity.) If plain TeX is in use, that is the end of the processing, whereas for LaTeX the group containing DocStrip can be closed.

%<*package>
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{demopkg}[2009/10/06 v1.0 description text]
%</package>

Next, the fact that DocStrip can process blocks in different places can be used for the same file. This part of the package does not really need to be printed later on, and done this way the version number is included near the top of the source. Things don’t have to be done this way: this section can always be left out if you like.

%<*driver>
\documentclass{ltxdoc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{\jobname}
\usepackage[numbered]{hypdoc}
\EnableCrossrefs
\CodelineIndex
\RecordChanges
\begin{document}
  \DocInput{\jobname.dtx}
\end{document}
%</driver>

The next block is the driver: this is the information used to typeset the code and documentation. I normally load the package I’m talking about so that I can use it in the documentation, and load a few refinements (modern fonts, hyperdoc to get hyperlinks, and so on). There are a few ltxdoc-specific instructions here: they mean that we get a proper index and information linking macro use information to the code.

% \fi

This matches the \iffalse in the very first line of the file: it marks the beginning of material which will actually be typeset.

%
%\GetFileInfo{\jobname.sty}
%
%\title{^^A
%  \textsf{demopkg} --- description text\thanks{^^A
%    This file describes version \fileversion, last revised \filedate.^^A
%  }^^A
%}
%\author{^^A
%  You\thanks{E-mail: you@your.domain}^^A
%}
%\date{Released \filedate}
%
%\maketitle
%

Here, the title is set up and printed. A few things to notice here. By using \GetFileInfo, the version and date information are picked up from the package itself: no repetition of the information is needed in the dtx. Also, we can’t use % as a comment character, and so ltxdoc sets up ^^A to do the job instead.

%\changes{v1.0}{2009/10/06}{First public release}

General changes (not associated with any particular macro) are best listed somewhere early on. These will be used by the \PrintChanges macro to provide users with a change log.

%
%\DescribeMacro{\examplemacro}
% Some text about an example macro called \cs{examplemacro}, which
% might have an optional argument \oarg{arg1} and mandatory one
% \marg{arg2}.
%

This is where the documentation goes. I’ve included an example macro with a couple of
arguments as reminders of the syntax.

%\StopEventually{^^A
%  \PrintChanges
%  \PrintIndex
%}
%

This macro marks the end of the user part of the documentation. The two functions in the argument
will be used either here (if the code is not typeset) or after the code (if it is typeset). As the dtx file is now,
the code will print. However, in the next blog post I’ll talk about printing only the documentation and
missing the code out.

%    \begin{macrocode}
%<*package>
%    \end{macrocode}

The lead off for the package code itself opens the guard for extracting the code. Normally, I like to have this on its own, to remind where what is going on.

%
%\begin{macro}{\examplemacro}
%\changes{v1.0}{2009/10/06}{Some change from the previous version}
%    \begin{macrocode}
\newcommand*\examplemacro[2][]{%
  Some code here, probably
}
%    \end{macrocode}
%\end{macro}
%

Here we have some code: separated out using the macrocode environment. As I described in the last post, the \begin{macro}\end{macro} block indicates that this is where \examplemacro is defined: indexing needs to know this. The \changes given in the code block only get printed if the code is typeset. They are therefore best used for low-level information, rather than usage changes that users need to know about.

%    \begin{macrocode}
%</package>
%    \end{macrocode}
%\Finale

The last part of the file: close the guard for the code, and call \Finale. This runs anything delayed from the earlier \StopEventually, so in this case the index will get printed here if the code is typeset.

The dtx format

A few comments and e-mails have prodded me to put down some thoughts on sorting out LaTeX packages. My original thought was to do one post, but I think a few are needed. So this is the first of a related series: probably three or four posts. I’m going to start with an overview of the DTX format, used for the source of a lot of LaTeX packages.

Why dtx at all?

The first question to consider is why people bother with the dtx format. Not everyone does, and there are good arguments for and against each different approach. I could easily write an entire post just discussing the various alternatives, but that’s not what I want to do here!

The dtx format is favoured by the LaTeX team, and so is something of a standard for LaTeX package authors. The idea of the dtx format is that it allows the package author to put the user documentation, code documentation and code itself in one place. The user documentation can be typeset on its own, or the user and code documentation can be typeset together (literate programming). The code can also be extracted from the source for use: this means that more than one file can be included in the same source. This last point is perhaps the biggest selling point of the dtx format: you can include LaTeX package, class and some configuration files in one source. There is also a speed gain from removing redundant (comment) lines from a package: on a modern PC, this is pretty tiny, but was a bigger point in the past.

How the format comes about

The dtx format is really defined by two mechanisms, provided by two separate (La)TeX packages. The ability to take one source file and generate several production files, with the comments removed, is provided by the DocStrip package (which is written in plain TeX). Documenting the code, and providing user details, is supported by a dedicated LaTeX class: ltxdoc. (This is itself based on the doc class, but I’m going to focus on ltxdoc.) The combination of the syntax for the two parts of the mechanism leads to the dtx file format.

It is quite possible to use dtx files as the source for things other than (La)TeX files. Indeed, in my second post I’m going to use the dtx format to also include a plain text file in the source. However, I’m not going to talk about making changes to include other types of code: this is doable, but a bit advanced to go into here!

DocStrip: guards and extracting

The DocStrip TeX file provides a mechanism to do two related tasks:

  1. Remove the comment lines from a source file
  2. Produce several production files from one source

To do this, the source file itself (normally a .dtx file) needs to be accompanied by a set of instructions on how to do the extraction (normally a .ins file). The two tasks are inter-related: DocStrip always has to generate a new file to remove the comment lines, even if the source is only for one file.

Removing the comment lines is relatively easy to understand. Any line in the source starting with one % will not appear in the generated file(s). So code lines in the source are written as normal, and any comments that should appear in the generated files need to start with two (or more) % characters.

The more complex idea is setting up the source so that several files can be generated from one source. This uses so-called guards to indicate what goes with what. Let’s imagine we have a very simple source, which will be used to generate two LaTeX packages. The two packages share some code, so we don’t want to repeat it in the source file.

%<*PackageA>
Code just for package A
%</PackageA>
%<*PackageB>
Code just for package B
%</PackageB>
%<*PackageA|PackageB>
Code for both packages
%</PackageA|PackageB>

What is happening here? Each guard line is a comment (so it will not appear in the production file), and is enclosed in angle brackets. The first line, %<*PackageA>, is a guard starting lines that will only appear when extracting PackageA. That continues until the matching closing guard, %</PackageA>. There is then a section that applies only to PackageB, marked up in much the same way. The final set of guards use the | symbol to mean ‘or’: lines here will appear in both PackageA and PackageB. You can do more complex things (nest guards, use & as a logical ‘and’, etc.), but the basic idea remains the same.

I’ve said a couple of times that the code is extracted, and that this needs some instructions for DocStrip. Essentially, this means matching up the names of the guards with the files they relate to. In my simple example, a suitable DocStrip .ins file would be

\input docstrip
\askforoverwritefalse
\generate{
  \file{PackageA.sty}{\from{example.dtx}{PackageA}}
  \file{PackageB.sty}{\from{example.dtx}{PackageB}}
}

Here, I’ve assumed the dtx file is called example.dtx. I’m only using one dtx file, and one guard for each output file. As with many other parts of DocStrip, there is more you can do.

Multiple guards can be used for each package, so if we had lots of packages with some common code, we might well have:

\input docstrip
\askforoverwritefalse
\generate{
  \file{PackageA.sty}{\from{example.dtx}{PackageA,common}}
  \file{PackageB.sty}{\from{example.dtx}{PackageB,common}}
  \file{PackageC.sty}{\from{example.dtx}{PackageC,common}}
}

and so on.

So DocStrip lets us extract code out, have common sections, remove comments and so on. However, it doesn’t help with the documentation side at all: that is all just comments to DocStrip.

ltxdoc: documenting the source

The ltxdoc class is the typesetting part of the dtx format. The idea is that the dtx is read in by a driver file, which actually does the typesetting. When the dtx is read in this way, the comment characters are ignored, meaning that what DocStrip sees as comments are the source for typesetting the dtx. In practice, most dtx files are written so that the driver is part of the dtx itself. The driver part of a dtx is normally very simple

\documentclass{ltxdoc}
% Perhaps some \usepackage instructions
\begin{document}
  \DocInput{\jobname.dtx}
\end{document}

Usually, the documentation and code then follows after \end{document}: with some correctly placed %\iffalse%\fi constructions, the driver part is then skipped and the documentation and code is typeset.

In the documentation part, the usual LaTeX mark-up can be used, with a few additional macros.

  • \cs{<name>} is used to print a function name, including the leading backslash, in a fixed-width font (and avoiding any category-code issues with the backslash).
  • \meta{<argument>} prints the name of an argument surrounded by angle brackets and printed in italic, so it stands out (as I’ve tried to do here using HTML!).
  • \marg{<argument>} and \oarg{<argument>} print mandatory and optional arguments as ‘{<argument>}’ and ‘[<argument>]’, respectively.
  • \DescribeMacro <csname> prints the argument name as a marginal note, and includes it for indexing and cross-referencing.

In the code section, the code itself is marked off from the documentation both by comment characters (for DocStrip), and some macros for ltxdoc:

%\begin{macro}{\MyMacro}
% This is some text which should explain the code.
%    \begin{macrocode}
\newcommand*\MyMacro[1]{Code here!}
%    \end{macrocode}
%\end{macro}

The \begin{macro}\end{macro} block is used so that cross-references between the code and index work correctly. They are used to show that this block defines \MyMacro (rather than just using it). On the other hand, \begin{macrocode}\end{macrocode} tells ltxdoc where the code is, so that it prints correctly. For mainly historical reasons, there have to be exactly four spaces in % \end{macrocode}!

There are, again, several extra ideas that can be used in the code parts of ltxdoc. However, to try to explain all of them would be to make this post completely impossible to read!

Putting everything together

There is a lot to take in in the dtx format, and a quick survey of CTAN will show that there are lots of ways of using it (before you even look at other approaches). In the next post on this subject, I’m going to present the approach I’ve developed to producing dtx files (with a lot of ideas taken from others, in particular Will Robertson). The aim is to have a single source for the README, ins file, documentation and LaTeX code. We’ll see how it’s possible to do that, and to have the code extract by running tex my.dtx and to typeset the package using latex my.dtx.