What does \write18 mean?

I recently talked about converting eps files to pdf format, and mentioned that to do it from within TeX, you need \write18 enabled. However, I failed to say what that means: probably not very helpful.

The TeX \write primitive is used to write to different file ‘streams’; TeX refers to each open file by number, not by name (although most of the time we hide this). Stream 18 is special: it is not a file at all, but means that TeX asks the operating system to do something. To run a command, we put it as the argument to \write18. So to run the epstopdf program on a file with name stored as \epsfilename, we’d do:

\write18{epstopdf \epsfilename}

When using something like the epstopdf LaTeX package, that is hidden away and you don’t need to worry about the exact way it’s done. I’m not going to worry about the detail of the \write instruction here!
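For anyone who does want to try it by hand, a minimal sketch might look like the following. The file name figure.eps is made up for illustration, and the document only works if shell escape is allowed for the run. The \immediate prefix is my addition: without it, TeX holds the \write back until the current page is shipped out, which would be too late for the \includegraphics line.

```latex
\documentclass{article}
\usepackage{graphicx}
% Hypothetical file name, stored in a macro as in the text
\newcommand*\epsfilename{figure.eps}
\begin{document}
% Ask the operating system to run epstopdf straight away;
% by default this writes figure.pdf next to figure.eps
\immediate\write18{epstopdf \epsfilename}
\includegraphics{figure.pdf}
\end{document}
```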

However, there is a security issue. If you download some TeX code from the Internet, can you be sure that there is not some command in it (perhaps in a hidden way) to do stuff that might be harmful to your PC (let’s say delete everything on the hard disk)? So both MiKTeX and TeX Live have traditionally disabled \write18 as standard. To turn it on, both support an additional argument when starting TeX:

(pdf)(la)tex --shell-escape

The problem with this is that most people use (La)TeX via a graphical editor, and each one needs the correct settings changing to use \write18 for every file. It’s also not such a great idea to turn it on for everything: it rather defeats the point of it being off by default!

The latest version of MiKTeX (2.8), and the upcoming TeX Live release (2009) get around this by having a special ‘limited’ version of \write18 enabled ‘out of the box’. The idea is to allow only a pre-set list of commands (for example, BibTeX, epstopdf, TeX itself, and so on). Those on the list are regarded as safe enough to allow, whereas anything else (for example deleting files) still needs to be authorised by the user. This seems to be a good balance: most people most of the time will not need to worry about \write18 at all, but it will be available for things like epstopdf.
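If you want to see which of the three states you are in, a small test document can report it. This is only a sketch, and it assumes pdfTeX is the engine: \pdfshellescape is a pdfTeX primitive (0 for disabled, 1 for fully enabled, 2 for the restricted list), and other engines expose the same information in different ways.

```latex
\documentclass{article}
\begin{document}
% \pdfshellescape expands to 0, 1 or 2 in pdfTeX
\ifcase\pdfshellescape
  \string\write18 is disabled.%
\or
  \string\write18 is fully enabled.%
\or
  \string\write18 is enabled in restricted mode.%
\fi
\end{document}
```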

A model dtx file

In my previous post, I’ve tried to give a very general overview of how the dtx file format comes about, from a combination of the syntax of DocStrip and ltxdoc. The problem with the bald details is that there are still lots of ways to actually use the ideas to construct a dtx. So here I’m going to detail a model dtx, which is ready to be filled in with real code and documentation. The entire file is available here as demopkg.dtx: get it now if you are impatient!

The idea of constructing a dtx file in the way I’ll describe is that it lets us achieve several things in one go:

  • All of the files for a package can be derived from a single source (unless you need a binary, of course).
  • The README is included in the dtx, with this useful information at the start.
  • The ins file is included in the dtx, so the file is self-extracting.
  • Running (pdf)tex <name>.dtx extracts the code and associated files (ins, README, etc.).
  • Running (pdf)latex <name>.dtx does the extraction then typesets the documentation. This way, the documentation always has the latest code available, and users don’t need to worry about which method they use to get stuff extracted.

Most of the ideas here are not mine: Will Robertson came up with a lot of this. I’m just going to give some details of what is going on. I’m going to present the source in order, with a section of the source followed by some comments explaining what is going on. I’m going to call the demonstration package ‘demopkg’: something easy for search and replace. Wherever possible, \jobname is used in the source so that the file name changes automatically when moving from one package to another.

% \iffalse meta-comment
% !TEX program  = pdfLaTeX

The file starts off with an \iffalse which will mean that ltxdoc will skip all of this code when typesetting the document. I use TeXworks as my editor, so I include the special !TEX program comment so that it defaults to pdfLaTeX with all of my files: this does no harm so may as well be there. The same comment is also recognised by TeXShop.

%<*internal>
\iffalse
%</internal>

There is then a guard called ‘internal’: this is never extracted out, but lets us have an uncommented \iffalse in the code, which will mean that the next section will be ignored by TeX initially. The idea here is that we are going to have some text (the README) that TeX would otherwise try to typeset. We don’t want that, so need to skip it at the moment.

%<*readme>
----------------------------------------------------------------
demopkg --- description text
E-mail: you@your.domain
Released under the LaTeX Project Public License v1.3c or later
See http://www.latex-project.org/lppl.txt
----------------------------------------------------------------

Some text about the package: probably the same as the abstract.
%</readme>

This part is pretty obvious: the README file for the package, inside guards called ‘readme’. As you might expect, this will get extracted out later as the README file. In the initial TeX run, this text will be skipped (because of the \iffalse), but when DocStrip runs it will show up (as DocStrip will ignore the \iffalse, which is in a different set of guards).

%<*internal>
\fi
\def\nameofplainTeX{plain}
\ifx\fmtname\nameofplainTeX\else
  \expandafter\begingroup
\fi
%</internal>

Back with the special ‘internal’ guards, the \iffalse is ended and a check is made on the current format. For LaTeX, a group needs to be begun so that DocStrip can be loaded without later problems. For plain TeX, only the extraction is going to happen, so that is not an issue.

%<*install>
\input docstrip.tex
\keepsilent
\askforoverwritefalse

The next section, inside ‘install’ guards, is the instructions for extracting the code out of the dtx. Later, this will also turn into a stand-alone ins file. DocStrip gets loaded, then we tell it to do its job without asking for any confirmation or printing too much stuff.

\preamble
----------------------------------------------------------------
demopkg --- description text
E-mail: you@your.domain
Released under the LaTeX Project Public License v1.3c or later
See http://www.latex-project.org/lppl.txt
----------------------------------------------------------------

\endpreamble
\postamble

Copyright (C) 2009 by You <you@your.domain>

This work may be distributed and/or modified under the
conditions of the LaTeX Project Public License (LPPL), either
version 1.3c of this license or (at your option) any later
version.  The latest version of this license is in the file:

http://www.latex-project.org/lppl.txt

This work is "maintained" (as per LPPL maintenance status) by
You.

This work consists of the file  demopkg.dtx
and the derived files           demopkg.ins,
                                demopkg.pdf and
                                demopkg.sty.

\endpostamble

Some simple boiler-plate text, that DocStrip will add to the start and end of each extracted file. Of course, this can say what you like.

\usedir{tex/latex/demopkg}
\generate{
  \file{\jobname.sty}{\from{\jobname.dtx}{package}}
}

This section is the instruction to actually extract the LaTeX package file from the dtx. Each file to be extracted needs a line saying how to create it, so if there is a class to extract there would be a line for that, and so on. The \usedir instruction can be used to tell DocStrip how to lay files out: it is best to include it as some people use this. Normally, it will just specify tex/latex/<package>, but might change if there are lots of files to lay out in a structured way. For example, cfg files are often put in tex/latex/<package>/config.
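As a sketch of how this might grow, suppose the same dtx also carried a class and a configuration file. The guard names ‘class’ and ‘config’ are made up here, and the dtx would need matching guarded blocks for them; the layout follows the tex/latex/<package>/config convention mentioned above.

```latex
\usedir{tex/latex/demopkg}
\generate{
  \file{\jobname.sty}{\from{\jobname.dtx}{package}}
  \file{\jobname.cls}{\from{\jobname.dtx}{class}}
}
\usedir{tex/latex/demopkg/config}
\generate{
  \file{\jobname.cfg}{\from{\jobname.dtx}{config}}
}
```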

%</install>
%<install>\endbatchfile

That ends what will get extracted into the ins file, so the install guard is closed. The second line is needed as the ins file needs to include \endbatchfile (for DocStrip), but we don’t want the same effect when the dtx is doing the extracting.

%<*internal>
\usedir{source/latex/demopkg}
\generate{
  \file{\jobname.ins}{\from{\jobname.dtx}{install}}
}
\nopreamble\nopostamble
\usedir{doc/latex/demopkg}
\generate{
  \file{README.txt}{\from{\jobname.dtx}{readme}}
}
\ifx\fmtname\nameofplainTeX
  \expandafter\endbatchfile
\else
  \expandafter\endgroup
\fi
%</internal>

When extracting the dtx (with TeX or LaTeX), we need to generate the ins file and the README, which is done here. The ins file is quite simple: it uses the same process as the sty file. However, there are a couple of points about the README. First, we don’t want DocStrip to add any extra text, hence \nopreamble and \nopostamble. Second, DocStrip can only make files with extensions, so the file has to be called README.txt. (It can be renamed later: hopefully there is no loss of clarity.) If plain TeX is in use, that is the end of the processing, whereas for LaTeX the group containing DocStrip can be closed.

%<*package>
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{demopkg}[2009/10/06 v1.0 description text]
%</package>

Next, we use the fact that DocStrip can collect blocks from different places in the source into the same file. This part of the package does not really need to be printed later on, and done this way the version number is included near the top of the source. Things don’t have to be done this way: this section can always be left out if you like.

%<*driver>
\documentclass{ltxdoc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{\jobname}
\usepackage[numbered]{hypdoc}
\EnableCrossrefs
\CodelineIndex
\RecordChanges
\begin{document}
  \DocInput{\jobname.dtx}
\end{document}
%</driver>

The next block is the driver: this is the information used to typeset the code and documentation. I normally load the package I’m talking about so that I can use it in the documentation, and load a few refinements (modern fonts, hypdoc to get hyperlinks, and so on). There are a few ltxdoc-specific instructions here: they mean that we get a proper index and information linking macro use information to the code.

% \fi

This matches the \iffalse in the very first line of the file: it marks the beginning of material which will actually be typeset.

%
%\GetFileInfo{\jobname.sty}
%
%\title{^^A
%  \textsf{demopkg} --- description text\thanks{^^A
%    This file describes version \fileversion, last revised \filedate.^^A
%  }^^A
%}
%\author{^^A
%  You\thanks{E-mail: you@your.domain}^^A
%}
%\date{Released \filedate}
%
%\maketitle
%

Here, the title is set up and printed. A few things to notice here. By using \GetFileInfo, the version and date information are picked up from the package itself: no repetition of the information is needed in the dtx. Also, we can’t use % as a comment character, and so ltxdoc sets up ^^A to do the job instead.
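To make that concrete, here is what the three macros hold for the \ProvidesPackage line used earlier in this file. The values in the comments simply follow from that line.

```latex
% Given, in the package block:
%   \ProvidesPackage{demopkg}[2009/10/06 v1.0 description text]
% then after
\GetFileInfo{demopkg.sty}
% the macros are set as follows:
%   \filedate    -> 2009/10/06
%   \fileversion -> v1.0
%   \fileinfo    -> description text
```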

%\changes{v1.0}{2009/10/06}{First public release}

General changes (not associated with any particular macro) are best listed somewhere early on. These will be used by the \PrintChanges macro to provide users with a change log.

%
%\DescribeMacro{\examplemacro}
% Some text about an example macro called \cs{examplemacro}, which
% might have an optional argument \oarg{arg1} and mandatory one
% \marg{arg2}.
%

This is where the documentation goes. I’ve included an example macro with a couple of arguments as reminders of the syntax.

%\StopEventually{^^A
%  \PrintChanges
%  \PrintIndex
%}
%

This macro marks the end of the user part of the documentation. The two functions in the argument will be used either here (if the code is not typeset) or after the code (if it is typeset). As the dtx file is now, the code will print. However, in the next blog post I’ll talk about printing only the documentation and missing the code out.

%    \begin{macrocode}
%<*package>
%    \end{macrocode}

The lead off for the package code itself opens the guard for extracting the code. Normally, I like to have this on its own, to remind me what is going on.

%
%\begin{macro}{\examplemacro}
%\changes{v1.0}{2009/10/06}{Some change from the previous version}
%    \begin{macrocode}
\newcommand*\examplemacro[2][]{%
  Some code here, probably
}
%    \end{macrocode}
%\end{macro}
%

Here we have some code: separated out using the macrocode environment. As I described in the last post, the \begin{macro}\end{macro} block indicates that this is where \examplemacro is defined: indexing needs to know this. The \changes given in the code block only get printed if the code is typeset. They are therefore best used for low-level information, rather than usage changes that users need to know about.

%    \begin{macrocode}
%</package>
%    \end{macrocode}
%\Finale

The last part of the file: close the guard for the code, and call \Finale. This runs anything delayed from the earlier \StopEventually, so in this case the index will get printed here if the code is typeset.

The dtx format

A few comments and e-mails have prodded me to put down some thoughts on sorting out LaTeX packages. My original thought was to do one post, but I think a few are needed. So this is the first of a related series: probably three or four posts. I’m going to start with an overview of the DTX format, used for the source of a lot of LaTeX packages.

Why dtx at all?

The first question to consider is why people bother with the dtx format. Not everyone does, and there are good arguments for and against each different approach. I could easily write an entire post just discussing the various alternatives, but that’s not what I want to do here!

The dtx format is favoured by the LaTeX team, and so is something of a standard for LaTeX package authors. The idea of the dtx format is that it allows the package author to put the user documentation, code documentation and code itself in one place. The user documentation can be typeset on its own, or the user and code documentation can be typeset together (literate programming). The code can also be extracted from the source for use: this means that more than one file can be included in the same source. This last point is perhaps the biggest selling point of the dtx format: you can include LaTeX package, class and some configuration files in one source. There is also a speed gain from removing redundant (comment) lines from a package: on a modern PC, this is pretty tiny, but was a bigger point in the past.

How the format comes about

The dtx format is really defined by two mechanisms, provided by two separate pieces of (La)TeX code. The ability to take one source file and generate several production files, with the comments removed, is provided by DocStrip (which is written in plain TeX). Documenting the code, and providing user details, is supported by a dedicated LaTeX class: ltxdoc. (This is itself based on the doc package, but I’m going to focus on ltxdoc.) The combination of the syntax for the two parts of the mechanism leads to the dtx file format.

It is quite possible to use dtx files as the source for things other than (La)TeX files. Indeed, in my second post I’m going to use the dtx format to also include a plain text file in the source. However, I’m not going to talk about making changes to include other types of code: this is doable, but a bit advanced to go into here!

DocStrip: guards and extracting

The DocStrip TeX file provides a mechanism to do two related tasks:

  1. Remove the comment lines from a source file
  2. Produce several production files from one source

To do this, the source file itself (normally a .dtx file) needs to be accompanied by a set of instructions on how to do the extraction (normally a .ins file). The two tasks are inter-related: DocStrip always has to generate a new file to remove the comment lines, even if the source is only for one file.

Removing the comment lines is relatively easy to understand. Any line in the source starting with one % will not appear in the generated file(s). So code lines in the source are written as normal, and any comments that should appear in the generated files need to start with two (or more) % characters.
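A tiny made-up fragment shows the three cases side by side (the macro name \example is just for illustration):

```latex
% A single percent sign at the start of the line: removed by DocStrip.
%% Two percent signs: kept in the generated file as a comment.
\newcommand*\example{Code lines are copied as they are}
```

After extraction, the generated file would contain only the second and third lines.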

The more complex idea is setting up the source so that several files can be generated from one source. This uses so-called guards to indicate what goes with what. Let’s imagine we have a very simple source, which will be used to generate two LaTeX packages. The two packages share some code, so we don’t want to repeat it in the source file.

%<*PackageA>
Code just for package A
%</PackageA>
%<*PackageB>
Code just for package B
%</PackageB>
%<*PackageA|PackageB>
Code for both packages
%</PackageA|PackageB>

What is happening here? Each guard line is a comment (so it will not appear in the production file), and is enclosed in angle brackets. The first line, %<*PackageA>, is a guard starting lines that will only appear when extracting PackageA. That continues until the matching closing guard, %</PackageA>. There is then a section that applies only to PackageB, marked up in much the same way. The final set of guards use the | symbol to mean ‘or’: lines here will appear in both PackageA and PackageB. You can do more complex things (nest guards, use & as a logical ‘and’, etc.), but the basic idea remains the same.
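As a rough sketch of the ‘more complex things’, here are a few other guard forms. The option name ‘debug’ and the macro \OnlyInA are invented for illustration; the ! (‘not’) operator and the single-line guard form without * or / are standard DocStrip syntax that I have not described above.

```latex
%<*PackageA&debug>
Code that appears only when both PackageA and debug are requested
%</PackageA&debug>
%<*!PackageB>
Code for every file except PackageB
%</!PackageB>
%<PackageA>\newcommand*\OnlyInA{A single guarded line for PackageA}
```

The ‘debug’ block would only be picked up if the ins file asked for both options at once, for example with \from{example.dtx}{PackageA,debug}.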

I’ve said a couple of times that the code is extracted, and that this needs some instructions for DocStrip. Essentially, this means matching up the names of the guards with the files they relate to. In my simple example, a suitable DocStrip .ins file would be

\input docstrip
\askforoverwritefalse
\generate{
  \file{PackageA.sty}{\from{example.dtx}{PackageA}}
  \file{PackageB.sty}{\from{example.dtx}{PackageB}}
}

Here, I’ve assumed the dtx file is called example.dtx. I’m only using one dtx file, and one guard for each output file. As with many other parts of DocStrip, there is more you can do.

Multiple guards can be used for each package, so if we had lots of packages with some common code, we might well have:

\input docstrip
\askforoverwritefalse
\generate{
  \file{PackageA.sty}{\from{example.dtx}{PackageA,common}}
  \file{PackageB.sty}{\from{example.dtx}{PackageB,common}}
  \file{PackageC.sty}{\from{example.dtx}{PackageC,common}}
}

and so on.
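On the dtx side, a source to match that ins file might be laid out roughly like this: the shared code gets its own ‘common’ guard rather than an ever-growing PackageA|PackageB|PackageC expression.

```latex
%<*common>
Code shared by every package
%</common>
%<*PackageA>
Code just for package A
%</PackageA>
%<*PackageB>
Code just for package B
%</PackageB>
%<*PackageC>
Code just for package C
%</PackageC>
```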

So DocStrip lets us extract code out, have common sections, remove comments and so on. However, it doesn’t help with the documentation side at all: that is all just comments to DocStrip.

ltxdoc: documenting the source

The ltxdoc class is the typesetting part of the dtx format. The idea is that the dtx is read in by a driver file, which actually does the typesetting. When the dtx is read in this way, the comment characters are ignored, meaning that what DocStrip sees as comments are the source for typesetting the dtx. In practice, most dtx files are written so that the driver is part of the dtx itself. The driver part of a dtx is normally very simple:

\documentclass{ltxdoc}
% Perhaps some \usepackage instructions
\begin{document}
  \DocInput{\jobname.dtx}
\end{document}

Usually, the documentation and code then follows after \end{document}: with some correctly placed %\iffalse%\fi constructions, the driver part is then skipped and the documentation and code is typeset.
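One common arrangement, sketched here with a made-up ‘driver’ guard name and placeholder documentation text, looks like this. On the first pass the % \iffalse line is just a comment, so LaTeX reads the driver; when \DocInput reads the file again with the % characters ignored, the \iffalse is seen and everything up to the \fi is skipped.

```latex
% \iffalse
%<*driver>
\documentclass{ltxdoc}
\begin{document}
  \DocInput{\jobname.dtx}
\end{document}
%</driver>
% \fi
%
% \section{Introduction}
% The documentation proper starts here.
```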

In the documentation part, the usual LaTeX mark-up can be used, with a few additional macros.

  • \cs{<name>} is used to print a function name, including the leading backslash, in a fixed-width font (and avoiding any category-code issues with the backslash).
  • \meta{<argument>} prints the name of an argument surrounded by angle brackets and printed in italic, so it stands out (as I’ve tried to do here using HTML!).
  • \marg{<argument>} and \oarg{<argument>} print mandatory and optional arguments as ‘{<argument>}’ and ‘[<argument>]’, respectively.
  • \DescribeMacro <csname> prints the argument name as a marginal note, and includes it for indexing and cross-referencing.
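Put together in the documentation part of a dtx, these might be used along the following lines. The macro and argument names are invented for the example.

```latex
% \DescribeMacro{\MyMacro}
% The macro \cs{MyMacro} takes an optional argument \oarg{options}
% and a mandatory argument \marg{text}. The \meta{text} given is
% typeset as it is.
```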

In the code section, the code itself is marked off from the documentation both by comment characters (for DocStrip), and some macros for ltxdoc:

%\begin{macro}{\MyMacro}
% This is some text which should explain the code.
%    \begin{macrocode}
\newcommand*\MyMacro[1]{Code here!}
%    \end{macrocode}
%\end{macro}

The \begin{macro}\end{macro} block is used so that cross-references between the code and index work correctly. It shows that this block defines \MyMacro (rather than just using it). On the other hand, \begin{macrocode}\end{macrocode} tells ltxdoc where the code is, so that it prints correctly. For mainly historical reasons, there have to be exactly four spaces between the % and \end{macrocode}, as in the example above!

There are, again, several extra ideas that can be used in the code parts of ltxdoc. However, to try to explain all of them would be to make this post completely impossible to read!

Putting everything together

There is a lot to take in in the dtx format, and a quick survey of CTAN will show that there are lots of ways of using it (before you even look at other approaches). In the next post on this subject, I’m going to present the approach I’ve developed to producing dtx files (with a lot of ideas taken from others, in particular Will Robertson). The aim is to have a single source for the README, ins file, documentation and LaTeX code. We’ll see how it’s possible to do that, and to have the code extract by running tex my.dtx and to typeset the package using latex my.dtx.