Some TeX Developments

Coding in the TeX world

Active characters again

A while ago I wrote about avoiding active characters. There was a question on the LaTeX3 mailing list recently, where this came up again. So I thought I’d talk about it again here.

ε-TeX provides the primitive \scantokens, which can be used to re-assign the category codes of (most) input. This can be used to make some tokens in the input active, and then swap them for something else. For example:

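\begingroup
  \catcode`\:=13\relax
  \gdef\example#1{%
    \begingroup
      \catcode`\:=13\relax
      \def:{[colon]}%
      \endlinechar=-1\relax % no stray space at the end of the rescanned text
      \everyeof{\noexpand}% stop the \xdef failing at the end of the pseudo-file
      \xdef\temp{\scantokens{#1}}%
    \endgroup
    \temp
  }
\endgroup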

This will replace every “:” in #1 with “[colon]”. As this is done by the engine, it is pretty fast. With the characters only made active locally, it also looks safe. However, I’ve found that this does not necessarily follow. For example, in siunitx (version 1), there is a problem using htlatex under some circumstances because both want to make ^ active in this way. The other problem is that making characters active in this way makes it impossible to “protect” them from replacement.
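The point about protection can be seen in a small self-contained sketch. The names \demo and \result are hypothetical and used only for illustration, and the \endlinechar and \everyeof settings are assumptions added here so that \scantokens can be used safely inside \xdef. Because \scantokens re-reads the whole argument, wrapping part of it in braces does not shield the colon inside:

\begingroup
  \catcode`\:=13\relax
  \gdef\demo#1{% hypothetical name, for illustration only
    \begingroup
      \catcode`\:=13\relax
      \def:{[colon]}%
      \endlinechar=-1\relax % no stray space at the end of the rescanned text
      \everyeof{\noexpand}% let the \xdef get past the end of the pseudo-file
      \xdef\result{\scantokens{#1}}%
    \endgroup
    \result
  }
\endgroup
\demo{a:b {c:d}}% both colons are replaced, including the braced one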

The alternative is to look through the input for each “:” and replace it one at a time: this is done in LaTeX3 using \tl_replace_all_in:Nnn. At first sight, this does not look desirable as it is never going to be as fast as using TeX primitives. However, if the code is well written (and \tl_replace_all_in:Nnn certainly is), then there is no need to loop over every token to do the replacement. Whatever code is used for the replacement, the key advantage is that there is no chance of a clash with different packages doing the same thing. It also leaves open the possibility of protecting some tokens from being changed. So I’d always favour avoiding active characters, if at all possible.
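A minimal sketch of the token-list approach follows, assuming a current expl3 release, where the replacement function is, as far as I know, called \tl_replace_all:Nnn (the name used in this post is the 2009 one). The names \demo_replace_bars:n, \l__demo_input_tl and \replacebars are hypothetical. The example uses “|” rather than “:” because “:” is a letter inside \ExplSyntaxOn and so would not match a colon typed in the document body:

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Hypothetical names, used only for this sketch
\tl_new:N \l__demo_input_tl
\cs_new_protected:Npn \demo_replace_bars:n #1
  {
    \tl_set:Nn \l__demo_input_tl {#1}
    % Search the stored tokens for | and swap each one found for [bar]
    \tl_replace_all:Nnn \l__demo_input_tl { | } { [bar] }
    \tl_use:N \l__demo_input_tl
  }
\cs_new_eq:NN \replacebars \demo_replace_bars:n
\ExplSyntaxOff
\begin{document}
\replacebars{a|b {c|d}}
\end{document}

No category codes are changed at any point, so nothing here can clash with another package that makes “|” active. The search works with delimited arguments, so material inside a brace group should be left alone, which gives a simple way to protect part of the input.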

Written by Joseph Wright

December 9th, 2009 at 7:19 pm

Posted in LaTeX

9 Responses to 'Active characters again'

  1. “So I’d always favour avoiding active characters, if at all possible.” – so true… Unfortunately, there’s no (portable) alternative to inputenc (yet)…

    Marcin

    10 Dec 09 at 12:22 am

  2. Hi Joseph,

    I got a macro as follows:

    \begingroup
    \catcode`\|=\active
    \gdef|{\tabularnewline}
    \endgroup
    \newrobustcmd\multiline[2][c]{
    \begingroup
    \catcode`\|=\active
    \setlength{\extrarowheight}{0pt}
    \begin{tabular}{@{}#1@{}}
    \scantokens{#2}
    \end{tabular}
    \endgroup
    }

    How to make this macro safer? thanks.

    Leo

    leo

    10 Dec 09 at 3:01 am

  3. Hello Marcin,

    I was mainly looking at code-level stuff here. For user input, I tend to think that either XeTeX or LuaTeX is a much better choice than trying to make UTF-8 work with pdfTeX. LuaTeX is pretty reasonable for general work now, although the lack of higher-level LaTeX support is a bit of a pain (of course, if you use ConTeXt all is well).

    Joseph

    Joseph Wright

    10 Dec 09 at 9:09 pm

  4. Hello Leo,

    If you are only talking about code you use, then the problem is less pressing: the real troubles start when you write code other people use.

    As I explained in my post, if you’re happy to load expl3, then \tl_replace_all_in:Nnn would seem easiest:

    \newrobustcmd\multiline[2][c]{%
    \setlength{\extrarowheight}{0pt}%
    \begin{tabular}{@{}#1@{}}
    \def\temp{#2}%
    \csname\detokenize{tl_replace_all_in:Nnn}\endcsname
    \temp{|}{\tabularnewline}%
    \temp
    \end{tabular}
    }

    I’ve stuck with “traditional” category codes here, hence the \csname construction for calling \tl_replace_all_in:Nnn (the \detokenize avoids any issue if _ or : are active). If you want to avoid loading expl3, then it’s a question of implementing search-and-replace yourself. The expl3 version is efficient but quite intricate!

    Joseph

    Joseph Wright

    10 Dec 09 at 9:18 pm

  5. Hi Joseph,

    Thank you for that answer. Unfortunately I don’t plan to use expl3. I will look at it when I become more comfortable with TeX and LaTeX.

    Leo

    leo

    11 Dec 09 at 12:59 pm

  6. Hello Leo,

    Assuming you have a recent pdfTeX (or XeTeX), then the following implements the same idea as \tl_replace_all_in:Nnn but without expl3:

    \documentclass{article}
    \makeatletter
    \newtoks\replace@toks
    \newcommand\replace@all@in[3]{%
      \replace@toks{}%
      \long\def\replace@all@aux##1#2##2\@nil{%
        \if@no@value{##2}%
          {%
            \replace@toks\expandafter\expandafter\expandafter
              {\expandafter\the\expandafter\replace@toks##1}%
          }%
          {%
            \replace@toks\expandafter\expandafter\expandafter
              {\expandafter\the\expandafter\replace@toks##1#3}%
            \replace@all@aux\@empty##2\@nil
          }%
      }%
      \@firstofone{\expandafter\replace@all@aux\expandafter\@empty}%
      #1#2\no@value\@nil
      \edef#1{\the\replace@toks}%
    }
    \newcommand\replace@all@aux{}
    \newcommand\if@no@value[1]{%
      \ifnum\pdfstrcmp{\noexpand\no@value}{\unexpanded{#1}}=\z@\relax
        \expandafter\@firstoftwo
      \else
        \expandafter\@secondoftwo
      \fi
    }
    \makeatother
    \begin{document}
    \makeatletter
    \def\test{Hello|world}
    \replace@all@in\test{|}{ }
    \test
    \makeatother
    \end{document}
    

    This relies on \pdfstrcmp. If it’s not available, then some more code is needed for the comparison test (to do it safely, at least).

    Joseph

    Joseph Wright

    11 Dec 09 at 1:44 pm

  7. I should add that expl3 includes some more refinements to that code, mainly to do with # tokens. However, the principle is clear.

    Joseph Wright

    11 Dec 09 at 6:17 pm

  8. Thank you, Joseph. I will come back to that code later when I finish my task at hand.

    Best,

    Leo

    leo

    13 Dec 09 at 7:32 pm

  9. No problem Leo: I hope it’s useful.

    Joseph Wright

    16 Dec 09 at 5:25 pm
