Active characters again

A while ago I wrote about avoiding active characters. There was a question on the LaTeX3 mailing list recently, where this came up again. So I thought I’d talk about it again here.

ε-TeX provides the primitive \scantokens, which can be used to re-assign the category codes of (most) input. This can be used to make some tokens in the input active, and then swap them for something else. For example:

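\begingroup
  \catcode`\:=13\relax % active while \example is defined, so "\def:" below sees an active colon
  \gdef\example#1{%
    \begingroup
      \catcode`\:=13\relax
      \def:{[colon]}%
      \everyeof{\noexpand}% stop the end of the pseudo-file ending the \xdef early
      \endlinechar=-1\relax % no stray space from the end of the pseudo-file
      \xdef\temp{\scantokens{#1}}%
    \endgroup
    \temp
  }
\endgroup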

This will replace every “:” in #1 with “[colon]”. As this is done by the engine, it is pretty fast. With the characters only made active locally, it also looks safe. However, I’ve found that this does not necessarily follow. For example, in siunitx (version 1), there is a problem using htlatex under some circumstances because both want to make ^ active in this way. The other problem is that making characters active in this way makes it impossible to “protect” them from replacement.

The alternative is to look through the input for each “:” and replace it one at a time: this is done in LaTeX3 using \tl_replace_all_in:Nnn. At first sight, this does not look desirable as it is never going to be as fast as using TeX primitives. However, if the code is well written (and \tl_replace_all_in:Nnn certainly is), then there is no need to loop over every token to do the replacement. Whatever code is used for the replacement, the key advantage is that there is no chance of a clash with different packages doing the same thing. It also leaves open the possibility of protecting some tokens from being changed. So I’d always favour avoiding active characters, if at all possible.
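
As a rough sketch of the token-list approach (my illustration here, not code from the mailing list discussion): in current expl3 releases the function is, as far as I know, called \tl_replace_all:Nnn rather than \tl_replace_all_in:Nnn. The names \l_example_tl and \example are made up for this example. One detail to watch is that “:” is a letter inside \ExplSyntaxOn, so the search text is taken from \c_colon_str, which holds a colon with the normal “other” category code.

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Hypothetical scratch variable for this sketch
\tl_new:N \l_example_tl
% Does nothing if the NVn variant is already provided by the kernel
\cs_generate_variant:Nn \tl_replace_all:Nnn { NVn }
\cs_new_protected:Npn \example #1
  {
    \tl_set:Nn \l_example_tl {#1}
    % \c_colon_str supplies a catcode-12 colon to search for
    \tl_replace_all:NVn \l_example_tl \c_colon_str { [colon] }
    \tl_use:N \l_example_tl
  }
\ExplSyntaxOff
\begin{document}
\example{a:b:c}% should typeset a[colon]b[colon]c
\end{document}

No category codes are changed here, so there is nothing for another package to clash with. Since the search works by delimited-argument matching, I’d also expect a colon inside a brace group to be left alone, which is one way to protect it.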

9 thoughts on “Active characters again”

  1. “So I’d always favour avoiding active characters, if at all possible.” – so true… Unfortunately, there’s no (portable) alternative to inputenc (yet)…

  2. Hi Joseph,

    I got a macro as follows:

    \begingroup
    \catcode`\|=\active
    \gdef|{\tabularnewline}
    \endgroup
    \newrobustcmd\multiline[2][c]{
    \begingroup
    \catcode`\|=\active
    \setlength{\extrarowheight}{0pt}
    \begin{tabular}{@{}#1@{}}
    \scantokens{#2}
    \end{tabular}
    \endgroup
    }

    How to make this macro safer? thanks.

    Leo

  3. Hello Marcin,

    I was mainly looking at code-level stuff here. For user input, I tend to think that either XeTeX or LuaTeX are much better choices than trying to make UTF-8 work with pdfTeX. LuaTeX is pretty reasonable for general work now, although the lack of higher level LaTeX support is a bit of a pain (of course, if you use ConTeXt all is well).

    Joseph

  4. Hello Leo,

    If you are only talking about code you use, then the problem is less pressing: the real troubles start when you write code other people use.

    As I explained in my post, if you’re happy to load expl3, then \tl_replace_all_in:Nnn would seem easiest:

    \newrobustcmd\multiline[2][c]{%
    \setlength{\extrarowheight}{0pt}%
    \begin{tabular}{@{}#1@{}}
    \def\temp{#2}%
    \csname\detokenize{tl_replace_all_in:Nnn}\endcsname
    \temp{|}{\tabularnewline}%
    \temp
    \end{tabular}
    }

    I’ve stuck with “traditional” category codes here, hence the \csname construction for calling \tl_replace_all_in:Nnn (the \detokenize avoids any issue if _ or : are active). If you want to avoid loading expl3, then it’s a question of implementing search-and-replace yourself. The expl3 version is efficient but quite intricate!

    Joseph

  5. Hi Joseph,

    Thank you for that answer. Unfortunately I don’t plan to use expl3. I will look at it when I become more comfortable with TeX and LaTeX.

    Leo

  6. Hello Leo,

    Assuming you have a recent pdfTeX (or XeTeX), then the following implements the same idea as \tl_replace_all_in:Nnn but without expl3:

    \documentclass{article}
    \makeatletter
    \newtoks\replace@toks
    \newcommand\replace@all@in[3]{%
      \replace@toks{}%
      \long\def\replace@all@aux##1#2##2\@nil{%
        \if@no@value{##2}%
          {%
            \replace@toks\expandafter\expandafter\expandafter
              {\expandafter\the\expandafter\replace@toks##1}%
          }%
          {%
            \replace@toks\expandafter\expandafter\expandafter
              {\expandafter\the\expandafter\replace@toks##1#3}%
            \replace@all@aux\@empty##2\@nil
          }%
      }%
      \@firstofone{\expandafter\replace@all@aux\expandafter\@empty}%
      #1#2\no@value\@nil
      \edef#1{\the\replace@toks}%
    }
    \newcommand\replace@all@aux{}
    \newcommand\if@no@value[1]{%
      \ifnum\pdfstrcmp{\noexpand\no@value}{\unexpanded{#1}}=\z@\relax
        \expandafter\@firstoftwo
      \else
        \expandafter\@secondoftwo
      \fi
    }
    \makeatother
    \begin{document}
    \makeatletter
    \def\test{Hello|world}
    \replace@all@in\test{|}{ }
    \test
    \makeatother
    \end{document}
    

    This relies on \pdfstrcmp. If it’s not available, then some more code is needed for the comparison test (to do it safely, at least).

    Joseph

  7. I should add that expl3 includes some more refinements to that code, mainly to do with # tokens. However, the principle should be clear.
