# Active characters again

A while ago I wrote about avoiding active characters. The topic came up again recently in a question on the LaTeX3 mailing list, so I thought I’d revisit it here.

ε-TeX provides the primitive \scantokens, which re-reads a token list as if it were being input from a file, and so can be used to re-assign the category codes of (most of) that input. This can be used to make some characters in the input active, and then to swap them for something else. For example:

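```latex
\begingroup
  \catcode`\:=13\relax
  \gdef\example#1{%
    \begingroup
      \catcode`\:=13\relax
      \def:{[colon]}%
      % \scantokens inside \xdef needs \everyeof{\noexpand}, otherwise
      % TeX reports "File ended while scanning definition"; setting
      % \endlinechar to -1 stops a space being added at the end
      \everyeof{\noexpand}%
      \endlinechar=-1\relax
      \xdef\temp{\scantokens{#1}}%
    \endgroup
    \temp
  }
\endgroup
```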

This will replace every “:” in #1 with “[colon]”. As the work is done by the engine, it is pretty fast. Because the characters are only made active locally, it also looks safe. However, I’ve found that this is not necessarily so. For example, siunitx (version 1) has a problem with htlatex under some circumstances, because both want to make ^ active in this way. The other problem is that making characters active in this way makes it impossible to “protect” any of them from replacement: every “:” in the input is changed.

The alternative is to look through the input for each “:” and replace the occurrences one at a time: in LaTeX3 this is done using \tl_replace_all_in:Nnn. At first sight this does not look desirable, as it is never going to be as fast as using TeX primitives. However, if the code is well written (and \tl_replace_all_in:Nnn certainly is), then there is no need to loop over every token to do the replacement: a macro with a delimited argument can grab everything up to the next “:” in a single step. Whatever code is used for the replacement, the key advantage is that there is no chance of a clash with other packages doing the same thing. It also leaves open the possibility of protecting some tokens from being changed. So I’d always favour avoiding active characters if at all possible.
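As a sketch of how this looks with expl3 today, the following defines a hypothetical \replacecolons command that does the same job as \example above without making anything active. It assumes a current expl3, in which the function is called \tl_replace_all:Nnn rather than \tl_replace_all_in:Nnn. One catch: under \ExplSyntaxOn a typed “:” is a letter, while the “:” in ordinary document text has category code 12 (other), so the search string is taken from \c_colon_str instead.

```latex
\documentclass{article}
\usepackage{expl3} % part of the kernel since LaTeX 2020-02; harmless to load
\ExplSyntaxOn
% \replacecolons is a hypothetical name used only for this sketch
\cs_new_protected:Npn \replacecolons #1
  {
    \tl_set:Nn \l_tmpa_tl {#1}
    % \c_colon_str holds a catcode-12 ":"; \exp_args:NNo expands it once
    % so that the search string matches colons typed in the document
    \exp_args:NNo \tl_replace_all:Nnn \l_tmpa_tl { \c_colon_str } { [colon] }
    \tl_use:N \l_tmpa_tl
  }
\ExplSyntaxOff
\begin{document}
\replacecolons{a:b:c}
\end{document}
```

Here \replacecolons{a:b:c} typesets “a[colon]b[colon]c”. Since nothing is made active, a “:” that some other package has already made active (as some babel languages do) simply does not match and is left alone.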

### 9 thoughts on “Active characters again”

1. “So I’d always favour avoiding active characters, if at all possible.” – so true… Unfortunately, there’s no (portable) alternative to inputenc (yet)…

2. leo

Hi Joseph,

I got a macro as follows:

```latex
\begingroup
\catcode`\|=\active
\gdef|{\tabularnewline}
\endgroup
\newrobustcmd\multiline[2][c]{
  \begingroup
  \catcode`\|=\active
  \setlength{\extrarowheight}{0pt}
  \begin{tabular}{@{}#1@{}}
  \scantokens{#2}
  \end{tabular}
  \endgroup
}
```

How can I make this macro safer? Thanks.

Leo

3. Joseph Wright

Hello Marcin,

I was mainly looking at code-level stuff here. For user input, I tend to think that XeTeX and LuaTeX are both much better choices than trying to make UTF-8 work with pdfTeX. LuaTeX is pretty reasonable for general work now, although the lack of higher-level LaTeX support is a bit of a pain (of course, if you use ConTeXt all is well).

Joseph

4. Joseph Wright

Hello Leo,

If you are only talking about code you use, then the problem is less pressing: the real troubles start when you write code other people use.

As I explained in my post, if you’re happy to load expl3, then \tl_replace_all_in:Nnn would seem easiest:

```latex
\newrobustcmd\multiline[2][c]{%
  \setlength{\extrarowheight}{0pt}%
  \begin{tabular}{@{}#1@{}}
  \def\temp{#2}%
  \csname\detokenize{tl_replace_all_in:Nnn}\endcsname
    \temp{|}{\tabularnewline}%
  \temp
  \end{tabular}
}
```

I’ve stuck with “traditional” category codes here, hence the \csname construction for calling \tl_replace_all_in:Nnn (the \detokenize avoids any issue if _ or : are active). If you want to avoid loading expl3, then it’s a question of implementing search-and-replace yourself. The expl3 version is efficient but quite intricate!

Joseph
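For anyone trying this today, a complete test document is sketched below. It assumes the etoolbox package (for \newrobustcmd) and the array package (for \extrarowheight), and it uses \tl_replace_all:Nnn, the name under which current expl3 provides this function; \tl_replace_all_in:Nnn is the older name used above.

```latex
\documentclass{article}
\usepackage{array}    % provides \extrarowheight
\usepackage{etoolbox} % provides \newrobustcmd
\usepackage{expl3}    % part of the kernel since LaTeX 2020-02; harmless to load
\newrobustcmd\multiline[2][c]{%
  \setlength{\extrarowheight}{0pt}%
  \begin{tabular}{@{}#1@{}}
  % Store the argument, then swap each "|" for \tabularnewline using the
  % current expl3 name, reached via \csname as in the code above
  \def\temp{#2}%
  \csname\detokenize{tl_replace_all:Nnn}\endcsname
    \temp{|}{\tabularnewline}%
  \temp
  \end{tabular}%
}
\begin{document}
\multiline{First line|Second line}
\end{document}
```

The optional argument sets the column type, so \multiline[l]{...} would left-align the rows instead.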

5. leo

Hi Joseph,

Thank you for that answer. Unfortunately I don’t plan to use expl3. I will look at it when I become more comfortable with TeX and LaTeX.

Leo

6. Joseph Wright

Hello Leo,

Assuming you have a recent pdfTeX (or XeTeX), then the following implements the same idea as \tl_replace_all_in:Nnn but without expl3:

```latex
\documentclass{article}
\makeatletter
\newtoks\replace@toks
\newcommand\replace@all@in[3]{%
  \replace@toks{}%
  \long\def\replace@all@aux##1#2##2\@nil{%
    \if@no@value{##2}%
      {%
        \replace@toks\expandafter\expandafter\expandafter
          {\expandafter\the\expandafter\replace@toks##1}%
      }%
      {%
        \replace@toks\expandafter\expandafter\expandafter
          {\expandafter\the\expandafter\replace@toks##1#3}%
        \replace@all@aux\@empty##2\@nil
      }%
  }%
  \@firstofone{\expandafter\replace@all@aux\expandafter\@empty}%
    #1#2\no@value\@nil
  \edef#1{\the\replace@toks}%
}
\newcommand\replace@all@aux{}
\newcommand\if@no@value[1]{%
  \ifnum\pdfstrcmp{\noexpand\no@value}{\unexpanded{#1}}=\z@\relax
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi
}
\makeatother
\begin{document}
\makeatletter
\def\test{Hello|world}
\replace@all@in\test{|}{ }
\test
\makeatother
\end{document}
```


This relies on \pdfstrcmp. If it’s not available, then some more code is needed for the comparison test (to do it safely, at least).

Joseph

7. Joseph Wright

I should add that expl3 includes some more refinements to that code, mainly to do with # tokens. However, the principle is obvious.
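Where \pdfstrcmp is not available (LuaTeX, for example), one possible approach for the comparison test in the replacement code above, sketched here as an assumption rather than as part of that code, is to let the pdftexcmds package supply \pdf@strcmp, which it defines for pdfTeX, XeTeX and LuaTeX alike. The \if@no@value definition would then become:

```latex
% Preamble: load pdftexcmds, then use this definition in place of the
% original \if@no@value (everything else stays the same)
\usepackage{pdftexcmds}% defines \pdf@strcmp on pdfTeX, XeTeX and LuaTeX
\makeatletter
\newcommand\if@no@value[1]{%
  % Compare the string form of #1 with that of the \no@value marker
  \ifnum\pdf@strcmp{\noexpand\no@value}{\unexpanded{#1}}=\z@
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi
}
\makeatother
```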

8. leo

Thank you, Joseph. I will come back to that code later when I finish my task at hand.

Best,

Leo

9. Joseph Wright

No problem Leo: I hope it’s useful.