Active characters again
A while ago I wrote about avoiding active characters. The topic came up again recently in a question on the LaTeX3 mailing list, so I thought I’d revisit it here.
ε-TeX provides the primitive \scantokens, which can be used to re-assign the category codes of (most) input. This can be used to make some tokens in the input active, and then swap them for something else. For example:
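\begingroup
\catcode`\:=13\relax % ":" must be active while \example is defined
\gdef\example#1{%
  \begingroup
  \catcode`\:=13\relax % ":" must also be active when #1 is re-read
  \def:{[colon]}%
  \everyeof{\noexpand}% stop the end-of-file marker breaking the \xdef
  \endlinechar=-1\relax % no stray space at the end of the re-read text
  \xdef\temp{\scantokens{#1}}%
  \endgroup
  \temp
}
\endgroup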
This will replace every “:” in #1 with “[colon]”. As this is done by the engine, it is pretty fast. With the characters only made active locally, it also looks safe. However, I’ve found that this does not necessarily follow. For example, in siunitx (version 1), there is a problem using htlatex under some circumstances because both want to make ^ active in this way. The other problem is that making characters active in this way makes it impossible to “protect” them from replacement.
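To make the “protection” point concrete, here is a minimal sketch of a test document. The \everyeof and \endlinechar settings are my own additions, on the assumption that they are needed for \scantokens to work inside \xdef. As \scantokens re-reads the whole argument, I would expect the colon inside the braces to be replaced just like the one outside: braces give no protection.

```latex
\documentclass{article}
\begingroup
\catcode`\:=13\relax
\gdef\example#1{%
  \begingroup
  \catcode`\:=13\relax
  \def:{[colon]}%
  \everyeof{\noexpand}% assumption: lets the \xdef survive end-of-file
  \endlinechar=-1\relax % assumption: avoids a trailing space
  \xdef\temp{\scantokens{#1}}%
  \endgroup
  \temp
}
\endgroup
\begin{document}
% Both colons are made active when the argument is re-read,
% so the braced one cannot be shielded from replacement.
\example{a:b {c:d}}
\end{document}
```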
The alternative is to look through the input for each “:” and replace it one at a time: this is done in LaTeX3 using \tl_replace_all_in:Nnn. At first sight, this does not look desirable as it is never going to be as fast as using TeX primitives. However, if the code is well written (and \tl_replace_all_in:Nnn certainly is), then there is no need to loop over every token to do the replacement. Whatever code is used for the replacement, the key advantage is that there is no chance of a clash with different packages doing the same thing. It also leaves open the possibility of protecting some tokens from being changed. So I’d always favour avoiding active characters, if at all possible.
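As a rough sketch of the search-and-replace approach: in current expl3 releases the function is called \tl_replace_all:Nnn, and I am assuming a LaTeX kernel from 2020-10 or later so that \ExplSyntaxOn and \NewDocumentCommand are available without loading anything. The names \demoreplace and \l_demo_text_tl are made up for the example. I have used “|” rather than “:” because a colon typed inside \ExplSyntaxOn code is a letter, and so would not match a colon in ordinary input.

```latex
\documentclass{article}
\ExplSyntaxOn
% \l_demo_text_tl and \demoreplace are hypothetical names for this sketch
\tl_new:N \l_demo_text_tl
\NewDocumentCommand \demoreplace { m }
  {
    \tl_set:Nn \l_demo_text_tl {#1}
    % Replace each "|" found at the top brace level with "[bar]"
    \tl_replace_all:Nnn \l_demo_text_tl { | } { [bar] }
    \tl_use:N \l_demo_text_tl
  }
\ExplSyntaxOff
\begin{document}
\demoreplace{a|b {c|d}}
\end{document}
```

The search works on the stored tokens and only looks at the top brace level, so the “|” inside the braces should be left alone. That is the kind of protection which is not possible with the active character approach.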
“So I’d always favour avoiding active characters, if at all possible.” – so true… Unfortunately, there’s no (portable) alternative to inputenc (yet)…
Marcin
10 Dec 09 at 12:22 am
Hi Joseph,
I have a macro as follows:
\begingroup
\catcode`\|=\active
\gdef|{\tabularnewline}
\endgroup
\newrobustcmd\multiline[2][c]{
\begingroup
\catcode`\|=\active
\setlength{\extrarowheight}{0pt}
\begin{tabular}{@{}#1@{}}
\scantokens{#2}
\end{tabular}
\endgroup
}
How can I make this macro safer? Thanks.
Leo
leo
10 Dec 09 at 3:01 am
Hello Marcin,
I was mainly looking at code-level stuff here. For user input, I tend to think that either XeTeX or LuaTeX is a much better choice than trying to make UTF-8 work with pdfTeX. LuaTeX is pretty reasonable for general work now, although the lack of higher-level LaTeX support is a bit of a pain (of course, if you use ConTeXt all is well).
Joseph
Joseph Wright
10 Dec 09 at 9:09 pm
Hello Leo,
If you are only talking about code you use, then the problem is less pressing: the real troubles start when you write code other people use.
As I explained in my post, if you’re happy to load expl3, then \tl_replace_all_in:Nnn would seem easiest:
\newrobustcmd\multiline[2][c]{%
\setlength{\extrarowheight}{0pt}%
\begin{tabular}{@{}#1@{}}
\def\temp{#2}%
\csname\detokenize{tl_replace_all_in:Nnn}\endcsname
\temp{|}{\tabularnewline}%
\temp
\end{tabular}
}
I’ve stuck with “traditional” category codes here, hence the \csname construction for calling \tl_replace_all_in:Nnn (the \detokenize avoids any issue if _ or : are active). If you want to avoid loading expl3, then it’s a question of implementing search-and-replace yourself. The expl3 version is efficient but quite intricate!
Joseph
Joseph Wright
10 Dec 09 at 9:18 pm
Hi Joseph,
Thank you for that answer. Unfortunately I don’t plan to use expl3. I will look at it when I become more comfortable with TeX and LaTeX.
Leo
leo
11 Dec 09 at 12:59 pm
Hello Leo,
Assuming you have a recent pdfTeX (or XeTeX), then the following implements the same idea as \tl_replace_all_in:Nnn but without expl3:
\documentclass{article}
\makeatletter
\newtoks\replace@toks
% \replace@all@in<macro>{<search>}{<replacement>}
\newcommand\replace@all@in[3]{%
  \replace@toks{}%
  % ##1 is the text before the next <search>, ##2 is everything after it
  \long\def\replace@all@aux##1#2##2\@nil{%
    \if@no@value{##2}%
      {% Only the end marker is left: store the final piece and stop
        \replace@toks\expandafter\expandafter\expandafter
          {\expandafter\the\expandafter\replace@toks##1}%
      }%
      {% Store the piece plus the replacement, then continue with the rest
        \replace@toks\expandafter\expandafter\expandafter
          {\expandafter\the\expandafter\replace@toks##1#3}%
        \replace@all@aux\@empty##2\@nil
      }%
  }%
  \@firstofone{\expandafter\replace@all@aux\expandafter\@empty}%
    #1#2\no@value\@nil
  \edef#1{\the\replace@toks}%
}
\newcommand\replace@all@aux{}
% Test whether the argument is exactly the end marker \no@value
\newcommand\if@no@value[1]{%
  \ifnum\pdfstrcmp{\noexpand\no@value}{\unexpanded{#1}}=\z@\relax
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi
}
\makeatother
\begin{document}
\makeatletter
\def\test{Hello|world}
\replace@all@in\test{|}{ }
\test
\makeatother
\end{document}
This relies on \pdfstrcmp; XeTeX provides the same primitive under the name \strcmp, so the name needs changing there. If it’s not available at all, then some more code is needed for the comparison test (to do it safely, at least).
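For what it’s worth, here is a rough sketch of how the comparison might be done with \ifx in place of \pdfstrcmp. The helper \no@value@marker is a name I have made up for the example, and this version is not “safe” in the sense above: it is not expandable, and it will fail if the argument contains a bare # token.

```latex
\documentclass{article}
\makeatletter
% Hypothetical helper: a macro holding just the end marker.
% It is \long so that it can match \reserved@a under \ifx,
% which also compares the \long status of the two macros.
\long\def\no@value@marker{\no@value}
\newcommand\if@no@value[1]{%
  \long\def\reserved@a{#1}% store the text under test
  \ifx\reserved@a\no@value@marker
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi
}
\makeatother
\begin{document}
\makeatletter
\if@no@value{\no@value}{marker found}{more input}

\if@no@value{world|\no@value}{marker found}{more input}
\makeatother
\end{document}
```

The first test should select “marker found” and the second “more input”.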
Joseph
Joseph Wright
11 Dec 09 at 1:44 pm
I should add that expl3 includes some more refinements to that code, mainly to do with # tokens. However, the principle is clear.
Joseph Wright
11 Dec 09 at 6:17 pm
Thank you, Joseph. I will come back to that code later when I finish my task at hand.
Best,
Leo
leo
13 Dec 09 at 7:32 pm
No problem Leo: I hope it’s useful.
Joseph Wright
16 Dec 09 at 5:25 pm