Making custom loaders expl3-aware

The expl3 syntax used by the developing programming layer for LaTeX3 is rather different from ‘traditional’ TeX syntax, and therefore needs to be turned on and off using the command pair \ExplSyntaxOn/\ExplSyntaxOff. In package code making use of expl3, the structure

\ExplSyntaxOn % Or implicit from \ProvidesExplPackage
....
\usepackage{somepackage}
....
\ExplSyntaxOff % Or the end of the package

will switch off expl3 syntax for the loading of somepackage and so will work whether this dependency uses expl3 or not.

This is achieved by using the LaTeX2e kernel mechanism \@pushfilename/\@popfilename, which exists to deal with the status of @ but which is extended by expl3 to cover the new syntax too. However, this only applies as standard to code loaded using \usepackage (or the lower-level kernel command \@onefilewithoptions). Some bundles, most notably TikZ, provide their own loader commands for specialised files. These can be made ‘expl3-aware’ by including the necessary kernel commands

\def\myloader#1{%
  \@pushfilename
  \xdef\@currname{#1}%
  % Main loader, including \input or similar
  \@popfilename
} 

For packages which also work with formats other than LaTeX, the push and pop steps can be set up using \csname

\def\myloader#1{%
  \csname @pushfilename\endcsname
  \expandafter\xdef\csname @currname\endcsname{#1}%
  % Main loader, including \input or similar
  \csname @popfilename\endcsname
}

Of course, that will only work with LaTeX (the stack is not present in plain TeX or ConTeXt), but as the entire package idea is essentially a LaTeX one, that should be only a minor limitation.
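As a usage illustration (the file name mydata.tex and the test function here are made up), a call to such a loader from within expl3 code then behaves like \usepackage: the loaded file starts with expl3 syntax off, and whatever syntax state was active before the call is restored afterwards.

\ExplSyntaxOn
% 'mydata.tex' is a hypothetical file used for illustration only
\myloader{mydata.tex}
% The expl3 syntax active before the call is restored at this point,
% whatever mydata.tex itself switched on or off
\cs_new:Npn \my_example:n #1 { #1 }
\ExplSyntaxOff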

TUG2015 Beyond the formal

I’ve given a summary of the ‘formal’ business of each session at TUG2015 over the past few days.

Of course, there was a lot more to the meeting beyond the talk sessions. Stefan Kottwitz covered some of them in a TeX-sx blog post, including a picture of (most of) the TeX-sx regulars in attendance.

It was great to meet people I’ve come across over the years but haven’t met in person: I think the only delegate I’d met before was David Carlisle (who lives less than an hour’s drive from home). So each coffee and lunch break was a (quick) chance to at least say hello to people.

I’m told we’ve not had a proper LaTeX team meeting for 10 years: certainly not in the time I’ve been on the team. So a lot of the time for me (and the other LaTeX3 people) was taken up with a long list of ‘agenda’ items. We just about got through them by giving over two evenings, the last afternoon (before the banquet) and breakfast the day after the conference finished! Hopefully we’ll manage something a bit more regular in the future!

TUG2015 Day Three

The final day of the meeting offered another exciting mix of talks.

Morning three

Session one: Publishing

The day started with Kaveh Bazargan (new TUG President) and Jagath AR from River Valley Technology. They gave us an insight into the serious publishing end of the TeX world. Kaveh showed us some of the ‘interesting’ features one sees in XML workflows, and explained how TeX can enable an XML-first approach. Jagath then showed us the way that they can integrate rich content into PDFs within this method.

Next was Joachim Schrod, who focussed on dealing with lots of documents: the needs of an online bank. Joachim explained the challenges of creating hundreds of thousands of documents a month. In contrast to books or similar, these documents are all very similar and have very limited requirements. What is needed is resilience and speed. We saw how using LaTeX with knowledge of ‘classical’ TeX techniques (DVI mode and \special) can deliver key performance enhancements. He also told us about the challenges, particularly the human one: hiring (La)TeX experts for maintenance is not easy.

The third talk of the session came from S. K. Venkatesan and focussed on using TeX algorithms for scroll-like output. He showed how TeX compares with browsers in this area.

Session two

After coffee, I was back on again to talk about \parshape. I covered different elements of text design which are best implemented at the primitive level using \parshape, and showed that we can provide interfaces for different aspects of the shape without the end user needing to know about the back-end detail. My talk was quite short, but it prompted a lot of discussion!

Next was Julien Cretel, who talked about ideas for implementing Haskell-like functionality in TeX. Julien explained what he enjoys about functional languages, what has already been done in TeX and what he’d like to achieve. In particular, he focussed on tree data structures.

The final morning talk came from Hans Hagen. He started by showing us one of the challenges of grid setting: how to adjust design to accommodate overheight items. There are a lot of challenges, and he explained that there are probably more ways of tackling the problem than users! He then talked about bibliographies in ConTeXt and the reimplementation recently undertaken to use a flexible approach to cover many types of formatting. Hans finished with ‘ASCII math’ parsing, where all mathematics is represented with plain text. Here, Hans had the issue that the input format is rather flexible and not well defined.

Afternoon

After lunch, we had the group photo: there should be a lot of pictures available, given the number of budding photographers! We then reconvened for the final session.

Boris Veytsman gave his third talk of the meeting, looking at how we can arrange for parts of files to be omitted from the output. He described two situations: missing out irrelevant data and omitting sensitive data. Boris showed how to tackle these two challenges by skipping material in the first case, and by stripping the sources in the second.

The final talk came from Enrico Gregorio (TeX-sx user egreg) and his recollections as a spurious space catcher. Enrico showed a collection of ‘interesting’ code, either with missing %, extra % or a curious mixture of both. He then showed us how to fix them, moving from manually setting catcodes for spaces and the like to using expl3 to avoid spacing issues, but also to avoid re-inventing the wheel.

Q&A Session

The final session was a free-form Question and Answer session. This led to interesting discussions on BibTeX databases, templates and source highlighting. It also meant we (formally) found out where the next meeting will be: Toronto, some time in early summer 2016.

TUG2015 Day Two

The second day of the meeting had a morning of talks and then the afternoon for the conference outing to the Messel Pit.

Morning two

Session one

The day started with a talk from Pavneet Arora telling us about something a bit different: detecting water leaks in property. Pavneet focussed on what most users want, the output rather than the interface, and how this might lead us to a ‘TeX of Things’. He explained how he’s using TeX as part of a multi-tool chain to provide insight into water flow, using ConTeXt as the mechanism for making reports. All of this was based on a Raspberry Pi to target embedded systems.

Tom Hejda then told us about his work creating two document classes: one for a journal and one for producing a thesis, both linked to his university. He contrasted the needs of users for these two document types. He showed us how he’d tackled this, with very different interfaces for the two.

Next was Boris Veytsman on creating multiple bibliographies. He started at the end: looking at the ways you can access reference lists. You might want to look at references by date, by author, by reference callout or indeed by something else. Boris explained how he’s learned from his earlier multibibliography package to create a new package, nmbib. This allows the user to select one or more views of the bibliography in the output.

Session two

After the coffee break, Boris returned along with Leyla Akhmadeeva, looking at supporting a new medical institute in Russia. Leyla is a neurologist and laid out the needs for training doctors. Setting up a new institution in Bashkortostan means developing new communication templates. Boris showed us the requirements for multi-language documents following the Russian formal standard, and the challenges of meeting those standards, particularly when one of the languages (Bashkir) doesn’t currently have any hyphenation patterns available. He also talked about the design challenges of creating a beamer style using the colour elements from potentially clashing logos.

We then heard from Paul Gessler on converting Git logs into pretty-printed material using TikZ. Paul told us how he got started on the project, answering a question on TeX-StackExchange and getting pulled in by feature requests. He showed us his plans for turning Git branches into PDFs, and also how people have used Git branching to map the Paris Metro!

Question and answer session

The morning session finished with a Q&A with the TUG board. Topics were varied, but the focus was on how we attract new users and new developers, and on what the meaning of a user group is today. There’s a lot to talk about there, and we broke for lunch with informal chats going on.

Afternoon

The afternoon today features a visit to the Messel Pit. It will be an opportunity to talk about lots of things across the whole group attending. I’ll aim to report back later on the key topics.

TUG2015 Day One

Arrival

The TUG2015 meeting proper started today, but people started meeting up yesterday. I arrived quite late (and indeed later than I’d expected), but a ‘welcome committee’ of TeX-sx regulars were outside the hotel when I got here! It was nice to finally be able to put some faces to the names.

Morning one

For those of us staying at the conference hotel, there was a chance to meet up with other people over breakfast. We then congregated in the meeting room, which filled up nicely as we got up to the business end: the start of the meeting proper.

The organisers have split the talks into broad topics, which makes summarising things a bit easier!

Session one: PDF Output

After a (very) short opening by the outgoing TUG President, Steve Peter (in excellent German), we moved on to three talks broadly focussed on aspects of PDF production, and in particular creating ‘rich’ PDFs. Ross Moore started us off by looking at how he’s been tackling making semantic information available in the PDF output from maths. He’s tackling that using dedicated comments (read by his package) and PDF ‘tool tip’ comments. We then heard from Olaf Drümmer from the PDF Association about accessible PDFs: PDF/UA. These developments again keep semantic information in the PDF itself, so it can be parsed by, for example, screen readers. Ross then returned in a two-handed talk with Peter Selinger to explain work on updating pdfx to generate PDF/A files from pdfTeX and LuaTeX. They told us about the technical challenges and the improvements users will see in their use of the package.

Session two: Unicode

Session two focussed on the challenges of using the Unicode-compliant engines, XeTeX and LuaTeX. I started off, talking about how we can get data from Unicode into the engines for text processing. I focussed on two areas: setting up things like \catcode and doing case changing. (I’ll probably post the slides and a summary.) Will Robertson then talked about dealing with maths in Unicode, and in particular the challenges of matching up the way Unicode describes maths characters with the way (La)TeX describes them. He looked at some of the decisions he made in unicode-math and how he’s revisiting them. That ran nicely into the final talk of the morning: the GUST TeX team’s first talk, from Piotr Strzelczyk, on Unicode maths fonts. He focussed on the detailed challenges the team have faced.

Session three: Futures

After lunch (for me, a LaTeX3 chat with Will Robertson and Bruno Le Floch), we headed back for what I think I’d call a ‘futures’ session. Bogusław Jackowski gave us a ‘big picture’: what to do now that they have a ‘complete’ set of the OpenType maths fonts they set out to develop. We heard about the idea of fonts beyond the rigid box model and other exciting horizons. Frank Mittelbach then gave an overview of LaTeX kernel stability over the last 21 years. He looked at recent changes to how the team are making sure that the kernel stays stable while still fixing bugs, and at how that will work in the future. Hans Hagen then gave us something to think about: ‘what if?’. He talked about how TeX has developed based around the limitations of computers, data structures and ideas over time. The conflict between the desires of users, the technology and the developers was familiar to anyone who does any development.

Session four: News

The final session of the day focussed on ‘announcements’. Joachim Schrod gave us an overview of the structure of CTAN, telling us about the different interfaces for different users, and how the different parts interact. The talk gave us an insight into the hard work that goes on every day making TeX available to us all. I then popped back up for a short announcement about the status of the UK TeX FAQ since it moved to a new server. The formal business finished with a memorial for the key (ex-)TeX people we have lost in the last year: Pierre MacKay, Richard Southall and Hermann Zapf. Three moving tributes.

Preparations for TUG2015

TUG2015 takes place next week in Darmstadt, and as it’s the first time I’ve been able to go to a TUG meeting I’m really looking forward to it. The programme and participant list both look excellent, and it will be good to meet several people in person who I’ve only known to date by e-mail. I’ve managed to end up with three talks to give, so I’m hard at work getting them ready (and hoping I can get something in writing for TUGboat too!). As well as the formal business, there will be lots of chances to chat, not least with several other people on the LaTeX team.

pgfplots: Showing points as just error bars

Presenting experimental work in a clear form is an important skill. For plotting data, I like the excellent pgfplots package, which makes it easy to put together consistent presentations of complex data. At the moment, I’m doing some experiments where showing the error bars on the raw data is important, but where the fit lines also need to stand out clearly. The best style I’ve seen for this is one where the data are shown as simple vertical bars whose length is determined by the error bars for the measurements. The fit lines then stand out clearly without overcrowding the plot. That style isn’t built in to pgfplots, but it’s easy to set up with a little work:

\documentclass{standalone}
\usepackage{pgfplots}

% Use features from current release
\pgfplotsset{compat = 1.12}

% Error 'sticks'
\pgfplotsset{
  error bars/error mark options = {draw = none}
  % OR more low-level
  % error bars/draw error bar/.code 2 args = {\draw #1 -- #2;} 
}

\begin{document}
\begin{tikzpicture}
  \begin{axis}
    [
      error bars/y dir      = both,
      error bars/y explicit = true,
    ]
    \addplot[draw = none] table[y error index = 2]
      {
        0   0.023 0.204
        1   0.956 0.332
        2   4.234 0.552
        3   8.764 0.345
        4  17.025 0.943
        5  27.201 2.445
      };
    \addplot[color = red, domain = 0:5, samples = 100] {x^2};
  \end{axis}
\end{tikzpicture}
\end{document}

[Demo plot: the data shown as error sticks with a fitted x^2 curve]
My demo only has a few data points, but this style really shows its worth as the number of points rises.

A new maintainer for etoolbox (and csquotes)

One of the most significant new LaTeX packages of recent years has been biblatex, originally developed by Philipp Lehman and offering an extremely powerful approach to bibliographies. As I’ve covered before, Philipp Lehman vanished from the TeX world a few years ago. To keep biblatex development going, a team was assembled led by Philip Kime. However, Philipp Lehman’s other packages have up to now been left unmaintained.

The LaTeX team are currently working on some LaTeX2e improvements, and they have a knock-on effect on Philipp Lehman’s etoolbox package. To date, etoolbox has automatically loaded etex, but the team are moving that functionality into the LaTeX kernel, so the load will no longer be needed. Thus we needed to sort out a minor update to etoolbox. As I’m already involved with biblatex, it seemed natural for me to take up this challenge. I’ve therefore forked etoolbox (see The LPPL: ‘maintainer’ or ‘author-maintained’ for why it’s technically a fork), set up a GitHub site and made the changes. Of course, two days after that ‘one off’ fix I got my first bug report!
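As a rough sketch only (this is not the actual etoolbox code, and the choice of \newmarks as the test is my own assumption), the change amounts to loading etex just when the kernel does not already provide the extended allocation:

% Hypothetical guard: \newmarks is defined both by etex.sty and by
% kernels which have the extended allocation built in
\ifdefined\newmarks
  % Extended register allocation already available: nothing to do
\else
  \RequirePackage{etex}
\fi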

Philipp Lehman’s other big contribution along with biblatex and etoolbox is csquotes. While I don’t have any immediate need to make a change there, this seems like a good time for someone to pick it up too. So I’ve set up a (technical) fork and GitHub page for that too, and expect to have a few minor changes to make (I’ve had informal discussions about at least one). Should there be a need I’ll also be looking at Philipp’s other packages (he and I had interesting discussions about logreq, for example, and how the ideas might make it into expl3).

Font encodings, hyphenation and Unicode engines

The LaTeX team have over the past couple of months been taking a good look at the Unicode TeX engines, XeTeX and LuaTeX, and making efforts to make the LaTeX2e kernel more ‘Unicode aware’. We’ve now started looking at an important question: moving documents from pdfTeX to XeTeX or LuaTeX. There are some important differences in how the engines work, and I’ve discussed some of them in a TeX StackExchange post, but here I’m going to look at one (broad) area in particular: font encodings and hyphenation. To understand the issues, we’ll first need a bit of background: first for ‘traditional’ TeX, then for the Unicode engines.

Knuth’s TeX (TeX90), e-TeX and pdfTeX are all 8-bit programs. That means that each font loaded with these engines has 256 slots available for different glyphs. TeX works with numerical character codes, not with what we humans think of as characters, and so what’s happening when we give the input

\documentclass{article}
\begin{document}
Hello world
\end{document}

to produce the output is that TeX is using the glyph in position 72 of the current font (‘H’), then position 101 (‘e’), and so on. For that to work and to allow different languages to be supported, we use the concept of font encodings. Depending on the encoding the relationship between character number and glyph appearance varies. So for example with

\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
\char200
\end{document}

we get ‘È’ but with

\documentclass{article}
\usepackage[T2A]{fontenc}
\begin{document}
\char200
\end{document}

we get ‘И’ (T2A is a Cyrillic encoding).

This has a knock-on effect on dealing with hyphenation: a word which uses ‘È’ will probably have very different allowed hyphenation positions from one using ‘И’. ‘Traditional’ TeX engines store hyphenation data (‘patterns’) in the format file, and to set that up we therefore need to know which encoding will be used for a particular language. For example, English text uses the T1 encoding while Russian uses T2A. So when the LaTeX format gets built for pdfTeX there is some code which selects the correct encoding and does various bits of set up for each language before reading the patterns.

Unicode engines are different here for a few reasons. Unicode doesn’t need different font encodings to represent all of the glyph slots we need. Instead, there is a much clearer one-to-one relationship between a slot and what it represents. For the Latin-1 range this is (almost) the same as the T1 encoding. However, once we step outside of this all bets are off, and of course beyond the 8-bit range there’s no equivalent at all in classical TeX. That might sound fine (just pick the right encoding), but there’s the hyphenation issue to watch. Some years ago now the hyphenation patterns used by TeX were translated to Unicode form, and these are read natively by XeTeX (more on LuaTeX below). That means that at present XeTeX will only hyphenate text correctly if it’s either using a Unicode font set up or if it’s in a language that is covered by the Latin-1/T1 range: for example English, French or Spanish but not German (as ß is different in T1 from the Latin-1 position).

LuaTeX is something of a special case as it doesn’t save patterns into the format and as the use of ‘callbacks’ allows behaviour to be modified ‘on the fly’. However, at least without some precautions the same ideas apply here: things are not really quite ‘right’ if you try to use a traditional encoding. (Using LuaLaTeX today you get the same result as with XeTeX.)

There are potential ways to fix the above, but at the moment these are not fully worked out. It’s also not clear how practical they might be: for XeTeX, it seems the only ‘correct’ solution is to save all of the hyphenation patterns twice, once for Unicode work and once for use with ‘traditional’ encodings.

What does this mean for users? Bottom line: don’t use fontenc with XeTeX or LuaTeX unless your text is covered completely by Latin-1/T1. At the moment, if you try something as simple as

\documentclass{article}
\usepackage[T1]{fontenc}
% A quick test to use inputenc only with pdfTeX
\ifdefined\Umathchar\else
  \usepackage[utf8]{inputenc}
\fi
\begin{document}
straße
\end{document}

then you’ll get a surprise: the output is wrong with XeTeX and LuaTeX. So working today you should (probably) be removing fontenc (and almost certainly loading fontspec) if you are using XeTeX or LuaTeX. The team are working on making this more transparent, but it’s not so easy!
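For completeness, here is a minimal sketch of that advice in practice, assuming a Unicode engine; the font choice (Latin Modern Roman) is simply an example.

% For XeLaTeX or LuaLaTeX: no fontenc or inputenc, just a Unicode font
\documentclass{article}
\usepackage{fontspec}            % Unicode font loading
\setmainfont{Latin Modern Roman} % example font choice
\begin{document}
straße
\end{document}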