Biblatex back-end update

I wrote recently about the need to manage biblatex back-ends, and the thoughts the maintenance team had on which way to proceed.

After a bit of thought, we’ve gone for a solution we hope works both for users and for us. Reversing what seemed like a ‘good idea at the time’, we’ve re-integrated (almost) all of the TeX code into one pathway. This is focussed on Biber, but we have a small stub that converts the relevant parts to work with BibTeX. So users who don’t need Biber, or can’t use it, can still use BibTeX and get largely the same results.

There are a few changes, mainly related to places where BibTeX and Biber had ended up with different interfaces and where BibTeX can’t (reasonably) support the Biber one. In those places we’ve dropped the ability to use a ‘BibTeX alternative pathway’ as it makes the code complex and makes switching between back-ends tricky.

Hopefully the balance we’ve gone for will work for everyone: you can still use BibTeX, it still does almost everything it always has with biblatex, but we’ve got a more sustainable code base for the future.

biblatex feedback

Getting an idea of what users actually use can sometimes be tricky. To help understand how people are using biblatex, there’s a short survey set up. In particular, we are looking for some idea of how people use back-ends: as I’ve mentioned before, having two of them makes development more ‘interesting’, so it’s important to get some insight into real-life use.

Managing biblatex back-ends

For the past few years, biblatex has been looked after by a small team led by Philip Kime. When we first took this up we were mindful of the need to think carefully about back-ends: how reference data is extracted from a .bib file or similar sources.

There are currently two back-ends for biblatex: BibTeX and Biber. Biber is where development is taking place and offers Unicode support, whilst BibTeX itself is frozen and also does a lot less ‘stuff’. So there are several features in biblatex that are Biber-only. When the current team took over the maintenance, there was consideration of dropping BibTeX entirely. Philip and I have discussed this quite a bit, as the original biblatex developer (Philipp Lehman) picked BibTeX as the original back-end from necessity. (Biber was developed after biblatex, and originally BibTeX was the only tool available for extracting and sorting data.) We decided against dropping it some time ago: what the BibTeX back-end offers is stability but also speed, precisely because it’s more limited than Biber. At least for people like me, in the physical sciences and writing in western European scripts, the BibTeX back-end is perfectly usable.

The way we originally decided to allow continued Biber improvement while keeping BibTeX use stable was to split the LaTeX code into two paths. That made sense with the proviso that new Biber features were essentially extensions to the code rather than changes to existing ideas. However, over time that hasn’t quite worked out, particularly recently with the new data-model-driven approach that Philip has developed for Biber. As I’ve detailed elsewhere, that’s led to a new (breaking) syntax for \DeclareNameFormat, as well as various other changes that could be covered in BibTeX but to date haven’t been. We’ve therefore decided we need to look again at this.

The current plan is for me to work on re-integrating the two back-end code paths, which I’m doing in a fork of biblatex as it’s non-trivial and I don’t want to mess the main development line up. I’ll also look to extend the BibTeX back-end code as appropriate such that we get back to the differences being about the differing capabilities of the back-ends rather than anything in the LaTeX code. I need a little while to do that, probably a couple of months. However, if I get it right we should be in a much stronger position for the future.

biblatex: A new syntax for \DeclareNameFormat

The ‘traditional’ BibTeX model for dividing up names is based around four parts:

  • First name(s)
  • Last name(s)
  • Prefix(es) (the ‘von part’)
  • Suffix(es) (the ‘junior part’)

This works well for many western European names, but falls down in many other cases.
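As a concrete illustration (my own example, not one from the biblatex documentation), BibTeX parses a field such as

  author = {van Beethoven, Jr., Ludwig}

into the prefix ‘van’, the last name ‘Beethoven’, the suffix ‘Jr.’ and the first name ‘Ludwig’. Any name needing parts beyond these four slots simply cannot be represented in this model.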

As part of Biber/biblatex developments, Philip Kime has been working on moving beyond this rigid model for names to allow true flexibility. However, this comes with a caveat: a breaking change to \DeclareNameFormat in biblatex. The older syntax takes hard-wired arguments for each name part, but that obviously can’t be extended. The new format deals with only one argument (the name as a whole), but this requires changes to (non-standard) styles.

At the moment, the change only applies when using Biber, which means some conditional code is needed. The best way to do that is to test for the older (BibTeX) back-end. For example, in the latest release of biblatex-chem I have the following in chem-acs.bbx:

% Modify the name format
\@ifpackageloaded{biblatex_legacy}
  {
    % Original syntax for BibTeX model
    \DeclareNameFormat{default}{%
      \renewcommand*{\multinamedelim}{\addsemicolon\addspace}%
      \usebibmacro{name:last-first}{#1}{#4}{#5}{#7}%
      \usebibmacro{name:andothers}%
    }

    \DeclareNameFormat{editor}{%
      \renewcommand*{\multinamedelim}{\addcomma\addspace}%
      \usebibmacro{name:last-first}{#1}{#4}{#5}{#7}%
      \usebibmacro{name:andothers}%
    }
  }
  {
    % New syntax for flexible back-end
    \DeclareNameFormat{default}{%
      \renewcommand*{\multinamedelim}{\addsemicolon\addspace}%
      \nameparts{#1}%
      \usebibmacro{name:family-given}
        {\namepartfamily}
        {\namepartgiveni}
        {\namepartprefix}
        {\namepartsuffix}%
      \usebibmacro{name:andothers}%
    }

    \DeclareNameFormat{editor}{%
      \renewcommand*{\multinamedelim}{\addcomma\addspace}%
      \nameparts{#1}%
      \usebibmacro{name:family-given}
        {\namepartfamily}
        {\namepartgiveni}
        {\namepartprefix}
        {\namepartsuffix}%
      \usebibmacro{name:andothers}%
    }
  }

I’ll deal with the differences in back-ends in another post, but for the present this formulation will keep styles working for everyone.

TeXworks developments

Over recent weeks, Stefan Löffler has provided new builds for TeXworks featuring a re-implementation of the PDF viewer. This offers new modes for previewing the output, most notably continuous scrolling. It’s also intended to be much faster than the older viewer.

Stefan’s also been setting up Travis-CI testing for TeXworks, and this has the added benefit that he’s now able to provide Mac binaries in addition to Windows and Linux ones. Stefan himself doesn’t have a Mac for testing them, but Travis-CI can run automated tests. Moreover, individual users can grab the bleeding-edge code and try it themselves.

Making custom loaders expl3-aware

The expl3 syntax used by the developing programming layer for LaTeX3 is rather different from ‘traditional’ TeX syntax, and therefore needs to be turned on and off using the command pair \ExplSyntaxOn/\ExplSyntaxOff. In package code making use of expl3, the structure

\ExplSyntaxOn % Or implicit from \ProvidesExplPackage
....
\RequirePackage{somepackage}
....
\ExplSyntaxOff % Or the end of the package

will switch off expl3 syntax for the loading of somepackage and so will work whether this dependency uses expl3 or not.
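As a minimal sketch of the pattern in practice (the package name mypkg and its single function are made up for illustration):

% mypkg.sty -- a hypothetical package using expl3
\RequirePackage{expl3}
\ProvidesExplPackage{mypkg}{2015/08/28}{0.1}{Example expl3 package}
% expl3 syntax is now active: spaces are ignored, _ and : are ‘letters’
\RequirePackage{xcolor} % Dependency loaded with expl3 syntax off, then restored
\cs_new:Npn \mypkg_hello: { Hello~from~mypkg }
% expl3 syntax is switched off again automatically at the end of the package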

This is achieved by using the LaTeX2e kernel mechanism \@pushfilename/\@popfilename, which exists to deal with the status of @ but which is extended by expl3 to cover the new syntax too. However, this only applies as standard to code loaded using \usepackage or \RequirePackage (both built on the lower-level kernel command \@onefilewithoptions). Some bundles, most notably TikZ, provide their own loader commands for specialised files. These can be made ‘expl3-aware’ by including the necessary kernel commands:

\def\myloader#1{%
  \@pushfilename
  \xdef\@currname{#1}%
  % Main loader, including \input or similar
  \@popfilename
} 

For packages which also work with formats other than LaTeX, the push and pop steps can be set up using \csname:

\def\myloader#1{%
  \csname @pushfilename\endcsname
  \expandafter\xdef\csname @currname\endcsname{#1}%
  % Main loader, including \input or similar
  \csname @popfilename\endcsname
}

Of course, that will only work with LaTeX (the stack is not present in plain TeX or ConTeXt), but as the entire package idea is essentially a LaTeX one this is unlikely to be a problem.

TUG2015 Beyond the formal

I’ve given a summary of the ‘formal’ business of each session at TUG2015 over the past few days.

Of course, there was a lot more to the meeting beyond the talk sessions. Stefan Kottwitz covered some of them in a TeX-sx blog post, including a picture of (most of) the TeX-sx regulars in attendance.

It was great to meet people I’ve come across over the years but had never met in person: I think the only delegate I’d met before was David Carlisle (who lives less than an hour’s drive from home). So each coffee and lunch break was a (quick) chance to at least say hello to people.

I’m told we’ve not had a proper LaTeX team meeting for 10 years: certainly not whilst I’ve been on the team. So a lot of the time for me (and the other LaTeX3 people) was taken up with a long list of ‘agenda’ items. We just about got through them by using two evenings, the last afternoon (before the banquet) and breakfast the day after the conference finished! Hopefully we’ll manage something a bit more regular in the future!

TUG2015 Day Three

The final day of the meeting offered another exciting mix of talks.

Morning three

Session one: Publishing

The day started with Kaveh Bazargan (the new TUG President) and Jagath AR from River Valley Technology. They gave us an insight into the serious publishing end of the TeX world. Kaveh showed us some of the ‘interesting’ features one sees in XML workflows, and explained how TeX can enable an XML-first approach. Jagath then showed us the way they can integrate rich content into PDFs within this method.

Next was Joachim Schrod, who focussed on dealing with lots of documents: the needs of an online bank. Joachim explained the challenges of creating hundreds of thousands of documents a month. In contrast to books or similar, these documents are all very similar and have very limited requirements. What is needed is resilience and speed. We saw how using LaTeX with knowledge of ‘classical’ TeX techniques (DVI mode and \special) can deliver key performance enhancements. He also told us about the challenges, particularly the human one: hiring (La)TeX experts for maintenance is not easy.

The third talk of the session came from S. K. Venkatesan and focussed on using TeX algorithms for scroll-like output. He showed how TeX compares with browsers in this area.

Session two

After coffee, I was back on again to talk about \parshape. I talked about the different elements of text design which are best implemented at the primitive level using \parshape. I showed that we can provide interfaces for different aspects of the shape without the end user needing to know about the back-end detail. My talk was quite short, but we got a lot of discussion!
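For readers who haven’t met it, \parshape is the TeX primitive that sets the indent and measure of a paragraph line by line; a minimal sketch (my own illustration, not code from the talk):

% Shape the first three lines, then return to the full measure
% (the final pair applies to all remaining lines)
\parshape 4
  0pt \textwidth
  2em \dimexpr\textwidth-2em\relax
  4em \dimexpr\textwidth-4em\relax
  0pt \textwidth
A paragraph long enough to run over several lines will then have its
second and third lines indented at the left, with the right margin
unchanged.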

Next was Julien Cretel, who talked about ideas for implementing Haskell-like functionality in TeX. Julien explained what he enjoys about functional languages, what has already been done in TeX and what he’d like to achieve. In particular, he focussed on tree data structures.

The final morning talk came from Hans Hagen. He started by showing us one of the challenges of grid setting: how to adjust the design to accommodate overheight items. There are a lot of challenges, and he explained that there are probably more ways of tackling the problem than users! He then talked about bibliographies in ConTeXt and the reimplementation recently undertaken to use a flexible approach covering many types of formatting. Hans finished with ‘ASCII math’ parsing, where all mathematics is represented with plain text. Here, Hans had the issue that the input format is rather flexible and not well defined.

Afternoon

After lunch, we had the group photo: there should be a lot of pictures available, given the number of budding photographers! We then reconvened for the final session.

Boris Veytsman gave his third talk of the meeting, looking at how we can arrange for parts of files to be omitted from the output. He described two situations: leaving out irrelevant data, and omitting sensitive data. Boris showed how to tackle these two challenges by skipping material in the first case, and by stripping the sources in the second.

The final talk came from Enrico Gregorio (TeX-sx user egreg) and his recollections as a spurious space catcher. Enrico showed a collection of ‘interesting’ code, either with missing %, extra % or a curious mixture of both. He then showed us how to fix them, moving from manually setting catcodes for spaces and the like to using expl3 to avoid spacing issues, but also to avoid re-inventing the wheel.
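To give a flavour of the problem (a toy example of my own, not one of Enrico’s), a missing % at a line end inside a definition quietly adds a space token to the output:

% Buggy: the line break after \rule{1cm}{1pt} becomes a spurious space
\newcommand*{\badrule}{%
  \rule{1cm}{1pt}
}
% Fixed: the trailing % stops the end of line turning into a space
\newcommand*{\goodrule}{%
  \rule{1cm}{1pt}%
}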

Q&A Session

The final session was a free-form Question and Answer session. This led to interesting discussions on BibTeX databases, templates and source highlighting. It also meant we (formally) found out where the next meeting will be: Toronto, some time in early summer 2016.

TUG2015 Day Two

The second day of the meeting had a morning of talks and then the afternoon for the conference outing to the Messel Pit.

Morning two

Session one

The day started with a talk from Pavneet Arora telling us about something a bit different: detecting water leaks in property. Pavneet focussed on what most users want, the output rather than the interface, and how this might lead us to a ‘TeX of Things’. He explained how he’s using TeX as part of a multi-tool chain to provide insight into water flow, using ConTeXt as the mechanism for making reports. All of this was based on a Raspberry Pi to target embedded systems.

Tom Hejda then told us about his work creating two document classes: one for a journal and one for producing a thesis, both linked to his university. He contrasted the needs of users for these two document types. He showed us how he’d tackled this, with very different interfaces for the two.

Next was Boris Veytsman on creating multiple bibliographies. He started at the end: looking at the ways you can access reference lists. You might want to look at references by date, by author, by reference callout or indeed by something else. Boris explained how he’s learned from his earlier multibibliography package to create a new package, nmbib. This allows the user to select one or more views of the bibliography in the output.

Session two

After the coffee break, Boris returned, along with Leyla Akhmadeeva, to look at supporting a new medical institute in Russia. Leyla is a neurologist and laid out the needs for training doctors. Setting up a new institution in Bashkortostan means developing new communication templates. Boris showed us the requirements for multi-language documents following the Russian formal standard. He showed us the challenges of following those standards, particularly when one of the languages (Bashkir) doesn’t currently have any hyphenation patterns available. He also talked about the design challenges of creating a beamer style using the colour elements from potentially clashing logos.

We then heard from Paul Gessler on converting Git logs into pretty-printed material using TikZ. Paul told us how he got started on the project, answering a question on TeX-StackExchange and getting pulled in by feature requests. He showed us his plans for turning Git branches into PDFs, and also how people have used Git branching to map the Paris Metro!

Question and answer session

The morning session finished with a Q&A to the TUG board. Topics were varied, but the focus was on how we attract new users and new developers, and what a user group means today. There’s a lot to talk about there, and we broke for lunch with informal chats going on.

Afternoon

The afternoon today features a visit to the Messel Pit. It will be an opportunity to talk about lots of things across the whole group attending. I’ll aim to report back later on the key topics.