Exploring ChemFig: Going further

In the first two parts of this short series, I’ve looked at some ChemFig basics and at improving the settings to get a publication-quality appearance. In this final part, I want to look at some more complex effects. I’m going to keep using the customisations I made in part two, so the demos here all use them in the preamble.

Decorating bonds

Chemists don’t only use simple line bonds: we use bold, dashed and wavy lines a lot. ChemDraw has all of these set up ‘out of the box’:

ChemFig does not have a simple input syntax for them, unlike = for a double bond or ~ for a triple bond. However, it does let us customise bond appearance: the basic syntax we need is to put [,,,,<settings>] after the bond to be customised (there are four commas here as ChemFig has other settings to alter bonding). The settings are TikZ commands, and it’s possible to set up these customisations as styles, which is better than doing everything by hand.
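
To make the comma-counting concrete, here is a minimal sketch. It assumes my reading of the ChemFig manual is right: the five comma-separated fields are the bond angle, a length coefficient, the departure atom, the arrival atom and finally the TikZ code. The atom names A and B are placeholders only. Leaving the first four fields empty keeps their defaults.

\documentclass{article}
\usepackage{chemfig}
\begin{document}
% Only the fifth field (TikZ code) is set; the rest keep their defaults
\chemfig{A-[,,,,line width=1pt]B}
% First two fields used as well: 30 degree angle, 1.5 times the usual length
\chemfig{A-[:30,1.5,,,dash pattern=on 2pt off 2pt]B}
\end{document}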

First, we need two settings from ChemDraw: the thickness to use for a bold bond and the spacing in a hashed bond. I’ll want to use these a few times, so I’ll save them with readable names:

\newcommand*{\bondboldwidth}{0.22832 em} % 'Bold Width'
\newcommand*{\bondhashlength}{0.25737 em} % 'Hash Spacing'

Bold, hashed and dashed bonds are then easy to set up

\tikzset{
  bold bond/.style = {line width = \bondboldwidth},
  dash bond/.style =
    {dash pattern = on \bondhashlength off \bondhashlength},
  hash bond/.style =
    {
      dash pattern = on \bondwidth off \bondhashlength,
      line width   = \bondboldwidth
    },
}

Wavy bonds are a bit more tricky. TikZ has a ‘decorations’ library including the idea of a ‘snake’ line, but this is not quite right. Instead, I’ll use a ‘real’ sine wave as described on the TeX-sx site.
At the same time, I want to pick up something ‘internal’ from ChemFig: the inter-atom spacing, which we set using \setatomsep. That’s stored in the macro \CF@atom@sep, which I want because wavy bonds should have an integer number of repetitions along a standard-length bond:

\tikzset{
  wavy bond/.style =
    {
      decorate,
      decoration =
        {
          complete sines, % the custom decoration from TeX-sx
          amplitude   = \bondboldwidth,
          post length = 0 pt,
          pre length  = 0 pt,
          % Use the atom spacing, saved by ChemFig as \CF@atom@sep
          segment length = 
            \the\dimexpr\csname CF@atom@sep\endcsname/5\relax
        }
    }
}
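
As a quick check of the arithmetic, assuming the atom separation of 1.78500 em from part two: dividing by 5 gives 0.357 em per wave, so five complete waves fit along a standard-length bond. The sketch below does the same division with \dimexpr; the register name \wavelen is made up for illustration and is not part of ChemFig or TikZ.

\documentclass{article}
\begin{document}
% 1.78500 em is the \setatomsep value from part two
\newlength{\wavelen} % hypothetical name, for illustration only
\setlength{\wavelen}{\dimexpr 1.78500 em / 5 \relax}
% Printed in pt, so the numbers depend on the current font size
One wave: \the\wavelen; five waves: \the\dimexpr 5\wavelen\relax
\end{document}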

Okay, so how does this all look? The document input is not so bad

\chemfig{
  *6((-[,,,,hash bond])-
  -(-[,,,,wavy bond])
  -(-[,,,,dash bond])-
  -(-[,,,,bold bond])-)
}

and gives the result

If you look really carefully, you’ll see that this highlights an issue. The bond junctions are just flat ‘ends’, which does not show very much for the single bonds but does where the bold bond meets the ring. If you compare with ChemDraw, you’ll see that it does not make the same error: the bonds ‘run in’ to each other. I’ve not found a way to solve that, unfortunately.

Into three dimensions

Chemical structures exist in three dimensions, and it’s very common to show this using wedged bonds, invented by Cram.

ChemFig lets us use < in place of - for a filled wedged bond, with <| for a hollow one and <: for a dashed (backward) one. So the input we want here is

\chemfig{
  *6((<)-(<:)-(<|)-(>)-(>:)-(>|)-)
}

If you try that with no setting changes, the bonds are too wide at the ends. That’s controlled by \setcrambond, which has three parameters: the width of the bond, the thickness of hash lines and the hash line spacings. ChemDraw seems to set the wider end to the width of a bold bond plus two normal bonds, so I used

\setcrambond
  {\the\dimexpr \bondwidth * 2 + \bondboldwidth \relax}
  {\bondwidth}{\bondhashlength}

and got

Here, it’s clear that the issue with bond joins shows up a lot more than in the earlier cases: it’s still reasonably subtle, but it is definitely more noticeable.
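
For reference, the wedge demo can be collected into one complete document. This only gathers snippets already given in this series (the part-two preamble plus the two lengths defined above), and it assumes a ChemFig release that still provides the \set... commands used here.

\documentclass{article}
\usepackage{chemfig}
% Settings from part two
\setdoublesep{0.35700 em}  % 'Bond Spacing'
\setatomsep{1.78500 em}    % 'Fixed Length'
\setbondoffset{0.18265 em} % 'Margin Width'
\newcommand{\bondwidth}{0.06642 em} % 'Line Width'
\setbondstyle{line width = \bondwidth}
% Lengths from this post
\newcommand*{\bondboldwidth}{0.22832 em}  % 'Bold Width'
\newcommand*{\bondhashlength}{0.25737 em} % 'Hash Spacing'
% Wide end: 2 x 0.06642 em + 0.22832 em = 0.36116 em
\setcrambond
  {\the\dimexpr \bondwidth * 2 + \bondboldwidth \relax}
  {\bondwidth}{\bondhashlength}
\begin{document}
\chemfig{
  *6((<)-(<:)-(<|)-(>)-(>:)-(>|)-)
}
\end{document}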

Schemes and so on

I’m focussing here on drawing individual structures, but should mention schemes and compound numbering. In the MyChemistry article I’ve already linked to, there is quite a bit about this, using a combination of ChemFig (for the schemes) and chemnum for the numbering. What I will say is that it works well provided you don’t have complex alignment needs: one of the tricky parts of creating a good-looking scheme is deciding exactly what to line up!

The other quick note I’d add on schemes is that the arrow width really should match that of bonds. So I’d use

\setarrowdefault{,,line width = \bondwidth}

in my preamble to have everything match.
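
A minimal sketch of how that fits into a scheme, assuming ChemFig’s \schemestart, \arrow and \schemestop commands and the \bondwidth macro from part two. The two structures are placeholders chosen only to have something on either side of the arrow.

\documentclass{article}
\usepackage{chemfig}
\newcommand{\bondwidth}{0.06642 em} % 'Line Width', from part two
\setbondstyle{line width = \bondwidth}
% Arrow drawn with the same line width as the bonds
\setarrowdefault{,,line width = \bondwidth}
\begin{document}
\schemestart
  \chemfig{*6(-=-=-=)}
  \arrow
  \chemfig{*6(------)}
\schemestop
\end{document}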

Conclusions

With a bit of effort with the settings, ChemFig can produce quality output, and can get quite a bit ‘right’ (although there are a few gaps). However, as I said in the first part, I won’t be abandoning ChemDraw any time soon. I deliberately picked something reasonably straightforward for my tests, and the sort of thing I do in my research work would be a lot harder to draw and maintain using ChemFig. In particular, I don’t fancy trying to show three-dimensional effects (for example a benzene ring going ‘into’ the page) using a text-based approach.

So what could I recommend ChemFig for? First, the most obvious case is for people who don’t have a copy of ChemDraw. There are other graphical editors, but none of the free ones are as good as ChemDraw. So if you want high-quality output without paying, this looks the best approach I’ve seen. It also looks good for creating stand-alone documents (using ChemDraw means needing graphics files). That does look useful for me for teaching, where the structures will be in general not so complicated and where it will be perhaps better to have only a single .tex file. There’s also the fact that drawing using TikZ means that the font match using ChemFig is exact: no need to try to measure up different fonts by eye. So there are uses for ChemFig, and it’s certainly an interesting package. Now all we need is someone to write a ChemDraw to ChemFig converter!

Exploring ChemFig: Customising appearance

In my previous post, I looked at the basics of using the ChemFig package to create chemical structures. I finished that post with a structure that is complete but which I think does not look great compared with the reference version I created in ChemDraw. (There’s a MyChemistry entry that looks at similar customisation: worth a look!)

Atom placement

The first issue to tackle is the placement of atom labels. ChemFig ‘detects’ atoms, so that the labels are correctly centred relative to bonds. However, that does not work with numbered R-groups, as the numbers need to be ‘ignored’ for alignment purposes. This is a pretty common requirement, so ChemFig provides a way to ‘split’ labels, using |:

\chemfig{
  *6(-(-R|^2)=-
    (-=[::-60]N-*6(=(-R|^3)-=(-R|^4)-=(-R|^3)-))
  =(-OH)-(-R|^1)=)
}

which gives the output

To see the difference, compare for example R2 here with the version in the previous post: it’s subtle, but it is there!

Atom spacing and bond width

The standard settings for ChemFig share a ‘feature’ with those for ChemDraw: they don’t look very good! As I said in the previous post, I use the Royal Society of Chemistry’s template for my structures, as I think they look much better. The template uses 7 pt text, and so the lengths, etc. all match that size. For use in LaTeX, I want things to be more flexible so wanted to convert the values into em (i.e. relative dimensions based on font size).

There are three key dimensions used by both ChemDraw and ChemFig to set how bonds look: the bond length, the line width and the gap between lines when drawing double bonds. There is also a ‘margin’ used between atom labels and bonds, so the two don’t touch. After a bit of work doing the calculation (using the LaTeX3 FPU), I found that

\setdoublesep{0.35700 em}  % 'Bond Spacing'
\setatomsep{1.78500 em}    % 'Fixed Length'
\setbondoffset{0.18265 em} % 'Margin Width'
\newcommand{\bondwidth}{0.06642 em} % 'Line Width'
\setbondstyle{line width = \bondwidth}

was the right set up. The comments are the ChemDraw names for settings, and I’ve set the line width as a command as it turns out I’ll want it again for some more advanced things to be covered in the next post.
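
The commands above belong to the ChemFig release I was using at the time. As far as I’m aware, later releases replace them with a single key-value command, \setchemfig. Assuming that interface and the key names as I remember them from the manual (atom sep, double bond sep, bond offset, bond style), a rough equivalent would be the following; please check the keys against the documentation for your version.

\documentclass{article}
\usepackage{chemfig}
\newcommand{\bondwidth}{0.06642 em} % 'Line Width'
% Assumed key names for newer ChemFig releases
\setchemfig{
  double bond sep=0.35700 em,% 'Bond Spacing'
  atom sep=1.78500 em,% 'Fixed Length'
  bond offset=0.18265 em,% 'Margin Width'
  bond style={line width=\bondwidth}
}
\begin{document}
\chemfig{*6(-=-=-=)}
\end{document}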

The central double bond

Changing the bond spacing shows up another issue: the central double bond is not right. Rather than bond to the ‘middle’ of the double bond, we want the chain to choose one ‘side’. That can be done using either _ or ^, depending on which side is required. I decided to match the ChemDraw version using

\chemfig{
  *6(-(-R|^2)=-
    (-=^[::-60]N-*6(=(-R|^3)-=(-R|^4)-=(-R|^3)-))
  =(-OH)-(-R|^1)=)
}

Atom font

The final thing to adjust to get this example right is the font used for atom labels: the convention is to use sanserif. ChemFig prints text using the \printatom command, which is set up to ensure math mode and \mathrm. Thus the simplest approach is

\renewcommand*{\printatom}[1]{\ensuremath{\mathsf{#1}}}

Like many people, I use the excellent mhchem to write in-line chemical equations, so I wanted to use the \ce (or faster \cf) command for printing atoms. My initial attempt failed, with an internal error. A quick e-mail to the ChemFig author led to a fix

\makeatletter
\def\CF@node@content{%
  \expandafter\expandafter\expandafter
    \printatom\expandafter\expandafter\expandafter
      {\csname atom@\number\CF@cnt@atomnumber\endcsname}%
    \ensuremath{\CF@node@strut}%
}
\makeatother

followed by

\renewcommand*{\printatom}[1]{{\sffamily\cf{#1}}}

leads to the final input

\documentclass{article}
\usepackage{chemfig}
\usepackage[version=3]{mhchem}
\makeatletter
\def\CF@node@content{%
  \expandafter\expandafter\expandafter
    \printatom\expandafter\expandafter\expandafter
      {\csname atom@\number\CF@cnt@atomnumber\endcsname}%
    \ensuremath{\CF@node@strut}%
}
\makeatother
\setdoublesep{0.35700 em}  % 'Bond Spacing'
\setatomsep{1.78500 em}    % 'Fixed Length'
\setbondoffset{0.18265 em} % 'Margin Width'
\newcommand{\bondwidth}{0.06642 em} % 'Line Width'
\setbondstyle{line width = \bondwidth}
\renewcommand*{\printatom}[1]{{\sffamily\cf{#1}}}
\begin{document}
\chemfig{
  *6(-(-R|^2)=-
    (-=^[::-60]N-*6(=(-R|^3)-=(-R|^4)-=(-R|^3)-))
  =(-OH)-(-R|^1)=)
}
\end{document}

and output

I’d say that is pretty good: I’d be happy to use this in a publication (although drawing the kind of structures I do my research with would be a challenge!).

In the final part of this series, I’m going to look at some other things that are needed for chemical structures but which don’t show up in the demo I’ve used. We’ll see that many can be done, but there will be one or two outstanding challenges.

Exploring ChemFig: Basics

Drawing chemical structures is one of the most important parts of my job. For me, although I love using LaTeX, the best tool for doing this is graphical: ChemDraw. There are a few reasons why I favour using ChemDraw over other approaches. Most importantly of all it produces the best output I know of (although ChemDoodle is pretty close). Complex structures are hard enough to produce and edit with a graphical tool, and the challenge of using a text-based approach makes this even more tricky. Finally, it’s what my colleagues use, so there is some realism involved.

On the other hand, you always need to be ready to try new approaches, so I’ve been meaning for a while to look at the new-ish ChemFig package, which is based on TikZ. I’m starting as a lecturer next month, so with some teaching material to prepare as an incentive I’ve decided to take another look at ChemFig. I’m going to take two or three posts to look at how I’ve got on. I won’t spoil the conclusions, but I think it’s worth saying now that I won’t be moving from ChemDraw just yet for my research work!

The target

As a first target, I decided to try to reproduce a structure I’m going to need to draw for some practical hand-outs. My favoured settings for ChemDraw are those used by the Royal Society of Chemistry, which are set up for 7 pt text to match 9 pt body text in two-column journals. I’ll be coming back to these settings a bit more in the second part of this mini-series, but for the moment let’s see what the result looks like:

The aim is to get this ‘right’, working out first how to get the structure correct using ChemFig, then get the finer points of the appearance right. In this post, I’ll tackle the basic connectivity, and in the next one how to match the appearance.

Rings and chains

As you’d expect, the ChemFig manual covers how to produce structures in some detail. Here, I’m going to look very briefly at the syntax needed to get us started. Rather than repeat myself multiple times, I’m using a simple LaTeX document

\documentclass{article}
\usepackage{chemfig}
\begin{document}
% Content here
\end{document}

for all of this.

The basic command we are going to need is \chemfig, which takes a single argument: a description of the structure required. As you might expect, this can take a bit of getting used to. For example, a benzene ring is

\chemfig{*6(-=-=-=)}

which comes out as

The syntax here is reasonably clear: * makes a ring, 6 means it’s a six-membered ring and -=-=-= is the bonding pattern in the ring.

If we just wanted a linear structure, we could omit the ring part with \chemfig{-=-=-=} giving

Decorating the ring

Adding substituents is not too hard once you work out that the first position on the ring is not the bottom but is the lower of the two left-hand atoms, and that the sequence runs anti-clockwise. The parentheses in the ring part above might give you a clue that they are used to define groups inside the structure. So the left-hand ring we want is written

\chemfig{*6(-(-R^2)=-(-)=(-OH)-(-R^1)=)}

and gives

Hopefully the pattern is reasonably clear: you need to have a - inside the parentheses to have the bond coming off the ring, and can use ^ for superscripts in the usual TeX way.

Completing the structure

The same scheme applies to constructing the rest of the molecule: you can put one ring as a substituent on another, and can have an atom in a chain simply by including the atom name ‘in place’. However, there’s a slight issue, as

\chemfig{
  *6(-(-R^2)=-
    (-=N-*6(=(-R^3)-=(-R^4)-=(-R^3)-))
  =(-OH)-(-R^1)=)
}

is not quite right:

As you can see, the bond angle in the chain part is wrong: ChemFig does not ‘auto-stagger’ things. Of course, this is a pretty basic requirement, so there is a syntax to set the angle of a join: [::-60] will set the relative angle to 60 degrees clockwise, and all will then be well.

\chemfig{
  *6(-(-R^2)=-
    (-=[::-60]N-*6(=(-R^3)-=(-R^4)-=(-R^3)-))
  =(-OH)-(-R^1)=)
}

That completes the connectivity we want, and as you can see the input is starting to look a bit frightening (see my comment at the start of the post). It’s also not great looking compared with the ChemDraw reference version: in the next post, I’ll see how that can be addressed.
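
As an aside on the angle syntax used above, here is a minimal sketch contrasting the two forms, assuming my reading of the ChemFig manual: [:<angle>] gives an absolute angle measured from the horizontal, while [::<angle>] is relative to the previous bond. Both lines should draw the same zigzag chain.

\documentclass{article}
\usepackage{chemfig}
\begin{document}
% Absolute angles: 30, -30, 30 degrees from the horizontal
\chemfig{-[:30]-[:-30]-[:30]}
% Relative angles: start at 30, turn 60 clockwise, then 60 anti-clockwise
\chemfig{-[:30]-[::-60]-[::60]}
\end{document}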

Fixing problems the rapid way

The latest l3kernel update included a ‘breaking change’: something we know alters behaviour, but which is needed for the long term. Of course, despite the fact the team try to pick up what these things will break, we missed one, and there was an issue with lualatex-math as a result, which showed up for people using unicode-math (also reported on TeX-sx). Luckily, those packages all use GitHub, as does the LaTeX3 team, so it was easy to quickly fork the code and for me to create a fix. That’s the big advantage of having code available using one of the distributed version control systems (GitHub and BitBucket are the two obvious places): sending in a fix is a two-minute job, even if it’s someone else’s project. So I’d encourage everyone developing open code destined for CTAN to consider using one of these services: it really does make fixing bugs easier. From report to fix and CTAN update took less than 24 h, which I’d say is pretty good!

Babel development news

A few months ago, Javier Bezos offered to take over dealing with babel maintenance. He’s been working on getting things in order, and collecting up bug reports, and has now set up a page for development news. Javier has included a road map of what he’s hoping to do, all of which looks very sensible to me. I particularly welcome the idea that he’s going to stick to the core part of babel (the mechanisms), with each language viewed as a module to be maintained by someone knowledgeable. One of the issues babel has faced is that it’s simply not realistic to handle all of that in one place. It’s great that he’s putting the effort in.