TeX Hacks: October 2009

Friday, October 16, 2009

A quick interlude into fonts

Dealing with fonts is one of the hardest aspects of using LaTeX. Fortunately, if one does not want to use Knuth's Computer Modern fonts, there is a very simple way to change font families. Full details are here, but for a publication that requires a Times Roman typeface, one can use

\usepackage{mathptmx}
\usepackage[scaled=.92]{helvet}
\usepackage{courier}

which causes the roman and math fonts to be Times, the sans serif fonts to be Helvetica (scaled so that it matches the other fonts better), and the typewriter family to be Courier. These three look nice together. In addition, the output font encoding can be changed to T1 with

\usepackage[T1]{fontenc}

which is recommended. For more details, see the link above.
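Putting the pieces together, a minimal test document might look like the sketch below. The sample sentence is only an illustration, not something from the packages' documentation; it simply exercises the roman, math, sans serif, and typewriter families so the pairing can be checked by eye.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}         % T1 output font encoding
\usepackage{mathptmx}            % Times for roman text and math
\usepackage[scaled=.92]{helvet}  % Helvetica, scaled to sit better next to Times
\usepackage{courier}             % Courier for the typewriter family
\begin{document}
Roman text with math $a^2 + b^2 = c^2$,
\textsf{sans serif text}, and \texttt{typewriter text}.
\end{document}
```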

Wednesday, October 14, 2009

Spaces in TeX

Spaces appear all over .tex files but only some of them appear as actual spaces in the output. To understand this, we first need to understand how TeX reads lines of input.

When reading input, TeX is in one of three states: state N is when TeX is at the beginning of a new line; state M is when TeX is in the middle of the line; and state S is when TeX is skipping spaces. TeX will discard space characters it sees in any state except for M. Basically, TeX starts in state N and on the first nonspace character (actually, it's slightly more complicated, but for the purposes of this post, just consider tabs as spaces), it transitions into state M. While in state M, each character that is read is turned into a token except that control sequences are turned into a single token (again, it's more complicated than that, but this will suffice). Once a space is encountered, a space token is created and TeX enters state S. Again, a nonspace character brings TeX into state M. As an example, consider the line of input:

Hello      \TeX!

TeX begins in state N and then upon reading the H transitions into state M and produces an H token. Then e, l, l, and o tokens are produced in turn.

Upon reading the first space, TeX produces a space token and then enters state S. The rest of the spaces up to the \ are ignored. Once TeX reads the \, it will scan the rest of the control sequence and produce a single \TeX token. Finally, TeX produces a ! token. I haven't said what state TeX enters when it scans a control sequence. The answer depends on what type of control sequence it is. If the first character after the \ is not a letter, for example if it's a symbol like @ or #, then TeX produces a token consisting of the control symbol. (For example, the token \@ or \#.) In this case, TeX enters (or remains in) state M. If instead the first character after the \ is a letter, then TeX reads a control word consisting of the \ and all following letters. TeX then enters state S. This explains why TeX ignores spaces after control words like \TeX or \bf. So in the example above, since \TeX is a control word, TeX will enter state S after reading that control word and then immediately enter state M when it reads the !.
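The difference between control words and control symbols is easy to check in a small test file. The following is a sketch, assuming plain LaTeX with the article class; the word "is" is just filler text, and the comments describe what each line should typeset according to the rules above.

```latex
\documentclass{article}
\begin{document}
% Control word: the space after \TeX is skipped (state S),
% so the logo runs straight into "is".
\TeX is

% An empty group ends the control word; after the closing brace
% TeX is in state M, so the space becomes a space token.
\TeX{} is

% A control space (backslash followed by a space) gives an explicit space.
\TeX\ is

% Control symbol: after \# TeX is in state M, so the space is kept.
\# is
\end{document}
```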

Before we can move on, there are two points I skipped. Before TeX starts processing a line of input, it deletes all space characters at the right of the line and inserts a carriage return character which, by default, is the end of line character. So to conclude the discussion of a single line, we need to know what happens with comment characters and end of line characters. For a comment character, all information on the rest of the input line is thrown away and TeX starts on the next line of input in state N. For an end of line character, TeX throws away all remaining input on the line (just like a comment) and then does one of three things. If TeX is in state N, then it produces a \par token. If TeX is in state M, it produces a space token. If TeX is in state S, it ignores the end of line character.

Let's consider the implications of the handling of the end of line character. In state N, it produces a \par token, which is why entering a blank line in your TeX source gives you a new paragraph. In state M, it produces a space, which is why we can sprinkle newlines (almost) anywhere we like in our source and get spaces. If spaces are being skipped, for example after a control word (but not a control symbol!), then the end of line does nothing. [Okay, one final lie above: after a control symbol consisting of \ and a space, TeX enters state S. This is so that \ followed by two spaces does not produce two space tokens.] To summarize, when TeX reads a line of input, it

removes trailing spaces and adds a carriage return,
enters state N,
reads characters, creating tokens and changing states as described above, until it
reaches the end of line character which is either turned into a \par token, a space token, or ignored, depending on the current state.
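The end of line rules can be tried out with a short test file. This is a sketch, assuming the default category codes and the article class; the words are arbitrary filler, and the comments state what each paragraph should typeset according to the description above.

```latex
\documentclass{article}
\begin{document}
% The end of line after "one" is read in state M and becomes a space
% token, so this paragraph typesets as "one two".
one
two

% The comment character discards the rest of the line, including the
% end of line character, so no space token appears: "onetwo".
one%
two

% The end of line after \TeX is ignored in state S, and the leading
% spaces on the next line are skipped in state N: no gap before "nician".
\TeX
   nician
\end{document}
```

The second case is why macro definitions spread over several lines often end their lines with a comment character: it keeps stray space tokens out of the output.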

This is not the end of the story, as there are situations where TeX ignores space tokens and \par tokens, a.k.a. "why don't I get more blank lines when I enter more blank lines in my source?" However, this post is long enough, so I'll put that off for now and discuss modes next time.

Monday, October 5, 2009

Defining new math operators

Defining a new math operator that behaves similarly to \sin or \lim is very easy to do using the amsmath package. It provides a \DeclareMathOperator macro that works in the preamble to declare a new operator. It also provides a starred version whose operators behave like \lim with respect to subscripts, placing them underneath in display math. For example:

\DeclareMathOperator\arcsec{arcsec}
\DeclareMathOperator*\Lim{Lim}

In addition, \operatorname or \operatorname* can be used for one-time uses that don't warrant defining a new control sequence for the operator name. These are better than using \mathrm to define operator names if for no other reason than spacing is handled correctly in the presence or absence of parentheses.
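As a quick check of the spacing claim, the sketch below puts the declared operators, a one-off \operatorname, and a \mathrm version side by side. The operator name sgn and the formulas themselves are only illustrative choices.

```latex
\documentclass{article}
\usepackage{amsmath}
\DeclareMathOperator{\arcsec}{arcsec}  % subscripts set like \sin
\DeclareMathOperator*{\Lim}{Lim}       % subscripts set like \lim
\begin{document}
% In display math the subscript of \Lim goes underneath the operator.
\[ \Lim_{n \to \infty} \arcsec x_n \]

% \operatorname adds operator spacing around "sgn"; \mathrm does not,
% so the second formula looks cramped.
\[ a \operatorname{sgn} b \qquad a \mathrm{sgn} b \]
\end{document}
```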

Friday, October 2, 2009

Getting publication quality tables is easy

One of the problems with reading about various aspects of typography is that I start to see the shortcomings in others' work and, far more importantly, in my own. Creating tables is one area where this is certainly true. In my experience, nearly every document prepared with LaTeX that contains a table contains an ugly table. I'm not sure what the reason for this is, but everybody seems to want to make tables that look like the following.

\begin{tabular}{|c|c|c|}
\hline
A & B & C\\
\hline\hline
foo & bar & baz\\
\hline
zab & rab & oof\\
\hline
\end{tabular}

(As usual, try this out here.) There's no reason at all for each cell to be boxed that way. I rather suspect this comes from looking at too many of the ugly tables HTML produces by default. Fortunately, the solution is very simple: use the booktabs package. I strongly encourage anyone writing a table to read the documentation (pdf). Using it is straightforward.

\begin{tabular}{ccc}
\toprule
A & B & C\\
\midrule
foo & bar & baz\\
zab & rab & oof\\
\bottomrule
\end{tabular}

(You'll need to select the booktabs package in the previewer above to try this out.) Notice that in addition to looking better, this actually requires less work to produce! There is a \cmidrule that works similarly to \cline but is more flexible and can be used in adjacent columns; however, I don't often find a use for any rules but the three above. Finally, the author of booktabs gives two guidelines for making publication quality tables: 1) do not use vertical rules; and 2) do not use double rules. These are excellent guidelines. Follow them.
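For the occasional table that does need a partial rule, here is a sketch of \cmidrule spanning two adjacent column groups. The group headings and cell contents are placeholders made up for the illustration; the (lr) argument trims each rule on the left and right so that neighbouring rules do not run together.

```latex
\documentclass{article}
\usepackage{booktabs}
\begin{document}
\begin{tabular}{lcccc}
\toprule
 & \multicolumn{2}{c}{Group 1} & \multicolumn{2}{c}{Group 2} \\
\cmidrule(lr){2-3} \cmidrule(lr){4-5}
Row & A & B & A & B \\
\midrule
first  & foo & bar & baz & qux \\
second & zab & rab & oof & xuq \\
\bottomrule
\end{tabular}
\end{document}
```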
