Sunday, March 27, 2011

LaTeX's failure with floats

It's probably fairly uncontroversial to say that floats are one of the main areas where LaTeX performs poorly in comparison to WYSIWYG editors. The basic complaint is that floats just don't go where we want them.

To compensate for this, people often use the h float specifier, or H from the float package, to say, “place the figure here!” This is often a poor approach, since there is no real way to know where “here” will fall on the page. It leads to moving the code around between paragraphs, trying to find a reasonable place to put it. Since this is a bad idea, I'm not going to focus on it. Instead, I'm going to talk about real floating material.

Part of the problem is that TeX produces output one page at a time. Once a page is finished, it is shipped out (i.e., written to the DVI or PDF file) and not touched again. What this means for LaTeX is that by the time it has seen the \begin{figure} (or other floating material), it has already finished with all of the pages before it. LaTeX therefore goes through a complicated interaction with the output routine, which tries to place the figure (subject to the specifiers) on the current page. If that fails, the float is put on a defer list to be inserted later.

Often, what we really want is for the image to go onto the previous page since that leads to better overall placement. But since LaTeX cannot handle that, we're forced to move the image code ourselves, just like we had to do with the H specifier.

One partial solution is to put each float in a separate file, say floatfoo.tex, and then move \input{floatfoo} around until a reasonable placement is found. This is not entirely satisfactory, since we are still left with a guess-and-check procedure.
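As a minimal sketch of that workflow: the float body lives in floatfoo.tex, and only the one-line \input moves between candidate positions. The filecontents wrapper and the \rule placeholder (standing in for \includegraphics{foo}) are assumptions added here so the example compiles on its own without an image file.

```latex
% Write the float to its own file. (Older LaTeX releases will not
% overwrite floatfoo.tex if it already exists.)
\begin{filecontents}{floatfoo.tex}
\begin{figure}[tbp]
  \centering
  \rule{4cm}{3cm}% placeholder for \includegraphics{foo}
  \caption{bar}
  \label{fig:foo}
\end{figure}
\end{filecontents}

\documentclass{article}
\begin{document}
First paragraph of text.

% Candidate position 1: uncomment to try the float here instead.
% \input{floatfoo}

Second paragraph, which refers to Figure~\ref{fig:foo}.

% Candidate position 2: the position currently in use.
\input{floatfoo}

Third paragraph.
\end{document}
```

Only one \input{floatfoo} should be active at a time; otherwise the label fig:foo is defined twice.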

What I would like is a solution that allows the author to specify a page number (and position) and have the float placed there, if at all possible. I haven't fully thought through what I'd like in an interface, so here are some thoughts about requirements and challenges.
  • The interface should work with twocolumn documents at the very least and it would be better if it supported the multicol package. Something like
    \begin{figure}[page=4,column=2,position=tb]
    might be nice.
  • There are tokenization issues, so it is probably not acceptable to tokenize the body of the float, store it somewhere, and then reproduce it when needed, since category codes are assigned at tokenization time. This almost certainly requires writing the body to another file.
  • One idea is to use the filecontents environment to write the body of the figure to a separate file with appropriate \if... guards. I'm envisioning something like
    \begin{figure}[page=4,position=tb]
        \centering
        \includegraphics{foo}
        \caption{bar}
        \label{fig:foo}
    \end{figure}
    being written to \jobname.figs as
    \ifnum4=\count0
    \begin{figure}[tb]
        \centering
        \includegraphics{foo}
        \caption{bar}
        \label{fig:foo}
    \end{figure}
    \fi
    Then, for each page, \jobname.figs is \input in a manner similar to \afterpage or \AtBeginPage. This would need something extra for twocolumn documents.
  • There's the question of trying to keep figures in order if some are specified with particular page requirements and others are not.
  • There's an issue if a float depends on a macro being defined but the page specifier puts it before the definition. I don't see how to get around that.
  • There's an issue with trying to work with other floating environments such as the excellent lstlisting from the listings package.
I'm sure there are more issues I haven't considered. This does seem doable though.
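To make the guarded-file idea a little more concrete, here is a rough sketch under several assumptions. The file name pagefigs.tex (instead of \jobname.figs) and the macro \pagefigs are made-up names, the guard file is written by hand rather than generated from a page= key, the figure targets page 2 to keep the test document short, and \rule stands in for \includegraphics{foo}. It relies on the afterpage package, whose documentation warns that it does not work in two-column mode and is not fully robust around floats, so treat this as an illustration of the mechanism rather than a dependable implementation.

```latex
% Hypothetical guard file: each float is wrapped in a test on the page counter.
\begin{filecontents}{pagefigs.tex}
\ifnum2=\value{page}
\begin{figure}[tb]
  \centering
  \rule{4cm}{3cm}% placeholder for \includegraphics{foo}
  \caption{bar}
  \label{fig:foo}
\end{figure}
\fi
\end{filecontents}

\documentclass{article}
\usepackage{afterpage}
\usepackage{lipsum}% filler text only

% \pagefigs (an illustrative name) reads the guard file now and then
% re-arms itself, so the file is read again once the current page
% has been shipped out and the page counter has advanced.
\newcommand{\pagefigs}{%
  \input{pagefigs}%
  \afterpage{\pagefigs}}

\begin{document}
\pagefigs
\lipsum[1-20]
\end{document}
```

Because the guard is evaluated just after a page is shipped out, the test sees the number of the page that is about to be built, which is what the page-specific placement needs. Keeping such floats in order relative to ordinary ones is not addressed here.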

6 comments:

  1. I think this sounds great. I can't tell you how many times I couldn't get the figures where I wanted them!

  2. I hear equally bad things about floats in Word ("my figure kept jumping around") and worse ("my figure just disappeared!"). I don't use Word myself, though, so I can't judge the relative merits, and I've never had trouble with LaTeX's floats.

    Anyway, you may be interested to read Frank Mittelbach's paper on floats:

    http://www.latex-project.org/papers/xo-pfloat.pdf

    See especially section 5.4. This code is currently part of the xpackages in the LaTeX3 code repository, but it's not yet ready for production use. Would be interesting to give it a try, though!

  3. Admittedly, I haven't used Word for anything in many years, but I seem to recall from my high school days (more than a decade ago) that it was fairly easy to put a picture somewhere and then have text flow around it. Maybe Word has gotten worse.

    Thanks for the paper link, I'm reading it now.

  4. "Part of the problem is that TeX produces output one page at a time."

    AFAIK this isn't the case. I've been told that TeX sometimes accumulates the content of several pages before shipping out the first one.

    But in general you are right. Floats are a major issue when people move from MS Word to LaTeX.

  5. Martin, you're right that TeX can accumulate a lot of text on its main vertical list before deciding that it has enough. It has to do with \pagegoal, \pagetotal, and \pagedepth. TeX by Topic and The TeXbook have all of the details. You can watch TeX's output by setting \tracingoutput to be positive.

    Still, the issue is that once pages are shipped out, there is no more modification and there is not really a good way to prevent shipouts until the end. It might be an interesting exercise to try to do it sanely.

  6. You can prevent shipping out in an output routine: the boxes can be stored for modification, more splitting, or whatever else, and shipped out later. However, this requires more memory to store the boxes.

    You can also do multiple passes, in many possible ways. One way to save memory is to cause the pages to be discarded instead of shipped out on the first pass (this won't execute \write whatsits and so on, but there is another way, by creating a new kind of insertion, with its parameters set to zero (so that it does not affect page breaking), and which consists of marks alternating with penalties, and then use \vsplit to split it in the output routine and read the marks from it). And then there can be tricky ways for the file to input itself, too.

    For forward references, you could typeset a paragraph into a temporary box, typeset it again with the forward reference replaced by the longest possible replacement text, find the difference in the number of lines (with \prevgraf), and then typeset the paragraph for real with \looseness set to that difference, to ensure that the page break will not make the reference incorrect once it has been filled in.

    Also, I have not had these problems with floats because I use Plain TeX instead of LaTeX.
