Sunday, January 2, 2011

Counting the number of lines in a file

The simplest way to count the number of lines in a file seems to be to use the ε-TeX primitive \readline. (One could probably get away with using the TeX primitive \read instead, but that tokenizes each line according to the current catcodes. Also, the code becomes slightly more complicated, since \unless is not available without ε-TeX.)
    \loop\unless\ifeof\lf
      \readline\lf to\lfline
      \advance\linecount by1
    \repeat
If the file does not exist, \linecount will be -1. Otherwise, it'll contain the number of lines in the file.
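
The fragment above omits the declarations and the file handling. A complete version might look like the following sketch, which assumes an ε-TeX-enabled engine running the plain format; the file name example.txt is hypothetical. Starting the counter at -1 matters because TeX supplies one implied empty line at the end of a file read this way, so a file with n lines is read n+1 times before \ifeof becomes true. If the file cannot be opened, \ifeof is true at once and the loop never runs.

```tex
% A sketch only: run with an e-TeX-enabled engine (for example pdftex).
\newcount\linecount
\newread\lf

\linecount=-1            % stays at -1 if the file cannot be opened
\openin\lf=example.txt   % hypothetical file name
\loop\unless\ifeof\lf
  \readline\lf to\lfline % read one line, ignoring the current catcodes
  \advance\linecount by1
\repeat
\closein\lf

\message{Lines counted: \the\linecount}
\bye
```

The \message line is there only to make the result visible in the log; in a real document \the\linecount would be used directly.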


  1. I think it could be done in plain TeX as well: make a group in which all category codes are 12 (or something like that), and then use \read to read the lines. There is a trick I have figured out for setting a number outside of a group without making it global: add a kern inside the group and then, after leaving the group, use \lastkern to read it back and \unkern to remove it (adding a kern never causes any side effects). I think there is another way too: inside a group, set all category codes to 9 (ignored) except that of the end-of-line character, which is made active; define the active character as a macro such as \def^^M{\advance\linecount by 1 }; then \input the file and leave the group to restore the category codes.

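  2. A better trick is something like \edef\next{\endgroup\count0=\the\count0\relax}\next, which closes the group and then re-assigns the value (the \relax keeps the number from absorbing any digits that follow).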

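  3. \newcount\numberoflines

    % \countlines is an assumed name; the original wrapper macro was lost.
    \def\countlines#1{\begingroup
      \numberoflines=0 \count0=0 \catcode13=13 % end-of-line character becomes active
      \uccode`\~=13\uppercase{\def~{\advance\numberoflines by 1 }}%
      \loop \ifnum\count0=13 \else \catcode\count0=9 \fi % ignore everything else
      \ifnum\count0<255 \advance\count0 by 1 \repeat
      \input #1\relax
      \edef\next{\endgroup\numberoflines=\the\numberoflines\relax}\next}

    \countlines{source.tex} % hypothetical file name
    The source file contains {\the\numberoflines} lines.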


  4. Actually, using \lastkern might be more efficient than using \edef in this way: it uses less memory (only one node and no save stack), has fewer tokens to read, does not have to convert a number to tokens and back again, etc.
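
The two ways of carrying a value out of a group that the comments discuss can be compared side by side. The following is a minimal sketch for plain TeX in vertical mode; the register name \result and the values 42 and 17 are made up for illustration. Note that the kern variant stores the number as a length in scaled points, so it presumably only suits values that fit in a TeX dimension, and it relies on the kern still being the last item on the current list when \lastkern is evaluated.

```tex
\newcount\result

% Variant A: the \edef trick from comment 2.
\begingroup
  \result=42 % a local assignment that \endgroup would normally undo
  \edef\next{\endgroup\result=\the\result\relax}%
\next
\message{A: \the\result}

% Variant B: the kern trick from comments 1 and 4.
\begingroup
  \result=17
  \kern\result sp % park the value on the current list, in scaled points
\endgroup
\result=\lastkern \unkern % read the value back, then remove the kern
\message{B: \the\result}

\bye
```

In variant A the value travels as digit tokens inside \next; in variant B it travels as a node on the current list, which is why no save-stack entry or token conversion is involved.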