Documented in Volume 1 of the UNIX Programmers Manual.
See /usr/doc/lex on 4.2 BSD systems for additional online documentation on this.
From martin@noscvax.UUCP (Douglas W. Martin) arpanet: martin@nosc
Subject: lex(1) output statistics (Summary)
... I requested info explaining lex output statistics; e.g. the meanings
of the "%" parameters, what sorts of rules caused which tables to increase
in size, etc... A summary of responses follows.
From: (Vern Paxson [rtsg]) vern@lbl-csam
>I can tell you what some of the mysterious %{e,p,n,...}'s in the Lex
>output are:
>
> %e is the number of NFA states needed to recognize the input patterns.
> It is a linear function of the number of characters and operators in
> the patterns.
>
> %n is the number of DFA states needed for the recognizer. It is
> potentially an exponential function of the number of NFA states,
> but in practice tends to be no greater and usually less than the
> %e figure.
>
> %t is the number of state transitions in the NFA. It is a function of
> the complexity of your patterns (especially "*", and "+" operators)
> and especially the occurence of character classes inside of closures.
> One character class and closure is enough to do it. For example,
> "keyword recognizer" inputs like:
begin return( TOK_BEGIN );
end return( TOK_END );
procedure return( TOK_PROCEDURE );
.
.
.
[a-z]+ return( TOK_IDENTIFIER );
> will result in a very high %t figure. It's Lex's job to compress
> all of these transitions into something much smaller, but it doesn't
> do an especially good job. We have written our own Lex program in
> Ratfor and our experience shows that you can halve the size of the
> tables for complicated sets of rules.
>
> %o is basically the compressed version of %t. The total amount of
> memory the tables generated by Lex will take is
> 3*(%n) + 2*(%o) + X integer words
> where X is roughly 2*(%n). Most of this stuff could be declared
> as short but isn't. In addition, there's a constant overhead of
> two arrays of 128 characters each.
>
> I could make guesses as to what %k, %p, and %a are, but all they'd be
>is guesses.
> If your concern is in staying within Lex's default maximums for these
>values, the way to do it is to eliminate ambiguities and similarities between
>rules. The above example of rules that recognize keywords and identifiers
>will use a great deal less of Lex's resources (and have much smaller output
>tables) if you get rid of either the keyword rules or the identifier rules.
>The complexity in the set of rules arises from the ambiguity of each
>keyword also matching the identifier rule.
> But if you want to keep your input rules intact, you can increase Lex's
>maximum for a given value by putting, e.g,
%a 3000
>in the declarations section of the input. In picking how much to raise
>the value, I just increase by 500 or 1000 and see if that's enough. Judging
>from the operation of our Lex, I don't think any of these maximums have
>a geometric effect on the amount of memory Lex uses. They may very easily
>have a geometric effect on Lex's run-time, though.
> Vern Paxson, Real Time Systems Group, Lawrence Berkeley Laboratory
> vern@lbl-csam.ARPA
From: Don Davis JHU/APL
>I was confronted with this problem about one year ago. I had to
>increase the table parameters (I forget the name of the file, but
>it was a .h file) and recompile lex. I called my new lex "biglex" and
>it worked just fine.
> How do you determine the correct size for the table parameters? I didn't
>have the time or inclination to analyze the operation of lex, so I just
>used trial and error. What can I say; it worked!
> Don Davis JHU/APL
>