The keyword lookahead in a compound pattern precedes data to be recognized but not consumed by the pattern-matching process. For example, the pattern
digit+ lookahead blank* "+"
matches a string of digits that is followed by optional spaces and tabs and then a plus sign. However, only the digits are selected. The white space characters, if any, and plus sign remain in the source and can be selected by other patterns.
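For readers who know regular expressions, the behavior can be sketched with a rough analogue in Python's re module. This is an illustration only, not OmniMark code: it assumes that [0-9]+ stands in for digit+ and [ \t]* for blank*, and the function name select_digits is hypothetical.

```python
import re

# Rough regex analogue of: digit+ lookahead blank* "+"
# (?=...) tests what follows without consuming it.
DIGITS_BEFORE_PLUS = re.compile(r"[0-9]+(?=[ \t]*\+)")


def select_digits(source):
    """Return (selected, remainder) if the pattern matches at the start, else None."""
    m = DIGITS_BEFORE_PLUS.match(source)
    if m is None:
        return None
    # Only the digits are selected; the white space and "+" stay in the remainder.
    return m.group(0), source[m.end():]
```

Calling select_digits on the input "123  + 4" selects "123" and leaves "  + 4" available for other patterns.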
lookahead can also be used to verify that selected data is not followed by input matching a given pattern. For example,
digit+ lookahead not letter
selects a string of digits as long as the digits are not immediately followed by a letter. Note that only one letter needs to be found for the lookahead test to fail, so there is no need to put a "+" following letter in the example above.
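A comparable negative test can be sketched in Python's re module with (?!...). This is an analogue under stated assumptions, not OmniMark code: because a regex engine backtracks, the sketch also excludes a following digit so that a shorter run of digits cannot be selected in place of the whole run. The function name select_number is hypothetical.

```python
import re

# Rough regex analogue of: digit+ lookahead not letter
# The extra 0-9 in the negative lookahead stops the regex engine from
# backtracking to a shorter run of digits when a letter follows.
DIGITS_NOT_BEFORE_LETTER = re.compile(r"[0-9]+(?![0-9A-Za-z])")


def select_number(source):
    """Return the selected digits, or None if a letter follows them."""
    m = DIGITS_NOT_BEFORE_LETTER.match(source)
    return m.group(0) if m else None
```

A single following letter is enough to make the test fail, which is why no repetition is needed inside the lookahead.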
Positive and negative lookahead can be combined in one pattern. For example, in data files for the TeX formatter, instructions (called "control sequences") consist of a backslash followed by letters.
The control sequence to end a paragraph is \par. However, standard control sequences such as \parskip or \parindent as well as programmer-defined macro names can begin with the same string.
Suppose paragraphs consist only of letters, punctuation, and space characters. In other words, suppose that no control sequences occur within a paragraph. The following pattern matches paragraph text terminated by the \par control sequence; it fails to match input terminated by another control sequence beginning with the characters \par:
[letter | ".,!?" | blank]+ lookahead "\par" not letter
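The combination of a positive and a negative test can be illustrated with nested lookaheads in Python's re module. This is a sketch of the idea rather than OmniMark code; it assumes ASCII letters, and the function name paragraph_text is hypothetical.

```python
import re

# Paragraph text must be followed by "\par", and "\par" must not be
# followed by a letter (which would make it \parskip, \parindent, etc.).
PARAGRAPH = re.compile(r"[A-Za-z.,!? \t]+(?=\\par(?![A-Za-z]))")


def paragraph_text(source):
    """Return the paragraph text before a terminating \\par, else None."""
    m = PARAGRAPH.match(source)
    return m.group(0) if m else None
```

In this sketch, the input "Hello, world. \par Next" yields "Hello, world. ", while "Hello \parskip" yields no match.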
Recall that any pattern can be enclosed in parentheses and used as a subpattern. lookahead patterns can be used in this way. For example,
((lookahead not "xyz") any)+
matches input up to, but not including, the first occurrence of the sequence "xyz", so the selected text never contains "xyz" as a substring. Note that both sets of parentheses are necessary. Without the inner set, any becomes part of the lookahead pattern. Without the outer set, the lookahead is not repeated as successive characters are selected.
The above example works in the following manner, beginning at the current point in the file, the data content, or the data being scanned:
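If the input at the current position begins with "xyz", the lookahead pattern fails, and the pattern terminates.
Otherwise, the lookahead pattern succeeds. The current position is not advanced.
If no input remains, any will fail, and the whole pattern terminates.
Otherwise, any matches the next character, and the process repeats from the lookahead test.
When the pattern terminates, it succeeds if the pattern any has matched at least one character. Otherwise, it fails.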
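The steps above can be sketched as an explicit loop. The following Python function is a hand-written illustration of the procedure, not OmniMark's implementation; the name match_until and its parameters are hypothetical.

```python
def match_until(source, start=0, stop="xyz"):
    """Sketch of ((lookahead not "xyz") any)+ applied at position start.

    Returns the index just past the selected text, or None if the
    pattern fails because no character was selected.
    """
    pos = start
    while True:
        # Lookahead test: fails if the stop sequence begins here.
        # The current position is not advanced by the test.
        if source.startswith(stop, pos):
            break
        # any: fails when no input remains.
        if pos >= len(source):
            break
        # any matches the next character; repeat from the lookahead test.
        pos += 1
    # The "+" requires at least one successful repetition.
    return pos if pos > start else None
```

For the input "abcxyzdef" the function returns 3, so "abc" is selected and "xyzdef" remains in the source. For an input that begins with "xyz", or for empty input, it returns None.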