Nested pattern matching is used to process nested structures in input data. Nested pattern matching involves creating a new process which scans the current scanning source. This allows you to isolate the identification and processing of a nested structure in an input source.
Here is an example of the nested pattern matching technique. The following program uses nested pattern matching
to find and process a stylesheet from an RTF (Rich Text Format) document. An RTF stylesheet starts with
{/stylesheet
and ends with }
. The problem with finding the closing brace is that you
can not simply match the first }
you see: Stylesheets contain many subelements which begin with
{
and end with }
, so the first }
encountered will not be the end of the stylesheet
but the end of a subelement. The trick is to keep track of all the matching braces so that you find and process
the whole stylesheet. Here is the code:
declare catch close-brace () process using output as #suppress submit #main-input find "{\stylesheet" using group "process stylesheet" using output as #main-output do output "{\stylesheet" submit #current-input catch close-brace () output "}" done group "process stylesheet" find [any \ "{}"]* => t output t find "{" output "{" submit #current-input catch close-brace () output "}" find "}" throw close-brace ()
First you find the beginning of the stylesheet with the rule find "{/stylesheet"
. When you find it,
you switch to the group process stylesheet
and submit #current-input
. This causes a new find
process to start in the process stylesheet
group. Every time you encounter an open brace,
submit
#current-input
again, creating a stack of find
processes. Every time you find a close brace, throw to
close-brace ()
, ending the current submit
and popping a find
process off the stack.
When the closing brace of the stylesheet is found, the catch
in the original stylesheet rule will be the
closest catch
for close-brace
, and you will exit the process stylesheet
group, having
captured the entire stylesheet. You then output the stylesheet.
This program simply outputs the stylesheet. It is simple to extend this program so that each subelement of the
stylesheet is found and processed explicitly within the process stylesheet
group. For instance, you
could add a rule to find individual styles within a stylesheet:
global string styles variable find "{\s" digit+ => style-number set new styles to style-number output "{\s" || style-number submit #current-input catch close-brace () output "}"
Note that with the nested pattern matching technique, there is no need to capture a structure such as a stylesheet in a buffer and then process it afterward. You can process it on the fly. Nested pattern matching greatly improves the efficiency of processing such structures, while reducing the size and complexity of your code.
The example above involved multiple levels of nesting of RTF group structures, but each nested group was
exited explicitly by a throw
that unwound exactly one level of nesting. It is often useful to unwind
more than one level of nesting at a time. Consider the following RTF fragment which describes direct formatting
of the text Hello World
:
{\b\i\u Hello World}Here three RTF control words turn on bold, underlining, and italic formatting for the text in the group. Suppose we are trying to transform this RTF into equivalent HTML, which would look like this:
<b><i><u>Hello World</u></i></b>
Transforming the control words into opening tags is easy enough, but how do we create all the end tags and get
them in the right order? We do it by using nested pattern matching:
declare catch close-brace () process submit "{\b\i\u Hello World}" find "{" submit #current-input catch close-brace () find "\" letter+ => code space? output "<" || code || ">" submit #current-input always output "</" || code || ">" find "}" throw close-brace ()
This code treats the occurrence of each RTF control word as the start of a new nested structure in the data.
This corresponds to the structure of the HTML we are trying to create. By the time we reach the }
character in the data, we are four submit
s deep. (The second find
rule has been invoked
recursively by the submit
in the rule itself.) We unwind all those submit
s at once with a throw
to close-brace
. This collapses us right back to the first find
rule where we
discovered the opening brace. As the throw
is being executed, the
always
clause of each of the recursively invoked rules is executed, producing the closing tags in
the proper order.
Among other things, this example illustrates an important point about pattern matching. The hierarchical pattern we perceived in the data was not there in the mind of the person who designed RTF. The clear intention was to define a single structure and define its properties. We treated it as three nested structures because we wanted to turn it into HTML. We did it by imposing on the data our idea that it contained nested structures. This structure is not apparent in the data. It was not discovered, it was imposed. Patterns are not so much found in the data as they are imposed on the data by what we decide to do with it.
Pattern matching functions are particularly useful for nested pattern matching.