Pattern matching, nested
You can use the technique of "nested pattern matching" to process nested structures in input data. Nested pattern matching involves creating a new process which scans the current scanning source. This allows you to isolate the identification and processing of a nested structure in an input source.
Here is an example of the nested pattern matching technique. The following program uses nested pattern matching to find and process a stylesheet in an RTF (Rich Text Format) document. An RTF stylesheet starts with "{\stylesheet" and ends with "}". The problem with finding that closing brace is that you can't simply match the first "}" you see. Stylesheets contain many subelements which begin with "{" and end with "}", so the first "}" you see will be the end of a subelement, not the end of the stylesheet. The trick is to keep track of all the matching braces so that you find and process the whole stylesheet. Here's the code:
declare catch close-brace

process
   using output as #suppress
      submit #main-input

find "{\stylesheet"
   using group process-stylesheet
    using output as #main-output
   do
      output "{\stylesheet"
      submit #current-input
    catch close-brace
      output "}"
   done

group process-stylesheet

find [\"{}"]* => stuff
   output stuff

find "{"
   output "{"
   submit #current-input
 catch close-brace
   output "}"

find "}"
   throw close-brace
First you find the beginning of the stylesheet with the rule find "{\stylesheet". When you find it, you switch to the group process-stylesheet and submit #current-input. This causes a new find process to start in the process-stylesheet group. Every time you encounter an open brace, you submit #current-input again, creating a stack of find processes. Every time you find a close brace, you throw to close-brace, ending the current submit and popping a find process off the stack. When the closing brace of the stylesheet is found, the catch in the original stylesheet rule will be the closest catch for close-brace, and you will exit the process-stylesheet group, having scanned the entire stylesheet. Because the stylesheet rule directs output to #main-output, the stylesheet is written out as it is scanned.
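To make the stack of find processes concrete, here is a hand trace of the program over a small, made-up stylesheet, "{\stylesheet{\s1 heading 1;}{\s2 Body Text;}}". The style names are illustrative only. The depth column counts the submits that are active inside the process-stylesheet group after each step, and each "}" in the output column is written by the catch clause of the rule that issued the submit being ended.

```
input consumed     rule that fires         depth   output
{\stylesheet       find "{\stylesheet"     1       {\stylesheet
{                  find "{"                2       {
\s1 heading 1;     find [\"{}"]*           2       \s1 heading 1;
}                  find "}" (throw)        1       }
{                  find "{"                2       {
\s2 Body Text;     find [\"{}"]*           2       \s2 Body Text;
}                  find "}" (throw)        1       }
}                  find "}" (throw)        0       }
```

The final throw is the only one that reaches the catch in the stylesheet rule, because every earlier throw is caught one level down by the "{" rule that started the submit it ends.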
This program simply outputs the stylesheet. It is simple to extend this program so that each subelement of the stylesheet is found and processed explicitly within the process-stylesheet group. For instance, you could add a rule to find individual styles within a stylesheet:
global stream my-styles variable initial-size 0

...

find "{\s" digit+ => style-number
   set new my-styles to style-number
   output "{\s" || style-number
   submit #current-input
 catch close-brace
   output "}"
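One way the style rule might be combined with the stylesheet program is sketched below. This is an assumption-laden sketch, not the definitive arrangement: the placement of the style rule ahead of the general "{" rule (so that it gets the first chance at "{\s" followed by digits), the use of the shelf name my-styles in the set action, and the reporting loop at the end of the process rule are all choices made for illustration.

```
declare catch close-brace

; shelf that collects the number of every style found
global stream my-styles variable initial-size 0

process
   using output as #suppress
      submit #main-input
   ; assumption for illustration: list the collected style numbers
   repeat over my-styles
      output "%nstyle number: " || my-styles
   again

find "{\stylesheet"
   using group process-stylesheet
    using output as #main-output
   do
      output "{\stylesheet"
      submit #current-input
    catch close-brace
      output "}"
   done

group process-stylesheet

find [\"{}"]* => stuff
   output stuff

; placed before the general "{" rule so that styles are recognized first
find "{\s" digit+ => style-number
   set new my-styles to style-number
   output "{\s" || style-number
   submit #current-input
 catch close-brace
   output "}"

find "{"
   output "{"
   submit #current-input
 catch close-brace
   output "}"

find "}"
   throw close-brace
```

The style rule handles its own closing brace in the same way as the general "{" rule, so the brace bookkeeping is unchanged. The only new behavior is the item added to my-styles each time a style is found.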
Note that with the nested pattern matching technique, there is no need to capture a structure such as a stylesheet in a buffer and then process it afterward. You can process it on the fly. Nested pattern matching greatly improves the efficiency of processing such structures, while reducing the size and complexity of your code.
The example above involved multiple levels of nesting of RTF group structures (RTF groups begin with "{" and end with "}"), but each nested group was exited explicitly by a throw that unwound exactly one level of nesting. It is often useful to unwind more than one level of nesting at a time. Consider the following RTF fragment which describes direct formatting of the text "Hello World":
{\b\i\u Hello World}
Here three RTF control words turn on bold, italic, and underline formatting for the text in the group. Suppose we are trying to transform this RTF into equivalent HTML, which would look like this:
<b><i><u>Hello World</u></i></b>
Transforming the control words into opening tags is easy enough, but how do we create all the end tags and get them in the right order? We do it by using nested pattern matching:
declare catch close-brace

process
   submit "{\b\i\u Hello World}"

find "{"
   submit #current-input
 catch close-brace

find "\" letter+ => code space?
   output "<" || code || ">"
   submit #current-input
 always
   output "</" || code || ">"

find "}"
   throw close-brace
This code treats the occurrence of each RTF control word as the start of a new nested structure in the data. This corresponds to the structure of the HTML we are trying to create. By the time we reach the "}" character in the data, we are four submits deep. (The second find rule has been invoked recursively by the submit in the rule itself.) We unwind all those submits at once with a throw to close-brace. This collapses us right back to the first find rule where we discovered the opening brace. As the throw is being executed, the always clause of each of the recursively invoked rules is executed, producing the closing tags in the proper order.
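The same rules should also cope with a group nested inside formatted text, because each throw is caught by the nearest "{" rule. The sketch below changes only the submitted string. The input fragment is made up for illustration and does not come from a real RTF file.

```
declare catch close-brace

process
   ; hypothetical input: an italic group nested inside a bold group
   submit "{\b Bold {\i both} text}"

find "{"
   submit #current-input
 catch close-brace

find "\" letter+ => code space?
   output "<" || code || ">"
   submit #current-input
 always
   output "</" || code || ">"

find "}"
   throw close-brace
```

Following the rules by hand, the first "}" unwinds only the submit started for \i, since the inner "{" rule catches the throw. Scanning then resumes inside the submit started for \b, and the second "}" unwinds that one. The result worked out this way is <b>Bold <i>both</i> text</b>.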
Among other things, this example illustrates an important point about pattern matching. The hierarchical pattern we perceived in the data was not there in the mind of the person who designed RTF. The clear intention was to define a single structure and define its properties. We treated it as three nested structures because we wanted to turn it into HTML. We did it by imposing on the data our idea that it contained nested structures. This structure is not apparent in the data. It was not discovered, it was imposed. Patterns are not so much found in the data as they are imposed on the data by what we decide to do with it.
Pattern matching functions are particularly useful for nested pattern matching.
Prerequisite Concepts: Catch and throw
Related Syntax: #current-input