Pattern matching, nested

Nested pattern matching is used to process nested structures in input data. Nested pattern matching involves creating a new process which scans the current scanning source. This allows you to isolate the identification and processing of a nested structure in an input source.

Here is an example of the nested pattern matching technique. The following program uses nested pattern matching to find and process a stylesheet from an RTF (Rich Text Format) document. An RTF stylesheet starts with {/stylesheet and ends with }. The problem with finding the closing brace is that you can not simply match the first } you see: Stylesheets contain many subelements which begin with { and end with }, so the first } encountered will not be the end of the stylesheet but the end of a subelement. The trick is to keep track of all the matching braces so that you find and process the whole stylesheet. Here is the code:

  declare catch close-brace ()
  
  process
     using output as #suppress
        submit #main-input
     
  
  find "{\stylesheet"
     using group "process stylesheet"
        using output as #main-output
        do
           output "{\stylesheet"
           submit #current-input
  
         catch close-brace ()
           output "}"
        done   
         
        
  group "process stylesheet"
     find [any \ "{}"]* => t
        output t
        
  
     find "{"
        output "{"
        submit #current-input
  
      catch close-brace ()
        output "}"
        
  
     find "}"
        throw close-brace ()

First you find the beginning of the stylesheet with the rule find "{/stylesheet". When you find it, you switch to the group process stylesheet and submit #current-input. This causes a new find process to start in the process stylesheet group. Every time you encounter an open brace, submit #current-input again, creating a stack of find processes. Every time you find a close brace, throw to close-brace (), ending the current submit and popping a find process off the stack. When the closing brace of the stylesheet is found, the catch in the original stylesheet rule will be the closest catch for close-brace, and you will exit the process stylesheet group, having captured the entire stylesheet. You then output the stylesheet.

This program simply outputs the stylesheet. It is simple to extend this program so that each subelement of the stylesheet is found and processed explicitly within the process stylesheet group. For instance, you could add a rule to find individual styles within a stylesheet:

  global string styles variable
  
  
  find "{\s" digit+ => style-number 
     set new styles to style-number
     output "{\s" || style-number
     submit #current-input
  
   catch close-brace ()
     output "}"

Note that with the nested pattern matching technique, there is no need to capture a structure such as a stylesheet in a buffer and then process it afterward. You can process it on the fly. Nested pattern matching greatly improves the efficiency of processing such structures, while reducing the size and complexity of your code.

Multiple levels of nesting

The example above involved multiple levels of nesting of RTF group structures, but each nested group was exited explicitly by a throw that unwound exactly one level of nesting. It is often useful to unwind more than one level of nesting at a time. Consider the following RTF fragment which describes direct formatting of the text Hello World:

  {\b\i\u Hello World}

Here three RTF control words turn on bold, underlining, and italic formatting for the text in the group. Suppose we are trying to transform this RTF into equivalent HTML, which would look like this:

  <b><i><u>Hello World</u></i></b>

Transforming the control words into opening tags is easy enough, but how do we create all the end tags and get them in the right order? We do it by using nested pattern matching:

  declare catch close-brace ()
  
  process
     submit "{\b\i\u Hello World}"
     
  
  find "{"
     submit #current-input
  
   catch close-brace ()
  
     
  find "\" letter+ => code space?
     output "<" || code || ">"
     submit #current-input
  
   always
     output "</" || code || ">"
     
  
  find "}" 
     throw close-brace ()

This code treats the occurrence of each RTF control word as the start of a new nested structure in the data. This corresponds to the structure of the HTML we are trying to create. By the time we reach the } character in the data, we are four submits deep. (The second find rule has been invoked recursively by the submit in the rule itself.) We unwind all those submits at once with a throw to close-brace. This collapses us right back to the first find rule where we discovered the opening brace. As the throw is being executed, the always clause of each of the recursively invoked rules is executed, producing the closing tags in the proper order.

Among other things, this example illustrates an important point about pattern matching. The hierarchical pattern we perceived in the data was not there in the mind of the person who designed RTF. The clear intention was to define a single structure and define its properties. We treated it as three nested structures because we wanted to turn it into HTML. We did it by imposing on the data our idea that it contained nested structures. This structure is not apparent in the data. It was not discovered, it was imposed. Patterns are not so much found in the data as they are imposed on the data by what we decide to do with it.

Pattern matching functions are particularly useful for nested pattern matching.

Prerequisite Concepts

Catch and throw

Related Topics

#current-input