swirl
Guide to OmniMark 8   OmniMark home
docs home 
IndexConceptsTasksSyntaxLibrariesLegacy LibrariesErrors
 
Prerequisite Concepts     Related Syntax  

Pattern matching, nested

You can use the technique of "nested pattern matching" to process nested structures in input data. Nested pattern matching involves creating a new process which scans the current scanning source. This allows you to isolate the identification and processing of a nested structure in an input source.

Here is an example of the nested pattern matching technique. The following program uses nested pattern matching to find and process a stylesheet from an RTF (Rich Text Format) document. An RTF stylesheet starts with "{/stylesheet" and ends with "}". The problem with finding that closing brace is that you can't simply match the first "}" you see. Stylesheets contain many subelements which begin with "{" and end with "}", so the first "}" you see won't be the end of the stylesheet but the end of a subelement. The trick is to keep track of all the matching braces so that you find and process the whole stylesheet. Here's the code:

  declare catch close-brace
  
  process
     using output as #suppress
        submit #main-input
     
  find "{\stylesheet"
     using group process-stylesheet
        using output as #main-output
        do
           output "{\stylesheet"
           submit #current-input
           catch close-brace
              output "}"
        done   
         
        
  group process-stylesheet
        
  find [\"{}"]* => stuff 
     output stuff
     
  find "{"
     output "{"
     submit #current-input
     catch close-brace 
        output "}"
     
  find "}"
     throw close-brace

First you find the beginning of the stylesheet with the rule find "{/stylesheet". When you find it, you switch to the group process-stylesheet and submit #current-input. This causes a new find process to start in the process-stylesheet group. Every time you encounter an open brace, submit #current-input again, creating a stack of find processes. Every time you find a close brace, throw to close-brace, ending the current submit and popping a find process off the stack. When the closing brace of the stylesheet is found, the catch in the original stylesheet rule will be the closest catch for close-brace, and you will exit the process-stylesheet group, having captured the entire stylesheet. You then output the stylesheet.

This program simply outputs the stylesheet. It is simple to extend this program so that each subelement of the stylesheet is found and processed explicitly within the process-stylesheet group. For instance, you could add a rule to find individual styles within a stylesheet:

  global stream my-styles variable initial-size 0
  ...
  
  find "{\s" digit+ => style-number 
     set new my-style to style-number
     output "{\s" || style-number
     submit #current-input
     catch close-brace 
        output "}"

Note that with the nested pattern matching technique, there is no need to capture a structure such as a stylesheet in a buffer and then process it afterward. You can process it on the fly. Nested pattern matching greatly improves the efficiency of processing such structures, while reducing the size and complexity of your code.

Multiple levels of nesting

The example above involved multiple levels of nesting of RTF group structures (RTF groups begin with "{" and end with "}"), but each nested group was exited explicitly by a throw that unwound exactly one level of nesting. It is often useful to unwind more than one level of nesting at a time. Consider the following RTF fragment which describes direct formatting of the text "Hello World":

  {\b\i\u Hello World}
Here three RTF control words turn on bold, underlining, and italic formatting for the text in the group. Suppose we are trying to transform this RTF into equivalent HTML, which would look like this:
  <b><i><u>Hello World</u></i></b>

Transforming the control words into opening tags is easy enough, but how do we create all the end tags and get them in the right order? We do it by using nested pattern matching:

  declare catch close-brace
  
  process
     submit "{\b\i\u Hello World}"
     
  find "{"
     submit #current-input
     catch close-brace
     
  find "\" letter+ => code space?
     output "<" || code || ">"
     submit #current-input
     always
        output "</" || code || ">"
     
  find "}" 
     throw close-brace

This code treats the occurrence of each RTF control word as the start of a new nested structure in the data. This corresponds to the structure of the HTML we are trying to create. By the time we reach the "}" character in the data, we are four submits deep. (The second find rule has been invoked recursively by the submit in the rule itself.) We unwind all those submits at once with a throw to close-brace. This collapses us right back to the first find rule where we discovered the opening brace. As the throw is being executed, the always clause of each of the recursively invoked rules is executed, producing the closing tags in the proper order.

Among other things, this example illustrates an important point about pattern matching. The hierarchical pattern we perceived in the data was not there in the mind of the person who designed RTF. The clear intention was to define a single structure and define its properties. We treated it as three nested structures because we wanted to turn it into HTML. We did it by imposing on the data our idea that it contained nested structures. This structure is not apparent in the data. It was not discovered, it was imposed. Patterns are not so much found in the data as they are imposed on the data by what we decide to do with it.

Pattern matching functions are particularly useful for nested pattern matching.

Prerequisite Concepts
   Catch and throw
 
  Related Syntax
   #current-input
 
 

Top [ INDEX ] [ CONCEPTS ] [ TASKS ] [ SYNTAX ] [ LIBRARIES ] [ LEGACY LIBRARIES ] [ ERRORS ]

OmniMark 8.2.0 Documentation Generated: March 13, 2008 at 3:25:49 pm
If you have any comments about this section of the documentation, please use this form.

Copyright © Stilo International plc, 1988-2008.