Guide to OmniMark 8

Pattern matching functions

A pattern matching function is a switch function that is called from within a pattern and takes part in the pattern matching process by scanning #current-input. The calling pattern uses the function's return value to determine whether the part of the pattern handled by the function succeeded or failed.

Here is a very simple pattern matching function that matches text up to and including a specified string:

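  ; Succeeds if the input contains pat, consuming everything up to and including it.
  define switch function upto-and-including
      ( value string pat )
      as
      return #current-input matches any** pat

  process
      submit "Mary had a little lamb."

  find "Mary" upto-and-including("little") => stuff
      output stuff

  ; Discard any input not handled by the rule above.
  find any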

Here the function "upto-and-including" uses the matches operator to determine if the input data, represented by #current-input, contains the terminating string value. If it does, matches consumes that portion of the data and the function returns true, allowing the pattern that called the function to continue.

If the input data does not contain the terminal string, matches returns false, the function returns false, and the pattern that called the function fails.
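To see the failure case in action, here is a minimal sketch that reuses the same function with a terminating string that does not occur in the input. The string "big", the second find rule, and the "[no match]" text are illustrative assumptions and not part of the original example:

```omnimark
define switch function upto-and-including
    ( value string pat )
    as
    return #current-input matches any** pat

process
    submit "Mary had a little lamb."

; "big" never occurs in the input, so the function returns false
; and this rule fails.
find "Mary" upto-and-including("big") => stuff
    output stuff

; Hypothetical fallback rule, tried at the same position once the
; rule above has failed.
find "Mary"
    output "[no match]"

find any
```

Because the function returns false, the first rule should fail without consuming any input, which leaves "Mary" available for the fallback rule to match.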

As the code above shows, data matched by a pattern matching function can be captured in a pattern variable in the usual way. One of the limits of conventional pattern variables is that they cannot be used to build a shelf of values from a repeated pattern. Pattern matching functions offer a way around this limitation:

  global stream foo variable
  
  define switch function digit-catcher
      (modifiable stream the-digits)
      as
      do scan #current-input
      match digit+ => digits
          set new the-digits to digits
          return true
      else
          return false
      done
  
  process
      submit "(1)(2)(3)(4)"
      
  find ("(" digit-catcher(foo) ")")+
      repeat over foo
          output foo || "%n"
      again

Pattern matching functions are particularly useful in nested pattern matching. The following code uses a pattern matching function to handle nested parentheses:

  define switch function between-parens as
      repeat scan #current-input
      match [\"()"]+
      match "(" between-parens ")"
      match value-end
          return false
      again
      return true
      
  process
      submit "(1((2)(3))478(954)"
      
  find ("(" between-parens => stuff ")")
      output stuff || "%n"
          
  find any

The function "between-parens" matches material between parentheses. If it encounters an opening parenthesis character, it calls itself recursively so that any level of parenthetical matter will be matched. If it encounters a closing parenthesis that is not balanced by a preceding opening parenthesis, the character will not match, the repeat scan will exit, and the function will return true.

Note that we do not actively match the closing parenthesis. Rather, the closing parenthesis is the only thing we don't match. This is a common and useful technique in many kinds of balancing operations. Find everything but the closing delimiter, and allow the repeat scan to exit. This allows the closing delimiter to be matched in the outer pattern, which is good for two reasons. First, it makes the pattern easier to read. Second, it allows you to capture the content of the structure without its delimiters (as we do here).

If the function reaches the end of the input without seeing the closing parenthesis, it returns false. If this occurs in a recursive call, value-end will then be matched by each instance of the function as it unwinds.

Interestingly enough, this function can be written in a slightly more compact fashion:

  define switch function between-parens as
      repeat scan #current-input
      match [\"()"]+
      match "(" between-parens ")"
      again
      return true
  
  process
      submit "(1((2)(3))478(954)"
      
  find ("(" between-parens => stuff ")")
      output stuff || "%n"
          
  find any

This form never returns false. It does, however, work almost identically to the original function. Unless a balancing closing parenthesis is encountered, the function will read to the end of the data, just like the previous version. It then returns true, rather than false, just as if it had ended with the closing delimiter. But the pattern that called the function will now fail because it will not be able to match the closing ")".
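The same balancing technique carries over to other delimiter pairs. The following sketch adapts the compact form to angle brackets. The function name "between-angles", the pattern variable "body", and the sample input are assumptions chosen for illustration:

```omnimark
; Matches everything except an unbalanced closing ">",
; recursing whenever a nested "<" ... ">" pair is found.
define switch function between-angles as
    repeat scan #current-input
    match [\"<>"]+
    match "<" between-angles ">"
    again
    return true

process
    submit "<a<b>c>d"

; The closing ">" is matched here, in the outer pattern, so "body"
; captures the content without its outer delimiters.
find "<" between-angles => body ">"
    output body || "%n"

find any
```

As with the parenthesis version, only the closing delimiter is left unmatched inside the function, so the calling pattern decides whether the structure was properly closed.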

You can also use pattern matching functions to do some of the processing of the matched data, though it is important to remember that the code in a pattern matching function is called and executed before the pattern as a whole is complete. This means the function could execute even though the pattern as a whole fails. Thus the function could be called and executed again in a subsequent attempt to match the same data.
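A small sketch can make this repeated execution visible. It assumes a hypothetical global counter, "call-count", that the function increments each time it is called. The function name "counted-digits" and the sample input are also illustrative:

```omnimark
global integer call-count initial {0}

; Side effect: counts every call, whether or not the calling
; pattern goes on to succeed.
define switch function counted-digits as
    increment call-count
    return #current-input matches digit+

process
    submit "12a 34b"
    output "calls: %d(call-count)%n"

; The function runs each time this rule is attempted, including
; attempts where the digits are not followed by "b".
find counted-digits "b"
    output "matched%n"

find any
```

With this input the rule should succeed only once, for "34b", yet the reported call count should be higher, because the function also ran during the attempts that failed. Any processing placed inside a pattern matching function should therefore be safe to repeat.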

Prerequisite Concepts
   Functions
   Pattern matching
   Pattern variables

Related Syntax
   do scan
   find
   function, define function
   matches
   repeat scan

OmniMark 8.2.0 Documentation Generated: March 13, 2008 at 3:25:49 pm

Copyright © Stilo International plc, 1988-2008.