Scanning

Scanning is OmniMark's principal data processing mechanism, and it provides simple solutions to most data processing problems. As an OmniMark programmer, you must learn to think of data processing problems first and foremost as scanning problems. Problems that you would solve with substring operations in other languages are generally handled by scanning in OmniMark.

Scanning is the process of moving systematically through an input source and testing its data against one or more patterns. Whenever a pattern matches, the corresponding set of actions is executed.
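
As a minimal sketch of this process (the input string and the pattern variable name word are illustrative assumptions), the following program scans a string, and each time the pattern letter+ matches, the actions of the corresponding find rule run:

  process
     submit "Mary had a little lamb"
  
  ; fires once for each run of letters in the input
  find letter+ => word
     output word || "%n"
  
  ; consumes every character the rule above does not match
  find any

Under these assumptions, each word of the input should appear on its own line of output.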

OmniMark provides three scanning constructs:

It also provides three scanning operators:

Processing data by scanning

If you need to build a data structure from a data source, you do it by scanning the data and assigning data captured by patterns to elements of your data structure:

  process
     local string files variable
  
     repeat scan file "filelist.txt"
     match any-text+ => file-name "%n"
        set new files to file-name
     again
  
     repeat over files
        ; process the file
     again
          

But in most cases it is not necessary to build data structures. You can process your data directly as part of the scanning operation:

  process
     repeat scan file "filelist.txt"
     match any-text+ => file-name "%n"
        ; process the file file-name
     again

Because you can associate processing code with any pattern matching event (the firing of a find rule or a match alternative), you can process most data directly as it streams. You can output the result of your processing as part of responding to the event, knowing that it will be collected by the current output scope and streamed to the proper destination.
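
As a hedged illustration of processing data as it streams, the sketch below escapes a character while scanning. The input string and the pattern variable name ch are assumptions chosen for the example:

  process
     repeat scan "a < b"
     ; replace the special character as soon as it is seen
     match "<"
        output "&lt;"
     ; pass every other character through unchanged
     match any => ch
        output ch
     again

Each match alternative writes its result straight to the current output, so no intermediate data structure is needed.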

Order of find rules or match alternatives

If one string of data could cause more than one pattern to match, the pattern that occurs first in the scanning construct will fire, and the one that occurs later will not. This allows you to put more specific find rules or match alternatives before more general ones and have the general ones fire only if the specific ones do not. The following two programs produce different output because of the order of their find rules:

  global integer word-count
  
  process
     submit "Mary had a little lamb"
     output "d" % word-count || "%n"
  
  find "had"
     output "*"
  
  find letter+
     increment word-count
  
  find any
          

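The program above prints "*4". The find "had" rule comes first, so it fires for the word "had" and outputs "*", and find letter+ counts only the remaining four words. The program below reverses the order of the first two find rules and produces different output.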

  global integer word-count
  
  process
     submit "Mary had a little lamb"
     output "d" % word-count || "%n"
  
  find letter+
     increment word-count
  
  find "had"
     output "*"
  
  find any
          

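This program prints "5". The find letter+ rule now matches every word, including "had", so the find "had" rule never fires.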

You must always place more specific rules before more general rules that can match the same data, or the more specific rules will never fire.
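
The same ordering applies to match alternatives. The following sketch, which reuses the sample sentence and assumes the pattern variable names word and ch, places the specific alternative first so that it is tried before the general ones:

  process
     repeat scan "Mary had a little lamb"
     ; specific alternative: must come before letter+
     match "had"
        output "*"
     ; general alternative: any other word
     match letter+ => word
        output word
     ; everything else, such as the spaces between words
     match any => ch
        output ch
     again

If the first two alternatives were swapped, letter+ would match "had" as an ordinary word and the "*" would never be output.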