Guide to OmniMark 7

Scanning

Scanning is OmniMark's principal data processing mechanism. It is a powerful mechanism that provides simple solutions to most data processing problems. As an OmniMark programmer, you must learn to think of data processing problems first and foremost as scanning problems. Problems that you would solve by pointer manipulation, indexing, or substring operations in other languages are generally handled by scanning in OmniMark.

Scanning is the process of progressing systematically through an input source and testing the data against one or more patterns. Whenever a pattern matches, the corresponding set of actions is executed.

OmniMark provides three scanning constructs: find rules, do scan, and repeat scan.

It also provides three scanning operators: matches, drop, and take.
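
The examples in this section use find rules and repeat scan. As a complement, here is a minimal sketch of a do scan, which tests its input once against a list of match alternatives instead of looping. The input string and the pattern variable names (the-year, the-month, the-day) are made up for illustration:

  process
     ; scan a single string once; the first alternative that matches is used
     do scan "2005-06-28"
        match digit+ => the-year "-" digit+ => the-month "-" digit+ => the-day
           output "year " || the-year || ", month " || the-month
               || ", day " || the-day || "%n"
        else
           ; reached only if no match alternative succeeds
           output "not a date%n"
     done

With the input shown, the first alternative should match and the three captured pieces are written to the current output.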

Processing data by scanning

If you need to build a data structure from a data source, you do it by scanning the data and assigning data captured by patterns to elements of your data structure:

  process
     local stream files variable
     repeat scan file "filelist.txt"
        match any-text+ => file-name "%n"
           set new files to file-name
     again

     repeat over files
        ;process the file
     again

But in most cases it is not necessary to build data structures. You can process your data directly as part of the scanning operation:

  process
     repeat scan file "filelist.txt"
        match any-text+ => file-name "%n"
           ;process the file file-name
     again

Because you can easily associate any processing code with a pattern matching event (the firing of a find rule or a match alternative) you can process most data directly as it streams. You can output the result of your processing as part of responding to the event, knowing it will all be collected by the current output scope and streamed to the proper destination.
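
The following sketch illustrates this kind of direct, streaming response with a find rule. The input string and the pattern variable name a-word are chosen only for illustration. Each word is output as soon as it is matched, and no intermediate data structure is built:

  process
     submit "Mary had a little lamb"

  ; fires once for each run of letters; the result goes to the current output
  find letter+ => a-word
     output "[" || a-word || "]"

Data that no find rule matches (here, the spaces between words) passes through to the current output unchanged, so this program should print each word of the input wrapped in square brackets, with the original spacing preserved.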

Order of find rules or match alternatives

If one string of data could cause more than one pattern to match, the pattern that occurs first in the scanning construct will fire, and the one that occurs later will not. This allows you to put more specific find rules or match alternatives before more general ones and have the general ones fire only if the specific ones do not. The following two programs produce different output because of the order of their find rules:

  global integer wordcount initial {0}

  process
     submit "Mary had a little lamb"
     output "d" % wordcount || "%n"

  find "had"
     output "*"

  find letter+
     increment wordcount

  find any

The program above prints "*4". The program below changes the order of the find rules and produces a different output.

  global integer wordcount initial {0}

  process
     submit "Mary had a little lamb"
     output "d" % wordcount || "%n"

  find letter+
     increment wordcount

  find "had"
     output "*"

  find any

This program prints "5".

You must always place more specific rules before more general rules that can match the same data, or the more specific rules will never fire.
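
The same ordering principle applies to match alternatives inside a repeat scan. In the sketch below, whose input string and markers are invented for illustration, the longer and more specific string "catalog" is listed before "cat". If the two alternatives were swapped, the "catalog" alternative could never be selected, because "cat" would always match first:

  process
     repeat scan "cat catalog"
        ; more specific alternative first
        match "catalog"
           output "<catalog>"
        ; more general alternative second
        match "cat"
           output "<cat>"
        ; copy any other character through unchanged
        match any => c
           output c
     again

As written, this should print "<cat> <catalog>".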

Scanning vs. parsing

OmniMark has integrated streaming XML and SGML parsers. The parsers work by scanning their input source. When you invoke a parser, you hand off the task of scanning the data to the parser. The parser then reports the markup elements it finds through markup rules. You can process XML or SGML data by responding to markup rules. As with scanning, you can usually process the data by direct response and output, without building data structures.
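
As a rough sketch of this hand-off, the program below passes a small well-formed XML string to the XML parser and responds to the element it reports. The element name greeting and its content are invented for illustration, and the exact invocation you need may differ (for example, when validating against a DTD):

  process
     ; hand the scanning of the input over to the XML parser
     do xml-parse scan "<greeting>Hello</greeting>"
        output "%c"
     done

  ; markup rule: fires when the parser reports a greeting element
  element "greeting"
     output "Greeting: %c%n"

Here "%c" stands for the parsed content of the element, so the element rule can produce its output directly as the data streams through the parser.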

       
 


OmniMark 7.1.2 Documentation Generated: June 28, 2005 at 5:44:42 pm

Copyright © Stilo Corporation, 1988-2005.