Scanning
Scanning is OmniMark's principal data processing mechanism. It is a powerful mechanism that provides simple solutions to most data processing problems. As an OmniMark programmer, you must learn to think of data processing problems first and foremost as scanning problems. Problems that you would solve by pointer manipulation, indexing, or substring operations in other languages are generally handled by scanning in OmniMark.
Scanning is the process of progressing systematically through an input source and testing the data against one or more patterns. Whenever a pattern matches, the corresponding set of actions is executed.
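As a minimal sketch of that process, the program below scans a made-up string with repeat scan. Each match alternative pairs a pattern with the actions to run when it matches; the pattern variable names word-text and number-text are illustrative only.

```omnimark
process
   ; Scan a literal string; each match alternative is a pattern plus its actions.
   repeat scan "abc 123 def"
   match letter+ => word-text
      output "word: " || word-text || "%n"
   match digit+ => number-text
      output "number: " || number-text || "%n"
   match white-space+
      ; Consume spaces without producing output.
   again
```

This should print "word: abc", "number: 123" and "word: def" on separate lines. Because every character in the input is covered by some alternative, the loop runs to the end of the string; repeat scan stops as soon as no alternative matches.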
OmniMark provides three scanning constructs:
submit and find rules
repeat scan
do scan
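The examples on this page use submit with find rules and repeat scan; do scan differs in that it attempts a single match at the start of its source and falls through to else if none of its alternatives match. A minimal sketch, assuming a hypothetical date string and illustrative variable names:

```omnimark
process
   ; do scan tries its match alternatives once, at the start of the source.
   do scan "2008-03-15"
   match digit{4} => year-part "-" digit{2} => month-part "-" digit{2} => day-part
      output "Year: " || year-part || "%n"
      output "Month: " || month-part || "%n"
      output "Day: " || day-part || "%n"
   else
      ; Runs only if no match alternative matched.
      output "Not a date%n"
   done
```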
It also provides three scanning operators:
If you need to build a data structure from a data source, you do it by scanning the data and assigning data captured by patterns to elements of your data structure:
process
   local stream files variable

   repeat scan file "filelist.txt"
   match any-text+ => file-name "%n"
      set new files to file-name
   again

   repeat over files
      ; process the file
   again
But in most cases it is not necessary to build data structures. You can process your data directly as part of the scanning operation:
process
   repeat scan file "filelist.txt"
   match any-text+ => file-name "%n"
      ; process the file file-name
   again
Because you can easily associate any processing code with a pattern matching event (the firing of a find rule or a match alternative) you can process most data directly as it streams. You can output the result of your processing as part of responding to the event, knowing it will all be collected by the current output scope and streamed to the proper destination.
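A minimal sketch of this direct, streaming style: the find rules below write each word as it is matched, and because the submit runs inside a using output scope, everything they output goes to that destination. The file name wordlist.txt is a hypothetical choice for illustration.

```omnimark
process
   ; All output produced while this submit runs is directed to the file.
   using output as file "wordlist.txt"
      submit "Mary had a little lamb"

find letter+ => word-text
   ; Output the word immediately; no intermediate data structure is built.
   output word-text || "%n"

find any
   ; Consume any other character (here, the spaces) silently.
```

No shelf is used: each word is written to wordlist.txt, one per line, at the moment its find rule fires.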
If one string of data could cause more than one pattern to match, the pattern that occurs first in the scanning construct will fire, and the one that occurs later will not. This allows you to put more specific find rules or match alternatives before more general ones and have the general ones fire only if the specific ones do not. The following two programs produce different output because of the order of their find rules:
global integer wordcount initial {0}

process
   submit "Mary had a little lamb"
   output "d" % wordcount || "%n"

find "had"
   output "*"

find letter+
   increment wordcount

find any
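The program above prints "*4": the find "had" rule fires when scanning reaches "had", so the find letter+ rule counts only the other four words. The program below changes the order of the find rules and produces a different output.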
global integer wordcount initial {0}

process
   submit "Mary had a little lamb"
   output "d" % wordcount || "%n"

find letter+
   increment wordcount

find "had"
   output "*"

find any
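This program prints "5": find letter+ now matches "had" first, so the find "had" rule never fires and all five words are counted.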
You must always place more specific rules before more general rules that can match the same data, or the more specific rules will never fire.
OmniMark has integrated streaming XML and SGML parsers. The parsers work by scanning their input source. When you invoke a parser, you hand off the task of scanning the data to the parser. The parser then reports the markup elements it finds through markup rules. You can process XML or SGML data by responding to markup rules. As with scanning, you can usually process the data by direct response and output, without the building of data structures.
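A minimal sketch of responding to markup rules, assuming a simple well-formed XML file with the hypothetical name example.xml that contains title elements. The parser does the scanning; the element rules decide what to output as each element is reported.

```omnimark
process
   ; Hand scanning of the input over to the XML parser.
   do xml-parse scan file "example.xml"
      ; "%c" asks the parser to process the content and fire markup rules.
      output "%c"
   done

element #implied
   ; Any element without a more specific rule: just pass its content through.
   output "%c"

element "title"
   ; Label the content of each title element as it streams past.
   output "Title: %c%n"
```

As with scanning, nothing is stored: each element's content is output as the parser reports it.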
Copyright © Stilo International plc, 1988-2008.