Rules, rules-based programming

OmniMark is a rule-based programming language. This means that an OmniMark program consists of a set of rules. Each rule is fired when certain conditions are satisfied. The actions associated with the rule are then performed.

A rule is made up of two parts—a rule header and a rule body.

In the rule header you define the event which, when encountered in an input stream, will cause a certain action or set of actions to be executed. Rule headers are made up of two things: an event definition (usually a keyword followed by a name or pattern), and an optional condition which must be satisfied before the actions in the rule body are executed.

The rule body contains any number of local declarations and actions that are to be executed when the event in the rule header is encountered.

If the event defined in a rule header is encountered in an input source, and any condition attached to that event evaluates to true, the rule fires and its actions are executed.

OmniMark provides the following classes of rules:

  • process rules, used to initiate and control processing
  • find rules, used to scan data
  • markup rules, used to process data parsed by OmniMark's integrated XML and SGML parsers or an external parser.
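As a rough illustration of how a process rule and a markup rule cooperate, the sketch below hands a small XML string to the integrated parser and handles one element. The sample string and the element name line are assumptions made up for this example, not taken from any real document type.

```omnimark
; Sketch only: the input string and the element name "line" are
; illustrative assumptions.

; Process rule: initiates processing by handing data to the XML parser.
process
   do xml-parse scan "<line>Mary had a little lamb</line>"
      output "%c"
   done

; Markup rule: fires when the parser reports a "line" element.
; "%c" continues parsing the element's content.
element "line"
   output "%c%n"
```

A find rule, the remaining class, is shown under "Rules in action".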

Rules in action

Suppose you wanted to count the words in the text "Mary had a little lamb". You would write an OmniMark rule that defines the occurrence of a word as an event:

  find letter+
          

This is an OmniMark find rule. A find rule attempts to match a pattern in a data stream; when the pattern matches completely, an event is detected. This rule matches letters. The + sign after the keyword letter stands for "one or more", so the rule goes on matching letters until it reaches something that is not a letter, such as punctuation or a space. Having run out of letters, it checks whether the pattern requires anything else. Since it does not, the pattern is complete and the rule fires. The actions in the rule body are then executed. This rule fires once for every word in the data, so all that remains is to increment an integer each time it fires. A complete program to count the words in "Mary had a little lamb" looks like this:

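  global integer word-count initial { 0 }

  ; Process rule: scan the text with the find rules, then report the total.
  process
     submit "Mary had a little lamb"
     output "d" % word-count || "%n"

  ; Find rule: fires once for each run of one or more letters.
  find letter+
     increment word-count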
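
The optional condition in a rule header can be sketched with a variation on the same program. Here a when condition is attached to the find rule so that it is intended to fire only for the first three words. The limit of three is an arbitrary assumption chosen for illustration.

```omnimark
; Sketch only: the limit of 3 is an illustrative assumption.
global integer word-count initial { 0 }

process
   submit "Mary had a little lamb"
   output "d" % word-count || "%n"

; Header: the event is a run of letters; the condition is word-count < 3.
; Body: the single action below, executed only when the rule fires.
find letter+ when word-count < 3
   increment word-count
```

Once the condition is false, the pattern can still match but the rule does not fire, so the counter is left unchanged.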
          

Nested execution model

Many programming languages encourage nested code, with functions calling functions calling functions. This helps modularize functionality, but it also makes the execution path rigid and makes it difficult to react to complex sequences of events. OmniMark code, by contrast, is very flat. You can define and use functions, but they are called only from within OmniMark's principal execution unit, the rule, and cannot contain rules themselves. All OmniMark rules exist at the base level of the program. In OmniMark you tend to find nested execution rather than nested code.

In processing complex markup, with many nested elements, rules are invoked at each level as appropriate. If you are seven layers of markup deep, seven rules are in mid-execution. This means that you do not have to maintain complex state tables or parse trees. The current execution state of the OmniMark program itself maintains the current parse state for you and makes it easily addressable.
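
A minimal sketch of nested execution, assuming a made-up three-level document: a single flat markup rule is entered once per element and stays in mid-execution while the content inside it is processed. The element names a, b, and c are hypothetical.

```omnimark
; Sketch only: the sample document is an illustrative assumption.
process
   do xml-parse scan "<a><b><c/></b></a>"
      output "%c"
   done

; One flat rule handles every element. "%q" is the name of the
; element for which this instance of the rule is executing.
element #implied
   output "enter %q%n"
   ; "%c" suspends this rule while nested elements fire their own rules.
   output "%c"
   output "leave %q%n"
```

In this sketch the enter lines for a, b, and c should all appear before the first leave line, because the rule for each outer element is still waiting on its "%c" when the inner rule starts.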

Since you cannot tell in advance the order in which the execution of rules may be nested, nesting the rules themselves would make no sense. Hence the simplicity and flatness of a typical OmniMark program.

Nevertheless, you can and should encapsulate common functionality in your OmniMark programs. OmniMark provides several facilities for doing this, including functions, groups, macros, include files, and modules.
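
As one possible illustration of encapsulation with a function, the reporting step of the word-count program could be moved into a function that is called from the process rule. The function name report-count and its parameter n are hypothetical names introduced for this sketch.

```omnimark
; Sketch only: report-count and its parameter n are hypothetical names.
global integer word-count initial { 0 }

; The function is defined at the base level and called from within a rule.
define function report-count (value integer n) as
   output "d" % n || " words%n"

process
   submit "Mary had a little lamb"
   report-count (word-count)

find letter+
   increment word-count
```

The rules stay flat; only the shared action is factored out.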
