Functions

Like most languages, OmniMark supports functions. Unlike many languages, an OmniMark program is not simply a hierarchy of functions. Rules are the principal structural element of OmniMark programs. Functions are supplementary structures. Functions cannot contain rules (though they can invoke them through submit, do xml-parse, do sgml-parse, or do markup-parse). You can use functions to encapsulate code you use commonly within different rule bodies.

There are three types of functions in OmniMark:

  • value-returning functions,
  • action functions, and
  • coroutine functions.

Value-returning functions

A function that returns a value can be defined as follows:

  define integer function
     add (value integer x,
          value integer y)
  as
     return x + y
          

The return type of the function is declared following the define keyword. It may be any OmniMark data type, including record or opaque types. The value is returned using the return keyword. return exits the function.

Here is how the add function can be called:

  process
     output "d" % add (2,3)
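
Because return exits the function, it can also be used to leave a function early. The following is a minimal sketch; the function name absolute-value is hypothetical and not part of the original example. The process rule should output 5:

  define integer function
     absolute-value (value integer x)
  as
     ; For a negative argument, return here; the rest of the body is skipped.
     do when x < 0
        return 0 - x
     done
  
     return x
  
  
  process
     output "d" % absolute-value (0 - 5) || "%n"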
          

Coroutine functions

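There are four types of coroutine functions in OmniMark:

  • string source functions,
  • string sink functions,
  • markup source functions, and
  • markup sink functions.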

When a value-returning function is called, its caller suspends execution while the function executes. The caller can proceed only once the function terminates, at which point the function returns its single result value.

Coroutines always execute in pairs. When a coroutine function is called, two coroutines are created: one that executes the function, and its sibling coroutine that is specified along with the call. The caller suspends until both coroutines terminate their execution. Here is an example of a coroutine function and its call:

  define string sink function
     add-all ()
  as
     local integer result
  
     repeat scan #current-input
     match digit+ => n
        set result to result + n
  
     match white-space+
        ; EMPTY
     again
  
     put #main-output "d" % result
  
  
  process
     using output as add-all ()
     do
        output " 2"
        output " 3"
     done

In this example, the sibling coroutine for the function add-all is the do ... done scope that outputs two numbers.

A coroutine function of string source or markup source type produces a continuous stream of values. As the function produces the stream, its sibling coroutine consumes it. The stream is terminated when the producing function terminates.

A coroutine function defined as string sink or markup sink consumes a stream produced by its sibling coroutine. When the consuming coroutine terminates, its producer sibling is forcibly terminated as well.

Action functions

An action function does not return a value. Rather, it performs an action. Here is how an action function might be defined. Note that it has no return type in the definition and no return is required:

  define function
     clear-flags (modifiable switch flags)
  as
     repeat over flags
        set flags to false
     again
          

This function clears all the switches on a switch shelf that is passed to it as a modifiable argument.
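
As a usage sketch, the function might be called as follows. The shelf name my-flags and its initial values are hypothetical, chosen only for illustration:

  process
     local switch my-flags variable initial {true, false, true}
  
     clear-flags (my-flags)
  
     ; Every item on the shelf should now be false.
     repeat over my-flags
        output "set%n" when my-flags
        output "cleared%n" unless my-flags
     again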

Action functions can generate output. The following function outputs characters in a specified range:

  define function
     output-character-range (value string start,
                             value string end)
  as
     repeat for integer i from binary start to binary end
        output "b" % i
     again
  
  
  process
     output-character-range ("A", "M")
          

While this type of function is permitted, it is generally preferable to write such functions as string source functions. This will improve the readability of your code and increase the generality of the functions. The function is changed to a string source function simply by adding string source to the definition and changing the function name from output-character-range to character-range:

  define string source function
     character-range (value string start,
                      value string end)
  as
     repeat for integer i from binary start to binary end
        output "b" % i
     again
  
  
  process
     output character-range ("A", "M")
          

Here the normal OmniMark keyword output can be used in the function call, enhancing the clarity of the program. In addition, the string source function can be used in a wider range of contexts such as:

  submit character-range ("A", "Z")
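
Two further contexts are sketched below, assuming that a string source function can be used wherever a string source is accepted, such as the input of a scan or the value assigned to a stream shelf. The variable name letters is hypothetical:

  process
     local stream letters
  
     ; Consume the stream one character at a time.
     repeat scan character-range ("A", "C")
     match any => c
        output c || "%n"
     again
  
     ; Collect the whole stream into a shelf item.
     set letters to character-range ("A", "C")
     output letters || "%n"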
          

You can also write functions that both return a value and do output:

  define integer function
     add (value integer x, 
          value integer y)
  as
     output "I will add %d(x) and %d(y)%n"
  
     return x + y
  
  
  process
     local integer z
  
     set z to add (2,3)
     output "d" % z || "%n"

While it is certainly possible to program like this, we recommend that you avoid writing functions that both do output and return a value. Not only do they make it hard to follow your code, but they can have unexpected results. In particular, if the return value is directed to #current-output, you may not get the function's return values and output in the order you expected.
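
One way to avoid the problem, offered as a sketch rather than a prescription, is to keep the function purely value-returning and do the output in the caller:

  define integer function
     add (value integer x,
          value integer y)
  as
     return x + y
  
  
  process
     local integer z
  
     ; All output happens in the caller, so its order is explicit.
     output "I will add 2 and 3%n"
     set z to add (2, 3)
     output "d" % z || "%n"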

Pattern functions

You can use switch-returning functions as pattern functions.

You can also use string source functions or string-returning functions to dynamically define the text to be matched in a pattern.

Records and functions

Because record shelves are references, they behave slightly differently when passed to functions. In particular, the value of a record passed to a function as a value argument can be changed, since it is the reference that is passed by value, not the record itself. For the same reason, if the value of a record passed to a function as a value argument is changed, its value will also be changed in the calling environment, since it is the same record.
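
The following sketch illustrates this behavior. The record type point-type, its fields, and the function move-right are hypothetical names chosen for the example. If records behave as described above, the process rule should output 11:

  declare record point-type
     field integer x
     field integer y
  
  
  define function
     move-right (value point-type p)
  as
     ; p refers to the caller's record, so the change is visible to the caller.
     set p:x to p:x + 1
  
  
  process
     local point-type start-point
  
     set start-point:x to 10
     move-right (start-point)
     output "d" % start-point:x || "%n"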

Recursion

You can call OmniMark functions recursively. The following program calculates the factorial of a number using a recursive function:

  define integer function
     factorial (value integer n)
  as
     do when n <= 0
        return 1
  
     else
        return n * factorial (n - 1)
     done
  
  
  process
     output "d" % factorial (7)
          

Overloading

Functions can be overloaded. See Functions: overloaded for details.

Side effects

The principal job of a function is to encapsulate a discrete operation. However, a function may have side effects on the global state of the program. While writing functions with side effects is appropriate in some situations, you should exercise caution when using this technique as it can lead to programs that are difficult to debug and hard to read and maintain.

Functions isolate sections of code, but don't isolate you from the current environment, in particular the current output scope. Output generated in an action function goes to the current output scope. If a function changes the destinations of the current output scope (with output-to), this carries over to the calling environment.
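
For example, the output of an action function is captured when the caller redirects the current output scope. This is a minimal sketch; the names greet and captured are hypothetical:

  define function
     greet ()
  as
     ; Goes to whatever the current output scope is at the time of the call.
     output "hello%n"
  
  
  process
     local stream captured
  
     open captured as buffer
     using output as captured
     do
        greet ()
     done
     close captured
  
     output "captured: " || captured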

Similarly, accessing #current-input in a function can modify the current input being scanned by the calling environment, affecting whether patterns succeed or not. This can be desirable in some cases (e.g., pattern functions), but in other cases this can lead to programs that are difficult to understand and debug.

Function side effects can be particularly problematic with functions used in patterns and in the guards of rules. To allow for optimization of pattern matching routines, OmniMark does not define whether a pattern or the guard on a pattern is executed first (a pattern is itself a kind of guard on a statement, so this is sensible). You should never write a program that depends on the order in which a pattern and a guard on that pattern are executed.

Coroutine functions that modify the global state can also make programs hard to understand because the order of their execution, and therefore of their side effects, depends on the data flow. You can restrict the scope of side effects by using save and by declaring the relevant global shelves as domain-bound.

In the case of patterns that fail, OmniMark does not guarantee that all parts of the pattern will be tried, or that the same parts will be tried in all circumstances. This allows OmniMark to optimize pattern matching. You should never write a program that depends on the side effects of a function called in a pattern that fails.