
String source data type

You can use the string source data type to represent a source of string data. Since a string source must actively generate data, you cannot declare a variable of type string source; you can only create a function of that type. Such a function streams the data it produces to the calling environment. For example, the following function produces a stream of numbers:

  define string source function 
     numbers from value integer first
               to value integer last
  as
     repeat for integer i from first to last 
        output "%d(i) "
     again
  
  process
     output numbers from 1 to 100

A string source function streams the data that it creates to the calling context. The calling context becomes the current output for the duration of the function (unless it is explicitly changed during the execution of the function). All data output during the execution of the function, including output created by other functions or rules invoked during the execution of the function, is streamed to the calling context.

A string source function does not buffer the data it creates, but streams it incrementally to its calling context. To accomplish this, a string source function runs as a coroutine with the calling environment. When two functions run as coroutines, control is handed back and forth between them until the data is completely processed, ensuring that the data is not buffered as it passes from one routine to the next. In OmniMark 8 you can have as many connected coroutines as you need.

In the example above, the numbers function and the output action run as coroutines. In the following example, the roman and numbers functions and the output action all run as coroutines with each other:

  define string source function 
     numbers from value integer first
               to value integer last
  as
     repeat for integer i from first to last 
        output "%d(i) "
     again
  
  define string source function 
     roman value string source numbers
  as
     repeat scan numbers
     match digit+ => num
        output "i" % num
     match [\digit]+ => chars
        output chars
     again
  
  process
     output roman numbers from 1 to 100

The roman function in the example above uses string source as the type of a function argument. The object passed to such an argument must itself be a string source: either a function of type string source, or a built-in OmniMark source such as #main-input.

A string source function (like any source) can only operate in a streaming fashion if it is called in a streaming context. If a string source function is called in a non-streaming context, such as the set action, it will not stream; instead it buffers its data completely before it returns, as in the following example:

  process
     local string number-string
  
     set number-string to numbers from 1 to 100
     output number-string

Here the string source function numbers is called by the set action for the string variable number-string. The function runs to completion and returns its entire value to the set action, just as if it had been a string-returning function.

There is nothing inherently incorrect about calling a string source function in a non-streaming context. In fact, it may be a useful habit to write string source functions rather than string functions, since a string source function is never less efficient than a string function, and is often more efficient. Just recognize that data is buffered whenever a string source function (or any other kind of source) is called in a non-streaming context, and that this may hurt the performance or increase the resource requirements of your program if the amount of data being buffered is large.
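As a rough sketch of the difference, the following program consumes the numbers function directly with repeat scan, so each number can be handled as it is produced rather than after the whole sequence has been buffered in a variable. It assumes that repeat scan provides a streaming context, as the roman example above suggests. The pattern variable name n is chosen only for illustration.

```omnimark
  define string source function 
     numbers from value integer first
               to value integer last
  as
     repeat for integer i from first to last 
        output "%d(i) "
     again
  
  process
     ; scan the source directly instead of copying it into a string first
     repeat scan numbers from 1 to 100
     match digit+ => n
        output n || "%n"
     match any
        ; skip the separating spaces
     again
```

This version is intended to write each number on its own line without ever holding the full sequence in a string variable.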

A string source argument must always be a value argument. You can use a value string source argument for both internal and external functions.

A string source function can be either an internal function or an external function.

Strings vs. string sources

The difference between a string source and a string is that the string is static: it exists in a particular place and can be referenced at will. Therefore if you pass a string to a function, you can reference that string as often as you like:

  define string function 
     duplicate value string to-be-duplicated
  as 
     return to-be-duplicated || to-be-duplicated
  
  process 
     output duplicate "Hip " || "Hooray%n"

A string source, by contrast, is a dynamic supply of characters, and once that supply is exhausted, you can't get the same characters again:

  define string function 
     duplicate value string source to-be-duplicated
  as 
     return to-be-duplicated || to-be-duplicated
  
  process 
     output duplicate "Hip " || "Hooray%n"

Unlike the first program, which outputs "Hip Hip Hooray", this version outputs only "Hip Hooray", since the string source to-be-duplicated is fully drained the first time it is referenced.

If you needed to output the value of a string source twice, you would need to capture the output of the source in a variable:

  define string function 
     duplicate value string source to-be-duplicated
  as 
     local string temp
  
     set temp to to-be-duplicated
     return temp || temp
  
  process 
     output duplicate "Hip " || "Hooray%n"

It never makes sense to write a function this way, however, since writing the function with a string argument rather than a string source argument would achieve exactly the same effect: the stream would be drained into a local string variable in the function.

Notice that this restriction only applies to an instantiated source, which, in practice, means OmniMark sources and string source parameters within functions. String source functions can be called as many times as you like, since they instantiate a new source each time they are called.
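A minimal sketch of that last point, reusing the numbers function from earlier on this page: each call below instantiates a fresh source, so the second call is expected to produce the same sequence as the first rather than an empty one.

```omnimark
  define string source function 
     numbers from value integer first
               to value integer last
  as
     repeat for integer i from first to last 
        output "%d(i) "
     again
  
  process
     ; each call creates a new source, so both calls produce data
     output numbers from 1 to 3
     output numbers from 1 to 3
```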

A string source can be used wherever a value of type string is expected. The source will be drained into the string. With one exception, a string can be used wherever a string source is expected: a new source will be instantiated to provide the contents of the string to the calling environment. In this case, the string is used to initialize the string source, but the string is not affected when the string source is drained. Its value remains unchanged.
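The following sketch illustrates a string being passed where a string source is expected. The function name bracketed and the argument name to-be-bracketed are hypothetical, introduced only for this example. The variable greeting is used to initialize a source for the function call, and should still hold its value afterwards.

```omnimark
  define string source function 
     bracketed value string source to-be-bracketed
  as
     output "["
     output to-be-bracketed
     output "]"
  
  process
     local string greeting
  
     set greeting to "Hello"
     ; a new source is instantiated from the string for this call
     output bracketed greeting
     ; the string itself is not drained and can be used again
     output "%n" || greeting || "%n"
```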

The one place where a string cannot be used instead of a string source is as a destination for the signal action: signal to must be followed by a string sink or string source name.

#current-input in string source functions

There is one source that is normally present everywhere in an OmniMark program: #current-input. During the execution of a normal function, the current input scope of the calling environment is available to the function as #current-input, as illustrated in the following program:

  define string function 
     parse
  as
     do xml-parse scan #current-input
        return "%c"
     done
  
  element "greeting"
     output "%c"
  
  process 
     using input as "<greeting>Hello World</greeting>"
        output parse

Because a string source function is itself a generator of data, however, #current-input is not attached in a string source function. Thus if the above program were rewritten as follows, OmniMark would report an error when the program tried to read the unattached source #current-input.

  define string source function parse
  as 
     do xml-parse scan #current-input
        output "%c"
     done
  
  element "greeting"
     output "%c"
  
  process 
     using input as "<greeting>Hello World</greeting>"
        output parse

To make #current-input available to a string source function, it must be passed to the function explicitly as a string source argument:

  define string source function 
     parse value string source to-be-parsed
  as 
     do xml-parse scan to-be-parsed
        output "%c"
     done
  
  element "greeting"
     output "%c"
  
  process 
     using input as "<greeting>Hello World</greeting>"
        output parse #current-input

Defining a string source function

The syntax of a string source function definition is:

  define string source function
     <function name> <function argument list>
  (as
      <function body> | elsewhere)

or, in the case of an external function:

  define external string source function 
     <function name> <function argument list>
  as 
     <external name> (in function-library <library name>)?

Ending a string source

You can use a return action with no value to end a string source function, or you can simply allow the function to end. There is no operational difference between the two, except that no part of the function body will be executed after the return is executed. return is therefore useful if you want to end the function within a conditional construct. Alternatively, you can throw an exception from a string source function. If the exception is not caught within the function itself, it will propagate to the scope from which the function was called.
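As an illustration of ending a function from within a conditional construct, the sketch below is a variant of the numbers function that stops producing data once it passes a cut-off. The cut-off value of 10 is an arbitrary assumption made for this example.

```omnimark
  define string source function 
     numbers from value integer first
               to value integer last
  as
     repeat for integer i from first to last 
        ; end the source early; nothing after return is executed
        do when i > 10
           return
        done
        output "%d(i) "
     again
  
  process
     output numbers from 1 to 100
```

Under these assumptions the consumer should see only the numbers up to the cut-off, after which the source is at its end.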

Whether a string source function ends by throwing, returning, or by reaching the end of the function body, its consumer will continue execution normally, but the string source referring to the function will be at value-end. On the other hand, if the body of a scope consuming a string source function ends or throws before consuming the entire source, the function will be halted and only its always rules will run. In either case, program execution then proceeds after the scope where the string source function was called.

In the following example, the function root-element-of consumes only the name of the first element and discards the rest of the input. The string source function normalize is halted at that point.

  define string source function
     normalize value string source document
  as
     do xml-parse scan document
        output "%c"
     done
  
  define string function
     root-element-of value string source document
  as
     do scan normalize document
     match "<" [letter | digit | "-_.:"]+ => element-name
        return element-name
     done
  
  process
     output root-element-of #main-input
  
  element #implied
     output "<%q>%c</%q>"

Superseded functionality

The string source data type replaces the input type and the external source type, which are deprecated.

The input function type declaration:

  define input function ...

is deprecated in favor of the string source function type declaration:

  define string source function ...

The external source function type declaration:

  define external source function ...

is deprecated in favor of the external string source function declaration:

  define external string source function ...

The value source parameter declaration:

  define external function foo
     value source origin

is deprecated in favor of the value string source parameter declaration:

  define external function foo
     value string source origin

Prerequisite Concepts
   Data types
   Filter Functions
   String sink data type

Related Syntax
   define string source function
   using input as

OmniMark 8.2.0 Documentation Generated: March 13, 2008 at 3:25:49 pm

Copyright © Stilo International plc, 1988-2008.