Input
OmniMark is a streaming language: it processes data as the data streams from one place to another. Within a program, the source of the streamed data is always the current input scope. To get OmniMark to process a piece of data, therefore, you must make that data the source of the current input scope and then scan or parse it.
You can often accomplish this in one step:
submit "Mary had a little lamb"
Here, the literal string "Mary had a little lamb" is the data to be processed. OmniMark automatically converts strings to sources when they are used as the argument of a scanning action. The submit statement automatically creates an input scope and makes its argument the source for that input scope.
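As a minimal sketch of what this looks like in a complete program, the submit can be placed in a process rule and paired with a find rule that acts on the submitted data. The find rule and the replacement text "sheep" are illustrative assumptions, not part of the original example:

```omnimark
; Minimal sketch: submit a literal string and let a find rule act on it.
; The find rule and its replacement text are illustrative assumptions.
process
   submit "Mary had a little lamb"

; Fires whenever the scanned input matches the pattern "lamb".
find "lamb"
   output "sheep"
```

Input that no find rule matches is passed through to the current output, so only the matched word is replaced.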
The source can also be a file:
submit file "mary.txt"
Here the file operator returns a source that is attached to the named file, and submit creates an input scope using that source.
To access network data sources, you use OMX components. For every OMX component that handles an external data source, there is a function that returns an OmniMark source attached to that external data source. This example uses the tcp.reader function that works with the tcp.connection OMX component:
submit tcp.reader of connection
This statement scans the data from a TCP/IP connection directly as it streams from the sending application.
Notice that it does not matter whether a piece of data is internal or external to the program. In order to process the data you must scan or parse it. Anything from a simple stream variable or literal string to a multi-gigabyte data file is processed in the same way by the same scanning and parsing operations, and they are all "input" as far as OmniMark's streaming architecture is concerned. Any distinction between internal and external sources is taken care of by OmniMark or an OMX component. All your program sees is a standard OmniMark source, no matter what the original source of the data.
While all of OmniMark's scanning and parsing operations establish a new input scope automatically when they are executed, you can establish a current input scope independent of any scanning or parsing action with the statement using input as:
process
   using input as file #args[1]
      submit #current-input
You can perform a scanning or parsing operation on the data in the current input scope by scanning or parsing #current-input (as in the sample above). This allows you to start a new scanning process to do a particular job in the scanning of an input source. This is discussed in nested pattern matching.
OmniMark reads sources incrementally as their data is scanned or parsed. This means that you can process very large data sources, or (in the case of network data, perhaps) sources of indeterminate length that go on for days or weeks, without having to worry about running out of memory or needing to do any buffering yourself.
Sources and strings both represent linear data sequences. The difference is that a string must be fully read into memory when it is created, while a source is read incrementally as the data is needed.
OmniMark coerces strings to sources and sources to strings as required. You should be careful not to cause a large source to be coerced to a string unnecessarily, as this will result in it being read into memory completely, which may impose a performance or resource consumption penalty. For example, anything passed as a value stream argument to a function, or returned by a stream function, is coerced to a string.
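The following sketch illustrates the kind of coercion to watch for. It assumes a file named "mary.txt" (borrowed from the earlier example) and a hypothetical stream variable named whole-file. Assigning the file to a stream variable is used here as a simple stand-in for the function-argument case described above:

```omnimark
; Sketch only: "whole-file" is a hypothetical variable name.
process
   local stream whole-file

   ; Assigning the source to a stream variable copies the file's
   ; entire content into memory before any scanning begins.
   set whole-file to file "mary.txt"
   submit whole-file

   ; By contrast, submitting the source directly lets OmniMark
   ; read the file incrementally as it is scanned:
   ;    submit file "mary.txt"
```

For a small file the difference is negligible. For a very large file or an open-ended network source, the direct form avoids holding the whole input in memory.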
Related Syntax: #current-output, #error, #main-input, #main-output, buffered, unbuffered, do sgml-parse, do xml-parse, submit