Data content, processing

By default, the data content of an XML or SGML document is streamed through to the current output scope by the parser. You can intercept and process data content in one of three ways:

with data-content rules
with translate rules
by scanning "%c" in an element rule

Using data-content rules

If you add a data-content rule to your program, it fires whenever a contiguous piece of text occurs in your input data. You can then process that text by scanning "%c":

  data-content
     repeat scan "%c"
        ...
     again

You can restrict a data-content rule to a particular element by adding a condition to the rule:

  data-content when element is "product-name"
     repeat scan "%c"
        ...
     again

A data-content rule processes a contiguous sequence of text characters. A contiguous sequence of text characters is bounded by:

the start of an element
the end of an element
a processing instruction
an external CDATA, SCDATA, NDATA, or SUBDOC entity reference
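The boundaries listed above can be seen in a small self-contained program. This is only a sketch: the inline XML string and the square brackets written around each piece of text are assumptions chosen for illustration, and the program assumes well-formed XML parsed without a DTD.

```omnimark
; Parse a short, hypothetical XML fragment held in a string literal.
process
   do xml-parse scan "<p>Hello <b>big</b> world</p>"
      output "%c"
   done

; Let every element simply pass its parsed content through.
element #implied
   output "%c"

; Fires once for each contiguous run of text; the brackets
; mark where one run ends and the next begins.
data-content
   output "[%c]"
```

Because the start and end of the b element each bound a run of text, the data-content rule should fire three times for this input: once for the text before the b element, once for the text inside it, and once for the text after it.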

Using translate rules

If you put translate rules into your program, they scan data content (and attribute content) automatically, without your having to initiate scanning explicitly. In effect, translate rules work like find rules, except that they are initiated by do xml-parse or do sgml-parse instead of submit.

  translate "$" digit+ => dollars
            ("." digit{2} => cents)?

     output dollars
     output "," || cents when cents is specified
     output "$"

When processing SGML, you can also use translate rules to capture and process entities.

Scanning "%c" in an element rule

You can also process data content by scanning "%c" in an element rule. Be aware, however, that this scans the result of all the parsing operations that take place on the content of the element, including the output of any element, translate, or data-content rules that fire, rather than the raw data content of the element.

You should scan "%c" only if you know that the current element contains only data content, or if you want to scan the result of parsing the current element. Bear in mind that even if the element has only data content, any applicable translate rules and data-content rules fire before the scanning operation takes place, so the scanning source is the output of those rules acting on the data content, not the raw data content of the element.
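The following sketch illustrates that ordering. The element name "price", the inline XML string, and the labels written to the output are all hypothetical, chosen only to make the effect visible.

```omnimark
; Parse a short, hypothetical XML fragment held in a string literal.
process
   do xml-parse scan "<price>42</price>"
      output "%c"
   done

; This rule fires first and wraps the raw text in brackets.
data-content
   output "[%c]"

; The scan below works on the output of the data-content rule,
; not on the raw text of the element.
element "price"
   repeat scan "%c"
      match digit+ => amount
         output "digits: " || amount || "%n"
      match any => other
         output "other: " || other || "%n"
   again
```

If the behaviour described above holds, the scan in the element rule should see the brackets added by the data-content rule as well as the digits, so the "other" branch is expected to fire for each bracket. Removing the data-content rule would leave only the digits to scan.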


OmniMark 7.1.2 Documentation Generated: June 28, 2005 at 5:44:35 pm

Copyright © Stilo Corporation, 1988-2005.