Guide to OmniMark 9

SGML record boundaries

In SGML, a document's text consists of records, each delimited by an SGML RS (record-start) character and an RE (record-end) character. In general, OmniMark prepares the text directed to the parser so that it is suitable for the SGML parser, and text returned by the parser is similarly prepared for the markup rules. OmniMark programmers and users rarely need to be aware of these two operations, but exceptions can arise.

The vast majority of applications on all systems use the line-feed and carriage-return character values for the record-start and record-end characters. As a consequence, very few applications will be affected by this behavior. To be affected, an application must use an SGML declaration that specifies RE and/or RS function character values other than those usually used by the system on which OmniMark is running.

OmniMark uses the system-defined values of line feed and carriage return for record-start and record-end, respectively.
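To make the record model concrete, the following is a minimal Python sketch, not OmniMark code and not a description of OmniMark's internals. It assumes the usual values (line feed for record-start, carriage return for record-end); the function name to_records is hypothetical.

```python
# Assumed character values: the ones most systems use, as described above.
RS = "\n"  # record-start: line feed
RE = "\r"  # record-end: carriage return


def to_records(lines):
    """Wrap each line of text as one SGML record: RS + text + RE."""
    return "".join(RS + line + RE for line in lines)


records = to_records(["<p>first line", "second line</p>"])
print(repr(records))
# prints '\n<p>first line\r\nsecond line</p>\r'
```

Note that the boundary between two adjacent records appears as a record-end immediately followed by a record-start.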

By default, OmniMark supports the SGML form of line representation in the following two ways:

You can override this behavior with the sgml-in and sgml-out actions. These actions are intended for use when the application's view of record boundaries differs from that specified in the SGML declaration.

Additionally, these two actions can be used to suppress record boundary conversion.

There are some interdependencies between the value given in the newline declaration and the default record boundary conversions that you should be aware of.

If no sgml-in action is encountered prior to the output of (some) data to the #markup-parser stream, then the default conversion depends on the value of the newline sequence, as follows:

These defaults are in effect until an sgml-in action is encountered.

If no sgml-out action is encountered prior to the processing of data content, then all record-end characters in data content are converted to the newline sequence prior to their being provided to markup rules. In other words, for all systems, the default sgml-out action is: sgml-out "%n".
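The effect of sgml-out on data content can be modeled with a short Python sketch. This is an illustration under stated assumptions, not OmniMark's implementation: the record-end is taken to be carriage return, the newline sequence ("%n") is modeled as "\n", Python's None stands in for #none, and the function name apply_sgml_out is hypothetical.

```python
RE = "\r"  # assumed record-end character (carriage return)


def apply_sgml_out(data_content, sgml_out="\n"):
    """Model of sgml-out applied to data content.

    sgml_out="\n" models the default (sgml-out "%n");
    sgml_out=None models sgml-out #none (no conversion).
    """
    if sgml_out is None:
        return data_content  # conversion suppressed
    return data_content.replace(RE, sgml_out)


print(repr(apply_sgml_out("one\rtwo")))        # default conversion
print(repr(apply_sgml_out("one\rtwo", None)))  # conversion suppressed
```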

Comments, marked sections and processing instructions

Record ends occurring in processing instruction text, IGNORE marked section text, and the text of SGML comments are subject to processing as record ends occurring in PCDATA. OmniMark converts the record ends to the value specified by the sgml-out action. If the sgml-out action specifies #none, record-ends are provided to the markup rules in the form in which they come from the SGML parser.

The SGML standard (ISO 8879) doesn't address the processing of text in processing instructions, IGNORE marked sections, or SGML comments, as it does for data content. As a consequence, in these types of text, OmniMark's built-in SGML parser does not discard record-start characters, as it usually does in data content and attribute value text. When the sgml-out action specifies #none, record-start characters will be present in the text.

When the sgml-out action specifies a string:

This processing differs from that for data content and attribute value text, in which:

This processing ensures that, unless character references are used in strange ways, all "newlines" come out the same.

The conversion of the record-end/record-start sequence to the sgml-out string occurs when the %c operator is processing in a marked-section ignore rule or an sgml-comment rule, just as in a data-content rule. For a processing-instruction rule, the conversion occurs prior to the text of the processing instruction being matched to the pattern at the head of the rule.

The processing of record-starts and record-ends in the text of processing instructions differs between versions of OmniMark:

Record ends in processing instructions are not subject to the same rules as record ends in data content and attribute value text. The text in processing instructions is subject to the same processing as the text in SGML comments and IGNORE marked sections: any record-end/record-start sequence is replaced by the string specified by the sgml-out action.
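The replacement of record-end/record-start sequences in this kind of text can be sketched in Python as follows. This is a model only, assuming carriage return for record-end and line feed for record-start, with None standing in for #none; the function name convert_non_data_text is hypothetical. Isolated record-end or record-start characters are left untouched here simply because the sketch does not model them.

```python
RS = "\n"  # assumed record-start character (line feed)
RE = "\r"  # assumed record-end character (carriage return)


def convert_non_data_text(text, sgml_out="\n"):
    """Model of sgml-out applied to comment, processing instruction,
    or IGNORE marked section text.

    Each record-end/record-start pair becomes the sgml-out string.
    sgml_out=None models #none: the text is passed through unchanged,
    so record-start characters remain present.
    """
    if sgml_out is None:
        return text
    return text.replace(RE + RS, sgml_out)


comment_text = "first line\r\nsecond line"
print(repr(convert_non_data_text(comment_text)))
print(repr(convert_non_data_text(comment_text, None)))
```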


OmniMark 9.1.0 Documentation Generated: September 2, 2010 at 1:35:14 pm

Copyright © Stilo International plc, 1988-2010.