In SGML, a document's text consists of records, each delimited by an RS (record-start) and an RE (record-end) character. In general, OmniMark prepares text directed to the SGML parser so that it is in this form, and it similarly converts text returned by the parser before passing it to the markup rules. OmniMark programmers and users rarely need to be aware of these two operations, but exceptions can arise.
The vast majority of applications on all systems use the line-feed and carriage-return character values for the record-start and record-end characters, respectively. As a consequence, very few applications are affected by this behavior. To be affected, an application must use an SGML declaration that specifies RE or RS function character values other than those usually used by the system on which OmniMark is running.
OmniMark uses the system-defined values of line feed and carriage return for record-start and record-end, respectively.
By default, OmniMark supports the SGML form of line representation in the following two ways:
In data output to the #markup-parser stream, each instance of the newline sequence, "%n", is converted to the two-character sequence (RE, RS). The effect of this conversion is that each newline sequence becomes both the record-end mark for the line it ends and the record-start mark for the following line.
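The input-side conversion can be modeled outside OmniMark as a plain string substitution. The Python sketch below is an illustration only, not OmniMark code: the function name is hypothetical, and it assumes the usual character values, carriage return (13) for RE and line feed (10) for RS.

```python
RE = "\r"  # record-end: carriage return (13), the usual value (assumed)
RS = "\n"  # record-start: line feed (10), the usual value (assumed)


def newline_to_record_boundaries(text: str, newline: str = "\n") -> str:
    """Replace each newline sequence with the two-character sequence (RE, RS).

    Hypothetical helper: it only models the default conversion applied to
    text on its way to the parser.
    """
    return text.replace(newline, RE + RS)


converted = newline_to_record_boundaries("first line\nsecond line\n")
print(repr(converted))  # 'first line\r\nsecond line\r\n'
```

Each newline ends one record and starts the next, which is why a single input character becomes two.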
OmniMark programs can override this behavior with the sgml-in and sgml-out actions. These actions are intended to be used when the application's view of record boundaries is different from that specified in the SGML declaration.
Additionally, these two actions can be used to suppress record boundary conversion.
There are some interdependencies between the value given in the newline declaration and the default record boundary conversions that you should be aware of.
newline declaration in the OmniMark program, then all newline sequence characters in data output to the #markup-parser stream are converted to the sequence of carriage return followed by line feed. For systems that use the ASCII character set, this is equivalent to
#markup-parser stream are not converted. For all systems, this is equivalent to
These defaults are in effect until an sgml-in action is encountered.
If no sgml-out action is encountered prior to the processing of data content, then all record-end characters in data content are converted to the newline sequence before they are provided to markup rules. In other words, for all systems, the default sgml-out action is:
Record ends occurring in processing instruction text, IGNORE marked section text, and the text of SGML comments are subject to the same processing as record ends occurring in PCDATA. OmniMark converts the record ends to the value specified by the sgml-out action. If the sgml-out action specifies #none, record ends are provided to the markup rules in the form in which they come from the SGML parser.
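As a rough model of this output-side behavior, the Python sketch below treats the sgml-out setting as a replacement string for record-end characters, with None standing in for #none. The function name and the use of carriage return (13) for RE are assumptions made for illustration; this is not OmniMark code.

```python
from typing import Optional

RE = "\r"  # record-end: carriage return (13), the usual value (assumed)


def record_ends_for_markup_rules(text: str, sgml_out: Optional[str] = "\n") -> str:
    """Model how record ends in parsed text reach the markup rules.

    sgml_out is the replacement string; the default models conversion to
    the newline sequence. None models #none: the text passes through in
    the form in which it came from the parser.
    """
    if sgml_out is None:
        return text
    return text.replace(RE, sgml_out)


parsed = "first line" + RE + "second line"
print(repr(record_ends_for_markup_rules(parsed)))        # 'first line\nsecond line'
print(repr(record_ends_for_markup_rules(parsed, None)))  # 'first line\rsecond line'
```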
The SGML standard (ISO 8879) doesn't address the processing of text in processing instructions, IGNORE marked sections, or SGML comments, as it does for data content. As a consequence, in these types of text, OmniMark's built-in SGML parser does not discard record-start characters, as it usually does in data content and attribute value text. When the sgml-out action specifies #none, record-start characters will be present in the text.
When the sgml-out action specifies a string:
This processing differs from that for data content and attribute value text, in which:
This processing ensures that, unless character references are used in strange ways, all "newlines" come out the same.
The conversion of the record-end/record-start sequence to the sgml-out string occurs when the %c operator is processed in a marked-section ignore rule or an sgml-comment rule, just as in a data-content rule. For a processing-instruction rule, the conversion occurs before the text of the processing instruction is matched against the pattern at the head of the rule.
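For comments, IGNORE marked sections, and processing instructions, where record-start characters survive parsing, the conversion can be pictured as replacing each record-end/record-start pair with the sgml-out string. The Python sketch below is a model only: the function name is hypothetical, None stands in for #none, the usual character values are assumed, and a record-end or record-start that is not part of a pair is left untouched because the sketch does not attempt to model that case.

```python
from typing import Optional

RE = "\r"  # record-end: carriage return (13), the usual value (assumed)
RS = "\n"  # record-start: line feed (10), the usual value (assumed)


def convert_record_boundary_pairs(text: str, sgml_out: Optional[str]) -> str:
    """Replace each (RE, RS) pair with the sgml-out string.

    None models #none: nothing is converted, so record-start characters
    remain present in the text. Unpaired RE or RS characters are not
    handled here (an assumption of this sketch).
    """
    if sgml_out is None:
        return text
    return text.replace(RE + RS, sgml_out)


comment_text = "first line" + RE + RS + "second line"
print(repr(convert_record_boundary_pairs(comment_text, "\n")))  # 'first line\nsecond line'
print(repr(convert_record_boundary_pairs(comment_text, None)))  # 'first line\r\nsecond line'
```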
The processing of record-starts and record-ends in the text of processing instructions differs between versions of OmniMark:
Record ends in processing instructions are not subject to the same rules as record ends in data content and attribute value text. The text in processing instructions is subject to the same processing as the text in SGML comments and IGNORE marked sections: any record-end/record-start sequence is replaced by the string specified by the