SGML record boundaries
In SGML, a document's text consists of records, each delimited by an SGML RS (record-start) and RE (record-end) character. In general, OmniMark prepares text directed to the parser so that it suits the SGML parser, and similarly prepares text returned by the parser for the markup rules. OmniMark programmers and users rarely need to be aware of these two operations, but exceptions can arise.
The vast majority of applications on all systems use the line-feed and carriage-return character values for the record-start and record-end characters, respectively. As a consequence, very few applications are affected by this behavior. To be affected, an application must use an SGML declaration that specifies RE and/or RS function character values other than those usually used by the system on which OmniMark is running.
OmniMark uses the system-defined values of line feed and carriage return for record-start and record-end, respectively.
By default, OmniMark supports the SGML form of line representation in the following two ways:
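In data output to the #markup-parser stream, each instance of the newline sequence, "%n", is converted to the two-character sequence (RE, RS). The effect of this conversion is that each newline sequence becomes the record-end mark for the line it ends as well as the record-start mark for the following line.
In data content returned by the parser, each record-end character is converted to the newline sequence before the content is provided to markup rules.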
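The input-side conversion described above can be modelled in a few lines of Python. This is an illustrative sketch only, not OmniMark code: the function name `to_parser_default` is hypothetical, "%n" is assumed to be a single line feed, and RE and RS are assumed to have their usual ASCII values (carriage return, 13, and line feed, 10).

```python
RE = "\r"       # record end: carriage return (ASCII 13), assumed default
RS = "\n"       # record start: line feed (ASCII 10), assumed default
NEWLINE = "\n"  # stand-in for OmniMark's "%n" newline sequence (assumption)


def to_parser_default(text: str) -> str:
    """Model the default conversion: each newline becomes the pair (RE, RS)."""
    return text.replace(NEWLINE, RE + RS)


converted = to_parser_default("first line\nsecond line\n")
print(repr(converted))
```

In this model, the RE of each pair closes the line that the newline ended, and the RS opens the line that follows.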
OmniMark programs can override this behavior with the sgml-in and sgml-out actions. These actions are intended to be used when the application's view of record boundaries is different from that specified in the SGML declaration.
Additionally, these two actions can be used to suppress record boundary conversion.
There are some interdependencies between the value given in the newline declaration and the default record boundary conversions that you should be aware of.
If no sgml-in action is encountered prior to the output of (some) data to the #markup-parser stream, then the default conversion depends on the value of the newline sequence, as follows:
[...] newline declaration in the OmniMark program, then all newline sequence characters in data output to the #markup-parser stream are converted to the sequence of carriage return followed by line feed. For systems that use the ASCII character set, this is equivalent to sgml-in "%13#%10#".
[...] #markup-parser stream are not converted. For all systems, this is equivalent to sgml-in #none.
These defaults are in effect until an sgml-in action is encountered.
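As a rough model of what an sgml-in setting does to text on its way to the parser, the Python sketch below treats the setting as either a replacement string or `None`, which stands in for #none. It is not OmniMark code: the name `apply_sgml_in` is hypothetical, and "%n" is assumed to be a single line feed.

```python
from typing import Optional

NEWLINE = "\n"  # stand-in for OmniMark's "%n" (assumption)


def apply_sgml_in(text: str, sgml_in: Optional[str]) -> str:
    """Replace each newline with the sgml-in string; None models #none."""
    if sgml_in is None:
        return text  # #none: record boundary conversion is suppressed
    return text.replace(NEWLINE, sgml_in)


data = "<p>one\ntwo</p>\n"
# Models sgml-in "%13#%10#": newline -> carriage return + line feed.
print(repr(apply_sgml_in(data, "\r\n")))
# Models sgml-in #none: the text reaches the parser unchanged.
print(repr(apply_sgml_in(data, None)))
```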
If no sgml-out action is encountered prior to the processing of data content, then all record-end characters in data content are converted to the newline sequence before they are provided to markup rules. In other words, for all systems, the default sgml-out action is: sgml-out "%n".
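The output-side default can be sketched the same way. The Python model below is an illustration, not OmniMark code: `apply_sgml_out` is a hypothetical name, RE is assumed to be carriage return (ASCII 13), "%n" is assumed to be a line feed, and `None` stands in for #none.

```python
from typing import Optional

RE = "\r"  # record end as delivered by the parser (assumed ASCII 13)


def apply_sgml_out(content: str, sgml_out: Optional[str] = "\n") -> str:
    """Convert record ends in data content; None models sgml-out #none."""
    if sgml_out is None:
        return content  # record ends are passed through as-is
    return content.replace(RE, sgml_out)


# Data content in which the parser has already discarded RS characters.
parsed = "one\rtwo\r"
print(repr(apply_sgml_out(parsed)))        # default, models sgml-out "%n"
print(repr(apply_sgml_out(parsed, None)))  # models sgml-out #none
```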
Record ends occurring in processing instruction text, IGNORE marked section text, and the text of SGML comments are subject to processing as record ends occurring in PCDATA. OmniMark converts the record ends to the value specified by the sgml-out action. If the sgml-out action specifies #none, record ends are provided to the markup rules in the form in which they come from the SGML parser.
The SGML standard (ISO 8879) doesn't address the processing of text in processing instructions, IGNORE marked sections, or SGML comments, as it does for data content. As a consequence, in these types of text, OmniMark's built-in SGML parser does not discard record-start characters, as it usually does in data content and attribute value text. When the sgml-out action specifies #none, record-start characters will be present in the text.
When the sgml-out action specifies a string:
This processing differs from that for data content and attribute value text, in which:
[...] sgml-out string, and [...]
This processing ensures that, unless character references are used in strange ways, all "newlines" come out the same.
The conversion of the record-end/record-start sequence to the sgml-out string occurs when the %c operator is processing in a marked-section ignore rule or an sgml-comment rule, just as in a data-content rule. For a processing-instruction rule, the conversion occurs before the text of the processing instruction is matched against the pattern at the head of the rule.
The processing of record-starts and record-ends in the text of processing instructions differs between versions of OmniMark:
[...] sgml-out action.
Record ends in processing instructions are not subject to the same rules as record ends in data content and attribute value text. The text in processing instructions is subject to the same processing as the text in SGML comments and IGNORE marked sections: any record-end/record-start sequence is replaced by the string specified by the sgml-out action.
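The treatment of processing instruction, comment, and IGNORE marked section text can be modelled with a short Python sketch. It is an illustration under stated assumptions, not OmniMark code: `convert_markup_text` is a hypothetical name, RE and RS are assumed to be carriage return and line feed, and `None` stands in for #none. Only the replacement of the record-end/record-start sequence is modelled; lone RE or RS characters are left untouched here.

```python
from typing import Optional

RE = "\r"  # record end (assumed ASCII 13)
RS = "\n"  # record start (assumed ASCII 10)


def convert_markup_text(text: str, sgml_out: Optional[str]) -> str:
    """Model sgml-out for PI, comment, and IGNORE marked section text."""
    if sgml_out is None:
        # #none: text is passed through, so RS characters remain present.
        return text
    # Each record-end/record-start sequence becomes the sgml-out string.
    # Lone RE or RS characters are deliberately left alone in this sketch.
    return text.replace(RE + RS, sgml_out)


comment = "first\r\nsecond"  # comment text spanning a record boundary
print(repr(convert_markup_text(comment, "\n")))  # models sgml-out "%n"
print(repr(convert_markup_text(comment, None)))  # models sgml-out #none
```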
Related Syntax: #markup-parser, #sgml