XML documents may contain CDATA marked sections. SGML documents may contain CDATA, RCDATA, IGNORE, and INCLUDE marked sections. OmniMark provides markup rules for handling all these types of marked sections.
CDATA and RCDATA marked sections serve to protect text from being misinterpreted as markup (start tags, end tags, entity references or declarations). These marked sections affect how the data is parsed by the SGML parser, but they do not usually affect the way that OmniMark processes the resulting data content.
It is very important to understand that the presence or absence of marked-section rules does not affect how marked sections are treated by the SGML parser. They only determine how the SGML parser presents the resulting text to OmniMark.
Much of what is said about IGNORE marked sections applies equally to CDATA and RCDATA marked sections. The major difference is that the "default" processing for CDATA and RCDATA marked sections is to treat their text content as data content, and not to discard it.
If there are no marked-section cdata or marked-section rcdata rules, then OmniMark treats the text resulting from these marked sections as if it resulted from ordinary data content. In other words, OmniMark does not detect the boundaries between the text originating from inside the marked section and the text originating from outside the marked section.
Only one marked-section cdata rule may be selected for a CDATA marked section. That is, either there must be only one marked-section cdata rule or, if there is more than one such rule, each must have a condition. Similarly, only one marked-section rcdata rule may be selected for an RCDATA marked section. It is an error for more than one marked-section cdata or marked-section rcdata rule to be selected for a CDATA or an RCDATA marked section.
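The selection constraint above can be modeled outside OmniMark. The following is a minimal Python sketch, not OmniMark code: the Rule class, the select_rule function, and the dictionary used as a context are all hypothetical names introduced for illustration. It assumes that a rule without a condition always applies, and that having no applicable rule is not an error (the text is then treated as ordinary data content).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """Hypothetical stand-in for one marked-section cdata rule."""
    name: str
    # None models a rule written without a condition: it always applies.
    condition: Optional[Callable[[dict], bool]] = None

    def applies(self, context: dict) -> bool:
        return self.condition is None or self.condition(context)


def select_rule(rules: list[Rule], context: dict) -> Optional[Rule]:
    """Return the single applicable rule, or None if no rule applies.

    Raises ValueError when more than one rule applies, mirroring the
    error described in the text.
    """
    selected = [rule for rule in rules if rule.applies(context)]
    if len(selected) > 1:
        names = ", ".join(rule.name for rule in selected)
        raise ValueError(f"more than one rule selected: {names}")
    return selected[0] if selected else None


# Two conditional rules whose conditions never hold at the same time.
rules = [
    Rule("verbatim", lambda ctx: ctx["mode"] == "verbatim"),
    Rule("escaped", lambda ctx: ctx["mode"] == "escaped"),
]
chosen = select_rule(rules, {"mode": "escaped"})
```

In this model, adding a third rule with no condition makes every selection ambiguous, which is why several rules of the same kind each need a condition.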
The %c operator captures the text of a CDATA or RCDATA marked section. Either %c or suppress must be used exactly once in a marked-section cdata or marked-section rcdata rule. All modifiers supported by %c can be used on a %c operator in a marked-section rule in the OmniMark program.
The setting of the sgml-out action determines what happens to record ends in the text of a CDATA or RCDATA marked section.
OmniMark programmers should note that, in keeping with the provisions of clause 10.4.1 of the SGML standard (ISO 8879:1986), all pairs of "<![" and "]]>" within an IGNORE marked section are matched and treated as text. This means that any marked sections nested within an IGNORE marked section, including their opening and closing delimiters, are treated as part of the text of the IGNORE marked section.
The text of an IGNORE marked section consists of all the characters between the DSO delimiter following the status keyword specification, and the marked section end (that is, between the "[" following the keyword IGNORE and the "]]>"). The text does not include the surrounding delimiters, but does include any record ends or white space within the marked section.
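The delimiter matching and the text boundaries described above can be sketched in a few lines of Python. This is an illustrative model only, not how OmniMark or an SGML parser is implemented. The function name ignore_section_text is hypothetical, and the header handling is deliberately simplified: it assumes the keyword IGNORE is written literally in upper case, with no comments or parameter entity references in the header.

```python
import re

# Marked section start ("<![") and end ("]]>") delimiters.
_DELIMITER = re.compile(r"<!\[|\]\]>")
# Header of an IGNORE marked section; status keyword handling is simplified.
_IGNORE_HEADER = re.compile(r"<!\[\s*IGNORE\s*\[")


def ignore_section_text(document: str, start: int = 0) -> str:
    """Return the text of the first IGNORE marked section at or after start."""
    header = _IGNORE_HEADER.search(document, start)
    if header is None:
        raise ValueError("no IGNORE marked section found")
    text_start = header.end()   # just past the "[" that follows IGNORE
    depth = 1                   # the IGNORE section itself is open
    for match in _DELIMITER.finditer(document, text_start):
        if match.group() == "<![":
            depth += 1          # a nested section opens; it stays plain text
        else:
            depth -= 1
            if depth == 0:      # this "]]>" closes the IGNORE section
                return document[text_start:match.start()]
    raise ValueError("unterminated IGNORE marked section")


sample = "a<![IGNORE[ x <![CDATA[ y ]]> z\n]]>b"
text = ignore_section_text(sample)
```

For the sample string, the returned text keeps the nested CDATA section with its delimiters, along with the record end and white space, but excludes the outer "<![IGNORE[" and "]]>".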
Any SGML comment in the header of an IGNORE marked section is processed prior to the processing of the IGNORE marked section.
Only marked sections in the document instance are available for processing by an OmniMark program. Marked sections in the DTD are always ignored, whether or not there is any marked-section rule in the OmniMark program.
The setting of the sgml-out action determines what happens to record ends in the text of an IGNORE marked section.
SGML comments and ignore, cdata, and rcdata marked sections are all processed similarly. However, include marked sections require quite a different approach. Instead of having one rule to process an include marked section, OmniMark provides two: one for processing the start of a marked section and one for the end. This split is necessary because, unlike other types of marked sections, an include marked section can start in the context of one element and end in another, and so can overlap the hierarchical structure that ties the components of a parsed SGML document together.
This kind of overlapping cannot happen with ignore, cdata, or rcdata marked sections because they inhibit the recognition of other markup, including start and end tags, within their text. An important consequence of this is that the whole of the text of an ignore, cdata, or rcdata marked section is processed with one set of output streams (as used by the output action and as available using the #current-output stream set) and inherits the stream destinations and stream modifiers from the data-content rule that processes the surrounding content.
The contents of an include marked section can be part of one or more elements, the data-content rules for which each may specify different output destinations and stream modifiers. To avoid all the complexity and user confusion that could result from trying to "merge" the specifications of the rules for include marked sections and the applicable data-content rules, include marked section rules only apply to the start and end of an include marked section. The include marked section's rules have no direct influence on the processing of the marked section's content. The two rules are the marked-section include-start and marked-section include-end rules.
The OmniMark program can influence the processing of the content of an include marked section by setting global variables and testing them in data-content rules, so that those rules can detect when they occur in an include marked section.
This is an example of an INCLUDE marked section overlapping the element structure of a document:
<title>Part of the title.
<![INCLUDE[More of the title.
<p>The first paragraph.
<p>Part of the second paragraph.
]]> More of the second paragraph.
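To make the overlap concrete, here is a rough Python model of the two ideas above: separate handling for the start and end of an INCLUDE marked section, and a global flag that content handling can test. It is a sketch under stated assumptions, not OmniMark code. The names TOKEN, SAMPLE, and trace_include are hypothetical, the sample is the one above with the marked section closed by "]]>", and the "current element" is simply the most recent start tag, which ignores real element nesting and the DTD a parser would need for omitted end tags.

```python
import re

# One token per match: INCLUDE section start, section end, start tag, or text.
TOKEN = re.compile(
    r"<!\[\s*INCLUDE\s*\[|\]\]>|<([A-Za-z][\w.-]*)>|[^<\]]+|.", re.DOTALL
)

SAMPLE = (
    "<title>Part of the title. <![INCLUDE[More of the title. "
    "<p>The first paragraph. <p>Part of the second paragraph. "
    "]]> More of the second paragraph."
)


def trace_include(document: str) -> list[tuple[str, str, bool]]:
    """List (event, current element, in-include flag) for each token."""
    in_include = False   # stands in for the global variable
    element = ""         # most recent start tag; nesting is ignored here
    events = []
    for match in TOKEN.finditer(document):
        token = match.group()
        if match.group(1):                 # start tag such as <title> or <p>
            element = match.group(1)
        elif token.startswith("<!["):      # plays the role of include-start
            in_include = True
            events.append(("include-start", element, in_include))
        elif token == "]]>":               # plays the role of include-end
            in_include = False
            events.append(("include-end", element, in_include))
        elif token.strip():                # data handling that tests the flag
            events.append(("data", element, in_include))
    return events


events = trace_include(SAMPLE)
```

In the resulting list, the include-start event is reported while title is the current element and the include-end event while p is, which is the overlap that a single rule could not express. The data events in between carry the flag set to True.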