When parsing SGML or XML data, the parsed data can be formatted in one of two ways. The first is to capture
the parsed data in an OmniMark string and then use regular OmniMark programming techniques to format the data.
The other is to give OmniMark instructions on how the data should be formatted. The latter approach is often
faster and cleaner.
OmniMark can be instructed on how to format data by using format items and placing the appropriate format
modifiers on the format commands %c, %v, and %q.
Unless intercepted in a data-content rule, the parser outputs data content to the current output scope.
Format modifiers can be added to the parse continuation operator, %c, to change how the data content is
formatted.
The following modifiers are supported.
The h modifier prevents line-breaking rules (insertion-break and replacement-break) from applying to the
content of the current component.
The l modifier converts all text to lowercase. It applies only to letters in the processed document (data
characters in content and attribute values) copied from the input to the output. It does not apply to letters
in quoted strings in the OmniMark program.
The u modifier converts all text to uppercase. It applies only to letters in the processed document (data
characters in content and attribute values) copied from the input to the output. It does not apply to letters
in quoted strings in the OmniMark program.
The s modifier causes white space to be stripped in the processed content. It affects only text received
directly from the SGML or XML parser, or from characters specified with format items that explicitly allow
stripping.
The z modifier turns off translate rules that would otherwise apply to all or part of the content.
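As a minimal sketch of how these modifiers attach to %c, the rules below uppercase the content of one element and strip white space from another. The element names title and para are hypothetical, and the process rule assumes well-formed XML arriving on #main-input; adapt both to the document being processed.

```omnimark
; Parse the main input; each element rule decides how its content is formatted.
process
   do xml-parse scan #main-input
      output "%c"
   done

; Hypothetical element: emit its data content in uppercase.
element "title"
   output "%uc%n"

; Hypothetical element: strip white space from its parsed content.
element "para"
   output "%sc%n"

; Every other element passes its content through unchanged.
element #implied
   output "%c"
```

The modifier letter sits between the % and the c, so the same pattern should extend to the h and z modifiers.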
It is possible to override the subcomponents, even those going into the same stream, by removing modifiers with
the following syntax:
put my-stream with "" "%c"
The %q format item refers to the name of the currently opened element everywhere except in
external-text-entity and external-data-entity rules. In functions, even if the function is called from an
external-text-entity or an external-data-entity rule, %q still refers to the currently opened element. This
ensures that a function always behaves in the same way, regardless of which rule it is called from. Outside
of an external-data-entity rule or an external-text-entity rule, name of entity can be used to obtain the name
of the currently active external entity.
When referring to an element, a number of modifiers can be used to change the default behaviour of %q.
The l modifier converts all text to lowercase; it cannot be used with the u modifier.
The u modifier converts all text to uppercase; it cannot be used with the l modifier.
The f modifier, the field-width modifier, is allowed with the %q format. The f modifier must be preceded by a
positive number. If the specified number is less than the minimum number of characters needed to format the
value, the modifier is ignored. If it is greater, space characters are added to the right of the value to fill
it out to the field width.
The k modifier is allowed only with the f modifier: it changes the behaviour of f so that the spaces are put
to the left of the element name instead of to the right.
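The following sketch shows the element-name modifiers side by side. It assumes well-formed XML on #main-input, and the field width of 12 is an arbitrary choice for illustration.

```omnimark
process
   do xml-parse scan #main-input
      output "%c"
   done

element #implied
   ; %12fq pads the element name with spaces on the right to 12 characters,
   ; %12fkq pads on the left instead, and %uq forces the name to uppercase.
   output "[%12fq] [%12fkq] [%uq]%n%c"
```

An element name longer than 12 characters is written in full, because the f modifier is ignored when the width is too small.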
%q refers to the name of the current entity in the body of an external-data-entity or an external-text-entity
rule.
When referring to an entity, a number of modifiers can be used to change the default behaviour of %q.
The e modifier causes OmniMark to access the system identifier from the entity declaration instead of the
entity name. The system identifier can also be obtained using system-identifier of entity, a syntax that also
works outside of an external-data-entity or external-text-entity rule.
The o modifier causes OmniMark to access the notation name from the entity declaration instead of the entity
name. This modifier can be used only in external-data-entity rules because external text entities do not have
a notation. Of these modifiers, it is the only one that can be combined with the f or k format modifiers,
described above.
The p modifier causes OmniMark to access the public identifier from the entity declaration instead of the
entity name. The public identifier can also be obtained using public-identifier of entity, a syntax that also
works outside of an external-data-entity or external-text-entity rule.
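A short sketch of these modifiers in an external-data-entity rule follows. It assumes an SGML document with a DTD on #main-input and that every external data entity referenced has a system identifier (or a public identifier mapped on the #library shelf); the output labels are illustrative only.

```omnimark
process
   do sgml-parse document scan #main-input
      output "%c"
   done

element #implied
   output "%c"

; Inside this rule, %q names the entity rather than the open element.
external-data-entity #implied
   output "entity:   %q%n"
   output "notation: %oq%n"
   ; %eq is an error if no system identifier can be found for the entity.
   output "system:   %eq%n"
```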
These modifiers can also be used in combination.
ep causes OmniMark to access the system identifier obtained by searching for the entity's public identifier on
the #library shelf. This can also be obtained using #library{public-identifier of entity}.
eo causes OmniMark to access the system identifier declared for the notation associated with the entity. This
combination can be used only in an external-data-entity rule because external text entities do not have
notations.
eop causes OmniMark to access the system identifier obtained by searching for the entity notation's public
identifier on the #library shelf. This combination can be used only in an external-data-entity rule because
external text entities do not have notations.
If an entity does not have a system identifier, then the e format modifier acts the way ep does.
system-identifier of entity exhibits the same fallback behaviour.
If an entity does not have a public identifier, or if the #library shelf does not have an entry whose key is
the public identifier, then it is an error to use the ep format modifier combination. If such an entity also
does not declare a system identifier in the entity declaration, then it is also an error to use the e format
modifier. The same applies to the system identifier of the entity's notation when these format modifiers are
used in combination with the o format modifier. Similarly, if an entity does not have a public identifier, it
is an error to use public-identifier of entity, and if an entity does not have a system identifier, it is an
error to use system-identifier of entity.
When using the format modifiers above, all of these combinations may be further combined with the l or u
format modifiers. Additionally, the o format modifier can be combined with the f and k format modifiers,
provided that it is not also combined with the e or p modifiers. When using the keyword forms, similar results
can be had by using string operations: for example, %leq is equivalent to
"l" % system-identifier of entity
The f and k format modifiers can be used only with entity names and notation names.
The %v format item is used to output an attribute of an element or of an external-data-entity.
The following example outputs the section identifier (the attribute named ID) when processing an SGML
document:
element "section"
   output "Section: %v(ID) %c"
The DTD for the above example must contain lines similar to the following:
<!element section - o (#pcdata)>
<!attlist section id number #required>
In element rules, the named attribute must be an attribute of the element; in external-data-entity rules, it
must be a data attribute of the entity being processed. When %v is used in any other context, the named
attribute must be an attribute of the containing element.
A number of modifiers can be used to change the default behaviour of %v.
The l modifier forces the letters in the attribute value to lowercase.
The u modifier forces the letters in the attribute value to uppercase.
The f modifier, the field-width modifier, is allowed in the %v format; it is ignored for attributes declared
as cdata. The f modifier must be preceded by a positive number. If the number is less than the minimum number
of characters needed to print the attribute value, the modifier is ignored. If it is greater, space characters
are added to the right of the value to fill out the field width.
The k modifier is allowed only with f: it changes the behaviour of f so that the spaces are put to the left of
the value instead of to the right.
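As a sketch of the field-width modifiers applied to %v, the rule below writes the id attribute of the section element from the example DTD above twice: once padded on the right and once padded on the left. The width of 6 and the bracketed layout are arbitrary choices, and the process rule assumes an SGML document with its DTD on #main-input.

```omnimark
process
   do sgml-parse document scan #main-input
      output "%c"
   done

; id is declared as a number attribute, so the field width applies.
element "section"
   output "Section [%6fv(id)] [%6fkv(id)]: %c%n"

; Any other element passes its content through unchanged.
element #implied
   output "%c"
```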
In addition to the above modifiers, the modifiers below can be used if the attribute is declared as cdata.
The h modifier prevents the insertion of line breaks.
The s modifier minimizes white space.
The z modifier prevents selection of any translate rules that would otherwise apply to all or part of the
attribute value.
If the attribute's declared type is entity or entities, and the entity name refers to an external entity, you
can use the following modifiers (but not with the f, k, l, and u modifiers):
The e modifier causes OmniMark to access the system identifier from the entity declaration instead of the
entity name.
The o modifier causes OmniMark to access the notation name from the entity declaration instead of the entity
name. In this case, the f, k, l, and u modifiers can be used.
The p modifier causes OmniMark to access the public identifier from the entity declaration instead of the
entity name.
These modifiers can be combined.
ep causes OmniMark to access the system identifier, as obtained by using the entity's public identifier to
index into the #library shelf.
eo causes OmniMark to access the system identifier declared for the notation associated with the entity.
po, as in %pov, causes OmniMark to access the public identifier declared for the notation associated with the
entity.
epo, as in %epov, causes OmniMark to access the system identifier, as obtained by using the entity notation's
public identifier to index into the #library shelf.
If an entity has no system identifier, then e acts as ep does. It is an error if either e or ep is used and a
system identifier cannot be obtained using the #library shelf.
This format accesses letters within system and public identifiers in uppercase or lowercase as they appear in the entity declaration. Letters in element, entity, or notation names appear in uppercase or lowercase as they appear in the processed document, unless the SGML declaration specifies uppercase substitution for that class of name. If so, the name is accessed with letters forced to uppercase. Thus, in the Reference Concrete Syntax, by default, element and notation names appear in uppercase while entity names appear as entered in the document.
For an entities attribute, if the attribute value contains more than one entity name, the using prefix must be
used to select the one entity whose system or public identifier is to be manipulated or displayed.
If the value of an entity or entities attribute is the name of an internal cdata or sdata entity, the %ev
format can be used to determine the replacement text of the internal entity.
The e, p, and ep formats can also be used with notation, under the same conditions as entity or entities.
This example illustrates how the %ev format handles internal and external entities differently.
The element as-is has a single required ENTITY attribute, text. The entity named by the attribute value
provides the text that is to replace the element, wherever it occurs in a document.
<!element as-is - o empty>
<!attlist as-is text entity #required>
The element rule for processing the as-is element does the following:
If the entity is external, the element rule uses the system identifier declared for the entity, or the system
identifier found by indexing the #library shelf with the entity's declared public identifier, as a filename,
and replaces the element with the contents of the file. (OmniMark reports an error if the entity does not have
a system identifier and does not have a public identifier mapped to a system identifier, or if the system
identifier names a nonexistent file. This may or may not be appropriate for a particular OmniMark program.)
If the entity is internal, the element rule uses the replacement text of the internal entity and replaces the
element with that text.
Note that %ev returns one of two things, depending on whether the entity named by the attribute to which it is
applied is internal or external.
element "as-is"
   do when attribute text is external
      output file "%ev(text)"
   else
      output "%ev(text)"
   done
   suppress