The OmniMark XML and SGML parsers automatically resolve internal entities. With general text entities, the only kind supported in XML, you cannot tell if a piece of text is the result of an entity being resolved, or if it occurred naturally in the document. With CDATA and SDATA character entities, supported by SGML, it is possible to tell that a piece of text is an entity replacement.
If you are creating XML output from an XML input document, the parser will resolve entities in the input document. If you need equivalent entities in the output document you can create them by matching the replacement text of the entities, and outputting the appropriate entities. You do this with translate rules:
translate cdata "OmniMark Technologies"
   output "&om;"

translate sdata "streaming programming model"
   output "&str;"
The first rule will match the text "OmniMark Technologies" only if it is the expansion of a CDATA entity. The second rule will match "streaming programming model" only if it is the expansion of an SDATA entity.
You can match text that is the replacement of a CDATA or SDATA entity (but not the replacement of a general text entity) using the entity pattern modifier:
translate "Company: " entity "OmniMark Technologies"
This pattern matches if "OmniMark Technologies" is the replacement of a CDATA or SDATA entity. Note that the pattern does not match if "OmniMark Technologies" is the expansion of a general text entity.
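You can exclude a particular entity type with the non-cdata and non-sdata pattern modifiers. If non-cdata is used in place of entity in the pattern above, the pattern will match if the phrase "OmniMark Technologies" is plain data content or the replacement of an SDATA entity, but not if it is the replacement of a CDATA entity.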
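A sketch of a pattern with the behavior just described, assuming the non-cdata pattern modifier (the counterpart of the non-sdata modifier discussed later in this section) and reusing the "Company: " prefix from the earlier example:

```omnimark
; Assumed form: matches when the phrase is plain data content or SDATA
; replacement text, but not when it is the replacement of a CDATA entity.
translate "Company: " non-cdata "OmniMark Technologies"
```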
You can match text only if it does not contain any SDATA or CDATA entity replacement text using the pattern modifier
The pattern modifiers cdata, sdata, and entity work by first finding an entity replacement string of the specified type and then testing to see if the text matches the specified pattern. They do not work by matching the text and then looking to see if it is an entity replacement. Thus the replacement text found by cdata, sdata, or entity is scanned as a source in its own right. This means:
The pattern following cdata, sdata, or entity must match the complete replacement text of a single entity. (It is, in effect, a matches test on the replacement text of the entity.)
You can match the replacement text of any CDATA entity with the pattern translate cdata any*.
The non-cdata and non-sdata pattern modifiers treat text that is the replacement of an excluded entity type as inherently unmatchable, even by any. This means that you can write a pattern like the following:
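A sketch of such a pattern follows. The pattern variable names stuff and more-stuff and the output action are hypothetical, chosen only for illustration:

```omnimark
; non-sdata any* stops at the first SDATA replacement because that text
; is unmatchable for it; sdata any* then takes the whole replacement text.
translate non-sdata any* => stuff sdata any* => more-stuff
   output stuff || more-stuff
```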
This pattern will match all the text up to the first SDATA entity replacement, then the entire replacement text of that SDATA entity.
You can also match entity replacement text based on the name of the entity.
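A minimal sketch of a rule that matches by name, assuming an entity named "om" as in the earlier example (the output action is illustrative only):

```omnimark
; Fires on the replacement text of a CDATA entity whose name is "om",
; whatever that replacement text happens to be.
translate cdata named "om"
   output "&om;"
```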
The above rule will match any text that is the replacement of a CDATA entity named "om".
As with the matching of replacement text, the pattern following named is the equivalent of a matches test on the name of the entity. You can use the pattern to limit which entity names are matched:
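One way to restrict the names that are matched, sketched here with the built-in letter character class as the name pattern:

```omnimark
; The pattern after named must match the whole entity name, so a bare
; "letter" accepts only names that are exactly one letter long.
translate entity named letter
```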
This rule succeeds for any internal CDATA or SDATA entity whose name consists of a single letter.
You can capture the name of the entity with a pattern variable:
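A sketch of such a rule, with name as a hypothetical pattern variable:

```omnimark
; Capture the entity's name and write an equivalent entity reference.
translate entity named any* => name
   output "&" || name || ";"
```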
This rule matches all CDATA and SDATA entities by name and outputs the equivalent entity.
You can capture both the name and the text to pattern variables like this:
translate (sdata named any* => name) => text
   output "The value of entity '" || name || "' is '" || text || "'."
As with matching the replacement text of an internal CDATA or SDATA entity, the pattern that follows the keyword named must match the whole of an entity's name.
You can match the replacement text of a CDATA or SDATA entity based on both its name and its value. Remember that the entity pattern modifiers work by first identifying an entity replacement string and then testing to see if it meets the specified criteria. Therefore, when you test both the name and the value, the entity must meet both criteria for the pattern as a whole to match. The name test is prefixed by named and the value test is prefixed by valued.
This gives you another way to capture both the name and value of an entity:
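A sketch of a rule that captures both, assuming the value test is introduced by the valued keyword; name and text are hypothetical pattern variables:

```omnimark
; Both tests apply to the same entity: any* after named captures its
; name, and any* after valued captures its replacement text.
translate entity named any* => name valued any* => text
   output "The value of entity '" || name || "' is '" || text || "'."
```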