Internal entities: combining internal entity and plain-text matching

Internal SDATA entities are often used to represent characters that are not directly available in the character set being used, either at a particular location or in a "lowest-common-denominator" interchange file. SDATA and CDATA entities can be matched as part of a larger pattern, as in the following example:

  translate "AT" sdata named "amp" "T"
     output "\ITALIC(AT&T)"

A multitude of SDATA entities that represent individual characters are defined in Annex D of ISO 8879. Combining entity and other matches in a translate rule allows an entity to be treated as just another character.

Care must be taken in composing patterns that include entity matching. In the preceding example, the letter T is matched as text following the SDATA entity; it is not part of what is matched as the entity's name. Parentheses can be used to change this behavior. If the pattern were written as follows, the entity name would have to be ampT:

  translate "AT" sdata named ("amp" "T")
     output "\ITALIC(AT&T)"
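
The effect of the parentheses can be modeled outside OmniMark. The following Python sketch is an illustration only and does not show how OmniMark is implemented. It assumes the input has already been split into a list of tokens, where plain text is a string and an entity reference is a hypothetical Sdata tuple; the function names are likewise invented for this sketch.

```python
from typing import NamedTuple


class Sdata(NamedTuple):
    """Hypothetical stand-in for an SDATA entity reference in the input."""
    name: str  # the entity's name, e.g. "amp"


def match_ungrouped(tokens):
    """Models: "AT" sdata named "amp" "T"
    The T is plain text that follows the entity."""
    return (len(tokens) == 3
            and tokens[0] == "AT"
            and isinstance(tokens[1], Sdata) and tokens[1].name == "amp"
            and tokens[2] == "T")


def match_grouped(tokens):
    """Models: "AT" sdata named ("amp" "T")
    Both strings belong to the name pattern, so the name must be "ampT"."""
    return (len(tokens) == 2
            and tokens[0] == "AT"
            and isinstance(tokens[1], Sdata) and tokens[1].name == "amp" + "T")


at_and_t = ["AT", Sdata("amp"), "T"]   # AT, then the amp entity, then T
print(match_ungrouped(at_and_t))       # True
print(match_grouped(at_and_t))         # False: no entity is named "ampT"
```

In this model the ungrouped pattern consumes three items (text, entity, text), while the grouped pattern consumes only two, because the T has become part of the name being tested.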

Any form of entity match can be combined with other text matching. If, for example, the ampersand character were matched based on its replacement text rather than its name, the following translate rule could be used instead of the one in the first example:

  translate "AT" sdata "[amp     ]" "T"
     output "\ITALIC(AT&T)"
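
The distinction between matching on an entity's name and matching on its replacement text can be sketched the same way. This Python model is an assumption-laden illustration, not OmniMark's implementation: the Sdata tuple, the helper functions, and the entity named "ampersand" are all hypothetical, and the replacement text is copied from the rule above.

```python
from typing import NamedTuple


class Sdata(NamedTuple):
    """Hypothetical stand-in for an SDATA entity reference."""
    name: str  # entity name as declared
    text: str  # replacement text as declared


AMP_TEXT = "[amp     ]"  # replacement text copied from the rule above


def matches_by_name(entity, name):
    """Models: sdata named "amp" (tests the entity's name)."""
    return entity.name == name


def matches_by_text(entity, text):
    """Models: sdata "[amp     ]" (tests the replacement text)."""
    return entity.text == text


amp = Sdata("amp", AMP_TEXT)
renamed = Sdata("ampersand", AMP_TEXT)  # hypothetical: other name, same text

print(matches_by_name(amp, "amp"))          # True
print(matches_by_name(renamed, "amp"))      # False
print(matches_by_text(renamed, AMP_TEXT))   # True
```

Under these assumptions, a rule written against the replacement text would keep matching if a DTD declared the same character under a different entity name, whereas a rule written against the name would not.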
