Up-translation: translating documents into XML/SGML

An up-translation is a translation whose output is generally a complete XML or SGML document. OmniMark parses the XML/SGML document as it is generated, and any errors are reported. The same XML/SGML document that is parsed is the "main output" of the program. Up-translations work well for relatively simple documents. For complex documents, context-translations are almost always preferable.

OmniMark can also send information to the main output and to the XML/SGML parser individually. For example, you can send the XML/SGML prolog to the parser without sending it to the main output, so that the output consists solely of the document instance. This is useful in environments where document instances are stored separately from their DTDs.
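
The idea of two independently addressable targets can be pictured with a small model. The following Python sketch is only an illustration and is not OmniMark code: the class `OutputTargets`, its `send` method, and the function `build_document` are hypothetical names invented for this example. It shows a prolog going to the parser side only, while the document instance goes to both sides.

```python
import io


class OutputTargets:
    """Hypothetical model: two independent sinks standing in for the
    main output and the input of the XML/SGML parser."""

    def __init__(self):
        self.main_output = io.StringIO()
        self.parser_input = io.StringIO()

    def send(self, text, *, main=True, parser=True):
        # By default text goes to both targets; either can be switched off.
        if main:
            self.main_output.write(text)
        if parser:
            self.parser_input.write(text)


def build_document(body_markup, doctype):
    """Return (main output text, text seen by the parser)."""
    targets = OutputTargets()
    # The prolog is needed for parsing but is kept out of the main output.
    targets.send(doctype + "\n", main=False)
    # The document instance goes to both targets.
    targets.send(body_markup)
    return targets.main_output.getvalue(), targets.parser_input.getvalue()


main_text, parsed_text = build_document(
    "<doc><p>Hi</p></doc>", '<!DOCTYPE doc SYSTEM "doc.dtd">'
)
```

In this model `main_text` holds only the document instance, while `parsed_text` begins with the document type declaration, which mirrors the separation of instances and DTDs described above.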

An up-translation must begin with up-translate. OmniMark places no restrictions on the format of the input to an up-translation.

When writing an up-translation, use find rules to describe the patterns of interest in a document and the actions to take to transform the document into an XML/SGML document.

An up-translation operates as follows:

  1. OmniMark examines each rule in turn, looking for a rule that can match text at the current position. It selects the first matching rule that either has no condition or has a condition that is true.
  2. If a rule is selected, the actions in that rule are performed in order. If you want to use any of the matched text in an action, you can capture that text in pattern variables.
  3. If no rules can be selected, OmniMark allows text at the current point to "fall through" to the currently selected output targets, which are typically both the main output and the XML/SGML parser.
  4. The text that "fell through" or was matched is consumed, and the cycle begins again.
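
The cycle above can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not a description of how OmniMark is implemented: the `Rule` class, the `up_translate` function, and the two sample rules are hypothetical, regular expressions stand in for OmniMark patterns, and fall-through is modelled one character at a time.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    """Hypothetical stand-in for a find rule."""
    pattern: "re.Pattern"                          # what to match at the current position
    action: Callable[["re.Match"], str]            # builds output from the matched text
    condition: Optional[Callable[[], bool]] = None  # optional guard


def up_translate(text: str, rules: List[Rule]) -> str:
    out = []
    pos = 0
    while pos < len(text):
        selected = None
        # Step 1: take the first rule that matches here and whose
        # condition is absent or true.
        for rule in rules:
            m = rule.pattern.match(text, pos)
            if m and m.end() > pos and (rule.condition is None or rule.condition()):
                selected = (rule, m)
                break
        if selected:
            # Step 2: perform the rule's action; captured groups play the
            # role of pattern variables.
            rule, m = selected
            out.append(rule.action(m))
            pos = m.end()      # Step 4: the matched text is consumed.
        else:
            # Step 3: no rule selected, so the text falls through unchanged.
            out.append(text[pos])
            pos += 1           # Step 4: the fall-through text is consumed.
    return "".join(out)


# Sample rules (assumptions for illustration): *text* becomes an <em>
# element, and a bare ampersand is escaped.
RULES = [
    Rule(re.compile(r"\*([^*]+)\*"), lambda m: f"<em>{m.group(1)}</em>"),
    Rule(re.compile(r"&"), lambda m: "&amp;"),
]

result = up_translate("a *b* & c", RULES)
```

Here the characters that no rule matches ("a", the spaces, and "c") fall through unchanged, while the two matched spans are replaced by markup.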

As markup is found and submitted to the parser, OmniMark collects context information; that is, information about the document hierarchy being formed. You can use this context information to qualify subsequent find rules, so that a rule is selected only when the document is in a particular state.
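
As a rough illustration of context-qualified rules, the Python sketch below tracks which elements are open as markup is emitted and uses that information to decide what to emit next. It is a model only: the `Context` class, the `translate_lines` function, and the element names `list`, `item`, and `p` are assumptions made for this example, and the tag tracking is deliberately naive (it assumes the input text contains no markup characters and that no empty-element tags are emitted).

```python
import re

# Matches a start tag or an end tag and captures the element name.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*>")


class Context:
    """Hypothetical model of the element hierarchy built so far."""

    def __init__(self):
        self.open_elements = []

    def feed(self, markup):
        # Update the stack of open elements from the emitted markup.
        for m in TAG.finditer(markup):
            closing, name = m.group(1), m.group(2)
            if closing:
                if self.open_elements and self.open_elements[-1] == name:
                    self.open_elements.pop()
            else:
                self.open_elements.append(name)

    def inside(self, name):
        return name in self.open_elements


def translate_lines(lines):
    ctx = Context()
    out = []

    def emit(markup):
        ctx.feed(markup)   # the "parser" side sees everything that is emitted
        out.append(markup)

    for line in lines:
        if line.startswith("- "):
            # Open a list only when one is not already open.
            if not ctx.inside("list"):
                emit("<list>")
            emit(f"<item>{line[2:]}</item>")
        else:
            # A non-list line closes any list that is still open.
            if ctx.inside("list"):
                emit("</list>")
            emit(f"<p>{line}</p>")
    if ctx.inside("list"):
        emit("</list>")
    return "".join(out)


doc = translate_lines(["intro", "- a", "- b", "end"])
```

The same input line ("- b") produces different output from "- a" only because of the context gathered from earlier markup: the list is already open, so no second start tag is emitted.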

In an up-translation, the XML/SGML document created is strictly a result of the patterns that can be found, in context, in the input document. Unless you send text to the main output or the parser individually, the XML/SGML document written to the main output is identical to the document submitted to the parser.

If there are errors in the generated markup, the parser will report the markup errors and perform as much error correction as possible. You can customize or even act upon error reports to help the program recover from such markup errors.
