Aided translation types are a succinct way of encoding certain content transformations in OmniMark. However, with this succinctness comes a rigidity to the program, and consequently a proportional loss in expressivity: programs written as aided translation types cannot always be extended to handle new requirements without polluting the existing (working!) program. For this reason, the use of aided translation types is deprecated, although they are supported for compatibility with earlier versions of OmniMark.
This concept will show you how to transform a program written as an aided translation type into one that uses process rules and OmniMark's coroutining capabilities. Although the resulting program will typically be somewhat longer than the original, the gain in expressivity leads to more flexible and maintainable code. As a simple running example, we will seek to publish a small text file to HTML. The transformations performed will be straightforward and not representative of a production program. We will begin with the input

Hello, World!
Salut, Monde!
Hola, Mundo!

and ultimately aim for the output

<!DOCTYPE html>
<html>
 <head><title>TITLE</title></head>
 <body>
 <ul>
 <li>Hello, World!</li>
 <li>Salut, Monde!</li>
 <li>Hola, Mundo!</li>
 </ul>
 </body>
</html>

To keep things simple, the content of the <title> element will be hard-coded.
cross-translate is used for content processing that is based purely on OmniMark's find rules. In a cross-translate program, OmniMark sends #main-input to the program's find rules, and sends the output of the find rules to #main-output.
Written as a cross translation, our program is
cross-translate

find-start
   output "<!DOCTYPE html>%n"
       || "<html>%n"
       || " <head><title>TITLE</title></head>%n"
       || " <body>%n"
       || " <ul>%n"

find-end
   output " </ul>%n"
       || " </body>%n"
       || "</html>%n"

find line-start any-text+ => t ("%n" | value-end)
   output " <li>" || t || "</li>%n"
In a cross-translate program, find-start and find-end rules can be used to handle processing that should occur before pattern matching begins and after it ends, respectively. Here, we have used these rules to emit the lead-in and lead-out portions of our target HTML.
This program can be converted to a process program by defining a function to contain the orchestration that would ordinarily be handled by OmniMark:

define string source function cross-translation
   (value string source s)
as
   output "<!DOCTYPE html>%n"                       ; The body of FIND-START rules goes here.
       || "<html>%n"
       || " <head><title>TITLE</title></head>%n"
       || " <body>%n"
       || " <ul>%n"

   using group "cross translation"
      submit s

   output " </ul>%n"                                ; The body of FIND-END rules goes here.
       || " </body>%n"
       || "</html>%n"

process
   output cross-translation (#main-input)

group "cross translation"

find line-start any-text+ => t ("%n" | value-end)
   output " <li>" || t || "</li>%n"

The function cross-translation () is typical of a filter function: it takes input in one form via a value string source argument, and outputs the transformed data on its #current-output.
In writing our cross translation as a process program, we have used the flexibility to nest the find rules into a group named cross translation, thereby protecting them from other rules that might appear in the program; this group is made active in the process rule. Since find-start and find-end rules cannot appear in process programs, we have also indicated by use of comments where their content should be added to replicate the functionality they provided.
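To make the protection offered by the group concrete, consider a hypothetical second group of find rules (the group name and rule below are invented for illustration). Because the process rule only ever activates the group cross translation, a competing rule in another group can never fire:

group "legacy rules"

find any-text+ => t ("%n" | value-end)
   ; this rule would compete with the cross-translation rule,
   ; but it stays dormant because its group is never made active
   output "*** " || t || " ***%n"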
Although the process version of our publication utility is slightly longer than the corresponding cross-translate version, it is far more flexible. For example, if a new requirement called for further manipulation of the output of our publication utility, we could accommodate it by adding an additional function to our call chain in the process rule:
process
   output further-processing (cross-translation (#main-input))

where further-processing () would be defined as a string source function:

define string source function further-processing
   (value string source s)
as
   ; ...

See Linking chains of streaming filters for additional examples showing how to interconnect chains of coroutines.
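For illustration only, one possible body for further-processing () (the transformation chosen here is invented) might replace the placeholder title in the generated HTML:

define string source function further-processing
   (value string source s)
as
   using group "further processing"
      submit s

group "further processing"

find "TITLE"
   output "Greetings"

Any text not matched by a find rule in the group passes through unchanged, so the rest of the HTML is unaffected.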
up-translate is used when converting unstructured content to SGML or XML. In an up-translate program, OmniMark sends #main-input to the program's find rules, and sends the output of the find rules to both #main-output and the selected markup parser for validation. By default, the selected markup parser is the SGML parser; however, the XML parser can be invoked by appending the with xml qualifier to up-translate. We will use XML here. The find rules of an up translation program have access to the current markup context collected by the markup parser from the generated output. Although we will not use this functionality here, we will ensure that our process program is structured so that the markup context is available.
In principle, we could write our small publication utility as an up translation; but to satisfy the validation requirements of an up translation, we would be required to resolve and pull in the entire HTML DTD. For the sake of brevity and later development, we will instead target the following custom XML format:

<!DOCTYPE lines [
<!ELEMENT lines (line)*>
<!ELEMENT line (#PCDATA)>
]>
<lines>
<line>Hello, World!</line>
<line>Salut, Monde!</line>
<line>Hola, Mundo!</line>
</lines>

An up translation program that targets this output might be written as

up-translate with xml

find-start
   output "<!DOCTYPE lines [%n"
       || "<!ELEMENT lines (line)*>%n"
       || "<!ELEMENT line (#PCDATA)>%n"
       || "]>%n"
       || "<lines>%n"

find-end
   output "</lines>%n"

find any-text+ => t ("%n" | value-end)
   output "<line>" || t || "</line>%n"
Much as we did in the case of cross-translate, above, we can convert this to a process program by defining a function to orchestrate what would normally be handled by OmniMark:

define string source function up-translation
   (value string source input,
    value string sink destination)
as
   using output as #current-output & destination
   do
      output "<!DOCTYPE lines [%n"                 ; The body of FIND-START rules goes here.
          || "<!ELEMENT lines (line)*>%n"
          || "<!ELEMENT line (#PCDATA)>%n"
          || "]>%n"
          || "<lines>%n"

      using group "up translation"
         submit input

      output "</lines>%n"                          ; The body of FIND-END rules goes here.
   done

process
   using group "validate"
   do xml-parse document scan up-translation (#main-input, #main-output)
      suppress
   done

group "up translation"

find any-text+ => t
   output "<line>" || t || "</line>"

group "validate"

element #implied
   output "%c"

Again, as before, we have taken the opportunity to wrap the rules (there is only one left in this example, but a realistic program might have several) into a group named up translation. We have also indicated by use of comments where the content of the find-start and find-end rules should be added to replicate their functionality.
Recalling that an up translation sends the output generated by the find rules to both #main-output and the selected markup parser, we have written our function up-translation () as a string source function that additionally takes a string sink argument: the string sink argument serves to feed #main-output, whereas the function's #current-output feeds the markup parser. The string source argument of up-translation () is attached to #main-input at the call site.
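Because the destination is an explicit string sink argument rather than a hard-wired #main-output, the same function can be pointed at any sink. As a hypothetical variation (the file name below is invented for illustration), the generated XML could be captured in a file while validation proceeds exactly as before:

process
   using group "validate"
   do xml-parse document scan up-translation (#main-input, file "lines.xml")
      suppress
   done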
As was the case for cross-translate, above, this structure is more flexible and more easily adjusted to accommodate requirements changes. Case in point: a strict up-translate program can only validate XML (or SGML) against a DTD; we would be unable to accommodate validation against a RELAX NG schema, should future requirements mandate it. Meanwhile, our equivalent process program can readily be adjusted, simply by modifying the body of the process rule. Allowing ourselves to be inspired by the example in relaxng.validator:
process
   local relaxng-schema-type schema

   set schema to relaxng.compile-schema file "lines.rng"

   using output as relaxng.validator against schema
   do xml-parse scan up-translation (#main-input, #main-output)
      output #content
   done

(See relaxng.validator for a more sophisticated invocation that allows for processing of markup errors.) In this version, the group validate is not needed. The rest of our program remains unchanged.
A down-translate is typical of a publication pipeline that converts (say) XML to a formatted output. In a down-translate program, OmniMark sends #main-input to the selected markup parser, and the output of the rules fired by the markup parser is sent to #main-output. By default, the selected markup parser is the SGML parser; however, the XML parser can be invoked by appending the with xml qualifier to down-translate. We will use XML here.
Taking as input the custom XML we generated from our up translation program,

<!DOCTYPE lines [
<!ELEMENT lines (line)*>
<!ELEMENT line (#PCDATA)>
]>
<lines>
<line>Hello, World!</line>
<line>Salut, Monde!</line>
<line>Hola, Mundo!</line>
</lines>

a simple publication pipeline to HTML written as a down translation might look like

down-translate with xml

document-start
   output "<!DOCTYPE html>%n"
       || "<html>%n"
       || " <head><title>TITLE</title></head>%n"
       || " <body>%n"
       || " <ul>%n"

document-end
   output " </ul>%n"
       || " </body>%n"
       || "</html>%n"

element "lines"
   output "%c"

element "line"
   output " <li>%c</li>%n"
Just as cross-translate and up-translate have find-start and find-end rules that allow for special processing before and after the input is processed, down-translate has document-start and document-end rules that serve a similar purpose.
Just as we did for cross-translate and up-translate, above, we can convert this to a process program by introducing a function that handles the launching of the markup parser:

define string sink function down-translation
   (value string sink s)
as
   using group "down translation"
   using output as s
   do xml-parse document scan #current-input
      output "<!DOCTYPE html>%n"                   ; The body of DOCUMENT-START rules goes here.
          || "<html>%n"
          || " <head><title>TITLE</title></head>%n"
          || " <body>%n"
          || " <ul>%n"

      output "%c"

      output " </ul>%n"                            ; The body of DOCUMENT-END rules goes here.
          || " </body>%n"
          || "</html>%n"
   done

process
   using output as down-translation (#main-output)
      output #main-input

group "down translation"

element "lines"
   output "%c"

element "line"
   output " <li>%c</li>%n"

We have chosen to write down-translation () as a string sink function; this will serve our needs best later when we discuss context translations.
As was the case with find-start and find-end rules, document-start and document-end rules cannot appear in process programs: we have used comments to indicate where their contents should be placed to replicate their functionality.
A down-translate program can also contain an external-text-entity #document rule that is used to feed the input to OmniMark: a trivial example would be

external-text-entity #document
   output file "input.xml"

This can be handled in our process program by introducing a function called, say, hash-document () that encapsulates the logic from our external-text-entity #document rule,

define string source function hash-document ()
as
   output file "input.xml"   ; The body of the EXTERNAL-TEXT-ENTITY #DOCUMENT rule goes here.

and using it as the input for our down-translation () function:

process
   using output as down-translation (#main-output)
      output hash-document ()
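Encapsulating the input logic in a function also makes it easy to change where the document comes from. As a hypothetical variation (the file name below is invented for illustration), hash-document () could supply the DOCTYPE itself and stream only the document instance from a file:

define string source function hash-document ()
as
   ; emit the internal subset ourselves, then stream the instance from disk
   output "<!DOCTYPE lines [%n"
       || "<!ELEMENT lines (line)*>%n"
       || "<!ELEMENT line (#PCDATA)>%n"
       || "]>%n"
   output file "lines-instance.xml"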
context-translate is used to convert data from one format to another using a particular markup as an intermediate format. In a context-translate program, OmniMark sends #main-input to the find rules, and sends the output of the find rules to the markup parser; the output from the rules fired by the markup parser goes to #main-output. For example, we can publish the plain text file we originally started from to HTML via our custom XML format using a context translation program such as
context-translate with xml

find-start
   output "<!DOCTYPE lines [%n"
       || "<!ELEMENT lines (line)*>%n"
       || "<!ELEMENT line (#PCDATA)>%n"
       || "]>%n"
       || "<lines>%n"

find-end
   output "</lines>%n"

find any-text+ => t ("%n" | value-end)
   output "<line>" || t || "</line>%n"

document-start
   output "<!DOCTYPE html>%n"
       || "<html>%n"
       || " <head><title>TITLE</title></head>%n"
       || " <body>%n"
       || " <ul>%n"

document-end
   output " </ul>%n"
       || " </body>%n"
       || "</html>%n"

element "lines"
   output "%c"

element "line"
   output " <li>%c</li>%n"

To replicate this functionality as a process program, we need only take our up-translation () function and our down-translation () function (and their associated rules), combine them into one program, and invoke them together in a process rule:
process
   using group "validate"
   using output as down-translation (#main-output)
   do xml-parse document scan up-translation (#main-input, #current-output)
      suppress
   done

Everything else remains unchanged. And as was the case above, now that we have restructured our processing into the form of a process program, we have more flexibility to introduce new filters into the pipeline, in whatever fashion we may see fit; see Linking chains of streaming filters for examples.
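For instance, a hypothetical pre-processing filter (its name and behaviour are invented for illustration) written in the same string source shape as cross-translation () could be slipped in front of the up translation, with only the process rule changing:

define string source function expand-tabs
   (value string source s)
as
   using group "expand tabs"
      submit s

group "expand tabs"

find "%t"
   output "   "   ; replace each tab with spaces before the up translation sees it

process
   using group "validate"
   using output as down-translation (#main-output)
   do xml-parse document scan up-translation (expand-tabs (#main-input), #current-output)
      suppress
   done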