Aided translation types are a succinct way of encoding certain content transformations in OmniMark. However, with this succinctness comes rigidity and, consequently, a loss in expressivity: programs written as aided translation types cannot always be extended to handle new requirements without polluting the existing (working!) program. For this reason, the use of aided translation types is deprecated, although they are supported for compatibility with earlier versions of OmniMark.
This concept will show you how to transform a program written as an aided translation type into one that uses process rules and OmniMark's coroutining capabilities. Although the resulting program will typically be somewhat longer than the original, the gain in expressivity leads to more flexible and maintainable code. As a simple running example, we will seek to publish a small text file to HTML. The transformations performed will be straightforward and not representative of a production program. We will begin with the input

Hello, World!
Salut, Monde!
Hola, Mundo!

and ultimately aim for the output

<!DOCTYPE html>
<html>
  <head><title>TITLE</title></head>
  <body>
    <ul>
      <li>Hello, World!</li>
      <li>Salut, Monde!</li>
      <li>Hola, Mundo!</li>
    </ul>
  </body>
</html>

To keep things simple, the content of the <title> element will be hard-coded.
cross-translate is used for content processing that is based purely on OmniMark's find rules. In a cross-translate program, OmniMark sends #main-input to the program's find rules, and sends the output of the find rules to #main-output.
Written as a cross translation, our program is

cross-translate

find-start
   output "<!DOCTYPE html>%n"
       || "<html>%n"
       || "  <head><title>TITLE</title></head>%n"
       || "  <body>%n"
       || "    <ul>%n"

find-end
   output "    </ul>%n"
       || "  </body>%n"
       || "</html>%n"

find line-start any-text+ => t ("%n" | value-end)
   output "      <li>" || t || "</li>%n"
In a cross-translate program, find-start and find-end rules can be used to handle processing that should occur before pattern matching begins and after it ends, respectively. Here, we have used these rules to emit the lead-in and lead-out portions of our target HTML.
This program can be converted to a process program by defining a function to contain the orchestration that would ordinarily be handled by OmniMark:

define string source function cross-translation (value string source s) as
   output "<!DOCTYPE html>%n"                ; The body of FIND-START rules goes here.
       || "<html>%n"
       || "  <head><title>TITLE</title></head>%n"
       || "  <body>%n"
       || "    <ul>%n"

   using group "cross translation"
      submit s

   output "    </ul>%n"                      ; The body of FIND-END rules goes here.
       || "  </body>%n"
       || "</html>%n"

process
   output cross-translation (#main-input)

group "cross translation"
   find line-start any-text+ => t ("%n" | value-end)
      output "      <li>" || t || "</li>%n"

The function cross-translation () is typical of a filter function: it takes input in one form via a value string source argument, and outputs the transformed data on its #current-output.
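Because the orchestration now lives in an ordinary filter function, it is no longer tied to #main-input. As a minimal sketch of that flexibility (the file name here is an assumption, not part of the original example), the same filter could just as easily be driven from a named file:

process
   ; Hypothetical variation: read the source text from a file instead of #main-input.
   output cross-translation (file "hello.txt")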
In writing our cross translation as a process program, we have used the flexibility to nest the find rules into a group named cross translation, thereby protecting them from other rules that might appear in the program; this group is made active in the process rule. Since find-start and find-end rules cannot appear in process programs, we have also indicated with comments where their content should be added to replicate the functionality they provided.
Although the process version of our publication utility is slightly longer than the corresponding cross-translate version, it is far more flexible. For example, if a new requirement asked us to manipulate the output of our publication utility with further processing, we could accommodate this by adding an additional function to our call chain in the process rule:

process
   output further-processing (cross-translation (#main-input))

where further-processing () would be defined as a string source function:

define string source function further-processing (value string source s) as
   ; ...

See Linking chains of streaming filters for additional examples showing how to interconnect chains of coroutines.
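As a concrete, purely hypothetical elaboration of that stub, further-processing () could be a streaming filter that fills in the hard-coded TITLE placeholder on its way through; the replacement text and the group name below are assumptions for illustration only:

define string source function further-processing (value string source s) as
   using group "further processing"
      submit s

group "further processing"
   ; Hypothetical rule: replace the hard-coded placeholder in the generated HTML.
   ; Anything not matched by a find rule streams through unchanged.
   find "TITLE"
      output "Hello in Three Languages"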
up-translate is used when converting unstructured content to SGML or XML. In an up-translate program, OmniMark sends #main-input to the program's find rules, and sends the output of the find rules to both #main-output and the selected markup parser for validation. By default, the selected markup parser is the SGML parser; however, the XML parser can be invoked by appending the with xml qualifier to up-translate. We will use XML here. The find rules of an up translation program have access to the current markup context collected by the markup parser from the generated output. Although we will not use this functionality here, we will ensure that our process program is structured so that the markup context is available.
In principle, we could write our small publication utility as an up translation; but to satisfy the validation requirements of an up translation, we would be required to resolve and pull in the entire HTML DTD. For the sake of brevity and later development, we will instead target the following custom XML format

<!DOCTYPE lines [
<!ELEMENT lines (line)*>
<!ELEMENT line (#PCDATA)>
]>
<lines>
<line>Hello, World!</line>
<line>Salut, Monde!</line>
<line>Hola, Mundo!</line>
</lines>

An up translation program that targets this output might be written as

up-translate with xml

find-start
   output "<!DOCTYPE lines [%n"
       || "<!ELEMENT lines (line)*>%n"
       || "<!ELEMENT line (#PCDATA)>%n"
       || "]>%n"
       || "<lines>%n"

find-end
   output "</lines>%n"

find any-text+ => t ("%n" | value-end)
   output "<line>" || t || "</line>%n"
Much as we did in the case of cross-translate, above, we can convert this to a process program by defining a function to orchestrate what would normally be handled by OmniMark:

define string source function up-translation (value string source input, value string sink destination) as
   using output as #current-output & destination
   do
      output "<!DOCTYPE lines [%n"            ; The body of FIND-START rules goes here.
          || "<!ELEMENT lines (line)*>%n"
          || "<!ELEMENT line (#PCDATA)>%n"
          || "]>%n"
          || "<lines>%n"

      using group "up translation"
         submit input

      output "</lines>%n"                     ; The body of FIND-END rules goes here.
   done

process
   using group "validate"
   do xml-parse document scan up-translation (#main-input, #main-output)
      suppress
   done

group "up translation"
   find any-text+ => t
      output "<line>" || t || "</line>"

group "validate"
   element #implied
      output "%c"

Again, as before, we have taken the opportunity to wrap the find rules (there is only one left in this example, but a realistic program might have several) into a group named up translation. We have also indicated with comments where the content of the find-start and find-end rules should be added to replicate their functionality.
Recalling that an up translation sends the output generated by the find rules to both #main-output and the selected markup parser, we have written our function up-translation () as a string source function that additionally takes a string sink argument: the string sink argument serves to feed #main-output, whereas the function's #current-output feeds the markup parser. The string source argument of up-translation () is attached to #main-input at the call site.
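This separation also means the two attachments can be changed independently of one another. As a small sketch (the file name is an assumption), the generated XML could be captured in a file, while still being streamed through the parser for validation, simply by passing a different sink:

process
   using group "validate"
   do xml-parse document scan up-translation (#main-input, file "lines.xml")
      ; The parse is used for validation only; the generated XML goes to the file.
      suppress
   done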
As was the case for cross-translate, above, this structure is more flexible and more easily adjusted to accommodate requirements changes. Case in point: a strict up-translate program can only validate XML (or SGML) against a DTD; we would be unable to accommodate validation against a RELAX NG schema, should future requirements mandate it. Meanwhile, our equivalent process program can readily be adjusted, simply by modifying the body of the process rule, taking our inspiration from the example in relaxng.validator:

process
   local relaxng-schema-type schema

   set schema to relaxng.compile-schema file "lines.rng"

   using output as relaxng.validator against schema
   do xml-parse scan up-translation (#main-input, #main-output)
      output #content
   done

(See relaxng.validator for a more sophisticated invocation that allows for processing of markup errors.) In this version, the group validate is not needed. The rest of our program remains unchanged.
A down-translate is typical of a publication pipeline that converts (say) XML to a formatted output. In a down-translate program, OmniMark sends #main-input to the selected markup parser, and the output of the rules fired by the markup parser is sent to #main-output. By default, the selected markup parser is the SGML parser; however, the XML parser can be invoked by appending the with xml qualifier to down-translate. We will use XML here.
Taking as input the custom XML we generated from our cross translation or up translation programs,

<!DOCTYPE lines [
<!ELEMENT lines (line)*>
<!ELEMENT line (#PCDATA)>
]>
<lines>
<line>Hello, World!</line>
<line>Salut, Monde!</line>
<line>Hola, Mundo!</line>
</lines>

a simple publication pipeline to HTML written as a down translation might look like

down-translate with xml

document-start
   output "<!DOCTYPE html>%n"
       || "<html>%n"
       || "  <head><title>TITLE</title></head>%n"
       || "  <body>%n"
       || "    <ul>%n"

document-end
   output "    </ul>%n"
       || "  </body>%n"
       || "</html>%n"

element "lines"
   output "%c"

element "line"
   output "      <li>%c</li>%n"
Just as cross-translate and up-translate have find-start and find-end rules that allow for special processing before and after the input has been processed, down-translate has document-start and document-end rules that serve a similar purpose.
As we did earlier for cross-translate and up-translate, we can convert this to a process program by introducing a function that handles the launching of the markup parser:

define string sink function down-translation (value string sink s) as
   using group "down translation"
   using output as s
   do xml-parse document scan #current-input
      output "<!DOCTYPE html>%n"              ; The body of DOCUMENT-START rules goes here.
          || "<html>%n"
          || "  <head><title>TITLE</title></head>%n"
          || "  <body>%n"
          || "    <ul>%n"

      output "%c"

      output "    </ul>%n"                    ; The body of DOCUMENT-END rules goes here.
          || "  </body>%n"
          || "</html>%n"
   done

process
   using output as down-translation (#main-output)
      output #main-input

group "down translation"
   element "lines"
      output "%c"

   element "line"
      output "      <li>%c</li>%n"

We have chosen to write down-translation () as a string sink function; this will serve our needs best later when we discuss context translations.
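Because down-translation () is an ordinary string sink function, its destination is simply whatever sink we choose to pass it. As a minimal sketch (the file name is an assumption), the generated HTML could be written to a file rather than to #main-output:

process
   ; Hypothetical variation: send the HTML to a file instead of #main-output.
   using output as down-translation (file "output.html")
      output #main-input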
As was the case with find-start and find-end rules, document-start and document-end rules cannot appear in process programs: we have used comments to indicate where their contents should be placed to replicate their functionality.
A down-translate program can also contain an external-text-entity #document rule that is used to feed the input to OmniMark: a trivial example would be

external-text-entity #document
   output file "input.xml"

This can be handled in our process program by introducing a function called, say, hash-document () that encapsulates the logic from our external-text-entity #document rule,
define string source function hash-document () as
   output file "input.xml"    ; The body of the EXTERNAL-TEXT-ENTITY #DOCUMENT rule goes here.

and using it as the input for our down-translation () function:

process
   using output as down-translation (#main-output)
      output hash-document ()
context-translate is used to convert data from one format to another using a particular markup as an intermediate format. In a context-translate program, OmniMark sends #main-input to the find rules, and sends the output of the find rules to the markup parser; the output from the rules fired by the markup parser goes to #main-output. For example, we can publish the plain text file we originally started from to HTML via our custom XML format using a context translation program such as
context-translate with xml

find-start
   output "<!DOCTYPE lines [%n"
       || "<!ELEMENT lines (line)*>%n"
       || "<!ELEMENT line (#PCDATA)>%n"
       || "]>%n"
       || "<lines>%n"

find-end
   output "</lines>%n"

find any-text+ => t ("%n" | value-end)
   output "<line>" || t || "</line>%n"

document-start
   output "<!DOCTYPE html>%n"
       || "<html>%n"
       || "  <head><title>TITLE</title></head>%n"
       || "  <body>%n"
       || "    <ul>%n"

document-end
   output "    </ul>%n"
       || "  </body>%n"
       || "</html>%n"

element "lines"
   output "%c"

element "line"
   output "      <li>%c</li>%n"

To replicate this functionality as a process program, we need only take our up-translation () and down-translation () functions (and their associated rules), combine them into one program, and invoke them together in a process rule:
process
   using group "validate"
   using output as down-translation (#main-output)
   do xml-parse document scan up-translation (#main-input, #current-output)
      suppress
   done

Everything else remains unchanged. And as was the case above, now that we have restructured our processing into the form of a process program, we have more flexibility to introduce new filters into the pipeline, in whatever fashion we may see fit; see Linking chains of streaming filters for examples.
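For instance, assuming the hypothetical further processing group sketched earlier were included in the program, a sink-side filter could be wrapped around #main-output to post-process the generated HTML; the function below is an illustrative assumption, not part of the original program:

; Hypothetical sink-side filter: stream whatever is written to it through
; the "further processing" find rules, then on to the real destination.
define string sink function fill-in-title (value string sink destination) as
   using group "further processing"
   using output as destination
      submit #current-input

process
   using group "validate"
   using output as down-translation (fill-in-title (#main-output))
   do xml-parse document scan up-translation (#main-input, #current-output)
      suppress
   done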