Markup processing often encompasses a spectrum of complexity. OmniMark has a number of features that the programmer can employ to control how markup is processed. This topic explores some of those features, from the simplest element rules, to more complex examples using groups and markup sink functions.
OmniMark is a rule-based language, and the main ingredient of many OmniMark programs is element rules. The body
of every element rule in your program must take care of processing the element's content. The simplest way to
accomplish this is to delegate the processing of content to other markup rules using %c:
    element "simple"
       output "%c"
The above rule reproduces the output and effects of all the rules that its content fires. Wrapping a part of
the input in element <simple> will make no difference to the output.
Alternatively, the rule could perform some actions before and after the processing of its content. For example,
it could output something:
    element "parenthesized"
       output "("
       output "%c"
       output ")"
The effect of having element <parenthesized> in the input is to wrap the result of
processing the element content with parentheses. You can also redirect the content processing output to another
destination, or discard it completely:
    element "redirect"
       using output as file (attribute "filename")
          output "%c"

    element "discard"
       suppress
None of the rules above alters the result of content processing; they merely add to it or change its
destination. The following rule subjects the output of %c to another round of processing:
    element "indent"
       repeat scan "%c"
       match any-text+ => line
          output " " || line
       match "%n"
          output "%n"
       again
The indent rule indents each line produced from its content. To accomplish this, it alters the result
of %c but not the way this result gets produced. This is good: the rule fulfills its purpose with a
localized code change. If you tried to accomplish the same effect in a single pass, you would have to modify every
place where a line could be emitted within an <indent> element.
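To see the post-processing at work, the indent rule can be combined with an ordinary element rule. The following is a minimal sketch rather than part of the original program: the <line> element, its rule, and the literal input document are assumptions made up for illustration.

```omnimark
; Hypothetical driver: parses a small literal document (an assumption for this sketch).
process
   do xml-parse scan "<indent><line>first</line><line>second</line></indent>"
      output "%c"
   done

; Hypothetical rule: emits its content followed by a line break.
; It knows nothing about indentation.
element "line"
   output "%c%n"

; The indent rule from above: it rescans whatever its content produced.
element "indent"
   repeat scan "%c"
   match any-text+ => line
      output " " || line
   match "%n"
      output "%n"
   again
```

Each line emitted by the line rule should reach the output prefixed by the indent string, while the line rule itself stays unchanged.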
The previous rule is an example of post-processing of content. It invokes other rules to process its content
using %c, and then scans through their output. An alternative approach is to pre-process the content
before invoking other rules by using #content instead of %c. Here are a few examples:
    element "redirect-content"
       using output as alternative-content-processor ()
          output #content

    element "distribute-content"
       using output as alternative-content-processor () & relaxng.validator against my-schema
          output #content

    element "really-discard"
       put #suppress #content

    element "half-marked-up"
       do markup-parse up-translate-content (#content)
          output "%c"
       done
The first rule above, redirect-content, does not invoke any markup rules itself. Instead it sends
its entire content off to alternative-content-processor, a markup sink function which may be
imported from another module, to process it in any way it pleases.
The rule distribute-content is similar but sends its content in parallel to two destinations: the
alternative-content-processor to be processed and the relaxng.validator function to be
validated at the same time.
The really-discard rule is similar to the rule discard you have seen earlier, but where
the latter discarded the output of content processing, really-discard discards the content processing
itself. By directing its #content to #suppress, this rule avoids invoking any rules that would
process its markup.
Finally, the rule half-marked-up pre-processes its content through the function
up-translate-content. For example, if the content of element <half-marked-up> was
    This is one paragraph.

    This is <em>another</em> paragraph, as you can tell by the blank line preceding it.
up-translate-content could convert this input to appear as
    <para>This is one paragraph.</para>

    <para>This is <em>another</em> paragraph, as you can tell by the blank line preceding it.</para>
After this pre-processing step, the rule half-marked-up applies do markup-parse and invokes
regular content processing with %c. Notice that both the original element <em> and
the newly introduced element <para> can be processed by the regular element rules,
as if they were both present in the content from the beginning. The function up-translate-content could
be defined in a different module as follows:
    export markup source function
       up-translate-content (value markup source m)
    as
       do xml-parse scan "<up-translated>"
                      || wrap-implicit-paragraphs (split-data-content (m, #current-output))
                      || "</up-translated>"
          output "%c"
       done

    element "up-translated"
       output #content
This function in turn relies on two others: split-data-content to separate the plain text from
markup events, which are sent directly to the output of up-translate-content, and wrap-implicit-paragraphs
to insert XML tags in the plain text.
    define string source function
       split-data-content (value markup source m,
                           value markup sink events)
    as
       repeat
          output m take any*
          exit

        catch #markup-start event
          signal to events rethrow
        catch #markup-point event
          signal to events rethrow
        catch #markup-end event
          signal to events rethrow
       again

    define string source function
       wrap-implicit-paragraphs (value string source s)
    as
       repeat scan s
       match lookahead any-text
          output "<para>" || s take (any ** lookahead ("%n%n" | value-end)) || "</para>"
       match "%n"
          output "%n"
       again
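Once up-translate-content has wrapped the paragraphs, the markup that half-marked-up parses contains both <para> and <em> elements, so ordinary element rules can handle them. A minimal sketch of such rules follows; the rule bodies and the HTML-like output tags are assumptions chosen for illustration, not part of the original program.

```omnimark
; Hypothetical regular rules; they would fire during the "%c" in half-marked-up.

; Handles the <para> elements introduced by up-translate-content.
element "para"
   output "<p>%c</p>%n"

; Handles the <em> elements that were already present in the input.
element "em"
   output "<em>%c</em>"
```

Neither rule needs to know whether the element it handles came from the original input or from the pre-processing step.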
Dividing the processing of your content into multiple steps is usually the best way to improve your program, as it is less intrusive and lets you reuse the common processing code. Still, sometimes neither post-processing nor pre-processing of content is enough and you need to alter the very way content is processed. The easiest way to achieve this is with groups.
If you have an element whose content is completely different from the rest of your input, you will probably
want to process it using a completely different set of rules from the regular one. To do this, simply put your
%c into a using group scope:
    element "foreign"
       using group "process foreign elements"
          output "%c"
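The using group scope only selects which rules handle the content; the rules themselves have to be declared in that group. A minimal sketch of how this might look, assuming the group is declared under the same quoted name, and using a hypothetical <term> element and a literal input made up for illustration:

```omnimark
; Hypothetical driver (an assumption for this sketch).
process
   do xml-parse scan "<foreign><term>x</term></foreign>"
      output "%c"
   done

element "foreign"
   using group "process foreign elements"
      output "%c"

; Rules that follow a group declaration belong to that group.
group "process foreign elements"

; Hypothetical rule declared in the group selected by the foreign rule.
element "term"
   output "[%c]"
```

With this arrangement, the <term> element inside <foreign> is handled by the rule declared in the group.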
If, on the other hand, the content model of your element is not completely unique, you may want to use both
the common rules and the special ones:
    element "half-foreign"
       using group "process foreign elements" & #group
          output "%c"
Keep in mind that for every element instance in your content, only a single element rule can fire:
either a rule from your group "process foreign elements" or one of the common rules. That means you
cannot have an unguarded element #implied rule in both groups, for example. But what if you actually
want to perform both rules, because they both perform useful actions? One solution is to merge the body of the
common rule into the other rule. If you would rather avoid the code duplication, you can apply the technique
used by the distribute-content rule and send your content to be processed by both groups. You just
need to define two markup sink functions that invoke the proper rules:
    define markup sink function
       common-content-processor (value string sink destination)
    as
       do markup-parse #current-input
          put destination "%c"
       done

    define markup sink function
       foreign-content-processor (value string sink destination)
    as
       using group "process foreign elements"
       do markup-parse #current-input
          put destination "%c"
       done

    element "distribute-half-foreign"
       using output as foreign-content-processor (#current-output)
                     & common-content-processor (#current-output)
          output #content
Now that the content is processed by two groups of rules independently, each group is allowed to have an
element #implied rule, and they can (and must) both fire for each element in the content.
The reason #current-output is passed as an argument to the two content-processor functions is
to let them output into it. There will be a problem, however, if they both do that for the same part of the
content, because the two outputs will then be merged together. For example, if neither group contains any data-content
or translate rule, the content of the input <distribute-half-foreign>Hello, World!</distribute-half-foreign>
would be duplicated and the output would be "Hello, World!Hello, World!".
If you do need both outputs, instead of merging them as they come you may want to order them properly in your
output by temporarily buffering one and outputting it after the other:
    element "distribute-half-foreign"
       local stream common-output

       open common-output as buffer
       using output as foreign-content-processor (#current-output)
                     & common-content-processor (common-output)
          output #content
       close common-output
       output common-output
Alternatively, instead of storing the output of content processing you can use a markup-buffer to
store your content before processing it. This lets you control both the order of your outputs and the order of
processing:
    import "ommarkuputilities.xmd" unprefixed

    element "distribute-half-foreign"
       local markup-buffer my-content

       using output as foreign-content-processor (#current-output)
                     & markup sink my-content
          output #content
       using output as common-content-processor (#current-output)
          output my-content
Since the content is no longer processed by the two groups in parallel, there is no need to combine the two content processors with the & operator.
You can write this rule to the same effect without relying on the markup sink functions to wrap the rule
invocations:
    import "ommarkuputilities.xmd" unprefixed

    element "distribute-half-foreign"
       local markup-buffer my-content

       using output as my-content
          output #content
       using group "process foreign elements"
       do markup-parse my-content
          output "%c"
       done
       do markup-parse my-content
          output "%c"
       done