Linking Chains of Streaming Markup Filters
Just as you can use string source and string sink to stream character data through a series of
text filters, you can use markup source and markup sink to stream parsed markup data through a
series of markup filters, with no intermediate buffering.
The starting point of a chain of markup filters is always a markup parser. You can use do sgml-parse,
do xml-parse, or an external parser such as do markup-parse xerces.xml. The
beginning of the markup-processing chain is also the only place you should use any of these actions; once the
markup is parsed, there is no need to convert it to plain text only to have it parsed again.
The purpose of the parsing step is to convert a string source to a markup source. Within the body
of the parsing action, #content is a markup source that represents the result of the parse. That is the
starting point of the markup-processing pipeline.
define function
   handle-markup-source (value markup source input)
elsewhere

process
   do sgml-parse document scan #main-input
      handle-markup-source (#content)
   done
Here, handle-markup-source () is a function that will process the markup source it takes as an
argument. Alternatively, we can launch the markup processing by outputting #content into a
markup sink function that will consume and process it:
define markup-sink function
   handle-markup-as-sink
elsewhere

process
   do sgml-parse document scan #main-input
      using output as handle-markup-as-sink
         output #content
   done
The end point of a markup-processing pipeline is typically a set of element and other markup rules. In order to activate the rules, apply do markup-parse to a markup source and
trigger the rules using the %c format item or the suppress action:
define string source function
   handle-markup-source (value markup source input)
as
   do markup-parse input
      output "%c"
   done
process
   do sgml-parse document scan #main-input
      output handle-markup-source (#content)
   done
Incidentally, this example is semantically equivalent to the following, much simpler program fragment:
process
   do sgml-parse document scan #main-input
      output "%c"
   done
In this example the separation of markup processing from markup parsing may seem pointless. We shall see how it makes the processing pipeline more flexible in more complicated cases.
Let us use the same example task of converting input text to HTML that has been laid out in Linking chains of streaming filters using string source filters. The following
filtering functions were used in that example:
define string source function
   compress-whitespace value string source text
as
   repeat scan text
   ...

define string source function
   text2xml value string source text
as
   submit text
   ...

define string source function
   tidy-xml value string source markup
as
   do xml-parse scan markup
   ...

define string source function
   xml2html value string source markup
as
   do xml-parse scan markup
   ...
The compress-whitespace and text2xml functions deal with processing of plain text before
it gets parsed, so we shall not change them. The functions tidy-xml and xml2html, on the
other hand, clearly work on markup, so we shall modify them as follows:
define markup source function
   tidy-markup value markup source markup
as
   do markup-parse markup
   ...

define string source function
   markup2html value markup source markup
as
   do markup-parse markup
   ...
The reason for renaming the functions tidy-xml and xml2html to tidy-markup and markup2html, respectively, is to emphasize that they do not operate on the XML representation of a marked-up document any more: they now expect a parsed markup stream. Their input may come from a parsed XML document, but they would accept a parsed SGML document just the same.
The function text2xml produces a string source, whereas tidy-markup expects a
markup source. Although a string source can be used wherever a markup source is required,
we want tidy-markup to be able to react to markup events in its input. The markup events in question
can be inserted into the input by converting the string source to a markup source using, say, an
XML parser:
define markup source function
   xml2markup value string source text
as
   do xml-parse scan text
      output #content
   done
Our new chain of streaming filters now looks like this:
process
   output markup2html tidy-markup xml2markup text2xml compress-whitespace #main-input
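For comparison, the earlier all-string-source version of the chain presumably ended with a pipeline along these lines (reconstructed here from the four functions listed above, not quoted from that topic):

process
   output xml2html tidy-xml text2xml compress-whitespace #main-input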
Compared to the old pipeline, the new one may look longer and more complicated. The appearance is misleading, however: the document is now parsed only once, inside xml2markup, whereas the old chain parsed the XML text in tidy-xml, serialized it back to plain text, and parsed it again in xml2html. The extra link merely makes the single parsing step explicit.
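This formulation also buys the flexibility promised earlier: the downstream filters do not care which parser produced the markup events. If the input were already an SGML document rather than plain text, only the parsing link of the chain would change; here is a minimal sketch, in which sgml2markup is a hypothetical counterpart of xml2markup:

define markup source function
   sgml2markup value string source text
as
   do sgml-parse document scan text
      output #content
   done

process
   output markup2html tidy-markup sgml2markup #main-input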
The easiest way to start a markup filter like tidy-markup is by applying do markup-parse to
the input markup stream. This action will cause the markup rules to be fired by markup events in the stream. In
order to generate the output markup stream, markup rules have two built-in variables at their disposal:
#current-markup-event and #content. To demonstrate their use, let us assume that
tidy-markup is required to make the following modifications to its input:
- the content of verbatim elements is copied to the output unprocessed, with the verbatim tags preserved;
- annotation elements are removed together with their content;
- the tags of span elements are removed, while their content is processed normally;
- all other elements are passed through, with their content processed by the markup rules.
The specified markup filter might be implemented in the following way:
define markup source function
   tidy-markup value markup source markup
as
   do markup-parse markup
      output "%c"
   done

element "verbatim"
   signal throw #markup-start #current-markup-event
   output #content
   signal throw #markup-end #current-markup-event

element "annotation"
   put #suppress #content

element "span"
   output "%c"

element #implied
   signal throw #markup-start #current-markup-event
   output "%c"
   signal throw #markup-end #current-markup-event
The span rule and the implied rule in this example invoke %c to delegate the processing of
the element content to other markup rules. This is not any different from how a text-producing rule handles
markup. The rules for verbatim and annotation, on the other hand, use #content
instead of %c. The difference between the two is that #content represents the unprocessed content
of the current element, just as it appears in the input stream, while %c represents the same content
processed by other markup rules. The line output #content produces the unmodified element content, while
output "%c" delegates the processing to other markup rules. Finally, the line put #suppress
#content in the rule handling annotation elements consumes the entire element content without
firing any markup rules and suppresses it.
The lines beginning with signal throw reproduce the markup events that stand for the element tags in
the original XML. Both the start tag and the end tag are represented by the same element event,
#current-markup-event in this example. The beginning of the element region is signalled with the
catch #markup-start, and its end with #markup-end.