xerces.xml

function

Library: Xerces XML parser (OMXERCES)
Import : omxerces.xmd

Declaration

export external markup source function 
   xml    schemas value integer       schema-validation-mode    optional initial { auto-schema-validation }
       namespaces value integer       namespace-processing-mode optional initial { namespace-validation }
             scan value string source data

Argument definitions

schema-validation-mode: XML schema validation mode

namespace-processing-mode: namespace handling mode

data: any OmniMark source

Purpose

The xml markup parser function uses the Xerces XML parser to generate a stream of markup events (that is, a markup source) that can be processed using do markup-parse.

The scan argument supplies the input to the parser. It can be any OmniMark source.

The xml markup parser function invokes element and other OmniMark rules in the same manner as OmniMark's built-in parsers. The information available in those rules differs from that provided by OmniMark's built-in parsers in some respects, as described below.

Further Options

The xml markup parser function takes two optional arguments, schemas and namespaces, which control the W3C schema processing and XML namespace processing that it performs.

omxerces.xmd defines named values for use with the schemas argument:

no-schemas disables all schema processing.
no-schema-validation enables schema processing but disables schema validation.
auto-schema-validation enables schema processing, but disables schema validation if any internal or external DTD subset is found in the parsed document. This is the default value for the schemas argument.
schema-validation enables schema processing and schema validation.
full-schema-validation enables schema processing and schema validation. Additionally, it enables full schema constraint checking. If auto-schema-validation or schema-validation is specified, only partial constraint checking is done.

omxerces.xmd also defines named values for use with the namespaces argument:

no-namespace-validation disables all namespace processing.
namespace-validation enables namespace processing. This is the default value for the namespaces argument.

The namespaces argument does not affect OmniMark namespace processing, which is done independently of the namespace processing done by the xml markup parser function. The easiest way of distinguishing the two is to observe that the xml markup parser function is responsible for namespace validation, and OmniMark is responsible for making use of the namespace information.

Example

  import "omxerces.xmd" prefixed by xerces.
  
  
  process
     do markup-parse xerces.xml schemas xerces.no-schemas namespaces xerces.no-namespace-validation scan file #args[1]
        output "%c"
     done
  
  
  element #implied
     output "%c"
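
As a further sketch, the same program can request full validation instead of turning processing off. It assumes the input document references a schema that the parser can locate. The namespaces argument is omitted, so its declared default, namespace-validation, applies.

```omnimark
import "omxerces.xmd" prefixed by xerces.

process
   ; schemas is set explicitly; namespaces falls back to its default
   do markup-parse xerces.xml schemas xerces.full-schema-validation scan file #args[1]
      output "%c"
   done

element #implied
   output "%c"
```

Omitting the schemas argument as well would select auto-schema-validation, the other declared default.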

What You Get from the Xerces-Based XML Parser

The following sections describe the information available to an OmniMark program from the xml markup parser function. They cover both what the function adds for OmniMark programs and its limitations compared with the built-in OmniMark markup parsers.

W3C Schemas

The xml markup parser function processes and validates W3C Schemas; what is returned to the OmniMark program is based on how the document is interpreted by any schema used. The most noticeable effect of using a schema is in:

the warnings and errors reported,
the interpretation of entities,
default attribute values, and
the recognition of ignorable whitespace (see below).

Schemas are read into an OmniMark program as external text entities. You can use the external-text-entity #schema rule to control how schemas are found and pre-processed by an OmniMark program.

No information from a schema or from a DTD is available to the OmniMark program, even though it is used by the markup parser in interpreting the document.

Ignorable Whitespace

Any whitespace not deemed to be significant (as defined by the XML specification) is passed to the OmniMark program as the contents of an ignored marked section. The marked-section ignore rule can be used to capture ignorable whitespace. If there is no marked-section ignore rule, this whitespace is ignored.
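
A minimal sketch of keeping ignorable whitespace in the output follows. It assumes that the content of the ignored marked section is available through "%c" inside the rule; check that assumption against your OmniMark version.

```omnimark
import "omxerces.xmd" prefixed by xerces.

process
   do markup-parse xerces.xml scan file #args[1]
      output "%c"
   done

element #implied
   output "%c"

; Assumption: "%c" yields the ignorable whitespace passed in by the parser
marked-section ignore
   output "%c"
```

Removing the marked-section ignore rule returns to the behavior described above, in which this whitespace is dropped.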

Processing Instructions

Processing instructions in the input trigger processing-instruction rules in the OmniMark program, in the normal manner, with one exception: the XML declaration, which is encoded as a processing instruction starting with <?xml, is used by the markup parser and is not returned to the OmniMark program.
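
A sketch of a rule that copies processing instructions through to the output is shown here. The pattern variable name pi-text is chosen for illustration only. Because the XML declaration is consumed by the parser, this rule is not expected to fire for it.

```omnimark
import "omxerces.xmd" prefixed by xerces.

process
   do markup-parse xerces.xml scan file #args[1]
      output "%c"
   done

element #implied
   output "%c"

; pi-text (an illustrative name) captures the text of the processing instruction
processing-instruction any* => pi-text
   output "<?" || pi-text || "?>"
```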

Errors and Warnings

Errors and warnings from the xml markup parser function trigger markup-error rules in the OmniMark program, in the same manner as for the built-in markup parsers.

The only difference is that there is just one numeric exception code for errors and one for warnings, so the numeric exception code cannot be used to distinguish between different kinds of errors.

The xml markup parser function does not normally stop when it encounters what may be considered a fatal error; it keeps going. This is normally appropriate, because the xml markup parser function can recover from most errors. However, there are cases in which it cannot recover. In these cases, the same error may be reported multiple times.

To prevent run-away reporting of errors, the xml markup parser function terminates if it encounters 5 fatal errors in a row, without other information intervening.
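
A minimal sketch of reporting these errors and warnings from a markup-error rule follows. It assumes that #message is available inside the rule, as it is with the built-in markup parsers, and it sends reports to #error so that they stay separate from the main output.

```omnimark
import "omxerces.xmd" prefixed by xerces.

process
   do markup-parse xerces.xml schemas xerces.schema-validation scan file #args[1]
      output "%c"
   done

element #implied
   output "%c"

; Assumption: #message carries the text of the parser's error or warning
markup-error
   put #error "xerces: " || #message || "%n"
```

Since there is only one numeric code for errors and one for warnings, the message text is the practical way to tell different problems apart in a rule like this.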

What Rules Are Fired

The following lists all the OmniMark markup parser rules that are of use with the xml markup parser function.

Other Library Functions

xerces.omxerces-version
xerces.xml