Errors in markup

A markup error is an error in the XML or SGML being parsed. It is not an OmniMark program error, but it is an event of interest to the programmer. When the parser encounters an error in markup, it fires a markup-error rule. After the rule has fired, the parser attempts to recover from the error and continue, inventing whatever markup it needs in order to do so. Consider the following XML document:

  <!doctype x [
  <!element x (y,z)>
  <!element y (#pcdata)>
  <!element z (#pcdata)>
  ]>
  <x><y>foo</y></x>

Note that the "z" element required by the DTD is missing. To show how OmniMark handles markup errors, process the above document with the following program:

  process
     do xml-parse document
     scan file "foo.xml"
       output "%c"
     done
  
  element "x"
     output "<x>%c</x>"
  
  element "y"
     output "<y>%c</y>"
  
  element "z"
     output "<z>%c</z>"
  
  markup-error
     output "[" || #message || "]"

This program wraps XML tags around the content of XML elements. If you run it on a valid document, it will return the instance (without the DTD) verbatim. If there is an error in the input document, it will output the markup error message from the parser (provided by #message). In this program, the message is wrapped in square brackets to make the message boundaries clear. Here is the result of running this program on the XML document above:

  <x><y>foo</y>[An end tag has been encountered for some element other than the currently opened one and some opened element other than the currently opened one must be closed without its content being fully satisfied or with its inomissible end tag omitted.][A start tag with a start tag minimization of minus ("-") must not be omitted.]<z>[An end tag that has been declared inomissible ("-") must not be omitted.]</z></x>

Because OmniMark uses the same parser for SGML and valid XML documents, the error messages refer to the SGML tag omission feature. Even so, you can clearly see what has happened. When the parser found "</x>" where it expected to find "<z>", it first complained that it had found the end of the "x" element where that element was not allowed to end, and then it complained that it had not found the start of the "z" element where it was supposed to be. It fired the markup-error rule twice, once for each of these events.

Then the parser invented the markup it needed to continue. In this case, it needed a start tag for the "z" element, so it created one. This caused the rule element "z" to fire, which wrote the "<z>" tag to the output.

The parser then reported the missing end tag for element "z" (firing the markup-error rule a third time) and recovered from it by inventing that end tag as well. It then completed the processing of the "x" element normally.
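
For comparison, here is a sketch of a valid version of the same document, in which the required "z" element is present. The content "bar" is an arbitrary placeholder chosen for this illustration:

  <!doctype x [
  <!element x (y,z)>
  <!element y (#pcdata)>
  <!element z (#pcdata)>
  ]>
  <x><y>foo</y><z>bar</z></x>

Since no markup-error rule should fire for this input, the program above would be expected to return the instance verbatim, without the DTD and without any bracketed messages:

  <x><y>foo</y><z>bar</z></x>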

If you don't want this recovery behavior, you can use the markup-error rule to throw to a catch and abort the parse as soon as an error occurs:

  declare catch abort-parse
   because value string message
  
  process
     do xml-parse document
     scan file "foo.xml"
       output "%c"
     done
     catch abort-parse because message
        put #error "Parse aborted because:%n"
                || message
  
  element "x"
     output "<x>%c</x>"
  
  element "y"
     output "<y>%c</y>"
  
  element "z"
     output "<z>%c</z>"
  
  markup-error
     throw abort-parse because #message

You may want to abort parsing after a specific number of errors have occurred. You can do this by testing the value of #markup-error-count:

  markup-error
     do when #markup-error-count > 10
        throw abort-parse because "Too many errors."
     else
        output #message
     done
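
The rule above relies on the abort-parse catch declared in the previous program. The following is a minimal sketch of how the pieces might fit together, assuming the same "foo.xml" input and the same element rules as before. Sending the recoverable messages to #error rather than to the main output is an illustrative choice made here, not something the earlier programs do:

  ; Sketch only: combines the error-count test with the catch
  ; declared in the earlier abort example.
  declare catch abort-parse
   because value string message
  
  process
     do xml-parse document
     scan file "foo.xml"
       output "%c"
     done
     ; Reached only when the markup-error rule throws.
     catch abort-parse because message
        put #error "Parse aborted because:%n"
                || message
  
  element "x"
     output "<x>%c</x>"
  
  element "y"
     output "<y>%c</y>"
  
  element "z"
     output "<z>%c</z>"
  
  markup-error
     do when #markup-error-count > 10
        throw abort-parse because "Too many errors."
     else
        ; Assumption for this sketch: keep messages out of the main output.
        put #error "[" || #message || "]%n"
     done

The sample document shown earlier produces only three markup errors, so with a limit of 10 the parse would run to completion. A much lower limit would be needed to see the abort with that input.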

For complete coverage of the information available in a markup-error rule, see markup-error.
