Rich Text Format (RTF) (OMRTF)

You can use the rtf markup parser function to parse an RTF document. Like an XML document, an RTF document consists of markup and data content. The rtf markup parser function maps the markup structures of RTF to OmniMark's markup rules according to the rules detailed below.

The following program illustrates the operation of the rtf markup parser function by implementing a crude RTF to XML converter:

  import "omrtf.xmd" unprefixed
  
  
  process
     do markup-parse rtf scan file #args[1]
        output "%c"
     done
  
  
  element #implied
     output "<%q"
     repeat over attributes as a
        output " " || key of attribute a || '="%v(a)"'
     again
     output ">%c</%q>%n"
  
  
  translate "<"
     output "&lt;"
  
  
  translate ">"
     output "&gt;"
  
  
  translate "&"
     output "&amp;"
          

How RTF structures are mapped to markup rules

The only markup rules fired by the rtf markup parser function are:

RTF commands are translated into markup events, as follows:

  • When an RTF command designated by the RTF 1.7 Specification as a destination command (see list below) appears at the start of an RTF group (that is, immediately following {), it starts an element: the RTF command is the element name, and the content is the content of that RTF group. If the RTF command has a numeric value specified, this value is provided as the value of the element's value attribute. If the RTF group in question is marked ignorable (that is, using the \* designator), an attribute named ignorable is provided, with a value of yes. Both the value and ignorable attributes are of type cdata. The element type is any.
  • When an RTF command is not specified as a destination, but appears at the start of a group with an ignorable designator, then it is treated as if it were a destination, in the same manner as discussed above.
  • An RTF command not specified or treated as a destination is an empty element (as if its tag were ended with />), with its value, if any, provided by the value attribute. The attribute types are as above, and the element type is empty.
  • The \u (Unicode character) command is treated specially. It is provided as an empty element, with the Unicode character number as the value of its value attribute. In addition, it has an alt attribute, which contains the alternative text provided immediately following the \u command. The alt attribute is of type cdata.
  • An RTF group not recognized as grouping the content of an RTF destination is provided as an element with the name group_, with the content of the group as its content. The only possible attribute for this element is the ignorable one described above, present if the { is followed by \*. The _ in the name of this element and the following ones ensures that the element name cannot conflict with the name of any current or future RTF command.
  • An RTF command whose name is a special character rather than a name, such as \_ or \~, is provided as an element with the name special_, and with the name as the value of the value attribute.
  • An RTF command that consists of \ followed by a new line, which is an alternative to the \par command, is provided as a par_ element.
  • Line ends, which are intended to be ignored by RTF readers, are not returned by the RTF parser.
  • All other data is returned as data content, including data encoded in the RTF document as binary (with the \bin command). Hexadecimal data that uses the \'xx RTF command is also returned as data content. However, some RTF commands have content that is implicitly hexadecimal. For those commands, the hexadecimal data is made available as is; the RTF parser has no special knowledge of them.
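
As an illustration of how these elements might be handled, here is a minimal sketch that extracts approximate body text from an RTF file instead of re-emitting every element as XML. It reuses the import and markup-parse invocation from the converter above. The choice of which destinations to suppress is an assumption made for this example, not something OMRTF prescribes, and real documents will usually contain other destinations that also need suppressing.

  import "omrtf.xmd" unprefixed

  process
     do markup-parse rtf scan file #args[1]
        output "%c"
     done

  ; Assumption: these four destinations hold tables and metadata
  ; rather than body text, so their content is discarded.
  element ("fonttbl" | "colortbl" | "stylesheet" | "info")
     suppress

  ; \par and \ followed by a new line both end a paragraph.
  element ("par" | "par_")
     output "%c%n"

  ; Every other element contributes only its data content.
  element #implied
     output "%c"

Because line ends in the RTF source are not returned by the parser, the only line breaks in the output are those written by the par and par_ rules.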

RTF commands considered destinations

The OMRTF library is based on version 1.7 of the RTF specification, which designates the following RTF command names as destinations:

     aftncn
     aftnsep
     aftnsepc
     annotation
     atnauthor
     atndate
     atnicn
     atnid
     atnparent
     atnref
     atntime
     atrfend
     atrfstart
     author
     background
     bkmkend
     bkmkstart
     buptim
     category
     colortbl
     comment
     company
     creatim
     datafield
     do
     doccomm
     docvar
     dptxbxtext
     falt
     fchars
     ffdeftext
     ffentrymcr
     ffexitmcr
     ffformat
     ffhelptext
     ffl
     ffname
     ffstattext
     field
     file
     filetbl
     fldinst
     fldrslt
     fldtype
     fname
     fontemb
     fontfile
     fonttbl
     footer
     footerf
     footerl
     footerr
     footnote
     formfield
     ftncn
     ftnsep
     ftnsepc
     g
     generator
     gridtbl
     header
     headerf
     headerl
     headerr
     htmltag
     info
     keycode
     keywords
     lchars
     levelnumbers
     lfolevel
     list
     listlevel
     listname
     listoverride
     listoverridetable
     listpicture
     listtable
     listtext
     manager
     mhtmltag
     nesttableprops
     nextfile
     nonesttables
     objalias
     objclass
     objdata
     object
     objname
     objsect
     objtime
     oldcprops
     oldpprops
     oldsprops
     oldtprops
     operator
     panose
     pgp
     pgptbl
     picprop
     pict
     pn
     pnseclvl
     pntext
     pntxta
     pntxtb
     printim
     private
     pwd
     pxe
     result
     revtbl
     revtim
     rsidtbl
     rtf
     rxe
     shp
     shpinst
     shppict
     stylesheet
     subject
     tc
     template
     title
     txe
     ud
     upr
     urtf
     userprops
     xe
            

Functions