Rich Text Format (RTF) (OMRTF)

You can use the rtf markup parser function to parse an RTF document. Like an XML document, an RTF document consists of markup and data content. The rtf markup parser function maps RTF's markup structures onto OmniMark markup events, which fire the corresponding markup rules as detailed below.

Example

The following program shows how the rtf markup parser function works by implementing a crude RTF-to-XML converter:

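  import "omrtf.xmd" unprefixed
  
  process
     ; Parse the RTF file named on the command line, firing markup rules
     do markup-parse rtf scan file #args[1]
        output "%c"
     done
  
  ; Every element produced from the RTF (destinations, commands,
  ; group_, special_, par_) is handled by this one rule
  element #implied
     output "<%q"
     ; Copy the value, ignorable, and alt attributes
     ; (attribute values are output as-is, without XML escaping)
     repeat over attributes as a
        output " " || key of attribute a || '="%v(a)"'
     again
     output ">%c</%q>%n"
  
  ; Escape XML-significant characters in data content
  translate "<"
     output "&lt;"
  
  translate ">"
     output "&gt;"
  
  translate "&"
     output "&amp;"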

RTF Structure Mappings

The only markup rules fired by the rtf markup parser function are:

RTF commands are translated into markup events according to the following rules.

  • When an RTF command designated by the RTF 1.7 Specification as a destination (see the list below) appears at the start of an RTF group (that is, immediately following {), it starts an element: the RTF command is the element name, and the rest of that RTF group is the element's content. If the RTF command has a numeric value, that value is provided as the element's value attribute. If the RTF group is marked ignorable (that is, with the \* designator), an attribute named ignorable is provided, with the value yes. Both the value and ignorable attributes are of type cdata. The element type is any.
  • An RTF command that is not specified as a destination, but appears at the start of a group marked with the ignorable designator (\*), is treated as a destination in the manner described above.
  • An RTF command that is neither specified nor treated as a destination becomes an empty element (as if its tag ended with />), with its value, if any, provided as the value attribute. The attribute types are as above, and the element type is empty.
  • The \u (Unicode character) command is handled specially. It becomes an empty element whose value attribute is the Unicode character number. It also has an alt attribute, which contains the alternative text that immediately follows the \u command. The alt attribute is of type cdata.
  • An RTF group that is not recognized as grouping the content of an RTF destination is provided as an element named group_, with the content of the group as its content. The only possible attribute for this element is the ignorable attribute described above, present if the { is followed by \*. The trailing _ in this element name, and in the names that follow, ensures that the name cannot conflict with any RTF command.
  • An RTF command whose name is a special character rather than a word, such as \_ or \~, is provided as an element named special_, with that character as the value of its value attribute.
  • An RTF command that consists of \ followed by a line end, which is an alternative to the \par command, is provided as a par_ element.
  • Line ends, which are intended to be ignored by RTF readers, are not returned by the RTF parser.
  • All other data is returned as data content, including data encoded in the RTF document as binary (with the \bin command). Hexadecimal data encoded with the \'xx RTF command is also returned as data content. However, some RTF commands have implicitly hexadecimal content (for example, the picture data in a \pict group). Such hexadecimal data is made available as is, because the RTF parser has no special knowledge of these commands.
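To make the mapping rules concrete, the Python sketch below models them on a small RTF string and renders the resulting elements and data as XML text. It is a toy model, not the OMRTF implementation and not an OmniMark API. The names `rtf_to_xml`, `DESTINATIONS`, and the helper functions are hypothetical. The sketch assumes well-formed input and uses only a subset of the destination names listed in the next section. It also makes several simplifying assumptions: it decodes \'xx as Latin-1 (real RTF uses the document's code page), takes a single fallback character after \u (the \uc1 default), treats \\, \{, and \} as literal data, and does not handle \bin.

```python
import re
from html import escape

# Hypothetical, illustrative subset of the RTF 1.7 destination names.
DESTINATIONS = {"rtf", "fonttbl", "colortbl", "stylesheet", "info", "title",
                "author", "generator", "field", "fldinst", "fldrslt", "pict"}

# A control word: backslash, letters, optional signed number, optional one-space delimiter.
CONTROL_WORD = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?")


def _attrs(attrs):
    """Render a dict of cdata attributes, escaping markup and quote characters."""
    return "".join(f' {k}="{escape(v, quote=True)}"' for k, v in attrs.items())


def _hex_char(rtf, i):
    """Decode the \\'xx escape at rtf[i]; assumes Latin-1, not the document code page."""
    return chr(int(rtf[i + 2:i + 4], 16))


def _control(rtf, i, out):
    """Translate the backslash sequence at rtf[i] into output; return the next position."""
    m = CONTROL_WORD.match(rtf, i)
    if m is None:  # control symbol: backslash plus one non-letter character
        sym = rtf[i + 1:i + 2]
        if sym in ("\r", "\n"):
            out.append("<par_/>")  # backslash + line end is an alternative to \par
        elif sym == "'":
            out.append(escape(_hex_char(rtf, i), quote=False))  # hex data becomes content
            return i + 4
        elif sym in ("\\", "{", "}"):
            out.append(escape(sym, quote=False))  # assumption: escaped literals are data
        else:
            out.append(f"<special_{_attrs({'value': sym})}/>")  # e.g. \~ or \_
        return i + 2
    name, param = m.group(1), m.group(2)
    attrs = {} if param is None else {"value": param}
    i = m.end()
    if name == "u" and param is not None:
        # Assumption: exactly one fallback character follows (the \uc1 default).
        if rtf.startswith("\\'", i):
            attrs["alt"], i = _hex_char(rtf, i), i + 4
        else:
            attrs["alt"], i = rtf[i:i + 1], i + 1
    out.append(f"<{name}{_attrs(attrs)}/>")  # non-destination command: empty element
    return i


def rtf_to_xml(rtf):
    """Apply the mapping rules to well-formed RTF text and return XML-like markup."""
    out, stack, i = [], [], 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "{":
            i += 1
            ignorable = rtf.startswith("\\*", i)
            if ignorable:
                i += 2
            m = CONTROL_WORD.match(rtf, i)
            attrs = {}
            if m and (m.group(1) in DESTINATIONS or ignorable):
                name = m.group(1)  # destination (or ignorable command) names the element
                if m.group(2) is not None:
                    attrs["value"] = m.group(2)
                i = m.end()
            else:
                name = "group_"  # any other group
            if ignorable:
                attrs["ignorable"] = "yes"
            out.append(f"<{name}{_attrs(attrs)}>")
            stack.append(name)
        elif ch == "}":
            out.append(f"</{stack.pop()}>")
            i += 1
        elif ch in "\r\n":
            i += 1  # raw line ends are dropped
        elif ch == "\\":
            i = _control(rtf, i, out)
        else:
            out.append(escape(ch, quote=False))
            i += 1
    return "".join(out)
```

For example, `rtf_to_xml(r"{\rtf1{\*\foo3 x}}")` shows the second rule: `foo` is not in the destination subset, but because it follows `\*` at the start of a group it is treated as a destination and yields `<foo value="3" ignorable="yes">x</foo>` inside the `rtf` element.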

RTF commands considered destinations

The OMRTF library is based on version 1.7 of the RTF Specification, which defines the following RTF command names as destinations:

  aftncn
  aftnsep
  aftnsepc
  annotation
  atnauthor
  atndate
  atnicn
  atnid
  atnparent
  atnref
  atntime
  atrfend
  atrfstart
  author
  background
  bkmkend
  bkmkstart
  buptim
  category
  colortbl
  comment
  company
  creatim
  datafield
  do
  doccomm
  docvar
  dptxbxtext
  falt
  fchars
  ffdeftext
  ffentrymcr
  ffexitmcr
  ffformat
  ffhelptext
  ffl
  ffname
  ffstattext
  field
  file
  filetbl
  fldinst
  fldrslt
  fldtype
  fname
  fontemb
  fontfile
  fonttbl
  footer
  footerf
  footerl
  footerr
  footnote
  formfield
  ftncn
  ftnsep
  ftnsepc
  g
  generator
  gridtbl
  header
  headerf
  headerl
  headerr
  htmltag
  info
  keycode
  keywords
  lchars
  levelnumbers
  lfolevel
  list
  listlevel
  listname
  listoverride
  listoverridetable
  listpicture
  listtable
  listtext
  manager
  mhtmltag
  nesttableprops
  nextfile
  nonesttables
  objalias
  objclass
  objdata
  object
  objname
  objsect
  objtime
  oldcprops
  oldpprops
  oldsprops
  oldtprops
  operator
  panose
  pgp
  pgptbl
  picprop
  pict
  pn
  pnseclvl
  pntext
  pntxta
  pntxtb
  printim
  private
  pwd
  pxe
  result
  revtbl
  revtim
  rsidtbl
  rtf
  rxe
  shp
  shpinst
  shppict
  stylesheet
  subject
  tc
  template
  title
  txe
  ud
  upr
  urtf
  userprops
  xe

Usage Note

To use OMRTF, you must import it into your program using an import declaration such as:

  import "omrtf.xmd" prefixed by rtf.

Functions