Functions

UTF-8 (OMUTF8)

The UTF-8 encoding library is used to process data that is encoded in UTF-8.

The function utf8.char matches any single UTF-8 character in the data. The function utf8.single-byte-char matches only single-byte (ASCII) characters, whereas the function utf8.multi-byte-char matches only characters that are encoded as a sequence of two or more bytes.
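
As a rough illustration of the difference between the two narrower patterns, the sketch below counts how many characters of each kind appear in a short string. The variable names single-count and multi-count are invented for this illustration, and the sketch assumes that utf8.multi-byte-char consumes the whole byte sequence of one character per match.

```omnimark
import "omutf8.xmd" prefixed by utf8.

process
   ; Hypothetical counters, named for this illustration only.
   local integer single-count initial { 0 }
   local integer multi-count initial { 0 }

   ; "%195#%169#" is the two-byte UTF-8 encoding of U+00E9.
   repeat scan "flamb%195#%169#"
   match utf8.single-byte-char
      increment single-count

   match utf8.multi-byte-char
      increment multi-count
   again

   output "single-byte: " || "d" % single-count || "%n"
   output "multi-byte: " || "d" % multi-count || "%n"
```

If the patterns behave as described above, this should report five single-byte characters (f, l, a, m, b) and one multi-byte character.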

The function utf8.code-point converts a UTF-8 character (that is, the sequence of bytes that represents a character in UTF-8) to its numeric character value, or code point. The function utf8.encoding performs the reverse conversion, from a numeric character value to the sequence of bytes that represents that value in UTF-8.
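
The two conversions are inverses of each other, which a small round trip can make concrete. This is a sketch only: it assumes that utf8.encoding takes its integer argument after "of", mirroring the way utf8.code-point is called, and the names encoded and n are chosen for this illustration.

```omnimark
import "omutf8.xmd" prefixed by utf8.

process
   ; Hypothetical names, chosen for this illustration only.
   local string encoded
   local integer n

   ; 8364 is the decimal value of U+20AC (EURO SIGN).
   ; Assumption: utf8.encoding is invoked with "of", like utf8.code-point.
   set encoded to utf8.encoding of 8364

   ; Convert the UTF-8 byte sequence back to its character value.
   set n to utf8.code-point of encoded

   output "U+" || "16rud" % n || "%n"
```

Under those assumptions, n should end up holding the value the round trip started from, printed here in hexadecimal.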

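Example: the following program copies single-byte characters to the output unchanged. Each multi-byte character is converted to its character value, which is then written as a hexadecimal numeric character reference if it is greater than 255, or as a single byte with that value otherwise.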

  import "omutf8.xmd" prefixed by utf8.
  
  process
     repeat scan "flamb%195#%169#"
     match utf8.single-byte-char+ => c
        output c
  
     match utf8.multi-byte-char => c
        local integer n initial { utf8.code-point of c }
  
        do when n > 255
           output "&#x" || "16rud" % n || ";"
  
        else
           output "b" % n
        done
     again

Functions

utf8.byte-order-mark

utf8.char

utf8.code-point

utf8.encoding

utf8.invalid-code-point

utf8.multi-byte-char

utf8.omutf8-version

utf8.overlong-sequence

utf8.single-byte-char

OmniMark 9.1.0 Documentation Generated: September 2, 2010 at 1:54:44 pm