Guide to OmniMark 9

UTF-8 (OMUTF8)

 
 

The UTF-8 encoding library is used to process data files that are encoded in UTF-8.

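The functions utf8.char, utf8.single-byte-char and utf8.multi-byte-char are used in patterns to match UTF-8 characters in the data. utf8.char matches any UTF-8 character. utf8.single-byte-char matches only single-byte characters, which are the ASCII characters (values 0 to 127). utf8.multi-byte-char matches only characters whose UTF-8 encoding is a sequence of two or more bytes.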
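The following is a minimal sketch that uses utf8.single-byte-char and utf8.multi-byte-char to count the single-byte and multi-byte characters in the sample string "flamb%195#%169#" ("flambé" encoded in UTF-8). The variable names are illustrative. A byte sequence that is not valid UTF-8 matches neither pattern and so ends the repeat scan early. The sketch does not handle that case.

  import "omutf8.xmd" prefixed by utf8.
  
  process
     local integer single-count initial { 0 }
     local integer multi-count initial { 0 }
  
     ; Classify each character by whether its UTF-8 encoding
     ; is one byte (ASCII) or several bytes
     repeat scan "flamb%195#%169#"
     match utf8.single-byte-char
        increment single-count
     match utf8.multi-byte-char
        increment multi-count
     again
  
     output "single-byte: " || "d" % single-count || "%n"
     output "multi-byte: " || "d" % multi-count || "%n"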

The function utf8.code-point converts a UTF-8 character (that is, the sequence of bytes that represents a character in UTF-8) to its integer character value. Conversely, the function utf8.encoding converts an integer character value to UTF-8 (that is, to the sequence of bytes that represents that character value in UTF-8).
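The following is a minimal sketch that applies utf8.code-point, called in the same way as in the example, to each character matched by utf8.char. It prints each character value in hexadecimal after a "U+" prefix, without the zero padding that U+ notation usually has.

  import "omutf8.xmd" prefixed by utf8.
  
  process
     ; Print the character value of each UTF-8 character in "flambé"
     repeat scan "flamb%195#%169#"
     match utf8.char => c
        local integer n initial { utf8.code-point of c }
        ; "16rud" formats n in base 16
        output "U+" || "16rud" % n || "%n"
     again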

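Example: the following program converts UTF-8 data to ISO-8859-1 (Latin-1). The input string "flamb%195#%169#" is "flambé" encoded in UTF-8, where %195# and %169# are the decimal values of the two bytes that encode é. Runs of ASCII characters are copied unchanged. Characters with values from 128 to 255 are output as single bytes. Characters with values above 255 are output as hexadecimal character references (for example, &#x20AC; for the euro sign).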


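  import "omutf8.xmd" prefixed by utf8.
  
  process
     ; Scan "flambé" encoded in UTF-8 (é is the two bytes 195 and 169)
     repeat scan "flamb%195#%169#"
     ; Copy runs of ASCII characters through unchanged
     match utf8.single-byte-char+ => c
        output c
  
     ; Decode each multi-byte character to its character value
     match utf8.multi-byte-char => c
        local integer n initial { utf8.code-point of c }
  
        ; Values above 255 have no single-byte ISO-8859-1 form,
        ; so output a hexadecimal character reference instead
        do when n > 255
           output "&#x" || "16rud" % n || ";"
  
        ; Values from 128 to 255: output the single byte with that value
        else
           output "b" % n
        done
     again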



OmniMark 9.1.0 Documentation Generated: September 2, 2010 at 1:54:44 pm

Copyright © Stilo International plc, 1988-2010.