utf8.encoding

function

Library: UTF-8 (OMUTF8)
Import : omutf8.xmd

Returns: a string representing the UTF-8 encoding of the specified Unicode character code

Declaration

export string function 
   encoding                  of value integer             code-point
            invalid-code-points value error-handling-type invalid-code-points optional

Purpose

Use the function utf8.encoding to convert an integer containing a Unicode character code point to a string of bytes containing its UTF-8 encoding.

If the argument invalid-code-points is unspecified, or is specified as utf8.not-allowed (the default value), an invalid value causes utf8.encoding to return the UTF-8 encoding of U+FFFD, the Unicode replacement character (the three bytes 0xEF 0xBF 0xBD). If the argument is specified as utf8.not-allowed-with-throw, an invalid value causes utf8.encoding to throw to utf8.invalid-code-point. If the argument is specified as utf8.allowed, the result may not be valid UTF-8.
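The three settings can be modeled outside OmniMark. The following is a minimal Python sketch, not the library's implementation: the names encode, raw_encode, is_valid and InvalidCodePoint are hypothetical stand-ins, and the sketch assumes that "invalid" means a surrogate (U+D800 to U+DFFF) or a value outside U+0000 to U+10FFFF.

```python
# Illustrative model only. All names here are hypothetical Python stand-ins,
# not part of the OMUTF8 library.

REPLACEMENT = 0xFFFD  # U+FFFD, the Unicode replacement character


class InvalidCodePoint(Exception):
    """Stands in for a throw to utf8.invalid-code-point."""


def is_valid(cp):
    # Assumption: invalid means out of range or a surrogate.
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def raw_encode(cp):
    # Plain UTF-8 bit packing with no validity check.
    if cp < 0 or cp > 0x1FFFFF:
        # The source text does not say what happens here; the model stops.
        raise ValueError("outside the range this model covers")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def encode(cp, invalid_code_points="not-allowed"):
    if is_valid(cp):
        return raw_encode(cp)
    if invalid_code_points == "not-allowed":
        return raw_encode(REPLACEMENT)      # substitute U+FFFD
    if invalid_code_points == "not-allowed-with-throw":
        raise InvalidCodePoint(hex(cp))     # signal the error
    return raw_encode(cp)                   # "allowed": may be invalid UTF-8
```

In this model, encode(0xD800) yields the bytes 0xEF 0xBF 0xBD, while encode(0xD800, "allowed") yields 0xED 0xA0 0x80, a byte sequence that strict UTF-8 decoders reject.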

Example

The following program converts input in which each character is encoded in 2 bytes to UTF-8:

  import "omutf8.xmd" prefixed by utf8.
  
  process
     submit #main-input
  
  find any{2} => char
     output utf8.encoding of binary char
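
For comparison, the same conversion can be sketched in Python. This is an illustrative equivalent, not part of OMUTF8: the function name ucs2_to_utf8 is hypothetical, and the sketch assumes big-endian byte order, treats each 2-byte unit as a complete code point, and substitutes U+FFFD for surrogate values in the manner of the default utf8.not-allowed setting.

```python
def ucs2_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: re-encode 2-byte-per-character input as UTF-8."""
    out = bytearray()
    # Step through complete 2-byte units, much like `find any{2}`.
    # A trailing odd byte is ignored in this sketch.
    for i in range(0, len(data) - 1, 2):
        cp = int.from_bytes(data[i:i + 2], "big")  # assumption: big-endian
        if 0xD800 <= cp <= 0xDFFF:
            cp = 0xFFFD  # surrogates are not valid code points on their own
        out += chr(cp).encode("utf-8")
    return bytes(out)
```

For example, the four input bytes 0x00 0x41 0x20 0xAC (the characters "A" and the euro sign) become the four UTF-8 bytes 0x41 0xE2 0x82 0xAC.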

Usage Note

To use utf8.encoding, you must import OMUTF8 into your program using an import declaration such as:

  import "omutf8.xmd" prefixed by utf8.

Related Topics

Character set encoding

Other Library Functions