utf8.encoding

function

Library: UTF-8 (OMUTF8)
Import : omutf8.xmd

Returns: A string representing the UTF-8 encoding of the specified Unicode character code.


Declaration
export string function 
        encoding of                  value integer             code-point
                 invalid-code-points value error-handling-type invalid-code-points optional
    


Purpose

The function utf8.encoding converts an integer containing a Unicode character code to a string of bytes containing the UTF-8 encoding of that character code.

The optional argument invalid-code-points controls how invalid values are handled. If it is unspecified, or specified as not-allowed (the default value), an invalid value causes utf8.encoding to return the UTF-8 encoding of U+FFFD, the replacement character (the bytes 0xEF 0xBF 0xBD). If it is specified as not-allowed-with-throw, an invalid value causes utf8.encoding to throw to utf8.invalid-code-point. If it is specified as allowed, the value is encoded anyway and the result may not be valid UTF-8.
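The three behaviors described above can be modeled outside OmniMark. The following Python sketch is an illustration only, not the library's implementation. The names InvalidCodePoints, InvalidCodePoint, and encoding are hypothetical stand-ins. The sketch assumes that "invalid" means a surrogate (U+D800 to U+DFFF) or a value outside 0 to 0x10FFFF, and that the allowed mode simply applies the standard UTF-8 bit layout. The source does not state either detail.

```python
from enum import Enum


class InvalidCodePoints(Enum):
    """Hypothetical stand-in for the error-handling-type argument."""
    NOT_ALLOWED = "not-allowed"
    NOT_ALLOWED_WITH_THROW = "not-allowed-with-throw"
    ALLOWED = "allowed"


class InvalidCodePoint(Exception):
    """Hypothetical stand-in for a throw to utf8.invalid-code-point."""


def _is_valid(cp: int) -> bool:
    # Assumption: valid means a Unicode scalar value.
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def _pack(cp: int) -> bytes:
    # Standard UTF-8 bit layout for one to four bytes, with no validity checks.
    if not 0 <= cp <= 0x1FFFFF:
        raise ValueError("value is outside the range this sketch models")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def encoding(cp: int,
             mode: InvalidCodePoints = InvalidCodePoints.NOT_ALLOWED) -> bytes:
    if _is_valid(cp) or mode is InvalidCodePoints.ALLOWED:
        return _pack(cp)
    if mode is InvalidCodePoints.NOT_ALLOWED_WITH_THROW:
        raise InvalidCodePoint(hex(cp))
    # Default mode: substitute U+FFFD, which encodes as EF BF BD.
    return _pack(0xFFFD)
```

Under these assumptions, encoding(0x20AC) yields the bytes E2 82 AC, encoding(0xD800) yields EF BF BD, and encoding(0xD800, InvalidCodePoints.ALLOWED) yields ED A0 80, which a strict UTF-8 decoder rejects.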

The following program converts input in a fixed-width encoding (2 bytes per character) to UTF-8. Each two-byte unit is matched, converted to an integer, and passed to utf8.encoding:

  import "omutf8.xmd" prefixed by utf8.
  
  process
     submit #main-input
  
  find any{2} => char
     output utf8.encoding of binary char
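
To show what the program above is expected to produce, here is a rough Python model of the same loop. It is a sketch under stated assumptions, not a translation verified against OmniMark. It assumes that each two-byte unit is read most significant byte first, and that invalid values (here, surrogates) are replaced by U+FFFD as in the default mode. The function name two_byte_to_utf8 is hypothetical.

```python
def two_byte_to_utf8(data: bytes) -> bytes:
    """Re-encode fixed-width two-byte character codes as UTF-8."""
    out = bytearray()
    # Step through complete two-byte units; a trailing odd byte is ignored here.
    for i in range(0, len(data) - 1, 2):
        # Assumption: big-endian byte order within each unit.
        cp = int.from_bytes(data[i:i + 2], "big")
        if 0xD800 <= cp <= 0xDFFF:
            # Surrogates are not valid character codes; use U+FFFD instead.
            cp = 0xFFFD
        out += chr(cp).encode("utf-8")
    return bytes(out)
```

For example, the six input bytes 00 48 00 69 20 AC (the character codes for "H", "i", and U+20AC) become the five bytes 48 69 E2 82 AC. Because every two-byte unit is encoded on its own, a surrogate pair in the input is not combined into a single character.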
          

Related Topics