utf8.code-point

function

Library: UTF-8 (OMUTF8)
Import : omutf8.xmd

Returns: the Unicode code point of the specified UTF-8 character, as an integer

Declaration

export integer function 
   code-point                of value string              utf8-string
              overlong-handling value error-handling-type overlong-handling optional

Purpose

Use the function utf8.code-point to convert a string containing a single UTF-8 character to its Unicode code point.

If the string does not consist of exactly one UTF-8 encoded character, the code point of the Unicode Replacement Character (U+FFFD) is returned instead.

By default, an overlong UTF-8 encoding also returns U+FFFD. If the argument overlong-handling is specified as utf8.not-allowed-with-throw, an overlong sequence triggers a throw to utf8.overlong-sequence. If overlong-handling is utf8.allowed, utf8.code-point converts an overlong sequence to a value, which might not be valid Unicode.
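To make the decoding rules above concrete, here is a language-neutral sketch in Python. It is not the OMUTF8 implementation. The mode constants and the OverlongSequence exception are hypothetical stand-ins for the default behaviour, utf8.not-allowed-with-throw, utf8.allowed, and the throw to utf8.overlong-sequence. The sketch also assumes, without confirmation from OMUTF8, that surrogates and values above U+10FFFF are not rejected.

```python
REPLACEMENT = 0xFFFD  # U+FFFD, the Unicode Replacement Character

class OverlongSequence(Exception):
    """Hypothetical stand-in for the throw to utf8.overlong-sequence."""

# Hypothetical names for the three overlong-handling behaviours.
NOT_ALLOWED = "not-allowed"                       # default: overlong -> U+FFFD
NOT_ALLOWED_WITH_THROW = "not-allowed-with-throw"  # overlong -> exception
ALLOWED = "allowed"                               # overlong -> decoded value

def code_point(data: bytes, overlong_handling: str = NOT_ALLOWED) -> int:
    """Decode exactly one UTF-8 character to its code point (sketch)."""
    if not data:
        return REPLACEMENT
    lead = data[0]
    # The lead byte gives the sequence length, its payload bits, and the
    # smallest value that legitimately needs that many bytes.
    if lead < 0x80:
        length, value, minimum = 1, lead, 0
    elif 0xC0 <= lead < 0xE0:
        length, value, minimum = 2, lead & 0x1F, 0x80
    elif 0xE0 <= lead < 0xF0:
        length, value, minimum = 3, lead & 0x0F, 0x800
    elif 0xF0 <= lead < 0xF8:
        length, value, minimum = 4, lead & 0x07, 0x10000
    else:
        return REPLACEMENT  # stray continuation byte or invalid lead byte
    if len(data) != length:
        return REPLACEMENT  # truncated, or more than one character
    for byte in data[1:]:
        if byte & 0xC0 != 0x80:
            return REPLACEMENT  # not a continuation byte
        value = (value << 6) | (byte & 0x3F)
    if value < minimum:  # overlong: encoded in more bytes than needed
        if overlong_handling == NOT_ALLOWED_WITH_THROW:
            raise OverlongSequence(data)
        if overlong_handling != ALLOWED:
            return REPLACEMENT
    return value
```

For example, the two-byte sequence C0 80 is an overlong encoding of U+0000. Under the default mode this sketch returns U+FFFD, and under ALLOWED it returns 0.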

Example

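The following program converts a UTF-8 encoded file to a fixed-width encoding that uses two bytes for every character. Code points above U+FFFF cannot be represented in two bytes, so this approach suits only characters in the Basic Multilingual Plane: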

  import "omutf8.xmd" prefixed by utf8.
  
  process
     submit #main-input
  
  find utf8.char => char
     output "2f0b" % utf8.code-point of char
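For comparison, the same two-bytes-per-character conversion can be sketched in Python. This is an illustration only: the sketch assumes big-endian byte order, which may not match what the OmniMark format string "2f0b" produces. Invalid UTF-8 is mapped to U+FFFD via errors="replace", loosely mirroring the default behaviour of utf8.code-point. The function name to_two_byte is hypothetical.

```python
import struct

def to_two_byte(text_utf8: bytes) -> bytes:
    """Re-encode UTF-8 input as two bytes per character (hypothetical helper)."""
    out = bytearray()
    # errors="replace" turns malformed sequences into U+FFFD.
    for ch in text_utf8.decode("utf-8", errors="replace"):
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"U+{cp:04X} does not fit in two bytes")
        out += struct.pack(">H", cp)  # big-endian 16-bit value (assumption)
    return bytes(out)
```

Raising an error for code points above U+FFFF makes the two-byte limit explicit instead of silently truncating the value.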

Usage Note

To use utf8.code-point, you must import OMUTF8 into your program using an import declaration such as:

  import "omutf8.xmd" prefixed by utf8.

Related Topics

Character set encoding

Other Library Functions