This library provides one string source function and one string sink function, as follows:
reader is a string source function that reads from its value string source argument and returns the text of that source converted from UTF-16 to UTF-8. That is, the provided source is in UTF-16, but the program sees UTF-8.
Any malformed input data is read as the Unicode REPLACEMENT CHARACTER (0xFFFD). The only malformed case recognized is half of a surrogate pair appearing without its other half.
UTF-16 input is assumed to be big-endian by default, but leading and embedded byte order marks (BOMs) in the data are recognized and acted upon. A leading BOM is removed from the input; embedded BOMs are left in.
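The reading rules above can be illustrated outside OmniMark. The following Python sketch is not the library's implementation; the function name utf16_to_utf8 is hypothetical, and it assumes that "acted upon" means a byte-swapped BOM flips the byte order used for the rest of the input.

```python
REPLACEMENT = 0xFFFD   # written in place of malformed input
BOM = 0xFEFF
SWAPPED_BOM = 0xFFFE   # a BOM read with the wrong byte order


def utf16_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16 bytes (big-endian unless a BOM says otherwise) to UTF-8."""
    big_endian = True          # default byte order
    code_points = []
    pending_high = None        # high surrogate waiting for its low half
    for i in range(0, len(data) - 1, 2):   # a trailing odd byte is ignored
        unit = int.from_bytes(data[i:i + 2], "big" if big_endian else "little")
        if unit == SWAPPED_BOM:
            # Assumption: a swapped BOM switches the byte order from here on.
            big_endian = not big_endian
            unit = BOM
        if pending_high is not None:
            if 0xDC00 <= unit <= 0xDFFF:
                # Complete surrogate pair: combine both halves.
                code_points.append(
                    0x10000 + ((pending_high - 0xD800) << 10) + (unit - 0xDC00))
                pending_high = None
                continue
            code_points.append(REPLACEMENT)   # high half with no low half
            pending_high = None
        if unit == BOM and i == 0:
            continue                          # leading BOM is removed
        if 0xD800 <= unit <= 0xDBFF:
            pending_high = unit
        elif 0xDC00 <= unit <= 0xDFFF:
            code_points.append(REPLACEMENT)   # low half with no high half
        else:
            code_points.append(unit)          # includes embedded BOMs
    if pending_high is not None:
        code_points.append(REPLACEMENT)       # input ended inside a pair
    return "".join(map(chr, code_points)).encode("utf-8")
```

For example, utf16_to_utf8(b"\xff\xfeA\x00") detects the little-endian BOM, drops it, and yields the single byte for "A".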
writer is a string sink function that accepts UTF-8 encoded data and writes that data to its value string sink argument, converted from UTF-8 to UTF-16. That is, the program writes UTF-8, but the provided output receives UTF-16.
writer has two further switch-valued arguments, placed ahead of the output argument. Both default to true. The two arguments are:
bom: true if a byte order mark (BOM) is to be written as the first character in the output, false if not.
big-endian: true if the output is to be written big-endian, false if it is to be written little-endian.
Any malformed output data is written as the Unicode REPLACEMENT CHARACTER (0xFFFD). The only malformed cases recognized are characters too large to be encodable as UTF-16 (i.e., larger than 0x10FFFF), and characters whose UTF-16 encodings would be the value of half of a surrogate pair.
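The writing rules can be sketched the same way. This Python example is an illustration only, not the library's code. The function name code_points_to_utf16 is hypothetical, and it takes already-decoded code points rather than UTF-8 bytes to keep the sketch short. The upper limit used is 0x10FFFF, the largest code point UTF-16 can represent through surrogate pairs.

```python
import struct

REPLACEMENT = 0xFFFD   # written in place of unencodable characters
BOM = 0xFEFF


def code_points_to_utf16(code_points, bom=True, big_endian=True) -> bytes:
    """Encode integer code points as UTF-16, mirroring the two switches."""
    fmt = ">H" if big_endian else "<H"      # 16-bit unit, chosen byte order
    units = [BOM] if bom else []            # optional leading BOM
    for cp in code_points:
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            # Too large for UTF-16, or equal to half of a surrogate pair.
            cp = REPLACEMENT
        if cp >= 0x10000:
            # Supplementary character: split into a surrogate pair.
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
        else:
            units.append(cp)
    return b"".join(struct.pack(fmt, unit) for unit in units)
```

With both switches left at their defaults, code_points_to_utf16([0x41]) returns the big-endian BOM followed by the two bytes for "A".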
The OMFFUTF16 library exports a shelf of constants that corresponds to the byte order marks of the possible byte orderings:
export constant string byte-order-marks initial { "%16r{FE,FF}" with key "big-endian", "%16r{FF,FE}" with key "little-endian" }
A good source of information on the details of UTF-16 encoding is the Unicode Consortium.
To use OMFFUTF16, you must import it into your program using an import declaration such as:
import "omffutf16.xmd" prefixed by utf16.