function
Library: ISO/IEC 8859 (OMFF8859)
Import : omff8859.xmd
Returns: A scannable input source for streaming data.
export string source function reader in value encoding-type encoding optional from value string source input-data
Use reader to read a string source and return its text converted from an ISO/IEC 8859 encoding to UTF-8. Although the provided source is in one of the ISO/IEC 8859 encodings, the program sees only UTF-8.
If the source input-data contains a byte that is an unused code point in the selected encoding, that byte is suppressed. The unused code points are, depending on the encoding:

   0x00 through 0x1f, and 0x7f through 0x9f.
   0x00 through 0x1f, and 0x7f through 0x9f.
   0x00 through 0x1f, 0x7f through 0xc0 except for 0xa0, 0xa4, 0xac, 0xad, 0xbb, and 0xbf, as well as 0xdb through 0xdf and 0xf3 through 0xff.
   0x00 through 0x1f, 0x7f through 0x9f, 0xae, 0xd2, and 0xff.
   0x00 through 0x1f, 0x7f through 0x9f, 0xa1, 0xbf through 0xde, 0xfb, 0xfc, and 0xff.
   0x00 through 0x1f, 0x7f through 0x9f, 0xdc through 0xde, and 0xfc through 0xff.
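The suppression rule described above can be illustrated outside OmniMark with a small Python sketch. This is not the library's implementation. The function name to_utf8, the choice of Python's "iso8859_5" codec, and the use of the simplest unused set (0x00 through 0x1f and 0x7f through 0x9f) are all assumptions made for illustration.

```python
# Illustrative sketch only: mimics "drop unused bytes, then convert to UTF-8".
# Assumption: the unused set is 0x00-0x1f plus 0x7f-0x9f (the simplest set listed).
UNUSED = frozenset(range(0x00, 0x20)) | frozenset(range(0x7F, 0xA0))


def to_utf8(data: bytes, codec: str = "iso8859_5",
            unused: frozenset = UNUSED) -> bytes:
    """Drop bytes in `unused`, decode the rest with `codec`, re-encode as UTF-8."""
    kept = bytes(b for b in data if b not in unused)
    # Python's decoder raises UnicodeDecodeError for bytes it considers
    # undefined, so `unused` should cover those for the chosen codec.
    return kept.decode(codec).encode("utf-8")


# Six Cyrillic letters in ISO/IEC 8859-5, followed by two bytes from the unused set.
sample = bytes([0xBF, 0xE0, 0xD8, 0xD2, 0xD5, 0xE2, 0x0A, 0x85])
result = to_utf8(sample)
```

Here the trailing 0x0a and 0x85 bytes are dropped before decoding, so result holds only the UTF-8 encoding of the six letters. For an encoding with a larger unused set, a different set would be passed as the unused argument.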
The following example converts a file from ISO/IEC 8859-5 to UTF-8 for further processing by find rules:

  import "omff8859.xmd" prefixed by iso8859.

  process
     using group "process input"
        submit iso8859.reader in iso8859.encoding-8859-5 from file #args[1]

  group "process input"
     ; ...