This library contains one string source function and one string sink function, as follows:
reader is a string source function that reads its value string source argument and returns the text of that data converted from a JIS encoding to a UTF-8 encoding. That is, the provided data is in JIS, but the program sees UTF-8.
writer is a string sink function that accepts UTF-8 encoded data and writes that data to its value string sink argument, converted from a UTF-8 encoding to a JIS encoding. That is, the program writes UTF-8, but the provided output receives JIS.
writer has an optional first argument, heralded by encoding-sequence, that is the escape sequence used to switch into two-byte JIS X 0208/JIS C 6226 mode. It must be at least one byte and no more than eight bytes long.
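The conversion that reader and writer perform can be pictured as a pair of streaming transcoders. The following is a minimal Python sketch of that idea, not OMFFJIS itself: it assumes Python's built-in `iso2022_jp` codec is an acceptable stand-in for "JIS", and the names `jis_to_utf8` and `utf8_to_jis` are hypothetical.

```python
import codecs
import io
from typing import BinaryIO, Iterable, Iterator

# Assumption: Python's ISO-2022-JP codec stands in for the library's JIS encoding.
JIS = "iso2022_jp"


def jis_to_utf8(source: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Read JIS bytes from `source` and yield the same text as UTF-8 bytes."""
    decoder = codecs.getincrementaldecoder(JIS)()
    while True:
        chunk = source.read(chunk_size)
        # The incremental decoder keeps any partial character between chunks.
        text = decoder.decode(chunk, final=not chunk)
        if text:
            yield text.encode("utf-8")
        if not chunk:
            break


def utf8_to_jis(chunks: Iterable[bytes], sink: BinaryIO) -> None:
    """Accept UTF-8 chunks and write the same text to `sink` as JIS bytes."""
    utf8 = codecs.getincrementaldecoder("utf-8")()
    jis = codecs.getincrementalencoder(JIS)()
    for chunk in chunks:
        sink.write(jis.encode(utf8.decode(chunk)))
    # Flush both stages; the encoder emits the escape back to ASCII if needed.
    sink.write(jis.encode(utf8.decode(b"", final=True), final=True))
```

In this sketch the caller on the reading side only ever sees UTF-8, and the sink on the writing side only ever receives JIS, which mirrors the division of labour described above.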
The data formats are interpreted and produced according to the Japanese Industrial Standards JIS X 0201, JIS X 0208, and JIS X 0212. The JIS data format uses escape sequences based on ISO 2022 (a.k.a. JIS X 0202) to shift between the encodings defined by the three standards. On input, the functions are lenient: they recognize not-quite-valid escape sequences as well as those defined by older versions of the standards, so they should handle a wide variety of input files. On output, the escape sequences defined by the latest versions of the standards are used.
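To make the role of the escape sequences concrete, here is a small Python sketch that splits a JIS byte stream into runs, one per character set. It is illustrative only: the table lists the commonly used ISO 2022 designations for these standards, the function name `split_segments` is hypothetical, and the matching is strict, so it does not reproduce the lenient input handling described above.

```python
from typing import List, Tuple

# Commonly used escape sequences (ESC is byte 0x1B).
ESCAPES = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b(I": "JIS X 0201 Katakana",
    b"\x1b$@": "JIS C 6226-1978",
    b"\x1b$B": "JIS X 0208-1983",
    b"\x1b$(D": "JIS X 0212-1990",
}


def split_segments(data: bytes) -> List[Tuple[str, bytes]]:
    """Split `data` into (character set, payload) runs at each escape sequence."""
    segments = []
    charset = "ASCII"  # a JIS stream starts in ASCII mode
    start = pos = 0
    while pos < len(data):
        match = next((seq for seq in ESCAPES if data.startswith(seq, pos)), None)
        if match is None:
            pos += 1
            continue
        if pos > start:
            segments.append((charset, data[start:pos]))
        charset = ESCAPES[match]
        pos += len(match)
        start = pos
    if pos > start:
        segments.append((charset, data[start:]))
    return segments


sample = b"JIS: \x1b$BF|K\\8l\x1b(B!"
for name, payload in split_segments(sample):
    print(name, payload)
```

The two-byte payload `F|K\8l` in the sample is the JIS X 0208 encoding of the three characters 日本語; outside the escape-delimited run the bytes are plain ASCII.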
The module exports the following constants, which specify the version of the standard being used:
constant string jis-old initial {"%27#$@"}: Old JIS
constant string jis-new initial {"%27#$B"}: New JIS
constant string jis-1978 initial {"%27#$@"}: Old JIS, a.k.a. JIS C 6226-1978
constant string jis-1983 initial {"%27#$B"}: New JIS, a.k.a. JIS X 0208-1983
constant string jis-1990 initial {"%27#&@%27#$B"}: JIS X 0208-1990
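In these constants, `%27#` denotes the character with decimal code 27, that is, ESC (0x1B). The Python sketch below spells out the resulting bytes and checks the one-to-eight-byte limit stated for the writer's escape sequence argument. The Python names and the helper `check_encoding_sequence` are hypothetical and only mirror the constants above.

```python
ESC = b"\x1b"  # decimal 27, written %27# in the constants above

JIS_OLD = ESC + b"$@"                  # "%27#$@"
JIS_NEW = ESC + b"$B"                  # "%27#$B"
JIS_1978 = JIS_OLD                     # same bytes as jis-old
JIS_1983 = JIS_NEW                     # same bytes as jis-new
JIS_1990 = ESC + b"&@" + ESC + b"$B"   # "%27#&@%27#$B"


def check_encoding_sequence(seq: bytes) -> bytes:
    """Return `seq` if it satisfies the documented 1..8 byte length limit."""
    if not 1 <= len(seq) <= 8:
        raise ValueError("escape sequence must be 1 to 8 bytes long")
    return seq


for name, seq in [("jis-old", JIS_OLD), ("jis-new", JIS_NEW), ("jis-1990", JIS_1990)]:
    print(name, check_encoding_sequence(seq).hex(" "))
```

Note that jis-1990 is two escape sequences back to back, six bytes in total, which still fits within the eight-byte limit.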
The only errors that can occur are conversion errors: encountering a character that has no equivalent in the other character set. In this case, the substituted value is DEL (0x7F) in the JIS encoding, and NOT-A-CHARACTER (0xFFFD) in the Unicode (UTF-8) encoding.
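The substitution rule can be imitated in Python as follows. This is a sketch under assumptions, not the library's implementation: it uses Python's `iso2022_jp` codec as the JIS stand-in, the function names are hypothetical, and on the decoding side Python's `errors="replace"` also maps malformed bytes to U+FFFD, which is broader than the unconvertible-character case described above.

```python
# Assumption: Python's ISO-2022-JP codec stands in for the library's JIS encoding.
JIS = "iso2022_jp"


def encode_with_del(text: str) -> bytes:
    """Encode to JIS, replacing each unconvertible character with DEL (0x7F)."""
    def convertible(ch: str) -> bool:
        try:
            ch.encode(JIS)
            return True
        except UnicodeEncodeError:
            return False

    patched = "".join(ch if convertible(ch) else "\x7f" for ch in text)
    return patched.encode(JIS)


def decode_with_replacement(data: bytes) -> str:
    """Decode from JIS, replacing undecodable input with U+FFFD."""
    return data.decode(JIS, errors="replace")


# U+1F600 has no JIS equivalent, so it becomes DEL on output.
print(encode_with_del("a\U0001F600b"))
```

Because the substitution is silent, a caller that needs to detect lossy conversions would have to scan the output for 0x7F or U+FFFD itself.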
These functions are based on [1] Ken Lunde, “Understanding Japanese Information Processing”, O'Reilly 1993, ISBN 1-56592-043-0, with one exception: the use of shift-out/shift-in for switching to and from half-width Katakana characters follows Ken Lunde's errata at [2] https://resources.oreilly.com/examples/9781565922242/tree/master/errata/ujip-errata-1-3.txt rather than the description on p. 70 of the book, which has the JIS7 and JIS8 forms switched around with respect to this issue.
To use OMFFJIS, you must import it into your program using an import declaration such as:
import "omffjis.xmd" prefixed by jis.