JIS (OMFFJIS)

This library contains one string source function and one string sink function, as follows:

reader is a string source function that reads its value string source argument, and returns the text of that data converted from a JIS encoding to a UTF-8 encoding. That is, the provided data is in JIS, but the program sees UTF-8.
writer is a string sink function that accepts UTF-8 encoded data and writes that data to its value string sink argument, converted from a UTF-8 encoding to a JIS encoding. That is, the program writes UTF-8, but the provided output receives JIS.

writer has an optional first argument, heralded by encoding-sequence, that specifies the escape sequence used to switch into two-byte JIS X 0208/JIS C 6226 mode. The sequence must be between one and eight bytes long.

The data formats are interpreted and produced according to the Japanese Industrial Standards JIS X 0201, JIS X 0208 and JIS X 0212. The JIS data format uses escape sequences based on ISO 2022 (a.k.a. JIS X 0202) to shift between the encodings defined by the three standards. On input, the conversion is lenient: it recognizes some not-quite-valid escape sequences as well as those defined by older versions of the standards, so it should handle a wide variety of input files. On output, the escape sequences defined by the latest versions of the standards are used.

The module exports the following constants, which specify the version of the standard being used.

constant string jis-old initial {"%27#$@"} : Old JIS
constant string jis-new initial {"%27#$B"} : New JIS
constant string jis-1978 initial {"%27#$@"} : Old JIS, a.k.a. JIS C 6226-1978
constant string jis-1983 initial {"%27#$B"} : New JIS, a.k.a. JIS X 0208-1983
constant string jis-1990 initial {"%27#&@%27#$B"} : JIS X 0208-1990
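As a rough illustration of what these constants contain at the byte level, the following Python sketch transcribes them by hand, reading "%27#" as the single byte with decimal value 27 (ESC). It is not OMFFJIS code. The names ESC, JIS_OLD, JIS_NEW, JIS_1990 and valid_encoding_sequence are hypothetical and exist only for this example, and Python's built-in iso2022_jp codec stands in for a JIS encoder.

```python
ESC = b"\x1b"  # the byte written as "%27#" in the constants above (decimal 27)

# Byte values of the exported constants, transcribed by hand (assumption:
# "%27#" denotes one byte with value 27).
JIS_OLD = ESC + b"$@"                  # jis-old / jis-1978
JIS_NEW = ESC + b"$B"                  # jis-new / jis-1983
JIS_1990 = ESC + b"&@" + ESC + b"$B"   # jis-1990


def valid_encoding_sequence(seq: bytes) -> bool:
    """Check the documented length limit: one to eight bytes inclusive."""
    return 1 <= len(seq) <= 8


# Python's own ISO-2022-JP encoder switches into two-byte mode before
# emitting the first kanji, so the output begins with an escape sequence.
encoded = "日本語".encode("iso2022_jp")
print(encoded)
print(encoded.startswith(JIS_NEW))
```

All three constants fall within the one-to-eight-byte limit; the longest, jis-1990, is six bytes.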

The only errors that can occur are conversion errors: encountering a character that has no equivalent in the other character set. In this case, the value used is DEL (0x7F) in the JIS encoding, and the replacement character U+FFFD in the Unicode (UTF-8) encoding.
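The substitution policy can be mimicked with Python's standard codecs, as a sketch only. This is not how OMFFJIS is implemented, and Python's iso2022_jp codec may differ from it in which characters it accepts. The error-handler name "jis-del" and the functions utf8_to_jis and jis_to_utf8 are hypothetical names chosen for this example.

```python
import codecs

DEL = "\x7f"  # substituted on the JIS side for unconvertible characters


def _substitute_del(exc):
    """Encoding error handler: emit DEL for each character that cannot be encoded."""
    if isinstance(exc, UnicodeEncodeError):
        return (DEL * (exc.end - exc.start), exc.end)
    raise exc


# "jis-del" is a name made up for this sketch, not a standard handler.
codecs.register_error("jis-del", _substitute_del)


def utf8_to_jis(data: bytes) -> bytes:
    """UTF-8 in, ISO-2022-JP out; unconvertible characters become DEL."""
    return data.decode("utf-8").encode("iso2022_jp", errors="jis-del")


def jis_to_utf8(data: bytes) -> bytes:
    """ISO-2022-JP in, UTF-8 out; undecodable input becomes U+FFFD."""
    return data.decode("iso2022_jp", errors="replace").encode("utf-8")


# The euro sign has no JIS X 0208 equivalent, so it is replaced by DEL.
print(utf8_to_jis("a\u20acb".encode("utf-8")))
```

A consequence of this policy is that conversion never fails outright, but a round trip is lossy wherever a substitution was made.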

These functions are based on [1] Ken Lunde, “Understanding Japanese Information Processing”, O'Reilly 1993, ISBN 1-56592-043-0, with one exception. The use of shift-out/shift-in for switching to and from half-width Katakana characters follows Ken Lunde's errata at [2] https://resources.oreilly.com/examples/9781565922242/tree/master/errata/ujip-errata-1-3.txt rather than p. 70 of the book, which has the JIS7 and JIS8 forms switched around with respect to this issue.

Usage Note

To use OMFFJIS, you must import it into your program using an import declaration such as:

  import "omffjis.xmd" prefixed by jis.

Functions