JIS (OMFFJIS)

This library provides two OmniMark functions for converting text to and from the "JIS" encoding of Japanese, an external source function and an external output function, as follows:

reader is an external source function that reads a value string source, its argument, and returns the text of that source converted from the JIS encoding to UTF-8. That is, the provided source is in JIS, but the program sees UTF-8.
writer is an external output function that accepts UTF-8 encoded data and writes that data to a value string sink, its first argument, converted from a UTF-8 encoding to a JIS encoding. That is, the program writes UTF-8, but the provided output receives JIS.
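As a rough illustration of what reader and writer do to the data (not of how the library is called from OmniMark), the following Python sketch performs the same two conversions with Python's standard `iso2022_jp` codec. The names `jis_to_utf8` and `utf8_to_jis` are hypothetical, and the codec is only a stand-in: plain `iso2022_jp` covers ASCII, JIS X 0201 Roman and JIS X 0208 (JIS X 0212 needs `iso2022_jp_1`), and it does not share the library's lax handling of input.

```python
def jis_to_utf8(jis_bytes: bytes) -> bytes:
    """Hypothetical analogue of reader: JIS bytes in, UTF-8 bytes out."""
    return jis_bytes.decode("iso2022_jp").encode("utf-8")


def utf8_to_jis(utf8_bytes: bytes) -> bytes:
    """Hypothetical analogue of writer: UTF-8 bytes in, JIS bytes out."""
    return utf8_bytes.decode("utf-8").encode("iso2022_jp")


original = "日本語 text".encode("utf-8")
jis = utf8_to_jis(original)
print(jis)  # the two-byte run is bracketed by ESC $ B ... ESC ( B
print(jis_to_utf8(jis) == original)
```

ASCII-only text passes through unchanged, because JIS data starts out in ASCII mode and needs no escape sequences until a two-byte character appears.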

writer also takes an optional argument, heralded by encoding-sequence, that specifies the escape sequence used to switch into two-byte JIS X 0208/JIS C 6226 mode. The sequence must be between one and eight bytes long.
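A minimal Python sketch of the encoding-sequence option follows, under the assumption that the option only changes which escape sequence introduces two-byte runs. The function `encode_jis` and its `encoding_sequence` parameter are hypothetical names. The sketch encodes with Python's `iso2022_jp` codec, which emits ESC $ B, and substitutes the requested sequence; it does not remap the characters whose codes differ between the 1978 and 1983 editions of the standard.

```python
ESC_JIS_X0208_1983 = b"\x1b$B"  # ESC $ B, the sequence Python's codec emits


def encode_jis(text: str, encoding_sequence: bytes = ESC_JIS_X0208_1983) -> bytes:
    """Encode text as JIS, introducing two-byte runs with encoding_sequence.

    Hypothetical helper; enforces the one-to-eight-byte length limit.
    """
    if not 1 <= len(encoding_sequence) <= 8:
        raise ValueError("encoding-sequence must be 1 to 8 bytes long")
    encoded = text.encode("iso2022_jp")
    # ESC (0x1B) never occurs inside two-byte character data, so a plain
    # byte replacement only touches the escape sequences themselves.
    return encoded.replace(ESC_JIS_X0208_1983, encoding_sequence)


print(encode_jis("かな"))             # two-byte run introduced by ESC $ B
print(encode_jis("かな", b"\x1b$@"))  # two-byte run introduced by ESC $ @
```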

The data formats are interpreted and produced according to the Japanese Industrial Standards JIS X 0201, JIS X 0208 and JIS X 0212. The JIS data format uses escape sequences based on ISO 2022 (a.k.a. JIS X 0202) to shift between the character sets defined by these three standards. On input, the library is somewhat lax: it recognizes some not-quite-valid escape sequences as well as those defined by older versions of the standards, so it should handle a wide variety of input files. On output, the escape sequences defined by the latest versions of the standards are used.
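To make the escape-sequence switching concrete, here is a small Python sketch that splits JIS data into runs, each tagged with the character set in effect. The table lists the standard designations: ESC ( B for ASCII, ESC ( J and ESC ( I for JIS X 0201 Roman and Katakana, ESC $ @ and ESC $ B for the 1978 and 1983 editions of JIS X 0208, and ESC $ ( D for JIS X 0212; ESC & @ is skipped as the announcer that precedes ESC $ B in JIS X 0208-1990 data. `split_runs` is a simplified, hypothetical helper: it does not reproduce the library's lax recognition of malformed sequences, and it does not decode the bytes into characters.

```python
ESCAPES = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b(I": "JIS X 0201 Katakana",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
    b"\x1b$(D": "JIS X 0212",
}
ANNOUNCER_1990 = b"\x1b&@"  # precedes ESC $ B in JIS X 0208-1990 data


def split_runs(data: bytes) -> list[tuple[str, bytes]]:
    """Split JIS data into (character set, raw bytes) runs."""
    runs: list[tuple[str, bytes]] = []
    charset = "ASCII"  # JIS text starts out in ASCII
    buf = bytearray()
    i = 0
    while i < len(data):
        if data.startswith(ANNOUNCER_1990, i):
            i += len(ANNOUNCER_1990)
            continue
        # No sequence in the table is a prefix of another, so test order is irrelevant.
        for seq, name in ESCAPES.items():
            if data.startswith(seq, i):
                if buf:  # close the run for the previous character set
                    runs.append((charset, bytes(buf)))
                    buf.clear()
                charset = name
                i += len(seq)
                break
        else:
            buf.append(data[i])  # ordinary data byte
            i += 1
    if buf:
        runs.append((charset, bytes(buf)))
    return runs


print(split_runs(b"abc\x1b$B$\"$J\x1b(Bxyz"))
```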

The module exports the following constants, which hold the escape sequences for the different versions of the standard and can be used as the encoding-sequence argument of writer.

  constant string jis-old initial {"%27#$@"} ; Old JIS
  constant string jis-new initial {"%27#$B"} ; New JIS
  constant string jis-1978 initial {"%27#$@"} ; Old JIS, a.k.a. JIS C 6226-1978
  constant string jis-1983 initial {"%27#$B"} ; New JIS, a.k.a. JIS X 0208-1983
  constant string jis-1990 initial {"%27#&@%27#$B"} ; JIS X 0208-1990
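In OmniMark string literals, `%27#` denotes the character with decimal code 27, ESC, so each constant is a short escape sequence. The Python sketch below expands that one format item to show the actual bytes and checks them against the one-to-eight-byte limit on encoding-sequence. `omnimark_literal_to_bytes` is a hypothetical helper that understands only the `%N#` form, not OmniMark's other format items.

```python
import re

_DECIMAL_ESCAPE = re.compile(r"%(\d+)#")

JIS_CONSTANTS = {
    "jis-old": "%27#$@",
    "jis-new": "%27#$B",
    "jis-1978": "%27#$@",
    "jis-1983": "%27#$B",
    "jis-1990": "%27#&@%27#$B",
}


def omnimark_literal_to_bytes(literal: str) -> bytes:
    """Expand %N# decimal escapes; every other character is taken as ASCII."""
    expanded = _DECIMAL_ESCAPE.sub(lambda m: chr(int(m.group(1))), literal)
    return expanded.encode("ascii")


for name, literal in JIS_CONSTANTS.items():
    seq = omnimark_literal_to_bytes(literal)
    assert 1 <= len(seq) <= 8  # within the encoding-sequence length limit
    print(f"{name:9} {seq!r} ({len(seq)} bytes)")
```

The old/new names are aliases: jis-old and jis-1978 expand to the same three bytes, as do jis-new and jis-1983, while jis-1990 is the six-byte announcer-plus-designation form.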

The only errors that can occur are conversion errors: a character that has no equivalent in the other character set. In that case, the character is replaced by DEL (0x7F) in the JIS encoding and by U+FFFD (REPLACEMENT CHARACTER) in the Unicode (UTF-8) encoding.
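The replacement policy can be imitated in Python as follows. This is a sketch using the standard `iso2022_jp` codec with hypothetical function names: characters with no JIS equivalent are replaced by DEL before encoding, and unconvertible JIS codes become U+FFFD through the codec's built-in `errors="replace"` handler. How many U+FFFD characters Python emits for one bad two-byte code is up to its codec and may differ from the library.

```python
DEL = "\x7f"


def to_jis_lossy(text: str) -> bytes:
    """Encode text as JIS, substituting DEL for unmappable characters."""
    safe = []
    for ch in text:
        try:
            ch.encode("iso2022_jp")
            safe.append(ch)
        except UnicodeEncodeError:
            safe.append(DEL)  # no JIS equivalent for this character
    return "".join(safe).encode("iso2022_jp")


def from_jis_lossy(data: bytes) -> str:
    """Decode JIS data, substituting U+FFFD for unconvertible codes."""
    return data.decode("iso2022_jp", errors="replace")


print(to_jis_lossy("A\U0001F600B"))               # the emoji has no JIS code
print(from_jis_lossy(b"A\x1b$B\x29\x21\x1b(BB"))  # row 9 of JIS X 0208 is unassigned
```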

These functions are based on the book "Understanding Japanese Information Processing" by Ken Lunde, O'Reilly 1993, ISBN 1-56592-043-0, with one exception: the use of shift-out/shift-in for switching to and from half-width Katakana characters follows Ken Lunde's errata at <ftp://ftp.ora.com/pub/examples/nutshell/ujip/errata/ujip-errata-1-3.txt>, rather than what the book has on page 70 (which has the JIS7 and JIS8 forms switched around with respect to this issue).
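For the half-width Katakana case, the Python sketch below decodes the shift-out/shift-in form: after SO (0x0E), bytes 0x21 to 0x5F stand for the JIS X 0201 Katakana characters U+FF61 to U+FF9F, and SI (0x0F) shifts back. `decode_so_si` is hypothetical and deliberately narrow (it ignores escape sequences entirely); it only illustrates the byte mapping, not the library's full handling of the JIS7 and JIS8 forms.

```python
SO, SI = 0x0E, 0x0F  # shift-out / shift-in control bytes


def decode_so_si(data: bytes) -> str:
    """Decode ASCII text containing SO/SI-delimited half-width Katakana runs."""
    out = []
    shifted = False
    for b in data:
        if b == SO:
            shifted = True
        elif b == SI:
            shifted = False
        elif shifted and 0x21 <= b <= 0x5F:
            # JIS X 0201 Katakana 0xA1..0xDF, sent as 7-bit 0x21..0x5F,
            # maps to the Unicode half-width forms U+FF61..U+FF9F.
            out.append(chr(0xFF61 + (b - 0x21)))
        else:
            out.append(chr(b))
    return "".join(out)


print(decode_so_si(b"A\x0e\x31\x0fB"))  # 0x31 is half-width Katakana A
```

The same characters appear in 8-bit form as single bytes 0xA1 to 0xDF (as in Shift_JIS), which is why the shifted 7-bit value plus 0x80 gives the 8-bit code.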

Usage Note

To use omffjis, you must import it into your program using a statement like this:

  import "omffjis.xmd" prefixed by jis.

(Please see the import topic for more on importing.)

Functions