define external string source function jis-input-file
value string filename
exceptions-to value io-exception exceptions-to optional
This external string source function reads the file named by the "filename" argument and returns the text of that file converted from a JIS encoding to UTF-8. The file itself remains in JIS; only the data delivered to the program is converted.
- "filename". This is the name of the JIS-encoded file you want to read and convert to UTF-8. If a zero-length "filename" is used (that is, ""), then jis-input-file does not open a file, but reads from standard input. The zero-length file name option allows the conversion functionality to be used in an OmniMark program that is being used as a filter.
- "exceptions-to". This optional argument indicates that errors are to be recorded in the passed "io-exception" object, and that the OmniMark program is not to be immediately terminated. There are three types of errors, categorized according to how they are handled:
  - Whenever an invalid or out-of-range encoding is found, it is converted to the UTF-8 encoding of the Unicode "REPLACEMENT CHARACTER" (0xFFFD). If "exceptions-to" is specified, the "io-exception" object is marked for a data encoding error, and the function continues processing.
  - If the external string source function cannot be created, either because the declaration does not match what is expected or because there is not enough memory to create the source object, an error is signaled to OmniMark, and your program is terminated.
  - If "exceptions-to" is specified, then for any other type of error that occurs during memory allocation, file opening or closing, or reading or writing, the "io-exception" object is marked for the error found, and processing continues. If "exceptions-to" is not specified, an error is signaled to OmniMark and your program is terminated.
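The heralded "exceptions-to" argument is passed after the file name, as in the declaration above. The following is a minimal sketch only: the function name "output-jis-file", its argument names, and the pattern variable "converted" are hypothetical, and it assumes the caller has already obtained an "io-exception" object by whatever means the I/O exception library provides (not shown here).

```omnimark
; Hypothetical helper: converts one JIS file and writes the UTF-8 text to
; the current output. Errors are recorded in "errors" instead of
; terminating the program (except for the cases noted above).
define function output-jis-file (value string name,
                                 value io-exception errors) as
   repeat scan jis-input-file name exceptions-to errors
   match any+ => converted
      output converted
   again
```

After the call returns, the caller would inspect the "io-exception" object to find out whether a data encoding error or an I/O error was recorded.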
The file format is interpreted according to the Japanese Industrial Standards JIS X 0201, JIS X 0208, and JIS X 0212. The file format uses escape sequences based on ISO 2022 (also known as JIS X 0202) to shift between the encodings defined by the three standards. On input, the function is lenient: it recognizes some not-quite-valid escape sequences, as well as those defined by older versions of the standards.
; Outputting just the ASCII characters in a JIS-encoded file.
process
   repeat scan jis-input-file "myfile.jis"
   match ["%0#" to "%127#"]+ => ascii-text
      output ascii-text
   match ["%128#" to "%255#"]+
      ; Ignore high-order characters (which are non-ASCII).
   again
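The zero-length file name case described above can be sketched in the same way. This is an illustrative sketch rather than an example from the library itself: it assumes the library that declares jis-input-file has already been included in the program, and the pattern variable name "converted" is arbitrary.

```omnimark
; A filter sketch: read JIS-encoded text from standard input and
; write the UTF-8 conversion to the current output.
process
   repeat scan jis-input-file ""
   match any+ => converted
      output converted
   again
```

Because no "exceptions-to" argument is given here, any file or memory error terminates the program, while invalid encodings are silently replaced by the UTF-8 form of the replacement character.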