unicode.general-category

function

Library: Unicode (OMUNICODE)
Import : omunicode.xmd

Returns: the two-letter general category value of the argument character

Declaration

export string function
   general-category of value integer character

Purpose

Use unicode.general-category to find the Unicode general category of a character, given its code point as an integer. The categories are those defined in Unicode 5.1.0. The function returns one of the following values:

Lu : Letter, Uppercase
Ll : Letter, Lowercase
Lt : Letter, Titlecase
Lm : Letter, Modifier
Lo : Letter, Other
Mn : Mark, Nonspacing
Mc : Mark, Spacing Combining
Me : Mark, Enclosing
Nd : Number, Decimal Digit
Nl : Number, Letter
No : Number, Other
Pc : Punctuation, Connector
Pd : Punctuation, Dash
Ps : Punctuation, Open
Pe : Punctuation, Close
Pi : Punctuation, Initial quote (may behave like Ps or Pe depending on usage)
Pf : Punctuation, Final quote (may behave like Ps or Pe depending on usage)
Po : Punctuation, Other
Sm : Symbol, Math
Sc : Symbol, Currency
Sk : Symbol, Modifier
So : Symbol, Other
Zs : Separator, Space
Zl : Separator, Line
Zp : Separator, Paragraph
Cc : Other, Control
Cf : Other, Format
Cs : Other, Surrogate
Co : Other, Private Use
Cn : Other, Not Assigned

Example

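The following pattern function matches a UTF-8 encoded character whose general category is a separator (Zs, Zl, or Zp). The test matches only the leading "Z", so all three separator categories qualify. Control characters such as tab and line feed belong to category Cc and are not matched: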

  import "omunicode.xmd" prefixed by unicode.
  import "omutf8.xmd"    prefixed by utf8.
  
  define switch function
     unicode-whitespace ()
  as
     return #current-input matches (utf8.char => character
                                    (when unicode.general-category of utf8.code-point of character matches "Z"))
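
The first letter of each value identifies the major class (L for letters, N for numbers, and so on), so the same prefix test can check for any whole class. The following is a minimal sketch, not part of OMUNICODE: the helper name is-letter and its parameter name code-point are hypothetical and chosen only for illustration.

```omnimark
import "omunicode.xmd" prefixed by unicode.

; Hypothetical helper (not a library function): true when the general
; category of the code point begins with "L" (Lu, Ll, Lt, Lm or Lo).
define switch function
   is-letter (value integer code-point)
as
   return unicode.general-category of code-point matches "L"

process
   ; 65 is U+0041 LATIN CAPITAL LETTER A, category Lu
   output "U+0041 is a letter%n"
      when is-letter (65)

   ; 48 is U+0030 DIGIT ZERO, category Nd
   output "U+0030 is not a letter%n"
      unless is-letter (48)
```

Matching on the first letter alone keeps the test short, at the cost of not distinguishing between the subcategories within a class.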

Usage Note

To use unicode.general-category, you must import OMUNICODE into your program using an import declaration such as:

  import "omunicode.xmd" prefixed by unicode.
