function
Library: Unicode (OMUNICODE)
Import: omunicode.xmd
Returns: The two-letter general category value of the argument character.
export string function general-category of value integer character
Use general-category to find the Unicode general category property of a character code point, as defined in Unicode 5.1.0. This function can return the following general category values:
Lu
: Letter, Uppercase
Ll
: Letter, Lowercase
Lt
: Letter, Titlecase
Lm
: Letter, Modifier
Lo
: Letter, Other
Mn
: Mark, Nonspacing
Mc
: Mark, Spacing Combining
Me
: Mark, Enclosing
Nd
: Number, Decimal Digit
Nl
: Number, Letter
No
: Number, Other
Pc
: Punctuation, Connector
Pd
: Punctuation, Dash
Ps
: Punctuation, Open
Pe
: Punctuation, Close
Pi
: Punctuation, Initial quote (may behave like Ps or Pe depending on usage)
Pf
: Punctuation, Final quote (may behave like Ps or Pe depending on usage)
Po
: Punctuation, Other
Sm
: Symbol, Math
Sc
: Symbol, Currency
Sk
: Symbol, Modifier
So
: Symbol, Other
Zs
: Separator, Space
Zl
: Separator, Line
Zp
: Separator, Paragraph
Cc
: Other, Control
Cf
: Other, Format
Cs
: Other, Surrogate
Co
: Other, Private Use
Cn
: Other, Not Assigned
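As a minimal sketch of a direct call, the following program prints the general category of three code points. It assumes the library is imported with the prefix unicode. and that the code points are passed as decimal integer literals. The categories in the comments are the values Unicode assigns to these characters.

```
import "omunicode.xmd" prefixed by unicode.

process
   ; U+0041 LATIN CAPITAL LETTER A: category Lu
   output unicode.general-category of 65
   output "%n"

   ; U+0030 DIGIT ZERO: category Nd
   output unicode.general-category of 48
   output "%n"

   ; U+0020 SPACE: category Zs
   output unicode.general-category of 32
   output "%n"
```

Each result is written by its own output action, which avoids relying on operator precedence between the function call and string concatenation.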
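The following pattern function matches a UTF-8 encoded white-space character, defined here as any character whose general category begins with Z (Zs, Zl, or Zp); control characters such as tab and line feed belong to Cc and are not matched: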
    import "omunicode.xmd" prefixed by unicode.
    import "omutf8.xmd" prefixed by utf8.

    define switch function unicode-whitespace as
       return #current-input matches (utf8.char => character
                                      (when unicode.general-category of utf8.code-point of character matches "Z"))