Reach level 1 conformance with UTS #18 #12

pschiffmann · 2019-01-19T11:47:16Z

http://unicode.org/reports/tr18/#Basic_Unicode_Support

RL1.1 Hex Notation
To meet this requirement, an implementation shall supply a mechanism for specifying any Unicode code point (from U+0000 to U+10FFFF), using the hexadecimal code point representation.
RL1.2 Properties
To meet this requirement, an implementation shall provide at least a minimal list of properties, consisting of the following: General_Category, Script and Script_Extensions, Alphabetic, Uppercase, Lowercase, White_Space, Noncharacter_Code_Point, Default_Ignorable_Code_Point, ANY, ASCII, ASSIGNED
RL1.2a Compatibility Properties
To meet this requirement, an implementation shall provide the properties listed in Annex C: Compatibility Properties, with the property values as listed there. Such an implementation shall document whether it is using the Standard Recommendation or POSIX-compatible properties.
RL1.3 Subtraction and Intersection
To meet this requirement, an implementation shall supply mechanisms for union, intersection and set-difference of sets of characters within regular expression character class expressions.
RL1.4 Simple Word Boundaries
To meet this requirement, an implementation shall extend the word boundary mechanism so that:
The class of <word_character> includes all the Alphabetic values from the Unicode character database, from UnicodeData.txt, plus the decimals (General_Category=Decimal_Number, or equivalently Numeric_Type=Decimal), and the U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER (Join_Control=True). See also Annex C: Compatibility Properties.
Nonspacing marks are never divided from their base characters, and otherwise ignored in locating boundaries.
RL1.5 Simple Loose Matches
To meet this requirement, if an implementation provides for case-insensitive matching, then it shall provide at least the simple, default Unicode case-insensitive matching, and specify which properties are closed and which are not.
To meet this requirement, if an implementation provides for case conversions, then it shall provide at least the simple, default Unicode case folding.
RL1.6 Line Boundaries
To meet this requirement, if an implementation provides for line-boundary testing, it shall recognize not only CRLF, LF, CR, but also NEL (U+0085), PARAGRAPH SEPARATOR (U+2029) and LINE SEPARATOR (U+2028).
RL1.7 Supplementary Code Points
To meet this requirement, an implementation shall handle the full range of Unicode code points, including values from U+FFFF to U+10FFFF. In particular, where UTF-16 is used, a sequence consisting of a leading surrogate followed by a trailing surrogate shall be handled as a single code point in matching.

pschiffmann pinned this issue Feb 17, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reach level 1 conformance with UTS #18 #12

Reach level 1 conformance with UTS #18 #12

pschiffmann commented Jan 19, 2019 •

edited

Loading

Reach level 1 conformance with UTS #18 #12

Reach level 1 conformance with UTS #18 #12

Comments

pschiffmann commented Jan 19, 2019 • edited Loading

pschiffmann commented Jan 19, 2019 •

edited

Loading