Copyright © 2010-2011 Michael Uvarov
Authors: Michael Uvarov (arcusfelis@gmail.com).
UNICODE COLLATION ALGORITHM: see Unicode Technical Standard #10.
1. Hangul Collation Requirements. PS: This is the main source of information.
2. Terminator weight for Hangul
3. Theory vs. practice for Korean text collation. PS: There is no real practice. They do not use the UCA :/
4. Wiki
6. Unicode implementer's guide part 3: Conjoining jamo behavior
7. Unicode implementer's guide part 5: Collation
8. Unicode collation works now. PS: I found it too late. :(
9. ICU
10. String Sorting (Natural) in Erlang Cookbook
For Hangul collation:
11. Hangul Collation Requirements
12. UTR 10
13. KSX1001 on Wiki
http://unicode.org/reports/tr10/#Multi_Level_Comparison
* L1 Base characters
* L2 Accents
* L3 Case
* L4 Punctuation
Example using levels:
C = ux_uca_options:get_options([{strength, 3}]).
ux_uca:sort_key(C, "Get L1-L3 weights").
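The way a strength setting limits which levels reach the sort key can be sketched in plain Erlang. This is an illustration of the multi-level idea from UTS #10, not the ux_uca implementation: the module name, the element format (a plain list `[L1, L2, L3]`) and the second weights in `demo/0` are assumptions made up for the example.

```erlang
-module(uca_levels_sketch).
-export([sort_key/2, demo/0]).

%% Hypothetical element format: [L1, L2, L3]. Illustration only,
%% not the representation used inside ux_uca.

%% Build a key up to Strength levels: all non-zero L1 weights,
%% a 0 separator, all non-zero L2 weights, and so on.
sort_key(Elems, Strength) when Strength >= 1 ->
    Levels = [level_weights(Elems, L) || L <- lists:seq(1, Strength)],
    lists:append(lists:join([0], Levels)).

%% Collect the non-zero weights found at position Level of each element.
level_weights(Elems, Level) ->
    [W || E <- Elems, length(E) >= Level,
          W <- [lists:nth(Level, E)], W =/= 0].

demo() ->
    %% Made-up weights for two strings that differ only at L3 (case).
    Lower = [[16#06D9, 16#20, 16#02], [16#06EE, 16#20, 16#02]],
    Upper = [[16#06D9, 16#20, 16#02], [16#06EE, 16#20, 16#08]],
    %% Equal at strength 2, ordered at strength 3.
    {sort_key(Lower, 2) =:= sort_key(Upper, 2),
     sort_key(Lower, 3) < sort_key(Upper, 3)}.
```

With strength 2 the case difference never enters the key, so the two keys are identical; at strength 3 the L3 weights break the tie.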
Non-ignorable: variable collation elements are not reset to be ignorable, but get the weights explicitly mentioned in the collation element table.
* SPACE would have the value [.0209.0020.0002]
* Capital A would be unchanged, with the value [.06D9.0020.0008]
* Ignorables are unchanged.
Example:
C = ux_uca_options:get_options(non_ignorable).
ux_uca:sort_key(C, "Non-ignorable collation sort key").
Blanked: variable collation elements and any subsequent ignorables are reset so that their weights at levels one through three are zero. For example,
* SPACE would have the value [.0000.0000.0000]
* A combining grave accent after a space would have the value [.0000.0000.0000]
* Capital A would be unchanged, with the value [.06D9.0020.0008]
* A combining grave accent after a Capital A would be unchanged
Example:
C = ux_uca_options:get_options(blanked).
ux_uca:sort_key(C, "Blanked collation sort key").
Shifted: variable collation elements are reset to zero at levels one through three. In addition, a new fourth-level weight is appended, whose value depends on the type, as shown in Table 12 of UTS #10. Any subsequent primary or secondary ignorables following a variable are reset so that their weights at levels one through four are zero.
* A combining grave accent after a space would have the value [.0000.0000.0000.0000].
* A combining grave accent after a Capital A would be unchanged.
Example:
C = ux_uca_options:get_options(shifted).
ux_uca:sort_key(C, "Shifted collation sort key").
Shift-trimmed: this option is the same as Shifted, except that all trailing FFFFs are trimmed from the sort key. This could be used to emulate POSIX behavior.
Example:
C = ux_uca_options:get_options(shift_trimmed).
ux_uca:sort_key(C, "Shift-trimmed collation sort key").
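The four variable-weighting options described above can be compared side by side with a small self-contained sketch. It follows the rules as quoted from UTS #10, but it is not how ux_uca is implemented: the module name, the tuple format `{variable | non_variable, L1, L2, L3}` and the weights of the combining grave accent are assumptions chosen for illustration.

```erlang
-module(uca_alternate_sketch).
-export([weigh/2, trim_l4/1, demo/0]).

%% Hypothetical element format: {variable | non_variable, L1, L2, L3}.
%% Illustration only, not the ux_uca internal representation.

weigh(non_ignorable, Elems) ->
    %% Weights are taken as they are.
    [[L1, L2, L3] || {_Type, L1, L2, L3} <- Elems];
weigh(blanked, Elems) ->
    %% Same zeroing as shifted, but without the fourth level.
    [lists:sublist(W, 3) || W <- shift(Elems, false)];
weigh(shifted, Elems) ->
    shift(Elems, false).

%% AfterVar is true while only ignorables have been seen since the
%% last variable element.
shift([{variable, L1, _L2, _L3} | T], _AfterVar) ->
    [[0, 0, 0, L1] | shift(T, true)];
shift([{non_variable, 0, _L2, _L3} | T], true) ->
    [[0, 0, 0, 0] | shift(T, true)];
shift([{non_variable, 0, 0, 0} | T], AfterVar) ->
    [[0, 0, 0, 0] | shift(T, AfterVar)];
shift([{non_variable, L1, L2, L3} | T], _AfterVar) ->
    [[L1, L2, L3, 16#FFFF] | shift(T, false)];
shift([], _AfterVar) ->
    [].

%% Shift-trimmed: drop trailing FFFF weights from the L4 part of a key.
trim_l4(L4Weights) ->
    Rev = lists:reverse(L4Weights),
    lists:reverse(lists:dropwhile(fun(W) -> W =:= 16#FFFF end, Rev)).

demo() ->
    Space = {variable, 16#0209, 16#20, 16#02},
    CapA  = {non_variable, 16#06D9, 16#20, 16#08},
    Grave = {non_variable, 0, 16#35, 16#02},   % made-up L2/L3 weights
    Elems = [Space, Grave, CapA, Grave],
    {weigh(non_ignorable, Elems), weigh(blanked, Elems), weigh(shifted, Elems)}.
```

In `demo/0` the grave accent after SPACE is zeroed under `blanked` and `shifted`, while the one after Capital A keeps its weights, which matches the bullet lists above.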
result() = {[uca_elem()], string()}
search_result() = {string(), string(), string()}
uca_alternate() = shifted | shift_trimmed | non_ignorable | blanked
uca_array() = [uca_elem()]
uca_case_first() = lower | upper | off
uca_compare_result() = lower | greater | equal
uca_elem() = [atom() | uca_weight()]
uca_sort_key_format() = binary | list | uncompressed
uca_strength() = 1 | 2 | 3 | 4
uca_weight() = integer()
uca_weights() = [uca_weight()]
| compare/2 | Compare two strings and return: lower, greater or equal. |
| compare/3 | |
| search/2 | |
| search/3 | |
| search/4 | |
| sort/1 | Sort a list of strings. |
| sort/2 | Sort a list of strings. |
| sort_array/1 | Convert the Unicode string to the collation element array. |
| sort_array/2 | |
| sort_key/1 | Convert the Unicode string to the sort key. |
| sort_key/2 | |
compare(S1::string(), S2::string()) -> uca_compare_result()
Compare two strings and return: lower, greater or equal.
compare(Uca_options::#uca_options{}, S1::string(), S2::string()) -> uca_compare_result()
search(Target::string(), Pattern::string()) -> search_result()
search(Target::string(), Pattern::string(), MatchStyle::atom()) -> search_result()
search(Uca_options::#uca_options{}, Target::string(), Pattern::string()) -> search_result()
search(Uca_options::#uca_options{}, Target::string(), Pattern::string(), MatchStyle::atom()) -> search_result()
sort(Strings::[string()]) -> [string()]
Sort a list of strings.
sort(Uca_options::#uca_options{}, Strings::[string()]) -> [string()]
Sort a list of strings.
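A short usage sketch for `compare/2`, `sort/1` and `sort/2`. It assumes the ux application is compiled and on the code path; the wrapper module and its function names are hypothetical. No output is shown because the result depends on the collation data shipped with the library.

```erlang
-module(ux_uca_usage_sketch).
-export([sorted/1, sorted_with_compare/1, sorted_l1/1]).

%% Sort with the default options.
sorted(Strings) ->
    ux_uca:sort(Strings).

%% The same idea expressed through compare/2 and lists:sort/2:
%% A goes before B unless compare/2 reports 'greater'.
sorted_with_compare(Strings) ->
    lists:sort(fun(A, B) -> ux_uca:compare(A, B) =/= greater end, Strings).

%% Sort with strength 1, so only L1 (base character) weights are used.
sorted_l1(Strings) ->
    C = ux_uca_options:get_options([{strength, 1}]),
    ux_uca:sort(C, Strings).
```

The `compare/2` variant does the collation work again on every comparison, so `sort/1` is the more natural choice for long lists.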
sort_array(S) -> any()
Convert the Unicode string to the collation element array.
sort_array(C, S) -> any()
sort_key(S) -> any()
Convert the Unicode string to the sort key.
sort_key(C, S) -> any()
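One common use of sort keys is to compute each key once and then order the strings by key. The sketch below assumes the ux application is on the code path and, more importantly, that the keys returned by `sort_key/2` order correctly under Erlang's standard term comparison. That is the usual purpose of a sort key, but this page does not state it, so treat it as an assumption. The module and function names are hypothetical.

```erlang
-module(ux_sort_key_sketch).
-export([sort_by_key/1]).

%% Compute each sort key once, order the {Key, String} pairs by key,
%% then drop the keys again.
sort_by_key(Strings) ->
    C = ux_uca_options:get_options(shifted),
    Keyed = [{ux_uca:sort_key(C, S), S} || S <- Strings],
    [S || {_Key, S} <- lists:keysort(1, Keyed)].
```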