4-byte UTF-8 and 2-unit UTF-16 (surrogate pair) support required for some Unicode 15.0.0 blocks #886

Open
rovasiras opened this issue Nov 6, 2022 · 3 comments

Comments

@rovasiras

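The Unicode Standard 15.0.0 adds two important blocks:
U+10EC0 - U+10EFF Arabic Extended-C
U+1E030 - U+1E08F Cyrillic Extended-D
Both lie outside the Basic Multilingual Plane (BMP), so each character takes 4 bytes in UTF-8 and a surrogate pair (two 16-bit code units) in UTF-16.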

@rovasiras
Author

@caolanm To support this, the u8_u16 function needs the following steps: decode the 4-byte UTF-8 sequence to a UTF-32 code point, then split that code point into two surrogate code units. The u16_u8 function needs the mirrored method. The correct method is described in the Unicode FAQ "UTF-8, UTF-16, UTF-32 & BOM".
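The steps described above can be sketched roughly as follows. This is only an illustration of the standard conversion arithmetic, not the project's actual u8_u16 / u16_u8 code; the function names `decode_utf8_4byte`, `to_surrogates`, `from_surrogates` and `encode_utf8_4byte` are hypothetical and were chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: decode one 4-byte UTF-8 sequence at s[pos] to UTF-32.
// Returns std::nullopt if the bytes are not a well-formed 4-byte sequence.
std::optional<char32_t> decode_utf8_4byte(const std::string& s, std::size_t pos = 0) {
    if (pos + 4 > s.size()) return std::nullopt;
    const auto b0 = static_cast<unsigned char>(s[pos]);
    const auto b1 = static_cast<unsigned char>(s[pos + 1]);
    const auto b2 = static_cast<unsigned char>(s[pos + 2]);
    const auto b3 = static_cast<unsigned char>(s[pos + 3]);
    if ((b0 & 0xF8) != 0xF0) return std::nullopt;  // lead byte must be 11110xxx
    if ((b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80 || (b3 & 0xC0) != 0x80)
        return std::nullopt;                       // trail bytes must be 10xxxxxx
    const char32_t cp = (static_cast<char32_t>(b0 & 0x07) << 18) |
                        (static_cast<char32_t>(b1 & 0x3F) << 12) |
                        (static_cast<char32_t>(b2 & 0x3F) << 6) |
                        static_cast<char32_t>(b3 & 0x3F);
    // Reject overlong encodings and values beyond U+10FFFF.
    if (cp < 0x10000 || cp > 0x10FFFF) return std::nullopt;
    return cp;
}

// Split a supplementary-plane code point into a (high, low) surrogate pair.
std::pair<char16_t, char16_t> to_surrogates(char32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;  // leaves a 20-bit value
    return {static_cast<char16_t>(0xD800 + (cp >> 10)),     // top 10 bits
            static_cast<char16_t>(0xDC00 + (cp & 0x3FF))};  // bottom 10 bits
}

// Mirrored direction: combine a surrogate pair back into a code point.
char32_t from_surrogates(char16_t hi, char16_t lo) {
    assert(hi >= 0xD800 && hi <= 0xDBFF && lo >= 0xDC00 && lo <= 0xDFFF);
    return 0x10000 + ((static_cast<char32_t>(hi) - 0xD800) << 10) +
           (static_cast<char32_t>(lo) - 0xDC00);
}

// Encode a supplementary-plane code point as 4 UTF-8 bytes.
std::string encode_utf8_4byte(char32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    std::string out;
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
    return out;
}
```

For example, U+1E030 (the first code point of Cyrillic Extended-D) is the UTF-8 byte sequence F0 9E 80 B0 and the UTF-16 surrogate pair D838 DC30; U+10EC0 (the start of Arabic Extended-C) is F0 90 BB 80 and D803 DEC0.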

@rovasiras
Author

@laszlonemeth what do you think about it? #886 (comment)

@laszlonemeth
Contributor

A temporary and backward-compatible solution could be to use ICONV and OCONV to convert the non-BMP characters, e.g. to user-defined characters.
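One way to picture this workaround is a small generator that prints ICONV and OCONV tables for an affix file. The sketch below makes several assumptions that are not stated in the thread: that "user-defined characters" means Private Use Area code points starting at U+E000, that the affix file is UTF-8 encoded, and that the tables use the documented `ICONV count` header followed by `ICONV pattern pattern2` lines. The helpers `to_utf8` and `conv_table` are hypothetical names. Whether the conversion is applied on every code path has not been verified here.

```cpp
#include <string>

// Hypothetical helper: encode a Unicode scalar value as UTF-8 (1 to 4 bytes).
std::string to_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Build a conversion table for the code points first..last (inclusive).
// keyword "ICONV": non-BMP character -> Private Use Area stand-in (input side).
// any other keyword (e.g. "OCONV"): stand-in -> non-BMP character (output side).
// pua_first is an assumed starting point for the stand-ins, e.g. U+E000.
std::string conv_table(const std::string& keyword, char32_t first,
                       char32_t last, char32_t pua_first) {
    const bool to_pua = (keyword == "ICONV");
    std::string out = keyword + " " + std::to_string(last - first + 1) + "\n";
    for (char32_t cp = first; cp <= last; ++cp) {
        const std::string real = to_utf8(cp);
        const std::string pua = to_utf8(pua_first + (cp - first));
        out += keyword + " " +
               (to_pua ? real + " " + pua : pua + " " + real) + "\n";
    }
    return out;
}
```

Calling `conv_table("ICONV", 0x1E030, 0x1E08F, 0xE000)` and `conv_table("OCONV", 0x1E030, 0x1E08F, 0xE000)` would produce two 96-entry tables for Cyrillic Extended-D. With this approach the dictionary and affix rules themselves would presumably also have to be written with the stand-in characters, since the tables only convert input and output words.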
