Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TileDB library is not locale safe #3555

Open
jjimenezshaw opened this issue Oct 9, 2022 · 1 comment
Open

TileDB library is not locale safe #3555

jjimenezshaw opened this issue Oct 9, 2022 · 1 comment

Comments

@jjimenezshaw
Copy link

(Fully explained in PDAL/PDAL#3873 )

Many C/C++ functions like atof, strtod, to_string, stringstream, etc do take into consideration the C locale (in both directions: number to string and string to number). The C locale is a global setting that can be set with a function like std::locale::global(std::locale("de_DE.utf8")); (in this example for the German locale... that has to be installed in the OS at execution time). https://en.cppreference.com/w/cpp/locale/locale/global

If the application that uses tiledb is setting a global locale (for any reason), then maybe TileDB is not working as expected.
Just add std::locale::global(std::locale("de_DE.utf8")); at the beginning of test/src/unit.cc, and you will see that tests are failing.
Please, notice that you need the German locale de_DE.utf8 installed in your system at run time.

That happens with many locales of languages that use , (comma) as decimal separator and . as thousand separator, like German, Spanish, French, Dutch, etc.

See that the intention is not making TileDB locale aware (that is, to use the locale of the system), but the other way around: regardless the locale configured, the library should work the same way (probably reading and writing numbers in the classic locale). Unfortunately many C/C++ formatting functions do not help on that :(

./tiledb/test/tiledb_unit 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tiledb_unit is a Catch v2.13.8 host application.
Run with -? for options

-------------------------------------------------------------------------------
Backwards compatibility: Write to an array of older version
-------------------------------------------------------------------------------
/home/jshaw/jjimenezshaw/TileDB/test/src/unit-backwards_compat.cc:678
...............................................................................

/home/jshaw/jjimenezshaw/TileDB/test/src/unit-backwards_compat.cc:720: FAILED:
  REQUIRE( a_read[0] == 100 )
with expansion:
  1 == 100

-------------------------------------------------------------------------------
Backwards compatibility: Upgrades an array of older version and write/read it
-------------------------------------------------------------------------------
/home/jshaw/jjimenezshaw/TileDB/test/src/unit-backwards_compat.cc:1.220
...............................................................................

/home/jshaw/jjimenezshaw/TileDB/test/src/unit-backwards_compat.cc:1.279: FAILED:
  CHECK( fragment_info.version(0) == 1 )
with expansion:
  16 == 1

/home/jshaw/jjimenezshaw/TileDB/test/src/unit-backwards_compat.cc:1.281: FAILED:
  CHECK( fragment_info.version(1) == tiledb::sm::constants::format_version )
with expansion:
  1 == 16

/home/jshaw/jjimenezshaw/TileDB/test/src/unit-backwards_compat.cc:1.304: FAILED:
  REQUIRE( a_read2[i] == i + 11 )
with expansion:
  1 == 11

-------------------------------------------------------------------------------
C API: Test opening array at timestamp, reads
-------------------------------------------------------------------------------
/home/jshaw/jjimenezshaw/TileDB/test/src/unit-capi-array.cc:865
...............................................................................

/home/jshaw/jjimenezshaw/TileDB/test/src/unit-capi-array.cc:997: FAILED:
  CHECK( rc == 0 )
with expansion:
  -1 == 0

/home/jshaw/jjimenezshaw/TileDB/test/src/unit-capi-array.cc:1.115: FAILED:
  {Unknown expression after the reported line}
due to a fatal error condition:
  SIGSEGV - Segmentation violation signal

===============================================================================
test cases:  18 |  15 passed | 3 failed
assertions: 670 | 664 passed | 6 failed

Segmentation fault (core dumped)

Those tests worked perfectly in my computer (checkout of dev in Ubuntu 20.04) without setting the global locale to German.

See the PR I made in PDAL to see a clear example and a solution.

Unfortunately I was not able to fix it in TileDB. I do not know how the format to and from string is done in TileDB.

IMHO the proper solution is to use a dedicated parser that does not consider the locale (as many JSON parsers are doing), or use the imbue function in C++ streams with std::locale::classic(), as I did in PDAL. Changing the C locale in a library is not a good idea (it is a global variable), as it is not threadsafe, and can seriously impact the application threads (even when TileDB is run singlethreaded, there can be other threads from the main application).

@ihnorton ihnorton self-assigned this Oct 9, 2022
@ihnorton
Copy link
Member

ihnorton commented Oct 12, 2022

Hi @jjimenezshaw, thank you for opening this issue. It's an important point to address, especially for embedded library usage where we don't control the process-global locale. We are looking in to this further and will keep this issue open until we ship a fix.


SC-23074

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants