Floating point compressor: write tests that test special floating point cases #3384

abigalekim · 2022-07-22T05:40:35Z

In this PR, I add test cases that exercise special kinds of floating-point numbers in the float scaling filter.

Test cases to write:

- Zeros
- Denormalized numbers
- Infinity

TYPE: IMPROVEMENT
DESC: Floating point compressor additional testing

shortcut-integration · 2022-07-22T05:40:37Z

This pull request has been linked to Shortcut Story #19397: Floating point compressor: write tests that test special floating point cases.

jp-dark

I think subnormals should be allowed. We can discuss this further if needed.

tiledb/sm/filter/float_scaling_filter.h

tiledb/sm/filter/float_scaling_filter.cc

jp-dark · 2022-07-25T14:01:39Z

test/src/unit-cppapi-float-scaling-filter.cc

    double scale = 2.53;
    double offset = 0.138;


Why test with just these specific values?

I tried playing with random values just now and found that the compiler optimizes the result out when the values are too large. This causes a test failure that is not actually my float scaling filter code's fault, but the test's fault. I wish there were a way to compile tests with less than -O3, but alas...

BTW, this is only a problem in the C++ API test suite, not in the tile one.

I have submitted a change where the values are between -64 and 64, and this seems to pass. Perhaps we should investigate this further. I think I talked to Isaiah about this already and he just said to note that this is a lossy compressor...

the compiler optimizes the result out when the values are too large

That really doesn't sound right. Do you have specific generated values that were causing failures? (If not, it illustrates why non-repeatable random numbers are bad for testing and seedable, repeatable pseudorandom sequences are good for testing.)

"Too large" with scaling could mean that you're getting overflow, in which case the test is correctly identifying a defect in the compressor: it fails to recognize overflow. Failing on compression would be better than writing out bad data without a failure.

I checked the original PR #3243. It has an overflow/underflow defect. There's no checking at all for either condition. The reason the error is not showing up with fixed test parameters is that the data is within the realm of validity for the filter parameters.

A simple example: with 53-bit significands (double), scale = 1 and offset = 2^11, data at max_float - 2^10 will have an addition overflow.

Is it better for me to check for overflow/underflow with https://en.cppreference.com/w/cpp/numeric/fenv/feclearexcept or manually?

test/src/unit-filter-pipeline.cc


jp-dark

There are a few implicit integer-to-floating-point conversion warnings that block compiling with clang when warnings-as-errors is on.
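For context, a minimal illustration of why clang flags these conversions (the specific warning is likely `-Wimplicit-int-float-conversion`; that is an assumption, since the comment doesn't name it): a 64-bit integer can't always be represented exactly in a `double`, so writing the conversion as an explicit `static_cast` documents that the rounding is intended.

```cpp
#include <cstdint>
#include <limits>

// INT64_MAX (2^63 - 1) needs 63 significant bits, but a double carries only
// 53, so the conversion rounds up to 2^63. Leaving this conversion implicit is
// what clang warns about; the explicit cast states that rounding is expected.
inline double int64_max_as_double() {
  return static_cast<double>(std::numeric_limits<std::int64_t>::max());
}
```

A consequence worth keeping in mind for the tests: an expression like `static_cast<T>(std::numeric_limits<W>::max())` with `T = double` and `W = int64_t` yields 2^63, one more than the largest `int64_t`, so it is not a safe inclusive upper bound.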

test/src/unit-filter-pipeline.cc

test/src/unit-cppapi-float-scaling-filter.cc

ihnorton · 2022-08-16T19:59:43Z

tiledb/sm/filter/float_scaling_filter.cc

@@ -87,6 +90,19 @@ Status FloatScalingFilter::run_forward(
    const T* part_data = static_cast<const T*>(i.data());
    for (uint32_t j = 0; j < num_elems_in_part; ++j) {
      T elem = part_data[j];
+
+      // We should only handle numbers that are either normalized or 0.
+      switch (std::fpclassify(elem)) {


@abigalekim, this check needs to happen after the computation, and we need to check that the classification of the input matches the classification of the output (NaN -> NaN, infinite -> infinite).

(per @eric-hughes-tiledb -- we also need to check for integer overflow, if not done already)

We are addressing overflow in a different PR.

Wait, how would that work? We shouldn't store the input data, and since we're converting to and storing integers, it doesn't really make sense to support infinite or NaN numbers.

Check fpclassify(elem) == fpclassify( [the floating point expression inside round ] )
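A minimal sketch of the suggested check, assuming (as in the diff above) that the expression passed to `round` is `(elem - offset) / scale`; `classification_preserved` is a hypothetical name. It also shows why the zero and denormal tests start failing once the check is added: with a nonzero offset, a zero or subnormal input maps to a normal value, so the classifications differ.

```cpp
#include <cmath>

// Hypothetical helper: compares the IEEE classification (zero, subnormal,
// normal, infinite, NaN) of an input with that of the scaled expression the
// filter would pass to round().
template <typename T>
bool classification_preserved(T elem, T scale, T offset) {
  const T scaled = (elem - offset) / scale;
  return std::fpclassify(elem) == std::fpclassify(scaled);
}
```

With `scale = 2.53` and `offset = 0.138` (the values used in the C++ API test), an input of `0.0` becomes roughly `-0.0545`, which is `FP_NORMAL`, so the check rejects it; infinities and NaN keep their classification.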

Because of this change, the zero and denormalized array tests fail. Is this expected? Should I make the change in this PR? There are other PRs coming up to address this specific change.

eric-hughes-tiledb

There remain major problems with this PR. The following are all blocking problems for me. I will continue to request changes on this PR until they're fixed.

- There's no documentation about which tests do what. These tests are subtle enough that we need documentation, both for the tests and for the test support classes.
  - The documentation needs to include inequalities showing that the generated data won't overflow if it's not supposed to overflow (or will overflow, if that's what's being tested).
- There's a large amount of copypasta. Even some of the comments are copypasta. It's a bad idea to create a maintenance burden in fresh code. Where there's copypasta, it needs to be turned into support functions or classes.
  - The constructor of a PRNG class will need a seed value as an argument.
  - For slightly different tests inside basically the same loop, a lambda is appropriate to instantiate the difference.
- Problems with the random numbers used to generate test cases:
  - Most importantly, there are tests that use non-repeatable seeds for the PRNG. These are unacceptable in unit tests, exactly because they are non-repeatable. One consequence is that every reference to std::random_device needs to be eliminated.
    - Printing out a seed is not an acceptable substitute by itself. In concert with other mechanisms that make replication reliable and straightforward, it might be; I'd need to see it.
  - Continued use of std::mt19937, a 32-bit generator, to generate 64-bit numbers. Use std::mt19937_64.
- Non-C.41 use of set_option. The filter class already has a good constructor; use it.
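A minimal sketch of the test support the review asks for, with assumed names: a PRNG wrapper whose constructor requires a seed and uses `std::mt19937_64`, plus a driver that takes the per-test difference as a lambda. `SeededTestPrng` and `for_each_random_value` are hypothetical, not existing TileDB helpers.

```cpp
#include <cstdint>
#include <random>

// Hypothetical test-support class: the seed is a required constructor
// argument, so every generated sequence is reproducible from the test source.
class SeededTestPrng {
 public:
  explicit SeededTestPrng(std::uint64_t seed)
      : engine_(seed) {
  }

  // Uniformly distributed double in [lo, hi).
  double uniform(double lo, double hi) {
    std::uniform_real_distribution<double> dist(lo, hi);
    return dist(engine_);
  }

 private:
  std::mt19937_64 engine_;  // 64-bit Mersenne Twister
};

// Hypothetical driver: the shared loop lives here, and the part that differs
// between tests is passed in as a callable (typically a lambda).
template <typename Check>
void for_each_random_value(
    SeededTestPrng& prng, int count, double lo, double hi, Check check) {
  for (int i = 0; i < count; ++i) {
    check(prng.uniform(lo, hi));
  }
}
```

One design caveat: the standard fully specifies the engines but not the distributions, so a fixed seed reproduces the same values within one standard library implementation, but not necessarily across libstdc++, libc++, and MSVC.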

eric-hughes-tiledb · 2022-08-17T13:24:35Z

test/src/unit-cppapi-float-scaling-filter.cc

+
+    f.set_option(TILEDB_SCALE_FLOAT_BYTEWIDTH, &byte_width)
+        .set_option(TILEDB_SCALE_FLOAT_FACTOR, &scale)
+        .set_option(TILEDB_SCALE_FLOAT_OFFSET, &offset);


Do not use non-C.41 functions like set_option when a C.41 constructor is available. Generate PRNG-originated test values first and construct the filter object (with the full constructor) second.
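To illustrate the ordering the reviewer describes, here is a sketch with a hypothetical stand-in class (not TileDB's `FloatScalingFilter` or the C++ API `Filter`, and its validation rules are assumptions): the test values are drawn first, then passed to a constructor that produces a fully initialized, validated object, per C++ Core Guidelines C.41, so no half-configured object ever exists.

```cpp
#include <cstdint>
#include <random>
#include <stdexcept>

// Hypothetical stand-in for a scaling-filter configuration. Per C.41, the
// constructor creates a fully initialized object; there are no setters.
class ScalingConfig {
 public:
  ScalingConfig(std::uint64_t byte_width, double scale, double offset)
      : byte_width_(byte_width)
      , scale_(scale)
      , offset_(offset) {
    if (byte_width != 1 && byte_width != 2 && byte_width != 4 &&
        byte_width != 8) {
      throw std::invalid_argument("byte width must be 1, 2, 4, or 8");
    }
    if (scale == 0.0) {
      throw std::invalid_argument("scale must be nonzero");
    }
  }
  std::uint64_t byte_width() const { return byte_width_; }
  double scale() const { return scale_; }
  double offset() const { return offset_; }

 private:
  std::uint64_t byte_width_;
  double scale_;
  double offset_;
};

// Draw the parameters first, then construct the object in a single step.
inline ScalingConfig make_random_config(std::mt19937_64& engine) {
  std::uniform_real_distribution<double> scale_dist(0.5, 64.0);
  std::uniform_real_distribution<double> offset_dist(-64.0, 64.0);
  const double scale = scale_dist(engine);
  const double offset = offset_dist(engine);
  return ScalingConfig(8, scale, offset);
}
```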

eric-hughes-tiledb · 2022-08-17T13:49:37Z

test/src/unit-cppapi-float-scaling-filter.cc

-    f.set_option(TILEDB_SCALE_FLOAT_OFFSET, &offset);
+    INFO(
+        "Scale: " + std::to_string(scale) + ", Offset: " +
+        std::to_string(offset) + ", Byte Width: " + std::to_string(byte_width));


Printing out the seeds is not a substitute for using a fixed seed. Replication of an error would require manual intervention in a debugger or custom code, neither of which counts as "actually repeatable".

eric-hughes-tiledb · 2022-08-17T14:01:21Z

test/src/unit-cppapi-float-scaling-filter.cc

+    T dis_max = std::min(
+        std::numeric_limits<T>::max(),
+        static_cast<T>(std::numeric_limits<W>::max()));
+    std::uniform_real_distribution<T> dis(dis_min, dis_max);


This distribution has two problems.

It's going to overflow on generated values in all cases. The scale is in the interval [-64, 64], so values within [max/64, max] will overflow on multiplication. The offset adds to the complication. All of this needs documentation with an inequality estimate showing that the generated values won't cause an overflow.

It's going to overflow on generated values for certain type arguments T and W, for example double and int8_t.
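A worked version of the kind of inequality requested, under assumptions: the forward transform is $q = \operatorname{round}((x - o)/s)$ with scale $s$ and offset $o$, stored in a signed integer type $W$ with range $[W_{\min}, W_{\max}]$, and $T_{\max}$ is the largest finite value of the floating-point type. Sufficient conditions for no overflow:

```latex
\begin{aligned}
\text{(1)}\quad & |x| + |o| \le T_{\max}
  && \Rightarrow\ x - o \text{ does not overflow in } T, \\
\text{(2)}\quad & W_{\min} \le \frac{x - o}{s} \le W_{\max}
  && \Rightarrow\ W_{\min} \le \operatorname{round}\!\left(\frac{x - o}{s}\right) \le W_{\max}, \\
\text{(2')}\quad & o + s\,W_{\min} \le x \le o + s\,W_{\max}
  && \text{(equivalent to (2) when } s > 0\text{; endpoints swap when } s < 0\text{)}.
\end{aligned}
```

Condition (2) suffices because $W_{\min}$ and $W_{\max}$ are integers and rounding is monotone. For example, with $W$ = int8_t ($[-128, 127]$), $s = 0.5$ and $o = 0$, condition (2') allows only $x \in [-64, 63.5]$, while a distribution whose upper bound is static_cast<T>(std::numeric_limits<W>::max()) draws values up to 127. Since the endpoints are themselves computed in floating point, a test should also keep a small margin inside them.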

If you're not seeing failures, it's because you're getting lucky and not generating enough values. With int dim_hi = 10, only 100 values are being generated, which isn't enough to see problems.
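A sketch of a generator that derives the data range from the filter parameters instead of from the type limits, keeping $(x - \text{offset}) / \text{scale}$ inside $[W_{\min}, W_{\max}]$, i.e. $x \in [\text{offset} + \text{scale} \cdot W_{\min},\ \text{offset} + \text{scale} \cdot W_{\max}]$. It assumes the forward transform `round((x - offset) / scale)`, a signed `W`, `scale > 0`, and an offset whose magnitude is modest relative to `scale` times the range of `W` (like the [-64, 64] values in this PR), so that the small relative margin absorbs rounding in the endpoint arithmetic. `safe_input_range` and `generate_inputs` are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: interval of inputs x for which (x - offset) / scale
// stays within [W::min(), W::max()], assuming scale > 0. A relative margin
// of 1e-6 keeps generated values away from the exact limits.
template <typename T, typename W>
std::pair<T, T> safe_input_range(T scale, T offset) {
  const T w_min = static_cast<T>(std::numeric_limits<W>::min());
  const T w_max = static_cast<T>(std::numeric_limits<W>::max());
  const T margin = static_cast<T>(1) - static_cast<T>(1e-6);
  return {offset + scale * w_min * margin, offset + scale * w_max * margin};
}

// Hypothetical helper: draws `count` inputs from the safe range using a
// fixed-seed 64-bit engine, so any failure can be replayed exactly.
template <typename T, typename W>
std::vector<T> generate_inputs(
    T scale, T offset, std::size_t count, std::uint64_t seed) {
  const auto [lo, hi] = safe_input_range<T, W>(scale, offset);
  std::mt19937_64 engine(seed);
  std::uniform_real_distribution<T> dist(lo, hi);
  std::vector<T> out(count);
  for (auto& x : out) {
    x = dist(engine);
  }
  return out;
}
```

Generating thousands of values per parameter set, rather than 100, makes it much more likely that values near the edges of the range are actually exercised.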

eric-hughes-tiledb · 2022-08-17T14:11:26Z

test/src/unit-filter-pipeline.cc

@@ -3963,7 +3964,7 @@ TEST_CASE("Filter: Test encryption", "[filter][encryption]") {
 }

 template <typename FloatingType, typename IntType>
-void testing_float_scaling_filter() {
+void testing_float_scaling_filter(bool negative) {


Adding an argument without using it?

eric-hughes-tiledb · 2022-08-17T14:24:55Z

test/src/unit-filter-pipeline.cc

@@ -3963,7 +3964,7 @@ TEST_CASE("Filter: Test encryption", "[filter][encryption]") {
 }

 template <typename FloatingType, typename IntType>
-void testing_float_scaling_filter() {
+void testing_float_scaling_filter(bool negative) {


This function is extremely poorly named. What's it testing?
It needs both a new name and documentation.

eric-hughes-tiledb · 2022-08-17T14:30:19Z

test/src/unit-filter-pipeline.cc

+  }
+}
+
+TEMPLATE_TEST_CASE(


None of these tests belong in unit-filter-pipeline. They're not testing FilterPipeline.

eric-hughes-tiledb · 2022-08-17T14:33:35Z

test/src/unit-filter-pipeline.cc

+
+  Tile tile;
+  Datatype t = Datatype::FLOAT32;
+  switch (sizeof(FloatingType)) {


This switch statement is copypasta. Figure out how to get rid of it.
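One way to remove the switch on sizeof(FloatingType), sketched with a hypothetical local `Datatype` stand-in (TileDB's real Datatype enum has many more members): map the C++ type to its datatype at compile time with a trait, so each test instantiation looks the value up instead of branching.

```cpp
// Hypothetical stand-in for the two relevant members of TileDB's Datatype.
enum class Datatype { FLOAT32, FLOAT64 };

// The primary template is declared but not defined, so using an unsupported
// type is a compile-time error.
template <typename T>
struct datatype_of;

template <>
struct datatype_of<float> {
  static constexpr Datatype value = Datatype::FLOAT32;
};

template <>
struct datatype_of<double> {
  static constexpr Datatype value = Datatype::FLOAT64;
};

template <typename T>
inline constexpr Datatype datatype_of_v = datatype_of<T>::value;
```

In a templated test this becomes `Datatype t = datatype_of_v<FloatingType>;`, and an unsupported type fails to compile rather than relying on the `FLOAT32` initial value.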

eric-hughes-tiledb · 2022-08-17T14:37:33Z

test/src/unit-filter-pipeline.cc

+  FilterPipeline pipeline;
+  ThreadPool tp(4);
+  CHECK(pipeline.add_filter(FloatScalingFilter()).ok());
+  pipeline.get_filter<FloatScalingFilter>()->set_option(


More non-C.41

eric-hughes-tiledb · 2022-08-17T14:42:17Z

There remain major problems with this PR.

There's more. That's all I have time for this morning.

abigalekim added 2 commits July 15, 2022 18:23

wip

5c0b1b4

this took a surprisingly long time to get how to do

eb895d2

abigalekim requested a review from ihnorton July 22, 2022 05:40

make format [skip ci]

002dfd9

abigalekim marked this pull request as draft July 22, 2022 05:42

zeroes added

0a30ffc

abigalekim marked this pull request as ready for review July 22, 2022 18:43

abigalekim requested a review from jp-dark July 22, 2022 19:25

fixing build

48b2e5c

jp-dark reviewed Jul 25, 2022


addressed julia's comments

4e67c4a

abigalekim requested a review from jp-dark July 26, 2022 03:17

abigalekim added 4 commits August 5, 2022 00:56

merge changes

0dcd96a

Merge branch 'dev' of https://github.com/TileDB-Inc/TileDB into abiga…

7bdc7fa

…lekim/sc-19397/floating-point-compressor-write-tests-that

added tests

3458198

make format

9049675

abigalekim requested a review from eric-hughes-tiledb August 9, 2022 14:10

jp-dark approved these changes Aug 15, 2022


jp-dark requested changes Aug 16, 2022


test/src/unit-filter-pipeline.cc (outdated, resolved)

test/src/unit-filter-pipeline.cc (outdated, resolved)

test/src/unit-filter-pipeline.cc (outdated, resolved)

test/src/unit-cppapi-float-scaling-filter.cc (resolved)

ihnorton reviewed Aug 16, 2022


addressed changes

2f24142

abigalekim requested a review from jp-dark August 17, 2022 03:57

eric-hughes-tiledb requested changes Aug 17, 2022


wip [skip ci]

71b86a9

ihnorton removed the request for review from jp-dark February 17, 2023 19:04

abigalekim added 3 commits January 24, 2024 18:07

rebasing

7550ab3

wip

24e3f0f

refactor

70aa10f


abigalekim commented Jul 22, 2022 (edited)

shortcut-integration bot commented Jul 22, 2022

jp-dark left a comment

jp-dark Jul 25, 2022

abigalekim Jul 26, 2022 (edited)

eric-hughes-tiledb Jul 26, 2022

eric-hughes-tiledb Jul 26, 2022

abigalekim Aug 5, 2022

jp-dark left a comment

ihnorton Aug 16, 2022 (edited)

ihnorton Aug 16, 2022

abigalekim Aug 16, 2022

abigalekim Aug 17, 2022

ihnorton Aug 17, 2022

abigalekim Aug 17, 2022

eric-hughes-tiledb left a comment

eric-hughes-tiledb Aug 17, 2022

eric-hughes-tiledb Aug 17, 2022

eric-hughes-tiledb Aug 17, 2022

eric-hughes-tiledb Aug 17, 2022

eric-hughes-tiledb Aug 17, 2022

eric-hughes-tiledb Aug 17, 2022

eric-hughes-tiledb Aug 17, 2022

eric-hughes-tiledb Aug 17, 2022

eric-hughes-tiledb commented Aug 17, 2022
