
Processes for Event-based data loading and pre-processing #514

Open · wants to merge 42 commits into base: main
Conversation

@gkarray (Contributor) commented on Dec 2, 2022:

Issue Number:

Objective of pull request: This PR adds Processes for Event-based data loading and pre-processing.

Pull request checklist

Your PR fulfills the following requirements:

  • Issue created that explains the change and why it's needed
  • Tests are part of the PR (for bug fixes / features)
  • Docs reviewed and added / updated if needed (for bug fixes / features)
  • PR conforms to Coding Conventions
  • PR applies BSD 3-clause or LGPL2.1+ licenses to all code files
  • Lint (flakeheaven lint src/lava tests/) and (bandit -r src/lava/.) pass locally
  • Build tests (pytest) pass locally

Pull request type

Please check your PR type:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation changes
  • Other (please describe):

What is the current behavior?

  • No standard Processes for handling Event-based data.

What is the new behavior?

Standard Processes for handling Event-based data (loading and pre-processing) would become available.

  • Currently supported Event IO Processes:
    • AedatStream: Streams event data loaded from aedat4 files.
    • DvStream: Streams event data from the DV software for iniVation cameras, through a TCP connection.
  • Currently supported Event pre-processor Processes:
    • BinaryToUnaryPolarity: Converts binary polarity events (0 for negative events, 1 for positive events) to unary polarity events (1 for both negative and positive events).
    • EventsToFrame: Transforms a sparsely represented list of events into a densely represented event frame (see the sketch after this list). Two modes of operation are supported:
      • 2D output (shape (W, H, 1)): Input is assumed to be unary. In the output, 0 means no event and 1 means an event in the single channel; negative and positive polarities are both encoded as 1.
      • 3D output (shape (W, H, 2)): Input is assumed to be binary. In the output, the first channel encodes negative events and the second channel encodes positive events, with 0 meaning no event and 1 meaning an event.
  • A more general Process that can also serve as an Event pre-processor:
    • MaxPooling: Applies the max-pooling operation to incoming data of shape (W, H, C), where C is not necessarily 1 or 2.
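
For illustration, a minimal NumPy sketch of the dense encoding described above. This is not the PR's implementation; the event layout (parallel x/y coordinate arrays plus a polarity array) is an assumption:

import numpy as np

def events_to_frame(xs, ys, polarities, width, height,
                    binary_input=True):
    # Densify a sparse event list into a (W, H, C) frame.
    if binary_input:
        # Binary input: channel 0 collects negative events
        # (polarity 0), channel 1 collects positive events
        # (polarity 1).
        frame = np.zeros((width, height, 2))
        frame[xs, ys, polarities] = 1
    else:
        # Unary input: any event sets the single channel to 1;
        # negative and positive polarities are both encoded as 1.
        frame = np.zeros((width, height, 1))
        frame[xs, ys, 0] = 1
    return frame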

Does this introduce a breaking change?

  • Yes
  • No

Supplemental information
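
For illustration only, a rough sketch of how these Processes might be wired together. The import paths, constructor arguments, and port names below are assumptions, not the PR's actual signatures (the folder layout is still under discussion in this PR); RunSteps and Loihi1SimCfg are standard Lava APIs.

from lava.magma.core.run_conditions import RunSteps
from lava.magma.core.run_configs import Loihi1SimCfg
# Hypothetical import paths -- see the actual module layout in this PR.
from lava.proc.event_data.aedat_stream import AedatStream
from lava.proc.event_data.binary_to_unary import BinaryToUnaryPolarity
from lava.proc.event_data.events_to_frame import EventsToFrame
from lava.proc.event_data.max_pooling import MaxPooling

# Hypothetical parameter and port names, for illustration only.
stream = AedatStream(file_path="recording.aedat4", shape_out=(3000,))
to_unary = BinaryToUnaryPolarity(shape=(3000,))
to_frame = EventsToFrame(shape_in=(3000,), shape_out=(240, 180, 1))
pooling = MaxPooling(shape_in=(240, 180, 1), kernel_size=(4, 4))

# Events flow: file -> unary polarity -> dense frame -> pooled frame.
stream.out_port.connect(to_unary.in_port)
to_unary.out_port.connect(to_frame.in_port)
to_frame.out_port.connect(pooling.in_port)

pooling.run(condition=RunSteps(num_steps=10), run_cfg=Loihi1SimCfg())
pooling.stop()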

@mathisrichter (Contributor) left a comment:
There is a lot here. So far, I have only had time to review the AedatDataLoader and its tests. This needs extensive revision but most of the items are small or cosmetic. Please make sure you understand the issues and fix other areas of this PR that are affected by similar problems.

I will review other parts of the PR later, ideally once you have already updated it.

pyproject.toml — review comment resolved
# Stopping
data_loader.stop()

self.assertFalse(data_loader.runtime._is_running)
Contributor:
Can be removed. Replace with asserts on the output - see my comment above.
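
For instance, the test could record the loader's output with Lava's io sink RingBuffer and assert on the received data. A rough sketch, where the port name out_port, the shapes, and expected_data are placeholders for whatever the test already defines:

import numpy as np
from lava.proc.io.sink import RingBuffer
from lava.magma.core.run_conditions import RunSteps
from lava.magma.core.run_configs import Loihi1SimCfg

num_steps = 9
# Record what the data loader emits instead of only checking
# that the runtime has stopped.
sink = RingBuffer(shape=data_loader.out_port.shape, buffer=num_steps)
data_loader.out_port.connect(sink.a_in)

data_loader.run(condition=RunSteps(num_steps=num_steps),
                run_cfg=Loihi1SimCfg())
received = sink.data.get()
data_loader.stop()

# Assert on the actual output rather than on runtime internals.
np.testing.assert_array_equal(received[..., 0], expected_data[0])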

Comment on lines 118 to 141
data_history = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0],
    [1, 1, 1],
    [1],
    [1],
    [1]
]
indices_history = [
    [1597, 2308, 2486, 2496, 2498, 1787, 2642, 2633, 2489,
     2488, 1596, 1729, 1727, 2500, 1780],
    [1600, 1732, 2297, 1388, 2290, 2305, 3704, 3519, 1911],
    [7138, 2301, 2471, 1601, 2982, 1364, 1379, 1386, 1384,
     2983, 1390, 2289, 1401, 1362, 2293],
    [1910, 1382, 1909, 1562, 1606, 1381],
    [464],
    [2323, 1908, 1393],
    [4062],
    [1792],
    [3889]
]
Contributor:
I'm assuming this comes from the DVS file we are using. Can we reduce this in size to make the test smaller and more to the point?

@j0shcd commented on Dec 5, 2022:
Now only using data from a single timestep

Contributor:
That may not be enough, then. The test should make sure that we can read events from multiple time steps, with different polarities, at different positions/indices. The test data should cover all of those cases (and others, if I forgot something) but does not need more. In the previous version, it seemed to contain quite a few more cases.

@j0shcd commented on Dec 6, 2022:
Mmmh, good point. A good middle ground could then be to go up to time step 5 (with this data). We tried to reduce the resolution of the aedat4 file, but using the "crop scale" options in DV for resizing leads to weird behavior in the file's metadata: reducing the resolution doesn't actually change the x and y indices in the expected way. Do you think it's worth cropping another part of the video where we may get multiple polarities and varying event batch sizes in the first 2-3 events?

Comment on lines 170 to 174
if expected_data.shape[0] > max_num_events:
    data_idx_array = np.arange(0, expected_data.shape[0])
    sampled_idx = rng.choice(data_idx_array,
                             max_num_events,
                             replace=False)
Contributor:
This seems to reimplement what you are testing, which is not a good way to test. Instead, make sure that the outcome is correct. Here, it should be fine to just assert that after subsampling, the number of events is the maximum. If that is not enough, you could hard-code the expected events after subsampling.

Additionally, you could write another test that runs the Process twice with different seeds and asserts that the events have been subsampled differently. This makes sure that the randomness works.
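
A compact sketch of both suggestions, assuming a hypothetical helper run_and_collect(seed) that runs the data loader with the given seed and returns the received events:

# 1) Outcome check: after subsampling, exactly max_num_events
#    events remain.
received = run_and_collect(seed=0)
self.assertEqual(received.shape[0], max_num_events)

# 2) Randomness check: different seeds should subsample
#    different events.
received_other = run_and_collect(seed=1)
self.assertFalse(np.array_equal(received, received_other))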


self.assertFalse(data_loader.runtime._is_running)

def test_end_of_file(self):
Contributor:
What does this test do?


self.assertFalse(data_loader.runtime._is_running)

def test_index_encoding(self):
Contributor:
What does this test do?

mathisrichter and others added 5 commits December 8, 2022 15:45
Signed-off-by: Mathis Richter <mathis.richter@intel.com>
@weidel-p (Contributor) left a comment:

After the feedback from @mathisrichter, I like most of it.
All classes and functions are documented and, as far as I can see, tested.

I would like to change the folder structure, though. I dislike the "event_data" folder as the dataloader should be part of IO.
My suggestion:

lava/proc/io/
    event_data/
        aedat_stream.py
        dv_stream.py
    sink.py
    ...

lava/proc/transformations/
    event_data/
        binary_to_unary.py
        event_to_frame.py 
    max_pooling.py
