Processes for Event-based data loading and pre-processing #514
Conversation
There is a lot here. So far, I have only had time to review the AedatDataLoader and its tests. This needs extensive revision, but most of the items are small or cosmetic. Please make sure you understand the issues and fix other areas of this PR that are affected by similar problems.
I will review other parts of the PR later, ideally once you have already updated it.
src/lava/proc/event_data/event_data_loader/aedat_data_loader.py (four outdated review comments, resolved)
```python
# Stopping
data_loader.stop()

self.assertFalse(data_loader.runtime._is_running)
```
Can be removed. Replace with asserts on the output - see my comment above.
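A hedged sketch of what asserting on the output could look like; the `received_data` / `received_indices` names are hypothetical stand-ins for whatever sink Process records the loader's output, and the expected values would come from hard-coded ground truth like the history lists quoted below:

```python
import numpy as np

# Hypothetical: received_data / received_indices would be collected by a
# sink Process connected to the loader's out port; the expected_* lists
# hold hard-coded ground truth per time step.
for received, expected in zip(received_data, expected_data_history):
    np.testing.assert_array_equal(received, np.asarray(expected))
for received, expected in zip(received_indices, expected_indices_history):
    np.testing.assert_array_equal(received, np.asarray(expected))
```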
```python
data_history = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0],
    [1, 1, 1],
    [1],
    [1],
    [1]
]
indices_history = [
    [1597, 2308, 2486, 2496, 2498, 1787, 2642, 2633, 2489,
     2488, 1596, 1729, 1727, 2500, 1780],
    [1600, 1732, 2297, 1388, 2290, 2305, 3704, 3519, 1911],
    [7138, 2301, 2471, 1601, 2982, 1364, 1379, 1386, 1384,
     2983, 1390, 2289, 1401, 1362, 2293],
    [1910, 1382, 1909, 1562, 1606, 1381],
    [464],
    [2323, 1908, 1393],
    [4062],
    [1792],
    [3889]
]
```
I'm assuming this comes from the DVS file we are using. Can we reduce this in size to make the test smaller and more to the point?
Now only using data from a single time step.
That may not be enough, then. The test should make sure that we can read events from multiple time steps, with different polarities, at different positions/indices. The test data should cover all of those cases (and others, if I forgot something) but does not need more. In the previous version, it seemed to contain a bunch more cases.
Mmmh, good point. Then a good middle ground could be to go up to time step 5 (with this data). We tried to reduce the resolution of the aedat4 file, but using the "crop scale" options in dv for resizing leads to weird behavior in the file's metadata: reducing the resolution doesn't actually change the x and y indices in the expected way. Do you think it's worth cropping another part of the video where we might get multiple polarities and varying event batch sizes in the first 2-3 time steps?
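If cropping the recording turns out to be too fiddly, a hand-crafted event array is one way to get exactly that coverage. A sketch, assuming dv-style field names (which may not match the format the loader actually parses):

```python
import numpy as np

# Minimal synthetic coverage: multiple time steps, both polarities,
# varied positions, and varying batch sizes per time step.
events = np.array(
    [(1000, 10, 20, 1),   # time step 1: positive polarity
     (1000, 11, 20, 0),   # time step 1: negative polarity (batch of 2)
     (2000, 30, 5, 1),    # time step 2: different position (batch of 1)
     (3000, 0, 0, 0)],    # time step 3: corner pixel
    dtype=[("timestamp", "<i8"), ("x", "<i2"),
           ("y", "<i2"), ("polarity", "i1")])
```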
```python
if expected_data.shape[0] > max_num_events:
    data_idx_array = np.arange(0, expected_data.shape[0])
    sampled_idx = rng.choice(data_idx_array,
                             max_num_events,
                             replace=False)
```
This seems to reimplement what you are testing, which is not a good way to test. Instead, make sure that the outcome is correct. Here, it should be fine to just assert that after subsampling, the number of events equals the maximum. If that is not enough, you could hard-code the expected events after subsampling.
Additionally, you could write another test that runs the Process twice with different seeds and asserts that the events have been subsampled differently. This makes sure that the randomness works.
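A sketch of those two checks; `run_loader_and_collect` is a hypothetical helper (not part of this PR) that runs the Process with a given seed and returns the received event indices:

```python
import numpy as np

# 1) Outcome check: after subsampling, exactly max_num_events remain.
received_indices = run_loader_and_collect(seed=0)  # hypothetical helper
self.assertEqual(received_indices.shape[0], max_num_events)

# 2) Randomness check: different seeds subsample differently.
other_indices = run_loader_and_collect(seed=1)
self.assertFalse(np.array_equal(received_indices, other_indices))
```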
```python
self.assertFalse(data_loader.runtime._is_running)

def test_end_of_file(self):
```
What does this test do?
```python
self.assertFalse(data_loader.runtime._is_running)

def test_index_encoding(self):
```
What does this test do?
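For context, "index encoding" presumably refers to flattening (x, y) event coordinates into the single indices seen in `indices_history` above. A sketch; the (240, 180) resolution and C-order flattening are assumptions for illustration, not confirmed by this PR:

```python
import numpy as np

# Flatten (x, y) coordinates into single indices for the sparse output.
xs = np.array([10, 20])
ys = np.array([5, 7])
flat_indices = np.ravel_multi_index((xs, ys), dims=(240, 180))
# -> array([1805, 3607]), i.e. x * 180 + y
```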
Commit timeline (condensed):
- …/lava into dev/event_data_processes (Conflicts: src/lava/proc/event_data/io/dv_stream.py)
- …/lava into dev/event_data_processes (Conflicts: tests/lava/proc/event_data/io/test_dv_stream.py)
- …/lava into dev/event_data_processes (three further merges)
- Signed-off-by: Mathis Richter <mathis.richter@intel.com> (six commits)
After the feedback from @mathisrichter, I like most of it. All classes and functions are documented and, as far as I can see, tested.
I would like to change the folder structure, though. I dislike the "event_data" folder, as the data loader should be part of IO.
My suggestion:
```
lava/proc/io/
    event_data/
        aedat_stream.py
        dv_stream.py
        sink.py
        ...
lava/proc/transformations/
    event_data/
        binary_to_unary.py
        event_to_frame.py
        max_pooling.py
```
Issue Number:
Objective of pull request: This PR adds Processes for Event-based data loading and pre-processing.
Pull request checklist
Your PR fulfills the following requirements:
- (`flakeheaven lint src/lava tests/`) and (`bandit -r src/lava/.`) pass locally
- (`pytest`) passes locally

Pull request type
Please check your PR type:
What is the current behavior?
What is the new behavior?
Standard Processes for handling Event-based data (loading and pre-processing) would become available.
Negative and positive polarities are both encoded as 1.
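A sketch of that polarity behavior as a binary-to-unary step (the actual implementation in this PR may differ):

```python
import numpy as np

# Both polarities (0 = negative, 1 = positive) are collapsed to 1,
# i.e. only the presence of an event is kept, not its sign.
polarities = np.array([0, 1, 1, 0])
unary = np.ones_like(polarities)  # -> array([1, 1, 1, 1])
```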
Does this introduce a breaking change?
Supplemental information