Merge pull request #1399 from s-trinh/add_NPZ_doc_tutorial
Add documentation and tutorial about NumPy NPZ format
fspindle committed May 13, 2024
2 parents 5a34d72 + d48b8b4 commit f1df5cb
Showing 15 changed files with 1,434 additions and 206 deletions.
124 changes: 124 additions & 0 deletions doc/tutorial/misc/tutorial-npz.dox
@@ -0,0 +1,124 @@
/**
\page tutorial-npz Tutorial: Read / Save arrays of data from / to NPZ file format

\tableofcontents

\section tuto-npz-intro Introduction

\note
Please refer to the <a href="tutorial-read-write-NPZ-format.html">Python tutorial</a> for a short overview of the NPZ
format from a Python point of view.


The NPY / NPZ ("a zip file containing multiple NPY files") file format is a "standard binary file format in NumPy",
appropriate for binary serialization of large chunks of data.
A description of the NPY format is available
<a href="https://numpy.org/doc/stable/reference/generated/numpy.lib.format.html">here</a>.
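
To make the layout more concrete, the following Python sketch writes and reads a version 1.0 NPY payload by hand, using
only the standard library, and stores two of them in a zip archive the way an NPZ file does. It is a minimal
illustration under stated assumptions (1-D little-endian `float64` arrays only); the helper names `npy_bytes` and
`read_npy` are hypothetical and are not part of NumPy, cnpy or ViSP.

```python
import ast
import io
import struct
import zipfile

MAGIC = b"\x93NUMPY"  # 6-byte magic string that opens every NPY file


def npy_bytes(values):
    """Serialize a list of floats as a version 1.0 NPY file (1-D, little-endian float64)."""
    header = "{'descr': '<f8', 'fortran_order': False, 'shape': (%d,), }" % len(values)
    # Pad with spaces so that magic + version + length field + header (ending
    # with a newline) has a total size that is a multiple of 64 bytes.
    unpadded = len(MAGIC) + 2 + 2 + len(header) + 1
    header += " " * (-unpadded % 64) + "\n"
    return (MAGIC + bytes([1, 0]) + struct.pack("<H", len(header))
            + header.encode("latin1")
            + struct.pack("<%dd" % len(values), *values))


def read_npy(raw):
    """Parse the payload produced by npy_bytes() and return (metadata, values)."""
    if raw[:6] != MAGIC:
        raise ValueError("not an NPY payload")
    (header_len,) = struct.unpack("<H", raw[8:10])
    meta = ast.literal_eval(raw[10:10 + header_len].decode("latin1"))
    count = meta["shape"][0]
    values = struct.unpack("<%dd" % count, raw[10 + header_len:])
    return meta, list(values)


# An NPZ file is a zip archive holding one ".npy" member per saved array.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("x.npy", npy_bytes([0.0, 0.5, 1.0]))
    archive.writestr("y.npy", npy_bytes([0.1, 0.2]))

with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    meta, x = read_npy(archive.read("x.npy"))

print(names, meta["descr"], x)
```

In practice the header is produced and parsed by NumPy or cnpy; the sketch only shows that the format is a small text
header followed by the raw array bytes.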

The C++ implementation of this binary format relies on the <a href="https://github.com/rogersce/cnpy">rogersce/cnpy</a>
library, available under the MIT license. Additional example code can be found directly from the
<a href="https://github.com/rogersce/cnpy/blob/master/example1.cpp">rogersce/cnpy repository</a>.


\subsection tuto-npz-intro-comparison Comparison with some other file formats

The NPZ binary format is intended to provide a quick and efficient means to read/save large arrays of data, mostly for
debugging purposes. While the first and most direct option for saving data would be to use a text file, the choice of
the NPZ format presents the following advantages:
- it is a binary format, so the resulting file is smaller than the equivalent plain text file (especially
with floating-point numbers),
- it provides an exact floating-point representation, so there is no need to bother with floating-point precision
(see for instance the <a href="https://en.cppreference.com/w/cpp/io/manip/setprecision">setprecision</a> or
<a href="https://en.cppreference.com/w/cpp/io/manip/fixed">std::hexfloat</a> manipulators),
- it provides some basic compatibility with the NumPy NPZ format
(<a href="https://numpy.org/doc/stable/reference/generated/numpy.load.html">numpy.load</a> and
<a href="https://numpy.org/doc/stable/reference/generated/numpy.savez.html">numpy.savez</a>),
- large arrays of data can be easily appended, with support for multi-dimensional arrays.
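
The floating-point argument can be illustrated with a short Python sketch that uses only the standard library. It
compares a text round trip at 6 significant digits (the default precision of C++ output streams) with a binary round
trip of the 8 bytes of a `double`. The variable names are illustrative only.

```python
import struct

value = 0.1 + 0.2  # stored as 0.30000000000000004 in double precision

# Text round trip with 6 significant digits: the low-order bits are lost.
as_text = "%.6g" % value
from_text = float(as_text)

# Binary round trip: the 8 bytes of the IEEE 754 double are written as-is.
as_bytes = struct.pack("<d", value)
(from_bytes,) = struct.unpack("<d", as_bytes)

print(as_text, from_text == value)          # text loses information
print(len(as_bytes), from_bytes == value)   # binary is exact
print(len(repr(value)))                     # characters needed for a lossless text form
```

A lossless text representation is possible, but it requires more characters per value than the 8 bytes of the binary
form and an explicit choice of precision when writing.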

On the other hand, the main disadvantages are:
- it is not a human-readable format: it is suitable for saving large arrays of data, but not for easy debugging by
direct inspection of the file,
- saving `string` data is not direct, since it must be treated as a vector of `char` data,
- the current implementation only works on little-endian platforms (which is the dominant endianness nowadays).
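
As a reminder of what the endianness restriction means, this Python sketch (standard library only) packs the same
32-bit integer in both byte orders. Reading bytes written in one order as if they were in the other yields a different
value, which is why a reader and a writer must agree on the byte order.

```python
import struct
import sys

value = 1
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

# Interpreting big-endian bytes as little-endian gives a wrong value.
(misread,) = struct.unpack("<I", big)

print(little, big, misread)
print("host byte order:", sys.byteorder)
```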

You can refer to this Wikipedia page for an exhaustive comparison of
<a href="https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats">data-serialization formats</a>.


\section tuto-npz-hands-on Hands-on

\subsection tuto-npz-hands-on-save-string How to save/read string data

Saving C++ `std::string` data can be achieved the following way:
- create a `string` object and convert it to a `vector<char>` object:
\snippet tutorial-npz.cpp Save_string_init
- add and save the data to the `.npz` file; the identifier is the variable name, `"w"` means `write` and
`"a"` means `append` to the archive:
\snippet tutorial-npz.cpp Save_string_save


Reading back the data can be done easily:
- load the data:
\snippet tutorial-npz.cpp Read_string_load
- the identifier is then needed,
- a conversion from `vector<char>` to `std::string` object is required:
\snippet tutorial-npz.cpp Read_string


\note
In the previous example, there is no need to save a "null-terminated" character since it is handled at reading using
a specific constructor which uses iterators to the beginning and end of the `string` data.
Additional information can be found <a href="https://stackoverflow.com/a/45491652">here</a>.
The other approach would be to
- append the null character "\0" to the vector: "vec_save_string.push_back('\0');"
- and use the constructor that accepts a pointer to the data: "std::string read_string(arr_string_data.data<char>());"
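
The difference between the two approaches can be mimicked in Python with plain byte sequences. This is only an analogy
of the C++ logic, not the ViSP or cnpy API: the list `saved` plays the role of the `vector<char>`, and all names are
illustrative.

```python
payload = "Hello NPZ"
saved = list(payload.encode("utf-8"))  # one integer per char, like vector<char>

# Approach 1: the length comes from the array itself, no terminator is stored.
restored = bytes(saved).decode("utf-8")

# Approach 2: a trailing null byte is stored and the reader stops at it.
saved_with_null = saved + [0]
raw = bytes(saved_with_null)
restored_with_null = raw[:raw.index(0)].decode("utf-8")

print(restored, restored_with_null)
```

The first approach stores one byte less and also copes with data that contains embedded null characters.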


\subsection tuto-npz-hands-on-save-basic How to save basic data types

Saving C++ basic data types such as `int32_t`, `float` or even `std::complex<double>` is straightforward:

\snippet tutorial-npz.cpp Save_basic_types

Reading back the data can be done easily:

\snippet tutorial-npz.cpp Read_basic_types


\subsection tuto-npz-hands-on-save-img How to save a vpImage

Finally, one of the advantages of the `NPZ` format is the possibility to save multi-dimensional arrays easily.
As an example, we will first save a `vpImage<vpRGBa>`.

The following code shows how to read an image:

\snippet tutorial-npz.cpp Save_image_read

Then, saving a color image can be achieved as easily as:

\snippet tutorial-npz.cpp Save_image

We have passed the address of the bitmap array, that is a vector of `vpRGBa`. The shape of the array is thus
"height x width" since all basic elements of the bitmap are already of `vpRGBa` type (4 `unsigned char` elements).
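
The memory layout behind this shape can be sketched in Python with a plain byte buffer. The sketch assumes a row-major
bitmap where each pixel occupies 4 consecutive bytes in R, G, B, A order; the tiny image size and the helper
`pixel_offset` are hypothetical and only serve the illustration.

```python
HEIGHT, WIDTH, CHANNELS = 2, 3, 4  # hypothetical tiny image, 4 bytes per pixel

# A "height x width" array of 4-byte elements occupies HEIGHT * WIDTH * 4 bytes.
bitmap = bytearray(HEIGHT * WIDTH * CHANNELS)


def pixel_offset(row, col):
    """Byte offset of the first channel of pixel (row, col) in the row-major buffer."""
    return (row * WIDTH + col) * CHANNELS


# Set pixel (1, 2) to opaque red.
start = pixel_offset(1, 2)
bitmap[start:start + CHANNELS] = bytes([255, 0, 0, 255])

print(len(bitmap), start, bytes(bitmap[start:start + CHANNELS]))
```

The same bytes could equally be described as an array of shape `{H x W x 4}` with `unsigned char` elements; only the
declared element type differs.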

Reading back the image is done with:

\snippet tutorial-npz.cpp Read_image

The `vpImage` constructor accepting a `vpRGBa` pointer is used, with the appropriate image height and width values.

Finally, the image is displayed.


\subsection tuto-npz-hands-on-save-multi How to save a multi-dimensional array

Similarly, the following code shows how to save a multi-dimensional array with a shape corresponding to `{H x W x 3}`:

\snippet tutorial-npz.cpp Save_multi_array

Finally, the image can be read back and displayed with:

\snippet tutorial-npz.cpp Read_multi_array

A specific conversion from `RGB` to `RGBa` must be done for compatibility with the ViSP `vpRGBa` format.
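
The principle of this conversion can be sketched in Python on raw bytes: an alpha byte is inserted after every group of
three color bytes. The function `rgb_to_rgba` is a hypothetical helper written for this illustration, and the fully
opaque alpha value of 255 is an assumption; the actual tutorial code performs the conversion in C++.

```python
def rgb_to_rgba(rgb, alpha=255):
    """Return a copy of packed RGB bytes with an alpha byte appended to each pixel."""
    if len(rgb) % 3 != 0:
        raise ValueError("RGB buffer length must be a multiple of 3")
    rgba = bytearray()
    for i in range(0, len(rgb), 3):
        rgba += rgb[i:i + 3]  # copy R, G, B
        rgba.append(alpha)    # add the alpha channel
    return bytes(rgba)


# Two pixels: pure red, then pure green.
pixels_rgb = bytes([255, 0, 0, 0, 255, 0])
pixels_rgba = rgb_to_rgba(pixels_rgb)

print(list(pixels_rgba))
```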

*/
111 changes: 111 additions & 0 deletions doc/tutorial/python/tutorial-read-write-NPZ-format.dox
@@ -0,0 +1,111 @@
/**

\page tutorial-read-write-NPZ-format Tutorial: NumPy NPY/NPZ file format for reading/writing large arrays of data

\tableofcontents

\section tutorial-read-write-NPZ-format-intro Introduction

\note
Please refer to the <a href="tutorial-npz.html">C++ tutorial</a> for an overview of the NPZ format and a quick usage
from a C++ point of view.


NumPy offers the possibility to save and read arrays of data in binary format. This is an alternative to the NumPy
<a href="https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html">`numpy.savetxt`</a> function, which
allows the user to save 1D/2D arrays in plain text.

The NPY format, and the NPZ format which is a collection of NPY data zipped into a single file, offer the following
advantages:
- easy usage with <a href="https://numpy.org/doc/stable/reference/generated/numpy.savez.html">`numpy.savez`</a>
and <a href="https://numpy.org/doc/stable/reference/generated/numpy.load.html">`numpy.load`</a> for saving and reading
arrays of data,
- binary format, which reduces the file size compared to a plain text file,
- no data loss when saving, whereas a plain text file can lose precision with floating-point numbers,
- easy access to the different saved variables, since the object returned when loading behaves like a dictionary.

On the other hand, the main disadvantages are:
- it is not a human-readable format,
- it is meant to be used with arrays of basic data types; hierarchical data structures, for instance, are not suitable.

\note
You can refer to this Wikipedia page for an exhaustive comparison of
<a href="https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats">data-serialization formats</a>.
\n You can refer to the following page for a more thorough description of the
<a href="https://numpy.org/doc/stable/reference/generated/numpy.lib.format.html">NPY format</a>.


\section tutorial-read-write-NPZ-format-examples Examples

\subsection tutorial-read-write-NPZ-format-examples-quick Quick overview in Python

The following code snippet illustrates how to save a 1-D vector and a multi-dimensional array, and how to append data
to a file.

\code{.py}

#! python3
# -*- coding: utf-8 -*-
import numpy as np
from tempfile import TemporaryFile
import matplotlib.pyplot as plt

def main():
    # https://numpy.org/doc/stable/reference/generated/numpy.savez.html
    outfile = TemporaryFile()
    x_vec = np.arange(10)
    sin_x = np.sin(x_vec)
    np.savez(outfile, x=x_vec, y=sin_x)

    _ = outfile.seek(0) # Only needed here to simulate closing & reopening file
    npzfile = np.load(outfile)
    print(f"npzfile.files: {npzfile.files}")

    # append data to the file: https://stackoverflow.com/a/71183105
    img = np.random.randint(low=0, high=256, size=(48, 64, 3), dtype=np.uint8)
    print(f"img: {img.shape}")
    data_dict = dict(npzfile)
    data_dict["img"] = img
    np.savez(outfile, **data_dict)

    _ = outfile.seek(0) # Only needed here to simulate closing & reopening file
    npzfile = np.load(outfile)
    print(f"npzfile.files: {npzfile.files}")

    plt.imshow(npzfile["img"])
    plt.show()

if __name__ == '__main__':
    main()

\endcode


\subsection tutorial-read-write-NPZ-format-examples-realsense Demo: read and display data from RealSense sensors

In this demo, we will first use \ref example/device/framegrabber/saveRealSenseData.cpp "saveRealSenseData.cpp" to
save data on disk:
- save "[-s]" color "[-c]", infrared "[-i]", depth "[-d]" and pointcloud "[-p]" data on disk:
- \code ./saveRealSenseData -s -c -i -d -p \endcode
- use "[-e <pattern>]" to specify the filename pattern:
- \code ./saveRealSenseData -s -c -i -d -p -e %06d \endcode
- use "[-o <output folder>]" to specify the output folder, a folder with the current timestamp will be
automatically created inside it:
- \code ./saveRealSenseData -s -c -i -d -p -o output_dir \endcode
- use "[-C]" to save data on user click:
- \code ./saveRealSenseData -s -c -i -d -p -C \endcode
- use "[-f <fps>]" to specify the acquisition framerate:
- \code ./saveRealSenseData -s -c -i -d -p -f 60 \endcode
- use "[-b]" to force depth and pointcloud data to be saved in little-endian binary format:
- \code ./saveRealSenseData -s -c -i -d -p -b \endcode
- use "[-z]" to save pointcloud data in NumPy NPZ format (if this option is not passed and ViSP is not built with
the PCL library as dependency, the NPZ format is used by default, unless the "[-b]" option is passed):
- \code ./saveRealSenseData -s -c -i -d -p -z \endcode

\note
Saving pointcloud data is very time-consuming. If you need the acquisition rate to stay as close as possible to the
camera framerate, you can save the depth data instead and compute the 3D pointcloud later using the stereo-camera
parameters.
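
Computing a 3D point from a saved depth value can be sketched as follows in Python. This is a minimal illustration that
assumes an ideal pinhole model without distortion and a depth value already expressed in meters; the function
`deproject` and the intrinsic values (`fx`, `fy`, `cx`, `cy`) are hypothetical and do not come from the RealSense or
ViSP API.

```python
def deproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with the given depth to a 3D point (x, y, z).

    Assumes an ideal pinhole camera: fx, fy are the focal lengths in pixels and
    (cx, cy) is the principal point. Lens distortion is ignored.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical intrinsics for a 640 x 480 depth image.
fx, fy, cx, cy = 600.0, 600.0, 320.0, 240.0

center = deproject(320, 240, 2.0, fx, fy, cx, cy)   # pixel at the principal point
offset = deproject(920, 240, 2.0, fx, fy, cx, cy)   # pixel shifted by fx along u

print(center, offset)
```

Applying this function to every pixel of a depth image yields the pointcloud offline, so that only the depth data has
to be written during acquisition.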

Then, you can use the PlotRGBIrDepthData.py Python script to display the data:
- \code python3 PlotRGBIrDepthData.py -i <folder> \endcode

*/
1 change: 1 addition & 0 deletions doc/tutorial/tutorial-users.dox
@@ -186,6 +186,7 @@ This page introduces the user to other tools that may be useful.
- \subpage tutorial-json <br>This tutorial explains how to read and save data in the portable JSON format. It focuses on saving the data generated by a visual servoing experiment and exporting it to Python in order to generate plots.
- \subpage tutorial-synthetic-blenderproc <br> This tutorial shows you how to easily generate synthetic data from the 3D model of an object and obtain various modalities. This data can then be used to train a neural network for your own task.
- \subpage tutorial-spc <br> This tutorial shows you how to monitor if a signal is "in control" using Statistical Process Control methods.
- \subpage tutorial-npz <br> This tutorial shows you how to read / save arrays of data from / to NPZ file format, a binary format compatible with the NumPy library.
*/

/*! \page tutorial_munkres Munkres Assignment Algorithm
3 changes: 3 additions & 0 deletions doc/tutorial/tutorial_python.dox
@@ -3,4 +3,7 @@
This page introduces the user to the way to exploit ViSP with Python.

- \subpage tutorial-install-python-bindings <br>In this tutorial you will learn how to install ViSP Python bindings.

- \subpage tutorial-read-write-NPZ-format <br>In this tutorial you will learn how to use NumPy NPY/NPZ file format
to read and save large arrays of data.
*/
