Expose add_scalar(ndarray) #643

cowwoc · 2021-09-12T14:22:54Z

I'd like to log an entire array to tensorboard. This is supported by https://tensorboardx.readthedocs.io/en/latest/tensorboard.html#tensorboardX.SummaryWriter.add_scalar but is not exposed by tensorboardX. Can you please expose this functionality?

The workaround of invoking add_scalars() multiple times in a for loop is so slow it is unusable. It takes minutes to plot data that should take milliseconds.


lanpa · 2021-09-13T12:09:14Z

What is the typical size of your data? Sometimes the slowness is caused by TensorBoard itself. Another question: how should we define global_step (the x-axis value of a point) if an entire array is passed to add_numpy_array()? Should it be inferred implicitly from each element's position in the array?

cowwoc · 2021-09-13T15:38:26Z

@lanpa I just realized that I filed this bug report against the wrong codebase :) I thought that pytorch uses this library under the hood but I see now that it uses torch.utils.tensorboard. I looked at

tensorboardX/tensorboardX/writer.py, line 416 in 054f1f3:

    def add_scalar(

but I could not figure out how https://tensorboardx.readthedocs.io/en/latest/tensorboard.html#tensorboardX.SummaryWriter.add_scalar logs an ndarray to tensorboard. As far as I can tell

tensorboardX/tensorboardX/writer.py, line 457 in 054f1f3:

    scalar(tag, scalar_value, display_name, summary_description), global_step, walltime)

forces the value to be a float or a single-dimensional ndarray.

I'm going to explain what I am trying to do in case you are aware of a better way to do it.

I am trying to predict a time series consisting of 300 points per sample. I want to plot the predicted vs target output of each sample to tensorboard every validation step so I can visually inspect how predictions improve over time.

I've got ~2500 samples in my validation set, so I want to log 300 * 2500 = 750,000 points. Currently, it takes ~3 seconds per 10 samples I log. Since I cannot plot an entire sample at a time, I am forced to plot a single point of a sample at a time as follows:

        for index, tensor in enumerate(actual_predictions[:10]):
            for x, y in enumerate(tensor):
                self.logger.experiment.add_scalars(f"predictions/{index}",
                                                   {f"val_actual/epoch/{self.current_epoch}": y},
                                                   global_step=x)
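
To put the figures above in perspective, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted in this comment and assumes (this is an assumption, not a measurement) that logging time scales linearly with the number of points:

```python
# Figures quoted in the comment above.
points_per_sample = 300
num_samples = 2500
seconds_per_10_samples = 3.0  # "~3 seconds per 10 samples"

# Total number of points in the validation set.
total_points = points_per_sample * num_samples

# Cost per logged point, in milliseconds (10 samples = 3000 points).
ms_per_point = seconds_per_10_samples * 1000 / (10 * points_per_sample)

# Assumption: cost scales linearly with the number of samples logged.
total_seconds = seconds_per_10_samples * num_samples / 10

print(f"{total_points} points, about {ms_per_point} ms per point")
print(f"about {total_seconds / 60} minutes to log every sample once")
```

Under that assumption, logging every validation sample once would take roughly 12.5 minutes per validation step, which is why only the first 10 samples are logged in the loop above.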

Ideally, I want to invoke:

        for index, tensor in enumerate(actual_predictions[:10]):
            self.logger.experiment.add_scalars(f"predictions/{index}",
                                               {"val_actual": tensor},
                                               global_step=self.current_epoch)

and have tensorboard plot the entire tensor to its own entry in the "Time Series" tab. Any ideas?

lanpa · 2021-09-16T16:50:00Z

lanpa · 2021-09-16T16:59:18Z

Hi, if I understand correctly, the data you want to plot has four dimensions: 1. time, 2. value at that time, 3. sample, 4. training epoch. As far as I know, TensorBoard's scalar plots can show each trace in the x-z slice, and if you tag the plots correctly, you can overlay several traces in one plot. In your case, one trace in a plot would be the predicted time series and the other its corresponding ground truth, so for each validation epoch you would have ~2500 plots to look at, correct? Exposing the interface and writing hundreds of points at a time to the TensorBoard event file is easy; I think the problem is how you would look through so much visualized data.
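
As a rough illustration of the tagging idea described above, the sketch below logs the predicted and ground-truth traces of one sample under the same main tag, so that they can be overlaid in one plot. The helper name `log_overlaid_traces`, the trace names "predicted" and "target", and the `RecordingWriter` stand-in are all hypothetical; the only thing assumed about a real writer is a SummaryWriter-style `add_scalars(main_tag, tag_scalar_dict, global_step)` method.

```python
class RecordingWriter:
    """Hypothetical stand-in for a SummaryWriter; it only records calls."""

    def __init__(self):
        self.calls = []

    def add_scalars(self, main_tag, tag_scalar_dict, global_step=None):
        self.calls.append((main_tag, dict(tag_scalar_dict), global_step))


def log_overlaid_traces(writer, sample_index, predicted, target):
    """Log two traces of one sample under a shared main tag.

    The position of each point in the series is used as global_step,
    i.e. as the x-axis value of the plot.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    for step, (p, t) in enumerate(zip(predicted, target)):
        writer.add_scalars(
            f"predictions/{sample_index}",
            {"predicted": float(p), "target": float(t)},
            global_step=step,
        )


# Usage with the stand-in writer; a real SummaryWriter could be passed instead.
writer = RecordingWriter()
log_overlaid_traces(writer, 0, [0.1, 0.2, 0.3], [0.0, 0.25, 0.3])
print(len(writer.calls))
```

Note that this still issues one call per point, so it illustrates the tagging only and does not address the speed problem raised earlier in the thread.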

I looked at the "Time Series" tab; its visualization looks very similar to that of the ordinary "scalar" plot.

lanpa · 2021-09-16T17:13:31Z

def add_numpy_array(tag, numpy_array_of_length_larger_than_one):
  pass

So I think the exposed function should infer the global_step implicitly from each element's index in the numpy array.
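
One possible reading of the stub above, written as a free function rather than a writer method, is sketched below. It is not the library's implementation: `add_numpy_array` does not exist in tensorboardX as far as this thread shows, and the `RecordingWriter` class is a hypothetical stand-in for any object with a SummaryWriter-style `add_scalar(tag, scalar_value, global_step)` method.

```python
class RecordingWriter:
    """Hypothetical stand-in for a SummaryWriter; it only records calls."""

    def __init__(self):
        self.calls = []

    def add_scalar(self, tag, scalar_value, global_step=None):
        self.calls.append((tag, scalar_value, global_step))


def add_numpy_array(writer, tag, values):
    """Log every element of a 1-D sequence as one scalar point.

    global_step is inferred from the element's index, so the array
    is drawn left to right in the order it is stored.
    """
    if len(values) < 2:
        raise ValueError("expected an array of length larger than one")
    for step, value in enumerate(values):
        writer.add_scalar(tag, float(value), global_step=step)


# Usage with the stand-in writer; a NumPy array or a plain list both work.
writer = RecordingWriter()
add_numpy_array(writer, "predictions/0/val_actual", [0.5, 0.75, 1.0])
print(writer.calls)
```

A helper like this only tidies up the calling code. It still writes one scalar event per element, so on its own it would not be expected to be faster than the explicit loop.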

cowwoc · 2021-09-17T03:33:41Z

@lanpa Your design doesn't match what I had in mind. I don't want a single plot to compare the performance of different samples. Instead, I want to:

Evaluate the performance of the model for a specific sample at a specific epoch.
Compare the performance of a single sample across epochs to see whether predictions improve over time (and how fast).

Visually, I think I want a graph similar to this one (image not preserved), without the gray line.

For each sample (time series), I want the x-axis to denote time and the y-axis to denote the value at time x.
Each plot compares the expected vs predicted values.
I would have one plot per sample per epoch.

Maybe there is a better way to represent this visually but this is what I had in mind. I've already got this working in Tensorboard but plotting is extremely slow.

Yes, you can infer global_step implicitly. That said, I don't see how you could implement the above function. As far as I can see, there is no way to plot an entire array in tensorboard. The only way I found is plotting one point at a time which is very slow.
