Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
binary
TensorFlow version
tf 2.16.1
Custom code
Yes
OS platform and distribution
No response
Mobile device
No response
Python version
3.10.9
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
I'd like to bring attention to an issue concerning the numerical precision of several activation operators (selu, leaky relu, softplus) when operating on bfloat16 versus float32 data types. I compared these operators on 20,000 random tensors, evaluating the outputs in both bfloat16 and float32 and computing the discrepancies. My observations indicate that the differences produced by TensorFlow are generally larger than those produced by PyTorch. Particularly noteworthy is the large error produced by the SeluGrad operator. The results are summarized in the table below, followed by a sketch of the comparison harness.
Operator        TensorFlow    PyTorch
selu            0.24918       0.12243
leakyrelu       0.01875       0.00094
softplus        0.05488       0.01554
seluGrad        10.41794      0.12406
leakyreluGrad   0.01875       0.00094
softplusGrad    0.13502       0.12484
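For context, here is a minimal sketch of the kind of comparison described above. The tensor shape, value range, helper name max_abs_diff, and the max-absolute-difference metric are my assumptions, not the exact harness used to produce the table:

import tensorflow as tf
import numpy as np

# Sketch: run an op on the same random input in float32 and bfloat16
# and record the largest absolute difference seen across all trials.
def max_abs_diff(op, trials=20000, shape=(16,)):
    worst = 0.0
    for _ in range(trials):
        x32 = tf.constant(np.random.uniform(-5, 5, shape), dtype=tf.float32)
        x16 = tf.cast(x32, tf.bfloat16)
        out32 = op(x32)
        out16 = tf.cast(op(x16), tf.float32)
        worst = max(worst, float(tf.reduce_max(tf.abs(out32 - out16))))
    return worst

print('selu      :', max_abs_diff(tf.nn.selu))
print('leaky_relu:', max_abs_diff(tf.nn.leaky_relu))
print('softplus  :', max_abs_diff(tf.nn.softplus))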
In the standalone code below, I provide an illustrative instance for the SeluGrad operator, where the output discrepancy between bfloat16 and float32 is as high as 10.4.
Standalone code to reproduce the issue
import tensorflow as tf
import numpy as np

# Compute the SELU gradient in float32.
features = tf.convert_to_tensor(np.array([-0.00112915]), dtype=tf.float32)
gradients = tf.convert_to_tensor(np.array([-14.6875]), dtype=tf.float32)
x = tf.Variable(features)
k = tf.constant(gradients)
with tf.GradientTape(persistent=True) as tape:
    y = tf.nn.selu(features=x)
    z = k * y
    final = tf.reduce_mean(z)
print('float32 gradient:', tape.gradient(z, x))

# Repeat the same computation in bfloat16.
features = tf.cast(features, dtype=tf.bfloat16)
gradients = tf.cast(gradients, dtype=tf.bfloat16)
x = tf.Variable(features)
k = tf.constant(gradients)
with tf.GradientTape(persistent=True) as tape:
    y = tf.nn.selu(features=x)
    z = k * y
    final = tf.reduce_mean(z)
print('bfloat16 gradient:', tape.gradient(z, x))
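To make the gap easier to read off, the bfloat16 gradient can be cast back to float32 and compared against the float32 reference. A small sketch (the helper name selu_grad is mine, not an existing API); it should reproduce the ~10.4 discrepancy mentioned above:

# Recompute the SELU gradient for a given dtype so the two results
# can be compared directly (helper name is illustrative).
def selu_grad(dtype):
    x = tf.Variable(tf.constant([-0.00112915], dtype=dtype))
    k = tf.constant([-14.6875], dtype=dtype)
    with tf.GradientTape() as tape:
        z = k * tf.nn.selu(x)
    return tape.gradient(z, x)

g32 = selu_grad(tf.float32)
g16 = tf.cast(selu_grad(tf.bfloat16), tf.float32)
print('abs difference:', tf.abs(g32 - g16).numpy())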
I have tested the code with tf-nightly and replicated the precision differences between float32 and bfloat16. Attached gist for reference. There does seem to be an issue.
The precision discrepancies may stem from the fact that these activation operators compute directly in bfloat16 without any internal precision improvement (for example, upcasting to float32 for the computation). Similar issues arose in PyTorch and were fixed by performing the computation in higher precision. Perhaps TensorFlow could address this issue the same way? A sketch of the kind of upcast I have in mind follows.
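This is only an illustration of the workaround at the Python level, not TensorFlow's actual kernel code: the bfloat16 inputs are cast to float32, the SELU gradient is computed in float32, and the result is cast back to bfloat16.

# Illustrative workaround: do the SELU gradient computation in float32
# even when the inputs are bfloat16, then cast the result back.
def selu_grad_upcast(x_bf16, k_bf16):
    x32 = tf.Variable(tf.cast(x_bf16, tf.float32))
    k32 = tf.cast(k_bf16, tf.float32)
    with tf.GradientTape() as tape:
        z = k32 * tf.nn.selu(x32)
    return tf.cast(tape.gradient(z, x32), tf.bfloat16)

x_bf16 = tf.constant([-0.00112915], dtype=tf.bfloat16)
k_bf16 = tf.constant([-14.6875], dtype=tf.bfloat16)
print('bfloat16 gradient via float32:', selu_grad_upcast(x_bf16, k_bf16))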