Numerical precision issue of operators selu, leakyRelu, softplus and their corresponding backward operators on Bfloat16 vs float32 #67440

Open
wzzll123 opened this issue May 13, 2024 · 2 comments
Labels: comp:ops (OPs related issues), TF 2.16, type:bug (Bug)

Comments

@wzzll123

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

tf 2.16.1

Custom code

Yes

OS platform and distribution

No response

Mobile device

No response

Python version

3.10.9

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I'd like to bring attention to an issue concerning the numerical precision of several operators (selu, leakyRelu, softplus) when operating on bfloat16 versus float32 data types. I compared these operators on 20,000 random tensors, evaluating each in both bfloat16 and float32 and computing the discrepancy between the two results. My observations indicate that the differences produced by TensorFlow are generally larger than those produced by PyTorch. Particularly noteworthy is the large error produced by the SeluGrad operator. The results are summarized in the table below; a sketch of the comparison procedure follows the table.

| Operator      | TensorFlow | PyTorch |
|---------------|------------|---------|
| selu          | 0.24918    | 0.12243 |
| leakyRelu     | 0.01875    | 0.00094 |
| softplus      | 0.05488    | 0.01554 |
| SeluGrad      | 10.41794   | 0.12406 |
| LeakyReluGrad | 0.01875    | 0.00094 |
| SoftplusGrad  | 0.13502    | 0.12484 |
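
For context, the comparison loop is essentially the following. This is a minimal sketch, assuming the reported numbers are the maximum absolute element-wise differences over the 20,000 random tensors; the helper name, trial count argument, and tensor shape are illustrative, not the exact script I ran.

import numpy as np
import tensorflow as tf

def max_bf16_vs_f32_diff(op, num_trials=20000, shape=(16,)):
    # For each random tensor, run the op in float32 and in bfloat16,
    # cast the bfloat16 result back to float32, and track the worst
    # absolute element-wise difference seen.
    worst = 0.0
    for _ in range(num_trials):
        x = tf.constant(np.random.randn(*shape).astype(np.float32))
        out_f32 = op(x)
        out_bf16 = op(tf.cast(x, tf.bfloat16))
        diff = tf.reduce_max(tf.abs(out_f32 - tf.cast(out_bf16, tf.float32)))
        worst = max(worst, float(diff))
    return worst

print('selu:', max_bf16_vs_f32_diff(tf.nn.selu))
print('leaky_relu:', max_bf16_vs_f32_diff(tf.nn.leaky_relu))
print('softplus:', max_bf16_vs_f32_diff(tf.nn.softplus))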

The standalone code below reproduces the issue for the SeluGrad operator, where the discrepancy between the bfloat16 and float32 outputs can be as high as 10.4.

Standalone code to reproduce the issue

import tensorflow as tf
import numpy as np

# Inputs chosen to expose the discrepancy: a small negative feature and a
# moderately large upstream gradient flowing back through selu.
features = tf.convert_to_tensor(np.array([-0.00112915]), dtype=tf.float32)
gradients = tf.convert_to_tensor(np.array([-14.6875]), dtype=tf.float32)

# float32 reference
x = tf.Variable(features)
k = tf.constant(gradients)
with tf.GradientTape(persistent=True) as tape:
    y = tf.nn.selu(features=x)
    z = k * y
    final = tf.reduce_mean(z)

print('float32 gradient:', tape.gradient(z, x))

# Same computation in bfloat16
features = tf.cast(features, dtype=tf.bfloat16)
gradients = tf.cast(gradients, dtype=tf.bfloat16)
x = tf.Variable(features)
k = tf.constant(gradients)
with tf.GradientTape(persistent=True) as tape:
    y = tf.nn.selu(features=x)
    z = k * y
    final = tf.reduce_mean(z)

print('bfloat16 gradient:', tape.gradient(z, x))

Relevant log output

float32 gradient: tf.Tensor([-25.792944], shape=(1,), dtype=float32)
bfloat16 gradient: tf.Tensor([-15.375], shape=(1,), dtype=bfloat16)
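
For comparison, the analogous PyTorch computation would look like the sketch below (assuming selu's bfloat16 backward is supported on the CPU build in use; I am not reproducing its exact outputs here, only showing the setup used for the comparison).

import torch
import torch.nn.functional as F

# Run the same selu backward pass in float32 and bfloat16 and print the gradients.
for dtype in (torch.float32, torch.bfloat16):
    x = torch.tensor([-0.00112915], dtype=dtype, requires_grad=True)
    k = torch.tensor([-14.6875], dtype=dtype)
    z = k * F.selu(x)
    z.sum().backward()
    print(dtype, 'gradient:', x.grad)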
@SuryanarayanaY
Collaborator

Hi @wzzll123 ,

I have tested the code with tf-nightly and replicated the precision differences between float32 and bfloat16. A gist is attached for reference. There does seem to be an issue.

@wzzll123
Author

Hi @SuryanarayanaY , thanks for replying.

The precision discrepancies may stem from the fact that the activation operators' computations don't incorporate any precision improvement, i.e., they compute directly in bfloat16. Similar issues arose in PyTorch and were fixed by improving the internal precision of the computation (see the sketch below for the general idea). Perhaps TensorFlow could address this issue similarly?
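
To illustrate the general idea at the Python level, here is a minimal sketch of the workaround, not a kernel-level fix; the wrapper name is made up for illustration.

import tensorflow as tf

# Sketch of the "compute in higher precision internally" idea as a user-level
# workaround: upcast bfloat16 inputs to float32, run selu and its gradient
# there, then cast the result back to bfloat16. A real fix would live inside
# the op kernels rather than in user code.
def selu_grad_via_float32(features_bf16, gradients_bf16):
    x = tf.Variable(tf.cast(features_bf16, tf.float32))
    k = tf.cast(gradients_bf16, tf.float32)
    with tf.GradientTape() as tape:
        z = k * tf.nn.selu(x)
    return tf.cast(tape.gradient(z, x), tf.bfloat16)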
