This is about YOCO/yoco/models/decoder/kernel/gate_recurrent.py.
I assume this code is meant to accelerate part of the computation with Triton.
After running
python3 gate_recurrent.py
I got the following output:
naive time: 0.04773402214050293
triton time: 0.5681734085083008
False
tensor(0.0078, device='cuda:0', dtype=torch.float16, grad_fn=<MaxBackward1>) tensor(0.0001, device='cuda:0', dtype=torch.float16, grad_fn=<MeanBackward0>)
False
tensor(0.0078, device='cuda:0', dtype=torch.float16) tensor(0.0001, device='cuda:0', dtype=torch.float16)
False
tensor(0.0078, device='cuda:0', dtype=torch.float16) tensor(0.0002, device='cuda:0', dtype=torch.float16)
False
It seems that the Triton version takes more time than the naive one (about 0.57 s versus 0.05 s). Is this expected?
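One thing that might explain the gap, though I have not confirmed it for this script: if each version is timed on a single call, the Triton number can include one-time costs such as JIT compilation, and GPU timings can be off unless the device is synchronized before the clock is read. Below is a minimal sketch of a timing helper that runs warm-up calls first. The names `benchmark` and `make_kernel_with_one_time_cost` are hypothetical and not part of gate_recurrent.py, and the stand-in "kernel" only simulates a slow first call on the CPU.

```python
import time


def benchmark(fn, warmup=3, iters=10, sync=None):
    """Return mean wall-clock seconds per call of fn, excluding warm-up calls.

    sync is an optional zero-argument callable that blocks until queued work
    has finished (for GPU code this could be torch.cuda.synchronize).
    """
    for _ in range(warmup):
        fn()  # absorbs one-time costs such as JIT compilation
    if sync is not None:
        sync()  # make sure warm-up work is done before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # make sure timed work is done before stopping the clock
    return (time.perf_counter() - start) / iters


def make_kernel_with_one_time_cost(delay=0.2):
    """Hypothetical stand-in for a JIT-compiled kernel: only the first call is slow."""
    state = {"compiled": False}

    def kernel():
        if not state["compiled"]:
            time.sleep(delay)  # simulated compilation on first use
            state["compiled"] = True
        return sum(range(1000))

    return kernel


if __name__ == "__main__":
    # Timing a single cold call includes the simulated compilation.
    cold_kernel = make_kernel_with_one_time_cost()
    t0 = time.perf_counter()
    cold_kernel()
    print("single cold call:", time.perf_counter() - t0)

    # Timing after warm-up reports only the steady-state cost.
    warm_kernel = make_kernel_with_one_time_cost()
    print("mean after warm-up:", benchmark(warm_kernel, warmup=1, iters=5))
```

To try this on the real script, one could wrap the naive and Triton calls in zero-argument functions and pass `torch.cuda.synchronize` as `sync`. If the Triton version is still slower after warm-up, the cause is something else.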
donglixp