[Performance, WIP] Faster SAC #1958
base: main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1958
Note: Links to docs will display an error until the docs builds have been completed.
❌ 13 New Failures, 1 Unrelated Failure
As of commit 3923bc2 with merge base 492091a: NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_single | 63.3136ms | 62.3617ms | 16.0355 Ops/s | 16.1287 Ops/s | |
test_sync | 34.7507ms | 33.9539ms | 29.4517 Ops/s | 30.2222 Ops/s | |
test_async | 54.7522ms | 31.8410ms | 31.4061 Ops/s | 30.3396 Ops/s | |
test_simple | 0.5147s | 0.4492s | 2.2260 Ops/s | 2.3177 Ops/s | |
test_transformed | 0.6481s | 0.5928s | 1.6870 Ops/s | 1.7210 Ops/s | |
test_serial | 1.5109s | 1.4496s | 0.6899 Ops/s | 0.6929 Ops/s | |
test_parallel | 1.4889s | 1.4214s | 0.7036 Ops/s | 0.7154 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 0.2132ms | 21.0614μs | 47.4803 KOps/s | 47.3464 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 44.4630μs | 13.0363μs | 76.7086 KOps/s | 76.7211 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 55.2930μs | 12.4198μs | 80.5165 KOps/s | 80.6801 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 0.1192ms | 7.7825μs | 128.4930 KOps/s | 133.9221 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 80.2100μs | 22.4400μs | 44.5633 KOps/s | 44.5725 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 61.0650μs | 14.1236μs | 70.8034 KOps/s | 70.2443 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 40.4460μs | 13.6073μs | 73.4901 KOps/s | 73.3795 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 34.8850μs | 8.6858μs | 115.1299 KOps/s | 115.3467 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 60.8140μs | 23.7955μs | 42.0248 KOps/s | 42.0212 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 45.8460μs | 15.4878μs | 64.5668 KOps/s | 64.1428 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 65.3940μs | 13.5961μs | 73.5503 KOps/s | 73.1392 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 42.0490μs | 8.7865μs | 113.8110 KOps/s | 116.0894 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 84.4580μs | 25.0620μs | 39.9011 KOps/s | 40.2385 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 43.1100μs | 16.5471μs | 60.4337 KOps/s | 60.7340 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 70.4620μs | 14.7481μs | 67.8053 KOps/s | 68.1263 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 45.3440μs | 9.8687μs | 101.3304 KOps/s | 102.4897 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 75.6410μs | 23.8951μs | 41.8495 KOps/s | 42.0234 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 64.0300μs | 15.5364μs | 64.3648 KOps/s | 63.7064 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 57.6480μs | 16.1735μs | 61.8296 KOps/s | 62.8367 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 41.4270μs | 9.9448μs | 100.5549 KOps/s | 101.0372 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 49.2720μs | 25.7035μs | 38.9052 KOps/s | 39.5274 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 72.6160μs | 16.6261μs | 60.1463 KOps/s | 60.3409 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 69.3700μs | 17.2078μs | 58.1133 KOps/s | 58.9463 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 55.7650μs | 11.1837μs | 89.4157 KOps/s | 91.3082 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 88.9670μs | 26.3782μs | 37.9101 KOps/s | 38.3077 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 43.9430μs | 17.9520μs | 55.7040 KOps/s | 55.4219 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 78.4170μs | 17.0489μs | 58.6548 KOps/s | 58.8033 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 51.6270μs | 11.2141μs | 89.1736 KOps/s | 91.2309 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 78.3670μs | 27.3497μs | 36.5635 KOps/s | 36.9461 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 70.6230μs | 19.0296μs | 52.5496 KOps/s | 52.9155 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 46.5670μs | 18.0980μs | 55.2547 KOps/s | 55.5382 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 70.7620μs | 12.2561μs | 81.5923 KOps/s | 83.1106 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 12.1009ms | 9.4415ms | 105.9152 Ops/s | 108.2171 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 38.8651ms | 35.4655ms | 28.1964 Ops/s | 28.3041 Ops/s | |
test_values[td0_return_estimate-False-False] | 0.2426ms | 0.1905ms | 5.2483 KOps/s | 5.8080 KOps/s | |
test_values[td1_return_estimate-False-False] | 26.1736ms | 23.5475ms | 42.4674 Ops/s | 42.3752 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 38.3187ms | 35.7191ms | 27.9962 Ops/s | 28.1771 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 37.1314ms | 33.7970ms | 29.5884 Ops/s | 29.9753 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 38.7340ms | 35.6116ms | 28.0807 Ops/s | 28.2432 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.3967ms | 8.0123ms | 124.8087 Ops/s | 125.0718 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.1196ms | 1.9108ms | 523.3458 Ops/s | 537.1060 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5463ms | 0.3515ms | 2.8450 KOps/s | 2.8874 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 41.7600ms | 40.4430ms | 24.7262 Ops/s | 21.9625 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 3.8006ms | 3.0247ms | 330.6157 Ops/s | 329.9328 Ops/s | |
test_dqn_speed | 75.1670ms | 1.5054ms | 664.2842 Ops/s | 700.4505 Ops/s | |
test_ddpg_speed | 3.5141ms | 2.7851ms | 359.0581 Ops/s | 355.6452 Ops/s | |
test_sac_speed | 7.9146ms | 6.6538ms | 150.2896 Ops/s | 120.5107 Ops/s | |
test_redq_speed | 14.9687ms | 13.4668ms | 74.2566 Ops/s | 75.6784 Ops/s | |
test_redq_deprec_speed | 15.8664ms | 13.8909ms | 71.9895 Ops/s | 75.3803 Ops/s | |
test_td3_speed | 8.6492ms | 8.3911ms | 119.1735 Ops/s | 119.2888 Ops/s | |
test_cql_speed | 46.7520ms | 37.6100ms | 26.5886 Ops/s | 27.0084 Ops/s | |
test_a2c_speed | 9.0478ms | 7.6515ms | 130.6942 Ops/s | 129.8356 Ops/s | |
test_ppo_speed | 8.8246ms | 7.9917ms | 125.1293 Ops/s | 125.8436 Ops/s | |
test_reinforce_speed | 7.0654ms | 6.7056ms | 149.1297 Ops/s | 143.7766 Ops/s | |
test_iql_speed | 34.3134ms | 32.7320ms | 30.5512 Ops/s | 30.0442 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.9270ms | 2.2424ms | 445.9490 Ops/s | 418.4807 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.6214ms | 0.4961ms | 2.0159 KOps/s | 1.9791 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 3.6479ms | 0.4763ms | 2.0993 KOps/s | 2.0709 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.4090ms | 2.1959ms | 455.4040 Ops/s | 434.1561 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.8430ms | 0.4910ms | 2.0368 KOps/s | 2.0072 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6379ms | 0.4664ms | 2.1440 KOps/s | 2.0768 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.3908ms | 2.3507ms | 425.4044 Ops/s | 420.0111 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9816ms | 0.6084ms | 1.6438 KOps/s | 1.6223 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7899ms | 0.5879ms | 1.7009 KOps/s | 1.6766 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.4708ms | 2.2274ms | 448.9460 Ops/s | 447.9141 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 91.7842ms | 0.5873ms | 1.7027 KOps/s | 1.9895 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6263ms | 0.4708ms | 2.1239 KOps/s | 1.6737 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.8062ms | 2.2424ms | 445.9411 Ops/s | 418.1765 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6603ms | 0.4916ms | 2.0342 KOps/s | 1.9826 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3.7952ms | 0.4744ms | 2.1080 KOps/s | 2.0397 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.0084ms | 2.3720ms | 421.5805 Ops/s | 402.3527 Ops/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8866ms | 0.6123ms | 1.6332 KOps/s | 1.3287 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 3.7089ms | 0.5881ms | 1.7004 KOps/s | 1.6522 KOps/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1070s | 5.6916ms | 175.6985 Ops/s | 177.9815 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 14.3034ms | 11.8043ms | 84.7148 Ops/s | 80.8783 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.7175ms | 1.0112ms | 988.9444 Ops/s | 917.7412 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 91.1024ms | 7.1327ms | 140.1988 Ops/s | 132.7681 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 14.4478ms | 11.7311ms | 85.2435 Ops/s | 81.1391 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.4960ms | 1.0044ms | 995.6560 Ops/s | 963.6852 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 87.8195ms | 5.6324ms | 177.5448 Ops/s | 127.2357 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 96.2808ms | 13.7930ms | 72.5005 Ops/s | 78.5341 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.4209ms | 1.3127ms | 761.7985 Ops/s | 695.8720 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_single | 0.1137s | 0.1129s | 8.8577 Ops/s | 8.9874 Ops/s | |
test_sync | 95.9671ms | 95.5702ms | 10.4635 Ops/s | 10.4553 Ops/s | |
test_async | 0.1791s | 90.5864ms | 11.0392 Ops/s | 10.9200 Ops/s | |
test_single_pixels | 0.2015s | 0.1373s | 7.2810 Ops/s | 7.8124 Ops/s | |
test_sync_pixels | 80.7839ms | 79.5872ms | 12.5648 Ops/s | 12.6943 Ops/s | |
test_async_pixels | 0.1406s | 71.0142ms | 14.0817 Ops/s | 13.7751 Ops/s | |
test_simple | 0.8433s | 0.8366s | 1.1953 Ops/s | 1.2306 Ops/s | |
test_transformed | 1.0568s | 1.0499s | 0.9525 Ops/s | 0.9701 Ops/s | |
test_serial | 2.5563s | 2.5140s | 0.3978 Ops/s | 0.4272 Ops/s | |
test_parallel | 2.1236s | 2.0925s | 0.4779 Ops/s | 0.4870 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 0.1114ms | 33.5262μs | 29.8274 KOps/s | 30.3164 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 93.7710μs | 19.8922μs | 50.2710 KOps/s | 49.9213 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 36.7100μs | 18.7915μs | 53.2156 KOps/s | 52.6518 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 37.5500μs | 11.2829μs | 88.6294 KOps/s | 89.0560 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 53.9700μs | 34.6304μs | 28.8764 KOps/s | 28.6986 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 40.8000μs | 21.6169μs | 46.2601 KOps/s | 45.5288 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 87.3210μs | 20.5006μs | 48.7791 KOps/s | 47.9027 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 32.2900μs | 13.2425μs | 75.5146 KOps/s | 75.3388 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 58.6600μs | 36.5319μs | 27.3733 KOps/s | 26.7418 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 46.0910μs | 23.4353μs | 42.6707 KOps/s | 41.8743 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 36.7200μs | 20.4337μs | 48.9388 KOps/s | 48.5306 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 28.3090μs | 13.1617μs | 75.9782 KOps/s | 76.2806 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 0.1035ms | 38.2838μs | 26.1207 KOps/s | 25.9683 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 48.3610μs | 25.3850μs | 39.3933 KOps/s | 39.3215 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 44.3890μs | 22.3896μs | 44.6636 KOps/s | 44.5489 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 30.4710μs | 15.1130μs | 66.1684 KOps/s | 66.1795 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 53.6200μs | 36.3783μs | 27.4890 KOps/s | 26.9074 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 52.1320μs | 23.8126μs | 41.9946 KOps/s | 42.3032 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 46.4900μs | 24.4557μs | 40.8902 KOps/s | 40.6136 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 39.7600μs | 15.0544μs | 66.4256 KOps/s | 64.8878 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 63.3900μs | 39.4098μs | 25.3744 KOps/s | 25.1848 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 85.7500μs | 25.4525μs | 39.2889 KOps/s | 38.7929 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 43.6310μs | 26.6649μs | 37.5025 KOps/s | 38.3021 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 40.9510μs | 16.7527μs | 59.6918 KOps/s | 60.0428 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 64.6310μs | 40.1907μs | 24.8814 KOps/s | 24.2912 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 44.2100μs | 27.3583μs | 36.5520 KOps/s | 36.1082 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 41.3300μs | 26.2644μs | 38.0743 KOps/s | 37.9869 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 31.8010μs | 16.7534μs | 59.6895 KOps/s | 58.1345 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 69.1900μs | 41.5118μs | 24.0896 KOps/s | 23.2268 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 0.1048ms | 29.1136μs | 34.3483 KOps/s | 33.6908 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 48.1110μs | 27.4936μs | 36.3721 KOps/s | 35.7889 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 32.2610μs | 18.6056μs | 53.7473 KOps/s | 53.0234 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 26.6774ms | 25.7626ms | 38.8160 Ops/s | 41.1868 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 81.3992ms | 3.1925ms | 313.2324 Ops/s | 298.9929 Ops/s | |
test_values[td0_return_estimate-False-False] | 0.1035ms | 59.8672μs | 16.7036 KOps/s | 16.0368 KOps/s | |
test_values[td1_return_estimate-False-False] | 57.4423ms | 56.7939ms | 17.6075 Ops/s | 19.2721 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 2.0778ms | 1.7639ms | 566.9104 Ops/s | 570.5980 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 91.6125ms | 90.5662ms | 11.0416 Ops/s | 11.2732 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 3.9813ms | 1.7884ms | 559.1715 Ops/s | 562.2791 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 25.6011ms | 25.2348ms | 39.6279 Ops/s | 44.1631 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.8660ms | 0.7310ms | 1.3679 KOps/s | 1.4629 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7102ms | 0.6554ms | 1.5259 KOps/s | 1.5645 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5081ms | 1.4559ms | 686.8788 Ops/s | 695.3938 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.9841ms | 0.6982ms | 1.4322 KOps/s | 1.5075 KOps/s | |
test_dqn_speed | 4.0253ms | 1.4909ms | 670.7410 Ops/s | 632.8267 Ops/s | |
test_ddpg_speed | 3.9905ms | 2.9209ms | 342.3579 Ops/s | 356.4531 Ops/s | |
test_sac_speed | 6.3411ms | 6.1225ms | 163.3322 Ops/s | 124.0976 Ops/s | |
test_redq_speed | 11.7863ms | 10.7155ms | 93.3230 Ops/s | 95.7104 Ops/s | |
test_redq_deprec_speed | 12.6486ms | 11.8784ms | 84.1863 Ops/s | 88.1863 Ops/s | |
test_td3_speed | 8.3257ms | 8.2592ms | 121.0765 Ops/s | 123.7944 Ops/s | |
test_cql_speed | 27.6618ms | 25.9894ms | 38.4772 Ops/s | 38.4951 Ops/s | |
test_a2c_speed | 5.9871ms | 5.7081ms | 175.1909 Ops/s | 175.1570 Ops/s | |
test_ppo_speed | 6.4150ms | 6.0078ms | 166.4500 Ops/s | 166.9402 Ops/s | |
test_reinforce_speed | 4.9272ms | 4.6860ms | 213.3998 Ops/s | 214.5077 Ops/s | |
test_iql_speed | 20.8964ms | 20.2761ms | 49.3192 Ops/s | 49.9575 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.0078ms | 2.9027ms | 344.5053 Ops/s | 343.3985 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7073ms | 0.5531ms | 1.8081 KOps/s | 1.8271 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 4.2794ms | 0.5281ms | 1.8937 KOps/s | 1.9013 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.1642ms | 2.9357ms | 340.6312 Ops/s | 341.7802 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.0728ms | 0.5414ms | 1.8470 KOps/s | 1.8477 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7078ms | 0.5222ms | 1.9149 KOps/s | 1.9210 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.1589ms | 3.0538ms | 327.4571 Ops/s | 329.4825 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8258ms | 0.6733ms | 1.4852 KOps/s | 1.5126 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 4.3169ms | 0.6582ms | 1.5194 KOps/s | 1.5566 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.0641ms | 2.9172ms | 342.7936 Ops/s | 343.7550 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7118ms | 0.5484ms | 1.8236 KOps/s | 1.5691 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.1054s | 0.6192ms | 1.6151 KOps/s | 1.9200 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.1334ms | 2.9522ms | 338.7316 Ops/s | 337.6772 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6861ms | 0.5457ms | 1.8325 KOps/s | 1.8693 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 4.3423ms | 0.5282ms | 1.8931 KOps/s | 1.9469 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.1860ms | 3.0707ms | 325.6637 Ops/s | 329.4977 Ops/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.1080s | 0.7741ms | 1.2919 KOps/s | 1.5061 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8400ms | 0.6601ms | 1.5150 KOps/s | 1.5561 KOps/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1078s | 6.8764ms | 145.4248 Ops/s | 139.3695 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 16.5830ms | 14.3599ms | 69.6386 Ops/s | 69.4648 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.1521ms | 1.0835ms | 922.8977 Ops/s | 881.2340 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1089s | 8.9002ms | 112.3568 Ops/s | 115.3196 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 16.3843ms | 14.3825ms | 69.5292 Ops/s | 69.6360 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 2.1838ms | 1.0747ms | 930.5148 Ops/s | 800.7408 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1029s | 7.1323ms | 140.2081 Ops/s | 141.3490 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1138s | 16.6857ms | 59.9315 Ops/s | 68.3529 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.5196ms | 1.3893ms | 719.7613 Ops/s | 686.3659 Ops/s |
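
The throughput columns mix units (Ops/s and KOps/s), so comparing "Ops" against "Ops on Repo HEAD" needs a unit conversion first. A minimal sketch, with hypothetical helper names and assuming both columns are directly comparable measurements of the same test, that computes the relative throughput change for one table row:

```python
import re

# Hypothetical helpers for reading the benchmark tables above.
# Scale factors for the throughput units that appear in the tables.
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '150.2896 Ops/s' or '2.0159 KOps/s' to ops per second."""
    match = re.fullmatch(r"\s*([\d.]+)\s+(K?Ops/s)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognised throughput cell: {cell!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SCALE[unit]


def throughput_change(row: str) -> tuple[str, float]:
    """Return (test name, % change in throughput of this PR relative to HEAD)."""
    cells = [c.strip() for c in row.split("|")]
    # Column order: Name | Max | Mean | Ops | Ops on Repo HEAD | Change
    name, ops_pr, ops_head = cells[0], parse_ops(cells[3]), parse_ops(cells[4])
    return name, 100.0 * (ops_pr / ops_head - 1.0)


if __name__ == "__main__":
    # The test_sac_speed rows copied from the first and second tables.
    rows = [
        "test_sac_speed | 7.9146ms | 6.6538ms | 150.2896 Ops/s | 120.5107 Ops/s | |",
        "test_sac_speed | 6.3411ms | 6.1225ms | 163.3322 Ops/s | 124.0976 Ops/s | |",
    ]
    for row in rows:
        name, pct = throughput_change(row)
        print(f"{name}: {pct:+.1f}%")
```

Applied to the two `test_sac_speed` rows, this gives roughly +24.7% and +31.6% throughput over HEAD, while most other rows stay within a few percent.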
Related issue: pytorch/pytorch#120572
Nice! I am not super familiar with the code here, but I am happy to help benchmark this. Let's keep readability in mind, though. Ideally, a normal torch user should be able to read and understand the loss classes.
@matteobettini there's a way to make vmap functional calls much, much faster!
We'll need to make sure this works across the board, but if it does, the speedup could be 2x for many losses 🤯
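
For context, a vmapped functional call is the pattern SAC-style losses commonly use to evaluate several Q-networks in one batched pass. A minimal sketch of that baseline pattern, not the faster variant mentioned above (whose details are not given here), assuming PyTorch 2.x with `torch.func` and a toy twin-critic setup with illustrative sizes:

```python
import copy

import torch
from torch import nn
from torch.func import functional_call, stack_module_state, vmap

# Illustrative setup (assumption): two small Q-networks, as in twin-critic SAC.
obs_dim, act_dim, batch = 4, 2, 8
qnets = [
    nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.ReLU(), nn.Linear(32, 1))
    for _ in range(2)
]

# Stack every parameter across the critics: leading dim = number of critics.
params, buffers = stack_module_state(qnets)

# Stateless template on the meta device; real weights are supplied per call.
base = copy.deepcopy(qnets[0]).to("meta")


def q_value(p, b, obs_act):
    # Run the template module with a single critic's parameters and buffers.
    return functional_call(base, (p, b), (obs_act,))


obs_act = torch.randn(batch, obs_dim + act_dim)
# vmap over dim 0 of the stacked params/buffers; the input batch is shared.
q_all = vmap(q_value, in_dims=(0, 0, None))(params, buffers, obs_act)
print(q_all.shape)  # torch.Size([2, 8, 1])
```

The result matches looping over the critics one by one, but runs as a single batched computation; that single call is the kind of code path a faster vmap/functional-call implementation would speed up.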