GPU TPC: improved GPU TPC track-model decoding #13122
Conversation
…reams, unattached clusters input transfer in separate stream
…rs per kernel call
looks good, just two comments for possible future improvements
}
mIOPtrs.clustersNative = mClusterNativeAccess.get();
mClusterNativeAccess->clustersLinear = mInputsHost->mPclusterNativeOutput;
mClusterNativeAccess->setOffsetPtrs();

runKernel<GPUTPCDecompressionKernels, GPUTPCDecompressionKernels::step1unattached>(GetGridAuto(inputStream));
unsigned int batchSize = doGPU ? 6 : NSLICES;
for (unsigned int iSlice = 0; iSlice < NSLICES; iSlice = iSlice + batchSize) {
You could do an outer OMP loop on the CPU, and set a nested OMP nThreads for the inner loop that is used for the kernel, as done here:
GPUCA_OPENMP(parallel for if(!doGPU && GetProcessingSettings().ompKernels != 1) num_threads(mRec->SetAndGetNestedLoopOmpFactor(!doGPU, GetProcessingSettings().nTPCClustererLanes)))
bool toGPU = true;
runKernel<GPUMemClean16>({GetGridAutoStep(inputStream, RecoStep::TPCDecompression), krnlRunRangeNone, &mEvents->init}, DecompressorShadow.mNativeClustersIndex, NSLICES * GPUCA_ROW_COUNT * sizeof(DecompressorShadow.mNativeClustersIndex[0]));
std::exclusive_scan(cmprClsHost.nTrackClusters, cmprClsHost.nTrackClusters + cmprClsHost.nTracks, Decompressor.mAttachedClustersOffsets, 0u); // computing clusters offsets for first kernel
int nStreams = doGPU ? mRec->NStreams() - 1 : 1;
nStreams here could perhaps depend on the data size. For very small cases, one might want to use fewer than NStreams() - 1 streams, or even just one.
Perhaps you also want std::max(1, NStreams() - 1), just in case nStreams could be 1 on a GPU model (which is currently not the case).
Hides DMA transfer latencies in GPU TPC track-model decoding by pipelining data transfers and kernel calls across multiple streams.