There is a supposed vendor-independent (but not cross-vendor) way of P2P communication in OpenCL: have all GPUs in the same context & pass buffers to other devices' kernels. 💡
Drivers should automatically handle P2P comm via PCIe/SLI/NVLink/CrossFire/Infinity Fabric.
🧵1/9
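
A minimal sketch of that pattern in host code, assuming ctx was created with clCreateContext over both GPUs and the kernel was built for that shared context (names and error handling are simplified, not taken from the thread):

```c
// One context spanning the GPUs, one queue per GPU. A buffer written
// through GPU 0's queue is handed directly to a kernel enqueued on GPU 1;
// the driver is expected to move the data itself.
#include <CL/cl.h>

void shared_context_p2p(cl_context ctx, cl_command_queue q0, cl_command_queue q1,
                        cl_kernel kernel, const float* host_data, size_t N) {
    // Buffers belong to the context, not to a single device.
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, N*sizeof(float), NULL, NULL);

    // Fill the buffer on GPU 0 ...
    clEnqueueWriteBuffer(q0, buf, CL_TRUE, 0, N*sizeof(float), host_data, 0, NULL, NULL);

    // ... then use it from a kernel on GPU 1; the runtime has to migrate the data.
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    clEnqueueNDRangeKernel(q1, kernel, 1, NULL, &N, NULL, 0, NULL, NULL);
    clFinish(q1);
    clReleaseMemObject(buf);
}
```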

On Nvidia GPUs (2x 2080 Ti, 4x SXM, 8x), performance is half of what it is when buffers are explicitly copied via PCIe+CPU. The extra overhead is probably due to VRAM re-allocation (the migration spec is unclear here). No P2P, no RDMA, no SLI/NVLink. ❌
🧵2/9
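
For reference, the explicit copy via PCIe + CPU that this is compared against looks roughly like the following sketch (buffer and queue names are placeholders):

```c
// Baseline: stage the transfer through host memory over PCIe.
// buf_a lives on GPU A (queue q_a), buf_b on GPU B (queue q_b).
#include <stdlib.h>
#include <CL/cl.h>

void copy_via_host(cl_command_queue q_a, cl_command_queue q_b,
                   cl_mem buf_a, cl_mem buf_b, size_t bytes) {
    void* staging = malloc(bytes); // pinned host memory would be faster
    clEnqueueReadBuffer (q_a, buf_a, CL_TRUE, 0, bytes, staging, 0, NULL, NULL); // GPU A -> CPU
    clEnqueueWriteBuffer(q_b, buf_b, CL_TRUE, 0, bytes, staging, 0, NULL, NULL); // CPU -> GPU B
    free(staging);
}
```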

When running with the CUDA backend of PoCL + P2P cudaMemcpy, performance is 40% faster compared to the PCIe copy over CPU memory. PoCL's P2P backend is >3x faster than Nvidia's own runtime here. This is the perf delta Nvidia are giving up on.
🧵3/9
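
The driver-level P2P path that PoCL's CUDA backend can presumably take looks like the plain CUDA runtime calls below (my guess at the mechanism, not PoCL source):

```c
// Enable peer access once, then copy VRAM-to-VRAM without staging through the host.
#include <cuda_runtime.h>

void cuda_p2p_copy(void* dst_on_gpu1, const void* src_on_gpu0, size_t bytes) {
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, 1, 0);   // can device 1 read device 0's memory?
    if (can_access) {
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);          // flags must be 0
    }
    // cudaMemcpyPeer falls back to a staged copy internally if P2P is unavailable.
    cudaMemcpyPeer(dst_on_gpu1, 1, src_on_gpu0, 0, bytes);
    cudaDeviceSynchronize();
}
```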

On 8x AMD GPUs, performance with the P2P mode is unchanged compared to explicit copy over PCIe/CPU. No P2P/RDMA through PCIe/Infinity Fabric either. ❌
Explicit P2P copy with the clEnqueueCopyBufferP2PAMD extension is broken and produces a segfault. ❌
🧵4/9
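
The extension entry point has to be looked up at runtime; the signature below mirrors clEnqueueCopyBuffer, which is my reading of cl_amd_copy_buffer_p2p (treat it as an assumption), so the call that segfaulted in this test goes through something like this:

```c
#include <CL/cl.h>

// Assumed prototype, modelled on clEnqueueCopyBuffer.
typedef cl_int (*clEnqueueCopyBufferP2PAMD_fn)(
    cl_command_queue queue, cl_mem src, cl_mem dst,
    size_t src_offset, size_t dst_offset, size_t bytes,
    cl_uint num_wait_events, const cl_event* wait_list, cl_event* event);

cl_int amd_p2p_copy(cl_platform_id platform, cl_command_queue queue,
                    cl_mem src, cl_mem dst, size_t bytes) {
    clEnqueueCopyBufferP2PAMD_fn copy_p2p = (clEnqueueCopyBufferP2PAMD_fn)
        clGetExtensionFunctionAddressForPlatform(platform, "clEnqueueCopyBufferP2PAMD");
    if (!copy_p2p) return CL_INVALID_OPERATION;   // extension not exposed
    return copy_p2p(queue, src, dst, 0, 0, bytes, 0, NULL, NULL);
}
```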

With Intel HD Graphics 4600 + i7-4720HQ, P2P mode is actually faster, by 16% in that particular benchmark. The compiler is smart enough to not copy data in unified CPU/iGPU RAM. ✅
On 2x A770 there is no speedup, so no P2P/RDMA either. ❌
I have no access to test PVC. ❔
🧵5/9
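
The unified-memory case can at least be detected from the host side: CL_DEVICE_HOST_UNIFIED_MEMORY (deprecated since OpenCL 2.0 but still reported) returns CL_TRUE on iGPUs that share RAM with the CPU, where a "migration" can be a no-op. A small check:

```c
#include <CL/cl.h>

cl_bool shares_host_memory(cl_device_id device) {
    cl_bool unified = CL_FALSE;
    clGetDeviceInfo(device, CL_DEVICE_HOST_UNIFIED_MEMORY, sizeof(unified), &unified, NULL);
    return unified;
}
```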

For now, the way I initially did multi-GPU communication, over PCIe to CPU memory & back, is still the best (& only) option. The major vendors have yet to implement/fix P2P communication in their compilers to claim that performance advantage on their platforms. ⚠
🧵6/9

The clEnqueueMigrateMemObjects mechanism in the spec is too vague and should be improved. It is unclear whether memory on the target device is additionally allocated/freed. ⚠
Better would be an explicit data copy between existing buffers on source & target devices.
🧵7/9
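
For reference, the migration call in question; the flags argument only selects migration to the host or marks the contents as undefined (no data transfer), and the allocation behaviour on the target device is exactly what the spec leaves open:

```c
// The buffer is "moved" to the device that owns target_queue. Whether the
// target gets a fresh VRAM allocation and whether the source copy is freed
// is not specified.
#include <CL/cl.h>

void migrate_to(cl_command_queue target_queue, cl_mem buf) {
    clEnqueueMigrateMemObjects(target_queue, 1, &buf,
                               0,            // 0 = migrate contents to the queue's device
                               0, NULL, NULL);
    clFinish(target_queue);
}
```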

Credit and many thanks to Jan Solanti from Tampere University for visiting me at University of Bayreuth and testing this together with me, in his endeavour to implement/optimize PoCL-Remote.
Thanks to @ShmarvDogg for testing the P2P mode on his 2x A770 16GB "bigboi" PC!
🧵8/9

On PVC (4x GPU Max 1100), P2P transfer does not work either. Both the implicit and explicit buffer migration variants cut performance in half compared to when sending buffers over the CPU. ❌
🧵10/9

@ProjectPhysX The NEC Vector Engine can AFAIK do host<-->VE P2P, VE<-->VE P2P, VE<-->Mellanox P2P, and I recall reading something about copying to NVIDIA GPUs.
Not sure all of those are available in OpenCL though, or how good the OpenCL support is.

@freemin7 @ProjectPhysX AFAIK there's no OpenCL support for VEC, "only" SYCL

@ProjectPhysX Sounds like they need a clever OpenCL graphics engineer to patch that ;)

Specifically on the PVC front, that means having the Max 1100s support GPU-direct RDMA over PCIe, but it also means enabling the use of the Xe links.