Can anybody explain why CUDA and Rocm are necessary and why OpenCL isn't the solution?
-
Now, I don't write code. So I can't really tell you if this is the truth or not — but:
I hear that OpenCL code is much more difficult and less accessible to write than CUDA code. CUDA is easier to write, and thus gets picked up and used by more developers.
Someone mentions CUDA "sometimes" having better performance, but I don't think it's only sometimes. I think that due to the existence of the tensor cores (which are really good at neural nets and matrix multiplication), CUDA has vastly better performance when taking advantage of those hardware features.
Tensor cores aren't Nvidia-specific, but Nvidia is the furthest ahead: they pack the most of them into their GPUs, and, probably most importantly, CUDA only supports Nvidia, and therefore, by extension, their tensor cores.
There are alternative projects, like Leela Chess Zero mentioning TensorFlow for Google's Tensor Processing Units, but those aren't anywhere near as popular due to performance and software support.
-
You really piqued my interest. I use docker/podman.
With an AMD graphics card, eglinfo on the host shows the card is an AMD Radeon and the driver matches.
In the container, without --gpus=all, it shows the card is unknown and the driver is "swrast" (so just CPU fallback).
When I try --gpus=all, it gives the error:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
I was doing a bad job searching before. It turns out AMD can share the GPU; it just works a little differently in terms of how you launch the container.
https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/amdgpu-install.html#amdgpu-install-dkms
But sadly my AMD GPU is too old/junk to have current driver support.
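For anyone else landing here, the AMD launch ends up looking roughly like this sketch (based on the ROCm container docs; the image name is just an example, and for pure OpenGL work passing /dev/dri alone is usually enough):
# --gpus=all is the NVIDIA Container Toolkit path; for AMD you pass the device nodes instead
docker run -it --rm \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --security-opt seccomp=unconfined \
  rocm/rocm-terminal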
Anyways, appreciate the reply! Now I can mod my code to run on cheaper cloud instances.
(Note I'm an OpenGL/3D app developer, but probably OpenCL works about the same architecturally)
-
ROCm is an implementation/superset of OpenCL.
ROCm ships its installable client driver (ICD) loader and an OpenCL implementation bundled together. As of January 2022, ROCm 4.5.2 ships OpenCL 2.2.
Shaders are computational post-processing for rendering - think pixel-position-based adjustments to what gets drawn.
OpenCL and CUDA are compute frameworks that let you use the GPU for processing other than rendering; you can use it for more general computing.
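If you want to see which OpenCL implementation the ICD loader is actually exposing on a given machine, clinfo will list it (assuming the clinfo package is installed; the grep is just to trim the output):
# quick check of which platforms/devices the ICD loader reports
clinfo | grep -i -E 'platform name|device name|device version'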
-
Check implementations before saying shit like that.
Nvidia has historically had bad open source driver support, which makes it hard for people to implement vGPU usage.
They actually actively blocked us from using their cards remotely until COVID hit; then they gave out the code to do it. They are still limiting what consumer-level cards can do in virtualization cases. They had to give out a toolkit for us to be able to use their cards in Docker, whereas other vendors' cards can be accessed just by sharing the /dev device files into the container.
-
Check Wolf implementation for context. It's a mess with Nvidia.
https://games-on-whales.github.io/wolf/stable/user/quickstart.html
-
Can you share sample code I can try, or documentation I can follow, for using an AMD GPU in that way (shared, virtualized, using only open source drivers)?
-
Check Wolf (in my other comment), it's the best example of GPU virtualization usage.
Otherwise you can check other Docker images that use the GPU for compute, like Jellyfin for instance, or Nextcloud Recognize, Nextcloud Memories and its transcoding instance, ...
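For a concrete non-Nvidia example, those images usually just need the DRI devices passed through. A sketch for Jellyfin hardware transcoding (the paths, port, and group are placeholders to adjust for your setup; some distros need the render group id instead of video):
# share the DRI devices so VAAPI transcoding can reach the GPU
docker run -d \
  --device=/dev/dri:/dev/dri \
  --group-add video \
  -p 8096:8096 \
  -v /srv/jellyfin/config:/config \
  -v /srv/media:/media:ro \
  jellyfin/jellyfin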
-
Nvidia has the money and influence to make CUDA a standard. Popular means better...
-
Calling nVidia popular seems wrong. There's not much choice in this space and that choice is pushed further in their favor by anticompetitive bullshit.
-
I'm pretty sure OpenCL was just a play by Apple to standardize heterogeneous compute across different hardware companies and prevent CUDA from dominating.
But then they deprecated it in favor of Metal, which is just an Apple-specific thing - probably because they were moving to their own hardware anyway.
So the main company pushing OpenCL is no longer pushing it, and I'm pretty sure it's dying out at this point.