zai-org/GLM-4.5-Air · Hugging Face
LocalLLaMA
I'm just gonna try vLLM; seems like ik_llama.cpp doesn't have a quick Docker method.
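(For anyone else trying the same thing, here's a rough sketch of the vLLM offline API. The model name is just the one from the thread title, and `tensor_parallel_size` is a placeholder for however many GPUs you actually have; this isn't from the thread, just an illustration.)

```python
# Minimal vLLM offline-inference sketch (illustrative only).
# Assumes vLLM is installed and your GPUs can actually hold the weights.
from vllm import LLM, SamplingParams

llm = LLM(
    model="zai-org/GLM-4.5-Air",  # repo from the thread title
    tensor_parallel_size=4,       # placeholder: set to your GPU count
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["Explain mixture-of-experts models in one paragraph."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```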
It should work in any generic CUDA container, but yeah, it's more of a hobbyist engine. Honestly I just run it raw since it's dependency-free, except for system CUDA.
vLLM absolutely cannot CPU-offload AFAIK, but small models will fit in your VRAM with room to spare.
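(If you do go the vLLM route with something small, the knobs that mostly decide whether it fits in VRAM are `gpu_memory_utilization` and `max_model_len`. Rough sketch below; the model name is just an example of a "small model", not something anyone in the thread mentioned.)

```python
# Sketch: keeping a small model entirely in VRAM with vLLM (illustrative).
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # hypothetical small model, not from the thread
    gpu_memory_utilization=0.90,       # fraction of VRAM vLLM reserves for weights + KV cache
    max_model_len=8192,                # shorter context -> smaller KV cache -> more headroom
)

print(llm.generate(["Hello"])[0].outputs[0].text)
```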