Faster Ollama alternative

[email protected]

There are many projects out there that improve speed significantly. Ollama is unbeaten for convenience, though.

[email protected]

It's not, by far. But vLLM and SGLang unfortunately don't support a lot of the requested features.

[email protected]

Btw, Ollama is software for running AI models. DeepSeek is just a company, or a model file, or a service. But that's not what OP is looking for. They want to run a model, and that needs software like Ollama.

[email protected]

I'm also aware of LocalAI, which has automatic model swapping and an OpenAI-compatible API.

But unless I'm mistaken, they all use ggml behind the scenes? So if you want a completely different backend, you might want to look for something that uses vLLM or ExLlama.

[email protected]

Try llamafile from Mozilla.

[email protected]

vLLM unfortunately doesn't support switching the model without a restart.

[email protected]

Are you using a tiny model (1.5B-7B parameters)? Ollama pulls a 4-bit quant by default. It looks like vLLM does not use quantized models by default, so this is likely the difference.
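
To put rough numbers on that difference, here is a back-of-the-envelope sketch. It only counts the weights at a nominal bit width. The function name is made up for illustration, real 4-bit formats usually store slightly more than 4 bits per weight, and the KV cache and runtime overhead are ignored.

```python
def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at exactly `bits_per_weight` bits.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Compare an unquantized 16-bit model with a nominal 4-bit quant.
for bits in (16, 4):
    print(f"7B at {bits}-bit: ~{weight_size_gb(7, bits):.1f} GB")
```

So a 7B model is roughly 14 GB of weights at 16-bit versus roughly 3.5 GB at 4-bit, which means far less data to read for every generated token.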

[email protected]

I would not recommend LocalAI. Their documentation is somewhat lacking, and it's an all-in-one utility with many moving parts. Those parts also tend to break quite often.

[email protected]

It was multiple models, mainly 32B-70B.

[email protected]

Can you try setting num_ctx and num_predict using a Modelfile with Ollama?
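
In case it helps, a minimal sketch of what such a Modelfile looks like, generated from Python. The base model name and both values are placeholders I picked for illustration, not recommendations, and `render_modelfile` is just a helper name I made up.

```python
def render_modelfile(base_model: str, num_ctx: int, num_predict: int) -> str:
    """Return Modelfile text that sets the context size and the max tokens to generate."""
    lines = [
        f"FROM {base_model}",                    # model to build on
        f"PARAMETER num_ctx {num_ctx}",          # context window size in tokens
        f"PARAMETER num_predict {num_predict}",  # max tokens to generate
    ]
    return "\n".join(lines) + "\n"


# Placeholder model name and values; substitute your own.
print(render_modelfile("qwen2.5:32b", 8192, 512))
```

Save the output as `Modelfile` and build it with `ollama create <name> -f Modelfile`.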

[email protected]

I've read about this method in the GitHub issues, but it seemed impractical to me to have different models just to change the context size. That was the point where I started looking for alternatives.

[email protected]

If it bothers you, you can overwrite the model by reusing the same name instead of creating one with a new name. Either way, the LLM model file is not duplicated.
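
A small sketch of what I mean, assuming a Modelfile in the current directory. The model name `my-model` and the helper `create_command` are hypothetical. The script only prints the command rather than running it.

```python
import shlex


def create_command(name: str, modelfile_path: str = "Modelfile") -> list[str]:
    """Build the `ollama create` invocation for a given model name."""
    return ["ollama", "create", name, "-f", modelfile_path]


# Running the same command again after editing the Modelfile reuses the name,
# so the existing entry is replaced instead of a second one being added.
cmd = create_command("my-model")
print(shlex.join(cmd))
```

To actually run it, pass `cmd` to `subprocess.run(cmd, check=True)`.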

agnos.is Forums
