
LocalLLaMA

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, and get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.

Rules:

Rule 1 - No harassment or personal character attacks on community members. I.e. no name-calling, no generalizing about entire groups of people who make up our community, no baseless personal insults.

Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency. I.e. no comparing the usefulness of models to that of NFTs, no claiming the resource usage required to train a model is anything close to that of maintaining a blockchain or mining crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.

Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms. I.e. statements such as "LLMs are basically just simple text prediction like what your phone keyboard autocorrect uses, and they're still using the same algorithms since <over 10 years ago>."

Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.

73 Topics 523 Posts
  • zai-org/GLM-4.5-Air · Hugging Face

    localllama
    9
    1
    13 Votes
    9 Posts
    0 Views
    B
    Just read the L1 post, and I'm just now realizing this is mainly for running quants, which I generally avoid. ik_llama.cpp supports special quantization formats incompatible with mainline llama.cpp; you can get better performance out of them than regular GGUFs. That being said... are you implying you run LLMs in FP16? If you're on a huge GPU (or running a small model fast), you should be running SGLang or vLLM instead of llama.cpp (which is basically designed for quantization and non-enterprise hardware), especially if you are making parallel calls.
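As a rough sketch of that split (model file names and parallelism settings here are hypothetical, pick them for your hardware), serving a quantized GGUF with llama.cpp's server versus serving the unquantized weights with vLLM looks roughly like:

```shell
# llama.cpp: quantized GGUF, offload layers to whatever GPU you have
./llama-server -m GLM-4.5-Air-Q4_K_M.gguf -ngl 99 --port 8080

# vLLM: full-precision weights on big GPUs, built for parallel requests
vllm serve zai-org/GLM-4.5-Air --tensor-parallel-size 2
```

Both expose an OpenAI-compatible HTTP endpoint, so client code doesn't care which one is behind it.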
  • 20 Votes
    5 Posts
    0 Views
    B
    Be sure to grab a DWQ quant, like this: https://huggingface.co/nightmedia/Qwen3-30B-A3B-Instruct-2507-dwq3-mlx https://huggingface.co/models?sort=modified&search=DWQ DWQ is like an enhanced MLX quantization that holds up much better at tight quants, around 3-4 bpw.
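For anyone new to MLX: running one of those DWQ quants is a one-liner with the mlx-lm package (Apple Silicon only; the model repo is the one linked above, the prompt is just an example):

```shell
pip install mlx-lm

# downloads the quant from Hugging Face on first run, then generates locally
mlx_lm.generate --model nightmedia/Qwen3-30B-A3B-Instruct-2507-dwq3-mlx \
  --prompt "Explain DWQ quantization in one sentence."
```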
  • Audio Flamingo 3 - Fully Open Large Audio Language Models

    localllama
    54
    21 Votes
    54 Posts
    3 Views
    H
    If the use case doesn't matter, then quotes and parody are illegal, as well as historical archiving and scientific analysis. Well, there is a distinction between use and obtaining it. For stealing, the use doesn't matter. For later use, it does. That's also what licenses are concerned with.

    That means that every time some new use becomes desirable, the law must be changed. This is obviously stifling for progress in science and culture. Yes, that's obviously the wrong way round. Usually things should be allowed by default, unless they're specifically prohibited or handled by law. We got it the wrong way around here. However, I don't think it's the other way around in the USA either. While Fair Use is a broad limitation/exemption, it's still concerned with specific exemptions. For example, AI wouldn't be allowed by default unless it gets incorporated into law, but they're referring back to the already existing, specific exemption to do "transformative" work. Very much like our exemptions. Just that it is way more broad.

    Actually, no. Theft is prosecuted by the government: police and courts. Copyright infringement is generally a civil matter. Damages are paid, but there is no criminal prosecution. Well, it is. In the United States, willful copyright infringement carries a maximum fine of $150,000 per instance. In Germany it seems to be a prison sentence of up to 3 years or a fine. I think laws should either be enforced or abolished. The current situation is not healthy.

    Maybe you would like to see copyright infringement punished more harshly and enforced more strictly? No, copyright should be toned down. Preferably for regular citizens as well, and not just the industry.

    That's an interesting idea. It's not how we do anything else. You don't usually have to pay more for the same thing, depending on who you are or how much you use it. You're wrong here. People do have to pay more if they license a picture to show to their 20 million customers or use it in an advertising campaign than I do for putting it up in the hallway. Airbus pays something like 100x the price for the same set of nuts and bolts as someone else. A kitchen appliance for industrial use costs something like 3x the price of an end-user kitchen appliance, because it's more sturdy and made for 24/7 use. A DVD rental business pays more for a DVD than the average customer.

    Should there be exceptions for celebrities and such, or will they be able to demand licensing fees? No exceptions, no licensing, no fees. This is strictly to avoid bad things like doxxing and ruining people's lives... Then much public content can't be used, after all. The likes of Reddit, Facebook, or Discord will be able to charge licensing fees for their content, after all. [...] They already do. There's a big war going on on the internet. I've told you how my server was targeted by Alibaba and it nearly took down the database. All other people have implemented countermeasures as well. Try scraping Reddit or downloading 5 YouTube videos. It's a thing of the past; you'll get rate-limited and your downloads will quickly start to fail. Unless you pay. So it is de facto that way already and can barely get worse. And the rich can buy their way into things; the monopolists are already in, while I can't do anything any more. My IP addresses get rate-limited or blocked and my accounts banned for "suspicious activity", which was me making use of my Fair Use rights, or the German version of something like that. But I'm prevented from exercising my rights.

    It's very typically European. You rage against Meta's monopoly, but you also call for laws to enforce and strengthen it. I think it's the echo of feudalism in the culture. Well, I think taking authors' livelihoods in favour of mega-corporations is enforcing and strengthening their monopoly, and the echo of feudalism. I'd be less concerned if it was some small research institute doing something for the public or for progress. Or if a programming book author was making more than 100,000€ a year and they're "the monopoly". But it's the other way around. This application of Fair Use is in favour of the feudal-lord companies and to the detriment of the average person. Also, de facto, I as a citizen get none of the Fair Use the big companies get, and that's just different rules for different people.
  • 22 Votes
    6 Posts
    1 Views
    B
    This is an excellent guide, way above most YT guides. ik_llama.cpp is a small miracle, and it's a crime so few know of it.
  • Do you quantize models yourself?

    localllama
    2
    1
    0 Votes
    2 Posts
    0 Views
    H
    Unsloth and bartowski do such a good job that I usually don't have to. But gguf-my-repo works very well.
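For reference, doing it locally with llama.cpp's own tools (essentially what gguf-my-repo runs for you) looks roughly like this; the model paths are hypothetical:

```shell
# convert the Hugging Face checkpoint to a full-precision GGUF
python convert_hf_to_gguf.py ./my-model --outfile my-model-f16.gguf

# quantize it down (Q4_K_M is a common quality/size sweet spot)
./llama-quantize my-model-f16.gguf my-model-Q4_K_M.gguf Q4_K_M
```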
  • 11 Votes
    1 Posts
    0 Views
    No one has replied
  • Courage was truly prescient.

    localllama
    1
    2
    21 Votes
    1 Posts
    0 Views
    No one has replied
  • Running Local LLMs with Ollama on openSUSE Tumbleweed

    localllama
    21
    1
    13 Votes
    21 Posts
    4 Views
    B
    Honestly I don't use Qwen3 Instruct unless it's for code or "logic." Even the 32B is so dry and focused on that, and countering it with sampling dumbs it down. Not sure if it's too big, but I have been super impressed with Jamba 52B. It knows tons of fiction trivia and writing styles for such a "small" model, though I haven't tried to manipulate its prompt for writing yet. And it's an MoE model like A3B.
  • Le Chat dives deep. | Mistral AI

    localllama
    1
    1
    12 Votes
    1 Posts
    0 Views
    No one has replied
  • Voxtral - Audio Understanding Based on Mistral

    localllama
    2
    1
    20 Votes
    2 Posts
    0 Views
    eyekaytee@aussie.zoneE
    More comments here: https://news.ycombinator.com/item?id=44594156 (seems pretty good!)
  • And deepseek arch!

    1
    2 Votes
    1 Posts
    0 Views
    No one has replied
  • Need help understanding if this setup is even feasible.

    localllama
    21
    10 Votes
    21 Posts
    3 Views
    B
    Most likely I'll use Podman to make a container for each engine. IDK about Windows, but on Linux I find it easier to just make a Python venv for each engine. There's less CPU/RAM(/GPU?) overhead that way anyway, and it's best to pull bleeding-edge git versions of engines. As an added benefit, the Python that ships with some OSes (like CachyOS) is more optimized than what Podman would pull. Podman is great if security is a concern, though, i.e. if you don't 'trust' the code of the engine runtimes. ST is good, though its sampling presets are kinda funky and I don't use it personally.
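A minimal sketch of the venv-per-engine setup (directory names and the git URL are just examples):

```shell
# one isolated environment per inference engine
python3 -m venv engines/vllm
python3 -m venv engines/exllama

# install a bleeding-edge engine into its own venv, e.g.:
# engines/vllm/bin/pip install git+https://github.com/vllm-project/vllm.git

# each venv carries its own interpreter and dependency tree
engines/vllm/bin/python --version
```

Since each engine's dependencies live in its own tree, one engine's pinned torch version can't break another's.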
  • Very large amounts of gaming gpus vs AI gpus

    localllama
    8
    6 Votes
    8 Posts
    1 Views
    Z
    I spent chunks of 2023 and 2024 investigating and testing image-gen models after a cryptobro coworker kept talking about it. I rigged up an old system and ran it locally to see wtf these things are doing. Honestly, producing slop at 5 seconds per image vs 5 minutes is meaningless in terms of value if 0% of the slop can be salvaged. And still, a human has to figure out what to do with the best candidates. In fact, at a certain speed it begins to work against itself, as no one can realistically analyze AI-generated output as fast as it is produced. Conclusion: AI is mostly worthless. It just forces you to accept that human effort is the only thing with intrinsic value. And it's as tough to get out of AI as it is to put any in. And that's looking past all the other gargantuan problems with AI models.
  • 31 Votes
    16 Posts
    1 Views
    slacktoid@lemmy.mlS
    Thank you for all your insight!! This is really helpful
  • 4 Votes
    4 Posts
    0 Views
    W
    I could see Apple buying Anthropic. Apple has tech that would benefit Anthropic on speech-to-text/text-to-speech, and Anthropic seems to align well with Apple's corporate philosophy (at least the public version of it).
  • 28 Votes
    2 Posts
    0 Views
    B
    More open models are good! Granite needs a competitor. I do hope they try an 'exotic' architecture. It doesn't have to be novel; another bitnet or Jamba/Falcon hybrid model would be sick. Is there anywhere I can submit suggestions, heh.
  • 2 Votes
    1 Posts
    0 Views
    No one has replied
  • 5 Votes
    1 Posts
    0 Views
    No one has replied
  • LLMs and their efficiency, can they really replace humans?

    localllama
    11
    1
    2 Votes
    11 Posts
    0 Views
    H
    LLMs are great at automating tasks where we know the solution. And there are a lot of workflows that fall in this category. They are horrible at solving new problems, but that is not where the opportunity for LLMs is anyway.
  • Current best local models for tool use?

    localllama
    8
    9 Votes
    8 Posts
    1 Views
    H
    For VLMs I love Moondream2. It's a tiny model that packs a punch way above its size. Llama.cpp supports it.
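If anyone wants to try it, recent llama.cpp builds run vision models through the multimodal CLI, which takes the text model GGUF plus its vision projector; the file names below are hypothetical, grab a Moondream2 GGUF pair from Hugging Face:

```shell
./llama-mtmd-cli -m moondream2-text-model.gguf \
  --mmproj moondream2-mmproj.gguf \
  --image photo.jpg \
  -p "Describe this image."
```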