
LocalLLaMA

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, and get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.

Rules:

Rule 1 - No harassment or personal character attacks of community members, i.e. no name-calling, no generalizing entire groups of people that make up our community, no baseless personal insults.

Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency, i.e. no comparing the usefulness of models to that of NFTs, no claiming that the resource usage required to train a model is anything close to that of maintaining a blockchain or mining crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.

Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms, i.e. no statements such as "LLMs are basically just simple text prediction like what your phone keyboard autocorrect uses, and they're still using the same algorithms since <over 10 years ago>."

Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.

73 Topics 523 Posts
  • Nice.

    1
    8 Votes
    1 Posts
    0 Views
    No one has replied
  • Show HN: Clippy, 90s UI for local LLMs

    localllama
    1
    13 Votes
    1 Posts
    0 Views
    No one has replied
  • Specialize LLM

    localllama
    7
    14 Votes
    7 Posts
    29 Views
    smokeydope@lemmy.world
    I would recommend you read over the work of the person who fine-tuned a Mistral model on many US Army field guides to understand what fine-tuning on a lot of books to bake in knowledge looks like. If you are a newbie just learning how this technology works, I would suggest trying to get RAG working with a small model and one or two books converted to a big text file, just to see how it works. Once you have a little more experience, and if you are financially well off to the point that one or two thousand dollars to train a model is who-cares play money to you, then go for fine-tuning.
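    A minimal sketch of that RAG-on-one-book setup, assuming the sentence-transformers library for embeddings; the file name, chunk size, and question are placeholders, and the final prompt goes to whatever small local model you are testing:

```python
# Minimal RAG sketch: chunk one book, embed the chunks, retrieve the most
# relevant ones for a question, and build a prompt. File names and chunk
# size are placeholder assumptions, not recommendations.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly

with open("book.txt") as f:  # the "one or two books as a big text file"
    text = f.read()

# Naive fixed-size chunking; paragraph/section-aware splitting works better.
chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(question: str, k: int = 3) -> list[str]:
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q  # cosine similarity, since vectors are normalized
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

context = "\n---\n".join(retrieve("What does the book say about <topic>?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: <topic>?"
# Feed `prompt` to whatever small local model you are experimenting with.
```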
  • 11 Votes
    1 Posts
    0 Views
    No one has replied
  • 28 Votes
    5 Posts
    1 Views
    X
    Fair point. My original prompt asked for more, but the model wasn't capable enough. Not sure if the "warp drive" part would be part of any standard algo. Any ideas for challenges that are newer and more fun than the "balls rolling in a hexagon/heptagon/octagon" or "simulate a solar system" prompts everyone's using these days?
  • Technically correct

    localllama
    3
    2
    91 Votes
    3 Posts
    0 Views
    smokeydope@lemmy.world
    "Hey, how's it going?" *stares blankly at you like a deer in headlights for 15.50 seconds of uncomfortable silence* "Good." *walks away*
  • Uh, wow.

    3
    2 Votes
    3 Posts
    0 Views
    H
    Yeah, thanks, but I've already tried that. It will write a short amount of text but very quickly falls back to refusal, both if I do it within the thinking step and if I do it in the output. This time the alignment doesn't seem to be slapped on half-heartedly.
  • I'm actually more medium on this!

    5
    0 Votes
    5 Posts
    0 Views
    B
    It could be an oversight, no one has answered yet. Not many people asking either, heh.
  • 7 Votes
    2 Posts
    0 Views
    B
    Some kind of presentation talks about longer context: https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2F1nos591czhxe1.jpeg Maybe it's a work in progress, with Qwen 2.5 14B 1M (really 256K in that case) being the first test?
  • Qwen3 "Leaked"

    localllama
    1
    1
    18 Votes
    1 Posts
    0 Views
    No one has replied
  • Niche Model of the Day: Nemotron 49B 3bpw exl3

    localllama
    6
    1
    23 Votes
    6 Posts
    57 Views
    B
    That, and exl2 has ROCm support. There was always the bugaboo of uttering a prayer to get ROCm flash attention working (come on, AMD...), but exl3 plans to switch to FlashInfer, which should eliminate that issue.
  • 1 Votes
    1 Posts
    0 Views
    No one has replied
  • 56 Votes
    2 Posts
    2 Views
    B
    1.2T param, 78B active, hybrid MoE. That's enormous, very much not local, heh. Here's the actual article translation (which seems right compared to other translations):

    ::: spoiler Translation
    DeepSeek R2: Unit Cost Drops 97.3%, Imminent Release + Core Specifications
    Author: Chasing Trends Observer (Veteran Crypto Investor Watching from Afar), 2025-04-25 12:06:16, Sichuan

    Three Core Technological Breakthroughs of DeepSeek R2:
    - Architectural Innovation: Adopts proprietary Hybrid MoE 3.0 architecture, achieving 1.2 trillion dynamically activated parameters (actual computational consumption: 78 billion parameters). Validated by Alibaba Cloud tests: 97.3% reduction in per-token cost compared to GPT-4 Turbo for long-text inference tasks. (Data source: IDC Computing Power Economic Model)
    - Data Engineering: Constructed a 5.2 PB high-quality corpus covering finance, law, patents, and vertical domains. Multi-stage semantic distillation boosts instruction compliance accuracy to 89.7%. (Benchmark: C-Eval 2.0 test set)
    - Hardware Optimization: Proprietary distributed training framework achieves 82% utilization rate on Ascend 910B chip clusters, 512 PetaFLOPS actual computing power at FP16 precision, and 91% efficiency of equivalent-scale A100 clusters. (Validated by Huawei Labs)

    Application Layer Advancements - Three Multimodal Breakthroughs:
    - Vision Understanding: ViT-Transformer hybrid architecture achieves 92.4 mAP on COCO dataset object segmentation, an 11.6% improvement over CLIP models.
    - Industrial Inspection: Adaptive feature fusion algorithm reduces false detection rate to 7.2E-6 in photovoltaic EL defect detection. (Field data from LONGi Green Energy production lines)
    - Medical Diagnostics: Knowledge graph-enhanced chest X-ray multi-disease recognition: 98.1% accuracy vs. 96.3% average of senior radiologist panels. (Blind test results from Peking Union Medical College Hospital)

    Key Highlight: 8-bit quantization compression achieves 83% model size reduction with <2% accuracy loss, enabling edge device deployment. (Technical White Paper, Chapter 4.2)
    :::

    Others translate it as "sub-8-bit" quantization, which is interesting too.
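    Rough back-of-the-envelope memory math on those claimed specs (my arithmetic, not from the article) shows why this is "very much not local":

```python
# Memory estimates for the claimed specs: 1.2T total params, 78B active.
total_params = 1.2e12
active_params = 78e9

fp16_bytes = 2  # bytes per parameter at FP16
int8_bytes = 1  # bytes per parameter at 8-bit quantization

print(f"FP16 weights:   {total_params * fp16_bytes / 1e12:.1f} TB")  # 2.4 TB
print(f"8-bit weights:  {total_params * int8_bytes / 1e12:.1f} TB")  # 1.2 TB
# Even just the active slice per token is far beyond consumer hardware:
print(f"Active at FP16: {active_params * fp16_bytes / 1e9:.0f} GB")  # 156 GB
```

    Note the claimed 83% size reduction is more than the 50% you'd get going from FP16 to plain INT8, which fits the "sub-8-bit" translation.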
  • Heh, calls P=NP out *about* as politely as it can:

    1
    3 Votes
    1 Posts
    1 Views
    No one has replied
  • Less positive model

    localllama
    9
    1 Votes
    9 Posts
    0 Views
    H
    I'm always a bit unsure about that. Sure, AI has a unique perspective on the world, since it has only "seen" it through words. But at the same time these words conceptualize things; there is information, and there are models, stored in them and in the way they are arranged. I believe I've seen some evidence that AI has access to the information behind language, when it applies knowledge, transfers concepts... But that's kind of hard to judge. An obvious example is translation. It knows what a cat or a banana is. It picks the correct French word. At the same time it also maintains tone and deals with proverbs and figures of speech... That was next to impossible with the old machine translation services, which only looked at the words. And my impression with computer coding or creative writing is that it seems to have some understanding of what it's doing: why we do things a certain way and sometimes a different way, and what I want it to do.

    I'm not sure whether I'm being too philosophical for the current state of technology. AI surely isn't very intelligent. It certainly struggles with the harder concepts. Sometimes it feels like its ability to tell fact from fiction is on the level of a 5-year-old who just practices lying. With stories, it can't really hint at things without giving them away openly. The pacing is off all the time. But I think it has conceptualized a lot of things as well. It'll apply all the common story tropes. It loves to do sudden plot twists. And next to tying things up, it'll also introduce random side stories, new characters and dynamics. Sometimes for a reason, sometimes it just gets off track. And I've definitely seen it do suspense and release... not successfully, but I'd say it "knows" more than the words. That makes me think the concepts behind storytelling might actually be somewhere in there. It might just lack the intelligence needed to apply them properly, and to maintain the bigger picture of a story: background story, subplots, pacing... I'd say it "knows"; it's just utterly unable to juggle the complexity of it. And it hasn't been trained on what makes a story a good one. I'd guess that might not be a fundamental limitation of AI, though, but more due to how we feed it award-winning novels next to lame Reddit stories without a clear distinction(?) or preference. And I wouldn't be surprised if that's one of the reasons why it doesn't really have a "feeling" for how to do a good job.

    Concerning OP's original question... I don't think that's part of it. The people doing the training have put in deliberate effort to make AI nice and helpful. As far as I know there are always at least two main steps in creating large language models. The first one is feeding in large quantities of text. The result of that is called a "base model", which will be biased in all the ways the learning datasets are. It'll do all the positivity, negativity, stereotypes; it will be helpful or unhelpful roughly the way the people on the internet, and the books and Wikipedia articles that went in, are. (And that's already skewed towards positive.) The second step is to tune it for some application, like answering questions. That makes it usable, and makes it abide by whatever the creators chose, which likely includes not being rude or negative to customers. That behaviour gets suppressed. If OP wants it a different way, they probably want a different model, or maybe a base model, or maybe a community-made fine-tune that has a third step on top to align the model with different goals.
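    A concrete way to see those two steps (a sketch, assuming the Hugging Face transformers library; the Qwen model names are just an example of the common base/-Instruct naming convention, not something from this thread):

```python
# Compare a "base model" (step 1: raw text prediction) with its tuned
# counterpart (step 2: aligned for helpful chat). Model names are example
# assumptions following the usual -Instruct naming convention.
from transformers import pipeline

base = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B")
chat = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

prompt = "The service at this restaurant was"
# The base model continues text in whatever tone its training data had;
# the instruct-tuned model leans toward the polished, positive register.
print(base(prompt, max_new_tokens=30)[0]["generated_text"])
print(chat(prompt, max_new_tokens=30)[0]["generated_text"])
```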
  • How do I get started with RAG (ideally with llama.cpp)?

    localllama
    2
    1 Votes
    2 Posts
    5 Views
    S
    You can use something like AnythingLLM for RAG: https://github.com/Mintplex-Labs/anything-llm
    It works with local models: https://docs.anythingllm.com/agent/usage#what-is-rag-search-and-how-to-use-it
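    If you'd rather wire retrieval into llama.cpp directly, here's a minimal sketch using the llama-cpp-python bindings; the model path and the retrieved text are placeholders for whatever your retriever returns:

```python
# Sketch: inject retrieved chunks into a llama.cpp prompt via the
# llama-cpp-python bindings. Model path and context are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="models/some-7b-q4_k_m.gguf", n_ctx=4096)

retrieved = "<chunks returned by your retriever / AnythingLLM>"
resp = llm.create_chat_completion(messages=[
    {"role": "system", "content": "Answer strictly from the provided context."},
    {"role": "user", "content": f"Context:\n{retrieved}\n\nQuestion: ..."},
])
print(resp["choices"][0]["message"]["content"])
```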
  • 2 Votes
    6 Posts
    1 Views
    eyekaytee@aussie.zone
    training, but at the same time having 100 million people asking a variety of questions every day will put a real load on the servers
  • 2 Votes
    10 Posts
    0 Views
    B
    The NPUs will have to be rearchitected to optimize for BitNet.
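    For context on why (my illustration, not from the thread): BitNet-style models constrain weights to {-1, 0, +1}, so the matrix multiplies that dominate inference reduce to additions and subtractions, which the dense multiply-accumulate arrays in today's NPUs aren't designed to exploit:

```python
# With ternary weights in {-1, 0, +1}, a matrix-vector product needs no
# multiplications at all, only adds and subtracts.
import numpy as np

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))  # ternary weight matrix
x = rng.standard_normal(8)            # activations

y_mac = W @ x  # what a multiply-accumulate array computes today

# Same result using only additions/subtractions (what BitNet permits):
y_add = np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

assert np.allclose(y_mac, y_add)
```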
  • 1 Votes
    1 Posts
    0 Views
    No one has replied
  • [April 2025] Which model are you using?

    localllama
    14
    1 Votes
    14 Posts
    0 Views
    A
    Those are the ones, the 0414 release.