
LocalLLaMA

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, and get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.

Rules:

Rule 1 - No harassment or personal character attacks of community members, i.e. no name-calling, no generalizing entire groups of people that make up our community, no baseless personal insults.

Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency, i.e. no comparing the usefulness of models to that of NFTs, no claiming that the resource usage required to train a model is anything close to that of maintaining a blockchain or mining crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.

Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms, i.e. no statements such as "LLMs are basically just simple text prediction like what your phone keyboard autocorrect uses, and they're still using the same algorithms since <over 10 years ago>."

Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.

73 Topics 523 Posts
  • Nice.

    1
    8 Votes
    1 Posts
    0 Views
    No one has replied
  • Show HN: Clippy, 90s UI for local LLMs

    localllama
    1
    13 Votes
    1 Posts
    0 Views
    No one has replied
  • Specialize LLM

    localllama
    7
    14 Votes
    7 Posts
    29 Views
    smokeydope@lemmy.world
    I would recommend you read over the work of the person who fine-tuned a Mistral model on many US Army field guides to understand what fine-tuning on a lot of books to bake in knowledge looks like. If you are a newbie just learning how this technology works, I would suggest trying to get RAG working with a small model and one or two books converted to a big text file, just to see how it works. Once you have a little more experience, and if you are financially well off to the point that one or two thousand dollars to train a model is who-cares play money to you, then go for fine-tuning.
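    A minimal sketch of that RAG-on-one-book setup, assuming the sentence-transformers library for embeddings; the file name, chunk size, and question are placeholders, and the final prompt goes to whatever small local model you are testing:

```python
# Minimal RAG sketch: chunk one book, embed the chunks, retrieve the most
# relevant ones for a question, and build a prompt. File names and chunk
# size are placeholder assumptions, not recommendations.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly

with open("book.txt") as f:  # the "one or two books as a big text file"
    text = f.read()

# Naive fixed-size chunking; paragraph/section-aware splitting works better.
chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(question: str, k: int = 3) -> list[str]:
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q  # cosine similarity, since vectors are normalized
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

context = "\n---\n".join(retrieve("What does the book say about <topic>?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: <topic>?"
# Feed `prompt` to whatever small local model you are experimenting with.
```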
  • 11 Votes
    1 Posts
    0 Views
    No one has replied
  • 28 Votes
    5 Posts
    1 Views
    X
    Fair point. My original prompt asked for more, but the model wasn't capable enough. Not sure if the "warp drive" part would be part of any standard algo. Any ideas for challenges that are newer and more fun than the "balls rolling in a hexagon/heptagon/octagon" or "simulate a solar system" prompts everyone's using these days?
  • Technically correct

    localllama
    3
    2
    91 Votes
    3 Posts
    0 Views
    smokeydope@lemmy.world
    "Hey, how's it going?" *stares blankly at you like a deer in headlights for 15.50 seconds of uncomfortable silence* "Good." *walks away*
  • Uh, wow.

    3
    2 Votes
    3 Posts
    0 Views
    H
    Yeah, thanks, but I've already tried that. It will write a short amount of text but very quickly falls back to refusal, both if I do it within the thinking step and if I do it in the output. This time the alignment doesn't seem to be slapped on half-heartedly.
  • I'm actually more medium on this!

    5
    0 Votes
    5 Posts
    0 Views
    B
    It could be an oversight, no one has answered yet. Not many people asking either, heh.
  • 7 Votes
    2 Posts
    0 Views
    B
    Some kind of presentation talks about longer context: https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2F1nos591czhxe1.jpeg Maybe it's a work in progress, with Qwen 2.5 14B 1M (really 256K in that case) being the first test?
  • Qwen3 "Leaked"

    localllama
    1
    1
    18 Votes
    1 Posts
    0 Views
    No one has replied
  • Niche Model of the Day: Nemotron 49B 3bpw exl3

    localllama
    6
    1
    23 Votes
    6 Posts
    57 Views
    B
    That, and exl2 has ROCm support. There was always the bugaboo of uttering a prayer to get ROCm flash attention working (come on, AMD...), but exl3 plans to switch to FlashInfer, which should eliminate that issue.
  • 1 Votes
    1 Posts
    0 Views
    No one has replied
  • 56 Votes
    2 Posts
    2 Views
    B
    1.2T param, 78B active, hybrid MoE. That's enormous, very much not local, heh. Here's the actual article translation (which seems right compared to other translations):

    ::: spoiler Translation
    DeepSeek R2: Unit Cost Drops 97.3%, Imminent Release + Core Specifications
    Author: Chasing Trends Observer (Veteran Crypto Investor Watching from Afar), 2025-04-25 12:06:16, Sichuan

    Three Core Technological Breakthroughs of DeepSeek R2:
    - Architectural Innovation: Adopts proprietary Hybrid MoE 3.0 architecture, achieving 1.2 trillion dynamically activated parameters (actual computational consumption: 78 billion parameters). Validated by Alibaba Cloud tests: 97.3% reduction in per-token cost compared to GPT-4 Turbo for long-text inference tasks. (Data source: IDC Computing Power Economic Model)
    - Data Engineering: Constructed a 5.2 PB high-quality corpus covering finance, law, patents, and vertical domains. Multi-stage semantic distillation boosts instruction compliance accuracy to 89.7%. (Benchmark: C-Eval 2.0 test set)
    - Hardware Optimization: Proprietary distributed training framework achieves 82% utilization rate on Ascend 910B chip clusters, 512 PetaFLOPS actual computing power at FP16 precision, and 91% efficiency of equivalent-scale A100 clusters. (Validated by Huawei Labs)

    Application Layer Advancements - Three Multimodal Breakthroughs:
    - Vision Understanding: ViT-Transformer hybrid architecture achieves 92.4 mAP on COCO dataset object segmentation, an 11.6% improvement over CLIP models.
    - Industrial Inspection: Adaptive feature fusion algorithm reduces false detection rate to 7.2E-6 in photovoltaic EL defect detection. (Field data from LONGi Green Energy production lines)
    - Medical Diagnostics: Knowledge graph-enhanced chest X-ray multi-disease recognition: 98.1% accuracy vs. 96.3% average of senior radiologist panels. (Blind test results from Peking Union Medical College Hospital)

    Key Highlight: 8-bit quantization compression achieves 83% model size reduction with <2% accuracy loss, enabling edge device deployment. (Technical White Paper, Chapter 4.2)
    :::

    Others translate it as "sub-8-bit" quantization, which is interesting too.
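    Rough back-of-the-envelope memory math on those claimed specs (my arithmetic, not from the article) shows why this is "very much not local":

```python
# Memory estimates for the claimed specs: 1.2T total params, 78B active.
total_params = 1.2e12
active_params = 78e9

fp16_bytes = 2  # bytes per parameter at FP16
int8_bytes = 1  # bytes per parameter at 8-bit quantization

print(f"FP16 weights:   {total_params * fp16_bytes / 1e12:.1f} TB")  # 2.4 TB
print(f"8-bit weights:  {total_params * int8_bytes / 1e12:.1f} TB")  # 1.2 TB
# Even just the active slice per token is far beyond consumer hardware:
print(f"Active at FP16: {active_params * fp16_bytes / 1e9:.0f} GB")  # 156 GB
```

    Note the claimed 83% size reduction is more than the 50% you'd get going from FP16 to plain INT8, which fits the "sub-8-bit" translation.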
  • Heh, calls P=NP out *about* as politely as it can:

    1
    3 Votes
    1 Posts
    1 Views
    No one has replied
  • Less positive model

    localllama
    9
    1 Votes
    9 Posts
    0 Views
    H
    I'm always a bit unsure about that. Sure, AI has a unique perspective on the world, since it has only "seen" it through words. But at the same time these words conceptualize things; there is information, and there are models, stored in them and in the way they are arranged. I believe I've seen some evidence that AI has access to the information behind language, when it applies knowledge, transfers concepts... But that's kind of hard to judge. An obvious example is translation. It knows what a cat or a banana is. It picks the correct French word. At the same time it also maintains tone and deals with proverbs and figures of speech... That was next to impossible with the old machine translation services, which only looked at the words. And my impression with computer coding or creative writing is that it seems to have some understanding of what it's doing: why we do things a certain way and sometimes a different way, and what I want it to do.

    I'm not sure whether I'm being too philosophical for the current state of technology. AI surely isn't very intelligent. It certainly struggles with the harder concepts. Sometimes it feels like its ability to tell fact from fiction is on the level of a 5-year-old who just practices lying. With stories, it can't really hint at things without giving them away openly. The pacing is off all the time. But I think it has conceptualized a lot of things as well. It'll apply all the common story tropes. It loves to do sudden plot twists. And next to tying things up, it'll also introduce random side stories, new characters and dynamics. Sometimes for a reason, sometimes it just gets off track. And I've definitely seen it do suspense and release... not successfully, but I'd say it "knows" more than the words. That makes me think the concepts behind storytelling might actually be somewhere in there. It might just lack the intelligence needed to apply them properly, and to maintain the bigger picture of a story: background story, subplots, pacing... I'd say it "knows"; it's just utterly unable to juggle the complexity of it. And it hasn't been trained on what makes a story a good one. I'd guess that might not be a fundamental limitation of AI, though, but more due to how we feed it award-winning novels next to lame Reddit stories without a clear distinction(?) or preference. And I wouldn't be surprised if that's one of the reasons why it doesn't really have a "feeling" for how to do a good job.

    Concerning OP's original question... I don't think that's part of it. The people doing the training have put in deliberate effort to make AI nice and helpful. As far as I know there are always at least two main steps in creating large language models. The first one is feeding in large quantities of text. The result of that is called a "base model", which will be biased in all the ways the learning datasets are. It'll do all the positivity, negativity, stereotypes; it will be helpful or unhelpful roughly the way the people on the internet, and the books and Wikipedia articles that went in, are. (And that's already skewed towards positive.) The second step is to tune it for some application, like answering questions. That makes it usable, and makes it abide by whatever the creators chose, which likely includes not being rude or negative to customers. That behaviour gets suppressed. If OP wants it a different way, they probably want a different model, or maybe a base model, or maybe a community-made fine-tune that has a third step on top to align the model with different goals.
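    A concrete way to see those two steps (a sketch, assuming the Hugging Face transformers library; the Qwen model names are just an example of the common base/-Instruct naming convention, not something from this thread):

```python
# Compare a "base model" (step 1: raw text prediction) with its tuned
# counterpart (step 2: aligned for helpful chat). Model names are example
# assumptions following the usual -Instruct naming convention.
from transformers import pipeline

base = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B")
chat = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

prompt = "The service at this restaurant was"
# The base model continues text in whatever tone its training data had;
# the instruct-tuned model leans toward the polished, positive register.
print(base(prompt, max_new_tokens=30)[0]["generated_text"])
print(chat(prompt, max_new_tokens=30)[0]["generated_text"])
```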
  • How do I get started with RAG (ideally with llama.cpp)?

    localllama
    2
    1 Votes
    2 Posts
    5 Views
    S
    You can use something like AnythingLLM for RAG: https://github.com/Mintplex-Labs/anything-llm
    It works with local models: https://docs.anythingllm.com/agent/usage#what-is-rag-search-and-how-to-use-it
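    If you'd rather wire retrieval into llama.cpp directly, here's a minimal sketch using the llama-cpp-python bindings; the model path and the retrieved text are placeholders for whatever your retriever returns:

```python
# Sketch: inject retrieved chunks into a llama.cpp prompt via the
# llama-cpp-python bindings. Model path and context are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="models/some-7b-q4_k_m.gguf", n_ctx=4096)

retrieved = "<chunks returned by your retriever / AnythingLLM>"
resp = llm.create_chat_completion(messages=[
    {"role": "system", "content": "Answer strictly from the provided context."},
    {"role": "user", "content": f"Context:\n{retrieved}\n\nQuestion: ..."},
])
print(resp["choices"][0]["message"]["content"])
```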
  • 2 Votes
    6 Posts
    1 Views
    eyekaytee@aussie.zone
    training, but at the same time having 100 million people asking a variety of questions every day will put a real load on the servers
  • 2 Votes
    10 Posts
    0 Views
    B
    The NPUs will have to be rearchitected to optimize for BitNet.
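    For context on why (my illustration, not from the thread): BitNet-style models constrain weights to {-1, 0, +1}, so the matrix multiplies that dominate inference reduce to additions and subtractions, which the dense multiply-accumulate arrays in today's NPUs aren't designed to exploit:

```python
# With ternary weights in {-1, 0, +1}, a matrix-vector product needs no
# multiplications at all, only adds and subtracts.
import numpy as np

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))  # ternary weight matrix
x = rng.standard_normal(8)            # activations

y_mac = W @ x  # what a multiply-accumulate array computes today

# Same result using only additions/subtractions (what BitNet permits):
y_add = np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

assert np.allclose(y_mac, y_add)
```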
  • 1 Votes
    1 Posts
    0 Views
    No one has replied
  • [April 2025] Which model are you using?

    localllama
    14
    1 Votes
    14 Posts
    0 Views
    A
    Those are the ones, the 0414 release.