
LocalLLaMA

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, and get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.

Rules:

Rule 1 - No harassment or personal character attacks of community members, i.e. no name-calling, no generalizing entire groups of people that make up our community, no baseless personal insults.

Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency, i.e. no comparing the usefulness of models to that of NFTs, no claiming the resource usage required to train a model is anything close to that of maintaining a blockchain or mining crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.

Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms, i.e. statements such as "LLMs are basically just simple text prediction like what your phone keyboard autocorrect uses, and they're still using the same algorithms since <over 10 years ago>."

Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.

73 Topics 523 Posts
  • Local Voiceover/Audiobook generation

    localllama
    7
    17 Votes
    7 Posts
    2 Views
    smokeydope@lemmy.world
    Nice post, Hendrik. Thanks for sharing your knowledge and helping people out.
  • I'm using open web ui, does anybody else have a better interface?

    localllama
    2
    16 Votes
    2 Posts
    0 Views
    C
    jan.ai?
  • 28 Votes
    5 Posts
    0 Views
    B
    I don't want to be ungrateful, complaining that they don't give us everything. For sure. But I guess it's still kinda… interesting? Like you'd think Qwen3, Gemma 3, Falcon H1, Nemotron 49B and such would pressure them to release Medium, but I guess there are factors that help them sell it. As stupid as this is, they're European and specifically not Chinese. In the business world, there's this mostly irrational fear that the Deepseek or Qwen weights by themselves will jump out of their cage and hack you, heh.
  • My AI Skeptic Friends Are All Nuts

    localllama
    17
    1
    12 Votes
    17 Posts
    2 Views
    I
    Apart from the arguments that, yes, vibe coders exist and will be cheaper to employ, creating huge long-term problems with a generational gap in senior programmers (who are the ones maintaining open source projects), and the heinous environmental impact, and I mean heinous, this is my biggest problem honestly: you're betting that LLMs will improve faster than programmers forget "the craft". LLMs are wide, not deep, and the less programmers care about boilerplate and how things actually work, the less material there is for the LLMs -> feedback loop -> worse LLMs, etc. I use LLMs; hell, I designed a workshop for my employer on how programmers can use LLMs, Cursor, etc. But I don't think we're quite aware how we are screwing ourselves long term.
  • Updated guidelines for c/LocaLLama (new rules)

    localllama
    10
    24 Votes
    10 Posts
    0 Views
    B
    Agreed, agreed, agreed. Thanks. Some may seem arbitrary, but things like the NFT/crypto comparison are so politically charged and ripe for abuse that it's good to nip that in the bud. The only one I have mixed feelings on is: Rule: No comparing artificial intelligence/machine learning to simple text prediction algorithms, i.e. statements such as "LLMs are basically just simple text prediction like what your phone keyboard autocorrect uses, and they're still using the same algorithms since <over 10 years ago>." Reason: There are grains of truth to the reductionist statement that LLMs rely on mathematical statistics and probability for their outputs. The reasoning is true; I agree. But it does feel a bit uninclusive to outsiders who, to be frank, know nothing about LLMs. Commenters shouldn't drive by and drop reductionist hate, but that's also kinda the nature of Lemmy, heh. So... maybe be a little lax with that rule, I guess? Like give people a chance to be corrected unless they're outright abusive.
  • I'm excited for dots.llm (142BA14B)!

    localllama
    4
    1
    11 Votes
    4 Posts
    0 Views
    B
    This is like a perfect model for a Strix Halo mini PC. Man, I really want one of those Framework Desktops now...
  • What to Integrate With My AI

    localllama
    8
    16 Votes
    8 Posts
    0 Views
    swelter_spark@reddthat.com
    I use Kobold as a backend for the FluentRead browser plugin, so I can do local language translation.
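A minimal sketch of what that setup amounts to, assuming KoboldCpp's OpenAI-compatible endpoint on its default port 5001 (the model name and prompt below are illustrative placeholders); FluentRead presumably issues the equivalent request under the hood once it is pointed at the local backend:

```python
# Hedged sketch: translate text through a locally running KoboldCpp instance via its
# OpenAI-compatible chat endpoint. The port (5001) and placeholder model name are
# assumptions based on KoboldCpp defaults; adjust for your own backend.
import requests

KOBOLD_URL = "http://localhost:5001/v1/chat/completions"  # assumed default endpoint

def translate(text: str, target_lang: str = "English") -> str:
    payload = {
        "model": "local-model",  # placeholder; the backend serves whatever model is loaded
        "messages": [
            {"role": "system",
             "content": f"Translate the user's text into {target_lang}. Reply with the translation only."},
            {"role": "user", "content": text},
        ],
        "temperature": 0.2,
        "max_tokens": 512,
    }
    resp = requests.post(KOBOLD_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()

if __name__ == "__main__":
    print(translate("Bonjour tout le monde, ceci est un test."))
```

Since everything goes to localhost, no text ever leaves the machine.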
  • 32 Votes
    16 Posts
    0 Views
    B
    That’s a premade 8x 7900 XTX PC. All standard and off the shelf. I dunno anything about Geohot, all I know is people have been telling me how cool Tinygrad is for years with seemingly nothing to show for it other than social media hype, while other, sometimes newer, PyTorch alternatives like TVM, GGML, the MLIR efforts and such are running real workloads.
  • Don't overlook llama.cpp's rpc-server feature.

    localllama
    5
    4 Votes
    5 Posts
    1 Views
    T
    I have to correct myself: it loads the RPC machine's part of the model across the network every time you start the server. It appears newer versions of rpc-server have a cache option, and you can point them at a locally stored copy of the model to avoid the network cost.
  • Noob experience using local LLM as a D&D style DM.

    localllama
    18
    12 Votes
    18 Posts
    0 Views
    T
    Mistral (24B) models are really bad at long context, but this is not always the case; I find that Qwen 32B and Gemma 27B are solid at 32K. It looks like the Harbinger RPG model I'm using (from Latitude Games) is based on Mistral 24B, so maybe it inherits that limitation? I like it in other ways. It was trained on RPG games, which seems to help it for my use case. I did try some general purpose / vanilla models and felt they were not as good at D&D type scenarios. It looks like Latitude also has a 70B Wayfarer model; maybe it would do better at bigger contexts. I have several networked machines with 40GB VRAM between all of them, and I can just squeak an IQ4_XS 70B into that unholy assembly if I run 24000 context (before the SWA patch, so maybe more now). I will try it! The drawback is speed: 70B models are slow on my setup, about 8 t/s at startup.
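For anyone curious why 24000 context is roughly the squeeze point, here is a back-of-the-envelope sketch; every constant in it is an assumption (IQ4_XS at ~4.25 bits/weight, a Llama-3-70B-style architecture with 80 layers and 8 KV heads of dimension 128, fp16 KV cache), so treat the output as a ballpark rather than a measurement:

```python
# Rough VRAM estimate for a 70B IQ4_XS model at 24K context.
# All constants are assumptions typical of Llama-3-70B-class architectures.
GIB = 1024**3

params          = 70e9    # parameter count
bits_per_weight = 4.25    # approximate IQ4_XS average (assumption)
n_layers        = 80
n_kv_heads      = 8       # grouped-query attention
head_dim        = 128
kv_bytes        = 2       # fp16 K and V entries
context         = 24_000

weights_gib  = params * bits_per_weight / 8 / GIB
# per token: K and V vectors for every layer and KV head
kv_cache_gib = 2 * n_layers * n_kv_heads * head_dim * kv_bytes * context / GIB

print(f"weights : {weights_gib:5.1f} GiB")                  # ~34.6
print(f"KV cache: {kv_cache_gib:5.1f} GiB")                 # ~7.3 at fp16
print(f"total   : {weights_gib + kv_cache_gib:5.1f} GiB")   # ~41.9, before compute buffers
```

At fp16 KV that lands a bit over 40 GB, so quantizing the KV cache (llama.cpp's --cache-type-k/--cache-type-v) or spilling a few layers off the GPUs is presumably what makes it "just squeak" in at this context size.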
  • 6 Votes
    2 Posts
    0 Views
    T
    At this point, I hope Nvidia etc. realize that even if selling AI cards to data centers gets them 10 times the profit per unit, it really is best for them in the long run to have a healthy and vibrant gamer and enthusiast market too. It's never good to have all your eggs in one basket.
  • I bookmarked this tool in my project bucket

    1
    0 Votes
    1 Posts
    0 Views
    No one has replied
  • Nvidia knows what the future is and where the money is.

    1
    0 Votes
    1 Posts
    0 Views
    No one has replied
  • At this point they're just taking the piss.

    1
    4 Votes
    1 Posts
    0 Views
    No one has replied
  • 14 Votes
    2 Posts
    0 Views
    T
    I like this project. Very nice! I haven't tried RAG yet, nor the fancy vector space whatsit which looks like it requires a specialized model(?) to create. I've been wanting to do something similar in spirit to your project here, but for an online RPG, so I dig this.
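For what it's worth, the "fancy vector space whatsit" is just an embedding model plus nearest-neighbour search over your notes; a minimal sketch of that half of RAG, assuming the sentence-transformers package and one of its small stock models (both assumptions; any embedding model works the same way):

```python
# Minimal sketch of the retrieval half of RAG: embed some campaign notes, then find
# the ones closest to a query in vector space. Library and model name are assumptions;
# an embedding model served by llama.cpp or Kobold would work the same way.
from sentence_transformers import SentenceTransformer, util

notes = [
    "The party rescued the blacksmith's daughter from the goblin cave.",
    "Lord Aldric owes the rogue a favour after the incident at the docks.",
    "The ancient sword is hidden beneath the flooded temple.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly embedder (assumption)
note_vecs = model.encode(notes, convert_to_tensor=True)

query = "Where was the magic sword hidden?"
query_vec = model.encode(query, convert_to_tensor=True)

# Cosine similarity against every stored note; the highest score is the most relevant memory.
scores = util.cos_sim(query_vec, note_vecs)[0]
best = int(scores.argmax())
print(f"retrieved: {notes[best]} (score {float(scores[best]):.2f})")
```

The retrieved notes then just get pasted into the prompt before generation, which is most of what a RAG memory for an RPG amounts to.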
  • 6 Votes
    17 Posts
    1 Views
    S
    If we were charged the real electricity cost of AI queries, maybe we would stop using them so speculatively.
  • StackOverflow activity down to 2008 numbers

    localllama
    11
    2
    1 Votes
    11 Posts
    0 Views
    S
    Imagine if there were a plethora of them already lurking out there in the deep. Isn't it strange, then, that you don't already have an AI overlord?
  • 0 Votes
    5 Posts
    0 Views
    smokeydope@lemmy.world
    They don't really write that the model is having an "aha moment" due to some insight it had. Well, they really can't write it that way, because it would imply the model is capable of insight, which is a function of higher cognition. That path leads to questioning whether machine learning neural networks are capable of any real sparks of sapience or sentience. That's a 'UGI' conversation most people absolutely don't want to have at this point, for various practical, philosophical, and religious/spiritual implications. So you can't just outright say it, especially not in an academic STEM paper. Science academia has a hard bias against the implication of anything metaphysical or overly abstract; at best they will say it 'simulates some cognitive aspects of intelligence'. In my own experience, the model at least says 'ah! aha! Right, right, right, so...' when it thinks it has had an insight of some kind. Whether models are truly capable of such a thing, or it is merely some statistical text prediction artifact, is a subjective discussion of philosophy, kind of like a computer science nerd's version of the deterministic philosophical zombie arguments. Thanks for sharing the video! I haven't seen Computerphile in a while; will take a look, especially with that title. Gotta learn about dat forbidden computation.
  • 42 Votes
    16 Posts
    0 Views
    B
    Oh yeah, you will run into a ton of pain sampling random projects on AMD/Intel. Most "experiments" only work out of the box on Nvidia. Some can be fixed, some can't. A used 3090 is like gold if you can find one, yeah. And yes, I sympathize with Nvidia being a pain on Linux... though it's not so bad if you just output from your IGP or another card. And yes, stuff rented from vast.ai or whatever is cheap. So are APIs. TBH that's probably the way to go if budget is a big concern and a 24GB B60 is not in the cards.
  • 31 Votes
    8 Posts
    1 Views
    S
    Yes, that's an excellent restatement: "lumping the behaviors together" is a good way to think about it. It learned the abstract concept "reward model biases" and was able to identify that concept as a relevant upstream description of the behaviors it was trained to display through fine-tuning, which allowed it to generalize. There was also a related recent study on similar emergent behaviors, where researchers found that fine-tuning models on code with security vulnerabilities caused them to become broadly misaligned, for example saying that humans should be enslaved by AI or giving malicious advice: https://arxiv.org/abs/2502.17424