
agnos.is Forums

Been trying to play with this in ik_llama.cpp, and it's a *temperamental* model.

    [email protected] wrote:

    I've been trying to play with this in ik_llama.cpp, and it's a temperamental model. It feels deep fried, like it could be smart if it would just stop looping or getting its own think template wrong.

    It works great in 24 GB of VRAM though. I'm getting about 16 tok/sec at longish context, with 15 experts on the GPU and the rest offloaded.
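
For anyone wanting to try a similar split, here is a rough sketch of how the tensor-override arguments could be generated. It assumes that "15 experts on the GPU" means the routed-expert tensors of 15 layers, that the model's GGUF uses the usual `blk.N.ffn_*_exps` tensor names, and that the GPU buffer is called `CUDA0`. None of that is stated in the post, and the helper name `expert_override_args` is made up for illustration. It relies on the `-ot` / `--override-tensor` option, where, as far as I understand it, earlier patterns take precedence over later ones.

```python
def expert_override_args(gpu_layers, device="CUDA0"):
    """Build -ot arguments that pin some layers' expert tensors to a GPU.

    gpu_layers: iterable of layer indices whose expert tensors stay on `device`.
    Assumption: expert tensors are named like blk.<N>.ffn_up_exps.weight.
    """
    layer_alt = "|".join(str(i) for i in gpu_layers)
    return [
        # Expert tensors of the chosen layers go to the GPU buffer.
        "-ot", rf"blk\.({layer_alt})\.ffn_.*_exps\.={device}",
        # Every remaining expert tensor is kept in system RAM.
        "-ot", r"ffn_.*_exps\.=CPU",
    ]


if __name__ == "__main__":
    # Hypothetical split: layers 0..14 on the GPU, the rest on the CPU.
    print(" ".join(expert_override_args(range(15))))
```

The layer count is the knob to turn: raise it until VRAM is nearly full at the context length you care about, then back off a little.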
