agnos.is Forums

llama4 release discussion thread

LocalLLaMA · 6 Posts · 4 Posters

[email protected] wrote (#1):

General consensus seems to be that llama4 was a flop. The head of Meta's AI research division was let go.

Do you think it was a bad fp32 conversion, or just underwhelming models all around?

2T parameters was a big increase without much gain. If throwing compute and parameters at the problem isn't enough to stay competitive anymore, how do you think the next big performance gains will be made? Better CoT reasoning patterns? Omnimodality? Something entirely new?

[email protected] wrote (#2):

I think the next bit of performance may come from leaning hard into QAT. We know there is a lot of wasted precision in models, so the more we account for that during training, the better quality small quants can get.

I also think diffusion LLMs' ability to revise previous tokens is amazing, as is the ability to run an autoregressive LLM iteratively to increase output quality.

I think a mix of QAT and iterative inference will bring the biggest upgrades to local use. It'll give you a smaller, higher-quality model that you can choose to run for even longer for higher-quality outputs.
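
The iterative-inference idea in practice could look like a critique-and-rewrite loop. A minimal sketch, assuming the `ollama` Python client and a locally pulled model (the model name and the prompts are illustrative, not from the thread):

```python
# Sketch of "run the model longer for better output": iterative
# self-refinement with an autoregressive LLM via the `ollama` client.
# MODEL and the prompt wording are assumptions for illustration.
import ollama

MODEL = "llama3.2"  # any locally pulled chat model


def ask(prompt: str) -> str:
    resp = ollama.chat(model=MODEL, messages=[{"role": "user", "content": prompt}])
    return resp["message"]["content"]


def refine(task: str, rounds: int = 2) -> str:
    draft = ask(task)
    for _ in range(rounds):  # each round trades extra compute for quality
        critique = ask(f"List the flaws in this answer to '{task}':\n{draft}")
        draft = ask(
            f"Task: {task}\nDraft:\n{draft}\nCritique:\n{critique}\n"
            "Rewrite the draft so it addresses every point in the critique."
        )
    return draft


if __name__ == "__main__":
    print(refine("Explain model quantization in two sentences."))
```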

[email protected] wrote (#3):

Hmm, never heard of QAT. What does it stand for?

[email protected] wrote (#4):
It stands for quantization-aware training: https://pytorch.org/blog/quantization-aware-training/

I had heard of it, but I'm not aware of public models implementing it.
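
To make the linked post concrete, here is a minimal eager-mode QAT sketch against PyTorch's `torch.ao.quantization` API; the toy network, random data, and dummy objective are placeholder assumptions, the point is that fake-quant ops run during fine-tuning so the weights adapt to low precision:

```python
# Minimal eager-mode quantization-aware training sketch (PyTorch).
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, get_default_qat_qconfig, prepare_qat, convert,
)


class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # marks where tensors enter int8
        self.fc = nn.Linear(16, 4)
        self.dequant = DeQuantStub()  # and where they leave it

    def forward(self, x):
        return self.dequant(torch.relu(self.fc(self.quant(x))))


model = TinyNet().train()
model.qconfig = get_default_qat_qconfig("fbgemm")
prepare_qat(model, inplace=True)  # inserts fake-quant observers

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for _ in range(100):              # quantization-aware fine-tuning
    x = torch.randn(32, 16)
    loss = model(x).pow(2).mean()  # dummy objective, stands in for real training
    opt.zero_grad()
    loss.backward()
    opt.step()

int8_model = convert(model.eval())  # swap in real int8 kernels for inference
print(int8_model)
```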

[email protected] wrote (#5):

Here is a link to the Gemma 3 QAT build on Ollama:
https://ollama.com/eramax/gemma-3-27b-it-qat:q4_0

There are GGUFs around if you want to try it on another backend.
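
If you'd rather script it than use the CLI, a quick way to try that exact tag from Python, assuming the `ollama` package is installed and the hardware can hold 27B q4_0 weights (the prompt is illustrative):

```python
# Pull and chat with the Gemma 3 27B QAT build linked above.
import ollama

MODEL = "eramax/gemma-3-27b-it-qat:q4_0"
ollama.pull(MODEL)  # one-time download of the quantized weights
reply = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "Summarize what QAT is in one sentence."}],
)
print(reply["message"]["content"])
```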

[email protected] wrote (#6):

              Thanks. I'll try it out!
