agnos.is Forums

Should be able to load the full version of DeepSeek R1 on this no prob 😎😎

LocalLLaMA · localllama · 10 posts, 6 posters

#1 [email protected] wrote:

This post did not contain any content.

    • cm0002@lemmy.worldC [email protected]
      This post did not contain any content.
      hoshikarakitaridia@lemmy.worldH This user is from outside of this forum
      hoshikarakitaridia@lemmy.worldH This user is from outside of this forum
      [email protected]
      wrote on last edited by
      #2

This is a good time to ask: I want to use AI on a local server (DeepSeek maybe, image generators like Flux, ...). Is there a cheaper alternative to flagship Nvidia cards that can do it?

#3 [email protected] wrote, in reply to #2:

Depends on your goals. For raw tokens per second, yeah, you want an Nvidia card with enough™ memory for your target model(s).

But if you don't care so much about speed beyond a certain point, or you're okay sacrificing some speed for economy, the AMD RX 7900 XT/XTX or RX 9070 both work pretty well for small to mid-sized local models.

Otherwise you can look at SoC-type solutions like AMD Strix Halo or Nvidia DGX for more model size at the cost of speed, but always look for reputable benchmarks showing 'enough' speed for your use case.
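
For what it's worth, a quick way to check 'enough' speed on your own hardware is to time tokens per second yourself. A minimal sketch, assuming llama-cpp-python is installed and you have some GGUF file locally (the model path below is just a placeholder):

```python
# Rough tokens/sec check with llama-cpp-python (pip install llama-cpp-python).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example-7b-q4_k_m.gguf",  # placeholder: any GGUF you have
    n_gpu_layers=-1,  # offload all layers to the GPU if VRAM allows; 0 = CPU only
    n_ctx=4096,
)

start = time.time()
out = llm("Explain the difference between RAM and VRAM in two sentences.", max_tokens=256)
elapsed = time.time() - start

n = out["usage"]["completion_tokens"]
print(f"{n} tokens in {elapsed:.1f}s -> {n / elapsed:.1f} tok/s")
```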

#4 [email protected] wrote, in reply to #2:

From my reading, if you don't mind sacrificing speed (tokens/sec), you can run models in system RAM. To be usable, though, you'd need at minimum a dual-processor server/workstation for multichannel RAM, plus enough RAM to fit the model.

So for something like DeepSeek R1, you'd need something like >512GB of RAM.
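
As a rough sanity check on that number, a back-of-the-envelope sketch (the parameter count and bytes-per-weight figures are approximations, and you still need headroom for the KV cache and the OS):

```python
# Rough RAM estimate for holding an LLM's weights in system memory.
def model_ram_gb(params_billions: float, bytes_per_weight: float) -> float:
    """Approximate weight size in GB; real GGUF files add metadata and context overhead."""
    return params_billions * bytes_per_weight

# DeepSeek R1 has roughly 671B total parameters (it's a mixture-of-experts model).
print(f"R1 @ fp16   (~2.0 B/w): ~{model_ram_gb(671, 2.0):.0f} GB")  # ~1342 GB
print(f"R1 @ Q4_K_M (~0.6 B/w): ~{model_ram_gb(671, 0.6):.0f} GB")  # ~403 GB, hence >512GB RAM
print(f"7B @ Q4_K_M (~0.6 B/w): ~{model_ram_gb(7, 0.6):.1f} GB")    # fits a modest desktop
```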

#5 [email protected] wrote, in reply to #2:

Assuming you haven't ruled this out already, test your plans out now using whatever computer you already own. At the hobbyist level you can do a lot with 8GB of RAM and no graphics card. 7B LLMs are really good now, and they're only going to get better.
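
If you want to try that right away, a minimal sketch using the ollama Python client (assumes Ollama is installed and running, and that you've already pulled the model; the model tag is just an example):

```python
# Quick CPU-friendly test of a small local model via the ollama client library
# (pip install ollama; run `ollama pull qwen2.5:7b` or similar beforehand).
import ollama

response = ollama.chat(
    model="qwen2.5:7b",  # example tag; a ~7B model at 4-bit needs roughly 4-5 GB of RAM
    messages=[{"role": "user", "content": "Summarize what quantization does to an LLM."}],
)
print(response["message"]["content"])
```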

#6 [email protected] wrote, in reply to #2:

It's all about RAM and VRAM. You can buy some cheap RAM sticks, get your system to something like 128GB of RAM, and run a low quant of the full DeepSeek. It won't be fast, but it will work. If you want fast, you need to get the model into graphics card VRAM, ideally all of it. That's where the high-end Nvidia stuff comes in: 24GB of VRAM all on the same card at maximum bandwidth. Some people prefer Macs or data center cards. You can use AMD cards too, it's just not as well supported.

LocalLLaMA users tend to use smaller models than the full DeepSeek R1 that fit on older cards. A 32B model partially offloaded between an older graphics card and RAM sticks is around the limit of what a non-dedicated hobbyist can achieve with their existing home hardware (see the sketch below). Most are really happy with the performance of Mistral Small, Qwen QwQ, and the DeepSeek distills. Those who want more have the money to burn on multiple Nvidia GPUs and a server rack.
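
A minimal sketch of what "partially offloaded" means in practice, assuming llama-cpp-python and a hypothetical 32B GGUF (the path and layer count are placeholders you'd tune to your card's VRAM):

```python
# Partial GPU offload: put as many transformer layers as fit into VRAM,
# keep the rest in system RAM and run them on the CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example-32b-q4_k_m.gguf",  # placeholder: a ~20 GB quantized file
    n_gpu_layers=30,  # e.g. ~30 of 64 layers on an 8 GB card; raise until VRAM runs out
    n_ctx=8192,
    n_threads=8,      # CPU threads for the layers left in RAM
)

print(llm("What fits in 8GB of VRAM?", max_tokens=128)["choices"][0]["text"])
```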

LLM-wise: your phone can run 1-4B models, your laptop 4-8B, and your older gaming desktop with a 4-8GB VRAM card can run roughly 8-32B. Beyond that you need the big, expensive 24GB cards, and further beyond that you need multiples of them.

Stable Diffusion models, in my experience, are very compute intensive. Quantization degradation is much more apparent, so you should have VRAM, use a high-quant model, and keep the canvas size as low as tolerable.

Hopefully we will get cheaper devices meant for AI hosting, like cheaper versions of Strix Halo and Digits.

#7 [email protected] wrote, in reply to #1:

What is it? Oh, I see the sticker now 🙂 yes, quite the beastly graphics card, so much VRAM!

#8 [email protected] wrote, in reply to #1:

Wow. Doing some spring-cleaning? I might have one of those on my own small pile of e-waste. Can't even remember what kind of bandwidth the PCI bus had... probably enough to fill 128MB.

#9 [email protected] wrote, in reply to #8:

Lol, I was digging for other parts in my "hardware archive" and came across this. I had actually forgotten about non-express PCI and thought it was AGP for a minute LMAO

#10 [email protected] wrote, in reply to #4:

You are correct in your understanding. However, the last part of your comment needs a big asterisk: it's important to consider quantization.

The full f16 DeepSeek R1 GGUF from Unsloth requires 1.34TB of RAM. Good luck getting the RAM sticks and channels for that.

The Q4_K_M mid-range quant is 404GB, which would theoretically fit inside 512GB of RAM with leftover room for context.

512GB of RAM is still a lot; theoretically you could run a lower quant of R1 with 256GB of RAM. Not super desirable, but totally doable.
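
For anyone sanity-checking those sizes, a quick sketch (assuming roughly 671B total parameters for R1; real GGUF files vary a little because some tensors stay at higher precision):

```python
# Back out the effective bits per weight implied by the quoted file sizes.
PARAMS = 671e9  # approximate total parameter count of DeepSeek R1

def bits_per_weight(file_size_gb: float) -> float:
    return file_size_gb * 1e9 * 8 / PARAMS

print(f"f16 @ 1340 GB   -> {bits_per_weight(1340):.1f} bits/weight")  # ~16
print(f"Q4_K_M @ 404 GB -> {bits_per_weight(404):.1f} bits/weight")   # ~4.8
print(f"256 GB budget   -> {bits_per_weight(256):.1f} bits/weight")   # ~3.1, i.e. a Q2/Q3-class quant
```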
