agnos.is Forums

Soldered on ram and GPU.

35 Posts 22 Posters 0 Views
  • 01189998819991197253@infosec.pub0 This user is from outside of this forum
    wrote on last edited by
    #1

    Soldered on ram and GPU. Strange for Framework.

      E This user is from outside of this forum
      wrote on last edited by
      #2

      Apparently AMD couldn’t make the signal integrity work out with socketed RAM. (source: LTT video with Framework CEO)

      IMHO: Up until now, using soldered RAM was lazy and cheap bullshit. But I do think we are at the limit of what’s reasonable to do over socketed RAM. In high-performance datacenter applications, socketed RAM is on its way out (see: MI300A, Grace-{Hopper,Blackwell}, Xeon Max), with onboard memory gaining ground. I think we’ll see the same trend on consumer stuff as well. Requirements on memory bandwidth and latency are going up with recent trends like powerful integrated graphics and AI-slop, and socketed RAM simply won’t work.
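For a rough sense of the bandwidth gap, here is a minimal sketch of the usual peak-bandwidth arithmetic (transfer rate times bus width). The two configurations are illustrative assumptions, not figures quoted in this thread, and real-world throughput is lower than the theoretical peak.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


# Assumed example configurations (hypothetical, for illustration only).
socketed = peak_bandwidth_gb_s(5600, 128)  # e.g. dual-channel DDR5-5600 DIMMs
soldered = peak_bandwidth_gb_s(8000, 256)  # e.g. 256-bit LPDDR5X-8000

print(f"socketed: {socketed:.1f} GB/s")
print(f"soldered: {soldered:.1f} GB/s")
```

Under these assumptions the wider, faster soldered interface has close to three times the peak bandwidth, which is the kind of gap integrated graphics and streaming workloads care about.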

      It’s sad, but in a few generations I think only lower-end consumer CPUs will still be usable with socketed RAM. I’m betting the high-performance consumer CPUs will require not only soldered, but on-board RAM.

      Finally, some Grace Hopper to make everyone happy: https://youtube.com/watch?v=gYqF6-h9Cvg

        U This user is from outside of this forum
        wrote on last edited by
        #3

        Honestly I upgrade every few years and usually have to purchase a new mobo anyhow. I do think this could lead to fewer options for mobos though.

          E This user is from outside of this forum
          wrote on last edited by
          #4

          I don’t think you are wrong, but I don’t think you go far enough. In a few generations, the only option for top performance will be a SoC. You’ll get to pick which SoC you want and what box you want to put it in.

            G This user is from outside of this forum
            wrote on last edited by
            #5

            the only option for top performance will be a SoC

            System in a Package (SiP) at least. Might not be efficient to etch the logic and that much memory onto the same silicon die, as the latest and greatest TSMC node will likely be much more expensive per square mm than the cutting edge memory production node from Samsung or whatever foundry where the memory is being made.

            But with advanced packaging going the way it's been over the last decade or so, it's going to be hard to compete with the latency/throughput of an in-package interposer. You can only do so much with the vias/pathways on a printed circuit board.

              B This user is from outside of this forum
              wrote on last edited by
              #6

              I definitely wouldn't mind soldered RAM if there's still an expansion socket. Solder in at least a reasonable minimum (16G?), and not the cheap stuff but memory that can actually use the signal integrity advantage. I may want more RAM, but it's fine if it's a bit slower. You can leave out the DIMM slot, but then have at least one PCIe x16 expansion slot: a free one, in addition to the GPU slot. PCIe latency isn't stellar, but on the upside, expansion boards would come with their own memory controllers, and if push comes to shove you can configure the faster RAM as cache or the expansion RAM as swap.

              Heck, throw the memory into the CPU package. It's not like there's ever a situation where you don't need RAM.

                S This user is from outside of this forum
                wrote on last edited by
                #7

                Yeah, the soldered RAM is for sure making me doubt Framework now.

                  C This user is from outside of this forum
                  wrote on last edited by
                  #8

                  I get it, but imagine the GPU-style markup when all mobos have a set amount of RAM. You'll have two boards that are identical except for $30 worth of memory, with a price spread of $200+. Not fun.

                    E This user is from outside of this forum
                    wrote on last edited by
                    #9

                    All your RAM needs to be the same speed unless you want to open up a rabbit hole. All attempts at that thus far have kinda flopped. You can make very good use of such systems, but I’ve only seen it succeed with software specifically tailored for that use case (say databases or simulations).

                    The way I see it, RAM in the future will be on package and non-expandable. CXL might get some traction, but naah.

                      E This user is from outside of this forum
                      wrote on last edited by
                      #10

                      You are correct, I’m referring to on package. Need more coffee.

                        J This user is from outside of this forum
                        wrote on last edited by
                        #11

                        Signal integrity is a real issue with DIMMs. It's the same reason you don't see modular VRAM on GPUs. If the RAM needs to behave like VRAM, it needs to run at VRAM speeds.

                          fiddlesticks@lemmy.dbzer0.comF This user is from outside of this forum
                          wrote on last edited by
                          #12

                          Couldn't you just treat the socketed RAM as another layer of memory? Effectively, L1-3 would be on the CPU, "L4" would be the soldered RAM, and L5 would be the extra socketed RAM. Alternatively, couldn't you just treat it like really fast swap?

                            B This user is from outside of this forum
                            wrote on last edited by
                            #13

                            The cache hierarchy has flopped? People aren't using swap?

                            NUMA also hasn't flopped; it's just that most systems aren't dual socket. Different memory speeds for the same CPU are not ideal, and you don't build a system like that, but among upgraded systems that's not rare at all.

                              B This user is from outside of this forum
                              wrote on last edited by
                              #14

                              Using it as cache would reduce total capacity, as cache implies coherence, and treating it as ordinary swap would mean copying to main memory before you access it, which is silly when you can access it directly. That is, you'd want to write a couple of lines of kernel code to use it effectively, but it's nowhere close to rocket science. Nowhere near as complicated as making proper use of NUMA architectures.

                                J This user is from outside of this forum
                                wrote on last edited by
                                #15

                                In systems where memory speeds are mismatched, the system runs at the slowest module's speed. So you'd literally be making the soldered, faster memory slower. Why even have soldered memory at that point?

                                  B This user is from outside of this forum
                                  wrote on last edited by
                                  #16

                                  I'd assume the soldered memory to have a dedicated memory controller. There's also no hard reason a single controller couldn't drive different channels at different speeds. The only hard requirement is that each channel runs at a single speed.
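To make the per-channel point concrete, here is a small sketch. It assumes the simplified rule discussed above (a channel runs at the speed of its slowest module) and uses a hypothetical layout with made-up speeds; real memory controllers have more constraints than this.

```python
def channel_speeds(channels: dict[str, list[int]]) -> dict[str, int]:
    """Each channel runs at the speed of its slowest module (MT/s)."""
    return {name: min(modules) for name, modules in channels.items()}


# Hypothetical layout: one soldered channel, one socketed channel with mixed DIMMs.
layout = {"soldered": [8000], "socketed": [5600, 4800]}

# Independent channels: only the socketed channel is dragged down.
per_channel = channel_speeds(layout)

# Everything forced to one speed: the soldered memory is dragged down too.
single_speed = min(speed for modules in layout.values() for speed in modules)

print(per_channel)
print(single_speed)
```

With independent channels the soldered memory keeps its full speed. Only when everything is forced to a single speed does the slow DIMM penalize the soldered part.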

                                    N This user is from outside of this forum
                                    wrote on last edited by
                                    #17

                                    Apparently AMD wasn't able to make socketed RAM work, timings aren't viable. So Framework has the choice of doing it this way or not doing it at all.

                                      E This user is from outside of this forum
                                      wrote on last edited by
                                      #18

                                      Yeah, the cache hierarchy is behaving kinda wonky lately. Many AI workloads (and that’s what’s driving development lately) are constrained by bandwidth, and cache will only help you with a part of that. Cache will help with repeated access, not as much with streaming access to datasets much larger than the cache (i.e. many current AI models).
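A toy simulation shows the difference between repeated and streaming access. This is only a sketch: it assumes a plain LRU cache counted in abstract "blocks", and the sizes are arbitrary, so it says nothing about any real CPU cache.

```python
from collections import OrderedDict


def hit_rate(accesses, capacity):
    """Fraction of accesses served by an LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for block in accesses:
        total += 1
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / total


CAPACITY = 64
# Repeated access: loop over a working set that fits in the cache.
repeated = [i % 32 for i in range(10_000)]
# Streaming access: several passes over a dataset far larger than the cache.
streaming = [i % 4096 for i in range(4 * 4096)]

print(f"repeated:  {hit_rate(repeated, CAPACITY):.3f}")
print(f"streaming: {hit_rate(streaming, CAPACITY):.3f}")
```

The small working set is served almost entirely from the cache, while the streaming pattern evicts every block before it is reused, so each access goes out to the slower memory.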

                                      Intel already tried selling CPUs with both on-package HBM and slotted DDR-RAM. No one wanted it, as the performance gains of the expensive HBM evaporated completely as soon as you touched memory out-of-package. (Assuming workloads bound by memory bandwidth, which currently dominate the compute market)

                                      To get good performance out of that, you may need to explicitly code the memory transfers to enable prefetch (preferably asynchronous) from the slower memory into the faster, à la classic GPU programming. YMMV.
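The explicit-prefetch pattern looks roughly like double buffering. The sketch below is an illustration only: `load_chunk` and `process` are hypothetical stand-ins for a slow-to-fast memory copy and the actual compute, and a Python thread pool is used just to show the overlap structure, not to model real hardware transfers.

```python
from concurrent.futures import ThreadPoolExecutor


def load_chunk(slow_memory, start, size):
    """Stand-in for copying one chunk from slow memory into a fast buffer."""
    return slow_memory[start:start + size]


def process(chunk):
    """Stand-in for bandwidth-bound compute on the fast buffer."""
    return sum(chunk)


def streamed_sum(slow_memory, chunk_size):
    """Process `slow_memory` chunk by chunk, prefetching one chunk ahead."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start fetching the first chunk before any compute happens.
        pending = pool.submit(load_chunk, slow_memory, 0, chunk_size)
        for start in range(chunk_size, len(slow_memory) + chunk_size, chunk_size):
            current = pending.result()  # wait for the prefetched chunk
            if start < len(slow_memory):
                # Kick off the next transfer, then compute on the current chunk.
                pending = pool.submit(load_chunk, slow_memory, start, chunk_size)
            total += process(current)
    return total


data = list(range(1000))
print(streamed_sum(data, 64))
```

The key idea is that the transfer of chunk N+1 is already in flight while chunk N is being processed, which is what hides the latency of the slower tier.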

                                        E This user is from outside of this forum
                                        wrote on last edited by
                                        #19

                                        Wrote a longer reply to someone else, but briefly, yes, you are correct. Kinda.

                                        Caches won’t help with bandwidth-bound compute (read: “AI”) if the streamed dataset is significantly larger than the cache. A cache will only speed up repeated access to a limited set of data.

                                          B This user is from outside of this forum
                                          wrote on last edited by
                                          #20

                                          Could it work?

                                          Yes, but it would require:

                                          • A redesigned memory controller capable of tiering RAM.
                                          • OS-level support for dynamically assigning memory usage based on speed.
                                          • Applications/libraries optimized to take advantage of this tiering.

                                          Right now, the easiest solution for fast, high-bandwidth RAM is just to solder all of it.
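As a rough illustration of what the OS-level piece could look like, here is a toy placement policy that puts the most frequently accessed pages in the fast tier. Everything here is hypothetical (the function name, the access log, the capacity), and a real kernel would track hotness incrementally and migrate pages over time rather than sort a log.

```python
from collections import Counter


def place_pages(access_log, fast_capacity):
    """Assign the most frequently accessed pages to the fast tier."""
    counts = Counter(access_log)
    # Sort by access count (descending), then page id for a deterministic order.
    ranked = sorted(counts, key=lambda page: (-counts[page], page))
    fast = set(ranked[:fast_capacity])
    slow = set(ranked[fast_capacity:])
    return fast, slow


# Hypothetical access log of page ids; the fast tier holds two pages.
log = [1, 2, 1, 3, 1, 2, 4, 1, 2, 5]
fast, slow = place_pages(log, fast_capacity=2)
print("fast tier:", sorted(fast))
print("slow tier:", sorted(slow))
```

Even this trivial policy needs access statistics and a way to move pages between tiers, which is part of why soldering all of the RAM is the simpler route today.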
