agnos.is Forums

CPU errors?

22 Posts 10 Posters 104 Views
  • M [email protected]

    I don't think overheating would cause random corruptions (it should throttle down when overheating, and then shut down if the temperature gets too high even when throttled, but there should never be an incorrect result of any computation), and surely the RAM will run at the standard 2133 MHz on default settings; OP says they reset the BIOS settings to default between CPU swaps.

    [email protected] wrote (#10):

    RAM is indeed at 2133 MHz and the cooling is great: I've got a tower cooler (Scythe Kotetsu Mark II). Idle temps are in the low 30s C, and the stress temp was 76C.

      • xiisadaddy@lemmygrad.ml

      Have you checked that your motherboard supports that CPU, and, if it does, that its BIOS is fully updated? A lot of motherboards get BIOS updates that add support for new CPUs, and I think there are also some boards whose support stopped around the 5600 generation.

      [email protected] wrote (#11):

      Yep, it's explicitly listed in the supported list and BIOS is up to date.

      • C [email protected]

        It might be the CPU, but it might be something else. On the old CPU, update the OS, update the BIOS, and run fwupd or boot Windows temporarily to update all other firmware. Then run memtest and a CPU stress test to make sure you're not just triggering an existing hardware issue.

        If that's all clean, put in the new CPU and run memtest and a CPU stress test to see where you get issues.

        [email protected] wrote (#12):

        Everything is up to date as far as I can tell, I did Windows too.

        memtest ran fine for a couple of hours, but the CPU stress test hung partway through, while the CPU temp was around 75C.

          • sga@lemmings.world

          Can you give more details about your system: what kernel and what distro are you using? Also, have you tried booting a live USB for testing? And (I am presuming that you changed the motherboard and kept the rest the same) can you replace one part at a time and check?

          Usually random breakages mean memory or power errors, and even different modules breaking can be caused by corrupt storage (maybe the read heads of the HDD are broken, or if it's an SSD, something something something, maybe broken/damaged contacts). But you say going back to the older CPU fixed this. Can you check what speed your memory runs at? Sometimes stable RAM speeds depend on the CPU; maybe you had it lower/higher for the older CPU (usually lower than stable is not an issue, higher is). Or maybe your newer CPU is drawing more power, or you have some kind of faulty contact and stray currents are developing on your motherboard. A few more details would be really helpful.
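          One way to tell on-disk corruption apart from failures that move around is to integrity-test every compressed kernel module twice and compare the results. A minimal Python sketch, assuming the `zstd` CLI is installed and that modules live under /usr/lib/modules/<kernel release> (the Arch/Manjaro layout); the function names are hypothetical. A file that fails both passes is likely damaged itself, while failures that change between passes point away from storage and toward the CPU, RAM, or power.

```python
import os
import subprocess
from pathlib import Path


def failing_modules(root: Path) -> set[str]:
    """Return the .ko.zst files under root that fail `zstd -t` (integrity test)."""
    failures = set()
    for path in sorted(root.rglob("*.ko.zst")):
        # -q: quiet, -t: test the compressed data without writing any output
        result = subprocess.run(["zstd", "-q", "-t", str(path)], capture_output=True)
        if result.returncode != 0:
            failures.add(str(path))
    return failures


def classify(first: set[str], second: set[str]) -> dict[str, set[str]]:
    """Split failures into those seen in both passes and those seen in only one."""
    return {
        "persistent": first & second,    # same file both times: suspect the file itself
        "intermittent": first ^ second,  # moved between passes: suspect CPU/RAM/power
    }


if __name__ == "__main__":
    # Assumption: Arch/Manjaro layout, modules under /usr/lib/modules/<kernel release>
    root = Path("/usr/lib/modules") / os.uname().release
    report = classify(failing_modules(root), failing_modules(root))
    for kind, files in report.items():
        print(f"{kind}: {len(files)}")
        for name in sorted(files):
            print(f"  {name}")
```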

          [email protected] wrote (#13):

          All hardware is the same, I'm trying to upgrade from a Ryzen 3100 so everything should be compatible. Both old and new CPU have a 65W TDP.

          I'm on Manjaro, everything is up to date, kernel is 6.12.17.

          Memory runs at 2133 MHz, same as for the other CPU. I usually don't tweak BIOS much if at all from the default settings, just change the boot drive and stuff like "don't show full logo at startup".

          I've added some voltage readings to the post and answered some other posts here.

            • bombomom@lemmy.world

            Things to try:

            • Update your BIOS; this will likely solve the issue, since things are working fine with your prior CPU.
            • If you are running an XMP/EXPO profile, turn it off, as it might be making your system unstable.
            • Memtest. Run it for at least a full cycle (which takes about an hour). If you see more than a single error, then there is something wrong with your RAM.

            [email protected] wrote (#14):

            BIOS is up to date, CPU model explicitly listed as supported, memtest ran fine, not using XMP profiles.

              [email protected] wrote (#15):

              75C is fine, the CPU will throttle in order to avoid max temps. This isn't something that should cause instability.

              It's POSSIBLE that this is a bug that's fixed with a microcode update, see here for installing it: https://wiki.archlinux.org/title/Microcode

              TL;DR:

              1. Install amd-ucode
              2. Edit /etc/mkinitcpio.conf, add microcode after autodetect
              3. sudo mkinitcpio -P
              4. reboot
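              To double-check step 2 (and the placement question that comes up later in the thread), here is a minimal Python sketch that reads the HOOKS=(...) array from a mkinitcpio.conf and reports where the microcode hook sits relative to autodetect. Per the mkinitcpio documentation, when autodetect runs before microcode only the current CPU's microcode is bundled; placing microcode before autodetect bundles it for all CPUs. The parsing is an assumption: it takes the first uncommented HOOKS=(...) and expects no comments inside the array; the function name is hypothetical.

```python
import re
from pathlib import Path

# First uncommented HOOKS=(...) array, e.g. HOOKS=(base udev autodetect microcode ...)
HOOKS_RE = re.compile(r"^\s*HOOKS=\(([^)]*)\)", re.MULTILINE)


def microcode_placement(conf_text: str) -> str:
    """Describe where the microcode hook sits relative to autodetect."""
    match = HOOKS_RE.search(conf_text)
    if match is None:
        return "no HOOKS array found"
    hooks = match.group(1).split()
    if "microcode" not in hooks:
        return "microcode hook missing"
    if "autodetect" not in hooks:
        # Without autodetect nothing filters the microcode down to the current CPU.
        return "no autodetect: all CPUs bundled"
    if hooks.index("microcode") < hooks.index("autodetect"):
        return "before autodetect: all CPUs bundled"
    return "after autodetect: current CPU only"


if __name__ == "__main__":
    print(microcode_placement(Path("/etc/mkinitcpio.conf").read_text()))
```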

              If that doesn't fix it, and it crashes in Windows too, it may be a hardware problem. There isn't much you need to do in order to get a CPU working.

              • L [email protected]

                I'm trying a new CPU in my PC (Ryzen 5600) and I'm seeing:

                • Sporadic kernel panics during boot.
                • Random .ko.zst module files (a different one each boot) complaining that zstd decompression failed its checksum.
                • Random .so's failing to find a symbol and causing programs to crash/fail to start.
                • Started a stress-ng sequential session at 5s per stressor and it hung up after a dozen stressors. Couldn't ctrl-c it and also ps didn't work anymore. 😅
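                A run like the one in the last bullet can be scripted so that a hang is detected rather than just observed. A minimal Python sketch, assuming stress-ng is installed: as I understand the stress-ng man page, `--sequential N` runs every stressor in turn with N instances (0 lets stress-ng derive the count from the number of CPUs), `--timeout` sets the time per stressor, and `--verify` asks stressors that support it to check their results. The two-hour overall deadline and the function names are hypothetical. If the machine locks up hard (as described, where even ps stopped working), no userspace watchdog will help.

```python
import os
import signal
import subprocess


def build_command(seconds_per_stressor: int = 5, instances: int = 0) -> list[str]:
    """argv for a sequential stress-ng run over all stressors."""
    return [
        "stress-ng",
        "--sequential", str(instances),  # 0: instance count derived from CPU count
        "--timeout", f"{seconds_per_stressor}s",  # time spent on each stressor
        "--verify",  # check results where the stressor supports it
        "--metrics-brief",
    ]


def run_with_deadline(cmd: list[str], deadline_s: float) -> str:
    """Run cmd in its own process group; kill the whole group if it overruns."""
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        code = proc.wait(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        # stress-ng forks worker processes, so signal the group, not just the parent.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return "hung: deadline exceeded"
    return "ok" if code == 0 else f"failed: exit code {code}"


if __name__ == "__main__":
    # Hypothetical deadline: generous, since hundreds of stressors x 5 s adds up.
    print(run_with_deadline(build_command(), deadline_s=2 * 60 * 60))
```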

                Funny thing is, other than that the system runs fine (when it boots, that is).

                Switched back to my old CPU (that's the only change in the machine) and all of these things stopped.

                The CPU that's doing that is defective, correct? Just double-checking I'm not missing anything else.

                I've reset BIOS between CPU swaps and left it at defaults. Could default settings cause a CPU to act like this?

                [email protected] wrote (#16):

                I had a 3700X that would sometimes lock up under light usage. It passed every stress test and could idle for days. I swapped RAM, PSU, and motherboard with no effect. It's possible the microcode and firmware updates mentioned here could have fixed it, but I got another CPU and all my problems went away. Worth the $200 for me.

                  [email protected] wrote (#17):

                  This sounds like my best shot, thank you.

                  I've installed the amd-ucode package. It already adds microcode to the HOOKS array in /etc/mkinitcpio.conf and runs mkinitcpio -P, but I've moved microcode before autodetect so it bundles microcode for all CPUs, not just the current one (to have it ready when I swap), and re-ran mkinitcpio -P.

                  I've seen the message "Early uncompressed CPIO image generation successful" pass by, and lsinitcpio --early /boot/initramfs-6.12-x86_64.img|grep micro shows kernel/x86/microcode/AuthenticAMD.bin. I've also confirmed that /usr/lib/firmware/amd-ucode/README lists an update for that new CPU (and for the current one, speaking of which).
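                  The check described above (listing the early CPIO and looking for the AMD microcode blob) can be wrapped in a small script. A minimal Python sketch, assuming `lsinitcpio` from mkinitcpio is on PATH and accepts the same `--early` option used in the post, and that the image path (copied from the post) matches your kernel; the helper names are hypothetical. The match uses `endswith` in case the listing prefixes entries with "./".

```python
import subprocess

AMD_BLOB = "kernel/x86/microcode/AuthenticAMD.bin"


def has_amd_microcode(listing: str) -> bool:
    """True if an lsinitcpio --early listing contains the AMD microcode blob."""
    return any(line.strip().endswith(AMD_BLOB) for line in listing.splitlines())


def early_listing(image: str) -> str:
    """List the early (uncompressed) CPIO part of an initramfs image."""
    result = subprocess.run(
        ["lsinitcpio", "--early", image], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Path taken from the post; adjust for your kernel version.
    image = "/boot/initramfs-6.12-x86_64.img"
    print("AMD microcode bundled:", has_amd_microcode(early_listing(image)))
```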

                  Now, from what I understand, all I have to do is reboot and the early stage will apply the update? Since it's bundled with the main initramfs, I understand I don't need to add a parameter pointing at a separate .img file.

                    [email protected] wrote (#18):

                    It's a pain in the butt to swap CPUs one more time, but that may pale in comparison to trying to convince the shop that a core is bad while having intermittent faults. 🤪

                      [email protected] wrote (#19):

                      Yup, just reboot to apply it.

                      It'll show up in dmesg: "microcode updated early to Rev. ###", etc.
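                      Besides dmesg, the revision currently loaded is exposed per logical CPU in /proc/cpuinfo as a `microcode` field. A minimal Python sketch that collects the distinct revisions, so you can compare them before and after the reboot; the function name is hypothetical.

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text: str) -> set[str]:
    """Distinct values of the 'microcode' field across all logical CPUs."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


if __name__ == "__main__":
    revs = microcode_revisions(Path("/proc/cpuinfo").read_text())
    # Normally every core reports the same revision.
    print("microcode revision(s):", ", ".join(sorted(revs)) or "not reported")
```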

                        [email protected] wrote (#20):

                        Welp no change. I'm guessing the motherboard firmware already contained the latest microcode. Oh well, was worth a try, thank you.

                        • L [email protected]

                          Motherboard is a Gigabyte B450 Aorus M. It's fully updated and support for this particular CPU is explicitly listed in a past revision of the mobo firmware.

                          Manual doesn't list any specific CPU settings but their website says stepping A0, and that's what the defaults were setting. Also I got "core speed: 400 MHz", "multiplier: x 4.0 (14-36)".
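                          For reference, the two readings agree with each other, assuming the usual 100 MHz reference clock (BCLK) on AM4: core speed is the multiplier times BCLK, so the x 4.0 multiplier gives the 400 MHz shown, and the 14-36 range would correspond to 1.4-3.6 GHz.

```latex
f_{\text{core}} = \text{multiplier} \times f_{\text{BCLK}} = 4.0 \times 100\ \text{MHz} = 400\ \text{MHz},
\qquad 14 \times 100\ \text{MHz} = 1.4\ \text{GHz}, \quad 36 \times 100\ \text{MHz} = 3.6\ \text{GHz}
```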

                          even some normal batch cpus might sometimes require a bit more (or less) juice or a system tweak

                          What does that involve? I wouldn't know where to begin changing voltages or other parameters. I suspect I shouldn't just faff about in the BIOS and hope for the best. 😕

                          [email protected] wrote (#21):

                          No, I would search forums for your motherboard model to see what situations might match yours, so that you might glean something useful as far as settings go. A quick check revealed nothing that stands out to me; resetting all electrical connections was the lone useful tip. (The reddit link blocked me, lol. Fine.) Perhaps more detailed (or different) search terms would produce better results.

                          I think you've taken the right steps to this point. Another CPU to test with would prove useful (though your original should suffice). Or another board to test this CPU in. Perhaps the shop you procured this one from has one or the other? Otherwise, I would pursue replacement.

                            [email protected] wrote (#22):

                            Honestly, I'll just send it back at this point. I have kernel panics that point to at least two of the cores being bad, which would explain the sporadic nature of the errors. It would also explain why memtest ran fine: it only uses the first core by default. Too bad I didn't think of that when running memtest, because it lets you select cores explicitly.
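                            The same idea (testing cores individually) can be applied to a CPU stress test by pinning a single worker to one logical CPU at a time. A minimal Python sketch, assuming `taskset` (util-linux) and stress-ng are installed; the 60-second duration, the `cpu` stressor with `--verify`, and the function names are arbitrary, hypothetical choices rather than a known-good recipe. Logical CPUs include SMT siblings, so two logical CPU numbers can map to the same physical core. A hard lock-up will stall the loop, which is why each CPU number is printed before its run starts.

```python
import os
import subprocess


def pinned_command(cpu: int, seconds: int = 60) -> list[str]:
    """argv that runs one stress-ng cpu worker pinned to a single logical CPU."""
    return [
        "taskset", "-c", str(cpu),
        "stress-ng", "--cpu", "1", "--verify", "--timeout", f"{seconds}s",
    ]


def failing_cpus(cpus: list[int], seconds: int = 60) -> list[int]:
    """Run the pinned test on each CPU in turn; return the CPUs whose run failed."""
    failed = []
    for cpu in cpus:
        # Printed first so that, after a hang, you know which CPU was under test.
        print(f"testing logical CPU {cpu}", flush=True)
        result = subprocess.run(pinned_command(cpu, seconds), capture_output=True)
        if result.returncode != 0:
            failed.append(cpu)
    return failed


if __name__ == "__main__":
    # CPUs this process may run on (Linux-only call).
    cpus = sorted(os.sched_getaffinity(0))
    print("failing logical CPUs:", failing_cpus(cpus) or "none")
```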
