Help with Home Server Architecture and Hardware Selection?
-
[email protected] replied to [email protected] last edited by
Thanks so much for flagging that; Above 4G Decoding wasn't even on my radar. And I think you and another commenter have sold me on trying for an EPYC mobo and dual 3090 combination. If you don't mind my asking, did you get your 3090s new or used? I feel like used is the way to go from a cost perspective, but obviously only if it wasn't run 24/7 in a mining rig for years on end (and I'm not confident in my ability to make that call yet; I guess I'd ask for current benchmarks and try to visually inspect from photos?). But thanks again!
-
[email protected] replied to [email protected] last edited by
Yea, I keep the outlet around as a reminder lol
-
[email protected] replied to [email protected] last edited by
Thank you! I think I am just at the "Valley of Despair" portion of the Dunning-Kruger effect lol, but the good news is that it's hopefully mostly up from here (and as you say, a good finished product is infinitely better than a perfect idea).
-
[email protected] replied to [email protected] last edited by
They're the best site around for high-quality/high-capacity drives that don't cost an arm and a leg. Another great resource for tools n' stuff is awesome-selfhosted:
Website: https://awesome-selfhosted.net/
Github: https://github.com/awesome-selfhosted/awesome-selfhosted
-
[email protected] replied to [email protected] last edited by
Makes sense. This was also years ago, so small details are being forgotten; it could have also been a 15- or possibly 20-amp breaker. It was one circuit split between 2 rooms, which was apparently the norm for when it was built in the early 80s (and not a damn thing was ever upgraded, including the outlets).
It was also a small fire, handleable with an extinguisher, but it was enough to be scary AF LMAO
-
[email protected] replied to [email protected] last edited by
Don't worry about how a video card was used. Unless it was handled by HowToBasic, it's gonna break long after it's obsolete. You might worry about a bad firmware setup, but you avoid that by looking at the seller rating, not the video card.
There's an argument to be made that a mining GPU is actually the better card to buy, since it never went hot>cold>hot>cold (thus stressing the solder joints) like a regular user's card would. But it's just that: an argument. I have yet to find a well-researched article on the effects of long-term gaming as compared to long-term mining, but I can tell you that the breaking point for either is long after you would have kept the card in use, even second- or third-hand.
-
[email protected] replied to [email protected] last edited by
So, I'm a rabid selfhoster because I've spent too many years watching rug-pull tactics from every company out there. I'm just going to list what I've ended up with; it's not perfect, but it is pretty damn robust. I'm running pretty much everything you talk about, except not much in the way of AI stuff at this point. I wouldn't call it particularly energy efficient since the equipment isn't very new. But take a read and see if it provokes any thoughts on your wishlist.
Machine 1 is a Proxmox node with ZFS storage backing, and machine 2 is a mirror image serving as a second Proxmox node for HA. Everything, even my OPNsense router, runs on Proxmox. My docker/k8s hosts are LXCs or VMs running on the nodes, and the nodes replicate nearly everything between them as a first-level, fast-recovery backup/high-availability failover. I can live-migrate guests around very quickly if I want to upgrade and reboot or otherwise maintain a node. I can also snapshot guests before updates or maintenance that I'm scared will break stuff, or when I'm experimenting and want to roll back after I fuck up.
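The snapshot/migrate side is honestly a couple of one-liners. A rough sketch, assuming a VM with ID 101, an LXC with ID 202, and a second node named pve2 (all made-up names):

```
# Snapshot a VM before a risky update, then roll back if it breaks
qm snapshot 101 pre-upgrade --description "before kernel update"
# ...do the scary thing, and if it goes sideways:
qm rollback 101 pre-upgrade

# Live-migrate a running VM off a node before rebooting it
qm migrate 101 pve2 --online

# LXC containers use pct instead of qm
pct snapshot 202 pre-upgrade
```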
Both nodes are backed up via Proxmox Backup Server for any guests I consider prod; I take backups every hour and keep probably 200 backups at various intervals and retention amounts. These dedupe in PBS, so the space utilization for all those extra backups is quite low. I also back up via PBS to removable USB drives on a longer schedule and swap those out offsite weekly. Because I bind-mount everything in my docker compose stacks, recovering a particular folder at a point in time via file restore lets me recover a stack quite granularly. And since it's done as a ZFS snapshot backup, it's internally consistent; I've never had a db/file mismatch issue that didn't just journal out cleanly.
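The bind-mount layout is the key bit. A stripped-down sketch of what one stack folder looks like, with Vaultwarden standing in as an example service (paths are illustrative):

```
# Everything the stack needs lives under one folder on a ZFS dataset,
# so a point-in-time file restore of this folder recovers the stack.
mkdir -p /tank/stacks/vaultwarden
cat > /tank/stacks/vaultwarden/docker-compose.yml <<'EOF'
services:
  vaultwarden:
    image: vaultwarden/server:latest
    volumes:
      - ./vw-data:/data    # bind mount, not a named volume
    ports:
      - "8080:80"
    restart: unless-stopped
EOF
cd /tank/stacks/vaultwarden && docker compose up -d
```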
I also zfs-send critical datasets via syncoid to zfs.rent daily from each Proxmox node.
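That's just a cron job per node; something like this, with the dataset and target names made up:

```
# /etc/cron.d/syncoid-offsite: nightly incremental zfs send to zfs.rent
0 3 * * * root /usr/sbin/syncoid --no-privilege-elevation \
  tank/critical [email protected]:yourpool/critical
```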
Overall, this has been highly flexible and very, very bulletproof over the last 5 or 6 years. I bought some decade-old 1U Dell servers with enough drive bays and dual Xeons, so I have plenty of threads and RAM, and I upgraded to IT-mode 12G SAS controllers. It isn't a powerhouse server or anything; I might be $1000 into each of them. I have considered adding and passing through an external GPU. The PBS server is a little piece-of-trash i3 with an 8TB SATA drive and a gigabit NIC in it.
-
[email protected] replied to [email protected] last edited by
I'm rocking 4 used ones from 4 different people.
So far, all good.
You can't buy 3090s new anymore anyways.
4090s are twice as much for ~15% better perf, and 5090s will be ridiculous prices.
2x 3090 is more than enough for basic inference; I have more for training and fine-tuning.
You want EPYC/Threadripper etc.
You want max PCIe lanes.
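For a feel of what basic inference on 2x 3090 looks like, here's a rough llama.cpp invocation; the model file and split values are illustrative, and the flags are from a recent llama.cpp build (other runners have equivalents):

```
# Split a big quantized model across both cards
CUDA_VISIBLE_DEVICES=0,1 ./llama-server \
  -m ./models/llama-3-70b-instruct.Q4_K_M.gguf \
  --n-gpu-layers 99 \
  --split-mode layer \
  --tensor-split 1,1   # even VRAM split across the two 3090s
```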
-
[email protected] replied to [email protected] last edited by
Thanks so much for all of this info! You're almost certainly correct that I'm overthinking this (it's definitely a talent of mine). I had been leaning Z2 on the NAS only because I'd heard that the resilvering process can be somewhat intensive on the drives, especially when they're larger, but I had also seen folks say that this was probably overkill for most home settings, so I'm glad someone with experience could chime in. I think my biggest takeaway from what you shared is that keeping the file system bare metal and fiddling with it as little as possible is the strategy.

And I think you're totally right about the LLMs being the real sticking point; I had no idea just how resource-intensive they were, not just to train but even to operate, until I started looking into running one locally. It's honestly making me think that rolling this out in phases might be a better place to start: first the NAS (along with some other infrastructure upgrades, like running Cat 6a and swapping my ISP's all-in-one router for something that can run OPNsense, paired with some WAPs), and then, once I have some early successes under my belt, moving on to the LLM arena to see how much time, money, and tears I want to spend getting that up and running.

Oh, and thanks also for mentioning TiB; it sent me down a very interesting rabbit hole on base-10 vs. base-2 byte measurement and how drive companies use the difference to pump up the number they get to advertise. I had no idea that accounted for the discrepancy in drive size, but it's definitely not surprising.
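(The arithmetic is quick to sanity-check, too; an "8 TB" drive is 8x10^12 bytes, but the file system counts in TiB, i.e. 2^40 bytes:)

```
python3 -c 'print(8e12 / 2**40)'   # ~7.28 TiB shown for an "8 TB" drive
```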
-
[email protected] replied to [email protected] last edited by
I would definitely scale things out slowly. While the NAS will eventually be the cornerstone of your setup, it will be an investment. You could also try setting up a cheap server as a stand-alone to get the feel for running applications, maybe even something as cheap as a Raspberry Pi or another small single-board system. Some of them have pretty decent specs at very affordable prices.
There are sometimes ways to upgrade a RAID later. In one scenario, I replaced the drives one at a time with larger drives and created a second RAID on the same disks (in a second partition). It wasn't a great idea perhaps, but it worked! I just expanded my LVM pool onto the new RAID and was off to the races. I'm sure performance took a hit with two RAIDs on the same disks, but it did the job and worked well enough for me.
I'm not familiar enough with ZFS to know what options it has for expansion. With MD these days, I think you can just fail and replace each disk one by one and expand the RAID to the new size once they're all replaced; see the sketch below. MD can be pretty picky about drives having exactly the same number of sectors, though, so care must be taken to use the same disks or to partition a bit smaller than the drive... Waiting for each disk to resync can take ages, but it's possible. There may be other options for ZFS (scaling with more disks, maybe?).
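A rough sketch of that one-by-one replacement for MD, plus what I understand to be the ZFS equivalent (device names are made up; double-check everything before pointing it at real disks):

```
# MD: swap each member for a bigger disk, waiting for resync in between
mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
mdadm /dev/md0 --add /dev/sdd1        # new, larger disk
cat /proc/mdstat                      # wait for the resync to finish
# ...repeat for each member, then grow the array to the new size:
mdadm --grow /dev/md0 --size=max

# ZFS: same idea via zpool replace; autoexpand grows the pool afterwards
zpool set autoexpand=on tank
zpool replace tank /dev/sda /dev/sdd
```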
Good luck with your project!
-
[email protected] replied to [email protected] last edited by
For high-VRAM AI stuff, it might be worth waiting to see how the 24GB B580 variant turns out.
Intel has a bunch of translation-layer stuff that I think generally makes it easy to run most CUDA AI things on its cards, though I'm not sure if common AI software supports multi-GPU with them.
IDK how cash-limited you are, but if it's just the VRAM you need and not necessarily the tokens/sec, it should be a much better deal when it releases.
Not entirely related, but I have a full, half-hourly-snapshotted computer backup going to a large HDD in my home server using Kopia. It's very convenient, and you don't need to install anything on the server except a large drive and the ability to use ssh/sftp (or another method; it supports several). It supports many compression formats and also avoids storing duplicate data. I haven't needed to use it yet, but I imagine it could become very useful in the future. I also have the same setup in the CLI on the server, largely so I can roll back in case some random person happens upon my Minecraft server (which is public and doesn't have a whitelist...) and decides to destroy everything. It's pretty easy to set up, and since it can back up over the internet, it's something you could easily use for a whole family.
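Setup is short; roughly this, with the host, user, and paths all made up (flags from memory, so check `kopia repository create sftp --help`):

```
# One-time: create the repo on the server's big drive, over SFTP
kopia repository create sftp \
  --host homeserver --username backup \
  --keyfile ~/.ssh/id_ed25519 --known-hosts ~/.ssh/known_hosts \
  --path /mnt/bigdrive/kopia-repo

# Then snapshot whatever you care about; kopia dedups across snapshots
kopia snapshot create ~
# Half-hourly via cron (or let KopiaUI schedule it):
# */30 * * * * kopia snapshot create ~
```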
My home server (built from a bunch of used parts plus a computer from the local university surplus store) was probably about ~$170 in total (i7-6700, 16GB DDR4, 256GB SSD, 8TB HDD) and hosts all of the stuff I have (very light modded MC with Geyser, a GitLab instance, and the backup) very easily. But it is very much not expandable (the case is quite literally tiny and I don't have space to leave it open; I could get a PCIe storage controller, but the PSU is weak and there aren't many SATA ports), probably not all that future-proof either, and definitely isn't something I would trust to perform well with AI models.
This is the HDD I got; I did a lot of research and they're supposed to be super reliable. I was worried about noise, but after getting one I can say that as long as it isn't within 4 feet of you, you'll probably never hear it.
Anyways, it's always nice to really do something the proper way and have something fully future-proof, but if you just need to host a few light things, you can probably cheap out on the hardware and still get a great experience. It's worth noting that a normal Minecraft server, backups, and a document editor, for example, are all things you can run on a Raspberry Pi if you really wanted to. I have absolutely no experience using a NAS, metasearch, or heavy mods, however; those might be a lot harder to make fast for all I know.
-
[email protected] replied to [email protected] last edited by
Thanks so much for sharing! I just poked around for the IronWolf 8TB drives I was thinking of, and it unfortunately looks like they're sold out for now (as are the 8TB WD Reds, it seems), but I'll definitely keep an eye out for them here (and maybe honestly explore some different size options; the drive costs I was seeing on other sites were more than I expected, but I wasn't sure if that was just the new normal; glad to have another option!). And thanks so much for the awesome-selfhosted list!! I don't think I'd seen everything collected in one place like that before; that will be super helpful!
-
[email protected] replied to [email protected] last edited by
This is super interesting, thanks so much for sharing! In my initial poking around, I'd seen a lot of people suggest that virtualizing TrueNAS within Proxmox is a bit of a headache (especially when something inevitably goes wrong and everything goes down), but I hadn't considered cutting out TrueNAS entirely, running ZFS directly on Proxmox, and pairing that virtualization with k8s and robust backups (I'm pleasantly shocked that PBS can manage that many backups without eating up crazy amounts of space). After the other comments I was sort of aligning around starting off with a TrueNAS build and then growing into some of the LLM stuff I mentioned, but I have to admit this is really intriguing as an alternative (even if as something to work towards once I've got some initial prototypes; figuring out k8s would be a really fun project, I think). Just out of curiosity, how noisy do you find the old Dell servers? I've been hesitant because of both power draw and noise, but would love feedback from someone who has them. Thanks so much again for taking the time to write all of this out, I really appreciate it!
-
[email protected] replied to [email protected] last edited by
(Also very curious about all of the HA stuff; it's definitely on my list of things to experiment with, but probably down the line once I've gotten some basic infrastructure in place. Very excited at the prospect though)
-
[email protected] replied to [email protected] last edited by
Thanks for flagging this! I'd just passively absorbed the mining-rig fears secondhand, but you're totally right that it's not as though a regularly used, overclocked gaming GPU isn't also subject to similar degradation (especially since miners often intentionally underclock). I guess the biggest fears then are physical damage from a rough install and potential heat damage (though maybe swapping thermal pads and paste helps alleviate that?), and of course checking benchmarks for any weirdness if possible, I guess...
-
[email protected] replied to [email protected] last edited by
$4,000 seems like a lot to me. Then again, my budget was like $200.
I would start by setting yourself a smaller budget. Learn with cheaper investments before you screw up big. Obviously $200 is probably a bit low, but you could build something simple for around $500. Focus on upgradeability. Once you have a stable system, upskill and reflect on what you learned. Once you have a bit more knowledge, build a second and third system and then complete a Proxmox cluster. It might be overkill, but having three nodes gives a lot of flexibility.
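Once you have the boxes, clustering them is mercifully short. Roughly (cluster name and IP made up):

```
# On the first node: create the cluster
pvecm create homelab
# On each additional node: join by pointing at the first node's IP
pvecm add 192.168.1.10
# Sanity-check quorum and membership
pvecm status
```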
One thing I will add: make sure you get quality enterprise storage. Don't cheap out, since lower-tier drives will have performance issues with heavier workloads. Ideally you should get enterprise SSDs.
-
[email protected] replied to [email protected] last edited by
Amazing, thanks again for all of this! I'll start keeping my eyes peeled for any good deals on 3090s that pop up (though I'll probably end up prioritizing the NAS build first, just to get my feet wet before diving straight into the local LLM world). But thanks again for taking the time to share!
-
[email protected] replied to [email protected] last edited by
You could look into building a game streaming server. Moonlight/Sunshine runs decently well, and if you have decent WiFi it will be fine. Theoretically you can divide your GPU into vGPUs, but support for that is hit or miss.
-
[email protected] replied to [email protected] last edited by
Oh, they're noisy as hell when they wind up because they're doing a big backup or something. I have them in my laundry room; if you had to listen to them, you'd quickly find something else. In the end, I don't really use much processor power on these; it's more about the memory these boards will hold. RAM was dirt cheap, so having 256GB available for experimenting with kube clusters and multiple docker hosts is pretty sweet. But considering that you can overprovision both proc and RAM on PM guests as long as you use your head, you can get away with a lot less. I could probably have gotten by as well or better with a Ryzen with a few cores and plenty of RAM, but these were cheaper.
At times, I've moved all the active guests to one node (I have the PBS server set up as a qdevice for Proxmox to keep a quorum active; it gets pissy if it thinks it's flying solo), and I'll WoL the other one periodically to let the first node replicate to the second, then down it again when it's done. If I'm going to be away for a while, I'll leave both of them running so HA can take over. That has actually happened without me even noticing: the first server packed in a drive, and the failover was so seamless it took me a week to notice. Downing a node can save a bit of power, but overall it's about a kWh a day per server, which in my area is about 12 cents.
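That routine is basically three commands; a sketch with made-up MAC, IP, and VM ID:

```
# Wake node 2 so replication can catch up
wakeonlan aa:bb:cc:dd:ee:ff

# One-time setup: the PBS box as a qdevice keeps quorum while node 2 sleeps
pvecm qdevice setup 192.168.1.50

# Evacuate a node live before downing it
qm migrate 101 pve1 --online
```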
I've never seen the point of TrueNAS for me. I run Nextcloud as a docker stack using the AIO mastercontainer for myself and 8 users. Together, we use about 1TB of space on it, and that's with a few people having years of photos etc. I mount a separate virtual disk on the docker host that both Nextcloud and Immich can access, so they can share photos saved in users' NC folders that get backed up from their phones. The AIO also has Collabora Office set up by default, so that might satisfy your document-editing ask there.
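The sharing trick is nothing fancy: both containers bind-mount the same host path, Nextcloud writes photos into it, and Immich reads it as an external library. Something like this (paths and the in-container mount point are illustrative; see the Immich external-library docs for the real details):

```
cat > immich/docker-compose.override.yml <<'EOF'
services:
  immich-server:
    volumes:
      - /mnt/shared/nc-photos:/usr/src/app/external:ro
EOF
```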
As I said, I've thought I might get an eGPU and pass it through to a docker guest for AI use. I'd prefer to get my Home Assistant setup off of relying on the Nabu Casa server. I don't mind sending them money, and the STT service that money buys works very well for voice commands around the house, but it rubs me the wrong way to rely on anything on someone else's computers. It's brutally slow when I try to run it even on my desktop Ryzen 7800 without a GPU, though, so until I decide to invest in a good GPU for that stuff, I'll keep sending it out. At least I trust them way more than I ever would Google or Amazon; I'd do without if that was the choice.
None of this needs to be a both-feet-first jump; you can just take some old laptop, start to build a PM cluster, and play with this. Your only limit will be the RAM.
I've also seen people build PM clusters using Mac Pro 2013 trashcans; you can get a 12-core Xeon with 64GB of RAM for like $200, and maybe add a Thunderbolt enclosure for additional drives. Those would be super quiet and probably low power usage.
-
[email protected] replied to [email protected] last edited by
The HA stuff is only as hard as prepping the cluster and making sure it's replicating fine; then you enable HA for whichever guests you want. It's seriously not difficult at all.
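Once replication is healthy, opting a guest into HA really is one line (VM ID made up):

```
ha-manager add vm:101 --state started
ha-manager status    # confirm the resource and quorum look sane
```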