What do y’all use to monitor many Linux servers?
-
[email protected]replied to [email protected] last edited by
- Base Ansible role installs Prometheus node exporter, configured with the textfile collector
- VM automations push DNS records so that Prometheus DNS-SD automatically discovers them
- Ansible roles add cron jobs that generate metrics for specific systems and dump them for the textfile collector
- Grafana for dashboards
- Karma as a UI in front of Prometheus Alertmanager
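For anyone wondering what the cron-job piece of a setup like this might look like, here is a minimal Python sketch. It is not the poster's actual script: the function names, the metric name, and the file name are all made up for illustration, and it assumes label values contain no quotes or backslashes. It renders one gauge in the Prometheus text format and writes it atomically, so node exporter never picks up a half-written file.

```python
import os
import tempfile


def render_metric(name, value, help_text, labels=None):
    """Render one gauge in the Prometheus text exposition format.

    Hypothetical helper; assumes label values need no escaping.
    """
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{label_str} {value}\n"
    )


def write_prom_file(directory, filename, body):
    """Write to a temp file in the same directory, then rename over the target.

    The textfile collector only reads *.prom files, so the .tmp file is
    ignored until the rename makes the finished file visible.
    """
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    # mkstemp creates the file as 0600; loosen it so node exporter can read it.
    os.chmod(tmp_path, 0o644)
    final_path = os.path.join(directory, filename)
    os.replace(tmp_path, final_path)
    return final_path
```

A cron job would call `render_metric(...)` with whatever it measured and pass the result to `write_prom_file(...)`, pointing `directory` at the path given to node exporter's `--collector.textfile.directory` flag.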
-
[email protected]replied to [email protected] last edited by
Cockpit.
-
[email protected]replied to [email protected] last edited by
Is Cockpit on a server-by-server basis, or can you monitor multiple servers with it?
-
[email protected]replied to [email protected] last edited by
Netdata is exactly what you're looking for. It's basically an all-in-one monitoring and alerting suite that collects and analyzes data, and provides a gorgeous web dashboard for you to view.
You can also manually replicate this using Prometheus, Grafana and other tools, but that requires a much bigger effort to set up.
Edit: There's a public demo instance where you can try everything out: https://frankfurt.netdata.rocks/
-
[email protected]replied to [email protected] last edited by
I think they went to 5 nodes max on the free version as of the last patch. That's damn near useless.
-
[email protected]replied to [email protected] last edited by
I use my family. It has a simple volume-based alert for when services are offline.
-
[email protected]replied to [email protected] last edited by
It'll even automatically configure variable alert volumes corresponding to the importance of the service!
-
[email protected]replied to [email protected] last edited by
The five-node limit is a dealbreaker for me too. I'm also annoyed the free version doesn't have any real built-in options to secure data by default. I followed a TechnoTim tutorial to get the Netdata/Prometheus/Grafana stuff set up, but it was too limited and required too much manual effort.
-
[email protected]replied to [email protected] last edited by
Seconding Netdata, I've been using it for years. It's pretty great.
-
[email protected]replied to [email protected] last edited by
Is that just for the centralized dashboard portion? I tend to use each instance of it standalone, and primarily for the email alerts.
-
[email protected]replied to [email protected] last edited by
Any chance you'd be willing to share playbooks or point me toward any resources you used?
I use Ansible to manage config across all my workstations/servers, but I haven't gotten around to automating log shipping or aggregating system metrics yet.
-
[email protected]replied to [email protected] last edited by
I believe so. I imagine the next stage of the enshittification will be to force those standalones to register with a portal account.
-
[email protected]replied to [email protected] last edited by
That would be a truly dark day. I never liked their centralized dashboard functionality; it always seemed cumbersome to me.
I hope that doesn't happen, but I guess if it does, I will really need to find a different monitoring tool.
-
[email protected]replied to [email protected] last edited by
You can monitor multiple machines via the host switcher menu at the top-left of the screen: Multiple Machines
-
[email protected]replied to [email protected] last edited by
Oh, that sucks. I haven't used it personally in quite a while, since I switched to the Grafana stack.
-
[email protected]replied to [email protected] last edited by
Until the UPS battery gets low and it beeps, and they look for a way to turn it off vs calling you. Yup.