The fastest way? Probably netdata
This. If you have more servers, you can also connect them all to a single UI where you can see all the info at once, with Netdata Cloud.
Just set this up yesterday. I used a parent node and then had all my VMs point to that. Took like an hour to figure out.
Hey, did you use the cloud functionality or not? I'm trying to go all-local with the parent-child capability, but so far I've been unable to.
The parent is still visible to the cloud portal. My understanding is that the data all resides locally, but when you log in to their cloud portal, it connects to the parent to display the information. I'm still playing with it to confirm. My parent node shows all the child nodes on the local interface, but the cloud portal still shows them all too.
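For anyone trying the same parent/child (streaming) setup, here is a rough sketch of the two `stream.conf` files involved; the hostname and API key below are placeholders, not anything from this thread:

```ini
# /etc/netdata/stream.conf on each CHILD: where to stream metrics to
[stream]
    enabled = yes
    destination = parent.lan:19999
    api key = 11111111-2222-3333-4444-555555555555

# /etc/netdata/stream.conf on the PARENT: accept children using that key
[11111111-2222-3333-4444-555555555555]
    enabled = yes
```

After restarting netdata on both sides, the child nodes should appear in the parent's local dashboard, which matches what the comment above describes.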
Agreed. BY FAR the fastest, and the easiest learning curve as well.
I currently use the classic approach: "Huh, seems slow," check basic things like disk usage and process CPU/RAM usage, then do a reboot to fix it for now.
This is me. Can't hurt to just do a reboot
Windows Server? )
Netdata, I've meant to look into Grafana but it always seemed way too overcomplicated and heavy for my purposes. Maybe one day, though...
I thought the same thing, but it's actually not bad. There are some pre-built dashboards you can import for common metrics from Linux, Windows, firewalls, etc.
netdata is much better though (IMHO)
I know that it needs a fix when my dad complains that he can't watch TV and the rolling door doesn't open in the morning.
Checkmk (Raw, the free version). Some setup aspects are a bit annoying (it wants to monitor every last ZFS dataset, and it takes too long to 'ignore' them one by one), but it does alert me to things that could cause issues, like the boot partition being almost full. I run it in a Docker container on my (primarily) file server.
I use this as well! Works well and has built in intelligence for thresholds.
Use PRTG; it's free for up to 100 sensors.
Best Monitoring tool ever ☝🏻🙂
Glances, Uptime Kuma, and a backend script that restarts a service if it's down. If that doesn't work, I get a notification via Gotify. Simple and sweet.
None. There is no need for a performance monitor for my home lab. I just have an alert if one of my main three services is down. That is all I need.
I recommend Checkmk. https://checkmk.com/
I second CMK.
A TICK stack is unwieldy, Grafana takes a lot of setup, and all of this assumes you both know what to monitor and get stats on it.
CMK by contrast is plug and play. Install the server on a VM or host, install the agent on your other systems, and you're good to go.
I just check the Proxmox dashboard every now and then. Honestly, if everything is working, I'm not too worried about exact RAM levels at any given moment.
Netdata; I'm monitoring a few thousand (virtual) servers that way.
Uptime Kuma and Grafana. Uptime Kuma to monitor whether a service is up and running, and Grafana to monitor the host: CPU, RAM, SSD usage, etc.
Same here. I also have some autoscaling mechanisms set up in Docker Swarm to scale certain services when the load is high.
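Swarm has no built-in autoscaler, so a setup like the one described above usually means something polls a load metric and calls the CLI. As a sketch, the scaling decision itself can be a small pure function; the thresholds, replica bounds, and service name here are made up for illustration:

```python
"""Sketch of a Swarm autoscaling decision; thresholds are illustrative."""
import subprocess

def desired_replicas(cpu_pct, current, lo=20.0, hi=75.0, min_r=1, max_r=6):
    """Scale up one replica when average CPU is above hi%,
    down one when below lo%, clamped to [min_r, max_r]."""
    if cpu_pct > hi:
        return min(current + 1, max_r)
    if cpu_pct < lo:
        return max(current - 1, min_r)
    return current

def apply_scale(service, replicas):
    """Apply the decision via the real Swarm CLI (needs a manager node)."""
    subprocess.run(["docker", "service", "scale", f"{service}={replicas}"],
                   check=True)
```

Keeping the decision logic separate from the `docker service scale` call makes it easy to test the thresholds without a running swarm.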
Nobody mentioned htop 🤔
Just to make sure: You are aware that a search option here exists, yes? And you keep refusing to use it for whatever reason?
I personally use InfluxDB, Telegraf, and Grafana.
I use checkmk with notifications to a telegram bot
Alerts are much more important than fancy dashboards. You won't be staring at your dashboard 24/7 and you probably won't be staring at it when bad things happen.
Creating your alert set is not easy. Ideally, every problem you encounter should be preceded by a corresponding alert, and no alert should be a false positive (i.e., require no action). So if you either have a problem without being alerted by your monitoring, or get an alert that requires no action, you should sit down and think carefully about what should be changed in your alerts.
As for tools, I recommend Prometheus + Grafana. There's no need for a separate Alertmanager, as many guides recommend; recent versions of Grafana have excellent built-in alerting. Don't use the ready-made dashboards; start from scratch, because you need to understand PromQL to set everything up efficiently. Start with a simple dashboard (and alerts!) just for generic server health (node exporter), then add exporters for your specific services, network devices (snmp), remote hosts (blackbox), SSL certs, etc. Then write your own exporters for what you haven't found :)
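The "start with node exporter" advice above amounts to very little configuration. A minimal sketch, where the hostname is a placeholder (the metric names are real node_exporter metrics):

```yaml
# prometheus.yml: minimal scrape config for a single node_exporter
# ("myserver.lan" is a placeholder hostname)
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["myserver.lan:9100"]

# Example PromQL you might attach to a Grafana alert rule:
# percentage of filesystem space still available:
#   100 * node_filesystem_avail_bytes{fstype!="tmpfs"}
#         / node_filesystem_size_bytes{fstype!="tmpfs"}
```

From there, each additional exporter is just another `job_name` entry, which keeps the learning curve incremental.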
One thing about using Prometheus alerting is that it’s one less link in the chain that can break, and you can also keep your alerting configs in source control. So it’s a little less “click-ops,” but easier to reproduce if you need to rebuild it at a later date.
When you have several Prometheus instances (HA or in different datacenters), setting up separate AlertManagers for each of them is a good idea. But as OP is only beginning his journey to monitoring, I guess he will be setting up a single server with both Prometheus and Grafana on it. In this scenario a separate AlertManager doesn't add reliability, but adds complexity.
As for source control, you can write a simple script using Grafana API to export alert rules (and dashboards as well) and push them to git. Not ideal, sure, but it will work.
Anyway, it's never too late to go further and add Alertmanager, Loki, Mimir, and whatever else. But to flatten the learning curve, I'd recommend starting with Grafana alerts, which are much more user-friendly.
Rainmeter if it's directly on their desktop/background.
I use Uptime Kuma to monitor particular services and NetData for server performance. I then pipe the alerts through to Pushover
Grafana. I have alerts set up and get data with node_exporter and cAdvisor, with some other containers providing metrics as well.
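For reference, those two exporters are often run as containers alongside everything else. A rough docker-compose sketch; the ports and mounts are illustrative, not a tested production config:

```yaml
services:
  node-exporter:
    image: prom/node-exporter
    pid: host                  # report the host's processes, not the container's
    ports: ["9100:9100"]
  cadvisor:
    image: gcr.io/cadvisor/cadvisor
    ports: ["8080:8080"]
    volumes:                   # read-only mounts cAdvisor uses to
      - /:/rootfs:ro           # inspect containers on the host
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
```

node_exporter covers host-level CPU/RAM/disk, while cAdvisor fills in per-container metrics.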
I have alerts set up, and they just ping me on a Discord server I set up: high CPU and temps, low disk space, memory, things like that. Mostly I get high-CPU or temperature alerts, and that's usually when Plex does its automated things at 4 am.
Prometheus + Grafana, the same I use at my job.
Honestly, my load is so light I don't bother monitoring performance. Uptime Kuma for uptime. I used to use PRTG and Uptime Robot when I ran a heavier stack, before I switched to an all-Docker workload.
I don't check it all the time like a maniac but I have a glances docker running on my main server.
Glances is really nice. I've been using btop more recently though.
I use Zabbix. Runs fine in a relatively small VM. Easy to write plugins.
LibreNMS is the tool I use; it connects to systems primarily via SNMP (use v3, not v1 or v2c).
Influx/Telegraf/Grafana stack. I have all three on one server, and then I put just Telegraf on the others to send data into Influx. Works great for monitoring things like usage. You can also bring in sysstat.
I have some custom apps as well, where each time they run I record the execution time and peak memory in a database. This lets me go back over time and see where something improved or got worse. I can get a timestamp and go look at Gitea commits to see what I was messing with.
If one of my users ever complained about anything I would possibly look into it, otherwise it all works so I don't waste life energy on that.
TICK stack is the only answer
It is a bit difficult at the start, but in the end you can monitor and get notifications on anything that's happening on your system.
When the fan gets loud enough to hear, I'll check it :P
I use an InfluxDB metrics server and the Telegraf agent to collect metrics.
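For anyone new to this stack, a minimal Telegraf config sketch for that arrangement; the URL, token, org, and bucket are placeholders (the plugin names are real Telegraf plugins):

```toml
# telegraf.conf: collect basic host metrics, write to InfluxDB v2
[[outputs.influxdb_v2]]
  urls = ["http://influxdb.lan:8086"]   # placeholder host
  token = "REPLACE_ME"                  # placeholder token
  organization = "home"
  bucket = "telegraf"

[[inputs.cpu]]
[[inputs.mem]]
[[inputs.disk]]
```

The same config (minus extra input plugins) goes on every machine you want reporting into the central InfluxDB.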