Don't make a mess: make the changes you need with Ansible, effectively making its code your documentation.
Yes. Documentation. Documentation aaaalll the way.
You are right. In two months you won't remember the shit you had to enable/disable to make things work.
Things you don't do on a recurring basis should be documented. Nothing crazy; a basic how-to-set-up is enough.
Common/recurring errors/situations? Document 'em.
Got a semi-permanent fix for a problem, so that it will most likely never come up again, but possibly in 5 years? Document it, fella.
You'll kiss your past self on the head and say thanks when you have a critical ticket in 5 years and remember nothing about the fix itself except that you wrote some documentation.
It will save your ass, and you might even come out as the hero of the day for having a solution right away for a super niche problem.
I've been keeping privately hosted documentation for the stuff, tricks and problems I learn at work.
I've had plenty of situations where I remembered that I had already encountered the same thing yeeeaars ago at my previous employer and that I'd written something down in my personal documentation. Bam, within a few minutes I've got either a really good or at least a shittysysadmin-style solution that works.
Yep, and don't just state the what, but the why in your docs.
The why really helps with knowing if a step is still important, or if it no longer applies. This is especially important with anything cloud based, as I've seen weird workarounds become no longer needed due to updates, and I would never have caught it without my notes on why we had the weird workaround to begin with.
You are right. In two months you won't remember the shit you had to enable/disable to make things work.
Tried to log in to my router that I set up again about 2 months ago... hell if I remember the password.
Follow some basic rules so as to avoid making the mess.
Only install standard packages from the distro's repository and Python's pseudo-official pip. For both, keep a text file with the installed package names. No compiling from source EVER. Too much hassle to maintain.
Back up the config files that I changed, not all of them.
Keep a text file to record what I did, with exact commands etc, whenever I need to go off-road. Much experience taught me that this is a chore that is very much worth the effort.
But still, the problem you point to is real. It's the reason for immutable distros, an idea I find quite tempting.
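The "text file with exact commands" rule above can be as simple as a tiny helper. This is only a minimal sketch in Python; the names `format_entry` and `append_entry` and the entry layout are made up for illustration, and the "why" field is borrowed from the advice earlier in the thread rather than from this comment.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def format_entry(command: str, why: str, when: datetime) -> str:
    """Render one log entry: timestamp, the exact command, and the reason."""
    stamp = when.strftime("%Y-%m-%d %H:%M")
    return f"[{stamp}]\n  command: {command}\n  why:     {why}\n"


def append_entry(log_path: Path, command: str, why: str,
                 when: Optional[datetime] = None) -> None:
    """Append an entry to the log file, creating the file if it is missing."""
    when = when or datetime.now()
    # Mode "a" never truncates, so older entries are always preserved.
    with log_path.open("a", encoding="utf-8") as log:
        log.write(format_entry(command, why, when))
```

Calling `append_entry(Path("changes.log"), "<exact command>", "<reason>")` right after each off-road change keeps the log in the order things actually happened. A plain text file also stays greppable years later.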
"Infrastructure as code" is what the strategy is typically called. You use one of the many tools for orchestrating configuration of hosts (Ansible, OpenTofu, Puppet, Saltstack, Chef, etc.). These allow you to provide configuration files and code for setting up your hosts in a central place. This place is typically a Git repo, allowing you to keep track of which change was made when.
Depending on the tool you use, you either trigger applying the configuration from your dev PC, or there's a hosted CI/CD server which automatically rolls out the changes when a new commit is pushed.
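To make the idea concrete without picking one of those tools, here is a rough Python sketch of what they do at their core: you describe the desired state, and applying it is idempotent, so running it twice changes nothing the second time. The functions `ensure_line` and `apply` are hypothetical names for this illustration and are not the API of Ansible or any other tool mentioned.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file. Return True if the file changed."""
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, nothing to do
    existing.append(line)
    path.write_text("\n".join(existing) + "\n", encoding="utf-8")
    return True


def apply(desired: dict[Path, list[str]]) -> list[Path]:
    """Converge every file toward its desired lines; return the files that changed."""
    changed = []
    for path, lines in desired.items():
        # Build the full list first so every line is applied, not just the first change.
        results = [ensure_line(path, line) for line in lines]
        if any(results):
            changed.append(path)
    return changed
```

The `desired` dictionary is the part you would keep in the Git repo. Because `apply` only reports files it actually had to touch, an empty result doubles as a check that a host still matches what the repo says.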
Declarative configuration fixes this problem. You don't really have to write down how to set something up because the configuration is the description.
I use NixOS, so in my case all the stuff you described would be defined in Nix code in a separate Calibre module. I can enable and disable such a module at will with a single option in my main config file.
I really recommend looking into immutable, declarative systems. I think NixOS is the most complete solution, but there are some others too. I have no experience with them though.
We use a mix of FreshDesk for tickets/(some) projects/helpdesk articles and Teams/Sharepoint for documentation and distribution of info/help to techs, analysts, end users, etc.
As for the non-technical side of the answer: Basically, yeah, just document everything you can when you come across anything that needs documenting.
I take daily work log notes in Obsidian, then transclude chunks from those notes into topic notes and attach config files, images, context from the web, etc.
Obsidian was a game changer for me. I just paste all the stuff, make a tag and forget about it until it is needed. For those not using it, there are dozens of plugins and unlimited options for customization. I have a standard daily note with timestamps and changelog over the whole vault and sqlite-like queries that manage dynamic dataviews.
I keep a documentation page in my wiki for every thing I set up - how I did it, what I ran into, how I fixed it, and where everything is. Reason being, when it comes time to upgrade or I have to install it again someplace else, I remember how I did it. Basically, every completed step gets copy-and-pasted into a page along with notes about it.
As for watching the file system, I have AIDE on all of my boxen (configured to run daily, but not configured to copy the new AIDE database over the old one automatically). That way, I can look at the output of an AIDE run and see what new files were created where (which would correspond to when I installed the new thing).
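For anyone who wants to see the principle before setting up AIDE, here is a minimal Python sketch of the same idea: take a checksum snapshot of a directory tree, keep the old one around, and diff it against a new one. This is not how AIDE is implemented or configured; `snapshot` and `diff` are hypothetical helper names, and SHA-256 is just an assumed choice of hash.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def diff(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots and list added, removed, and changed files."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return {"added": added, "removed": removed, "changed": changed}
```

Taking one snapshot before an install and one after, then looking at the `added` list, gives the same "what did this thing drop on my disk" answer. Not overwriting the old snapshot automatically is what keeps that comparison possible.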
code forges are great for management tasks. host an internal forgejo, and create repos for your servers and services. use issues for keeping track of initial setup, config changes and upgrades. have a longer-term issue for when you just want to record a little change but are too lazy to open a full issue for it. you can also store config in the git repo, and write docs as wiki pages for things that are more stable or important aspects of your systems.
I use a lot of comments in config files, and in the past I've also used BookStack to make documentation (something I should probably do again). You're right that Docker (especially Docker Compose) has helped with this immensely.