The system:
MSI Raider GE67 HX 12UHS
Intel Core i9-12900HX
NVIDIA GeForce RTX 3080 Ti (laptop)
32GiB RAM
Win11 Pro 64-bit
The problem:
Two or three times a day, the system crashes, usually to a blue screen showing one of several stop codes. Codes I've seen include:
HYPERVISOR_ERROR
CLOCK_WATCHDOG_TIMEOUT
VIDEO_TDR_FAILURE
IRQL_NOT_LESS_OR_EQUAL
Sometimes the system hangs but the blue screen never comes, and I have to power it off manually. When this happens, the fans spin up to full speed, yet the laptop quickly becomes extremely hot unless I power it off right away. That suggests the CPU or GPU is running at full load for some reason.
Event Viewer shows nothing out of the ordinary in the lead-up to the crash.
Things I've ruled out:
I initially thought it only happened while the laptop was plugged in, so I bought a new power supply. That didn't noticeably change how often it happens, and I have since seen it happen on battery as well. I also thought it was more frequent while playing games that use the dedicated graphics card, but I'm not sure that's actually true; I have seen it happen while just watching YouTube. At one point I felt it happened more when I moved the laptop or plugged in USB devices, but that may be magical thinking; I have never been able to trigger it on purpose that way. One pattern does seem real: if I let the laptop restart automatically after a crash, it often crashes again soon afterwards, whereas shutting it down fully and then powering it back on buys more time before the next incident.
Solutions I've tried:
I updated the BIOS and the Intel firmware to the latest versions available on MSI's website, but that doesn't seem to have helped. I also updated my NVIDIA drivers.
A possibly related issue:
About a week before the crashes first happened, I had updated the BIOS to fix a different issue. That issue went like this: I was unintentionally playing a game on battery and didn't notice until the "low battery - switching to Super Battery" warning appeared and the system began throttling performance. I plugged the laptop in, but performance didn't improve. After a restart, performance was terrible across all applications, even Firefox. Resource Manager showed the CPU being throttled down to around 0.16 GHz, and Event Viewer was logging warnings saying the processor was being limited by system firmware.
I tried various Windows and MSI power management settings, but the throttling persisted across restarts, full battery charges, etc. In the end, I fixed it by updating the BIOS (to a version that is now one behind the most current one).
About a week after that update, the crashes started.
Current theory:
Is it possible I screwed up the BIOS update somehow? I noticed that the update instructions say to return clock speeds to stock before updating. I don't think I've adjusted them manually, but MSI's "MSI Center" software seems to offer automatic adjustment. It was set to "Balanced" when I did the most recent update, but it may have been set to "Auto" when I did the first one, which I suppose could be a problem if the CPU was being automatically overclocked.