this post was submitted on 27 Jul 2024
197 points (99.5% liked)

[–] pivot_root@lemmy.world 27 points 3 months ago* (last edited 3 months ago) (2 children)

Moore's Law is Dead shared an interesting video yesterday about these chips. Supposedly, leaks from his sources at Intel say that high voltages being pushed through the ring bus cause degradation. The leaks claim the ring bus shares the same power rail as the P and E cores, meaning its voltage is influenced by whatever voltage the cores request.

For context, the ring bus is responsible for communication between cores, peripherals, and the platform. This includes memory accesses, which means that if the ring bus fails and does something incorrectly, it could appear normal but result in errors far down the line.
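
To illustrate the "appears normal but fails far down the line" part, here's a minimal Python sketch. It has nothing to do with how the ring bus actually works internally; `flip_bit`, the payload, and the chosen bit position are all made up for illustration. It only shows that a single flipped bit in transit produces data that still looks perfectly valid, and the problem is only noticed if something verifies it later.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (stand-in for a transfer error)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# Hypothetical payload; the checksum is taken before the data "travels".
payload = b"balance=1000;owner=alice"
checksum_at_source = zlib.crc32(payload)

# Hypothetical fault: one bit flips while the data is in transit.
received = flip_bit(payload, 67)

# Nothing fails here: the corrupted bytes are the same length and still parse.
looks_fine = len(received) == len(payload)

# The error only surfaces if something checks the data later on.
corruption_detected = zlib.crc32(received) != checksum_at_source

print(received)             # b'balance=9000;owner=alice'
print(looks_fine)           # True
print(corruption_detected)  # True
```

Most data in a running program has no checksum attached, which is why this kind of fault tends to show up as a random crash or a corrupted file long after the bad transfer happened.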

Going beyond the video specifically, and considering what others have suggested as workarounds, it seems like ring bus degradation might be a decent candidate for the actual root cause of these issues.

Some observations around chips degrading were:

  • High memory pressure exacerbates the issue.
  • Chips with more cores deteriorate faster.

Some of the suggestions to work around the issue were:

  • Lower the memory speed.
  • Lower the voltage and clock speeds.
  • Disable E cores.

All of those can be related to stress being put on the ring bus:

  • Higher voltage being put through the bus -> higher likelihood of physical damage
  • More memory pressure -> more usage of the bus, more opportunity for damage to accumulate
  • More cores -> more memory pressure
  • Slower memory speeds -> less maximum throughput -> less stress
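
To make the direction of each factor concrete, here's a toy Python sketch of the reasoning above. It is not a physical degradation model: the `Config` fields, the `relative_stress` formula (voltage squared times a rough traffic term), and every number in it are assumptions I picked for illustration, not figures from Intel or the video.

```python
from dataclasses import dataclass


@dataclass
class Config:
    ring_voltage: float    # volts (hypothetical value)
    active_cores: int      # cores generating traffic on the bus
    memory_speed_mts: int  # memory speed in MT/s


def relative_stress(cfg: Config, baseline: Config) -> float:
    """Toy score relative to a baseline: above 1 means more stress, below 1 less.

    Assumption: stress grows with the square of voltage and linearly with
    bus traffic, where traffic is approximated by cores times memory speed.
    """
    voltage_term = (cfg.ring_voltage / baseline.ring_voltage) ** 2
    traffic_term = (cfg.active_cores / baseline.active_cores) * (
        cfg.memory_speed_mts / baseline.memory_speed_mts
    )
    return voltage_term * traffic_term


# Hypothetical baseline and the three suggested workarounds.
baseline = Config(ring_voltage=1.40, active_cores=24, memory_speed_mts=5600)
lower_voltage = Config(ring_voltage=1.25, active_cores=24, memory_speed_mts=5600)
slower_memory = Config(ring_voltage=1.40, active_cores=24, memory_speed_mts=4800)
e_cores_off = Config(ring_voltage=1.40, active_cores=8, memory_speed_mts=5600)

for name, cfg in [
    ("lower voltage", lower_voltage),
    ("slower memory", slower_memory),
    ("E cores off", e_cores_off),
]:
    print(f"{name}: {relative_stress(cfg, baseline):.2f}")
```

Under these assumptions every workaround scores below 1, which is all the sketch is meant to show: each suggestion pushes in the direction of less stress on the bus.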

I'm not claiming anything definitive, but I think my money is on this one.

[–] KarnaSubarna@lemmy.ml 4 points 3 months ago (1 children)

Thanks for the additional details.

The scariest part of this whole problem is that owners of 13th/14th gen CPUs have no way to figure out how badly their CPU is already damaged. It's like holding a ticking time bomb without knowing when it will go off!

[–] pivot_root@lemmy.world 2 points 3 months ago

100%. Whatever Intel does at this point, I don't trust it to be a fix so much as a mitigation or attempt to delay the inevitable until a few years after the warranty period.

If it's possible for people to return their 13th/14th gen processor and trade up for a 12th gen, that would be the safest solution.

I've heard speculation that this is exacerbated by a feature where the CPU increases the voltage to boost clocks when running single-core workloads at low temperatures. If that's true, having less load or better cooling may actually be detrimental to the life of the processor.