Both pages are focused on aerospace engineering (and perhaps fault-tolerant systems in general), so I'm not sure how I'd rate them as authoritative sources for the lifespan of electronics in general. A possible fault in a gaming GPU might not be that critical if it only causes a failure once a year, for example.
As someone with experience running a GPU mining farm for ~2 years, my anecdote: I had about 5% of cards break down during that time, and the majority of those were just fan failures.
There is the effect of 'electromigration' at least, which causes atoms of conducting materials to be transported over time because of the current (if I understand it correctly). That might be an issue over the long term, especially at the ridiculously small scale of chip manufacturing we've reached now.
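The usual back-of-envelope model for this is Black's equation: electromigration MTTF scales as J^-n * exp(Ea/kT), so both current density and temperature matter a lot. A quick sketch (the exponent n and activation energy Ea here are illustrative placeholders, not values for any real process):

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def em_mttf_ratio(j1, t1, j2, t2, n=2.0, ea=0.9):
    """Relative electromigration MTTF via Black's equation:
    MTTF ~ J^-n * exp(Ea / kT). Returns lifetime of condition 1
    relative to condition 2. j is current density (any consistent
    unit), t is temperature in kelvin. n and ea are illustrative;
    real values come from the fab's reliability data."""
    def mttf(j, t):
        return j ** (-n) * math.exp(ea / (K_BOLTZMANN_EV * t))
    return mttf(j1, t1) / mttf(j2, t2)

# Same current density, but an 85 degC hotspot vs a 55 degC baseline:
ratio = em_mttf_ratio(j1=1e6, t1=358.0, j2=1e6, t2=328.0)
print(f"hotspot lifetime is {ratio:.2f}x the cooler baseline")
```

With these placeholder parameters the 30-degree-hotter spot ends up with only a small fraction of the cooler spot's expected lifetime, which is why density scaling makes this more of a worry.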
Not only that, there is also diffusion driven by concentration gradients, which is accelerated by heat. And you have exactly that kind of concentration difference at the p-n junctions in an IC.
Though I'm not sure how much actual damage you'd see in practice - whether, under intense use, the ICs tend to die before e.g. the capacitors mentioned above do.
That's a bit of survivor bias. I used to buy truckloads of old PCs and recycle them when I was a teenager. I initially thought that this old tech was just really built to last, but in reality I was selecting the 1% or less that could survive being shoved around a rat-feces-infested warehouse at freezing temperatures before ending up in my parents' garage, to their dismay. Those survivors seemed to last freaking forever afterward, but again, that's because they were the random fraction that just happened to have that perfect balance of durability.
When I started buying new parts as an adult, the failure rates of e.g. GPUs were pretty disappointing in comparison to the biased expectations I had from those survivor PCs.
Haha, I also noticed that used parts that lasted ~2-4 years have a lower failure rate than new ones. All the ones that fail, fail early, and the surviving ones go through a sort of "extended burn-in", so to speak.
I have a longstanding habit of buying my laptops refurbished - so, a few months of wear and sometimes a "scratch and dent". To date, every one of them has been a winner on longevity.
Same, only had a problem twice, with a failed USB/audio daughterboard on an Elitebook ($20 replacement) and a failed VRM capacitor on an MXM card (replaced it but the GPU itself failed after a year so I just got a new card, $100).
I'm really sad about new laptops having everything soldered on. If something fails, you either need a good reflow station (and skills) or you have to toss the whole thing, which is insane.
It also makes parts ridiculously expensive, like a system board for a Dell Precision now goes for ~$500 where it used to be under $100. All because the CPU, GPU, VRAM and even RAM are soldered on.
Honestly, I hate where all of this is going. So much for everyone going green.
The second article mentions thermal cycling -- I always thought that running 24/7 is actually less damaging than cycling (i.e. a gaming rig or a MacBook that goes 30->90->30 °C multiple times a day).
For the wear and tear of thermal cycling to pile up, reboots or shutdowns aren't required. All it takes is temperature fluctuations, which in turn produce stress fluctuations, which induce fatigue wear in materials. Low-amplitude cycles are less damaging than high-amplitude ones, but the damage is still there, building up.
To put things in perspective, not so long ago it was believed that below a certain stress delta some materials were immune to fatigue and practically eternal. It soon became apparent that this wasn't the case, and a phenomenon labelled very high cycle fatigue (VHCF) became a research topic. This type of fatigue is characterized by cracks forming even at very low stress levels, due to defects such as impurities and even grain size in metal matrices.
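The amplitude effect is roughly a Coffin-Manson-style power law: cycles to failure scale as (ΔT)^-q. A toy sketch (the exponent q=2 is just an illustrative ballpark for solder joints; real values are material-specific and measured empirically):

```python
def cycles_ratio(dt_small, dt_large, q=2.0):
    """Coffin-Manson-style scaling: cycles to failure ~ (dT)^-q,
    so the small swing survives (dT_large / dT_small)^q times as
    many cycles as the large one. q = 2 is an illustrative value,
    not a measured one."""
    return (dt_large / dt_small) ** q

# A 30->90 degC gaming cycle (dT = 60) vs a 24/7 box
# that only drifts 55->65 degC (dT = 10):
print(cycles_ratio(10, 60))  # -> 36.0: ~36x more survivable cycles
```

So the low-amplitude cycles of a 24/7 machine still accumulate damage, just far more slowly per cycle.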
Yes. Higher temperatures lead to faster degradation of capacitors. My experience running PC routers 24/7 in non-air-conditioned spaces (~40 degC in summer) is that after 5+ years, systems that had been rock-stable started to crash/reboot every few days and had to be replaced.
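The usual rule of thumb for electrolytic caps is Arrhenius-flavored: expected life roughly doubles for every 10 degC below the rated temperature (and halves for every 10 above). A quick sketch with made-up but typical datasheet numbers:

```python
def cap_life_hours(rated_hours, rated_temp_c, actual_temp_c):
    """Rule-of-thumb electrolytic capacitor life: doubles for every
    10 degC below the rated temperature, halves for every 10 degC
    above it. Input numbers are hypothetical datasheet values."""
    return rated_hours * 2 ** ((rated_temp_c - actual_temp_c) / 10)

# A 2000 h @ 105 degC rated cap, in a cool case vs a hot enclosure:
print(cap_life_hours(2000, 105, 65))  # -> 32000.0 h (~3.6 years continuous)
print(cap_life_hours(2000, 105, 85))  # -> 8000.0 h (under a year)
```

That 20-degree difference between an air-conditioned rack and a 40 degC closet is a 4x hit to expected life, which lines up with caps giving out after ~5 years in hot rooms.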
Absolutely, it's materials science. Most electronics like this aren't meant to be run full-tilt 24/7, especially not under the conditions that crypto mining typically occurs. Subpar cooling, potentially dirty air - so lots of dust and particulate.
Not to mention crypto miners will often have the GPUs overclocked on top of running full tilt to get every last hash out of it. It's about as brutal of a situation as you can get for a GPU.
>Not to mention crypto miners will often have the GPUs overclocked on top of running full tilt to get every last hash out of it. It's about as brutal of a situation as you can get for a GPU.
This is false because miners often undervolt to achieve better efficiency. The popular GPU mining algorithms are all memory-bound, so you can undervolt your core clocks quite a bit and still get >95% of the original performance.
That depends on how valuable the last 5% of your hash rate is and local electricity prices. Running an OC especially for memory can objectively be the correct choice.
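The tradeoff above is easy to put in numbers. A minimal sketch with entirely hypothetical hashrates, revenue, and power figures, just to show the shape of the calculation:

```python
def daily_profit(hashrate_mh, usd_per_mh_day, watts, usd_per_kwh):
    """Net daily mining profit: revenue proportional to hashrate,
    minus electricity cost. All inputs are hypothetical examples."""
    revenue = hashrate_mh * usd_per_mh_day
    power_cost = watts / 1000 * 24 * usd_per_kwh
    return revenue - power_cost

# Undervolted card vs a memory-OC'd one (+7% hashrate, +50 W):
uv = daily_profit(60, 0.05, 120, 0.12)
oc = daily_profit(64, 0.05, 170, 0.12)
print(f"undervolt: ${uv:.2f}/day, OC: ${oc:.2f}/day")

# With free electricity (dorm/rent-included), the OC always wins:
print(daily_profit(64, 0.05, 170, 0.0) > daily_profit(60, 0.05, 120, 0.0))
```

Whether the OC comes out ahead at a given electricity price depends entirely on the marginal hashrate per marginal watt, which is the whole point of the disagreement here.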
College dorms, for example, rarely charge based on electricity use. Some apartment complexes also include electricity as part of the rent.
>College dorms, for example, rarely charge based on electricity use. Some apartment complexes also include electricity as part of the rent.
Those situations are rare. A student in a college dorm isn't going to be able to afford multiple GPUs for a mining rig, and if he's mining with one GPU, he's likely going to keep it when the crash comes rather than trying to sell it into a flooded market. Apartment mining is more believable, but even there the power draw is going to be an issue because of the noise and heat of dissipating it. Apartment miners are also going to be vastly outnumbered (in terms of GPUs operated) by professional miners, because most people don't have a few thousand dollars to drop on generating highly speculative assets.
Everything you just said is total nonsense. This idea that you are going to "wear out" a GPU is something people started saying when they were obviously bitter about not being able to find GPUs and seeing them being resold later for more than retail after they had been used for mining. There is nothing to back this up unless something melts.
Additionally, transistors can experience aging through various mechanisms [1], some of which are permanent and some of which can be fixed with a reset. Most manifest as a shift in the threshold voltage of the transistors, which can impact the operating frequency of chips or the stability of sensitive circuits such as memory cells. When ICs are designed they usually have a lifetime operating profile, e.g. 5 or 10 years, with operation at max voltage for x% of that lifetime at a given temperature. Simulations are then run pre- and post-degradation to ensure requirements can still be met. Fabs such as TSMC and Samsung provide models/info for transistor aging as part of their design kits.
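For a feel of what those aging models look like: mechanisms like NBTI are often fit with a power law in stress time, dVth = A * t^n with n well below 1, so degradation is front-loaded. A toy sketch (A and n here are placeholders, not foundry numbers, which are confidential and come with the PDK):

```python
def vth_shift_mv(years, a_mv=5.0, n=0.2):
    """Toy NBTI-style aging model: threshold-voltage shift grows as
    a power law in stress time, dVth = A * t^n. a_mv and n are
    illustrative placeholders, not real foundry model parameters."""
    hours = years * 365 * 24
    return a_mv * hours ** n

# Sub-linear growth: much of the 10-year shift happens in year one.
print(f"1y: {vth_shift_mv(1):.1f} mV, "
      f"5y: {vth_shift_mv(5):.1f} mV, "
      f"10y: {vth_shift_mv(10):.1f} mV")
```

This is why designers sign off timing against the post-degradation simulation: the chip that leaves the fab is the fastest that silicon will ever be.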
Not if you have filters set up and clean/change them regularly. I'm assuming miners monitor their temperatures as well; as long as they stay under 80 degC, chips will work for years under load. VRM components are of rather poor quality compared to GPUs, CPUs and RAM, though - that's a major failure point.