On Tue, Feb 22, 2022 at 7:34 AM Neal Becker <ndbecker2(a)gmail.com> wrote:
I know this is a bit OT, but you guys are great at answering all
questions.
I bought a workstation from Titan Computers around January 2020 (dual EPYC
CPUs). After about a year it stopped working. I could still ssh into it,
but almost every command returned "Input/output error". Unfortunately
journalctl also failed with an input/output error, so I couldn't see the
logs. cat /proc/partitions did not list any NVMe device (the root device,
on which the OS was installed).
I replaced the SSD with a Samsung 980 Pro and reinstalled Fedora. It then
worked for a few weeks, then showed the exact same symptoms.
I replaced the SSD with another Samsung 980 Pro, this time with a heatsink,
and reinstalled Fedora. It worked for a few weeks, then the same symptoms.
Then I replaced it with a 4th Samsung 980 Pro, but this time instead of
using the M.2 socket I used a PCIe-to-M.2 adapter (in case something was
wrong with the M.2 socket). I also added a surge-protector outlet for good
measure. I reinstalled and watched the smartctl output: no errors, and the
temperature was always low.
Now it has failed again, with exactly the same symptoms.
Any ideas?
I remember your other email from about a month ago and thought it was
really strange. Have you tried the drives in another system to confirm
they're truly dead?
I would check for BIOS updates just for good measure. Other than that, have
you had any communication with Titan about it?
Thanks,
Richard