If At First You Don't Succeed, Fail, Fail Again

I’ve had a tough week, technology-wise.  Over the course of the last three days I’ve had two relatively new hard drives fail, a gigabit switch start dropping some of its ports to slow speeds, and a servo that controls the throttle on an RC airplane go nuts.  I think it may be time for some time away from the keyboard.

I believe my new saying for hard disks should go something like this:

“There are no such things as good, dependable or safe disks.  There are just disks that have failed and those that will fail.”

On the recommendation of some people on the Internet, I bought and tested a Samsung Spinpoint F1 1TB SATA drive.  It seemed great: it was big, fast, very quiet and energy efficient.  Windows seemed to like it, and a couple of my other PCs with very finicky SATA controllers took a shine to it as well.  After about a month of using it I decided to purchase five more: four to put into my ReadyNAS storage box and one “spare” to use for shuttling data around.  I also convinced a partner of mine to pony up for an additional four drives for his NAS.  All seemed right in the world.

About two weeks ago I started noticing my ReadyNAS box getting slower and slower when I tried to copy files to or from it.  It also has a web page where the admin tasks get done, and most days I was fortunate to see that page within two to three minutes of requesting it.  Great, I thought, some kind of firmware mess-up (the box can offer and upgrade its own firmware) has happened.  So I slapped on a new version of the firmware, rebooted the NAS, and then nothing.  Truly nothing, as in no web page, no network shares, no ping returns.  A few more reboots and things appeared to be working, so I left it alone to worry about another day.

In the meantime I reformatted my main PC to be a full-time Windows 7 x64 RC1 machine, so I fed it my existing Samsung 1TB drive to run from.  That worked for about 24 hours, then Win7 just stopped responding.  Thinking that I had fouled it up somehow (it happens, I go nuts on new installs from time to time), I hammered the whole install and did it again.  This time around it lasted for about 6 hours before Win7 coughed up an error message that, roughly translated, said:

“Dude (it’s California, work with me here), this drive is busted and you should back it up. Oh, and I won’t let you write to it any more. Have a nice day.”

Flash forward to June 19, just 24 hours after the desktop drive was rejected by Windows, and now the NAS has just disappeared.  I checked it to verify that it still had power, but beyond that it did nothing but sit there and blink.  Reboot and try again.  The NAS works, but very slowly.  I finally get the admin web page up to see what the matter might be.  To my surprise there are no alerts in its log of “very bad things” that happen on it when I’m not looking, but there is another page where I can see the raw details of each disk’s S.M.A.R.T. report.  This is where all the scary data on errors, retries and the like lives.  Imagine my surprise to find that one of my disks, labeled “2” by the NAS, has gone off the deep end with over 100K errors in less than a week’s time.  I shut the box down and pulled that drive, replacing it with the “spare” Samsung I had been using as my portable disk.  26 hours later the NAS is up and running and it seems happy again, but I’m not so sure.

Along with my panic and rage, I notice that a few of my machines are running slowly when connecting to each other on my network or talking to the Internet.  That’s odd, I say to myself, since I often have self-referential conversations; my network is all Gigabit Ethernet enabled save for a few older devices.  I check the gigabit switch and find that at least two of the ports are lit at 10/100 speeds.  The PCs confirm this and I sit puzzled.  It worked last week, I thought, but now it’s gone and slowed itself down for no reason?  A quick check of the Internets using some Google-fu and I have my answer: this Netgear 8-port switch, the GS608, has a history of dying slowly, taking one port at a time down to a crawl.  It just decided to make itself known to me while I’m fighting my hard disks.

Normally, three big failures at once is plenty, but since this is my life I had to make it more exciting.  I drove off to the RC airfield to fly my “reliable” airplane and the servo that controls the throttle went nuts.  It decided there were two settings: full on and off.  Stranger still, this has never happened on any airplane I’ve had before, and after I tinkered with the airplane and changed nothing, it “cured” itself.  Knowing the week I’d just had, I packed everything up in the truck and took my toys home.

I’m not sure what the moral of this story is supposed to be, other than that when I have bad luck in a portion of my world, it seems to happen in clumps.  I’ll certainly want to be extra careful the next time I get in something fast and dangerous to go somewhere… come to think of it, my car was just in the shop for a safety system malfunction.  Hmmm….