I'm wondering if anyone else has experienced this issue or something similar:
We have about 100 T310's in production running a mirrored RAID configuration. They are Hyper-V Server 2008 R2 hosts (not 2012, of course, because this RAID "Controller" does not support Server 2012 and never will, according to a Dell rep with whom I spoke). This is the software RAID built into the BIOS (PERC S100, I believe) for Windows; I'm not sure whether it works with other operating systems. I'm definitely no expert on this particular RAID "Controller", but I am having a bad experience either with it or with the hard drives that Dell supplies. Now, before anyone gets into the "you get what you pay for" speech, I am quite aware of that, especially at this moment.
So here's the situation we're facing. We have 2 servers at each of our branch locations, and each server has 2 drives. That's a total of about 200 hard drives. Currently, 50 of those 200 drives have failed. Normally, I would not fret. The whole purpose of RAID is to provide redundancy and fault tolerance when one (or more) drives fail. So... I should be able to just shut down the server, pop another drive in, hop on Dell Server Admin and rebuild the RAID, right? NOPE! Here are the results of the last 6 attempts to service machines with degraded disks:
3 servers never booted back up after being shut down (both physical disks in mirror in a failed state)
3 servers booted back up but RAID rebuild failed due to too many errors on the remaining disk
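To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes drives fail independently of each other, which is purely an assumption for illustration, and it uses the approximate counts above (200 drives, 50 failed, 2 drives per mirror). The function name and the dictionary keys are made up for this example.

```python
def mirror_failure_estimate(total_drives, failed_drives, drives_per_server=2):
    """Estimate mirror failure odds, ASSUMING independent drive failures."""
    p = failed_drives / total_drives            # fraction of drives that have failed
    servers = total_drives // drives_per_server
    p_both = p ** drives_per_server             # every drive in a mirror failed
    p_any = 1 - (1 - p) ** drives_per_server    # at least one drive in a mirror failed
    return {
        "drive_failure_fraction": p,
        "servers": servers,
        "p_both_failed": p_both,
        "expected_servers_both_failed": servers * p_both,
        # Among servers with at least one bad drive, share with both drives bad
        "p_both_given_any": p_both / p_any if p_any else 0.0,
    }


est = mirror_failure_estimate(200, 50)
for name, value in est.items():
    print(f"{name}: {value}")
```

Under that independence assumption, only about 1 in 7 servers with a bad drive should have both drives bad. Seeing it in 3 of the last 6 is a small sample, but it hints that the failures are not independent (same drive batches, same environment, or a mirror that had quietly been degraded for a long time before anyone noticed).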
Now, these servers aren't THAT old. In fact, some of the ones that have degraded disks have only been in production for a little over a year. It doesn't seem to matter whether or not there is dust in the computer; we've had total failure on machines with ZERO dust on the inside or in the drives. The drives' vendor also does not seem to matter as some are Seagate and some are Western Digital. Now, I'm not just talking about disks that got corrupted and could be reformatted. I mean that they actually stop spinning (think click of death most of the time). Embarrassingly enough I will admit that there were a few cases in which I was able to smack some sense into them (literally - yes I actually physically struck the drives against a hard surface and they started spinning again, at least enough to recover the virtual machines).
Now I've gone down the avenue of trying to set up alerts through Server Admin, but I noticed that, in our case, the log events often do not indicate the drive failure and don't even seem to consistently detect the degraded virtual disk state (in fact, a failed disk is often reported as being in a 'ready' state, seemingly thrown out of the RAID like a disowned child). That means that I cannot rely on that as a solution. But, hey, that's all beside the point... We've decided to replace the drives in each server with a single solid-state drive. The two-server pair at each location provides redundancy anyway, and at the locations where both servers have degraded virtual disks, we have been replacing the servers to prevent total failure. No RAID means that we can at least move forward with Server 2012 and get a couple more years of life out of our T310's.
I will admit, and I probably shouldn't, that we have absolutely not been diligent in keeping the firmware up to date (yes, SHAME ON US, I know...)
Anyhow, enough with my sob story and my sharing of "expert data forensic techniques"... I am just curious if anyone else has had any negative experiences similar to mine. I might also note that we have been having drive failure issues with a lot of desktop models lately which use the same drives that come with the T310's. I also thought it interesting that we have a couple of T110's out there which are both much older than the T310's and have never had a drive failure (keeping my fingers crossed). It's just a shame that we have some desktops (Dimension 3000's) that have been running for 7+ years with their original hard drives and we typically don't get more than 1-2 (MAYBE 3) years with the stuff they've been giving us lately. They just don't make 'em like they used to I guess! Anyway, thanks in advance for your time!