Channel: PowerEdge HDD/SCSI/RAID Forum - Recent Threads

Disk failures with no warning from the Dell SAS 6/iR RAID card


We have a PowerEdge T105 server, equipped until recently with a Dell SAS 6/iR adapter card in RAID 1. The OS is Windows Server 2008 x64 Standard.

We recently had two outages caused by hard disk failure. The first incident was characterized by intermittent server hangs that lasted from a few seconds to several minutes. Task Manager showed nothing unusual for the processor or memory. The server finally threw a BSOD (0x000000D1, lsi_sas.sys) and would not reboot. We have reliable backups. After a prolonged diagnostic phase that involved numerous server reboots, we concluded that one (or both) of the two disk drives was bad. The SAS 6/iR card showed no errors or warnings. We removed one disk from service, added a new drive, created the virtual disk, and then removed the remaining old drive from service. That left the virtual disk in a degraded state, running on what we believed to be a good drive. The hangs disappeared and we were able to reinstall the server from backups.

The system reinstallation was hampered by the fact that recent backups made with wbadmin (using the "-allCritical" parameter) were corrupt and would not boot. After some trial and error, we fell back on a system backup that was about two weeks old.

On the last step of data restoration (not involving wbadmin), the server started to hang again, but much more severely. We went through another diagnostic phase that included numerous reboots. During it, the RAID card either showed no errors or froze while trying to locate the virtual disk.

We switched to a spare SAS 6/iR card with the same results.

We determined that our new drive had crashed. We were able to salvage one of our original disks, which we connected as a plain SATA drive (no RAID) to rebuild the server and restore the data. Our server is now back in service without the SAS 6/iR RAID card.

FWIW, both defective disks happened to be Western Digital. According to the DOS-based WD Data Lifeguard Diagnostic (DLD) utility, the drive in the original installation had a READ ELEMENT FAILURE (Error/Status Code 0007). The new drive was so damaged that the latest version of DLD (5.25) crashed. An older version (5.04f) couldn't analyze the disk but displayed a table of values exceeding targets before it exited. IOW, both disks are confirmed bad.

Here are my questions:

  1. We had two defective hard disks with different problems. One disk was intermittent, the other quickly became unreadable. Neither of the two SAS 6/iR cards ever displayed an error or removed either disk from service. Why?

  2. What's the advantage of the Dell SAS 6/iR card if it can't reliably remove defective hard disks from service (even if it means shutting down the server when the virtual drive is already in a degraded state)?

  3. Can anyone report a case where the Dell SAS 6/iR card actually protected their server from a disk failure?

  4. Can someone recommend a better RAID card for the PowerEdge T105?

TIA for any help you can provide.

regards, AndyA

