Hi,
Our MD1000 disk array, connected to a PowerEdge R710 via a PERC 6/E, goes offline under heavy read load, for example during backups. Here is the error:
10/03/17 6:41:10: mfiIsr: idr=00000020
10/03/17 6:41:10: Driver detected possible FW hang, halting FW.
10/03/17 6:41:10: Pending Command Details:
10/03/17 6:41:10: cmdId= a3: cmd=4, cmdStat=0, num_sg_elements=1, status=1 [PCI_COMMAND]
10/03/17 6:41:10: mfa=cf217b80, mf=a04cc000, mfSge=a04cc028, bytesTransferred=0, next ffff
10/03/17 6:41:10: startTime=0, lines=a17de230, lineMap=0, activeRecoveryCount=0, lockPromotedByRec=0
10/03/17 6:41:10: ldbbmAlreadyTried=0, ldbbmIssueWriteAsWV=0
10/03/17 6:41:10: hdr.length=100 opcode=1040500 mbox=00008f2b fe00ffff 00000000
10/03/17 6:41:10: Total Pending Commands = 1
[0]: fp=a00bef08, lr=a0c41918 - _MonTask+1a8
[1]: fp=a00bf130, lr=a0cc17e4 - mfiIdrIsr+124
[2]: fp=a00bf148, lr=e401e960 - dispatchIsrs+c4
[3]: fp=a00bf178, lr=e401e9f0 - external_IRQ+34
[4]: fp=a00bf190, lr=e401e074 - wrapper__External_IRQ+74
[5]: fp=a00bf1e0, lr=e401e928 - dispatchIsrs+8c
[6]: fp=a00bf1e8, lr=a0c428a8 - MonCheck+38
[7]: fp=a00bf278, lr=a0c536cc - raid_task+860
[8]: fp=a00bffa0, lr=a0cbd384 - _main+aa4
[9]: fp=a00bfff8, lr=fe001d58 - __start+ce0
MonTask: line 3622 in file ../../raid/1078dma.c
UIC_ER=10000ac:5500063, UIC_MSR=0:40, MSR=21000, sp=a00bef08
MegaMon> pciPowerMgmtInt: Requested Power State: D3
pciPowerMgmtInt: Requested Power State: D0
T0: LSI Logic ROC firmware
T0: Copyright (C) LSI Logic, 2004
T0: Firmware version 1.22.52-1909 built on Sep 21 2012 at 15:29:16
T0: pciInit: O_PCI_SERVICE = 00000005
T0: Initializing memory pool size=00300B24 bytes
T0: Press '!' within 3 seconds to enter debugger before INIT
T3: LogInit: Flushing events from previous boot
T3: EVT#36651-10/03/17 6:41:10: 15=Fatal firmware error: Driver detected possible FW hang, halting FW.
T3: EVT#36652-10/03/17 6:41:10: 15=Fatal firmware error: Line 3622 in ../../raid/1078dma.c
T3: EVT#36653-T3: 0=Firmware initialization started (PCI ID 0060/1000/1f0a/1028)
T3: EVT#36654-T3: 1=Firmware version 1.22.52-1909
T3: EepromInit: Family=33, SN=909344030000
T3: Delaying POST by 20 seconds...
The following errors also appear, but only during startup and never under normal operation:
10/03/17 9:31:23: EVT#36764-10/03/17 9:31:23: 113=Unexpected sense: Encl PD 11 Path 50026b92614ff40c, CDB: 1c 01 a0 00 04 00, Sense: 5/35/01
10/03/17 9:31:23: Raw Sense for PD 11: 70 00 05 00 00 00 00 0a 00 00 00 00 35 01 00 00 00 00
10/03/17 9:31:24: EVT#36765-10/03/17 9:31:24: 113=Unexpected sense: Encl PD 11 Path 50026b92614ff40c, CDB: 1c 01 a0 00 04 00, Sense: 5/35/01
10/03/17 9:31:24: Raw Sense for PD 11: 70 00 05 00 00 00 00 0a 00 00 00 00 35 01 00 00 00 00
10/03/17 9:31:24: EVT#36766-10/03/17 9:31:24: 113=Unexpected sense: Encl PD 21 Path 5001c232c961390c, CDB: 1c 01 a0 00 04 00, Sense: 5/35/01
10/03/17 9:31:24: Raw Sense for PD 21: 70 00 05 00 00 00 00 0a 00 00 00 00 35 01 00 00 00 00
10/03/17 9:31:25: EVT#36767-10/03/17 9:31:25: 113=Unexpected sense: Encl PD 21 Path 5001c232c961390c, CDB: 1c 01 a0 00 04 00, Sense: 5/35/01
10/03/17 9:31:25: Raw Sense for PD 21: 70 00 05 00 00 00 00 0a 00 00 00 00 35 01 00 00 00 00
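In case it helps anyone reading the raw sense lines above, here is a minimal Python sketch of how I understand the fixed-format sense bytes to break down: response code 0x70 in byte 0, sense key in the low nibble of byte 2, ASC in byte 12 and ASCQ in byte 13. The function name and the small lookup table are my own and only cover the one code seen in this log. As far as I can tell from the SCSI tables, 5/35/01 is ILLEGAL REQUEST with "unsupported enclosure function", and CDB opcode 1c is RECEIVE DIAGNOSTIC RESULTS, but please correct me if I am reading that wrong.

```python
# Hypothetical helper (not part of any Dell/LSI tool): decode a fixed-format
# SCSI sense buffer as printed in the "Raw Sense for PD ..." log lines.

# Only the values seen in this log; a real table would be much longer.
SENSE_KEY_NAMES = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ_NAMES = {(0x35, 0x01): "UNSUPPORTED ENCLOSURE FUNCTION"}


def decode_fixed_sense(hex_string):
    """Return (sense_key, asc, ascq) from a space-separated hex dump."""
    data = bytes.fromhex(hex_string)
    if len(data) < 14:
        raise ValueError("sense buffer too short to hold ASC/ASCQ")
    response_code = data[0] & 0x7F
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    sense_key = data[2] & 0x0F  # low nibble of byte 2
    asc = data[12]              # additional sense code
    ascq = data[13]             # additional sense code qualifier
    return sense_key, asc, ascq


def describe(hex_string):
    """Format the sense data the way the event log does, plus a name if known."""
    key, asc, ascq = decode_fixed_sense(hex_string)
    key_name = SENSE_KEY_NAMES.get(key, "unknown sense key")
    code_name = ASC_ASCQ_NAMES.get((asc, ascq), "unknown ASC/ASCQ")
    return f"{key:x}/{asc:02x}/{ascq:02x}: {key_name}, {code_name}"


if __name__ == "__main__":
    raw = "70 00 05 00 00 00 00 0a 00 00 00 00 35 01 00 00 00 00"
    print(describe(raw))
```

Running it on the PD 11 and PD 21 lines gives the same 5/35/01 that the EVT lines already report, so the raw bytes and the summary agree.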
I am a newbie, so I am unable to work out the root cause from these logs. Can someone please help me pinpoint it and suggest a possible solution?
Thank you,
-sul.