
X7 TOI – Vail HBA

Engineered Systems / X86

February 16, 2016

Paul Lodrige
Engineered Systems / X86
Systems Quality Group
Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Confidential – Oracle Restricted
Agenda

• What is Vail
• What’s new for Vail
• Vail / HDD issue triage and debugging for the field
• References
• Case Study – LIVE!
• Vail in depth, if there is interest and time

What is Vail

• Based on the Gen 3 LSI SAS 9361-16i controller
• Uses the Gen 3 LSI 3316 I/O controller chip
• Supports x8 PCIe 3.0 at 8 GT/s per lane (same as before)
• Supports 16 individual SAS ports operating at 12 Gb/s
• Backward compatible with PCIe and SAS generations 1 and 2
• SAS data transfer rates of 12, 6, and 3 Gb/s per lane
• Onboard ESM (SuperCap); no PM required

What’s new for Vail?

Per LSI / Broadcom:

“There are not any significant changes from a debug / triage perspective between Aspen and Vail. The two main differences between these products are:

1) Vail adds 8 more SAS ports (alleviating the need for expanders)

2) FW jumps from MR 6.3 to MR 6.13. So there's a whole lot of defects that have been fixed, which should provide better reliability for the Vail product.”

(pf) And numerous RFEs have been implemented!

What’s new for Vail? (continued)
FW logs are now persistent across reboots and power cycles. This improves diagnosis by showing the activity leading up to fatal FW faults.

Reduced spam of repeating messages in the FW log file, which improves readability of the FW logs.

The controller goes into Write Through mode upon a drive failure when the HBA has pinned cache.

Support for different I/O request sizes, up to 1 MB per request.

Improved error handling. Previously, many faults would immediately stop the FW, forcing a power cycle. The controller is now reset via OCR (Online Controller Reset), allowing recovery from numerous faults.

HDD FW can now be upgraded on multiple HDDs at a time rather than sequentially, significantly decreasing overall upgrade time.

What’s new for Vail? (continued)

Improved SuperCap monitoring: additional parameters specific to SC behavior are logged.

StorCLI adds sanitize to the crypto erase function.

Additional events are logged to help distinguish cable vs. HDD errors.

Power throttling implemented to help with high-temperature situations.

Vail I/O Architecture – same as before

Vail Hints, Tips and References
Sample StorCLI Commands
• Controller termlogs
– storcli /c0 show termlog
• Controller configuration
– storcli /c0/v0 show all
• Events – NEW!
– /opt/MegaRAID/storcli/storcli64 /c0/eall/sall show errorcounters


Description = Show Drive/Cable Error Counters Succeeded.

Drive       Error counter for Drive   Error counter for Slot
/c0/e8/s0   0                         0
/c0/e8/s1   0                         0
/c0/e8/s2   0                         0
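
When many drives are involved, it can help to scan the errorcounters output programmatically. The following is a minimal sketch, assuming the output looks like the sample above (one row per drive: path, drive counter, slot counter). The function names `parse_error_counters` and `drives_with_errors` are hypothetical helpers, not part of StorCLI, and the real output layout may differ between StorCLI versions.

```python
def parse_error_counters(text):
    """Return {drive_path: (drive_counter, slot_counter)} from errorcounters output.

    Assumes data rows have exactly three whitespace-separated fields,
    e.g. "/c0/e8/s0 0 0"; every other line is ignored.
    """
    counters = {}
    for raw in text.splitlines():
        fields = raw.split()
        if (len(fields) == 3
                and fields[0].startswith("/c")
                and fields[1].isdigit()
                and fields[2].isdigit()):
            counters[fields[0]] = (int(fields[1]), int(fields[2]))
    return counters


def drives_with_errors(counters):
    """Return the drive paths whose drive or slot counter is nonzero."""
    return sorted(path for path, (drive, slot) in counters.items() if drive or slot)


# Sample text copied from the slide output above.
SAMPLE = """\
Description = Show Drive/Cable Error Counters Succeeded.

Drive       Error counter for Drive   Error counter for Slot
/c0/e8/s0   0                         0
/c0/e8/s1   0                         0
/c0/e8/s2   0                         0
"""

counters = parse_error_counters(SAMPLE)
print(counters)
print(drives_with_errors(counters))
```

In practice the text would come from the saved output of the errorcounters command rather than an inline string.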
SAS 1/2/3 LogInfo Decoding
[Snippet of FW termlog]
10/05/13 11:14:33: isForeignCfgComplete: MR_CFG - totAr: 0x1, totLd: 0x1, totSpare: 0x0
10/05/13 11:27:51: Disabling UART for 120s due to IDR on devH c
10/05/13 11:27:51: iopiEvent: EVENT_SAS_DEVICE_STATUS_CHANGE
10/05/13 11:27:51: DM_HandleDevStatusChgEvent: devHandle=x000c SASAdd=4433221102000000 TaskTag=xffff
ASC=x00 ASCQ=x00 IOCLogInfo x31110d00 IOCStatus x8000 ReasonCode x08 - INTERNAL_DEVICE_RESET

The IOCLogInfo field of the Reply message includes the following subfields.
•  [31:28] – MPI2_IOCLOGINFO_TYPE_SAS (3)
•  [27:24] – IOC_LOGINFO_ORIGINATOR: 0 = IOP, 1 = PL, 2 = IR
•  [23:16] – LOGINFO_CODE
•  [15:0] – LOGINFO_CODE Specific

IOCLogInfo 0x31110d00

3 ; [31:28] MPI2_IOCLOGINFO_TYPE_SAS
1 ; [27:24] originator = PL (protocol layer): the error was generated by the controller layer below the FW, i.e. on the chip
11 ; [23:16] LogInfo code = PL_LOGINFO_CODE_RESET
0d00 ; [15:0] subcode = PL_LOGINFO_SUB_CODE_SATA_LINK_DOWN: SATA direct-attached link went down
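
The bit-field split described above can be sketched in a few lines of Python. This is an illustration only: the helper names `find_loginfo` and `decode_loginfo` are hypothetical, the regular expression assumes the value appears as `IOCLogInfo x…` or `IOCLogInfo 0x…` as in the termlog snippet, and the lookup tables contain only the two PL values decoded on this slide (the SAS 3 error-code note lists the rest).

```python
import re

# Originator values as listed on the slide: 0 = IOP, 1 = PL, 2 = IR.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}

# Only the PL code and subcode decoded on this slide; deliberately incomplete.
PL_CODES = {0x11: "PL_LOGINFO_CODE_RESET"}
PL_SUB_CODES = {0x0D00: "PL_LOGINFO_SUB_CODE_SATA_LINK_DOWN"}

# Matches "IOCLogInfo x31110d00" as well as "IOCLogInfo 0x31110d00".
LOGINFO_RE = re.compile(r"IOCLogInfo\s+0?x([0-9a-fA-F]{1,8})")


def find_loginfo(line):
    """Return the IOCLogInfo value found in a termlog line, or None."""
    match = LOGINFO_RE.search(line)
    return int(match.group(1), 16) if match else None


def decode_loginfo(value):
    """Split a 32-bit IOCLogInfo value into the four subfields."""
    originator = (value >> 24) & 0xF      # bits [27:24]
    code = (value >> 16) & 0xFF           # bits [23:16]
    sub_code = value & 0xFFFF             # bits [15:0]
    is_pl = originator == 1               # name tables below cover PL only
    return {
        "type": (value >> 28) & 0xF,      # bits [31:28], 3 = SAS
        "originator": ORIGINATORS.get(originator, "unknown"),
        "code": code,
        "code_name": PL_CODES.get(code, "unknown") if is_pl else "unknown",
        "sub_code": sub_code,
        "sub_code_name": PL_SUB_CODES.get(sub_code, "unknown") if is_pl else "unknown",
    }


# Second half of the wrapped termlog line shown in the snippet above.
line = ("ASC=x00 ASCQ=x00 IOCLogInfo x31110d00 IOCStatus x8000 "
        "ReasonCode x08 - INTERNAL_DEVICE_RESET")
value = find_loginfo(line)
decoded = decode_loginfo(value)
print(hex(value), decoded)
```

Unknown codes are reported as "unknown" rather than guessed, so the tables can be extended gradually from the reference documents.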

Vail Reference Guides

You can find these materials in Paul’s Useful Stuff workspace


• MRDiag – MegaRaid Diagnostic Tool Users Guide
• STORCLI – 12Gb/s MegaRAID® SAS Software Users Guide
• Drivers – MegaRaid SAS Device Driver Users Guide
• Fusion-MPT – Fusion-MPT™ 2.5 Message Passing Interface (MPI) Spec. Guide
• Firmware Guide – LSISAS MegaRAID® Firmware Functional Specification Guide
• SAS 3 Error Codes – SAS Generation 3 Error Codes Systems Engineering Note
• SuperCap Events – SuperCap Events doc

Vail – What’s missing?

• parser.sh: needs SAS3- and HDD-specific updates


• Case study
• Send me your comments !

