
FRU Replacement Guide

HGST Active Archive System SA-70xx, SA-10xx


with EasiScale™ Software, v 4.2.1
Publication 1ET0036 | v1.4.7 | June 2017

Long Live Data ™ | www.hgst.com



Copyright
Notice

Publication Information
One MB is equal to one million bytes, one GB is equal to one billion bytes, one TB equals
1,000GB (one trillion bytes) and one PB equals 1,000TB when referring to storage capacity.
Usable capacity will vary from the raw capacity due to object storage methodologies and
other factors.
The following paragraph does not apply to any jurisdiction where such provisions are
inconsistent with local law: THIS PUBLICATION IS PROVIDED "AS IS" WITHOUT
WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR
A PARTICULAR PURPOSE.
This publication could include technical inaccuracies or typographical errors. Changes
are periodically made to the information herein; these changes will be incorporated in
new editions of the publication. There may be improvements or changes in any products
or programs described in this publication at any time. It is possible that this publication
may contain reference to, or information about, HGST products (machines and programs),
programming, or services that are not announced in your country. Such references or
information must not be construed to mean that Western Digital Corporation or its affiliates
intends to announce such HGST products, programming, or services in your country.
Technical information about this product is available by contacting your local HGST
product representative or on the Internet at: support.hgst.com.
Western Digital Corporation or its affiliates may have patents or pending patent
applications covering subject matter in this document. The furnishing of this document does
not give you any license to these patents.

© 2016-2017 Western Digital Corporation or its affiliates.

Long Live Data, EasiScale, and the HGST logo are registered trademarks or
trademarks of Western Digital Corporation or its affiliates in the U.S. and/
or other countries. Amazon S3, Amazon Simple Storage Services, and Amazon AWS S3 are
trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.
Other trademarks are the property of their respective owners. References in this publication
to HGST-branded products, programs, or services do not imply that they will be made
available in all countries. Product specifications provided are sample specifications and
do not constitute a warranty. Actual specifications for unique part numbers may vary.
Please visit the Support section of our website, www.hgst.com/support/systems-support, for
additional information on product specifications. Photographs may show design models.
References in this publication to HGST-branded products, programs or services do not
imply that they are to be available in all countries in which HGST operates.


Preface
Notice

Topics:

• Document Conventions
• Related Documents
• Points of Contact

Document Conventions
Typography

Element                                     Sample Notation
Linux commands or user input                rm -rf /tmp
Linux system output                         Installation successful!
Commands longer than one line are           s3cmd \
split with "\"                                --dump-config
User-supplied values                        datacenterID or <datacenterID>
File and directory names                    The file aFile.txt is stored in /home/user.
Graphical user interface (GUI) elements     Click OK.
Keyboard keys and sequences                 To cancel the operation, press Ctrl+c.
GUI menu navigation                         Click Configuration > System.

Storage Notations

Convention              Prefix                               Examples                      Usage
xB (base 10 notation)   SI prefixes (kilo, mega, giga,       1GB = 1 gigabyte =            Disk sizes
                        tera, peta, exa, zetta, yotta)       1,000,000,000 bytes
                                                             1TB = 1 terabyte =
                                                             1,000,000,000,000 bytes
xiB (base 2 notation)   Binary prefixes (kibi, mebi, gibi,   1GiB = 1 gibibyte =           Storage space, and sizes of
                        tebi, pebi, exbi, zebi, yobi)        1,073,741,824 bytes           partitions or file systems
                                                             1TiB = 1 tebibyte =
                                                             1,099,511,627,776 bytes

• This document uses a comma (",") for digit grouping; for example, 1,000 is one thousand.
• This document uses a period (".") as a decimal mark; for example, 12.5 %.
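
For a quick conversion between the two notations above, a one-liner such as the following can be used (this is an illustrative sketch only; the values are plain arithmetic, not product specifications):

python -c "print(1000**4 / float(1024**4))"    # 1 TB expressed in TiB, approximately 0.9095
python -c "print(1024**4 / float(1000**4))"    # 1 TiB expressed in TB, approximately 1.0995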

Admonitions

Type        Usage
Note:       Indicates extra information that has no specific hazardous or damaging consequences.
Tip:        Indicates a faster or more efficient way to do something.
Caution:    Indicates an action that, if taken or avoided, may result in hazardous or damaging consequences.
Warning:    Indicates an action that, if taken or avoided, may result in data loss or unavailability.

Related Documents

Title                       Description
ActiveScale CM User Guide   Usage instructions for ActiveScale™ Cloud Management (CM)

Points of Contact
Contact HGST Support with your rack serial number or deployment ID.

Email www.hgst.com/support/systems-support
Phone 1-844-717-7766 or 1-408-717-7766
Website support.hgst.com



About This Document


Notice

Topics:

• Weight

This guide provides instructions for replacing hardware components of the HGST Active Archive System.

Weight
Rack:
The following table displays the weight of the Active Archive System:

Table 1: Active Archive System Weight

Hardware                 Weight
Active Archive System    Minimum Sled Configuration (one sled column): 1,566 lbs./710 kg.
                         Full Sled Configuration: 2,249 lbs./1,020 kg.

Note: The weights listed above are the total unpacked weights after delivery.

Controller (SM 1028U-TR4T+):
The following table displays the weight of the Controller:

Table 2: Active Archive System Weight

Hardware      Weight
Controller    Net weight is 26 lbs.
              Gross weight is 41 lbs.

Note: The gross weight of the Controller is based on the combined weight of the server, accessories kit,
rail kit, and packaging.

Storage (SM 1018R-WC0R):
The following table displays the weight of the Storage server:

Table 3: Active Archive System Weight

Hardware          Weight
Storage server    Net weight is 25 lbs.
                  Gross weight is 40 lbs.

Note: The gross weight of the Storage server is based on the combined weight of the server, accessories kit,
rail kit, and packaging.


Contents

List of Figures..................................................................................................................................................... 8

List of Tables.....................................................................................................................................................12

Chapter 1 Controller Node Replaceable Units................................................................................................. 13


1.1 Warnings........................................................................................................................................... 13
1.2 Chassis Replacement Procedure.......................................................................................................13
1.3 Hard Disk Drive Replacement Procedure........................................................................................30
1.4 Solid State Disk Replacement Procedure........................................................................................ 34
1.5 Power Supply Unit Replacement Procedure....................................................................................38
1.6 SFP+ DAC Cable Replacement Procedure......................................................................................39

Chapter 2 Storage Node Replaceable Units......................................................................................................40


2.1 Warnings........................................................................................................................................... 40
2.2 Chassis Replacement Procedure.......................................................................................................40
2.3 Hard Disk Drive Replacement Procedure........................................................................................56
2.4 Power Supply Unit Replacement Procedure....................................................................................59
2.5 MiniSAS Cable Replacement Procedure......................................................................................... 60

Chapter 3 Storage Interconnect Replaceable Units.......................................................................................... 61


3.1 Warnings........................................................................................................................................... 61
3.2 Storage Interconnect Replacement Procedure..................................................................................61
3.3 Fan Replacement Procedure.............................................................................................................66
3.4 Power Supply Unit Replacement Procedure....................................................................................67
3.5 SFP+ 1G Module Replacement Procedure...................................................................................... 68
3.6 SFP+ DAC Cable Replacement Procedure......................................................................................69

Chapter 4 Power Distribution Unit Replaceable Units.....................................................................................70


4.1 Warnings........................................................................................................................................... 70
4.2 Power Distribution Unit Replacement Procedure............................................................................ 70

Chapter 5 Storage Enclosure Basic Field Replaceable Units........................................................................... 79


5.1 Visual Indicator and Field Replaceable Units Locations.................................................................79
5.2 Sled Replacement Procedure............................................................................................................82
5.2.1 Hard Disk Drive Replacement Procedure.......................................................................... 90
5.3 Power Cord Replacement Procedure............................................................................................. 103
5.4 MiniSAS Cable Replacement Procedure....................................................................................... 104
5.5 Rear Fan Replacement Procedure.................................................................................................. 106
5.6 Power Supply Unit Replacement Procedure.................................................................................. 107
5.7 I/O Canister Replacement Procedure............................................................................................. 110
5.8 Storage Enclosure Basic Capacity Upgrades................................................................................. 116

Appendix A Troubleshooting.......................................................................................................................... 130


A.1 General........................................................................................................................................... 130


A.2 Marking a MetaStore as Read-Only............................................................................................. 131


A.3 Marking a MetaStore as Read/Write.............................................................................................131
A.4 Invoking the Repair Process......................................................................................................... 131
A.5 Recovering From a Failed Disk Initialization...............................................................................132
A.6 Rolling Back a Capacity Upgrade.................................................................................................133
A.7 Setting Serial Number to Rack Name...........................................................................................134
Active Archive System Glossary................................................................................................................... 135
H...........................................................................................................................................................135
S........................................................................................................................................................... 135
V...........................................................................................................................................................135


List of Figures

Figure 1: Overview of Controller Node Chassis Replacement.........................................................................14

Figure 2: The Shutdown Button in the Commands Pane................................................................................. 16

Figure 3: Controller Node, Back.......................................................................................................................17

Figure 4: Controller Node, Front...................................................................................................................... 17

Figure 5: Controller Node, Back.......................................................................................................................18

Figure 6: Uninitialized Nodes........................................................................................................................... 19

Figure 7: The New Chassis Appears Under the FAILED List in the CMC..................................................... 21

Figure 8: The Old IPMI IP Address of the Node.............................................................................................24

Figure 9: Decommissioned Disks in the CMC.................................................................................................30

Figure 10: Decommissioned Disk Details from the CMC............................................................................... 31

Figure 11: Controller Node, Front.................................................................................................................... 31

Figure 12: Removing a Drive Carrier...............................................................................................................32

Figure 13: Decommissioned Disks in the CMC...............................................................................................34

Figure 14: Decommissioned Disk Details from the CMC............................................................................... 35

Figure 15: Controller Node, Front.................................................................................................................... 35

Figure 16: Removing a Drive Carrier...............................................................................................................36

Figure 17: Controller Node, Back, with PSU Status LEDs Highlighted..........................................................38

Figure 18: Controller Node SFP+ Ports........................................................................................................... 39

Figure 19: Overview of Storage Node Chassis Replacement...........................................................................41

Figure 20: A Storage Node Pane in the CMC..................................................................................................42

Figure 21: The Shutdown Button in the Commands Pane............................................................................... 42

Figure 22: Storage Node, Back.........................................................................................................................43

Figure 23: Storage Node, Front........................................................................................................................ 44

Figure 24: Storage Node, Back.........................................................................................................................44

Figure 25: Uninitialized Nodes......................................................................................................................... 45

Figure 26: The New Chassis Appears Under the FAILED List in the CMC................................................... 47

Figure 27: The Old IPMI IP Address of the Node...........................................................................................50


Figure 28: The Node with a Rebooted Chassis as Seen on the CMC.............................................................. 52

Figure 29: Storage Node Status in the CMC....................................................................................................54

Figure 30: Decommissioned Disks in the CMC...............................................................................................56

Figure 31: Drive Map Showing a Decommissioned Drive on a Storage Node................................................56

Figure 32: Storage Node, Front........................................................................................................................ 57

Figure 33: Removing a Drive Carrier...............................................................................................................57

Figure 34: Storage Node, Back, with PSU Status LEDs Highlighted.............................................................. 59

Figure 35: A Single-Rack Active Archive System...........................................................................................63

Figure 36: Storage Interconnect Port Reservations.......................................................................................... 65

Figure 37: Signal Cabling Scheme................................................................................................................... 65

Figure 38: A Single-Rack Active Archive System...........................................................................................66

Figure 39: A Single-Rack Active Archive System...........................................................................................67

Figure 40: A Single-Rack Active Archive System...........................................................................................68

Figure 41: Controller Node SFP+ Ports........................................................................................................... 69

Figure 42: Windows 7 Java Control Panel....................................................................................................... 71

Figure 43: Windows 7 Java Control Panel: Exception Site List...................................................................... 72

Figure 44: Windows 7 Java Control Panel: Security Warning.........................................................................72

Figure 45: Windows 7 Java Control Panel: Confirmation............................................................................... 73

Figure 46: PDU Login Dialog.......................................................................................................................... 73

Figure 47: PDU Main Menu............................................................................................................................. 74

Figure 48: PDU Status Panel............................................................................................................................ 75

Figure 49: PDU Main Menu............................................................................................................................. 75

Figure 50: PDU Configuration Panel................................................................................................................76

Figure 51: ON Delay Values for PDU 2 (Upper Horizontal PDU)..................................................................77

Figure 52: ON Delay Values for PDU 1 (Lower Horizontal PDU)................................................................. 77

Figure 53: System Enclosure Information........................................................................................................ 80

Figure 54: Rear Fan Order................................................................................................................................81

Figure 55: Sled HDD Order..............................................................................................................................81

Figure 56: A Storage Node Pane in the CMC..................................................................................................82


Figure 57: The Shutdown Button in the Commands Pane............................................................................... 82

Figure 58: Removing the Power Cords............................................................................................................ 83

Figure 59: Removing the MiniSAS Cables...................................................................................................... 83

Figure 60: Unlocking the I/O Module.............................................................................................................. 84

Figure 61: Removing the I/O Module.............................................................................................................. 85

Figure 62: Sled Release Button........................................................................................................................ 86

Figure 63: Sled Release at 45 Degrees.............................................................................................................86

Figure 64: Removing the Sled.......................................................................................................................... 87

Figure 65: Sled Cover....................................................................................................................................... 88

Figure 66: Hard Disk Drive Carrier Buttons.................................................................................................... 89

Figure 67: Removing the Hard Disk Drive with Carrier..................................................................................89

Figure 68: Decommissioned Disks in the CMC...............................................................................................91

Figure 69: Decommissioned Disk Details from the CMC............................................................................... 91

Figure 70: A Storage Node Pane in the CMC..................................................................................................93

Figure 71: The Shutdown Button in the Commands Pane............................................................................... 93

Figure 72: Removing the Power Cords............................................................................................................ 94

Figure 73: Removing the MiniSAS Cables...................................................................................................... 95

Figure 74: Unlocking the I/O Canister............................................................................................................. 96

Figure 75: Removing the I/O Canister............................................................................................................. 96

Figure 76: Sled HDD Order..............................................................................................................................97

Figure 77: Sled Release Button........................................................................................................................ 98

Figure 78: Sled Release at 45 Degrees.............................................................................................................98

Figure 79: Removing the Sled.......................................................................................................................... 99

Figure 80: Sled Cover..................................................................................................................................... 100

Figure 81: Hard Disk Drive Carrier Buttons.................................................................................................. 101

Figure 82: Removing the Hard Disk Drive with Carrier................................................................................101

Figure 83: Replacing the Failed Power Cord................................................................................................. 103

Figure 84: Storage Node MiniSAS Ports........................................................................................................104

Figure 85: Removing the MiniSAS Cables.................................................................................................... 105


Figure 86: Fan Release Button........................................................................................................................106

Figure 87: Rear Fan.........................................................................................................................................106

Figure 88: A Storage Node Pane in the CMC................................................................................................107

Figure 89: The Shutdown Button in the Commands Pane............................................................................. 107

Figure 90: Removing the Power Cord............................................................................................................ 108

Figure 91: Removing the Power Supply Unit................................................................................................ 109

Figure 92: A Storage Node Pane in the CMC................................................................................................110

Figure 93: The Shutdown Button in the Commands Pane............................................................................. 110

Figure 94: Removing the MiniSAS Cables.................................................................................................... 111

Figure 95: Removing the Power Cords.......................................................................................................... 112

Figure 96: Latch Handle Identification...........................................................................................................113

Figure 97: Latch Handle Clear of Rack Ear...................................................................................................114

Figure 98: Removing the I/O Canister............................................................................................................115

Figure 99: A Storage Node Pane in the CMC................................................................................................122

Figure 100: The Shutdown Button in the Commands Pane........................................................................... 122

Figure 101: Removing the Power Cords........................................................................................................ 123

Figure 102: Removing the MiniSAS HD Cables........................................................................................... 123

Figure 103: I/O Canister Handle.....................................................................................................................124

Figure 104: Unlocking the I/O Module.......................................................................................................... 124

Figure 105: Removing the I/O Module.......................................................................................................... 125

Figure 106: Sled Release Button.................................................................................................................... 125

Figure 107: Removing the Sled...................................................................................................................... 126


List of Tables

Table 1: Active Archive System Weight............................................................................................................ 5

Table 2: Active Archive System Weight............................................................................................................ 5

Table 3: Active Archive System Weight............................................................................................................ 5

Table 4: Work Table with Sample MAC Addresses and Serial Bus Paths...................................................... 20

Table 5: Work Table with Sample Ethernet Port Names and NIC Array IDs..................................................23

Table 6: Work Table for Controller Node Chassis Replacement..................................................................... 28

Table 7: Work Table with Sample MAC Addresses and Serial Bus Paths...................................................... 46

Table 8: Work Table with Sample Ethernet Port Names and NIC Array IDs..................................................49

Table 9: Work Table for Storage Node Chassis Replacement......................................................................... 54

Table 10: Utilization of Old Disks When New Disks Are Added................................................................. 116

Table 11: Durability with Various Capacity Upgrades.................................................................................. 117

Table 12: Part Numbers and Descriptions...................................................................................................... 117

Table 13: Work Table for Storage Enclosure Basic Capacity Upgrades........................................................119

Table 14: Troubleshooting Prepare Mode Output.......................................................................................... 121

Table 15: Troubleshooting Upgrade Mode Output.........................................................................................127

Table 16: Troubleshooting Finalize Mode Output..........................................................................................128

Table 17: Troubleshooting Finalize Mode Output..........................................................................................134


Chapter 1 Controller Node Replaceable Units

Topics:

• Warnings
• Chassis Replacement Procedure
• Hard Disk Drive Replacement Procedure
• Solid State Disk Replacement Procedure
• Power Supply Unit Replacement Procedure
• SFP+ DAC Cable Replacement Procedure

This section provides replacement procedures for the following parts in a Controller Node:
• Chassis
• HDD
• SSD
• PSU
• SFP+ DAC Cable

1.1 Warnings
Caution: Opening or removing the system cover when the system is powered on may expose you to a
risk of electric shock.
When replacing items from the inside of the chassis, ensure that you take precautions to prevent
electrostatic discharge (ESD).

Warning: Replace only one disk at a time on the Management Node.

1.2 Chassis Replacement Procedure


The Controller Node chassis is a SuperMicro DP 1U Server, 1028. Replacing the chassis replaces its NICs, CPU,
memory, motherboard, and fans, but not its disks.
Prerequisites
• Obtain a replacement Controller Node chassis from HGST.
• Obtain the virtual IP address of the Management Node.
• Obtain the IP addresses of the other (non failed) Controller Nodes.
• Obtain the admin password for the CMC.
• Obtain the root password.
• Fill in as much of the work table as possible before starting this procedure.
Required Tools
• Ladder
• Long Phillips-head screwdriver


Time Estimate: 3 hours.


Figure 1: Overview of Controller Node Chassis Replacement


A work table is provided at the end of this section for your convenience, to store all of the information needed for a
chassis replacement.
To replace a Controller Node chassis, proceed as follows:
1. If the failed node is the Management Node, fail over the Management Node to another Controller Node.
a) Open an SSH session to any Controller Node.
You must obtain the IP addresses of the Controller Node ahead of time.
b) Use the following command to determine the virtual IP address of the Management Node.
grep dmachine.amplistor.com /etc/hosts | grep -v 127.0.0.1 | awk '{print $1}'

The output of this command is the virtual IP address of the Management Node. For example,

172.16.63.154

c) Open an SSH session to the Management Node using the virtual IP address obtained in the previous substep.
d) Exit the OSMI menu.
The Linux prompt appears.
e) Copy or write down the hostname in the Linux prompt.
f) Log into the CMC.
g) Navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes, and select the failed
Controller Node.
h) Compare the hostname of the failed node, as displayed in the CMC, to the hostname you saved from substep e.
i) If the failed node is the Management Node, fail over the Management Node to another Controller Node.
For instructions on how to fail over the Management Node, see Managing Hardware in the HGST Active
Archive System Administration Guide.
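
If you prefer not to read the hostname from the shell prompt in substep e, it can also be printed explicitly at the Linux prompt; this optional check is a sketch, not part of the formal procedure:

hostname

The output is the node's hostname, for example HGST-Alpha02-DC01-R02-CN01 (the sample hostname used later in this procedure).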
2. Enable the location LED on the Controller Node.


a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes.
b) Select the correct node.
c) In the Commands pane, click Location LED On.
A blue LED on its front and back panels is now blinking.
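
If the CMC is temporarily unreachable, the identify LED can usually also be driven over IPMI from any host that can reach the node's BMC. This is an optional sketch only; it assumes ipmitool is available, and IPMI_IP, USER, and PASSWORD are placeholders for your own IPMI address and credentials:

# Blink the chassis identify LED for 255 seconds (placeholders: IPMI_IP, USER, PASSWORD).
ipmitool -I lanplus -H IPMI_IP -U USER -P PASSWORD chassis identify 255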
3. Shut down the Controller Node from the CMC.

Note: Save the node's hostname in your worktable under Original Hostname of Node.

a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes.
b) Select the desired Controller Node.


c) In the Commands pane, click Shutdown.


Figure 2: The Shutdown Button in the Commands Pane

d) Wait for the Status field to change to DONE.


4. Go to the rack and identify the correct chassis by the blinking blue LED on its front and back panels.
5. Remove the failed chassis from the rack.


a) At the front of the chassis, loosen the rack mounting screws.
b) At the back of the chassis, disconnect the five network cables connected to the ports labeled as follows in the
image below.

Important: Pull very gently on the pull tabs of the SFP+ cables, otherwise they might break.

Note: Check that the cables are labeled correctly, so that you can put them back in the same
order.

i. N1 (remove the SFP+ optical transceiver attached to the cable)


ii. N2
iii. N3
iv. N4 (remove the SFP+ optical transceiver attached to the cable)


v. M2
Figure 3: Controller Node, Back

c) At the back of the chassis, disconnect the two power cords.


In the image above, the power cords are connected to the PSUs labeled P1 and P2.
d) At the back of the chassis, depress both rail slides inwards, and push the chassis towards the front of the rack,
until the rails pass the safety catch (about 3"/7.6cm).
e) At the front of the chassis, slowly slide the chassis out until you reach the pull-safety at the midway point (you
will hear a soft clicking sound, and feel the chassis "catch" on the rails).
f) Disengage the pull-safety on both sides of the chassis and slide it out until the split line of the two top covers.
Push the pull-safety on one side up, and the pull-safety on the other side down.
g) Continue to slowly slide the chassis out until you reach the pull-safety at the end point, and disengage it as you
did the earlier one.
h) Safely unmount the chassis from the rack and place it on a table.

Caution: A Controller Node chassis weighs about 50lbs. Ensure that you have sufficient
manpower to handle it safely.

Warning: Once you pull the chassis past the pull-safety, do not leave it hanging in the rack.
Otherwise, the rack rails may be damaged permanently.

6. Move the two HDDs and the four SSDs from the failed chassis to the exact corresponding slots in the new chassis.

Tip: Write down the disk serial number and slot location so that you can double-check that each
disk is seated in the correct slot post installation into the new chassis.

a) Remove each disk from its slot in the front bay of the failed chassis.
b) Install the disk into the corresponding slot in the new chassis.
Figure 4: Controller Node, Front


7. Install the new chassis into the rack.


a) Mount the new chassis onto the rack slides and slide it into the rack.

Caution: Mounting the chassis is a two person task.


b) Tighten the rack mounting screws to secure the chassis to the rack.
c) Reconnect the five network cables to the chassis ports.
The network cables are labeled.
• Connect the cable labeled CNx.N1.SW1.Nxx (with SFP+ optical transceiver attached) to the port labeled
N1 in the image below.
• Connect the cable labeled CNx.N2.SW1.Nxx to the port labeled N2 in the image below.
• Connect the cable labeled CNx.N3.SW2.Nxx to the port labeled N3 in the image below.
• Connect the cable labeled CNx.N4.SW1.Nxx (with SFP+ optical transceiver attached) to the port labeled
N4 in the image below.
• Connect the cable labeled CNx.M2.SW1.Nxx to the port labeled M2 in the image below.
Figure 5: Controller Node, Back

d) Reconnect the power cords.


8. Get the MAC address of the new chassis IPMI NIC from the BIOS.
a) Connect a VGA monitor and USB keyboard to the new chassis.
b) Power on the new chassis.
The power button is located on the chassis front control panel.
c) At power up, press Del to enter into the system BIOS.
d) In the system BIOS, navigate to IPMI > BMC Network Configuration.
e) Record Station MAC Address in your work table under IPMI MAC Address of Node.
f) Exit the BIOS without saving any changes by pressing ESC.
The boot process continues.
g) Disconnect the VGA monitor and USB keyboard from the new chassis.
9. Get the IP address and machine name (hostname) of the new chassis.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Unmanaged Devices >
Uninitialized.
The new chassis appears in the list of uninitialized devices. This indicates that it has started successfully.
b) Write the value of Name into your work table, under Temporary Hostname of Node.
c) Write the value of Name without the PM- prefix into your work table, under MAC Address of Node.


d) Write the IP address into your work table, under Temporary IP Address of Node.
Figure 6: Uninitialized Nodes

10. Get the bus-location-to-MAC-address mapping of the new chassis.


a) From the Management Node, open an SSH session to the new IP address of the Controller Node obtained in
the previous step.
Log in with username root and password rooter.
b) At the Linux prompt, run the following command:

for add in `ls /sys/devices/pci*/*/*/net/*/address`; do echo $add; cat $add; done

The output of this command is similar to the example below.

/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/net/eth0/address
00:25:90:fd:e8:7c
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/net/eth2/address
00:25:90:fd:e8:7d
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.2/net/eth3/address
00:25:90:fd:e8:7e
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.3/net/eth5/address
00:25:90:fd:e8:7f
/sys/devices/pci0000:80/0000:80:01.0/0000:81:00.0/net/eth1/address
90:e2:ba:7c:5a:fc
/sys/devices/pci0000:80/0000:80:01.0/0000:81:00.1/net/eth4/address
90:e2:ba:7c:5a:fd
/sys/devices/pci0000:80/0000:80:02.0/0000:82:00.0/net/eth6/address
90:e2:ba:7c:5d:a4
/sys/devices/pci0000:80/0000:80:02.0/0000:82:00.1/net/eth7/address
90:e2:ba:7c:5d:a5
root@nfsROOT:~#

The output of this command shows the serial bus path (for example, 0000:81:00.1) and the new MAC
address (for example, 90:e2:ba:7c:5a:fc).


Tip: As an alternative to the command above, you can use the command below to print only the
serial bus paths and MAC addresses in uppercase.
for add in `ls /sys/devices/pci*/*/*/net/*/address`; do echo -en "`echo $add|sed 's/\// /g' | awk '{print $5}'`\t"; cat $add|tr 'a-f' 'A-F'; done

c) Fill in the serial bus path in ascending order in the Serial Bus Path column of the work table.
d) Fill in the MAC address corresponding to the serial bus path in ascending order in the MAC Address on the
New Chassis column of the work table.
For the sample output from the step above, the work table would look like this:

Table 4: Work Table with Sample MAC Addresses and Serial Bus Paths

Serial Bus Path     MAC Address on the New Chassis     Ethernet Port Name
0000:01:00.0 00:25:90:fd:e8:7c eth0
0000:01:00.1 00:25:90:fd:e8:7d eth1
0000:01:00.2 00:25:90:fd:e8:7e eth2
0000:01:00.3 00:25:90:fd:e8:7f eth3
0000:81:00.0 90:e2:ba:7c:5a:fc eth4
0000:81:00.1 90:e2:ba:7c:5a:fd eth5
0000:82:00.0 90:e2:ba:7c:5d:a4 eth6
0000:82:00.1 90:e2:ba:7c:5d:a5 eth7
e) Close the SSH session to the Controller Node.
You are now back in the SSH session to the Management Node.
11. Get the machine GUID and device GUID of the new chassis.
a) On the Management Node, start the Q-Shell:
/opt/qbase3/qshell

b) Create a cloudAPI connection.


cloudapi = i.config.cloudApiConnection.find('main')

c) Retrieve the machine GUID for the new chassis, using the value of Temporary Hostname of Node in
uppercase, from the work table, for hostname_of_new_node in the command below:
machine_guid = cloudapi.machine.find(name='hostname_of_new_node')['result'][0]

For example,

machine_guid = cloudapi.machine.find(name='PM-90:E2:BA:7E:B8:31')['result'][0]

d) Retrieve the device GUID using the machine GUID you obtained from the previous step.
dg = cloudapi.machine.list(machineguid=machine_guid)['result'][0]['deviceguid']

e) Sanity check: print the value of dg.


For example,

dg
'd951f6d9-7104-470d-8c97-ecf52d57c7b5'

12. Mark the new chassis as FAILED in the Active Archive System database, and clean up references to it.
The Active Archive System created a new INSTOCK node in its database for the new chassis. If you do not mark
the new chassis as FAILED in the database, you are in effect adding a new node rather than replacing an existing
node's chassis. Therefore, you must remove the INSTOCK node by following the steps below.
a) Mark the new chassis as FAILED in the Active Archive System database:
Execute this command on the Management Node:
cloudapi.device.updateModelProperties(dg, \
status=str(q.enumerators.devicestatustype.FAILED))

The new chassis now appears under the FAILED list in the CMC, and is removed from the Unmanaged
Devices list.
Figure 7: The New Chassis Appears Under the FAILED List in the CMC

b) From the Management Node, clean up references to the new chassis in the Active Archive System database.
In the command below, replace MAC_ADDRESS with the value you wrote in the work table for MAC Address
of Node.

Important: Use capital letters for the MAC address.

q.amplistor.cleanupMachine('MAC_ADDRESS')

For example,

In [14]: q.amplistor.cleanupMachine('90:E2:BA:7E:B8:31')
Out[14]: True

This command takes about 10 seconds to complete.


c) Do a sanity check.


Refresh the screen by clicking Refresh in the Commands pane. Check that the new chassis is no longer in the
FAILED list.
13. Update the Active Archive System database with the MAC addresses for the new chassis.
a) From the Management Node, create a cloudAPI connection.
cloudapi = i.config.cloudApiConnection.find('main')

b) From the Management Node, get the machine GUID using your work table value for Original Hostname of
Node.

Note: Use upper case for HOSTNAME_OF_OLD_NODE.

machine_guid = cloudapi.machine.find(name='HOSTNAME_OF_OLD_NODE')\
['result'][0]

For example,

machine_guid = cloudapi.machine.find(name='HGST-Alpha02-DC01-R02-CN01')\
['result'][0]

c) From the Management Node, get the machine object.


machine = cloudapi.machine.getObject(machine_guid)

d) Display all the Ethernet port names (ethN) that are registered:
For example,

In [4]: print machine.nics[0].name
eth0
In [5]: print machine.nics[1].name
eth2
In [6]: print machine.nics[2].name
eth3
In [7]: print machine.nics[3].name
eth5
In [8]: print machine.nics[4].name
eth1
In [9]: print machine.nics[5].name
eth4
In [10]: print machine.nics[6].name
eth6
In [11]: print machine.nics[7].name
eth7
In [12]: print machine.nics[8].name
BMC

e) Write the index of the above machine.nics[index].name value into the work table in column NIC
Array ID, in the row corresponding to ethN.


For the sample output from the step above, the work table would look like this:

Table 5: Work Table with Sample Ethernet Port Names and NIC Array IDs

Serial Bus Path     MAC Address on the New Chassis     Ethernet Port Name     NIC Array ID
0000:01:00.0 00:25:90:fd:e8:7c eth0 0
0000:01:00.1 00:25:90:fd:e8:7d eth1 4
0000:01:00.2 00:25:90:fd:e8:7e eth2 1
0000:01:00.3 00:25:90:fd:e8:7f eth3 2
0000:81:00.0 90:e2:ba:7c:5a:fc eth4 5
0000:81:00.1 90:e2:ba:7c:5a:fd eth5 3
0000:82:00.0 90:e2:ba:7c:5d:a4 eth6 6
0000:82:00.1 90:e2:ba:7c:5d:a5 eth7 7
IPMI                See IPMI MAC Address of Node       BMC                    8
                    in the work table.
f) Update the database entry for machine.nics[N].hwaddr with the corresponding MAC address for ethN
from your work table.

Important: Use capital letters for the MAC address.


machine.nics[0].hwaddr = 'NEW_MAC_ADDRESS_FOR_ETHN'

For example,
machine.nics[0].hwaddr = '00:25:90:FD:E8:7C'
machine.nics[1].hwaddr = '00:25:90:FD:E8:7E'
machine.nics[2].hwaddr = '00:25:90:FD:E8:7F'
machine.nics[3].hwaddr = '90:E2:BA:7C:5A:FD'
machine.nics[4].hwaddr = '00:25:90:FD:E8:7D'
machine.nics[5].hwaddr = '90:E2:BA:7C:5A:FC'
machine.nics[6].hwaddr = '90:E2:BA:7C:5D:A4'
machine.nics[7].hwaddr = '90:E2:BA:7C:5D:A5'

g) Update the database entry for machine.nics[8].hwaddr with the corresponding IPMI MAC address
from your work table, under IPMI MAC Address of Node.

Important: Use capital letters for the MAC address.

For example,
machine.nics[8].hwaddr = '0C:C4:7A:36:8B:12'
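
As an optional alternative to typing each assignment in substeps f and g by hand, the same updates can be applied with a short loop in the Q-Shell. This is a sketch only: the dictionary below reuses the sample MAC addresses from this procedure and must be replaced with the values from your own work table.

# Sketch only: map each registered NIC name to its new MAC address (sample values shown;
# substitute the MAC addresses from your own work table, in uppercase).
new_macs = {
    'eth0': '00:25:90:FD:E8:7C', 'eth1': '00:25:90:FD:E8:7D',
    'eth2': '00:25:90:FD:E8:7E', 'eth3': '00:25:90:FD:E8:7F',
    'eth4': '90:E2:BA:7C:5A:FC', 'eth5': '90:E2:BA:7C:5A:FD',
    'eth6': '90:E2:BA:7C:5D:A4', 'eth7': '90:E2:BA:7C:5D:A5',
    'BMC':  '0C:C4:7A:36:8B:12',
}
for nic in machine.nics:
    if nic.name in new_macs:
        nic.hwaddr = new_macs[nic.name]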


14. Update the MAC address of the IPMI NIC, and the DHCP leases.
a) Log into the CMC.
b) Navigate to the CMC's view of the node whose chassis you have replaced.
c) Select that node's Summary tab.
d) Write the IPMI IP address, as shown in the General section, in your work table, under IPMI IP Address of
Node.
Figure 8: The Old IPMI IP Address of the Node

e) Leave the current SSH session as is. Open a new SSH session on the Management Node.
f) Open /opt/qbase3/cfg/dhcpd/dhcpd.leases with your text editor.
g) Search for the IPMI IP address (obtained in substep d) in the file.
The section containing the IPMI IP address looks like the following example:
host 457f495a-80b7-4125-862b-5f87d9121cfa {
dynamic;
hardware ethernet 0c:c4:7a:36:8b:12;
fixed-address 172.16.201.16;
group "pmachines";
}

h) Change the hardware ethernet value to the new IPMI MAC address in lowercase from your work table,
under IPMI MAC Address of Node.

Note: Use lowercase only.

For example,

hardware ethernet 0c:c4:7a:36:8b:12;

i) Save and close the file.


j) Exit the new SSH session.
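
As an optional alternative to editing dhcpd.leases by hand in substeps f through i, the lease entry can be rewritten in one pass. This is a sketch only; OLD_IPMI_MAC and NEW_IPMI_MAC are placeholders for the lowercase MAC addresses, and the file path is the one given above.

# Sketch only: replace the old IPMI MAC with the new one (both lowercase) in the lease file.
sed -i 's/hardware ethernet OLD_IPMI_MAC;/hardware ethernet NEW_IPMI_MAC;/' \
  /opt/qbase3/cfg/dhcpd/dhcpd.leases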
15. Do a sanity check to verify that you have updated the new MAC addresses correctly.
Compare the output of the command below to your work table.
In [9]: for nic in machine.nics: nic.name; nic.hwaddr


...:
Out[9]: 'eth0'
Out[9]: '00:25:90:FD:E8:7C'
Out[9]: 'eth2'
Out[9]: '00:25:90:FD:E8:7E'
Out[9]: 'eth3'
Out[9]: '00:25:90:FD:E8:7F'
Out[9]: 'eth5'
Out[9]: '90:E2:BA:7C:5A:FD'
Out[9]: 'eth1'
Out[9]: '00:25:90:FD:E8:7D'
Out[9]: 'eth4'
Out[9]: '90:E2:BA:7C:5A:FC'
Out[9]: 'eth6'
Out[9]: '90:E2:BA:7C:5D:A4'
Out[9]: 'eth7'
Out[9]: '90:E2:BA:7C:5D:A5'
Out[9]: 'BMC'
Out[9]: '0C:C4:7A:36:8B:12'

16. Save the Active Archive System database machine settings.


q.drp.machine.save(machine)

17. Update and save the Active Archive System database device object.
a) Get the device object.
device = cloudapi.device.getObject(machine.deviceguid)

b) Update the MAC address of the chassis with the value you saved in the work table under MAC Address of
Node.

Note: Use capital letters for the MAC address.

device.nicports[0].hwaddr = 'NEW_MAC_ADDRESS'

For example,

In [12]: device.nicports[0].hwaddr='90:E2:BA:7E:B8:31'

c) Save the device object.


q.drp.device.save(device)

18. Restart dhcpd.
In [14]: q.manage.dhcpd.restart()
Stopping dhcpd...
dhcpd is halted
Starting dhcpd...
dhcpd is running


19. Reboot the new chassis.


a) Open an SSH session to the node whose chassis you have just replaced.
Use the IP address you saved in the work table under Original IP Address of Node.
The Linux prompt appears.
b) At the Linux prompt, run the following command:
reboot

20. When the node is restarted, update the main.cfg file and restart the application server.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes.
A list of Controller Nodes appears in the CMC.
b) Click the Controller Node whose chassis you have just replaced.
Identify the correct Controller Node by its hostname: it now matches the Original Hostname of Node value
you recorded in the worktable. This value is typically of the format SystemID-DCnn-Rnn-CNnn.
c) Identify the IP addresses listed in the Private IP field.
d) Open an SSH session to the Controller Node, using any one of the IP addresses you obtained from substep c,
and exit the OSMI menu.
The Linux prompt appears.
e) At the Linux prompt on the Controller Node, open the file /opt/qbase3/cfg/qconfig/main.cfg with
your text editor.
The file has a section that looks like this:
[main]
lastlogcleanup = 1428960577
domain = somewhere.com
nodetype = CPUNODE
nodename = 90E2BA7EB831
logserver_loglevel = 6
logserver_port = 9998
logserver_ip = 127.0.0.1
qshell_firstrun = False
machineguid = fc635662-5247-45b1-ab66-d0abe8e60712

f) Replace the value after nodename = with the new MAC address from your work table, under MAC Address
of Node.

Note: The MAC address must be in uppercase and without colons. For example,
00:25:90:3B:C1:72 must be typed as 0025903BC172.
g) Save and close the configuration file.
h) Start the Q-Shell.
/opt/qbase3/qshell

i) Restart the application server:


In [1]: q.manage.applicationserver.restart()
Restarting applicationserver
Applicationserver...
Stopping applicationserver
Applicationserver...
Applicationserver is still running, waiting for 5 more seconds
Applicationserver is still running, waiting for 4 more seconds
Starting applicationserver
Applicationserver...

j) Exit the Q-Shell.


quit()
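
As an optional alternative to editing main.cfg by hand in substeps e through g, the nodename value can be rewritten from the shell before starting the Q-Shell. This is a sketch only: the MAC address shown is the sample value from this procedure and must be replaced with the MAC Address of Node from your work table.

# Sketch only: strip the colons, force uppercase, and substitute the result as the nodename value.
NEW_MAC=$(echo '90:E2:BA:7E:B8:31' | tr -d ':' | tr 'a-f' 'A-F')
sed -i "s/^nodename = .*/nodename = ${NEW_MAC}/" /opt/qbase3/cfg/qconfig/main.cfg
grep '^nodename' /opt/qbase3/cfg/qconfig/main.cfg    # verify the result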

21. Verify that the bus information of the network interfaces matches the udev rules.
a) Run the following command:

Tip: Check the hardware paths in the command below, as they might be different on the new
chassis.

root@HGST-S3-DC01-R01-CN03:~# for add in `ls /sys/devices/pci*/*/*/net/*/address`; do echo $add; cat $add; done

For example, the output of the above command looks like this:

/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/net/eth0/address
00:25:90:fd:e8:7c
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/net/eth2/address
00:25:90:fd:e8:7d
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.2/net/eth3/address
00:25:90:fd:e8:7e
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.3/net/eth5/address
00:25:90:fd:e8:7f
/sys/devices/pci0000:80/0000:80:01.0/0000:81:00.0/net/eth1/address
90:e2:ba:7c:5a:fc
/sys/devices/pci0000:80/0000:80:01.0/0000:81:00.1/net/eth4/address
90:e2:ba:7c:5a:fd
/sys/devices/pci0000:80/0000:80:02.0/0000:82:00.0/net/eth6/address
90:e2:ba:7c:5d:a4
/sys/devices/pci0000:80/0000:80:02.0/0000:82:00.1/net/eth7/address
90:e2:ba:7c:5d:a5

Tip: As an alternative to the command above, you can use the command below to print only the
serial bus paths and MAC addresses in uppercase.
for add in `ls /sys/devices/pci*/*/*/net/*/address`; do echo -en "`echo $add|sed 's/\// /g' | awk '{print $5}'`\t"; cat $add|tr 'a-f' 'A-F'; done

b) Compare the output of the command above to the contents of the file
/etc/udev/rules.d/70-persistent-net.rules.
For example, the contents of this file look like this:

root@HGST-S3-DC01-R01-CN03:~# cat /etc/udev/rules.d/70-persistent-net.rules
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:01:00.0", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:01:00.1", KERNEL=="eth*", NAME="eth2"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:01:00.2", KERNEL=="eth*", NAME="eth3"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:01:00.3", KERNEL=="eth*", NAME="eth5"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:81:00.0", KERNEL=="eth*", NAME="eth1"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:81:00.1", KERNEL=="eth*", NAME="eth4"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:82:00.0", KERNEL=="eth*", NAME="eth6"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:82:00.1", KERNEL=="eth*", NAME="eth7"

If they do not match, update /etc/udev/rules.d/70-persistent-net.rules to match the output
of the command above, then reboot the node again.

reboot
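
As an optional sanity check, the comparison in substep b can also be scripted. This is a sketch only; it assumes the standard sysfs and udev paths shown above:

# Sketch only: report any interface whose bus path and name pair is missing from the udev rules.
for p in /sys/devices/pci*/*/*/net/*; do
  bus=$(basename $(dirname $(dirname $p)))
  name=$(basename $p)
  grep -q "KERNELS==\"$bus\".*NAME=\"$name\"" /etc/udev/rules.d/70-persistent-net.rules \
    || echo "MISMATCH: $bus is currently named $name"
done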

The chassis replacement procedure is done.

Warning: Be very careful when recording and updating MAC addresses. A mistake may render the new
chassis unusable.

Table 6: Work Table for Controller Node Chassis Replacement

Item Value
Virtual IP Address of the Management Node:
Get this value as instructed in the Using the Administrator
Interfaces chapter of the HGST Active Archive System
Administration Guide

IP Addresses of the Other (Non Failed) Controller Nodes:
Original Hostname of Node:
Get this value from the CMC before you shut down
the failed node. This value is typically of the format
SystemID-DCnn-Rnn-CNnn.

Temporary IP Address of Node:


The CMC displays this value after the new chassis is
installed.

Temporary Hostname of Node:


The CMC displays this value after the new chassis is
installed. This value is of the format PM-MAC_ADDRESS.

MAC Address of Node:


IPMI IP Address of Node:
IPMI MAC Address of Node:


Serial Bus Path     MAC Address on the New Chassis     Ethernet Port Name     NIC Array ID
0000:01:00.0 eth0
0000:01:00.1 eth1
0000:01:00.2 eth2
0000:01:00.3 eth3
0000:81:00.0 eth4
0000:81:00.1 eth5
0000:82:00.0 eth6
0000:82:00.1 eth7
IPMI


1.3 Hard Disk Drive Replacement Procedure


The HDD on a Controller Node is an HGST 1TB SATA 6 Gb/sec drive. It is a front-bay drive. It is hot swappable after
being decommissioned in the CMC.
Prerequisites
• Decommission the faulty drive in the CMC. For more information, see Managing Hardware in the HGST Active
Archive System Administration Guide.
• Obtain a replacement HDD from HGST.
Required Tools
• Ladder
• Long Phillips-head screwdriver
Time Estimate: 40 minutes.
To replace an HDD, proceed as follows:

Warning: Replace only one disk at a time on the Controller Node.

1. Obtain details about the decommissioned disk from the CMC.


a) In the CMC, navigate to Dashboard > Administration > Hardware > Disks > Decommissioned.
Figure 9: Decommissioned Disks in the CMC

b) Click the desired decommissioned disk.


The decommissioned disk details are displayed.
c) (Optional) Right-click anywhere in the decommissioned disk details, select Print > Print to PDF.
The decommissioned disk details contain the following information, which you will need to refer to later:
• The device name
• The model type and serial number
• A drive map showing the exact location of the disk (name of data center, rack, and node).
• The node type
• The MAC addresses of the node

• The current status of the node


Figure 10: Decommissioned Disk Details from the CMC
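Tip: If you also have shell access to the node, you can optionally cross-check the device name from
the decommissioned disk details against the drive's serial number before going to the rack. A minimal
sketch, assuming the smartmontools package is available on the node and using /dev/sdb purely as a
placeholder device name:

# Print the identity information of the device named in the decommissioned disk details,
# keeping only the serial number line (sketch).
smartctl -i /dev/sdb | grep -i 'serial number'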

2. Enable the location LED on the Controller Node.


a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes.
b) Select the correct node.
c) In the Commands pane, click Location LED On.
A blue LED on its front and back panels is now blinking.
3. Go to the rack and identify the correct chassis by the blinking blue LED on its front and back panels.
4. Replace the decommissioned HDD.
a) Identify the decommissioned HDD by using the drive map.
The HDD is a front-bay drive.
Figure 11: Controller Node, Front

b) Press the release button on the drive carrier of the decommissioned HDD to extend the drive carrier handle.
Figure 12: Removing a Drive Carrier

c) Pull the drive carrier out of the front bay using the drive carrier handle.
d) Compare the serial number on the HDD to the serial number specified in the decommissioned disk details to
confirm that you have the correct HDD.
e) Unscrew the drive carrier from the decommissioned HDD.
f) Screw the drive carrier onto the replacement HDD.
g) Install the replacement HDD into the same slot that the decommissioned HDD was using.
A blue LED will blink for a moment.
5. Disable the location LED on the Controller Node.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes.
b) Select the desired Controller Node.
c) In the Commands pane, click Location LED Off.
6. Confirm that the Active Archive System correctly determines the purpose for the new disk.
a) Wait 15 minutes.
b) In the CMC, navigate to Dashboard > Administration > HGST Active Archive System Management >
Logging > Events .
c) In the Events list, check to see that a new empty disk has been detected.
d) In the Jobs list, check to see that an Initializing new disk job has been triggered.
It may take about 2 minutes for the job to appear.
e) In the CMC, navigate to Dashboard > Administration > Hardware > Servers.
f) Select the desired node.
g) Select the Disks tab.
h) Wait for the physical drive that has been replaced, as well as the logical disks, to change status from a red icon to
a green icon.

Note: The physical drive that has been replaced, as well as the logical disks, may take up to 40
minutes to change status.

The Initializing new disk job has completed successfully when the number of degraded disks decreases by
1.
7. If the disk still shows up in the Degraded or Unmanaged list, you must manually specify the purpose of the new
disk:

a) In the CMC, navigate to Dashboard > Administration > Hardware > Disks > Unmanaged.
b) Select the new disk, and in the Commands pane, click Repurpose.
c) In the Use As field, select Replacement Disk.

Note: You can only select Replacement Disk when there is a decommissioned disk. If there are
no decommissioned disks, you can only select Additional Disk as the purpose for the disk.

d) In the Replacement For field, select the decommissioned disk that you want to replace.
e) Click Next to start the repurposing.
An Initializing new disks on node_name job starts.
If you replaced the wrong disk, see Troubleshooting.


1.4 Solid State Disk Replacement Procedure


The solid state disk (SSD) on a Controller Node is an Intel DC S3500 Series 240 GB SATA 6 Gb/s drive. It is hot
swappable after being decommissioned in the CMC.
Prerequisites

Caution: If the faulty SSD has the postgresql database on it, you must perform a failover first!
To check whether the faulty SSD is the one hosting the postgresql database, do the following:
1. In the CMC, navigate to Dashboard > Administration > Storage Management > MetaStores >
env_metastore.
2. See if the device name is part of env_metastore (in other words, if the device name appears under
the SSD Disk column).
To perform a failover, see Managing Hardware in the HGST Active Archive System Administration
Guide.

• Decommission the faulty SSD in the CMC. For more information, see Managing Hardware in the HGST Active
Archive System Administration Guide .
• Obtain a replacement SSD from HGST.
Required Tools
• Ladder
• Long Phillips-head screwdriver
Time Estimate: 40 minutes.
To replace an SSD, proceed as follows:

Warning: Replace only one disk at a time on the Controller Node.

1. Obtain details about the decommissioned disk from the CMC.


a) In the CMC, navigate to Dashboard > Administration > Hardware > Disks > Decommissioned.
Figure 13: Decommissioned Disks in the CMC

b) Click the desired decommissioned disk.


The decommissioned disk details are displayed.
c) (Optional) Right-click anywhere in the decommissioned disk details, and select Print > Print to PDF.
The decommissioned disk details contain the following information, which you will need to refer to later:
• The device name
• The model type and serial number
• A drive map showing the exact location of the disk (name of data center, rack, and node).
• The node type
• The MAC addresses of the node
• The current status of the node

Warning: The CMC identifies the incorrect slot for failed SSDs on Controller Nodes.
The image in the decommissioned disk details for SSDs is mislabeled: when it highlights slot 9, the
decommissioned SSD is actually located in slot 5; when it highlights slot 10, the decommissioned
SSD is actually located in slot 6.

Figure 14: Decommissioned Disk Details from the CMC

2. Enable the location LED on the Controller Node.


a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes.
b) Select the correct node.
c) In the Commands pane, click Location LED On.
A blue LED on its front and back panels is now blinking.
3. Go to the rack and identify the correct chassis by the blinking blue LED on its front and back panels.
4. Replace the decommissioned disk.
a) Identify the decommissioned SSD by using the drive map.
The SSD is a front-bay drive.
Figure 15: Controller Node, Front

b) Press the release button on the drive carrier of the decommissioned SSD to extend the drive carrier handle.
Figure 16: Removing a Drive Carrier

c) Pull the drive carrier out of the front bay using the drive carrier handle.
d) Compare the serial number on the SSD to the serial number specified in the decommissioned disk details to
confirm that you have the correct SSD.
e) Unscrew the drive carrier from the decommissioned SSD.
f) Screw the drive carrier onto the replacement SSD.
g) Install the replacement SSD into the same slot that the decommissioned SSD was using.
A blue LED will blink for a moment.
5. Disable the location LED on the Controller Node.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes.
b) Select the desired Controller Node.
c) In the Commands pane, click Location LED Off.
6. Confirm that the Active Archive System correctly determines the purpose for the new disk.
a) Wait 15 minutes.
b) In the CMC, navigate to Dashboard > Administration > HGST Active Archive System Management >
Logging > Events .
c) In the Events list, check to see that a new empty disk has been detected.
d) In the Jobs list, check to see that an Initializing new disk job has been triggered.
It may take about 2 minutes for the job to appear.
e) In the CMC, navigate to Dashboard > Administration > Hardware > Servers.
f) Select the desired node.
g) Select the Disks tab.
h) Wait for the physical drive that has been replaced, as well as the logical disks, to change status from a red icon to
a green icon.

Note: The physical drive that has been replaced, as well as the logical disks, may take up to 40
minutes to change status.

The Initializing new disk job has completed successfully when the number of degraded disks decreases by
1.
7. If the disk still shows up in the Degraded or Unmanaged list, you must manually specify the purpose of the new
disk:

a) In the CMC, navigate to Dashboard > Administration > Hardware > Disks > Unmanaged.
b) Select the new disk, and in the Commands pane, click Repurpose.
c) In the Use As field, select Replacement Disk.

Note: You can only select Replacement Disk when there is a decommissioned disk. If there are
no decommissioned disks, you can only select Additional Disk as the purpose for the disk.

d) In the Replacement For field, select the decommissioned disk that you want to replace.
e) Click Next to start the repurposing.
An Initializing new disks on node_name job starts.
If you replaced the wrong disk, see Troubleshooting.


1.5 Power Supply Unit Replacement Procedure


The power supply units (PSUs) on a Controller Node are redundant, hot-swappable SuperMicro 1U 750 W 74 mm
Platinum units.
Prerequisites
• Obtain a replacement PSU from HGST.
• Ensure that the other PSU connected to this node is working, before pulling out the defective one.
Required Tools
• Ladder
Time Estimate: 5 minutes.
To replace a PSU, proceed as follows:
1. Enable the location LED on the Controller Node.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes.
b) Select the correct node.
c) In the Commands pane, click Location LED On.
A blue LED on its front and back panels is now blinking.
2. Go to the rack and identify the correct chassis by the blinking blue LED on its front and back panels.
3. On the back of the rack, identify the failed PSU on the node identified in the previous step.
In the image below, the PSUs are labeled P1 and P2. The faulty PSU has an amber LED illuminated.
Figure 17: Controller Node, Back, with PSU Status LEDs Highlighted

4. Remove the failed PSU.


a) Disconnect the power cord from the failed PSU only.
b) Push the red release tab towards the power connector of the failed PSU.
c) Pull the PSU out of the node using the grab handle.
5. Install the replacement PSU.
a) Push the replacement PSU into the Controller Node, and listen for a click.
b) Connect the power cord to the replacement PSU.
6. Disable the location LED on the Controller Node.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Controller Nodes.
b) Select the desired Controller Node.
c) In the Commands pane, click Location LED Off.


1.6 SFP+ DAC Cable Replacement Procedure


The SFP+ cable connecting Controller Nodes to Storage Interconnects is a 10G SFP+ to SFP+ DAC Cable, 30AWG,
3m.
Prerequisites
None.
Required Tools
None.
To replace an SFP+ DAC cable, proceed as follows:
1. Identify the faulty SFP+ cable.
a) Go to the back of the rack.
b) Look for the Controller Node with a faulty SFP+ DAC cable.
The SFP+ DAC cables are in ports N2 and N3 as shown in the figure below. The faulty cable has an amber LED
illuminated.
Figure 18: Controller Node SFP+ Ports

2. Replace the faulty SFP+ DAC cable on the Controller Node end.
a) Unlatch the faulty cable by pulling very gently on its pull tab.
Once the latch is disengaged, the cable is loose.
b) Pull the faulty cable out of its port.

Warning: Do not pull the cable out by its pull tab, because the pull tab might break.

c) Plug the new SFP+ DAC cable into the same port.
The cable is reseated properly (the latch is engaged) when you hear a click.
3. Visually trace the faulty SFP+ DAC cable to the correct port on its Storage Interconnect end.
4. Replace the faulty SFP+ DAC cable on the Storage Interconnect end.
a) Unlatch the faulty cable by pulling very gently on its pull tab.
Once the latch is disengaged, the cable is loose.
b) Pull the faulty cable out of its port.

Warning: Do not pull the cable out by its pull tab, because the pull tab might break.
c) Plug the new SFP+ DAC cable into the same port.
The cable is reseated properly (the latch is engaged) when you hear a click.
5. Verify that the amber LED on the SFP+ DAC cable is off.


2 Storage Node Replaceable Units

This section provides replacement procedures for the following parts in a Storage Node:
• Chassis
• HDD
• PSU
• MiniSAS Cable

Topics:
• Warnings
• Chassis Replacement Procedure
• Hard Disk Drive Replacement Procedure
• Power Supply Unit Replacement Procedure
• MiniSAS Cable Replacement Procedure

2.1 Warnings
Caution:
• Opening or removing the system cover when the system is powered on may expose you to a risk of
electric shock.
• When replacing items from the inside of the chassis, ensure that you take precautions to prevent
electrostatic discharge (ESD).
• A Storage Node weighs about 43lbs. Ensure sufficient manpower to handle it safely.

2.2 Chassis Replacement Procedure


The Storage Node chassis is a SuperMicro UP 1U Server, 1018. Replacing the chassis replaces its NICs, CPU, memory,
motherboard, and fans, but not its disks.
Prerequisites
• Obtain a replacement Storage Node chassis from HGST.
• Obtain the virtual IP address of the Management Node.
• Obtain the admin password for the CMC.
• Obtain the root password.
• Fill in as much of the work table as possible before starting this procedure.
Required Tools
• Ladder
• Long Phillips-head screwdriver

Time Estimate: 3 hours.


Figure 19: Overview of Storage Node Chassis Replacement

A work table is provided at the end of this section for your convenience, to store all of the information needed for a
chassis replacement.
To replace a Storage Node chassis, proceed as follows:
1. Enable the location LED on the Storage Node.

Tip: The sample outputs shown for this procedure are from Storage Node 6.

a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the correct node.
c) In the Commands pane, click Location LED On.
2. Shut down the Storage Node from the CMC.

Note: Save the node's hostname in your work table under Original Hostname of Node.

a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the desired Storage Node.
Figure 20: A Storage Node Pane in the CMC

c) In the Commands pane, click Shutdown.


Figure 21: The Shutdown Button in the Commands Pane

d) Wait for the Status field to change to DONE.

Warning: Even if all LEDs are off, you must still wait until the CMC shows DONE in the
Status field.

All I/O to the Storage Enclosure Basic attached to this Storage Node is now quiesced.
3. Go to the rack and identify the correct chassis by the blinking blue LED on its front and back panels.
4. Remove the failed chassis from the rack.


a) At the front of the chassis, loosen the rack mounting screws.
b) At the back of the chassis, disconnect the three network cables connected to the ports labeled as follows in the
image below.

Note: Check that the cables are labeled correctly, so that you can put them back in the same
order.

i. N1 (Public network #1)
ii. N2 (Public network #2)
iii. M2 (IPMI)
iv. S1 (Storage network #1)
v. S2 (Storage network #2)
Figure 22: Storage Node, Back

c) At the back of the chassis, disconnect the miniSAS cables.
Unlatch each miniSAS cable by pulling very gently on its pull tab. While pulling gently on the pull tab, grasp
the cable's metal connector or cord and pull the cable out of its port.

Warning: Do not pull the cable out by its pull tab, because the pull tab might break.

Observe the amber LED on the paired Storage Enclosure Basic indicating loss of connection.
d) At the back of the chassis, disconnect the two power cords.
In the image above, the power cords are connected to the PSUs labeled P1 and P2.
e) At the front of the chassis, slowly slide the chassis out until you reach the pull-safety at the midway point (you
will hear a soft clicking sound, and feel the chassis "catch" on the rails).
f) Disengage the pull-safety on both sides of the chassis and slide it out until the split line of the two top covers.
Push the pull-safety on one side up, and the pull-safety on the other side down.
g) Continue to slowly slide the chassis out until you reach the pull-safety at the end point, and disengage it as you
did the earlier one.
h) Safely unmount the chassis from the rack and place it on a table.

Caution: A Storage Node chassis weighs about 43lbs. Ensure that you have sufficient
manpower to handle it safely.

Warning: Once you pull the chassis past the pull-safety, do not leave it hanging in the rack.
Otherwise, the rack rails may be damaged permanently.

5. Move the two HDDs from the failed chassis to the exact corresponding slots in the new chassis.

Tip: Write down the disk serial number and slot location so that you can double-check that each
disk is seated in the correct slot post installation into the new chassis.

a) Remove each disk from its slot in the front bay of the failed chassis.
b) Install the disk into the corresponding slot in the new chassis.
Figure 23: Storage Node, Front

6. Install the new chassis into the rack.


a) Mount the new chassis onto the rack slides and slide it into the rack.

Caution: Mounting the chassis is a two person task.


b) Tighten the rack mounting screws to secure the chassis to the rack.
c) Reconnect the three network cables to the chassis ports.
The network cables are labeled.
i. Connect the cable labeled SNx.N1.SW1.Nxx to the port labeled N1 in the image below.
ii. Connect the cable labeled SNx.N2.SW2.Nxx to the port labeled N2 in the image below.
iii. Connect the cable labeled SNx.M2.SW1.Nxx to the port labeled M2 in the image below.
Figure 24: Storage Node, Back

d) Reconnect the miniSAS cables.

The miniSAS cables are labeled.


• Connect the cable labeled SNx.S1.DAx.SA to the port labeled S1 in the image above.
• Connect the cable labeled SNx.S2.DAx.SB to the port labeled S2 in the image above.
e) Reconnect the power cords.
7. Get the MAC address of the new chassis IPMI NIC from the BIOS.
a) Connect a VGA monitor and USB keyboard to the new chassis.
b) Power on the new chassis.
The power button is located on the chassis front control panel.
c) At power up, press Del to enter the system BIOS.
d) In the system BIOS, navigate to IPMI > BMC Network Configuration.
e) Record Station MAC Address in your work table under IPMI MAC Address of Node.
f) Exit the BIOS without saving any changes by pressing ESC.
The boot process continues.
g) Disconnect the VGA monitor and USB keyboard from the new chassis.
8. Get the IP address and machine name (hostname) of the new chassis.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Unmanaged Devices >
Uninitialized.
The new chassis appears in the list of uninitialized devices. This indicates that it has started successfully.
b) Write the value of Name into your work table, under Temporary Hostname of Node.
c) Write the value of Name without the PM- prefix into your work table, under MAC Address of Node.
d) Write the IP address into your work table, under Temporary IP Address of Node.
Figure 25: Uninitialized Nodes

9. Get the bus-location-to-MAC-address mapping of the new chassis.


a) From the Management Node, open an SSH session to the new IP address of the Storage Node obtained in the
previous step.
Log in with username root and password rooter.

b) At the Linux prompt, run the following command:

for add in `ls /sys/devices/pci*/*/*/net/*/address`; do echo $add; cat $add; done

The output of this command is similar to the example below.

/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/net/eth1/address
90:e2:ba:7e:b8:30
/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.1/net/eth3/address
90:e2:ba:7e:b8:31
/sys/devices/pci0000:00/0000:00:1c.4/0000:07:00.0/net/eth0/address
0c:c4:7a:33:38:10
/sys/devices/pci0000:00/0000:00:1c.4/0000:07:00.1/net/eth2/address
0c:c4:7a:33:38:11
root@nfsROOT:~#

The output of this command shows the serial bus path (for example, 0000:02:00.1) and the new MAC
address (for example, 90:e2:ba:7e:b8:31).

Tip: As an alternative to the command above, you can use the command below to print only the
serial bus paths and MAC addresses in uppercase.
for add in `ls /sys/devices/pci*/*/*/net/*/address`; do echo -en \
"`echo $add|sed 's/\// /g' | awk '{print $5}'`\t"; cat $add|tr 'a-f' 'A-F'; done

c) Fill in the serial bus path in ascending order in the Serial Bus Path column of the work table.
d) Fill in the MAC address corresponding to the serial bus path in ascending order in the MAC Address on the
New Chassis column of the work table.
For the sample output from the step above, the work table would look like this:

Table 7: Work Table with Sample MAC Addresses and Serial Bus Paths

Serial Bus Path    MAC Address on the New Chassis    Ethernet Port Name
0000:02:00.0       90:e2:ba:7e:b8:30                 eth0
0000:02:00.1       90:e2:ba:7e:b8:31                 eth1
0000:07:00.0       0c:c4:7a:33:38:10                 eth2
0000:07:00.1       0c:c4:7a:33:38:11                 eth3
e) Close the SSH session to the Storage Node.
You are now back in the SSH session to the Management Node.
10. Get the machine GUID and device GUID of the new chassis.
a) On the Management Node, start the Q-Shell:
/opt/qbase3/qshell

b) Create a cloudAPI connection.


cloudapi = i.config.cloudApiConnection.find('main')

c) Retrieve the machine GUID for the new chassis, using the value of Temporary Hostname of Node from the
work table (in uppercase) for hostname_of_new_node in the command below:
machine_guid = cloudapi.machine.find(name='hostname_of_new_node')['result'][0]

For example,

machine_guid = cloudapi.machine.find(name='PM-90:E2:BA:7E:B8:31')['result'][0]

d) Retrieve the device GUID using the machine GUID you obtained from the previous step.
dg = cloudapi.machine.list(machineguid=machine_guid)['result'][0]['deviceguid']

e) Sanity check: print the value of dg.


For example,

dg
'd951f6d9-7104-470d-8c97-ecf52d57c7b5'

11. Mark the new chassis as FAILED in the Active Archive System database, and clean up references to it.
The Active Archive System created a new INSTOCK node in its database for the new chassis. If you do not mark
the new chassis as FAILED in the database, you are in effect adding a new node rather than replacing an existing
node's chassis. Therefore, you must remove the INSTOCK node by following the steps below.
a) Mark the new chassis as FAILED in the Active Archive System database:
Execute this command on the Management Node:
cloudapi.device.updateModelProperties(dg, \
status=str(q.enumerators.devicestatustype.FAILED))

The new chassis now appears under the FAILED list in the CMC, and is removed from the Unmanaged
Devices list.
Figure 26: The New Chassis Appears Under the FAILED List in the CMC

b) From the Management Node, clean up references to the new chassis in the Active Archive System database.

In the command below, replace MAC_ADDRESS with the value you wrote in the work table for MAC Address
of Node.

Important: Use capital letters for the MAC address.

q.amplistor.cleanupMachine('MAC_ADDRESS')

For example,

In [14]: q.amplistor.cleanupMachine('90:E2:BA:7E:B8:31')
Out[14]: True

This command takes about 10 seconds to complete.


c) Do a sanity check.
Refresh the screen by clicking Refresh in the Commands pane. Check that the new chassis is no longer in the
FAILED list.
12. Update the Active Archive System database with the MAC addresses for the new chassis.
a) From the Management Node, create a cloudAPI connection.
cloudapi = i.config.cloudApiConnection.find('main')

b) From the Management Node, get the machine GUID using your work table value for Original Hostname of
Node.

Note: Use upper case for HOSTNAME_OF_OLD_NODE.

machine_guid = cloudapi.machine.find(name='HOSTNAME_OF_OLD_NODE')\
['result'][0]

For example,

machine_guid = cloudapi.machine.find(name='HGST-S3-DC01-R01-SN06')['result'][0]

c) From the Management Node, get the machine object.


machine = cloudapi.machine.getObject(machine_guid)

d) Display all the Ethernet port names (ethN) that are registered:
For example,
In [3]: machine = cloudapi.machine.getObject(machine_guid)
In [4]: print machine.nics[0].name
eth1
In [5]: print machine.nics[1].name
eth3
In [6]: print machine.nics[2].name
eth2
In [7]: print machine.nics[3].name
eth0
In [8]: print machine.nics[4].name
BMC

e) Write the index of the above machine.nics[index].name value into the work table in column NIC
Array ID, in the row corresponding to ethN.
For the sample output from the step above, the work table would look like this:

Table 8: Work Table with Sample Ethernet Port Names and NIC Array IDs

Serial Bus Path    MAC Address on the New Chassis    Ethernet Port Name    NIC Array ID
0000:02:00.0       90:e2:ba:7e:b8:30                 eth0                  3
0000:02:00.1       90:e2:ba:7e:b8:31                 eth1                  0
0000:07:00.0       0c:c4:7a:33:38:10                 eth2                  2
0000:07:00.1       0c:c4:7a:33:38:11                 eth3                  1
IPMI               See IPMI MAC Address of Node      BMC                   4
                   in the work table.
f) Update the database entry for machine.nics[N].hwaddr with the corresponding MAC address for ethN
from your work table.

Important: Use capital letters for the MAC address.


machine.nics[0].hwaddr = 'NEW_MAC_ADDRESS_FOR_ETHN'

For example,
machine.nics[0].hwaddr = '90:E2:BA:7E:B8:31'
machine.nics[1].hwaddr = '0C:C4:7A:33:38:11'
machine.nics[2].hwaddr = '0C:C4:7A:33:38:10'
machine.nics[3].hwaddr = '90:E2:BA:7E:B8:30'

g) Update the database entry for machine.nics[4].hwaddr with the corresponding IPMI MAC address
from your work table, under IPMI MAC Address of Node.

Important: Use capital letters for the MAC address.

For example,
machine.nics[4].hwaddr = '0C:C4:7A:36:8B:12'
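Tip: Substeps f) and g) can also be applied in one pass from the same Q-Shell session. The sketch
below uses the sample MAC addresses and NIC Array IDs from Table 8; substitute the values from
your own work table:

# Sketch: map each NIC Array ID to its new MAC address (sample values from Table 8).
new_macs = {
    0: '90:E2:BA:7E:B8:31',   # eth1
    1: '0C:C4:7A:33:38:11',   # eth3
    2: '0C:C4:7A:33:38:10',   # eth2
    3: '90:E2:BA:7E:B8:30',   # eth0
    4: '0C:C4:7A:36:8B:12',   # BMC (IPMI)
}
for index, mac in new_macs.items():
    machine.nics[index].hwaddr = mac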

13. Update the MAC address of the IPMI NIC, and the DHCP leases.
a) Log into the CMC.
b) Navigate to the CMC's view of the node whose chassis you have replaced.
c) Select that node's Summary tab.

d) Write the IPMI IP address, as shown in the General section, in your work table, under IPMI IP Address of
Node.
Figure 27: The Old IPMI IP Address of the Node

e) Leave the current SSH session as is. Open a new SSH session on the Management Node.
f) Open /opt/qbase3/cfg/dhcpd/dhcpd.leases with your text editor.
g) Search for the IPMI IP address (obtained in substep d) in the file.
The section containing the IPMI IP address looks like the following example:
host 457f495a-80b7-4125-862b-5f87d9121cfa {
  dynamic;
  hardware ethernet 0c:c4:7a:36:8b:12;
  fixed-address 172.16.201.16;
  group "pmachines";
}

h) Change the hardware ethernet value to the new IPMI MAC address in lowercase from your work table,
under IPMI MAC Address of Node.

Note: Use lowercase only.

For example,

hardware ethernet 0c:c4:7a:36:8b:12;

i) Save and close the file.


j) Exit the new SSH session.
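Tip: Before exiting the session in substep j), you can optionally confirm that the lease entry now
carries the new IPMI MAC address. A minimal check (sketch), using the sample IPMI IP address from
the example above:

# Show the lease block for the node's IPMI IP address; the "hardware ethernet" line should
# now contain the new IPMI MAC address in lowercase (sketch).
grep -B 3 'fixed-address 172.16.201.16' /opt/qbase3/cfg/dhcpd/dhcpd.leases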
14. In your previous SSH session, do a sanity check to verify that you have updated the new MAC addresses correctly.
Compare the output of the command below to your work table.
In [9]: for nic in machine.nics: nic.name; nic.hwaddr
...:
Out[9]: 'eth1'
Out[9]: '90:E2:BA:7E:B8:31'
Out[9]: 'eth3'
Out[9]: '0C:C4:7A:33:38:11'
Out[9]: 'eth2'
Out[9]: '0C:C4:7A:33:38:10'
Out[9]: 'eth0'
Out[9]: '90:E2:BA:7E:B8:30'

Out[9]: 'BMC'
Out[9]: '0C:C4:7A:36:8B:12'

15. Save the Active Archive System database machine settings.


q.drp.machine.save(machine)

16. Update and save the Active Archive System database device object.
a) Get the device object.
device = cloudapi.device.getObject(machine.deviceguid)

b) Update the MAC address of the chassis with the value you saved in the work table under MAC Address of
Node.

Note: Use capital letters for the MAC address.

device.nicports[0].hwaddr = 'NEW_MAC_ADDRESS'

For example,

In [12]: device.nicports[0].hwaddr='90:E2:BA:7E:B8:31'

c) Save the device object.


q.drp.device.save(device)

17. Restart dhcpd.
In [14]: q.manage.dhcpd.restart()
Stopping dhcpd...
dhcpd is halted
Starting dhcpd...
dhcpd is running

18. Reboot the new chassis.


a) From the Management Node, open an SSH session to the new chassis using the IP address from the work table,
under Temporary IP Address of Node.
The Linux prompt appears.
b) At the Linux prompt, run the following command:
reboot

Once rebooted, if you log into the CMC, you can see that the chassis now has the correct hostname as shown in the
figure below.
Figure 28: The Node with a Rebooted Chassis as Seen on the CMC

19. When the node is restarted, update the main.cfg file and restart the application server.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
A list of Storage Nodes appears in the CMC.
b) Click the Storage Node whose chassis you have just replaced.
Identify the correct Storage Node by its hostname: it now matches the Original Hostname of Node value you
recorded in the work table. This value is typically of the format SystemID-DCnn-Rnn-SNnn.
c) Identify the IP addresses listed in the Private IP field.
d) Open an SSH session to the Management Node, and exit the OSMI menu.
The Linux prompt appears.
e) Open an SSH session to the Storage Node, using any one of the IP addresses you obtained from substep c.
The Linux prompt appears.
f) At the Linux prompt on the Storage Node, open the file /opt/qbase3/cfg/qconfig/main.cfg with
your text editor.
The file has a section that looks like this:
[main]
lastlogcleanup = 1428960577
domain = somewhere.com
nodetype = STORAGENODE
nodename = 90E2BA7EB831
logserver_loglevel = 6
logserver_port = 9998
logserver_ip = 127.0.0.1
qshell_firstrun = False
machineguid = fc635662-5247-45b1-ab66-d0abe8e60712

g) Replace the value after nodename = with the new MAC address from your work table, under MAC Address
of Node.

Note: The MAC address must be in uppercase and without colons. For example,
00:25:90:3B:C1:72 must be typed as 0025903BC172. A quick way to produce this form is shown
in the tip at the end of this step.

h) Save and close the configuration file.


i) Start the Q-Shell.
/opt/qbase3/qshell

j) Restart the application server:


In [1]: q.manage.applicationserver.restart()
Restarting applicationserver Applicationserver...
Stopping applicationserver Applicationserver...
Applicationserver is still running, waiting for 5
more seconds
Applicationserver is still running, waiting for 4
more seconds
Starting applicationserver Applicationserver...

k) Exit the Q-Shell.


quit()

20. Verify that the bus information of the network interfaces matches the udev rules.
a) Run the following command:

Tip: Check the hardware paths in the command below, as they might be different on the new
chassis.

root@HGST-S3-DC01-R01-SN06:~# for add in `ls /sys/devices/pci*/*/*/net/*/address`; \
do echo $add; cat $add; done

For example, the output of the above command looks like this:

/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/net/eth0/address
90:e2:ba:7e:b8:30
/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.1/net/eth1/address
90:e2:ba:7e:b8:31
/sys/devices/pci0000:00/0000:00:1c.4/0000:07:00.0/net/eth2/address
0c:c4:7a:33:38:10
/sys/devices/pci0000:00/0000:00:1c.4/0000:07:00.1/net/eth3/address
0c:c4:7a:33:38:11

Tip: As an alternative to the command above, you can use the command below to print only the
serial bus paths and MAC addresses in uppercase.
for add in `ls /sys/devices/pci*/*/*/net/*/address`; do echo -en \
"`echo $add|sed 's/\// /g' | awk '{print $5}'`\t"; cat $add|tr 'a-f' 'A-F'; done

b) Compare the output of the command above to the contents of the file /etc/udev/rules.d/70-
persistent-net.rules.
For example, the contents of this file look like this:
root@HGST-S3-DC01-R01-SN06:~# cat /etc/udev/rules.d/70-persistent-net.rules

SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:02:00.0", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:02:00.1", KERNEL=="eth*", NAME="eth1"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:07:00.0", KERNEL=="eth*", NAME="eth2"
SUBSYSTEM=="net", ACTION=="add", KERNELS=="0000:07:00.1", KERNEL=="eth*", NAME="eth3"

If they do not match, update /etc/udev/rules.d/70-persistent-net.rules to match the output of
the command above, then reboot the node again:

reboot

Sanity check: you can observe that the Storage Enclosure Basic LEDs are now solid green. In addition, the CMC shows
the node status as RUNNING.
Figure 29: Storage Node Status in the CMC

The chassis replacement procedure is done.

Warning: Be very careful when recording and updating MAC addresses. A mistake may render the new
chassis unusable.

Table 9: Work Table for Storage Node Chassis Replacement

Item                                                     Value
Virtual IP Address of the Management Node:
  Get this value as instructed in the Using the
  Administrator Interfaces chapter of the HGST Active
  Archive System Administration Guide.
Original Hostname of Node:
  Get this value from the CMC before you shut down
  the failed node. This value is typically of the format
  SystemID-DCnn-Rnn-SNnn.
Temporary IP Address of Node:
  The CMC displays this value after the new chassis is
  installed.
Temporary Hostname of Node:
  The CMC displays this value after the new chassis is
  installed. This value is of the format PM-MAC_ADDRESS.
MAC Address of Node:
IPMI IP Address of Node:
IPMI MAC Address of Node:

Serial Bus Path    MAC Address on the New Chassis    Ethernet Port Name    NIC Array ID
0000:01:00.0                                          eth0
0000:01:00.1                                          eth1
0000:01:00.2                                          eth2
0000:01:00.3                                          eth3
IPMI


2.3 Hard Disk Drive Replacement Procedure


The HDD on a Storage Node is an HGST 500 GB SATA 6 Gb/sec drive. It is hot swappable after being decommissioned
in the CMC.
Prerequisites
• Decommission the faulty drive in the CMC. For more information, see Managing Hardware in the HGST Active
Archive System Administration Guide.
• Obtain a replacement HDD from HGST.
Required Tools
• None
Time Estimate: 1 hour
To replace an HDD, proceed as follows:
1. Obtain details about the decommissioned disk from the CMC.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Disks > Decommissioned.
Figure 30: Decommissioned Disks in the CMC

b) Click the desired decommissioned disk.


The decommissioned disk details are displayed.
c) (Optional) Right-click anywhere in the decommissioned disk details, and select Print > Print to PDF.
The decommissioned disk details contain the following information, which you will need to refer to later:
• The device name.
• The model type and serial number.
• A drive map showing the exact location of the disk (name of data center, rack, and node).
• The node type.
• The MAC addresses of the node.
• The current status of the node.
Figure 31: Drive Map Showing a Decommissioned Drive on a Storage Node

2. Enable the location LED on the Storage Node.


a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the correct node.
c) In the Commands pane, click Location LED On.
3. Go to the rack and identify the correct chassis by the blinking blue LED on its front and back panels.
4. Replace the decommissioned HDD.
a) Identify the decommissioned HDD by using the drive map.

The HDD is a front-bay drive.


Figure 32: Storage Node, Front

b) Press the release button on the drive carrier of the decommissioned HDD to extend the drive carrier handle.
Figure 33: Removing a Drive Carrier

c) Pull the drive carrier out of the front bay using the drive carrier handle.
d) Compare the serial number on the HDD to the serial number specified in the decommissioned disk details to
confirm that you have the correct HDD.
e) Unscrew the drive carrier from the decommissioned HDD.
f) Screw the drive carrier onto the replacement HDD.
g) Install the replacement HDD into the same slot that the decommissioned HDD was using.
A blue LED will blink for a moment.
5. Disable the location LED on the Storage Node.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the desired Storage Node.
c) In the Commands pane, click Location LED Off.
6. Confirm that the Active Archive System correctly determines the purpose for the new disk.
a) Wait 15 minutes.
b) In the CMC, navigate to Dashboard > Administration > HGST Active Archive System Management >
Logging > Events .
c) In the Events list, check to see that a new empty disk has been detected.
d) In the Jobs list, check to see that an Initializing new disk job has been triggered.
It may take about 2 minutes for the job to appear.
e) In the CMC, navigate to Dashboard > Administration > Hardware > Servers.
f) Select the desired node.
g) Select the Disks tab.

h) Wait for the physical drive that has been replaced, as well as the logical disks, to change status from a red icon to
a green icon.

Note: The physical drive that has been replaced, as well as the logical disks, may take up to 40
minutes to change status.

The Initializing new disk job has completed successfully when the number of degraded disks decreases by
1.
7. If the disk still shows up in the Degraded or Unmanaged list, you must manually specify the purpose of the new
disk:
a) In the CMC, navigate to Dashboard > Administration > Hardware > Disks > Unmanaged.
b) Select the new disk, and in the Commands pane, click Repurpose.
c) In the Use As field, select Replacement Disk.

Note: You can only select Replacement Disk when there is a decommissioned disk. If there are
no decommissioned disks, you can only select Additional Disk as the purpose for the disk.

d) In the Replacement For field, select the decommissioned disk that you want to replace.
e) Click Next to start the repurposing.
An Initializing new disks on node_name job starts.
If you replaced the wrong disk, see Troubleshooting.


2.4 Power Supply Unit Replacement Procedure


The power supply units (PSUs) on a Storage Node are redundant, hot-swappable SuperMicro 1U 750 W 74 mm
Platinum units.
Prerequisites
• Obtain a replacement PSU from HGST.
• Ensure that the other PSU connected to this node is working, before pulling out the defective one.
Required Tools
• Ladder
Time Estimate: 5 minutes.
To replace a PSU, proceed as follows:
1. Enable the location LED on the Storage Node.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the correct node.
c) In the Commands pane, click Location LED On.
2. Go to the rack and identify the correct chassis by the blinking blue LED on its front and back panels.
3. On the back of the rack, identify the failed PSU on the node identified in the previous step.
In the image below, the PSUs are labeled P1 and P2.
Figure 34: Storage Node, Back, with PSU Status LEDs Highlighted

4. Remove the failed PSU.


a) Disconnect the power cord from the failed PSU only.
b) Push the red release tab towards the power connector of the failed PSU.
c) Pull the PSU out of the node using the grab handle.
5. Install the replacement PSU.
a) Push the replacement PSU into the Storage Node, and listen for a click.
b) Connect the power cord to the replacement PSU.
6. Disable the location LED on the Storage Node.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the desired Storage Node.
c) In the Commands pane, click Location LED Off.


2.5 MiniSAS Cable Replacement Procedure


The miniSAS cable connects a Storage Node to its paired Storage Enclosure Basic. It is a 12G miniSAS, miniSAS HD,
28 AWG, 3m or 6m.
To replace a miniSAS cable, see MiniSAS Cable Replacement Procedure on page 104.


3 Storage Interconnect Replaceable Units

This section provides replacement procedures for the following parts in a Storage Interconnect:
• Chassis
• Fan
• PSU
• SFP+ 1G Module
• SFP+ DAC Cable

Topics:
• Warnings
• Storage Interconnect Replacement Procedure
• Fan Replacement Procedure
• Power Supply Unit Replacement Procedure
• SFP+ 1G Module Replacement Procedure
• SFP+ DAC Cable Replacement Procedure

3.1 Warnings
Caution:
• All data on the Active Archive System is unavailable during repair.
• Always perform a health check after replacing a Storage Interconnect. Consult the HGST Active
Archive System Administration Guide, in the chapter "Monitoring the System".
• When you replace a Storage Interconnect with one that has firmware version 3.2.0.3 (instead of 2.2.0.5),
first upgrade the Telemetry Collection Software to version 1.0.181. Consult the HGST Active
Archive System Administration Guide, in the chapter "Upgrading the System".
• Once you have installed one Storage Interconnect with the new firmware, upgrade the firmware of all
Storage Interconnects to 3.2.0.3.

3.2 Storage Interconnect Replacement Procedure


Prerequisites
• Obtain a replacement Storage Interconnect from HGST Support.
• Obtain a Storage Interconnect configuration file from HGST Support.
Required Tools
• Ladder
• Long Phillips-head screwdriver
Time Estimate: Unknown.

Important: If Storage Interconnect 1 (SWA, lower) is down, then IPMI is down.

To replace a Storage Interconnect, proceed as follows:


1. Edit the Storage Interconnect configuration file you obtained from HGST Support as follows:
a) Modify the IP address for the syslog target to match your environment.
The syslog target is the Management Node.

b) Modify the IP address for VLAN to match your environment.


c) Modify the time and time zone to match your environment.
2. Load your modified configuration file into the replacement Storage Interconnect as follows:
a) Open a TELNET session to the replacement Storage Interconnect.
telnet switchport_IP

Replace switchport_IP with the default IP address of the new Storage Interconnect:
For Storage Interconnect 1 (lower), the default IP address is 192.168.123.123.
For Storage Interconnect 2 (upper), the default IP address is 192.168.123.123.
The login prompt appears.
Log in using the default credentials (username admin, password none or HGSTHGST, depending on
the rack's manufacture date).
The Storage Interconnect command prompt appears.
b) Type enable to enter the Privileged EXEC command mode.

(Routing)>enable
c) Type configure to enter the Privileged CONF command mode.

(Routing)#configure
(Routing) (Config)#
d) Configure the name of the switch:
Replace DC01 with the data center index you use for this data center. Replace SW01 with SW02 if you are
replacing the upper Storage Interconnect.

(Routing) (Config)#DC01-SW01
(DC01-SW01) #

The prompt changes to the new name of the switch.


e) Copy the entire contents of your modified Storage Interconnect configuration file, and paste it into the telnet
session.
f) (For 3 geo systems) Configure the uplink port 0/48:
(DC01-SW01) #configure
(DC01-SW01) (Config)#interface 0/48
(DC01-SW01) (Interface 0/48)#vlan pvid 100
(DC01-SW01) (Interface 0/48)#vlan participation include 100
(DC01-SW01) (Interface 0/48)#no shutdown
(DC01-SW01) (Interface 0/48)#no spanning-tree bpdufilter
(DC01-SW01) (Interface 0/48)#exit
(DC01-SW01) (Config)#

g) Save the new configuration.

(DC01-SW01) #write memory


Are you sure you want to save? (y/n) y
Config file 'startup-config' created successfully.
Configuration Saved!
(DC01-SW01) #

This operation may take a few minutes. Management interfaces are not available during this time.
h) Verify that the settings on the Storage Interconnect match your configuration file.

(DC01-SW01) #show startup-config

i) If the output of show startup-config does not show flow control as enabled on the Storage Interconnect,
enable it:
Type the following command:

(DC01-SW01) >enable
(DC01-SW01) #show startup-config | include flowcontrol

If the output of the command above is not flowcontrol symmetric, type the following commands:

(DC01-SW01) #configure
(DC01-SW01) (Config) #flowcontrol symmetric
(DC01-SW01) (Config) #exit
(DC01-SW01) #write memory

This operation may take a few minutes.
Management interfaces will not be available during this time.

Are you sure you want to save? (y/n) y

Config file 'startup-config' created successfully.

Configuration Saved!

(DC01-SW01) #

3. Remove the two blanking plates from the top front of the rack.
4. Identify the faulty Storage Interconnect.
Figure 35: A Single-Rack Active Archive System

5. Power off the faulty Storage Interconnect:


Disconnect the power cords on the back of the faulty Storage Interconnect.
6. Remove the faulty Storage Interconnect from the rack.

a) Disconnect the labeled network cables from the Storage Interconnect.


b) Unscrew the mounting screws from the Storage Interconnect.
c) From the front of the rack, push the Storage Interconnect out of the rack. Move around to the back of the rack and
slide it out of the rack.
7. Install the new Storage Interconnect into the rack.
a) Slide the new Storage Interconnect into the rack.
b) Tighten new Storage Interconnect screws to secure it to the rack.
c) Reconnect the network cables to the new Storage Interconnect.
The network cables are labeled.
i. Connect the cable labeled SN1.Nx.SWx.N01 to the port labeled 01 in the image below.
ii. Connect the cable labeled SN2.Nx.SWx.N03 to the port labeled 03 in the image below.
iii. Connect the cable labeled SN3.Nx.SWx.N05 to the port labeled 05 in the image below.
iv. Connect the cable labeled SN4.Nx.SWx.N07 to the port labeled 07 in the image below.
v. Connect the cable labeled SN5.Nx.SWx.N09 to the port labeled 09 in the image below.
vi. Connect the cable labeled SN6.Nx.SWx.N11 to the port labeled 11 in the image below.
vii. Connect the cable labeled CN1.Nx.SWx.N33 to the port labeled 33 in the image below.
viii. Connect the cable labeled CN2.Nx.SWx.N35 to the port labeled 35 in the image below.
ix. Connect the cable labeled CN3.Nx.SWx.N37 to the port labeled 37 in the image below.

x. (For SW1 only) And so on (refer to the Signal Cabling Scheme below).
Figure 36: Storage Interconnect Port Reservations

Figure 37: Signal Cabling Scheme

8. Reconnect the power cords to the new Storage Interconnect.


The Storage Interconnect begins to power up as soon as the power cables are connected.
9. Reattach the two blanking plates onto the top front of the rack.


3.3 Fan Replacement Procedure


The Storage Interconnect has redundant hot-swappable fans.
Prerequisites
• Obtain a replacement fan from HGST.
Required Tools
• Ladder
• Long Phillips-head screwdriver
Time Estimate: Unknown.
This section describes how to replace a fan of the Storage Interconnect.
To replace the fan, do the following:
1. Remove the two blanking plates from the top front of the rack.
2. Identify the Storage Interconnect that has the faulty fan.
Figure 38: A Single-Rack Active Archive System

3. Identify the faulty fan.


4. Squeeze the two tabs inward on the faulty fan and pull the fan out.
5. Push the new fan into the same slot.
6. Reattach the two blanking plates onto the top front of the rack.


3.4 Power Supply Unit Replacement Procedure


The Storage Interconnect has redundant hot-swappable power supply units (PSUs).
Prerequisites
• Obtain a replacement PSU from HGST.
Required Tools
• Ladder
• Long Phillips-head screwdriver
Time Estimate: Unknown.
To replace a PSU, proceed as follows:
1. Remove the two blanking plates from the top front of the rack.
2. Identify the Storage Interconnect that has the faulty PSU.
Figure 39: A Single-Rack Active Archive System

3. On the front side, identify the faulty PSU (amber colored LED).
4. Unplug the power cable of the faulty PSU.
5. Push the blue release tab towards the power connector and use the grab handle to remove the faulty PSU.
6. Push the new PSU into the Storage Interconnect.
7. Attach the power cable to the new PSU.
8. Reattach the two blanking plates onto the top front of the rack.


3.5 SFP+ 1G Module Replacement Procedure


The SFP+ 1G RJ45 module connected to a Storage Interconnect is an Intel E10GSFPSR FTLX8571D3BCV-IT.
Prerequisites
None.
Required Tools
None.
To replace an SFP+ 1G module on a Storage Interconnect, proceed as follows:
1. Identify the Storage Interconnect that has the faulty SFP+ 1G module.
Figure 40: A Single-Rack Active Archive System

a) Go to the back of the rack.


b) Look for the Storage Interconnect whose SFP+ 1G module has an amber LED illuminated on its metal connector.
2. Remove the faulty SFP+ 1G module from the Storage Interconnect.
a) Make a note of which Storage Interconnect port the SFP+ 1G module is connected to.
b) Pull the SFP+ 1G module out of the Storage Interconnect.
c) Disconnect the LC-to-LC optical cable from the SFP+ 1G module.
3. Install the new SFP+ 1G module.
a) Connect the LC-to-LC optical cable you removed from the faulty SFP+ 1G module to the new SFP+ 1G module.
The cable is reseated properly (the latch is engaged) when you hear a click.
b) Push the SFP+ 1G module into the same Storage Interconnect port.
4. Verify that the network/activity LEDs on the new SFP+ 1G module are illuminated/active.


3.6 SFP+ DAC Cable Replacement Procedure


The SFP+ cable connecting Controller Nodes to Storage Interconnects is a 10G SFP+ to SFP+ DAC Cable, 30AWG,
3m.
Prerequisites
None.
Required Tools
None.
To replace an SFP+ DAC cable, proceed as follows:
1. Identify the faulty SFP+ cable.
a) Go to the back of the rack.
b) Look for the Controller Node with a faulty SFP+ DAC cable.
The SFP+ DAC cables are in ports N2 and N3 as shown in the figure below. The faulty cable has an amber LED
illuminated.
Figure 41: Controller Node SFP+ Ports

2. Replace the faulty SFP+ DAC cable on the Controller Node end.
a) Unlatch the faulty cable by pulling very gently on its pull tab.
Once the latch is disengaged, the cable is loose.
b) Pull the faulty cable out of its port.

Warning: Do not pull the cable out by its pull tab, because the pull tab might break.

c) Plug the new SFP+ DAC cable into the same port.
The cable is reseated properly (the latch is engaged) when you hear a click.
3. Visually trace the faulty SFP+ DAC cable to the correct port on its Storage Interconnect end.
4. Replace the faulty SFP+ DAC cable on the Storage Interconnect end.
a) Unlatch the faulty cable by pulling very gently on its pull tab.
Once the latch is disengaged, the cable is loose.
b) Pull the faulty cable out of its port.

Warning: Do not pull the cable out by its pull tab, because the pull tab might break.
c) Plug the new SFP+ DAC cable into the same port.
The cable is reseated properly (the latch is engaged) when you hear a click.
5. Verify that the amber LED on the SFP+ DAC cable is off.


4 Power Distribution Unit Replaceable Units

This section provides replacement procedures for a power distribution unit (PDU):
• PDU

Topics:
• Warnings
• Power Distribution Unit Replacement Procedure

4.1 Warnings
Caution: During the replacement of a PDU, the Active Archive System is running on a single power
source.

4.2 Power Distribution Unit Replacement Procedure


Prerequisites
• Obtain a replacement power distribution unit from HGST.
Required Tools
• Ladder
• Long Phillips-head screwdriver
To replace a Power Distribution Unit (PDU) in a rack, proceed as follows:

Note: PDU01 is located on the right side when facing the back of the rack (the side of switch ports).

1. Power down the entire rack.


a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers.
b) Select each of the six Storage Nodes, and in the right pane, click Shutdown.
c) Shut down Controller Node 3, then Controller Node 2, then Controller Node 1 (in other words, shut down the
Management Node last).
2. Go to the data center and identify the defective PDU.
3. Remove the defective PDU from the rack.
a) Disconnect the external power cables from the rack.
b) Unscrew the green ground wire of the defective PDU from the rack.
c) Disconnect the internal power cables from the defective PDU.
d) Unscrew the rear support bracket of the defective PDU.
e) Unscrew the mounting screws at the front of the defective PDU.
f) Pull the defective PDU out of the rack. Be careful to feed the external power cable along with the PDU out
of the rack.

Caution: Two people should be used to unmount the PDU.

Warning: Once you pull the PDU past the pull-safety, do not leave the PDU hanging in the rack.
Otherwise, the rack rails may be damaged permanently.

4. Install the replacement PDU into the rack.

a) Feed the external power cable of the replacement PDU into the rack as you slide the replacement PDU into the
slides of the rack.

Caution: Two people should be used to mount the PDU.


b) Screw the front mounting screws of the PDU to secure it.
c) Screw the green rack ground wire to the rack.
d) Screw the rear support bracket onto the replacement PDU.
e) Connect the internal power cables to the replacement PDU.
5. Program the startup/delay times for each port on the replacement PDU.
a) Connect to the replacement PDU's Ethernet port.
i. Connect an RJ45 Ethernet cable from your computer to the ENET port on the back of the replacement
PDU.
ii. (If your computer is running Windows 7) Click Start and type Java in the search box.
iii. Choose the Configure Java application (also called the Java control panel).
Figure 42: Windows 7 Java Control Panel

iv. In the Java control panel, select Edit Site List and add an exception for the following website:
http://192.168.123.123/
Figure 43: Windows 7 Java Control Panel: Exception Site List

v. Click Add to add the IP Address listed above, and then click OK.
vi. Click Continue to add the exception.
Figure 44: Windows 7 Java Control Panel: Security Warning

vii. Click OK to save the settings.


Figure 45: Windows 7 Java Control Panel: Confirmation

viii. Start your web browser (Chrome or Firefox recommended), and type http://192.168.123.123/
in the address bar.
The PDU's login dialog appears, as in the image below.
Figure 46: PDU Login Dialog

Note: The first time this page runs in the browser, Java asks whether you want to run the
application on this page. Click Run or OK when prompted.
ix. Log into the PDU with username admin and password admin.
b) Verify the status of each electrical socket.
i. On the main menu of the PDU interface, click Status to view the PDU status.
Figure 47: PDU Main Menu

ii. The PDU status panel appears, as in the image below.

Tip: To refresh the status, click Refresh. To close the panel, click Close.

Figure 48: PDU Status Panel

c) Program the startup sequence for each port.


i. On the main menu of the PDU interface, click Configure to view/change the configuration options.
Figure 49: PDU Main Menu

ii. A Java window pops up with PDU startup sequence delay parameters.

Tip: The Reset Delay value specifies how long the electrical socket waits in the OFF
position before switching to the ON position. It may be useful to change this value when
you want to quickly reset all ports without pulling out the main power cable connected to
the rack.

Figure 50: PDU Configuration Panel

iii. For each port listed in the Name columns in each of the three sections (Branch XY, Branch YZ, and
Branch ZX), enter the values for ON Delay and Reset Delay exactly as shown in the image below.


Caution: Refer to the correct image based on whether you replaced the upper PDU
(PDU 2) or the lower PDU (PDU 1).

Figure 51: ON Delay Values for PDU 2 (Upper Horizontal PDU)

Figure 52: ON Delay Values for PDU 1 (Lower Horizontal PDU)

iv. When you are finished, click Save, wait for 30 seconds, and then click Close.


6. Power up the entire rack.


a) Connect the external power cords of the rack to two different power distribution networks.
The rack begins to power up as soon as the power cords are connected. The intelligent programmable PDUs
control the bringup sequence.
b) Confirm that all nodes power on in the right order. There is a short gap between each segment:
i. Storage Interconnect
ii. Controller Nodes
iii. Storage Enclosure Basic
iv. Storage Nodes
c) Log into the CMC.
d) Wait until the CMC displays the status of the Management Node as RUNNING; in other words, its startup is
complete.
e) Verify that the CMC dashboard indicates that the system status is good:
• Disk Safety is 5 in a single geo system, or 8 in a 3-geo system.
• Controller Nodes shows that the correct number are UP.
• Storage Nodes shows that the correct number are UP.
• MetaStores shows that the correct number are OK.
• Disks displays the correct number for your system, and none are degraded or decommissioned.
• No status indicator is red.
f) Verify that the CMC displays the status of at least 5 Storage Nodes as RUNNING:
Navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes. Check the Status field.


Chapter 5: Storage Enclosure Basic Field Replaceable Units

Topics:
• Visual Indicator and Field Replaceable Units Locations
• Sled Replacement Procedure
• Power Cord Replacement Procedure
• MiniSAS Cable Replacement Procedure
• Rear Fan Replacement Procedure
• Power Supply Unit Replacement Procedure
• I/O Canister Replacement Procedure
• Storage Enclosure Basic Capacity Upgrades

This section provides replacement procedures for the following parts in a Storage Enclosure Basic:
• Sled
• HDD
• Power Cord
• MiniSAS Cable
• Rear Fan
• PSU
• I/O Canister
• Sled Blank (to be replaced with a fully populated sled)

5.1 Visual Indicator and Field Replaceable Units Locations


The Storage Enclosure Basic displays the following visual indicators:
• I/O Canister
◆ 1 Green enclosure OK LED
◆ 1 Blue enclosure Identify LED
◆ 1 Amber enclosure Fault LED
• SAS Riser (one set of LEDs per host port, 2 sets total)
◆ 1 Green Link OK LED
◆ 1 Amber Identify/Fault LED
• Fan (one set of LEDs per fan, 3 sets total)
◆ 1 Amber Fan Identify/Fault LED
• PSU (one set of LEDs per PSU, 2 sets total)
◆ 1 Green AC OK LED
◆ 1 Green DC OK LED
◆ 1 Blue PSU Identify LED
◆ 1 Amber PSU Fault LED
• Sled (one set of LEDs per sled, 7 sets total)
◆ 1 Amber Sled Identify/Fault LED
◆ 1 Amber Drive Identify/Fault LED per drive (14 per sled)


The following diagram displays the visual indicators for the I/O canister, sled, and the rear fans in the Storage Enclosure
Basic:
Figure 53: System Enclosure Information

The following diagrams display the physical locations of the various FRUs and visual indicators in the Storage
Enclosure Basic:


Storage Enclosure Basic


Figure 54: Rear Fan Order

Figure 55: Sled HDD Order


5.2 Sled Replacement Procedure


Based on the full configuration of the Storage Enclosure Basic, an enclosure contains up to seven sleds.
Required Tools
• None
To replace a sled, do the following:

Note: Ensure that you store all removed parts in a safe location while replacing the FRU.

1. Shut down the Storage Node from the CMC.

Caution: Shut down only the Storage Node that is paired with the Storage Enclosure Basic
containing the FRU.

a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the desired Storage Node.
Figure 56: A Storage Node Pane in the CMC

c) In the Commands pane, click Shutdown.


Figure 57: The Shutdown Button in the Commands Pane

d) Wait for the Status field to change to DONE.

Warning: Even if all LEDs are off, you must still wait until the CMC shows DONE in the
Status field.

All I/O to the Storage Enclosure Basic attached to this Storage Node is now quiesced.


2. Unplug the power cables by lifting the power cord retention bale and carefully removing the power cord from the
power supply.

Note: Repeat for both the A and B power supplies.

Figure 58: Removing the Power Cords

Note: Power cords marked in red.

3. Unplug the miniSAS cables from the I/O module.


Figure 59: Removing the MiniSAS Cables


Note:
• The miniSAS cables are marked in red.
• Take note of which miniSAS cable came from which port to ensure that they are plugged in
correctly when reassembling.

4. Unlock the I/O module from the enclosure by pulling the latch handle out and away from the I/O module.
Figure 60: Unlocking the I/O Module


5. Slide the I/O module away from the chassis.


Figure 61: Removing the I/O Module


6. On the front of the sled, depress the latch mechanism button and pull it until it is at a 45 degree angle.
Figure 62: Sled Release Button

Figure 63: Sled Release at 45 Degrees


7. Pull the failed sled out of the chassis.


Figure 64: Removing the Sled

Note:
• Ensure that you remove and replace the sleds and sled blanks in the same order.
• Repeat the two previous steps until all of the sleds in need of replacement have been removed
from the chassis.


8. On the sled, slide the drive cover forward and up until the cover has been removed.
Figure 65: Sled Cover


9. To remove the hard disk drives from the failed sled, on the hard drive carrier, depress the two buttons and remove
the drive.
Figure 66: Hard Disk Drive Carrier Buttons

Figure 67: Removing the Hard Disk Drive with Carrier

10. To install the hard disk drives into the replacement sled, on the hard drive carrier, depress the two buttons and insert
the drive into the first sled slot.
11. Repeat the previous step until all drive slots within the sled are populated.
12. Install the replacement sled in the reverse order that you removed it.
13. Install the remaining enclosure components in the reverse order that you removed them.
14. Power on the Storage Node.
The power button is located on the chassis front control panel.


5.2.1 Hard Disk Drive Replacement Procedure


The hard disk drive FRU is defined as the hard disk drive with the drive carrier. This means that you need to replace the
hard disk drive along with the carrier.
The Active Archive System is designed for a high level of durability. The system protects data against a number
of simultaneous drive or Storage Node failures, as specified by the storage policies in use. The storage policies,
in combination with BitSpread's hierarchical spread rules, implement this protection, as well as protection against
data center failures in a three-geo environment. However, when drives degrade and are decommissioned, careful
consideration must be given to how and when they are replaced, because the replacement procedure is manually
intensive and, if not carried out correctly, human error can result in replacing the wrong drive, putting existing drives
back in the wrong order, putting sleds back in the wrong order, and so on. Such error, even within a three-geo system,
can lead to serious system degradation, including data loss or unavailability. Therefore, as a preface to this drive
replacement procedure, examine the guidelines below to understand how and when to do drive replacements.
Guidelines for Replacing Multiple Drives in a Single Geo Environment
Service one Storage Enclosure Basic at a time. In other words, follow this entire procedure, from start to finish, on
a single Storage Enclosure Basic, before moving on to the next one. Ensure that the Storage Enclosure Basic being
serviced, and its paired Storage Node, are back online before you repeat this entire procedure on another Storage
Enclosure Basic.

Warning: For a single geo system, this procedure involves taking the paired Storage Node offline, which
can result in data unavailability for both the large file and the small file policy. Therefore, you must put
the MetaStores into read-only mode at the start of this procedure, and then reactivate them at the end of
this procedure.

Guidelines for Replacing Multiple Drives in a Three Geo Environment


To prevent significant reduction to the environment's disk safety, service one Storage Enclosure Basic at a time. In other
words, follow this entire HDD replacement procedure from start to finish on a single Storage Enclosure Basic before
moving on to the next. Ensure that the Storage Enclosure Basic being serviced, and its paired Storage Node, are back
online before you repeat this entire procedure on another Storage Enclosure Basic.
Required Tools: None
Time Estimate: 30 minutes per HDD
Prerequisites:
Obtain replacement HDD(s) from HGST Support.
On single geo systems only, put all MetaStores into read-only mode. For instructions, see Marking a MetaStore as
Read-Only on page 131.

1. Confirm that the CMC displays the faulty drive as DECOMMISSIONED.


Navigate to Dashboard > Administration > Hardware > Disks > Decommissioned.
All decommissioned disks are listed here.
Only disks listed in the Decommissioned list are eligible for replacement. Do not attempt to replace disks that are
in the Decommissioning or Degraded list.
2. Obtain details about the decommissioned disk from the CMC.


a) In the CMC, navigate to Dashboard > Administration > Hardware > Disks > Decommissioned.
Figure 68: Decommissioned Disks in the CMC

b) Click the desired decommissioned disk.


A PDF is displayed.
c) Click Save or Print.
The PDF contains decommissioned disk details, which you need to refer to for:
• The device name
• The model type and serial number
• A drive map showing the exact location of the disk (name of data center, rack, and node)
• The type of node (CPUNODE/STORAGENODE)
• The hostname of the node (Node name)
• The MAC addresses of the node
• The current status of the node
Figure 69: Decommissioned Disk Details from the CMC

3. Get the IP address of the Storage Node.


a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the correct Storage Node based on the decommissioned disk details.
c) In the Commands pane, click Location LED On.
4. Enable the location LED on the Storage Node.
a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the correct node.
c) In the Commands pane, click Location LED On.
5. Enable the blue blinking location LED on the decommissioned drive.
a) Open an SSH session to any Controller Node.
The OSMI menu appears.
b) Exit the OSMI menu.
The Linux prompt appears.


c) Open an SSH session to the Storage Node that is paired with the Storage Enclosure Basic containing the
decommissioned drive.
d) Start the Q-Shell.

/opt/qbase3/qshell

Welcome to qshell

? -> Introduction to features.


help() -> python help system.
object? -> Details about 'object'.
object?? -> Extended details about 'object'.

Type q. and press [TAB] to list qshell library


Type i. and press [TAB] to list interactive commands

e) Identify the drive slot number based on the serial number in the decommissioned disk details you obtained in
Step 2.

Note: Replace hostname_of_storage_node with the value of Node name in the


decommissioned disk details, and replace device_serial_number with the value of Serial
in the decommissioned disk details.

In [1]:api = i.config.cloudApiConnection.find('main')

In [2]:mguid = api.machine.find(name='hostname_of_storage_node')['result'][0]

In [3]:print(api.disk.list(machineguid=mguid, serial_number='device_serial_number')
['result'][0]['bus_location'])
EXP_SLOT_69

In the example above, the drive slot number is 69.


f) Exit the Q-Shell.
quit()

g) Identify the sg number of the Storage Enclosure Basic.

lsscsi -g | grep PEAK


[0:0:42:0] enclosu HGST PIKES PEAK 0109 - /dev/sg42

In the example above, the sg number is /dev/sg42.


h) Issue the command to the Storage Enclosure Basic to enable ("set") the LED on the drive identified by the drive
slot number obtained in substep e.

Important: Subtract 1 from the drive slot number, because the index starts from 0.
For example, if the drive slot number is 69, you must use 68 in the command below.

sg_ses /dev/sg42 -p2 -I0,68 -S ident
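The slot-to-index arithmetic can also be scripted. The following is a minimal Python sketch, not part of the product tooling; it assumes the bus_location string returned by the Q-Shell query above and the /dev/sg42 enclosure device found with lsscsi:

slot = "EXP_SLOT_69"                       # bus_location value reported by the Q-Shell query
index = int(slot.rsplit("_", 1)[1]) - 1    # sg_ses element index starts at 0
print("sg_ses /dev/sg42 -p2 -I0,{} -S ident".format(index))   # command that enables the drive LED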

i) Log out of the Storage Node.


j) Log out of the Controller Node.
6. Shut down the Storage Node from the CMC.

Caution: Shut down only the Storage Node that is paired with the Storage Enclosure Basic
containing the FRU.

a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.


b) Select the desired Storage Node.


Figure 70: A Storage Node Pane in the CMC

c) In the Commands pane, click Shutdown.


Figure 71: The Shutdown Button in the Commands Pane

d) Wait for the Status field to change to DONE.

Warning: Even if all LEDs are off, you must still wait until the CMC shows DONE in the
Status field.

All I/O to the Storage Enclosure Basic attached to this Storage Node is now quiesced.
7. Go to the rack and identify the correct chassis by the blinking blue LED on its front and back panels.
8. Locate the enclosure that contains the failed hard disk drive.

Note: The enclosure containing the failed drive will have a flashing blue identification LED.

9. Unplug the power cables by lifting the power cord retention bale and carefully removing the power cord from the
power supply.


Note: Repeat for both the A and B power supplies.

Figure 72: Removing the Power Cords

Note:
• Power cords marked in red.
• Cord retention bale marked in blue.


10. Unplug the miniSAS cables from the I/O canister.


Figure 73: Removing the MiniSAS Cables

Note:
• The miniSAS cables are marked in red.
• Take note of which miniSAS cable came from which port to ensure that they are plugged in
correctly when reassembling.


11. Unlock the I/O canister from the enclosure by pulling the latch handle out and away from the I/O canister.
Figure 74: Unlocking the I/O Canister

12. Slide the I/O canister away from the chassis.


Figure 75: Removing the I/O Canister


13. Locate the failed sled by identifying the sled whose Fail/Identify indicator is blinking amber.
Figure 76: Sled HDD Order


14. On the front of the first sled, depress the latch mechanism button and pull it until it is at a 45 degree angle.
Figure 77: Sled Release Button

Figure 78: Sled Release at 45 Degrees


15. Remove the sled in need of a new hard disk drive from the chassis.
Figure 79: Removing the Sled

Note:
• Store all removed parts in a safe location.
• Ensure that you remove and replace the sleds and sled blanks in the same order.
• Repeat the two previous steps until all of the disks in need of replacement have been removed
from the sled.


16. On the sled, slide the drive cover forward and up until the cover has been removed.
Figure 80: Sled Cover

17. Identify the drive to be replaced by referring to the drive map you obtained from the CMC and the amber LED.

Tip: The correct drive is the one whose blue arrow is pointing at the illuminated amber LED.


18. To remove the hard disk drive, on the hard drive carrier, depress the two buttons and remove the drive.
Figure 81: Hard Disk Drive Carrier Buttons

Figure 82: Removing the Hard Disk Drive with Carrier

19. Install the replacement hard disk drive with carrier in the reverse order that you removed it.
20. Install the remaining enclosure components in the reverse order that you removed them.
21. Re-connect the enclosure to the power cords.
22. On the PSUs, identify that the AC and DC LEDs display green indicators.
23. Power on the Storage Node.
The power button is located on the chassis front control panel.
24. Disable the identification LED on the Storage Enclosure Basic.
a) Open an SSH session to any Controller Node.
The OSMI menu appears.
b) Exit the OSMI menu.
The Linux prompt appears.


c) Open an SSH session to the Storage Node that is paired with this Storage Enclosure Basic.
d) Disable ("clear") the drive LED using the drive slot number you obtained in Step 5e.

Important: Subtract 1 from the drive slot number, because the index starts from 0.
For example, if the drive slot number is 69, you must use 68 in the command below.

sg_ses /dev/sg42 -p2 -I0,68 -C ident

25. Disable the location LED on the Storage Node.


a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the desired Storage Node.
c) In the Commands pane, click Location LED Off.
26. Confirm that the Active Archive System correctly determines the purpose for the new disk.
a) Wait 15 minutes.
b) In the CMC, navigate to Dashboard > Administration > HGST Active Archive System Management >
Logging > Events .
c) In the Events list, check to see that a new empty disk has been detected.
d) In the Jobs list, check to see that an Initializing new disk job has been triggered.
It may take about 2 minutes for the job to appear.
e) In the CMC, navigate to Dashboard > Administration > Hardware > Servers.
f) Select the desired node.
g) Select the Disks tab.
h) Wait for the physical drive that has been replaced, as well as the logical disks, to change status from a red icon
to a green icon.

Note: The physical drive that has been replaced, as well as the logical disks, may take up to 40
minutes to change status.

The Initializing new disk job has completed successfully when the number of degraded disks decreases
by 1.
Postrequisites:
If you replaced the wrong disk, see Troubleshooting.
On single geo systems only, put the MetaStores back into read/write mode. For instructions, see Marking a MetaStore
as Read/Write on page 131.


5.3 Power Cord Replacement Procedure


Required Tools
• None
To replace a power cord, do the following:
Time Estimate: 7 minutes
1. Unplug the failed power cord by lifting the power cord retention bale and carefully removing the power cord from
the power supply.
Figure 83: Replacing the Failed Power Cord

Note: The power cord is marked in red and the power cord retention bale is marked in blue.

2. Do the following to remove the failed power cord from the rack:
a) Disconnect the failed power cord from the server.
b) From the Enclosure end, pull the power cord through the rail kit cable guides.
c) Pull the power cord up through the side of the rack rail.
d) Pull the power cord through the top of the rack.
3. Do the following to install the new power cord into the rack:

Note: Ensure the new power cord is installed in the same location as the failed power cord.
a) Run the new power cord through the top of the rack.
b) Pull the power cord down through side of the rack rail.
c) From the Enclosure end, pull the power cord through the rail kit cable guides.
d) Connect the power cable to the server.
4. Pull the power cord through I/O module cable guides.
5. Plug the power cord into the power supply unit.
6. To secure the power cord, press the power cord retention bale into the I/O module.


5.4 MiniSAS Cable Replacement Procedure


The miniSAS cable connects a Storage Node to its paired Storage Enclosure Basic. It is a 12G miniSAS HD cable,
28 AWG, 3m or 6m.
Prerequisites
None.
Required Tools
None.
To replace a miniSAS cable, do the following:
Time Estimate: 7 minutes
1. Identify the faulty miniSAS cable.
a) Go to the back of the rack.
b) Look for the Storage Node with a faulty miniSAS cable.
Each Storage Node has a miniSAS cable in ports S1 and S2 as shown in the figure below. The faulty cable has an
amber LED illuminated on its metal connector.
Figure 84: Storage Node MiniSAS Ports

2. Disconnect the faulty miniSAS cable from the Storage Node.


a) Unlatch the faulty miniSAS cable at its Storage Node end by pulling very gently on its pull tab.
b) While pulling gently on its pull tab, grasp its metal connector or cord to pull the cable out of its port.

Warning: Do not pull the cable out by its pull tab, because the pull tab might break.

3. From the I/O module of the Storage Enclosure Basic, unplug the failed miniSAS cable.


a) Disengage the latch on the faulty miniSAS cable at its Storage Enclosure Basic end by pulling very gently on its
pull tab.
Figure 85: Removing the MiniSAS Cables

Note:
• The miniSAS cables are marked in red.
• Take note of which miniSAS cable came from which port to ensure that they are plugged in
correctly when reassembling.

b) Once the latch is disengaged and the cable is loose, grasp its metal connector or cord (not its pull tab) to pull it out
of its port.
4. Do the following to remove the failed miniSAS cable from the rack:
a) From the enclosure end, pull the miniSAS cable through the rail kit cable guides.
b) Pull the miniSAS cable up through the side of the rack rail.
5. Do the following to install the new miniSAS cable into the rack:

Tip: If you are replacing the 6M cable, install it over the top of the rack for ease of replacement.

Note: Ensure the new miniSAS cable is installed in the same location as the failed miniSAS cable.

a) Connect the new miniSAS cable into the same Storage Node port.
The cable is reseated properly (the latch is engaged) when you hear a click.
b) From the Storage Node, run the new miniSAS cable through the cable guides.
c) Pull the miniSAS cable down through the side of the rack rail.
d) From the enclosure end, pull the miniSAS cable through the rail kit cable guides.
e) Plug the miniSAS cable into the I/O module.
6. Verify that the amber LED on the new miniSAS connector is off.


5.5 Rear Fan Replacement Procedure


Based on the configuration of a Storage Enclosure Basic, an enclosure can contain three, five, or seven sleds. The system
is designed for fail-in-place operational mode.
Required Tools
• None
Time Estimate: 2 minutes
To replace a rear fan, do the following:

Note:
• Ensure that you store all removed parts in a safe location while replacing the FRU.
• The rear fans are hot-swappable. The enclosure does not need to be powered down in order to replace
them.

1. From the rear of the chassis, remove the failed rear fan by depressing the release button on the top right of the fan.
Figure 86: Fan Release Button

Note: Fan release button highlighted in red.

2. Rotate the top of the fan away from the chassis until the fan pins clear the connectors on the chassis.

Note: Repeat the previous step until all of the fans in need of replacement have been removed.

Figure 87: Rear Fan

3. Remove the fan from the fan rubber bumpers on the chassis.
4. Install the replacement fan in the reverse order that you removed it.


5.6 Power Supply Unit Replacement Procedure


Required Tools
• None
Time Estimate: 3 minutes
To replace a power supply unit, do the following:

Note: Ensure that you store all removed parts in a safe location while replacing the FRU.

1. Shut down the Storage Node from the CMC.

Caution: Shut down only the Storage Node that is paired with the Storage Enclosure Basic
containing the FRU.

a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the desired Storage Node.
Figure 88: A Storage Node Pane in the CMC

c) In the Commands pane, click Shutdown.


Figure 89: The Shutdown Button in the Commands Pane

d) Wait for the Status field to change to DONE.

Warning: Even if all LEDs are off, you must still wait until the CMC shows DONE in the Status
field.

All I/O to the Storage Enclosure Basic attached to this Storage Node is now quiesced.


2. Lift the cord retention bale and unplug the power cord from the failed power supply unit.
Figure 90: Removing the Power Cord

Note:
• Cord retention bale marked in blue.
• If you are removing power supply A, you do not need to remove the miniSAS cables. To remove
power supply B, it is recommended that you remove the miniSAS cables for ease of replacement.
To remove the miniSAS cables, pull the blue tab and remove the cable from the port. Repeat for
both miniSAS cables as necessary.

3. Unlock the failed power supply unit by pulling the latch handle out and away from the I/O canister.

Note:
• The power supply unit latch handle should be at 45° when removed.


• Repeat this step for the remaining power supply unit if necessary.

Figure 91: Removing the Power Supply Unit

4. Remove the power supply unit until free of the I/O canister.
5. Install the replacement power supply unit.
6. Reconnect the miniSAS cables.
7. Plug the power cord back into the replaced power supply.
8. Power on the Storage Node.
The power button is located on the chassis front control panel.


5.7 I/O Canister Replacement Procedure


Required Tools
• None
Time Estimate: 5 to 7 minutes
To replace the I/O Canister, do the following:

Note:
• Ensure that you store all removed parts in a safe location while replacing the FRU.
• Ensure you are wearing an ESD wrist strap to complete the replacement of the I/O canister.

1. Shut down the Storage Node from the CMC.

Caution: Shut down only the Storage Node that is paired with the Storage Enclosure Basic
containing the FRU.

a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the desired Storage Node.
Figure 92: A Storage Node Pane in the CMC

c) In the Commands pane, click Shutdown.


Figure 93: The Shutdown Button in the Commands Pane

d) Wait for the Status field to change to DONE.


Warning: Even if all LEDs are off, you must still wait until the CMC shows DONE in the
Status field.

All I/O to the Storage Enclosure Basic attached to this Storage Node is now quiesced.
2. Identify the Storage Enclosure Basic that contains the failed I/O canister.

Note: To identify the failed I/O Canister, verify that the amber light is blinking.

3. Remove the miniSAS cables by pulling the blue tab and removing each cable from its port.
4. Unplug the power cables by lifting the power cord retention bale and carefully removing the power cord from the
power supply.
Figure 94: Removing the MiniSAS Cables

Note:
• The miniSAS cables are marked in red.
• Take note of which miniSAS cable came from which port to ensure that they are plugged in
correctly when reassembling.


• Repeat for both the A and B power supplies.

Figure 95: Removing the Power Cords

Note:
• Power cords marked in red.
• Cord retention bale marked in blue.

5. Wait approximately 30 seconds after the I/O canister is unplugged to continue with the replacement procedure.


6. With your palms facing up, place the pointer and middle finger into the latch handle sides.
Figure 96: Latch Handle Identification

Note:
• Latch handle marked in red.
• Rack ears marked in yellow.

7. With your thumbs on the rack ears, pull the latch handle sides and push on the rack ear release.


8. Pull the latch handle until clear of the rack ear latch.
Figure 97: Latch Handle Clear of Rack Ear

Note:
• Latch handle marked in red.
• Rack ears marked in yellow.

9. Completely remove the miniSAS cables and power cords from the I/O canister.
10. With your palms facing up, reposition your hands so that your thumbs are on the outside and your fingers cradle the
bottom and rear of either side of the I/O canister.
11. Slowly pull the I/O canister away from the chassis.


Warning: The I/O canister is very back-heavy. Ensure that you are fully supporting the component
during the removal.

Figure 98: Removing the I/O Canister

12. Install the replacement I/O canister in the reverse order that you removed it.

Note: Ensure that the I/O canister is centered properly, and press firmly to ensure you are able to latch
the replacement I/O canister.

13. Reconnect the miniSAS cables.


14. Reconnect the power cords.
15. Verify that the two green LEDs on both power supply A and B are illuminated.
16. Power on the Storage Node.
The power button is located on the chassis front control panel.


5.8 Storage Enclosure Basic Capacity Upgrades


This section provides procedures for upgrading the capacity in a minimally populated Storage Enclosure Basic.

5.8.1 About Flexible Capacity


Flexible capacity allows you to increase the capacity of a rack as your capacity needs increase.
A sled column represents one sled in each Storage Enclosure Basic array in a rack. A rack can have as few as one
populated sled column. Non-populated sled columns contain sled blanks. You can populate an additional 1-6 sled
columns by following the procedure in this section. This procedure allows capacity expansion to occur gradually or in
major increments.
The Active Archive System supports having different sled column configurations in different racks. For example, a
single geo system can contain a three-rack cluster in which two racks have all seven sled columns populated, and one
rack has three populated sled columns. However, you must ensure that, within a rack, each sled column is identical. In
other words, every Storage Enclosure Basic within a rack must have the same number of populated sleds.

5.8.1.1 Performance Impact


A capacity upgrade might cause the system to underutilize old disks.
When you populate additional sled columns in a rack, you create an imbalance in the capacity utilization of old disks
compared to new disks. To counteract the imbalance, the system writes a smaller fraction of new data to the old disks.
However, this means that when disks are a performance bottleneck, the system does not fill old disks to their maximum
capacity. Therefore, a rack with capacity upgrade(s) does not necessarily have the same performance as a rack that was
factory installed with the same capacity.
In order to achieve at least the same performance post-upgrade as pre-upgrade, the system can only fill the old disks to
the capacities shown in the following table.
The table shows that if a rack has three sled columns initially, and you add one more sled column, disks in the first three
sled columns can only be filled to 30% capacity if you want to have the same performance as a system with four factory
installed sled columns.

Table 10: Utilization of Old Disks When New Disks Are Added

Factory Installed Sled Columns | Added: 1 | Added: 2 | Added: 3 | Added: 4 | Added: 5 | Added: 6
1 | 100% | 100%  | 100% | 100%  | 100%  | 100%
2 | 50%  | 100%  | 100% | 100%  | 100%  | 100%*
3 | 30%  | 60%   | 100% | 100%* | 100%* | 100%*
4 | 20%  | 50%   | 70%  | 100%* | 100%* | 100%*
5 | 20%  | 40%   | 60%* | 80%*  | 100%* | 100%*
6 | 10%  | 30%*  | 50%* | 60%*  | 80%*  | 100%*

* Requires installation of an additional rack
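As a worked example of the table above, consider a rack with three factory installed sled columns and one added sled column. The following minimal Python sketch is an illustration only; it assumes that every sled column contributes the same raw capacity:

factory_columns = 3          # sled columns installed at the factory
added_columns = 1            # sled columns added during the upgrade
old_fill_limit = 0.30        # from Table 10: 3 factory installed columns, 1 added column

usable = factory_columns * old_fill_limit + added_columns * 1.0
total = factory_columns + added_columns
print(usable / total)        # 0.475: about 48% of raw capacity is usable at equal performance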


5.8.1.2 Durability Impact


A capacity upgrade does not change the durability of the system.
Durability of the system is mainly determined by the repair time. Repair time is mainly bottlenecked by disk IOPs. For
example, to repair a single object written with an 18/5 policy, assuming the object has 1 decommissioned drive, the
system must read 13 checkblocks and write 1 checkblock. Therefore, if this system had two sled columns initially, and
then expanded to seven sled columns when the original two sled columns were full, the repair time is still determined
by the time it takes to read the required data from the original two sled columns. On the other hand, if the same system
with two sled columns initially were expanded to three sled columns (in other words, just one additional sled column
is added), and then those three sled columns were completely filled, the repair time would be determined by the time
it takes to retrieve data from the one added sled column. As such the durability of a system is "fixed" or set at initial
installation time. A capacity upgrade does not affect the durability of the system.
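The repair-cost arithmetic referenced above can be summarized in a minimal Python sketch. It assumes that an "18/5" policy spreads an object over 18 checkblocks and tolerates the loss of 5 of them, so any 18 - 5 = 13 checkblocks are sufficient to rebuild the object:

def repair_io(checkblocks, tolerated_losses, lost=1):
    reads = checkblocks - tolerated_losses   # checkblocks read to reconstruct the object
    writes = lost                            # checkblocks rewritten
    return reads, writes

print(repair_io(18, 5))   # (13, 1), matching the example in the text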
The following table shows system durability given different capacity upgrades:

Table 11: Durability with Various Capacity Upgrades

Number of Initially Installed Sled Columns | Durability (9's)
1 | 11
2 | 12
3 | 13
4 | 14
5 | 14
6 | 15
7 | 15

5.8.1.3 Supported Hardware Abstraction Layers (HALs)

Table 12: Part Numbers and Descriptions

Part Number | Description | HAL Type
1ES0002 | Standard rack with Delta PDU. 98 drives per Storage Enclosure Basic. | HGST_S98_SN
1ES0013 | Standard rack with Wye PDU. 98 drives per Storage Enclosure Basic. | HGST_S98_SN
1ES0053 | Base 1-sled rack with Delta PDU. 14 drives per Storage Enclosure Basic. | HGST_B14_SN
1ES0077 | Base 1-sled rack with Wye PDU. 14 drives per Storage Enclosure Basic. | HGST_B14_SN
1ES0053, 1EX0079 | Base 2-sled rack with Delta PDU. 28 drives per Storage Enclosure Basic. | HGST_B28_SN
1ES0077, 1EX0079 | Base 2-sled rack with Wye PDU. 28 drives per Storage Enclosure Basic. | HGST_B28_SN
1ES0053, 1EX0079, 1EX0079 | Base 3-sled rack with Delta PDU. 42 drives per Storage Enclosure Basic. | HGST_B42_SN
1ES0077, 1EX0079, 1EX0079 | Base 3-sled rack with Wye PDU. 42 drives per Storage Enclosure Basic. | HGST_B42_SN
1ES0053, 1EX0079, 1EX0079, 1EX0079 | Base 4-sled rack with Delta PDU. 56 drives per Storage Enclosure Basic. | HGST_B56_SN
1ES0077, 1EX0079, 1EX0079, 1EX0079 | Base 4-sled rack with Wye PDU. 56 drives per Storage Enclosure Basic. | HGST_B56_SN
1ES0053, 1EX0079, 1EX0079, 1EX0079, 1EX0079 | Base 5-sled rack with Delta PDU. 70 drives per Storage Enclosure Basic. | HGST_B70_SN
1ES0077, 1EX0079, 1EX0079, 1EX0079, 1EX0079 | Base 5-sled rack with Wye PDU. 70 drives per Storage Enclosure Basic. | HGST_B70_SN
1ES0053, 1EX0079, 1EX0079, 1EX0079, 1EX0079, 1EX0079 | Base 6-sled rack with Delta PDU. 84 drives per Storage Enclosure Basic. | HGST_B84_SN
1ES0077, 1EX0079, 1EX0079, 1EX0079, 1EX0079, 1EX0079 | Base 6-sled rack with Wye PDU. 84 drives per Storage Enclosure Basic. | HGST_B84_SN

5.8.2 Prerequisites
Before upgrading the capacity of the system, determine its current state and health. The system must be
in nominal condition, running the latest software and firmware, and contain no degraded disks. If the system is not in
nominal condition, you must repair it first.

• Upgrade EasiScaleTM to version 4.1.1 or greater.


• Upgrade the firmware on the Storage Enclosure Basic I/O module to version 0115 or greater.
• Reset or decommission all degraded disks across all racks in your entire Active Archive System.
• Replace all decommissioned disks on the rack to be upgraded.
• Check for alerts that indicate that there is a non-optimal disk safety on either the normal storage policy or the small
file policy.
For example, check the event log for alerts containing lowered disk safety warnings. If the disk safety is
non-optimal, invoke repair crawling by following the steps in Invoking the Repair Process on page 131.
• Run the system health checker.
• Confirm that all Storage Nodes across all racks in the entire Active Archive System have UP status in the CMC (in
other words, they are all running, online, and reachable).
This is necessary because the capacity upgrade tool reconfigures all maintenance agents on all Storage Nodes.
• Get the rack serial number from the CMC:
a) Navigate to Dashboard > Administration > HGST Active Archive System Management > Locations >
Datacenter Management.
b) Write down the rack's serial number as displayed in the Data Center pane's Description field.
Hereafter, this procedure refers to the rack serial number as serial_number.
• Verify the serial number matches the rack name in the CMC:


a) Navigate to Dashboard > Administration > HGST Active Archive System Management > Locations >
Datacenter Management.
b) Select the data center which contains the physical rack to be upgraded.
c) In the Racks/Groups portion of the Datacenter Information pane, verify that the rack name and serial number
to be upgraded are the same value. If not, follow the steps in Setting Serial Number to Rack Name on page
134.
• Write down the post upgrade hardware abstraction layer (HAL) type from Supported Hardware Abstraction Layers
(HALs) on page 117.
• Obtain the correct replacement sleds from HGST Support.
• Bring several new, blank disks and sleds to the data center.

Warning: Bring several new, blank disks and sleds to the data center to protect against a potential
failure while initializing the new disks.

5.8.2.1 Work Table

Table 13: Work Table for Storage Enclosure Basic Capacity Upgrades

Item Value
Rack serial number
Original HAL type
New HAL type

5.8.3 Overview
This is only an overview of the procedure. Do not interpret this overview as the actual procedure.

1. Run the capacity upgrade tool in prepare mode to verify optimal disk safety, check for degraded and
decommissioned drives, and on single geo systems, to make the MetaStore(s) read-only.
2. For each Storage Node:
a) Power down the Storage Node.
b) Replace the sled blank(s) with the purchased populated sled(s).
c) Power up the Storage Node.
3. Run the capacity upgrade tool in upgrade mode to reassign the master of all namespaces (in other words, to balance
the namespaces on all storage daemons).
4. Run the capacity upgrade tool in finalize mode to add new storage daemons, initialize the blockstores, add new
maintenance agents, and, on single geo systems, to set the system back to read/write mode.

5.8.4 Capacity Upgrade Procedure


A capacity upgrade replaces some or all of the sled blanks with fully populated sleds in every Storage Enclosure Basic in
a rack.

Important: This procedure involves putting single geo systems into read-only mode.
This procedure requires you to power down each Storage Node sequentially, which would result in data
unavailability or a lowered disk safety if data were still being ingested at this time. When you power
down the first Storage Node, the system would write a new object with a safety of 2. When you power
up the first Storage Node and power down the second Storage Node, 3 checkblocks of the object would
become unavailable. This would result in a disk safety of -1, and as such, an unavailable object. The
capacity upgrade tool, hgst_capacity_on_demand.py, prevents this scenario by putting the
system into read-only mode.


Read-only mode does not prevent the system from executing repair tasks. Because all the storage
daemons on a Storage Node have the same location, there is no risk of writing unbalanced spreads
for repair activity happening during the capacity upgrade procedure. Also, since repairs never write an
object with lower safety, if a repair must replace a checkblock in a Storage Node that is currently down,
the repair is redone the next day.
Powering down one Storage Node may reduce performance; however, this is no different from a normal
upgrade or maintenance operation on a Storage Node.
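The disk-safety arithmetic described above can be expressed as a short sketch. This is an illustration only; it assumes that "safety" counts how many checkblocks an object can lose before it becomes unreadable:

safety_at_write = 2        # safety of an object written while the first Storage Node is down
blocks_on_next_node = 3    # checkblocks of that object stored on the next node to be powered down
print(safety_at_write - blocks_on_next_node)   # -1: the object is temporarily unavailable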

Warning:
• This procedure is only to be done on one rack at a time.
• Replace sleds from left to right and in sequence.
• Each Storage Enclosure Basic in any given rack must have the same number of populated sleds.
• Ensure that you store all removed parts in a safe location while replacing the sled.

To replace sled blanks with populated sleds, do the following:


1. Meet all prerequisites as instructed in Prerequisites on page 118.
2. Log into the Management Node.
a) Open an SSH session to the Management Node.
b) Type 0 to exit the OSMI menu.
3. Stop log rotation on the Management Node.
At the Linux prompt on the Management Node, type:
mv /etc/logrotate.d/apache ~

4. Run hgst_capacity_on_demand.py in prepare mode to verify optimal disk safety, check for degraded and
decommissioned drives, and, for single geo systems, to make the MetaStore(s) read-only:
At the Linux prompt on the Management Node, run hgst_capacity_on_demand.py with the --prepare
option and the rack serial number you obtained in Prerequisites on page 118:
Run this with super user access.

/opt/qbase3/utils/HGST/hgst_capacity_on_demand.py \
--rackserial serial_number --prepare

Wait for this command to complete (usually less than 5 minutes except when there are a large number of buckets
with suboptimal disk safety).
Sample output on a single geo system:

root@HGST-MINI-S3-DC01-R01-CN08:~# /opt/qbase3/utils/HGST/
hgst_capacity_on_demand.py --prepare --rackserial MINIALPS1
2016-03-18 13:35:25,352 INFO Starting capacity on demand tool
2016-03-18 13:35:25,988 INFO Assessing enclosure layout
2016-03-18 13:35:28,113 INFO Starting upgrade preparation
2016-03-18 13:35:28,122 INFO Checking for degraded disks
2016-03-18 13:35:28,140 INFO Checking for decommissioned disks on the storage nodes
to upgrade
2016-03-18 13:35:28,549 INFO Checking for namespaces with low disk safeties. This
can take quite some time on environments with a large number of namespaces
2016-03-18 13:35:28,669 INFO Switching metadatastores to read-only.
2016-03-18 13:35:29,918 INFO Writing prepare tag


2016-03-18 13:35:29,934 INFO Completed preparation

Table 14: Troubleshooting Prepare Mode Output

Problem: You want more details.
Action: Rerun the capacity upgrade tool in debug mode (add the flag --debug). For more help with the tool, include the --help flag on the command line.

Problem: You get an "insufficient access privileges" error.
Action: Rerun the capacity upgrade tool with super user access.

Problem: You get a "Please run on the master controller" error.
Action: Rerun the capacity upgrade tool on the Management Node.

Problem: The tool output is "System node name: Failed to collect device info." (the Storage Node fails to return the device list to the capacity upgrade tool), "System node name: Failed to collect storage enclosure info." (the Storage Node fails to return information on the Storage Enclosure Basic), or "System node name: Failed to collect mount info." (the Storage Node fails to return the list of mounted devices).
Action: This indicates a server communication issue. Either the Storage Enclosure Basic is failing or the communication path has issues. Hardware troubleshooting is required.

Problem: The tool output is "System node name: Storage Enclosure Firmware requires upgrade!" (the Storage Enclosure Basic firmware does not support capacity expansion).
Action: Upgrade the firmware on the Storage Enclosure Basic I/O module to version 0115 or greater.

Problem: The tool output is "System node name: n disks are not mounted!" (the Storage Node fails to remount all the disks).
Action: This error indicates that there are missing disks. Hardware troubleshooting of the Storage Node is required to repair the issues.

Problem: The tool output is "System node name: Storage Enclosure sleds not registered. Service required!" (the Storage Enclosure misregistered sleds).
Action: This error indicates that the sled upgrade was done improperly. The sleds were mis-installed.

Problem: There are decommissioned disks on existing sleds.
Action: Replace the disks, and start over from step 1. For instructions on replacing disks, see Hard Disk Drive Replacement Procedure on page 90.

Problem: There are degraded disks on existing sleds.
Action: Assess the drive and either reset or decommission it, and allow repairs to complete. Then start over from step 1. For disk troubleshooting workflows, see Managing Hardware in the HGST Active Archive System Administration Guide.

Problem: There are buckets with suboptimal disk safeties.
Action: Invoke repairs on the impacted buckets by following the instructions in Invoking the Repair Process on page 131. Allow repairs to complete before rerunning. Depending upon the size of the environment, this could take several hours. Then start over from step 1.

Problem: You want to confirm that prepare mode completed successfully.
Action: Look at the output from the tool. You should see a line similar to:
2016-03-18 13:35:29,934 INFO Completed preparation

5. Shut down a single Storage Node through the CMC.

Caution: Shut down one Storage Node only.

a) In the CMC, navigate to Dashboard > Administration > Hardware > Servers > Storage Nodes.
b) Select the desired Storage Node.
Figure 99: A Storage Node Pane in the CMC

c) In the Commands pane, click Shutdown.


Figure 100: The Shutdown Button in the Commands Pane

d) Wait for the Status field to change to DONE.

Warning: Even if all LEDs are off, you must still wait until the CMC shows DONE in the
Status field.

All I/O to the Storage Enclosure Basic attached to this Storage Node is now quiesced.
6. From the Storage Enclosure Basic, unplug the power cables by lifting the power cord retention bale and carefully
removing the power cord from the power supply.


Note: Repeat for both the A and B power supplies.

Figure 101: Removing the Power Cords

Note: Power cords marked in red.

7. Unplug the MiniSAS HD cables from the I/O module.


Figure 102: Removing the MiniSAS HD Cables

Note:


• The MiniSAS HD cables are marked in red.


• Take note of which MiniSAS HD cable came from which port to ensure that they are plugged in
correctly when reassembling.

8. Unlock the I/O module from the enclosure by pulling the latch handle out and away from the I/O module.
Figure 103: I/O Canister Handle

Figure 104: Unlocking the I/O Module


9. Slide the I/O module away from the chassis.


Figure 105: Removing the I/O Module

10. On the front of the sled, depress the latch mechanism button and pull it until it is at a 45 degree angle.
Figure 106: Sled Release Button


11. Remove the sled blank from the slot farthest to the left.
Figure 107: Removing the Sled

Note: Ensure that you remove and replace the sleds and sled blanks in the same order.

12. From the front of the enclosure, slide the populated sled in until it is fully seated.

Note: The sleds must be installed from slot A (far left of the enclosure) to slot G (far right of the
enclosure).

13. Install the remaining enclosure components in the reverse order that you removed them.
14. Power on the Storage Node.
The power button is located on the chassis front control panel.
15. Wait up to 10 minutes for the Machine was rebooted event in the CMC.
This event indicates that the Storage Node disks and services are back online. For example:
Event Type: OBS-PMACHINE-0106
Event Message: Machine was rebooted
Severity: INFO
Source: HGST-S3-DC01-R01-SN02 (90:E2:BA:7C:45:B1)
Occurrences: 2
First occurrence: 2016-03-21 10:39:16
Last occurrence: 2016-03-21 11:26:36
Details:

Tags: keep_live:0
machineguid:a50d1a4b-d8ea-4dc9-89a8-12276626bf2b
agentguid:06b311fb-a5e6-48c2-bc32-df24288b9f23
typeid:OBS-PMACHINE-0106
machinename:HGST-S3-DC01-R01-SN02

16. Repeat steps 5 through 15 for the other Storage Nodes in the rack.
17. Run hgst_capacity_on_demand.py in upgrade mode to initialize the new disks as blockstores and add
additional storage daemons and maintenance agents:


a) Open an SSH session to the Management Node.


b) Type 0 to exit the OSMI menu.
c) At the Linux prompt, run hgst_capacity_on_demand.py with the --upgrade-capacity option and
the rack serial number and new (post upgrade) hardware abstraction layer (HAL) type you wrote in the Work
Table on page 119:
Run this with super user access.
/opt/qbase3/utils/HGST/hgst_capacity_on_demand.py --haltype hal_type \
--rackserial serial_number --upgrade-capacity

Wait for this command to complete (30 minutes per sled column).
d) Verify that the CMC dashboard displays the upgraded capacity.

Table 15: Troubleshooting Upgrade Mode Output

Problem: You want more details.
Action: Rerun the capacity upgrade tool in debug mode (add the flag --debug). Check the log file in /opt/qbase3/var/log/. For more help with the tool, include the --help flag on the command line.

Problem: You get an "insufficient access privileges" error.
Action: Rerun the capacity upgrade tool with super user access.

Problem: You get a "Please run on the master controller" error.
Action: Rerun the capacity upgrade tool on the Management Node.

Problem: The HAL type does not match what you specified on the command line.
Action: A problem with the physical replacement. Verify that the HAL type you specified is correct and that all drives are present in the correct nodes.

Problem: There is a disk initialization failure.
Action: Triage like a normal bad disk. Reset the HAL type, clear the new disks and remove them from the environment. Replace the bad drive and rerun the upgrade capacity workflow.
• If the failure was a drive failure, roll back the upgrade by following the steps in Recovering From a Failed Disk Initialization on page 132, then retry the upgrade.
• If any other problem occurs and/or you do not have enough spare parts, you must roll back every Storage Node, put back the old sled blanks, and run the capacity upgrade tool in finalize mode.

Problem: There is an error with the creation of storage daemons, maintenance agents, or blockstores.
Action: The appropriate action ultimately depends upon the error.

Problem: There is an error with updating the configuration of the maintenance agents.
Action: Ensure all nodes are still online and reachable. Bring nodes online and rerun the upgrade capacity workflow.

Problem: The monitoring agent fails to restart or did not restart in time.
Action: Check the monitoring agent log for the node in the error message and assess if the agent was hung or having some other problem.

Problem: Verification of the Storage Node monitoring database failed.
Action: Run an Aggregate Storage Pool Info policy manually through the OSMI to pool monitoring data and then rerun the workflow.

Problem: Reassignment of storage daemons failed.
Action: Ensure that all storage daemons are running and reachable. Then rerun the workflow.

Problem: A Storage Node is not a recognized HAL type.
Action: You may have replaced the sled blank(s) incorrectly or not in left-to-right order. Start over again from step 3. Make sure that you specified the HAL type to which you are upgrading, rather than the existing HAL type, on the command line.

Problem: A Storage Node does not have the correct number of unmanaged disks.
Action: If any replacement sled contains disks that were previously used by EasiScaleTM or are not empty, replace these disks with new, empty ones. Check to make sure no disk or sled is incorrectly inserted. When you think the problem is fixed, rerun the capacity upgrade tool in upgrade mode (step 17).

18. Run hgst_capacity_on_demand.py in finalize mode to verify that there are no unmanaged disks (system-
wide), and, for single geo systems, to make the MetaStore(s) read/write again:
a) Open an SSH session to the Management Node.
b) Type 0 to exit the OSMI menu.
c) At the Linux prompt, run hgst_capacity_on_demand.py with the --finalize option and the rack
serial number you wrote in the Work Table on page 119:
Run this with super user access.
/opt/qbase3/utils/HGST/hgst_capacity_on_demand.py --rackserial serial_number \
--finalize

Wait for this command to complete (1 minute).

Table 16: Troubleshooting Finalize Mode Output

Problem: You want more details.
Action: Rerun the capacity upgrade tool in debug mode (add the flag --debug). For more help with the tool, include the --help flag on the command line.

Problem: You get an "insufficient access privileges" error.
Action: Rerun the capacity upgrade tool with super user access.

Problem: You get a "Please run on the master controller" error.
Action: Rerun the capacity upgrade tool on the Management Node.

Problem: There are remaining unmanaged disks.
Action: If the upgrade was executed, this should never happen. Reference the unmanaged disks section of the CMC with the log file to see if the disk was processed.
19. Start log rotation on the Management Node.


At the Linux prompt on the Management Node, type:


mv ~/apache /etc/logrotate.d/

What To Do Next
Run the system health check again. If there are problems, submit logs to HGST Support.
If the CMC Does Not Show Updated Capacity Statistics
Run the Aggregate Storage Pool Info policy an additional time through the OSMI (1 > 3 > 1) to synchronize the new
available capacity statistics.


Appendix A: Troubleshooting

Topics:
• General
• Marking a MetaStore as Read-Only
• Marking a MetaStore as Read/Write
• Invoking the Repair Process
• Recovering From a Failed Disk Initialization
• Rolling Back a Capacity Upgrade
• Setting Serial Number to Rack Name

This chapter provides troubleshooting tips.

A.1 General
Problem: The PostgreSQL partition has failed, or a NIC has failed on the Management Node.
Recommended Action: Fail over the CMC.
Warning: When you are upgrading your setup, do not execute a failover. First complete the upgrade before you start the failover.
To execute a failover, follow the instructions in Managing Hardware in the HGST Active Archive System Administration Guide.

Problem: The wrong disk was replaced.
Recommended Action: If you accidentally replace the wrong disk, it shows up in the CMC as an unmanaged disk. An unmanaged disk is a newly installed disk that the Active Archive System cannot determine a purpose for (in other words, whether it is a replacement disk or really a new disk).
Warning: Adding disks to the Active Archive System or changing the configuration of any hardware in the Active Archive System is not supported. Please contact HGST Support for more information.
Correct this problem as follows:
1. Physically remove the new disk, and replace it with the disk that was accidentally removed.
2. In the CMC, navigate to Dashboard > Administration > Hardware > Disks > Unmanaged.
3. Select the new disk, and in the Commands pane, click Delete.
When you first remove the disk through the CMC, the disk will most likely be added again by the monitoring agent before you can actually remove the disk from the node. If this happens, repeat the steps above to delete the disk again.

Problem: You shut down a node in order to replace it or something in it, but when you powered on the new/fixed node, it did not boot or was not detected by the CMC.
Recommended Action: Connect a monitor to the node's VGA port, and a keyboard to its USB port. Restart the node. Observe any error messages that it outputs.

A.2 Marking a MetaStore as Read-Only


To manually mark a MetaStore as READONLY, do the following.
1. In the CMC, navigate to Dashboard > Administration > Storage Management > MetaStores.
2. In the Status column of the desired MetaStore, select the option READONLY from the menu.
You can also set the MetaStore status by opening its details pane and clicking Edit.

A.3 Marking a MetaStore as Read/Write


To manually mark a MetaStore as READ/WRITE, do the following.
1. In the CMC, navigate to Dashboard > Administration > Storage Management > MetaStores.
2. In the Status column of the proper MetaStore, select the option READ/WRITE from the dropdown menu.
You can also set the status by opening the details of the MetaStore and starting the Edit wizard.

A.4 Invoking the Repair Process


The repair process can be initiated with the procedure below. In normal runtime situations, repair actions on
buckets are spread out over a 24-hour period; that is, each bucket is assessed for repairs once per day. Invoking
repairs on all buckets at the same time may introduce a performance hit because all buckets will begin crawling.
Even with this procedure, depending upon the number of objects in the environment, it may take 24-48 hours for
the disk safety to return to normal. Additionally, the repair activity might impact normal customer performance,
because repairs execute on all buckets rather than being staggered throughout the day.

Note: This procedure does not restart maintenance agents. While restarting the maintenance agents
would force all agents to start working on repairs immediately, it may have the undesired side effect of
interrupting an existing repair, which prolongs the time it takes for disk safety to return to an optimal
level.

1. From the Q-Shell on the Management Node, trigger a repair crawl on all storage daemons:
for sd in q.dss.manage.listStorageDaemons(count=1024).keys():
    ipaddr = q.dss.manage.showStorageDaemon(sd)['address'].split(':')[0]
    port = int(q.dss.manage.showStorageDaemon(sd)['address'].split(':')[1])
    q.dss.manage.repairStartCrawl(nodeIP=ipaddr, port=port)

The maintenance agent logs on the storage nodes (/opt/qbase3/var/log/dss/maintenanceagents/<id>.log) will indicate
that the daemon is executing repairs. If the maintenance agents do not appear to be actively executing, a monitor
crawl can be run against all storage daemons to pull in the latest data:
for sd in q.dss.manage.listStorageDaemons(count=1024).keys():
    q.dss.manage.monitorMaster(sd, realtime=True)
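
Tip: On systems with many storage daemons, it can help to see which daemon is being triggered as the repair
crawl loop runs. The following is a minimal sketch of the same loop with a progress line per daemon; it assumes,
as in the loop above, that the daemon's 'address' field has the form host:port.

for sd in q.dss.manage.listStorageDaemons(count=1024).keys():
    addr = q.dss.manage.showStorageDaemon(sd)['address']
    ipaddr, port = addr.split(':')[0], int(addr.split(':')[1])
    # Print which daemon is being triggered before starting its crawl.
    print "Starting repair crawl on daemon %s (%s)" % (sd, addr)
    q.dss.manage.repairStartCrawl(nodeIP=ipaddr, port=port)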


2. Watch the monitorStoragePool output to see when the objects have returned to normal disk safety:
q.dss.manage.monitorStoragePool()
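
Tip: Because disk safety can take a day or more to recover, you can poll the command periodically instead of
re-typing it. This is only a rough sketch: it assumes monitorStoragePool() emits its report each time it is
called, as in the command above, and the interval and iteration count are arbitrary values you should adjust.

import time

# Poll the storage pool monitor every 15 minutes for roughly 12 hours (sketch values).
for _ in range(48):
    q.dss.manage.monitorStoragePool()
    time.sleep(900)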

A.5 Recovering From a Failed Disk Initialization


This procedure explains how to recover from a failed disk initialization.
A rollback addresses the scenario in which there are mechanical or hardware failures during the capacity upgrade, and
no spare parts are available. It removes any unmanaged disks or initialized disks without blockstores. This rollback is
possible up until the first new storage daemon is created (which occurs after all disks have been initialized); after this
point, the capacity upgrade procedure is irreversible and must be completed on the rack. In other words, this rollback is
only possible before you run the capacity upgrade tool in upgrade mode.
1. Identify in the kernel logs which disk is problematic and must be replaced.
2. Identify the slot for that disk.
Run the following commands at the Linux prompt on the node which experienced the failure.

Tip: Replace device_path with the appropriate value, such as /dev/sdt.

HDD='device_path'
SLOT="$(sg_ses `lsscsi -g | grep -E 'PIKES|STOR ENCL JBOD' | awk {'print $NF'}` -jj \
  | grep `lsscsi -tg | grep -w $HDD | awk {'print $3'} | cut -b5-` -B18 \
  | grep SLOT | awk '{print $1,$2,$3,$4}')"
echo $HDD "is in " $SLOT

3. Revert the node to its former state.


Run the following commands in the Q-Shell on the node which experienced the failure:
a) Before acting on a disk, be very sure it is a new one.

Tip: Replace machine_hostname and old_HAL_type with the appropriate values.

api = i.config.cloudApiConnection.find('main')
mname = 'machine_hostname'
original_hal_type = 'old_HAL_type'
mguid = api.machine.find(name=mname)['result'][0]

For example,

api = i.config.cloudApiConnection.find('main')
mname = 'HGST-S3-DC01-R01-SN05'
original_hal_type = 'HGST_B14_SN'
mguid = api.machine.find(name=mname)['result'][0]
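
Tip: Before taking the GUID, you can optionally confirm that the hostname resolved to exactly one machine. The
sketch below only re-uses the api.machine.find() call from the step above; an unexpected count usually means
machine_hostname was mistyped.

# Optional sanity check: fail fast if the hostname did not match exactly one machine.
found = api.machine.find(name=mname)['result']
if len(found) != 1:
    raise RuntimeError("Expected exactly one machine named %s, found %d" % (mname, len(found)))
mguid = found[0]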

b) Get all disks without blockstores.


all_current_bs_disks = []
for bsguid in api.application.find(machineguid=mguid, name='dssblockstore')['result']:
    bsobj = api.application.getObject(bsguid)
    for service in bsobj.services:
        if service.name == 'dssblockstore':
            all_current_bs_disks.append(service.service2disks[0].diskguid)

c) Find all enclosure disks.


all_potential_bs_disks = []
for dinfo in api.disk.list(machineguid=mguid, disktype='RAWDEVICE')['result']:
    if dinfo['size'] > 2000000.0:
        all_potential_bs_disks.append(dinfo)

d) Create a list with only disks without blockstore applications.


disks_without_bs = []
for dinfo in all_potential_bs_disks:
    if dinfo['guid'] not in all_current_bs_disks:
        disks_without_bs.append(dinfo)
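
Tip: The next steps remove partitions and wipe these disks, so it is worth reviewing the list first. This optional
sketch only prints fields already used elsewhere in this procedure (devicename and serial_number) for each disk in
disks_without_bs.

# Optional sanity check: review exactly which disks will be cleaned before continuing.
for dinfo in disks_without_bs:
    print "%s (serial %s)" % (dinfo['devicename'], dinfo['serial_number'])
print "Disks without blockstores: %d" % len(disks_without_bs)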

e) Remove all partitions.


for dinfo in disks_without_bs:
    dguid = dinfo['guid']
    partitions = api.disk.listPartitions(dguid)['result']
    for partition in partitions:
        if 'dss' in partition['label']:
            api.disk.removePartition(dguid, partition['order'], remove_from_system=True)

f) Clean the old drives.


ip = api.machine.getManagementIpaddress(mguid)['result']
user = api.machine.getObject(mguid).accounts[0].login
passwd = api.machine.getObject(mguid).accounts[0].password

from framework.utils import system

for dinfo in disks_without_bs:
    command = "smartctl -a %s | grep 'Serial number' | awk '{print $3}'" % dinfo['devicename']
    _, serial, _ = system.execute_remote_command([ip], user, passwd, command)
    if dinfo['serial_number'] == serial.strip():
        command = "dd if=/dev/zero of=%s bs=1M count=10" % dinfo['devicename']
        rc, output, _ = system.execute_remote_command([ip], user, passwd, command)
        if rc != 0:
            print "Disk not wiped. Error: %s" % output

g) Decommission the drives.


for dinfo in disks_without_bs:
    dguid = dinfo['guid']
    api.disk.decommission(dguid, trigger_repair=False)

h) Delete the disks.


for dinfo in disks_without_bs:
    dguid = dinfo['guid']
    api.disk.delete(dguid)

i) Reset the HAL type.


api.machine.updateModelProperties(mguid, hardware_type=original_hal_type)

4. Power down the node physically and replace the problematic disk / sled.
5. Rerun the capacity upgrade tool.

A.6 Rolling Back a Capacity Upgrade


To roll back a capacity upgrade, do the following:
1. Follow the steps in Recovering From a Failed Disk Initialization on page 132 for every Storage Node in the
physical rack.


2. Run hgst_capacity_on_demand.py in finalize mode to verify that there are no unmanaged disks (system-
wide), and to make the MetaStore(s) read/write again (for single geo systems only):
a) Open an SSH session to the Management Node.
b) Type 0 to exit the OSMI menu.
c) At the Linux prompt, run hgst_capacity_on_demand.py with the --finalize option and the rack
serial number you wrote in the Work Table on page 119:
Run this with super user access.
/opt/qbase3/utils/HGST/hgst_capacity_on_demand.py --rackserial serial_number \
--finalize

Wait for this command to complete (1 minute).

Table 17: Troubleshooting Finalize Mode Output

Problem: You want more details.
Action: Rerun the capacity upgrade tool in debug mode (add the --debug flag). For more help with the tool,
include the --help flag on the command line.

Problem: You get an "insufficient access privileges" error.
Action: Rerun the capacity upgrade tool with super user access.

Problem: You get a "Please run on the master controller" error.
Action: Rerun the capacity upgrade tool on the Management Node.

Problem: There are remaining unmanaged disks.
Action: This should not happen if the upgrade completed successfully. Check the unmanaged disks section of the
CMC against the log file to see whether each disk was processed.
3. Start log rotation on the Management Node.
At the Linux prompt on the Management Node, type:
mv ~/apache /etc/logrotate.d/

A.7 Setting Serial Number to Rack Name


Update the physical rack's serial number to match the rack name.
Skip this section if the physical rack serial number and the rack name are already the same value. See
Prerequisites on page 118.
1. Open an SSH session to the Management Node.
The OSMI menu appears.
2. Type 3 for Machines, Physical Racks and Services.
3. Type 8 for Physical Racks.
4. Type 2 for Modify physical rack.
5. If there are multiple physical racks, enter the number corresponding to the physical rack to change.
Otherwise, the physical rack is selected for you and you can skip this step.
6. Enter the physical rack's serial number. This should be the same as the rack name and match the serial
number printed on the bottom rear of the rack.


Active Archive System Glossary

H
Hand Tools: Any tool that is readily available in the marketplace and operated by hand.
HDD: Hard Disk Drive

S
SEP: SCSI Enclosure Processor. A group of SAS expanders which are located in the same JBOD/Server enclosure.
A SEP operates as a single customer-visible functional unit to provide enclosure services functionality.
SES: SCSI Enclosure Services

V
VPD: Vital Product Data. Field replaceable unit part number, serial number, and so on, stored in an I2C
EEPROM.


Index
A
acronyms 135

C
capacity upgrade rollback 133
capacity upgrade tool 116, 117, 118, 119
capacity utilization 116
checkblock 116, 117
conventions 3
copyright 2

D
daemon
    storage 119, 131, 132, 133
data unavailability 116, 117
decommissioned disk details 30, 34, 56, 90
decommissioned disks 118, 119
degraded disks 118, 119
disk
    unmanaged 130
    wrong 130
disk safety 90, 116, 117, 118, 119, 131, 132
drive map 30, 34, 56, 90

F
failover 130
field replaceable unit (FRU) 130
flexible capacity 116, 117

H
HAL type 132, 133
hardware abstraction layer (HAL) type 118, 119
hardware abstraction layer (HAL) type (model type) 117
hgst_capacity_on_demand.py 118, 119
hostname 13
hot swappable 30, 34, 56
Hugo 119

I
IOPs 117
item numbers 117

L
location LED 90

M
machine name 13
maintenance agent 119, 131, 132, 133
MetaStore
    read-only 131
    system
        env_metastore 13
        framework 13
miniSAS cable 60, 104
model type 117
monitoring agent 131

N
NIC 130
node
    controller
        chassis 13
        FRU list 13
        HDD 30
        PSU 38
        SSD 34
    storage
        chassis 40
        FRU list 40
        HDD 56
        PSU 59
normal file policy 90

P
part number (P/N) table 118, 119
part numbers 117
PDU
    FRU list 70
populated sled 118, 119
PostgreSQL 34, 130

R
rack
    serial number
        setting 134
        viewing 118, 119
read-only mode 90, 118, 119
related documents 4
repair 118, 119, 131, 132, 133
repair time 117

S
SFP+ 1G module 68
SFP+ DAC cable 39, 69
single geo 118, 119
sled
    blank 118, 119
    populated 118, 119
    replacement
        procedure 82
sled blank 116, 118, 119
sled column 116, 117
small file policy 90, 118
smartctl 119
storage enclosure basic
    capacity upgrades 116
    FRU list 79
    HDD
        replacement procedure 90
    I/O canister
        replacement procedure 110
    miniSAS cable
        replacement procedure 104
    power cord
        replacement procedure 103
    PSU
        replacement procedure 107
    rear fan
        replacement procedure 106
    sled
        population procedure 118, 119
    visual indicator
        FRU location 79
storage interconnect
    fan 66
    FRU list 61
    PSU 67
    whole 61
storage policy 118

T
three geo 119
troubleshooting 130
typography 3

V
virtual safety 116, 117

W
warnings 13, 40, 61, 70
weight 5
