Daniel Burchett 8:34 AM

Good morning ..
Ok, current status ...
We were preparing to reboot the storage cell np1exda001cel02, since it was the one we were getting all the alerts on. I could not find anything within the alert logs or on the storage cell.
So, I performed the check on the storage cell, everything was ready to go ..
Gudivada, Srikanth 8:36 AM
ok
Daniel Burchett 8:37 AM
Talked to Mark to convince him to let us go forward. When I re-did the check, it started showing errors. The alert logs on all three ASM instances and the nodes also started showing disk errors at 11:48AM.
Gudivada, Srikanth 8:37 AM
yes, we have gone through your mail
Daniel Burchett 8:38 AM
At that point, Andy Porras and I stopped our work and opened a Sev 1 Oracle SR. After several hours, they determined that there was a single failed disk drive and sent an Oracle Engineer to replace the drive.
However, after they replaced one of the drives and rebalancing started in ASM, the storage cell still showed the following:
Gudivada, Srikanth 8:39 AM
oh
Daniel Burchett 8:39 AM
CellCLI> list griddisk
DATA_CD_00_np1exda001cel02 active
DATA_CD_01_np1exda001cel02 active
DATA_CD_02_np1exda001cel02 proactive failure
DATA_CD_03_np1exda001cel02 not present
DATA_CD_04_np1exda001cel02 active
DATA_CD_05_np1exda001cel02 active
DBFS_DG_CD_02_np1exda001cel02 proactive failure
DBFS_DG_CD_03_np1exda001cel02 not present
DBFS_DG_CD_04_np1exda001cel02 active
DBFS_DG_CD_05_np1exda001cel02 active
RECO_CD_00_np1exda001cel02 active
RECO_CD_01_np1exda001cel02 active
RECO_CD_02_np1exda001cel02 proactive failure
RECO_CD_03_np1exda001cel02 not present
RECO_CD_04_np1exda001cel02 active
RECO_CD_05_np1exda001cel02 active
I still have 6 3TB drives not usable.
I have asked Andy to add this to the SR and see what Oracle says.
So, Will this resolve the performance issue today? I give it about 50/50
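
[Editor's note] The griddisk listing above can be tallied mechanically. The following is a minimal Python sketch, not anything used in the conversation: it assumes each line is a grid disk name followed by a status that may contain spaces, and the helper names (`parse_griddisks`, `cell_disk`) are hypothetical. On the pasted listing it finds six grid disks that are not active, spread across two cell disks (CD_02 and CD_03).

```python
import re
from collections import Counter

# Listing copied from the chat (CellCLI> list griddisk on np1exda001cel02).
LISTING = """\
DATA_CD_00_np1exda001cel02 active
DATA_CD_01_np1exda001cel02 active
DATA_CD_02_np1exda001cel02 proactive failure
DATA_CD_03_np1exda001cel02 not present
DATA_CD_04_np1exda001cel02 active
DATA_CD_05_np1exda001cel02 active
DBFS_DG_CD_02_np1exda001cel02 proactive failure
DBFS_DG_CD_03_np1exda001cel02 not present
DBFS_DG_CD_04_np1exda001cel02 active
DBFS_DG_CD_05_np1exda001cel02 active
RECO_CD_00_np1exda001cel02 active
RECO_CD_01_np1exda001cel02 active
RECO_CD_02_np1exda001cel02 proactive failure
RECO_CD_03_np1exda001cel02 not present
RECO_CD_04_np1exda001cel02 active
RECO_CD_05_np1exda001cel02 active
"""


def parse_griddisks(text):
    """Return (name, status) pairs; the status may be more than one word."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, status = line.split(None, 1)
        pairs.append((name, status.strip()))
    return pairs


def cell_disk(name):
    """Extract the cell disk tag (e.g. 'CD_02') from a grid disk name."""
    match = re.search(r"_(CD_\d+)_", name)
    return match.group(1) if match else None


pairs = parse_griddisks(LISTING)
bad = [(name, status) for name, status in pairs if status != "active"]
by_status = Counter(status for _, status in bad)
disks = sorted({cell_disk(name) for name, _ in bad})

print(len(bad), "grid disks not active:", dict(by_status))
print("cell disks affected:", disks)
```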
Yunus Ali, Shaik 8:41 AM
hi
Gudivada, Srikanth 8:41 AM
ok
Daniel Burchett 8:42 AM
There is a bug in the firmware of the drives such that when one fails, it can impact the other drives around it and degrade performance ..
Yunus Ali, Shaik 8:42 AM
ok
Daniel Burchett 8:42 AM
Hi Shaik, can you see what I have already typed?
Gudivada, Srikanth 8:43 AM
Means, we have to wait again for Oracle's suggestion
Daniel Burchett 8:43 AM
yes.
Yunus Ali, Shaik 8:43 AM
ok
got it
Daniel Burchett 8:43 AM
However, it "appears" from my perspective that the queries are moving faster than they were earlier today ..
Shaik Yunus Ali 8:44 AM
ok
Daniel Burchett 8:44 AM
So, here is the action plan.
Andy and his hardware team will keep the SR on the storage cell under their control.
Daniel Burchett 8:47 AM
We will need to open an SR in case there is more than one issue here. Shaik or Srikanth, can one of you open an SR on the poor query performance and upload the trace files and logs they may need? I believe they have a tool to load up everything (AWRs, traces, etc.) and get it from the three nodes.
Shaik Yunus Ali 8:48 AM
ok
Gudivada, Srikanth 8:48 AM
sure
Daniel Burchett 8:48 AM
If the ODS team starts to have issues, raise it from a Sev 3 to Sev 1, contacting the duty manager immediately to get Oracle working on it ...
I believe Jon Burgess sent an email on the escalation method.
I will look for it and forward it to you if you do not have it.
Gudivada, Srikanth 8:50 AM
we don't have it
Daniel Burchett 8:50 AM
On another note, did you see the email on Oracle's response to the ClickSched/CSFT disconnect issue?
:)
Yunus Ali, Shaik 8:50 AM
ys
yes
Gudivada, Srikanth 8:50 AM
yes
Daniel Burchett 8:51 AM
I hope he settles down and looks at the issue logically.
Yunus Ali, Shaik 8:51 AM
ok
Means these changes need to be applied on the Target side, but not on the Exadata
Daniel Burchett 8:52 AM
yes. 11.2.0.3 has the fix
Yunus Ali, Shaik 8:53 AM
:)
Gudivada, Srikanth 8:53 AM
Is there anything else we have to follow now on the ODS database?
Yunus Ali, Shaik 8:54 AM
I am creating the SR# on this CSI
18972029
only
Shall I go ahead?
Daniel Burchett 8:55 AM
Just watch the alert logs .. I need you to work with the hardware team offshore if they need help. Shaik, they may need some help on checking ASM and rebalancing.
Yunus Ali, Shaik 8:55 AM
sure
Daniel Burchett 8:55 AM
Yes, go ahead and open an SR..
Yunus Ali, Shaik 8:56 AM
ok
GROUP_NUMBER OPERA STAT POWER ACTUAL  SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE
------------ ----- ---- ----- ------ ------ -------- -------- ----------- ----------
           1 REBAL RUN      2      2 118269  1523606     2111         665
           3 REBAL WAIT     2
SQL>
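
[Editor's note] The EST_MINUTES figure in the output above is consistent with simple arithmetic on the other columns. A minimal Python sketch, assuming EST_RATE is a per-minute rate and that the remaining work is EST_WORK minus SOFAR; the function name `est_minutes` is hypothetical, and the exact rounding Oracle applies is an assumption:

```python
def est_minutes(sofar, est_work, est_rate):
    """Whole minutes left, assuming a constant rate of est_rate units per minute."""
    if est_rate <= 0:
        raise ValueError("est_rate must be positive")
    return (est_work - sofar) // est_rate


# Values from the REBAL RUN row for disk group 1 in the pasted output.
remaining = est_minutes(sofar=118269, est_work=1523606, est_rate=2111)
print(remaining, "minutes, roughly", round(remaining / 60, 1), "hours")
```

With the values from the chat this gives 665 minutes, matching the EST_MINUTES column, so at power 2 the first rebalance had roughly eleven hours left, with disk group 3 still waiting behind it.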
Daniel Burchett 8:57 AM
we have a call at 10:30 CST
Yunus Ali, Shaik 8:57 AM
i am on top of it Danny
Daniel Burchett 8:57 AM
I know you would be ..:)
Yunus Ali, Shaik 8:58 AM
once done i will inform
Daniel Burchett 8:58 AM
Thank you to both of you for hanging around most of your evening. I really appreciate it ..
When I get on the track of troubleshooting an issue, I really do not like interruptions from mgmt ..:D
Yunus Ali, Shaik 8:59 AM
:)
Gudivada, Srikanth 8:59 AM
thanks,and welcome:)
Daniel Burchett 9:02 AM
Pradeep Reddy is the Hardware contact
Yunus Ali, Shaik 9:02 AM
ok Danny
Daniel Burchett 9:02 AM
David Clearwater is the mgr ..
Yunus Ali, Shaik 9:02 AM
ok
Gudivada, Srikanth 9:02 AM
ok
Daniel Burchett 9:06 AM
https://meet.Halliburton.com/neeraj.chauhan/R0SER73Z
Join by Phone
281 871 4999
88 250 4999
Find a local number (https://dialin.halliburton.com/)
Conference ID: 21560074
Daniel Burchett 9:18 AM
yes sir
Yunus Ali, Shaik 9:18 AM
create sr sir.
created*
Daniel Burchett 9:18 AM
nice
Daniel Burchett 9:21 AM
Thank you guys .. any questions ?
Yunus Ali, Shaik 9:21 AM
no questions
Daniel Burchett 9:21 AM
On the storage cell.. The following commands can give you some information on the disks.
cellcli {hit enter}
CELLCLI> Llist grid
CELLCLI> list physicaldisk detail
CELLCLI> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
CELLCLI> HELP
The last gives you the commands ..
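
[Editor's note] The third command above prints one line per grid disk with its ASM mode status and deactivation outcome. A minimal Python sketch of how that output could be screened before a cell is taken offline, assuming (per the reboot procedure in the Oracle note cited later in this chat) that every grid disk should show ONLINE and Yes. The sample text, the grid disk names in it, and the function name `blocking_griddisks` are hypothetical illustrations, not output from this incident.

```python
# Hypothetical sample of:
#   list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
SAMPLE = """\
DATA_CD_00_cel01    ONLINE   Yes
DATA_CD_01_cel01    ONLINE   Yes
DATA_CD_02_cel01    UNKNOWN  Cannot de-activate due to other offline disks in the diskgroup
"""


def blocking_griddisks(text):
    """Return (name, mode, outcome) for grid disks that are not ONLINE / Yes."""
    blockers = []
    for line in text.splitlines():
        fields = line.split(None, 2)
        if not fields:
            continue
        # Pad missing columns so an incomplete line is reported, not skipped.
        fields += [""] * (3 - len(fields))
        name, mode, outcome = (field.strip() for field in fields)
        if mode != "ONLINE" or outcome != "Yes":
            blockers.append((name, mode, outcome))
    return blockers


for name, mode, outcome in blocking_griddisks(SAMPLE):
    print(f"{name}: asmmodestatus={mode}, asmdeactivationoutcome={outcome}")
```

An empty result would suggest the cell can be taken offline without ASM dropping disks; any entry is a reason to stop and investigate first.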
Yunus Ali, Shaik 9:24 AM
ok
Daniel Burchett 9:24 AM
and it is list grid , not Llist
Yunus Ali, Shaik 9:24 AM
Thanks Danny, with this hit I am going to initialize myself on Exadata :D
thanks for the opportunity
Daniel Burchett 9:25 AM
No problem... See whay I am concerned with the HP DBA team taking over the Exadata?
whay/why
Yunus Ali, Shaik 9:25 AM
yes...
Daniel Burchett 9:25 AM
They do not have a clue .. and if they set an event in the ASM by mistake, they can cause trouble with the storage cells.
Shaik Yunus Ali 9:26 AM
yes
Daniel Burchett 9:27 AM
HALRT-02003: Data hard disk failure (Doc ID 1113013.1)
This is a guide of what may need to be done after a drive replacement;
https://support.oracle.com/epmos/faces/DocumentDisplay?id=1386147.1&parent=WIDGET_RECENTLY_VIEWED&sourceId=1113013.1
How to Replace a Hard Drive in an Exadata Storage Server (Hard Failure) (Doc ID 1386147.1)
Steps to shut down or reboot an Exadata storage cell without affecting ASM (Doc ID 1188080.1)
I am going to call it done and go to sleep ...
Yunus Ali, Shaik 9:30 AM
ok
Daniel Burchett 9:30 AM
Call me ONLY if it is something with the Exadata servers :). If I'm not having a nightmare about getting phone calls from India, I might answer. :)
Yunus Ali, Shaik 9:30 AM
After the rebalance is done, is there any role for a DBA, from an Exadata perspective?
Daniel Burchett 9:30 AM
I know you guys can handle everything else.
Yunus Ali, Shaik 9:30 AM
any issuing of the command..
;)
thanks for the trust.
Daniel Burchett 9:31 AM
No, just monitor .. No, it is pretty much automatic ..
Yunus Ali, Shaik 9:31 AM
ok
Gudivada, Srikanth 9:31 AM
ok, danny got it
Daniel Burchett 9:31 AM
The only thing I might do is increase the power from 2 to power 8
alter diskgroup XXXX rebalance power 8;
but you have to wath the load on the servers ...
wath/watch
Yunus Ali, Shaik 9:32 AM
yes got u
What is the size of the disk?
Daniel Burchett 9:33 AM
3TB
name: 20:10
deviceId: 18
diskType: HardDisk
enclosureDeviceId: 20
errMediaCount: 0
errOtherCount: 0
luns: 0_10
makeModel: "HITACHI H7230AS60SUN3.0T"
physicalFirmware: A310
physicalInsertTime: 2013-03-23T21:08:57-05:00
physicalInterface: sas
physicalSerial: RTDPDD
physicalSize: 2794.5199813842773G
slotNumber: 10
status: normal
Yunus Ali, Shaik 9:34 AM
Increase .... as per past experience I used to give 8 or 11 on 10g and 11g
I will keep a watch on CPU performance
Daniel Burchett 9:34 AM
yes, on the exadata you could go higher ...
Yunus Ali, Shaik 9:34 AM
if any issues , i will decrease it back to 2
Daniel Burchett 9:34 AM
maybe 4
Yunus Ali, Shaik 9:34 AM
k
Daniel Burchett 9:35 AM
Ok, have a great day and hopefully I'll talk to you guys in 8, 9, maybe 10 hours from now <:o)
Yunus Ali, Shaik 9:35 AM
ok
Daniel Burchett 9:36 AM
good night
Yunus Ali, Shaik 9:37 AM
ok Danny .
Gudivada, Srikanth 9:37 AM
have a great sleep