
FORECASTING DATABASE DISK SPACE REQUIREMENTS:

A POOR MAN'S APPROACH


Prepared for the CMG Conference Committee
32nd Annual International Conference of The Computer Measurement Group, Inc.
December 4th–9th, 2006
Reno, Nevada
Edward L. Trettel
Northwest Airlines, Inc.
In the absence of special-purpose monitoring and/or modeling software designed specifically for forecasting database disk space requirements, a solution was developed using general-purpose database facilities and office suite products. All of the results were based on linear regression analyses. They were (1) an understanding of the previously unknown trends and patterns in the use of disk space by individual databases, (2) the ability to forecast, accurately and proactively, the additional disk space needed for individual databases, and (3) the ability to reclaim disk space that was forecast to remain unused.
Introduction
“Necessity, who is the mother of invention.”
Plato, The Republic
Greek author and philosopher
(427 BC – 347 BC)
Organizations deploy more databases each year, and those databases keep growing in size and complexity. As a result, managing those instances effectively becomes a challenge for the IT staff. This is especially true for the people charged with maintaining database integrity, availability, performance, and recoverability. Reactive practices that depend on eleventh-hour individual heroics no longer make the grade in today's business environment. To deliver appropriate IT service, one needs to create and adopt repeatable, proactive methodologies.
In a previous position with his present employer, the author was a member of a 9-person database administration team. The team provided operational and production support for 200 distributed database instances, which contained 400 databases.
The database management systems (DBMSs) in use were:
• Oracle Server (Oracle)
• Sybase Adaptive Server Enterprise (Sybase)
• IBM DB2/UDB (UDB)
• Microsoft SQL Server (SQL Server)
The operating system (OS) environments were:
• UNIX: IBM AIX
• UNIX: Sun Microsystems Solaris
• Microsoft Windows Server
As in most operational support groups, an on-call pager rotation put one team member in the “hot seat” 24 hours a day for a one-week period. During this tour of duty, the on-call person responded to pages generated programmatically by two products: the BMC Patrol Monitor for Sybase/Oracle/UDB (hereafter "Patrol") and the Tivoli Enterprise Manager suite. IT staff at the round-the-clock computer operations center also issued pages manually.
Each Patrol-generated page displayed three items on the pager: the hosting server's name, the DBMS instance's name, and an abbreviated Patrol message. The message stated which predefined Patrol threshold had been crossed or which alert event had occurred. Manually generated pages, on the other hand, contained only a contact phone number.
The author arrived at work one beautiful Minnesota October day in 2000 after a particularly restless night of answering pager calls. He decided that he and his coworkers had endured enough interrupted sleep and remote logins. Most of those calls concerned a database that needed disk space added to it by hand in the middle of the night. An analysis of the Patrol/Tivoli logs showed that 1,041 of the 1,660 pages issued to his work group over the previous year, or 62%, were caused by database disk space problems. That works out to 20 pages a week, or about 3 pages a day.
Each of these pages fired when a database crossed its 95% full threshold. The database group had defined this value in Patrol as the trigger point for a condition that was urgent and needed immediate attention. What "95% full" actually means in database terms is explained later in this paper.
In almost all cases, neither the affected database nor the business system or application it served ever suffered an outage. The database had merely approached its predefined 100% upper disk space limit, the point at which it would have become unresponsive. To prevent a real outage, the on-call person in the database group added space to the database by hand after the page arrived.
In the author's view, the solution was a system that would (1) gather simple size metrics from the various databases regularly and automatically, and (2) run linear regression on that data to forecast the future disk space needs of each database's storage components. That same day, the author designed, coded, tested, and implemented a daily data collection mechanism for the 60 Oracle instances. The mechanism was simple, needed no maintenance, and discovered new instances on its own. It is still running today.
After the Oracle collection had run for several weeks, a coworker was asked to write equivalent data collectors for Sybase and UDB. These were put in place so that all three DBMSs shared a common collection framework. MS SQL Server was left out at the time because it had stronger self-managing capabilities for database sizing than the other DBMSs.
The collected data was simplicity itself. For each tablespace or database (see Definition of Terms and Database Concepts below), the following items were collected once a day and stored in a single database table:
• The date and time of the collection
• The instance name
• The DBMS type (Sybase, Oracle, UDB)
• The tablespace or database name
• The number of bytes_allocated to this tablespace or database
• The number of bytes_free in this tablespace or database
The number of bytes_used was not collected. It was derived later as bytes_allocated minus bytes_free.
Methods
Some background information comes first, as a foundation for the detailed description of the system.
Definition of Terms and Database Concepts
DBMS:
“Software package that allows you to use a computer to create a database; add, change, and delete data in the database; sort the data in the database; retrieve data in the database; and create forms and reports using the data in the database.” [1]
Instance:
“A database instance consists of the running operating environment which allows users to access and use a database. A database (as a generic structured store of data) becomes an instance when instantiated as a system and made available via its database management system.
Specific database providers can define database instances in terms of the precise hardware and software resources required to make them available: thus the Oracle database requires allocated system memory and at least one background process before the database counts as an instance.” [2]
The author's environment had about 200 database instances. Each one was managed by one of the DBMSs listed above.
Database:
“A database is a collection of information stored in a computer in a systematic way, such that a computer program can consult it to answer questions. The software used to manage and query a database is known as a database management system (DBMS). The properties of database systems are studied in information science.” [3]
Oracle and UDB have a one-to-one relationship between an instance and a database: each instance is associated with exactly one database.
Sybase and SQL Server have a one-to-many relationship between an instance and its databases: a single instance can define and manage several databases.
The Physical Representation of Database Content on Disk
Everything a database stores and manages (tables, indices, procedures, packages, rules, constraints) must ultimately reside on disk. The rest of this section reviews how each DBMS architecture handles this.
Oracle and UDB hold the contents of a database in tablespaces. An instance (database) has a one-to-many relationship with its tablespaces, and it usually has several of them. Those tablespaces are not shared with other, unrelated instances.
A tablespace consists of one or more “datafiles” (Oracle) or “containers” (UDB). These are the actual physical files on disk that the hosting OS can see.
Sybase holds the contents of a database in database devices. A database device is roughly comparable to a tablespace, and it consists of physical files on disk that the OS can see. Several databases within an instance can share a database device. The practice at the author's location, however, was to assign only one database to each device. Disk utilization metrics were therefore gathered at the internal database level, not the database device level, and those metrics were used for trending, analysis, and forecasting.
The author coined the term “data holder” as a common name across DBMSs for the tablespace (Oracle, UDB) and the database (Sybase). When an instance or database runs short of disk space, or runs out completely, the shortage happens at the data holder level. The DBMS reports it through error messages.
From the OS's perspective, the files that hold a database's content are ordinary files stored inside an OS file system. UNIX and Windows Server file systems work differently internally. Logically, though, both can be viewed as a predefined amount of disk space for holding files.
A data holder is to an RDBMS database what a file is to an OS file system. Both are layers of abstraction between database content and its final representation on disk.
File systems can be presented to an OS image in many ways (SAN, NAS, arrays, internal disk, and others). In every case the file system is assigned a finite size chosen to meet its anticipated needs. An OS image typically has many file systems defined to it, each with a different size.
Suppose a database instance has a 10GB file system for its use. That file system can hold any number of files, provided their sizes add up to no more than 10GB. If a request to enlarge an existing file, or to create a new file, would push the total over 10GB, the OS denies the request with an OS error message.
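The Python below sketches the OS-level rule. It is a hypothetical model and is not code from the author's system. The function name `fits_in_file_system` and its parameters are illustrative assumptions. The sketch ignores the space a real file system uses for its own metadata.

```python
GB = 1024 ** 3  # assumption: "GB" taken as 2**30 bytes


def fits_in_file_system(file_sizes, capacity, new_size, index=None):
    """Return True if the OS would permit the requested change.

    file_sizes: current sizes, in bytes, of the files in the file system
    capacity:   total size of the file system, in bytes
    new_size:   requested size, in bytes, of the new or extended file
    index:      position in file_sizes of the file being extended,
                or None when a new file is being created
    """
    total = sum(file_sizes)
    if index is not None:
        total -= file_sizes[index]  # the extended file is replaced by its new size
    return total + new_size <= capacity


# Two existing files (4GB and 3GB) in a 10GB file system.
files = [4 * GB, 3 * GB]
print(fits_in_file_system(files, 10 * GB, 3 * GB))           # True: 10GB total
print(fits_in_file_system(files, 10 * GB, 4 * GB))           # False: 11GB total
print(fits_in_file_system(files, 10 * GB, 8 * GB, index=0))  # False: 11GB total
```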
Statement of the Problem
Whatever the DBMS, every database must fit all of its content into a set of data holders, and each data holder has a predefined size.
As soon as a data holder is defined to the database, the OS sees a file or set of files that occupies the data holder's full defined size.
For example, creating a 5GB tablespace in Oracle produces a file or set of files that occupies 5GB of file system space, as seen by the hosting OS. The DBMS, by contrast, sees the tablespace at this point as empty, or 0% full, with no database content in it yet. It shows up as an empty tablespace with room for 5GB worth of database objects. If the hosting OS file system were 10GB in size, the OS would now report it as 50% full.
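The Python below works through the arithmetic behind the two perspectives. It assumes, as the example above implicitly does, that the 10GB file system stores nothing else. The variable names are illustrative only.

```python
GB = 1024 ** 3


def percent_full(used_bytes, capacity_bytes):
    return 100.0 * used_bytes / capacity_bytes


tablespace_size = 5 * GB       # size of the data holder as defined to the DBMS
content_bytes = 0              # no tables or indices created yet
fs_capacity = 10 * GB          # size of the hosting OS file system
fs_occupied = tablespace_size  # the OS sees the datafile(s) at their full size

dbms_view = percent_full(content_bytes, tablespace_size)
os_view = percent_full(fs_occupied, fs_capacity)
print(f"DBMS sees the tablespace as {dbms_view:.0f}% full")  # 0%
print(f"OS sees the file system as {os_view:.0f}% full")     # 50%
```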
As database objects (tables, indices, etc.) are created in the data holder and then populated, the DBMS sees the data holder as housing n bytes. When n approaches 5GB, the data holder hits the DBMS's internal 100% full mark. The DBMS cannot add content to that data holder until it is made larger or some content is deleted.
From that point on, the DBMS returns error codes for any operation that needs more disk space in the affected data holder, such as SQL INSERT requests and certain types of SQL UPDATE requests. The OS reports no errors, because nothing has yet tried to enlarge the underlying files.
Users experience this as a loss of availability in part or all of the application that uses the database. The short-term fix that gets the application running again is to enlarge the data holder, assuming the hosting file system has unused space to allow it.
If the file system is full, other IT groups must be contacted to try alternative solutions. For example, the applications group might look for data that can be deleted from the database, or the host system administration group might try to enlarge the affected file systems. While these options are explored, the application stays out of service. If this happens during off-hours, restoring normal service takes even longer, because on-call staff must be contacted and must fix this basic disk space problem from remote locations.
The scale of the problem became clear to the author when he saw that the Sybase, Oracle, and UDB instances contained over 2,800 data holders. Those data holders in turn consisted of over 5,100 individual data files. Managing 2,800 data holders and 5,100 files reactively, one pager event at a time, was simply not working. A proactive forecasting approach based on quantitative data was needed.
Statement of the Solution
Preventing these disk space problems, or at least making them much less likely, required information about how each data holder in the enterprise used space over time. With that data, one could identify the trends and patterns each data holder showed over an extended period and forecast its future needs.
For this purpose, the author built the simple collector mentioned earlier for all of the Oracle databases. The collector is a single SELECT statement that uses several PL/SQL features, and it runs just once a day on a single Oracle instance. That instance has database links to every other Oracle instance in the enterprise. Through those links, the SQL gathers information about every data holder on every instance into one database table, where it is kept long-term for analysis.
The PL/SQL loops through the dba_db_links data dictionary view, generates the SQL needed for each instance, and then executes that generated SQL. Execution is serial. The collector gathers information about every tablespace in one instance before moving on to the next, and it continues until the central repository table holds information from all instances.
Table 1 repeats the items gathered from each instance, described above, in their database format:
Table 1
Table Column Names and Datatypes
Column Name         Datatype
batch_date          DATE
instance            VARCHAR2(255)
RDBMS type          VARCHAR2(6)
data_holder_name    VARCHAR2(30)
allocated_bytes     NUMBER
free_bytes          NUMBER
Note: “batch_date” is the date and time at which the data was collected. The collector records the current date and time once, when collection starts, and stores it as a constant. That constant is then applied to every row extracted from every instance, even though an individual extraction may occur a minute or two later. This gives all tablespaces in all instances the same timestamp for a given run. Because the analyses look at trends over long periods, the small timing offset does not matter. A single timestamp for everything collected that day also allows later analyses to group rows by date and time.
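The real collector was a single Oracle SELECT statement driven by PL/SQL over database links, and this paper does not reproduce its text. The Python below is only a sketch of the two properties just described: instances are visited one at a time, and a single `batch_date` captured at the start is stamped on every row. The `fetch_data_holders` callable is a hypothetical stand-in for the per-instance query. Using the link name as the instance name is also an assumption.

```python
from datetime import datetime


def collect_all(db_links, fetch_data_holders, now=datetime.now):
    """Gather one Table 1 row per data holder from every linked instance.

    fetch_data_holders(link) is a hypothetical stand-in for the per-instance
    query; it must yield (data_holder_name, allocated_bytes, free_bytes).
    """
    batch_date = now()  # captured once, before any instance is queried
    rows = []
    for link in db_links:  # serial: finish one instance before the next
        for holder, allocated, free in fetch_data_holders(link):
            # Same six fields, in the same order, as Table 1.
            rows.append((batch_date, link, "Oracle", holder, allocated, free))
    return rows
```

`batch_date` is evaluated once, so every row from a given run has the same timestamp. That is what makes grouping by date and time easy in later analyses.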
As noted earlier, another team member wrote collectors for Sybase and UDB shortly after the Oracle collectors went into service. They gather the same six fields shown in Table 1, because the data holder concept was designed to apply to every DBMS. Oracle, Sybase, and UDB differ architecturally, so each DBMS needs its own collection method. The extracted data, however, means the same thing regardless of which DBMS it came from.
Once enough data had accumulated to cover a reasonable period, the data and its derived values could be analyzed in several ways. These analyses are described below.
Results
Forecasting Analyses Performed on the Data
Before the collected data could be used, it had to be moved to a common platform with the analytics needed for forecasting, descriptive statistics, and similar work. The Oracle collector stored all of its observations about every Oracle instance in a single table on the central Oracle collector instance. The Sybase and UDB collectors worked differently: they wrote individual flat files in comma-separated value (CSV) format on each instance's host. These three separate data sources had to be brought together in one place.
The first platform chosen for this forecasting system was Microsoft's OLAP Services, a component of MS SQL Server versions 7.0 and 2000. Each day, the CSV files from every Sybase and UDB instance were gathered automatically and sent by file transfer protocol (FTP) to an MS SQL Server instance. There, Data Transformation Services loaded them into a common table, t_data_holder. Each day's Oracle data was likewise extracted automatically from its table and loaded into the same place. The table has the same layout as Table 1, and every column has the “NOT NULL” attribute.
The data structure was designed this way to support multidimensional analyses along the variables of interest. The main goal was to forecast when each data holder would run out of space. Because the table also has instance, DBMS, and data holder columns, the data could be sliced and diced along those dimensions as well, as discussed later in the paper.
The following derived measures were created for each data holder:
Bytes_used: bytes_allocated - bytes_free
Percent Used: (Bytes_used / bytes_allocated) * 100
Percent Free: (bytes_free / bytes_allocated) * 100
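The three derived measures can be written as a small Python function. The name `derived_measures` is illustrative.

```python
def derived_measures(bytes_allocated, bytes_free):
    """Return (bytes_used, percent_used, percent_free) for one observation."""
    bytes_used = bytes_allocated - bytes_free
    percent_used = bytes_used / bytes_allocated * 100
    percent_free = bytes_free / bytes_allocated * 100
    return bytes_used, percent_used, percent_free


print(derived_measures(1000, 250))  # (750, 75.0, 25.0)
```

Percent Used and Percent Free always add up to 100 for the same observation. In practice, reporting either one is enough.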
With the table in place on MS SQL Server, the author defined a multidimensional data structure and set up analyses in MS OLAP Services. These analyses regressed bytes_used against (serial) time for each unique combination of instance name and data holder name. The input was the most recent 365 days of daily data points for each data holder. The output, for each data holder on each instance, was the slope, the intercept, and the squared Pearson product-moment correlation coefficient (R²). Many other descriptive statistics were also calculated for each data holder, including the mean, median, mode, variance, standard deviation, and number of observations.
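The author ran the regression inside MS OLAP Services. The Python below is a rough sketch of the same per-data-holder computation, independent of any tool. It fits an ordinary least-squares line of bytes_used against a day number and returns R² and the descriptive statistics listed above. Several details are assumptions. Days are counted from the first observation; shifting that origin changes only the intercept, not the slope. Variance and standard deviation are computed as sample statistics. The fit needs at least two distinct days.

```python
import statistics


def fit_holder(days, bytes_used):
    """Least-squares fit of bytes_used against day number, plus summary stats.

    days:       day numbers of the observations (e.g. 0..364 for a year)
    bytes_used: bytes_used observed on those days
    """
    n = len(days)
    mx = statistics.mean(days)
    my = statistics.mean(bytes_used)
    sxx = sum((x - mx) ** 2 for x in days)
    syy = sum((y - my) ** 2 for y in bytes_used)
    sxy = sum((x - mx) * (y - my) for x, y in zip(days, bytes_used))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Squared Pearson r; undefined (None) if bytes_used never changed.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else None
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "n": n,
        "mean": my,
        "median": statistics.median(bytes_used),
        "mode": statistics.mode(bytes_used),  # Python 3.8+: first mode on ties
        "variance": statistics.variance(bytes_used),  # sample variance
        "stdev": statistics.stdev(bytes_used),  # sample standard deviation
    }


fit = fit_holder([0, 1, 2, 3, 4], [100, 120, 110, 140, 130])
print(fit["slope"], fit["intercept"], fit["r_squared"])  # 8.0 104.0 0.64
```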
These OLAP Services analyses produced a table with one row for each of the 2,800 data holders. Each row contained the instance name, DBMS type, data holder name, slope, intercept, R², number of observations, mean, median, mode, variance, and standard deviation for that “set”.
The author also programmed OLAP Services to add further columns to this table. These held the most current values of bytes_used, bytes_allocated, and bytes_free, plus the date of the most recent observation in the past year's data for that instance/DBMS/data_holder set.
Lastly, OLAP Services was programmed to use these values to forecast the expected shortfall or surplus, in bytes, for each data holder six months from the current date. The method is described immediately below:
Using the simple linear equation “y = mx + b”, the author solved for “y”, the number of bytes in use (bytes_used) at time “x”, with “x” set 180 days ahead of the current date. Applying each data holder's fitted slope “m” and intercept “b” gave its projected bytes_used six months from now.
Additional programming in OLAP Services then computed the difference between the most current bytes_allocated and the six-month forecast of bytes_used.
A positive difference meant that, six months from now, that data holder in that instance would have a surplus of that size. A negative difference meant a shortfall of that size in six months. In that case, corrective action was needed now to prevent a future on-call pager event.
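The six-month projection and its sign convention can be sketched as follows. The function name is illustrative. `today_x` is assumed to be the current day on the same day axis that was used for the fit.

```python
def six_month_surplus(slope, intercept, today_x, bytes_allocated, horizon_days=180):
    """Forecast surplus (positive) or shortfall (negative), in bytes.

    slope, intercept: the fitted line y = m*x + b for bytes_used
    today_x:          today's position on the fit's day axis
    bytes_allocated:  the most current allocation of the data holder
    """
    x = today_x + horizon_days
    forecast_used = slope * x + intercept  # y = mx + b
    return bytes_allocated - forecast_used


# A holder growing 10 bytes/day from an intercept of 100, observed for a year.
print(six_month_surplus(10, 100, 364, 6000))  # 460  -> surplus
print(six_month_surplus(10, 100, 364, 5000))  # -540 -> shortfall
```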
Because bytes_used was forecast against the most current bytes_allocated, these calculations assumed the data holder would be 100% full at the six-month target date. Practical experience showed that letting a data holder approach 100% full was dangerous. The variance observed over time supported this, whether in individual data holders on individual instances or in the aggregates across the DBMS and instance dimensions. A 100% target ignored the periodic, seasonal, and random ebbs and flows observed in each data holder's bytes_used. Not all data holders grew at a linear rate; many showed troughs and crests in bytes_used over time. After reviewing the data for all instances and data holders, the author's workgroup agreed that the best practice was to forecast each data holder to its 70% full “saturation” point. Accordingly, the forecast bytes_used in the equation was divided by 0.70.
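Dividing the forecast bytes_used by 0.70 gives the allocation the data holder would need in order to be only 70% full at the six-month mark. Subtracting that from the current bytes_allocated gives the adjusted surplus (positive) or shortfall (negative). A minimal sketch, with illustrative names:

```python
SATURATION = 0.70  # target "saturation" point agreed by the workgroup


def adjusted_surplus(forecast_used, bytes_allocated, saturation=SATURATION):
    """Surplus (+) or shortfall (-) relative to the allocation needed for the
    forecast bytes_used to fill only `saturation` of the data holder."""
    required_allocation = forecast_used / saturation
    return bytes_allocated - required_allocation


# Forecast 5,540 bytes used against a 6,000-byte allocation. Unadjusted,
# 6,000 - 5,540 would be a 460-byte surplus; but 5,540 / 0.70 is about 7,914,
# so the holder is roughly 1,914 bytes short of the 70% target.
print(round(adjusted_surplus(5540, 6000)))  # -1914
```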
The surplus/shortfall values for all 2,800 data holders were then reviewed to find the largest negative values. For this review the OLAP cube was imported into Excel. Excel's conditional formatting was used to display the numbers for data holders with a projected shortfall literally “in the red”.
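Outside Excel, the "largest negative values first" review amounts to filtering and sorting the surplus/shortfall column. The sketch below assumes the per-holder results are available as (instance, data holder, surplus) tuples. The sample names are made up.

```python
def worst_shortfalls(results, top=10):
    """Return the data holders with a projected shortfall, most negative first.

    results: iterable of (instance, data_holder, surplus_bytes) tuples
    """
    shortfalls = [r for r in results if r[2] < 0]
    return sorted(shortfalls, key=lambda r: r[2])[:top]


sample = [("PROD1", "USERS", 460), ("PROD2", "DATA", -1914), ("PROD3", "IDX", -50)]
print(worst_shortfalls(sample))
# [('PROD2', 'DATA', -1914), ('PROD3', 'IDX', -50)]
```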
In some cases, the underlying file system(s) had enough free space to cover the forecast shortfall. A member of the database staff then enlarged the data holder by the calculated amount. In other cases the file system(s) lacked enough free space, and a formal request was submitted to the UNIX disk management group to add the needed number of bytes. Because the approach was proactive and forecast-driven, these requests were fulfilled well before the projected six-month shortfall.
The forecast numbers were not used blindly. Each data holder's R² value was checked to see how closely its data points tracked the passage of time. When R² was below 0.80, the plotted bytes_used data points were inspected visually to understand the pattern, if any. If a data holder's data was “all over the place”, the database team made a best estimate of how to handle it and monitored it more closely by hand over the next six months. R² is the proportion of the observed variance explained by the independent variable, which here is time. The 0.80 cutoff was an arbitrary line in the sand for deciding whether a forecast was useful.
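The R² screening rule can be sketched as a simple partition. The function name and the "INSTANCE.HOLDER" labels are illustrative.

```python
R2_THRESHOLD = 0.80  # the paper's "line in the sand"


def triage(r_squared_by_holder, threshold=R2_THRESHOLD):
    """Split data holders into forecasts used as-is and those whose plotted
    bytes_used should be inspected visually (R² below the threshold)."""
    use_forecast, inspect = [], []
    for holder, r2 in r_squared_by_holder.items():
        (use_forecast if r2 >= threshold else inspect).append(holder)
    return sorted(use_forecast), sorted(inspect)


print(triage({"PROD1.USERS": 0.95, "PROD2.DATA": 0.42, "PROD3.IDX": 0.80}))
# (['PROD1.USERS', 'PROD3.IDX'], ['PROD2.DATA'])
```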
These analyses ran automatically every three months. The database group reviewed the results to find data holders that needed adjusting to accommodate their projected growth (or decline) over the next six months.
Beyond Forecasting: Additional Insights Provided by the Data
A year's worth of data in the database allowed the author to ask specific questions about all of the organization's databases. These insights were possible only because the data existed and its emergent properties could be explored. Before that, such questions could not be answered quantitatively because there was no historical data. Answers were SWAGs pieced together from best guesses, current real-time data, and the workgroup's collective anecdotal experience.
With actual data at hand, questions like these could now be answered, and they were:
Q: Which data holders have the highest or lowest rates of growth?
A: Sorting the OLAP cube on the slope value lists all data holders by rate of growth, from most positive to most negative.
Q: Which data holders hold the most data?
A: Sorting the OLAP cube on the mean bytes_used value lists all data holders by the amount of information they store.
Q: Which data holders occupy the most disk space?
A: Sorting the OLAP cube on the mean bytes_allocated value lists all data holders by the amount of disk space they take up.
Q: Which data holders are the most over- or under-utilized?
A: Sorting the OLAP cube on Percent Used or Percent Free lists all data holders by where they fall on the 0–100% full scale.
Perhaps the most valuable insight came from an Excel pivot table/chart published on the corporate intranet for managers, developers, business analysts, and others. It let IT staff explore the trends and other characteristics of the data interactively.
Excel allowed only about 65K (65,536) rows per worksheet. With 2,800 data holders and 365 observations each (2,800 × 365 = 1,022,000 rows), the raw data could not go into Excel. The author therefore used only the data from the production instances and reduced it further to weekly mean values. A trivial SQL statement computed the weekly means of bytes_used, bytes_free, and bytes_allocated for each unique combination of instance name, DBMS type, data holder name, and week of the year. (Because the data was on MS SQL Server, the "week" datepart was used in the GROUP BY clause. The "year" datepart was used in the expression that ordered the data, since the database contained several years.)
The data for the pivot tables and charts consisted of the elements shown in Table 2:
Table 2
Pivot Table Data Elements
dbms_prd_name (Sybase, Oracle, UDB)
db_instance (instance name)
data_holder_name
dbs_year (YYYY)
dbs_week (1–52)
Bytes_used (by that data holder)
Percent_Full (for that data holder)
Two pivot tables and two pivot charts were built from this data. One pair showed absolute bytes_used values, and the other showed relative percent full values.
The bytes_used pivot table and chart had the following characteristics:
• The x-axis was time: the past 12 months, expressed as year and week within the year. Because the past 12 months usually span two calendar years, the axis had two pivot select buttons, one for year and one for week within year. These buttons were mostly left alone, and the full 12 months of data was viewed.
• The y-axis plotted bytes_used.
• Pivot buttons were provided for:
  o dbms_prd_name
  o db_instance
  o data_holder_name
Viewers could slice and dice the data along any of these dimensions. For example, the pivot chart answered these questions:
• What is the pattern of bytes_used over the past year for:
  o All Oracle instances?
  o All Sybase instances?
  o All UDB instances?
  o Oracle and Sybase combined?
  o Oracle and UDB combined?
  o Sybase and UDB combined?
  o Sybase, Oracle, and UDB combined?
• What is the pattern of bytes_used over the past year for:
  o Any individual instance?
  o Any combination of instances? (This works for any set of instances of interest, regardless of which DBMS hosts them.)
• What is the pattern of bytes_used over the past year for:
  o Any individual data holder? (An instance name must also be selected for this to be meaningful. Otherwise the chart shows the total for every data holder with that name, across all instances.)
  o Any combination of data holders?
Figure 1 shows an example of this pivot chart.
Figure 1. Pivot Chart of Bytes Used
The percent_used pivot table and chart had the same setup as the bytes_used pivot table. However, percentages were calculated per data holder in the raw data, so rolling them up along the DBMS or instance dimensions would not be valid. A warning was therefore placed on the pivot chart: “Only meaningful if a data_holder_name is selected, and that data_holder_name is unique across DBMSs and instances. Supply further dbms_prd_name and db_instance criteria to ensure uniqueness.” Figure 2 shows the pivot chart.
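A small numeric example shows why the roll-up is invalid. A pivot table that rolls up a precomputed percent column either sums or averages the per-holder percentages. Neither result is the percent full of the combined holders; that figure must be recomputed from the summed byte counts. The two holders below are invented for illustration.

```python
holders = [
    {"bytes_used": 90, "bytes_allocated": 100},  # 90% full
    {"bytes_used": 90, "bytes_allocated": 900},  # 10% full
]


def per_holder_percents(holders):
    return [100 * h["bytes_used"] / h["bytes_allocated"] for h in holders]


def pooled_percent_used(holders):
    """Valid roll-up: recompute the percentage from the summed byte counts."""
    used = sum(h["bytes_used"] for h in holders)
    allocated = sum(h["bytes_allocated"] for h in holders)
    return 100 * used / allocated


percents = per_holder_percents(holders)
print(sum(percents))                  # 100.0: a pivot "Sum" of percents
print(sum(percents) / len(percents))  # 50.0: a pivot "Average" of percents
print(pooled_percent_used(holders))   # 18.0: the true combined percent used
```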
With this second pivot chart, IS infrastructure staff could see which data holders were close to their 100% full limits. The pivot table content could also be copied into a new spreadsheet and sorted on percent full, listing every data holder in the enterprise in order of how full it was. Excel filtering could narrow these queries further by DBMS type, instance name, and data holder name.
Figure 2. Pivot Chart of Percent Used
The Excel spreadsheet was built so that a single click refreshes the raw data from the source database on MS SQL Server. As a result, the pivot tables and charts can be fully updated each month with almost no effort.
Discussion
Measuring and storing just two simple metrics (bytes_allocated and bytes_free) every day for each data holder on each instance let the organization move from reactive firefighting to a more proactive and methodical process.
Benefits
This change in focus, and the use of applied mathematics, brought several benefits:
• The data is now published monthly on the intranet. Its consumers have gained considerable insight into the seasonal and other variations in their data usage patterns.
• The work group that acquires disk space for the entire IS organization can now set realistic budgets for next year's disk space requirements, based on higher-level roll-ups of the bytes_used data.
• Pager calls dropped: the 1,041 pages previously issued each year for database disk space problems fell to only a handful.
• The growth rates of the organization's applications and business systems were quantified and published for the first time. The IT organization could then compare those rates across applications, year over year, and so on.
• The organization can now spot anomalous growth rates. These may indicate that an application change (intended or not) or a change in a business driver is significantly affecting how fast data accumulates in a database.
• Descriptive statistics can be compared across data holders to better understand their central tendencies and dispersion.
Conclusion
It is often surprising how a very small set of observations can reveal what lies beneath an IT infrastructure. Gathering just two values per day from each data holder in the enterprise, automatically, let the IT staff show the underlying patterns both quantitatively and visually. Those patterns had been there all along; they had simply never been measured. Acting on that knowledge freed a great deal of staff time that had previously gone to unnecessary, reactive paging events and to daily disk space monitoring.
These analyses have given us a glimpse into the nature of the organization's database environment. Several next steps can now be considered:
• One could correlate (or join, in “database-eze”) OS file system statistics with the DBMS data above. This would determine automatically whether a file system has enough space to cover a forecast deficit.
• We could explore non-linear regression. Any number of classical transforms of the dependent and independent variables could be tried (log, power, exponential, Nth-order polynomials, squares, cubes, inverses, etc.) to see whether any combination yields higher R² values.
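The Python below illustrates one of the transforms suggested above. It fits an exponential model by regressing log(bytes_used) on time, then compares its R² with that of the straight line. Both R² values are computed on the original bytes_used scale so that they are comparable. This shows the idea only; it is not something the paper implemented. The log transform requires every bytes_used value to be positive. The data is synthetic.

```python
import math


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(ys, predictions):
    """Coefficient of determination: 1 - SSE / SST on the original scale."""
    my = sum(ys) / len(ys)
    sst = sum((y - my) ** 2 for y in ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    return 1 - sse / sst


def linear_vs_exponential(days, bytes_used):
    """Return (R² of y = m*x + b, R² of y = a * exp(k*x))."""
    m, b = least_squares(days, bytes_used)
    linear = r_squared(bytes_used, [m * x + b for x in days])
    # Fit ln(y) = ln(a) + k*x; requires every bytes_used value to be > 0.
    k, ln_a = least_squares(days, [math.log(y) for y in bytes_used])
    exponential = r_squared(bytes_used, [math.exp(ln_a + k * x) for x in days])
    return linear, exponential


days = list(range(30))
growing = [1000 * 1.05 ** d for d in days]  # synthetic 5%-per-day growth
print(linear_vs_exponential(days, growing))
```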
The very first DBMSs appeared in the early hunter-gatherer phase of IT, and just keeping them running took a tremendous amount of staff time. Knowledge of them and experience with them were scarce, often passed down as folklore from the tribal elders around the corporate campfires. Secret handshakes and magical amulets abounded. Mystical robes were often donned to exorcise the demons that plagued that software.
Over time, however, the DBMS vendors as a group added more and more self-managing capabilities, and their systems became less and less labor-intensive. Even with those enhancements, the pesky problems of database disk space persisted.
Some vendors did add features that manage data holders automatically, extending them into their host file systems as needed according to business rules set by the database administrators. Vendors are even adding features that shrink over-allocated data holders to disk footprints closer to their normal usage.
As those features become accepted best practice, the clerical tedium of managing database disk space will eventually fade into memory, like the punch card and the 5¼-inch floppy.
The system described in this paper is intended as a stepping stone between today's DBMS capabilities and those of the future. Its methodology is firmly rooted in quantitative analysis.
“Statistics is the grammar of science.”
Karl Pearson
British mathematician and statistician
(1857–1936)
References
[1] www.scolumbiasd.k12.pa.us/hs/business/gengler/compapp/accintro.htm
[2] http://en.wikipedia.org/wiki/Database_instance
[3] http://en.wikipedia.org/wiki/Database_instance
