Prepared for the CMG Conference Committee
32nd Annual International Conference of The Computer Measurement Group, Inc.
December 4th-9th, 2006
Reno, Nevada

Edward L. Trettel
Northwest Airlines, Inc.

In the absence of special purpose monitoring and/or modeling software designed specifically for forecasting database disk space requirements, a solution was developed using general purpose database facilities and office suite products. The results achieved were (1) an understanding of the heretofore unknown trends and patterns in the use of disk space by individual databases, (2) the ability to accurately and proactively forecast the additional disk space needed for individual databases, and (3) the ability to reclaim the forecast unused disk space, all based upon linear regression analyses.

Introduction

"Necessity, who is the mother of invention."
- Plato, The Republic
Greek author and philosopher (427 BC - 347 BC)

As the sheer number, size, and complexity of databases deployed in organizations continues to climb each year, the effective management of those instances becomes a challenge for the IT staff, and in particular for those charged with maintaining database integrity, availability, performance, and recoverability. Practices rooted in reactive and frequently eleventh-hour individual heroics no longer make the grade in today's business environment. Rather, one needs to create and adopt repeatable, proactive methodologies in order to provide appropriate IT service delivery.

In a previous position with his present employer, the author was a member of a 9-person database administration team which provided operational and production support for 200 distributed database instances, consisting of 400 databases. The database management systems (DBMS) utilized consisted of:
• Oracle Server (Oracle)
• Sybase Adaptive Server Enterprise (Sybase)
• IBM DB2/UDB (UDB)
• Microsoft SQL Server (SQL Server)
Operating system (OS) environments included:
• UNIX - IBM AIX
• UNIX - Sun Microsystems Solaris
• Microsoft Windows Server

As is the case with most operational support groups, an on-call pager rotation schedule placed a team member in the "hot seat" 24 hours a day for a one week period. During this tour of duty, the on-call person would respond to pages that were programmatically generated by the combination of the BMC Patrol Monitor for Sybase/Oracle/UDB product (hereafter referred to as Patrol) and the Tivoli Enterprise Manager suite. Additional, manually-generated pages were also issued by IT staff members at the round-the-clock computer operations center.

In the case of the Patrol-generated pages, the pager-displayed text outlined the nature of the problem in terms of the hosting server's name, the DBMS instance's name, and an abbreviated Patrol message that described which previously-defined Patrol threshold had been crossed, or alert event had occurred. The manually generated pages, on the other hand, contained a contact phone number.

Arriving at work one beautiful Minnesota October day in 2000, after having spent a particularly restless night responding to endless pager calls, the author decided that there had been enough sleep interruptions and remote logins to work by him and his coworkers, most of which were related to a database needing to have disk space (manually) added to it in the middle of the night.

Analysis of the Patrol/Tivoli logs revealed that, over the course of the previous year, 1,041 of the 1,660 pages issued to his work group, or 62%, were due to database disk space problems. That's 20 pages a week or about 3 pages a day. These pages were issued when the database had crossed its 95% full threshold. This value was the trigger point which the database group had defined in Patrol as urgent and needing immediate attention. The details on what 95% full actually means in database terms are provided later in this paper.
In almost all cases the affected database and its attendant business system or application never actually experienced an outage. Rather, the database had merely approached its predefined 100% upper disk space limit, at which point the database would have become unresponsive. The on-call person in the database group manually added space to the database after the page was issued in order to prevent a real database outage.

In the author's estimation, what was needed was a system to (1) gather simple database size metrics from the various databases on a regular and automatic basis and (2) submit that data to linear regression analysis in order to forecast what the disk space needs would be for each database's storage components in the future.

That very day the author designed, coded, tested, and implemented a simple, zero maintenance, self-discovering daily data collection mechanism for the 60 Oracle instances. That system continues to run to this day.

Once the data collection for Oracle had run for several weeks, a coworker was asked to write and implement functionally-equivalent data collectors for Sybase and for UDB. Those were similarly put in place so as to bring these three DBMSs under a common collection framework. MS SQL Server was not included at the time since it had more self-managing capabilities in the database sizing area than the other DBMSs.

The collected data was simplicity itself. For each tablespace or database (see Definition of Terms and Database Concepts below), the following items were collected once a day and stored in a single database table:
• The date and time of the collection
• The instance name
• The DBMS type (Sybase, Oracle, UDB)
• The tablespace or database name
• The number of bytes_allocated to this tablespace or database
• The number of bytes_free in this tablespace or database

The number of bytes_used was derived as the difference between the number allocated and number free.
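As a rough illustration of how small each daily observation is, one record could be modeled as below. This is a Python sketch for exposition only: the original collectors were written in SQL, and the class name, field names, and sample values here are assumptions rather than the author's code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DailySample:
    """One daily observation for one tablespace or database (hypothetical model)."""
    collected_at: datetime   # date and time of the collection
    instance: str            # instance name
    dbms_type: str           # "Sybase", "Oracle", or "UDB"
    holder_name: str         # tablespace or database name
    bytes_allocated: int
    bytes_free: int

    @property
    def bytes_used(self) -> int:
        # Derived rather than collected: allocated minus free.
        return self.bytes_allocated - self.bytes_free


GB = 1024 ** 3
# Sample values are invented for illustration.
sample = DailySample(datetime(2000, 10, 16, 6, 0), "ORA_PROD01", "Oracle",
                     "USERS", 5 * GB, 2 * GB)
print(sample.bytes_used)
```

Keeping bytes_used as a derived property mirrors the text: only two numeric values are stored per observation, and the third follows from them.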
Methods

Before getting into the details of the system, let's lay the foundation for that discussion with some background information.

Definition of Terms and Database Concepts

DBMS: "Software package that allows you to use a computer to create a database; add, change, and delete data in the database; sort the data in the database; retrieve data in the database; and create forms and reports using the data in the database." [1]

Instance: "A database instance consists of the running operating environment which allows users to access and use a database. A database (as a generic structured store of data) becomes an instance when instantiated as a system and made available via its database management system. Specific database providers can define database instances in terms of the precise hardware and software resources required to make them available: thus the Oracle database requires allocated system memory and at least one background process before the database counts as an instance." [2]

In the author's environment there were approximately 200 database instances, each of which was under the control of one of the DBMSs listed above.

Database: "A database is a collection of information stored in a computer in a systematic way, such that a computer program can consult it to answer questions. The software used to manage and query a database is known as a database management system (DBMS). The properties of database systems are studied in information science." [3]

In the case of Oracle and UDB there is a one-to-one relationship between an instance and a database; i.e., there is one and only one database associated with each instance. Sybase and SQL Server, on the other hand, have a one-to-many relationship between an instance and its databases; i.e., one instance can have multiple databases defined and managed within it.
The Physical Representation of Database Content on Disk

All of the "stuff" that's stored and managed by a database (tables, indices, procedures, packages, rules, constraints) has to ultimately reside on disk. Below is an overview of the different approaches taken by the various DBMS architectures.

Oracle and UDB use the concept of a tablespace as the metaphor for holding the contents of a database. There is a one-to-many relationship between an instance (database) and its tablespaces. That is, an instance can and usually does have a number of tablespaces associated with it. Those tablespaces, however, are not shared among other, unrelated instances. A tablespace, in turn, consists of one or more "datafiles" (Oracle) or "containers" (UDB) which are the actual physical files on disk that are visible to the hosting OS.

Sybase, on the other hand, uses the concept of a database device to hold the contents of a database. A database device is somewhat akin to a tablespace. Database devices, in turn, consist of physical files on disk which are visible to the OS. While a database device can be used by multiple databases within an instance, the practice at the author's location is to associate only one database to any particular database device. Therefore, for purposes of trending and analysis, disk utilization metrics at the internal database level were gathered and used for forecasting, and not at the database device level.

In order to define a common terminology for the tablespace (Oracle, UDB) and database (Sybase) constructs across the various DBMSs, the author coined the term "data holder". When an instance or database experiences a near or complete shortage of disk space, it experiences that shortage at its "data holder" level. That condition is manifested in DBMS error messages to that effect.

Similarly, from the OS's perspective, the files which hold the database's content are stored just like any other file; i.e., inside an OS file system.
While there are differences between the way the UNIX and Windows Server file systems work internally, logically they can be viewed as a predefined amount of disk space for holding files. Analogically, a data holder is to an RDBMS database as a file is to an OS file system: both are layers of abstraction in the path to the final representation of database content on disk.

Regardless of how file systems are instantiated to a particular OS image (SAN, NAS, arrays, internal disk, others), they all share the common attribute of having been assigned a finite size that meets the anticipated needs of that file system. An image of an OS would typically have many file systems defined to it, each with a different size.

If a database instance has a 10GB file system defined for its use, that file system can hold any number of files as long as the sum of their sizes is <= 10GB. Any attempt to increase the size of an existing file or create a new file that would bring that sum over 10GB would be met with an OS error message and a denial of that attempt.

Statement of the Problem

No matter which DBMS is involved, all databases operate within the constraint of having to house all of their content within a set of data holders, each of which is predefined to be of a certain size.

When any such data holder is first defined to the database, it will appear to the OS to be a file or set of files which occupies the full size of the defined data holder. For example, creating a 5GB tablespace in Oracle will result in a file or set of files whose sum of file system disk occupancy, as seen by its hosting OS, will be 5GB. However, from the DBMS's perspective at this point in time, the tablespace is empty or 0% full, and has no database content in it yet. It shows up as an empty tablespace with the capacity to hold 5GB worth of database objects. If the hosting OS file system were defined at 10GB, it would see the file system now as 50% full.
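The two views of the same freshly created tablespace can be made concrete with a few lines of arithmetic. The Python sketch below simply restates the 5GB / 10GB example; the helper name percent_full and the variable names are assumptions introduced for illustration.

```python
GB = 1024 ** 3


def percent_full(used_bytes: int, capacity_bytes: int) -> float:
    """Occupancy of a container, as a percentage of its capacity."""
    return 100.0 * used_bytes / capacity_bytes


file_system_size = 10 * GB   # size assigned to the OS file system
tablespace_size = 5 * GB     # data holder just created inside it
content_bytes = 0            # no database objects stored yet

# The OS sees files occupying the full size of the data holder.
os_view = percent_full(tablespace_size, file_system_size)
# The DBMS sees an empty data holder.
dbms_view = percent_full(content_bytes, tablespace_size)

print(f"OS view: {os_view:.0f}% full, DBMS view: {dbms_view:.0f}% full")
```

The point of the example is that the same bytes are counted at two different layers, so a file system can look half full while the data holder inside it is empty.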
As database objects (tables, indices, etc.) are defined and subsequently populated using that data holder, it will present itself to the DBMS as housing n bytes. As n encroaches on 5GB it will come up against the 100% full internal DBMS mark and the DBMS will not be able to add any more content to that data holder until it is made larger, or content is deleted. At that point the DBMS will return error codes to any database operation which would result in the need for more disk space in the affected data holder (e.g., SQL INSERT requests and certain types of SQL UPDATE requests). Note that there would be no OS error messages since no attempt has been made to increase the size of the underlying files.

Such a condition would be seen as a loss of availability to parts or all of the application using that database. In order to get the application running again, a short-term quick fix for this situation would be to increase the size of the data holder, providing that the hosting file system had unused space in it for such an increase. If the file system were full, other IT players would have to be contacted to see if alternative solutions could be tried; e.g., the applications group might see if there was any data that could be deleted from the database, or the host system administration group would see if they could increase the size of the affected file systems.

While these options were being explored, the application would be out of service. Were this to occur in the off-hours, an even greater delay in restoration to normal service would be expected as on-call people are contacted to remedy this basic disk space problem from remote locations.

The magnitude of this issue became apparent to the author when he saw that there were over 2,800 data holders that made up the Sybase, Oracle, and UDB instances. These 2,800 data holders in turn consist of over 5,100 individual data files.
Managing 2,800 data holders and 5,100 files on a reactive, pager-event-driven basis was simply not working. A proactive, quantitatively based forecasting approach was needed.

Statement of the Solution

In order to prevent these types of database disk space problems from occurring, or at least to greatly reduce their likelihood of occurrence, what was needed was information that characterized the usage patterns of each of the data holders in the enterprise over time. Having that data would allow one to extract the underlying trends and patterns exhibited in the data holders over an extended period, and to forecast what the future needs were of each data holder.

To that end, the author devised the simple collector mentioned earlier for all of the Oracle databases. The collector itself is a single SELECT statement that leverages several PL/SQL features. This statement is run just once a day on a single Oracle instance. That single instance has database links set up for all of the other Oracle instances in the enterprise. This allows the SQL to gather the information from all other instances in the complex via database links, and to gather all of the information for all data holders on all instances into a single database table for analysis and longer term storage.

The PL/SQL loops through the dba_db_links table, generates the SQL needed for each instance, and then executes the generated SQL. The execution is serial, gathering the needed information about all tablespaces in any one instance and then going on to the next instance until information from all instances is placed into the central repository table.

The information gathered from each instance was described above and is repeated here in its database format in Table 1 below:

Table 1. Table Column Names and Datatypes
Column Name        Datatype
batch_date         DATE
instance           VARCHAR2(255)
RDBMS type         VARCHAR2(6)
data_holder_name   VARCHAR2(30)
allocated_bytes    NUMBER
free_bytes         NUMBER

Note: "batch_date" is the date and time when the data was collected. In order to provide a consistent value for all of the tablespaces in all of the instances at data collection time, the current date and time at the start of the collection process is stored as a constant. It's then reused in all of the data extracted from each instance, even though the actual extraction times might be a minute or two offset from that value. Given the longitudinal nature of the data used in the analyses, this difference in time is not a problem. Having a consistent date and time for all the values collected that day allows grouping by date and time in subsequent analyses.

As noted earlier, shortly after the Oracle collectors were created and put into place, another team member created collectors for Sybase and UDB. These collectors gather the same six fields as shown in Table 1 since the data holder concept, by intent and design, is extensible to all DBMSs.

Due to architectural dissimilarities between Oracle, Sybase, and UDB, the actual means of collecting the information is unique to each DBMS. The extracted data, however, has the same meaning and is not sensitive to any particular DBMS collection context.

Once this information was gathered to cover a reasonable period of time, it was possible to subject the data and its derivatives to a number of analyses, described below.

Results

Forecasting Analyses Performed on the Data

In order to use the collected data, it first had to be placed on a common platform that would provide the analytics needed for forecasting, descriptive statistics, etc. While the Oracle collector accumulated all of its observations about each Oracle instance into a single table on the central Oracle collector instance, the Sybase and UDB collectors used a different approach.
They created individual flat files in comma separated value (CSV) format on each Sybase and UDB instance's host. What was needed was a means to gather the content of these three disparate data sources and put it in one spot.

The initial solution chosen for this forecasting system was to use Microsoft's OLAP Services, a component of MS SQL Server versions 7.0 and 2000. The CSV files from each Sybase and UDB instance were automatically gathered together each day and sent via file transfer protocol (FTP) to an MS SQL Server instance. There they were loaded into a common table (t_data_holder) via Data Transformation Services. Similarly, the Oracle data for each day was automatically extracted out of its table and loaded into the same location. That table's layout is identical to the one shown in Table 1. All columns have the "NOT NULL" attribute.

The intent of designing the data structure in this way was to provide a means of performing multidimensional analyses along variables of interest. While the main focus was to forecast when each individual data holder would run out of space, the presence of the instance, DBMS, and data holder columns allowed one to dice-and-slice the data along those values. These are discussed later in the paper.

The following derived measures were created for each data holder:
• Bytes_used: (bytes_allocated - bytes_free)
• Percent Used: (Bytes_used / bytes_allocated) * 100
• Percent Free: (bytes_free / bytes_allocated) * 100

With the table in place on MS SQL Server, the author defined a multidimensional data structure and set up analyses in MS OLAP Services. That structure and its analyses regressed bytes_used against (serial) time for each unique combination of instance name and data holder name. This analysis used the most recent 365 days of daily data points for each data holder as input, and yielded the slope, intercept, and Pearson product moment correlation coefficient (squared) for each data holder on each instance.
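For readers who want to see the calculation spelled out, an ordinary least-squares fit that produces the same three outputs (slope, intercept, and squared Pearson correlation) can be sketched as below. This is a plain-Python illustration and not the OLAP Services implementation; the function name fit_line and the sample series are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys against xs.

    Returns (slope, intercept, r_squared). For a perfectly flat series the
    correlation is undefined; this sketch reports r_squared as 0.0 in that
    case, which is a choice made here, not something stated in the paper.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, intercept, r_squared


# Invented example: x is the day number, y is bytes_used on that day.
days = [0, 1, 2, 3, 4]
used = [100, 110, 120, 130, 140]
slope, intercept, r_squared = fit_line(days, used)
print(slope, intercept, r_squared)
```

In the system described here, xs would be the serial day numbers of the most recent 365 observations for one data holder and ys the corresponding bytes_used values.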
Numerous other descriptive statistics were also calculated for each data holder such as the mean, median, mode, variance, standard deviation, and number of observations.

The output of these OLAP Services analyses was in the form of a table which contained a row for each of the 2,800 data holders. The columns in each row of this table were the instance name, the DBMS type, the data holder name, the slope, the intercept, the R², the number of observations, the mean, the median, the mode, the variance, and the standard deviation for that "set".

Additional columns in this table, by way of programming done by the author within OLAP Services, were the most current values for bytes_used, bytes_allocated, and bytes_free, along with the value of the date of the most recent observation in the past year's set of data for this instance/DBMS/data_holder set.

Lastly, OLAP Services was further programmed to take these values and forecast what the expected shortfall or surplus would be, in bytes, for each data holder six months from the current date, using the method described immediately below:

Using the simple linear equation "y = mx + b", the author solved for "y" in order to see how many bytes would be in use (bytes_used) at time "x", where "x" was displaced 180 days forward from the current date. Applying the slope "m" calculated for each data holder, we arrived at the projected bytes_used six months from now.

Further programming in OLAP Services yielded the difference between the most current bytes_allocated number and the six-month forecast bytes_used value. If the difference between the current bytes_allocated and the forecast bytes_used was positive, that indicated that we would have a surplus of that exact magnitude six months from now for that particular data holder in that particular instance.
If, on the other hand, that result was negative, we would have a shortfall of that size in six months, and would need to take corrective action now so as to forestall an on-call coverage pager event in the future.

The above calculations were all predicated upon filling the data holder to 100% of its allocated capacity at the six month target date, since we were forecasting bytes_used against the most current bytes_allocated. Based upon practical experience, and given the variance observed over time in each data holder on each instance, or collectively across the DBMS or instance dimensions, it was evident that allowing a data holder to approach 100% full was a dangerous practice. It did not take into account the periodic, seasonal, and random ebbs and flows that were observed in the data holder's behavior with respect to bytes_used. Not all data holders grew at a linear rate; rather, they exhibited troughs and crests in bytes_used over the course of time.

The consensus within the author's workgroup, after having looked at the data for all instances and data holders, was that a best practice would be to forecast data holders to reach their 70% full "saturation" point. That being agreed, bytes_used was adjusted in the equation by dividing it by 0.70.

The values of the surplus/shortfall column for all 2,800 data holders were then examined for the largest negative values. This was done by importing the OLAP cube into Excel. The conditional formatting feature in Excel was used to show those data holders with a projected shortfall to have their numbers literally "in the red".

In some cases there was sufficient space in the underlying file system(s) to satisfy the forecast shortfall, and one of the database staff increased the data holder in size by the calculated amount. However, in other cases, there was not sufficient free space in the underlying file system(s) and a formal request was created for the UNIX disk management group to add the needed number of bytes.
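The forecast arithmetic described in this section (project bytes_used 180 days ahead with y = mx + b, divide by the 0.70 saturation point, and compare with the current bytes_allocated) can be sketched as follows. The function name, parameter names, and sample numbers are assumptions for illustration; the original logic was programmed inside OLAP Services.

```python
def forecast_surplus(slope, intercept, today_x, bytes_allocated,
                     horizon_days=180, saturation=0.70):
    """Projected surplus (positive) or shortfall (negative), in bytes.

    slope and intercept come from the regression of bytes_used on time;
    today_x is the x value (day number) of the most recent observation.
    """
    # y = m*x + b evaluated horizon_days past the current date.
    projected_used = slope * (today_x + horizon_days) + intercept
    # Allocation needed so that projected_used is only `saturation` full.
    required_allocated = projected_used / saturation
    return bytes_allocated - required_allocated


# Invented numbers: growth of 10 bytes/day, 1,000 bytes in use at day 0,
# most recent observation at day 364.
roomy = forecast_surplus(10, 1000, 364, bytes_allocated=10_000)
tight = forecast_surplus(10, 1000, 364, bytes_allocated=8_000)
print(f"roomy holder: {roomy:.0f} bytes, tight holder: {tight:.0f} bytes")
```

Setting saturation=1.0 reproduces the earlier, riskier variant in which the data holder is allowed to fill to 100% of its allocation at the target date.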
This proactive, forecasting approach allowed such requests to be fulfilled well in advance of the six month projected shortfall.

The forecast numbers, in and of themselves, were not used blindly. Examination of the R² values for each data holder was used to assess how well the observed data points resonated to the march of time. If the R² value was below 0.80, visual inspection of the plotted bytes_used data points was undertaken to help understand the pattern, if any, in the data. If the data was "all over the place" for a particular data holder, the database team would make a best estimate of what to do with that individual data holder, and manually monitor it more closely in the coming six months. Since R² is the fraction of the observed variance that's accounted for by the independent variable (time in this case), 0.80 was arbitrarily used as a line in the sand against which all forecasts were evaluated for usefulness.

These analyses were run on a regular and automatic basis every three months. The results were examined by the database group to see which data holders needed an adjustment to accommodate their projected growth (or decline) in the next six months.

Beyond Forecasting: Additional Insights Provided by the Data

Having a year's worth of data in the database now allowed the author to pose specific queries about the nature of all the databases in the organization. It was only by having this data and exploiting its emergent properties that these insights were possible.

Heretofore, such questions had been impossible to answer quantitatively due to there being no historical data. Best guesses were made, current real-time data was used, and the collective anecdotal experience of the workgroup was combined to produce SWAG answers.
Now, with the actual data at hand, a number of questions could be and were answered:

Q: Which data holders exhibit the highest or lowest rates of growth?
A: By sorting the OLAP cube on the slope value, we display all data holders ordered by their rate of growth, from positive to negative.

Q: Which data holders have the highest data content occupancy?
A: By sorting the OLAP cube on the mean bytes_used value, we display all data holders ordered by amount of information they store.

Q: Which data holders have the highest disk occupancy?
A: By sorting the OLAP cube on the mean bytes_allocated value, we display all data holders ordered by amount of disk space they take up.

Q: Which data holders are the most over- or under-utilized?
A: By sorting the OLAP cube on their Percent Used or Percent Free values, we can display all data holders ordered by where they stand in the 0-100% data holder full category.

Perhaps the most valuable insight was provided by creating a pivot table/chart in Excel, which was published to the corporate intranet for use by managers, developers, business analysts, and others. This allowed IT staff to visualize the trends and other characteristics of the data in an interactive manner.

Since Excel has an internal limit of 65K rows in a worksheet, and we had 2,800 data holders with 365 observations each, or 1,022,000 rows, the raw data could not be put into Excel. Instead, the author elected to only use the data from the production instances, and to further aggregate that into its weekly mean values. This was done by writing a trivial SQL statement to find the weekly means for bytes_used, bytes_free, and bytes_allocated for each unique combination of instance name, DBMS type, data holder name, and week number within the year. (Since the data was on MS SQL Server, the datepart "week" value was used in the GROUP BY clause.
The datepart "year" was used in the expression to order the data appropriately, since we had multiple years in the database.)

The data for the pivot tables and charts consisted of the elements shown in Table 2:

Table 2. Pivot Table Data Elements

dbms_prd_name      (Sybase, Oracle, UDB)
db_instance        (instance name)
data_holder_name
dbs_year           (YYYY)
dbs_week           (1-52)
Bytes_used         (by that data holder)
Percent_Full       (for that data holder)

Two pivot tables and two pivot charts were created from this data: one that showed the (absolute) bytes_used values, and one that showed the (relative) percent full values.

The bytes_used pivot table and chart had the following characteristics:
• Its x-axis was time, expressed as the past 12 months, using the year and the week within the year. Since the past 12 months would span a year boundary in all cases but the beginning of a new year, two pivot select buttons appear on that axis: one for year and one for week within year. For the most part these buttons were unused, and the entire 12 months of data was viewed.
• Its y-axis plotted bytes_used.
• Pivot buttons were provided for:
  o dbms_prd_name
  o db_instance
  o data holder name

This allows the viewers to dice-and-slice the data along any of these dimensions. For example, these questions were answered in the pivot chart:
• What's the pattern of bytes_used over the past year for:
  o All Oracle instances?
  o All Sybase instances?
  o All UDB instances?
  o Oracle and Sybase combined?
  o Oracle and UDB combined?
  o Sybase and UDB combined?
  o Sybase and Oracle and UDB combined?
• What's the pattern of bytes_used over the past year for:
  o Any individual instance?
  o Any combination of instances? (Note this also permits any combination of instances of interest, regardless of the DBMS that's hosting them.)
• What's the pattern of bytes_used over the past year for:
  o Any individual data holder? (Note that one must enter an instance name for this to be meaningful.
Otherwise it would show the total value for all data holders that have that name, regardless of the instance name.)
  o Any combination of data holders?

An example of this pivot chart is shown in Figure 1 below.

Figure 1. Pivot Chart of Bytes Used

The percent_used pivot table and chart had the same setup as the bytes_used pivot table. However, since the percent calculation was performed at the data holder level in the raw data, it would not be valid to do any rollups on the DBMS or instance dimensions. Therefore a warning message was written to appear on the pivot chart that read:

"Only meaningful if a data_holder_name is selected, and that data_holder_name is unique across DBMSs and instances. Supply further dbms_prd_name and db_instance criteria to ensure uniqueness."

Figure 2 shows the pivot chart. By using this second pivot chart, people in the IS infrastructure could see which data holders are close to their 100% full limits. Also, the pivot table content can be copied and pasted into a new spreadsheet and then sorted on its percent full value to show all data holders in the enterprise in order by their percent full values. Using Excel filtering, these queries can further be refined by DBMS type, instance name, and data holder name.

Figure 2. Pivot Chart of Percent Used

The Excel spreadsheet was created such that one can refresh the raw data from the source database on MS SQL Server with a single click. Therefore, each month it's possible to completely update the pivot tables and charts with virtually no effort.

Discussion

By measuring and storing just these two simple metrics every day for each data holder on each instance (bytes_allocated, bytes_free), the organization was able to evolve from its previous reactive mode to a more proactive and methodical process.
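As an illustration of how little processing sits between those two stored metrics and the pivot tables described under Results, the weekly roll-up can be sketched in Python as below. The original was a SQL GROUP BY on MS SQL Server; this sketch is an assumption-laden stand-in. In particular, it uses ISO week numbering from date.isocalendar(), which may number weeks differently from SQL Server's "week" datepart, and it derives percent_full from the weekly means, which the paper does not specify. All names and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def weekly_means(samples):
    """Aggregate daily samples into weekly means per data holder.

    samples: iterable of (dbms, instance, holder, day, bytes_allocated, bytes_free).
    Returns a dict keyed by (dbms, instance, holder, year, week).
    """
    groups = defaultdict(list)
    for dbms, instance, holder, day, allocated, free in samples:
        iso_year, iso_week, _ = day.isocalendar()
        groups[(dbms, instance, holder, iso_year, iso_week)].append((allocated, free))

    result = {}
    for key, rows in groups.items():
        n = len(rows)
        mean_allocated = sum(a for a, _ in rows) / n
        mean_free = sum(f for _, f in rows) / n
        mean_used = mean_allocated - mean_free
        result[key] = {
            "bytes_allocated": mean_allocated,
            "bytes_free": mean_free,
            "bytes_used": mean_used,
            # Assumption: percent full taken from the weekly means.
            "percent_full": 100.0 * mean_used / mean_allocated if mean_allocated else 0.0,
        }
    return result


# Two invented daily samples that fall in the same week.
daily = [
    ("Oracle", "ORA1", "USERS", date(2006, 12, 4), 100, 40),
    ("Oracle", "ORA1", "USERS", date(2006, 12, 5), 100, 20),
]
weekly = weekly_means(daily)
for key, row in weekly.items():
    print(key, row)
```

Reducing 365 daily points to about 52 weekly points per data holder is what brought the production data under the worksheet row limit mentioned earlier.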
Benefits

Some of the benefits that accrued through this shift in focus and the use of applied mathematics were:
• With this data now published on a regular monthly basis to the intranet, the consumers of it have gained considerable insights into the seasonal and other variations in their data usage patterns.
• The work group responsible for acquiring disk space for the entire IS organization can now set realistic budget values for next year's disk space requirements, based upon the higher level rollups of the bytes_used data.
• Pager call reduction: the 1,041 pages that were previously issued per year for database disk space problems dropped to only a handful.
• The rates of growth of the various applications or business systems at the organization were now quantified and published. This allowed the IT organization to compare those rates between applications, year-over-year, etc.
• The organization can now identify any anomalous rates that might indicate that an application change (intended or not) or business driver variation was having a significant impact on the rate at which data was being accrued in a database.
• Descriptive statistics can be compared between data holders to better understand their central tendencies and dispersion characteristics.

Conclusion

It's frequently amazing how just a tiny set of data observations can form the basis of uncovering the underlying substrates of an IT infrastructure. Automatically gathering just two values per day from each data holder in the enterprise allowed the IT staff to quantitatively and visually depict the patterns that had been there all the time; they had just not been measured. Applying that knowledge freed up considerable staff time which had previously been consumed by unnecessary, reactive paging events and by daily disk space monitoring.
Now that these analyses have given us a glimpse into the nature of the organization's database environment, additional next steps can be considered:
• One could correlate (or "join", in "database-ese") OS file system statistics with the DBMS data above so as to automatically determine if there is enough space in a file system to address the forecast deficit.
• We could explore nonlinear regression relationships. Any number of classical transforms (log, power, exponential, Nth order polynomials, squares, cubes, inverses, etc.) of the dependent and independent variables could be performed to determine if any combination of those yields higher R² values.

The very first DBMSs, which appeared in the early hunter-gatherer phase of IT, required a tremendous amount of staff time just to keep them running. Knowledge about and experience with them was scarce, often acquired as folklore from the tribal elders around the corporate campfires. Secret handshakes and magical amulets abounded. Mystical robes were frequently donned in order to exorcise the demons that plagued that software.

However, over time, the DBMS vendors as a group added more and more self-managing capabilities to those systems, which made them less and less labor intensive. Even with those enhancements, the pesky problems associated with database disk space persisted. Some vendors did add features that self-managed the data holders by automatically extending them into their host file systems on an as-needed basis, following business rules set up by the database administrators. Vendors are even putting in features that retract over-allocated data holders into disk space footprints that more closely resemble their normal usage patterns.

As those features become the accepted best practices in the workplace, the clerical tedium of managing database disk space will eventually become a faded memory, much like the punch card and the 5¼" floppy.
The system described in this paper is an attempt to provide a stepping stone to bridge the gap between the present state of DBMS capabilities and the future, and to do so with a methodology firmly rooted in quantitative analysis.

"Statistics is the grammar of science."
- Karl Pearson
British mathematician and statistician (1857 - 1936)

References

[1] www.scolumbiasd.k12.pa.us/hs/business/gengler/compapp/accintro.htm
[2] http://en.wikipedia.org/wiki/Database_instance
[3] http://en.wikipedia.org/wiki/Database_instance