A decade of storage
There's been plenty of technical innovation over the last 10 years, but in some cases we're still struggling with the same old problems.
Storage magazine just marked its 10th birthday, and after blowing out the candles on a virtual birthday cake, I got to thinking about how much has happened storage-wise over those 10 years. So I dug out some old issues, figuring I'd do some "remember when" reminiscing while getting a chuckle out of how primitive we were back then. I flipped the pages all the way back to May 2002 (just the third issue of Storage) and prepared myself for a good laugh.

The first story that caught my eye had the headline "Remote backup services: The road not taken," and included the line "Getting remote workers to perform backups is like pulling teeth." Ha! That's so . . . so . . . so much like today, actually. Ten years ago storage managers were fretting over protecting data at their companies' outlying locations. Today, there are far more technology alternatives, but the war is still being waged and few companies can boast a comprehensive data backup operation that includes all company locations. I'm not suggesting that there hasn't been some progress, but it's relatively modest, and with the new scourge of storage shops (smartphones and tablets) upon us, it's not exactly a rosy picture.
Just a few months later, the mobile data protection issue reared its ugly head again. September 2002's cover declared "Backup nightmare: What you can do about mobile computers."

The June 2002 cover story was on storage virtualization, mainly about how storage managers were less than enthusiastic about implementing it. Apparently, they had a lot of reasons for avoiding the technology: They cited a litany of obstacles, including a lack of standards, spotty interoperability, watered-down management functionality and the potential for showstopping support conflicts. Can you imagine that? Well, maybe not so hard to imagine, since potential storage virtualizers face many of the same stumbling blocks today. But we're making progress; our surveys tell us that approximately two-thirds of companies are at least dabbling with storage virtualization.

In that same issue there was an article that began "Buzzword alert: The latest word to loom from storage marketers' lexicon is utility." Yikes! I think I used that same sentence to describe cloud in one of these columns. Or was it big data? Or maybe compliance? It's good to see that we've improved on something in the past 10 years: The storage industry has elevated hype into an art form.

We kicked off our second year of existence with January 2003's Products of the Year cover story. Most of the award winners are still around in some shape or form, including gold winners Brocade, CommVault, Fujitsu, Nexsan and Overland. In fact, Brocade was a gold winner in the 2011 Products of the Year competition. We wrapped up 2003 with a December cover story reporting the
results of our very first storage salary survey. The overall average salary for that year was $77,554. A lot of things haven't changed over the years, but we're happy to report that participants in the 2011 survey averaged $86,926. That's slightly more than a 12% boost, which isn't too shabby considering the mother of all recessions happened about midway between then and now. On the other hand, some things do remain the same, and we're glad they do: 2012 will mark the 10th consecutive year we've surveyed storage professionals about their compensation.

My point isn't that we haven't made much progress in the last decade; it's quite the opposite. Look around your shop today, and if you're lucky enough to have been there 10 years ago, you know it's a very different place. And storage technology certainly hasn't been on vacation; just look at the vocabulary of today's headlines (solid-state, object, cloud, petabyte) and you know storage technology hasn't been standing still. And yet, we're still grappling with some of the same issues we tussled with last year and the year before and the year before that. The last feature story in the December 2003 issue of Storage was titled "Starting the ILM process." Ironically, few if any firms ever started the ILM process. But with a new name (auto-tiering) and a new urgency about it (solid-state storage), it's suddenly in vogue.

Maybe the bottom line is that technology can't solve all storage problems. Maybe a true solution is about the processes that are built around and support the technology, how the tech gets used, and how it integrates with other processes and technologies. By that measure, I'd say we've made considerable progress in the last 10 years.
Rich Castagna is editorial director of the Storage Media Group.
Disk drive shortage: Fact or fiction?

The impact of the flooding in Thailand on the disk drive supply chain was certainly real, but it looks like disk makers are spinning tales about shortages to justify price hikes.
As you read this, disk drive supplies should have largely recovered from the horrendous floods that ended earlier this year in Thailand after 175 days of continuous monsoon. Drive prices may also have come back down to earth. Maybe.

Thailand, apparently, is the source for disk drives. Seagate manufactures there, as does Western Digital. As early as October 2011, both were predicting supply shortages throughout 2012 as a byproduct of the floods. For the record, that was three months after the onset of the rain. It took much less time after their announcements for drive prices to spike. Integrators were complaining to me that the cost of disk drives had shot up 10 times by late October. Curiously, the drive shortage mantra they began back in October continues to be echoed by storage system and server vendors, and by industry analysts at Gartner and IDC right up to the deadline for this column, despite the fact that both Seagate and Western Digital have given the "all clear" signal.

Let me back up a minute for those who haven't paid attention to the news of the last few months. Truth be told, you may have missed the back story on what happened in Thailand. In the tech
sector reporting I've been reading, there was hardly a mention of a massive monsoon, triggered by the landfall of Tropical Storm Nock-ten, that displaced 13.6 million people in Thailand. There was no mention that the storm and floods had killed 815 people, that seven major industrial estates were swamped in 10 feet of water, or that the storm and flooding constituted the fourth most costly disaster in recorded history (surpassed only by Japan's earthquake/tsunami/nuclear disaster last year, the 1995 Kobe earthquake and Hurricane Katrina).

The headline in our tech industry press is that we're all about to be majorly inconvenienced: end users by huge price increases on those SATA disks we buy from Best Buy, CompUSA, Fry's and so on, while storage array vendors and PC makers will have to deal with drive supply shortages. Yet, quizzically, as I sit at my desk in my Florida office, staring at one array with a red light glowing on one bank of drive bays and at another where the drive light is completely dark (different representations of dead drives by different array vendors), I have been reviewing statements in the financial reports of the big two drive makers.

By November 28, 2011 (a month after its dismal preview of the coming year), Seagate reported that it would ship 43 million drives, which is only approximately 7 million below its high-end estimate. On January 4, 2012, the company reported it had shipped more disk drives in December than it had expected and would be reporting revenue in excess of its earlier guidance. For its part, Western Digital reported on October 17 that the floods would have a significant impact on drive shipments and its ability to meet customer demand. Magically, the hitch was re-
solved "significantly ahead of expectations," according to company spokespersons during the earnings call on January 23. The firm shipped approximately 29 million drives, which seems like a lot. Both companies also reported earnings well in excess of October estimates. Western Digital earned $1.51 per share (well beyond the projected $0.65 per share) while Seagate earned $1.32 per share (topping its $1.08 per share estimate).

Meanwhile, industry analysts at Gartner and IDC haven't deviated from their doom and gloom forecasts, blaming drive shortages for everything from the decline of the PC market to slowdowns in IT hardware spending, which they continue to insist will dominate 2012. Better information is coming from Wall Street analysts. During Western Digital's earnings call, one gutsy guy made the observation that the financial success of both Seagate and Western Digital this quarter was likely a function of price gouging during an alleged supply shortage. When he asked Western Digital bosses whether they thought such high margins were sustainable, they declined to comment. For all the hate directed at Wall Street these days, I kind of like this guy. Unfortunately, his question continues to go unanswered.

Is the drive shortage manufactured to jack up prices? There's little doubt there were some supply-chain interruptions due to the natural disaster in Thailand, but the impact is still influencing the game today? Really?
Perhaps everyone in the storage array food chain is using the popular theme of Thai floods creating supply shortages just to gouge consumers? It's been done before . . . too many times to count. Remember the reports that came out after the last jump in gas prices, or after hurricanes Andrew or Katrina, proving that everybody in the supply chain jumped on the bandwagon to raise their prices even though supplies were more than adequate? Could this be another manifestation of the same disease?

I worry that it might be. I worry every time I read an article that proclaims a big increase in flash solid-state drive (SSD) uptake as SSDs become price competitive with disk drives. Or that certain array vendors are leveraging drive supply shortfalls to push obscenely overpriced deduplication or thin provisioning rigs.

Meanwhile, two of my disk drives are dead. No way am I paying a 400% markup for the same drives just because somebody is trying to make an extra buck on the backs of 800-plus dead Thais. There, I said it.
Jon William Toigo is a 30-year IT veteran, CEO and managing principal of Toigo Partners International, and chairman of the Data Management Institute.
Integrated cloud backup

By Lauren Whitehouse

One of the most expedient ways to realize the economic benefits of cloud storage is to integrate your current backup or DR operations with a cloud backup service.
Cloud services adoption is growing, and last year's tire-kickers are making real investments today. Research from Milford, Mass.-based Enterprise Strategy Group (ESG) found that 74% of IT departments will increase 2012 spending on public cloud computing services to help contain costs, while 63% plan to increase spending on server virtualization software or begin to build out a private cloud on top of their existing virtualized infrastructure (ESG Research Report, 2012 IT Spending Intentions Survey, January 2012).
For many, the on-ramp to the cloud is the integration of on-premises functions with cloud infrastructure. A relatively easy entry point to cloud storage services is to integrate on-premises backup operations with a cloud-based service. Cloud-based storage and computing provide off-site, long-term storage and/or a disaster recovery (DR) platform without having to fund and build one. Organizations gain additional infrastructure assets, but at a fraction of the cost. The capital costs of equipment, as well as the operational costs of floor space, staff, energy, maintenance, software and equipment updates, can be eliminated. Redundancy to support business continuity (BC) requirements is often inherent, without the additional costs typically expected in a do-it-yourself model.

Cloud backup services can be used to capture and store backup copies to replace a disk-to-disk-to-tape (D2D2T) approach while automatically storing backup sets off-site. The same services can often be used to store replicated instances of production workloads for cloud-based DR (see "Integrating with cloud DR services").

For backup and recovery functions, there's a spectrum of cloud integration approaches available (see "Cloud backup integration models"). One tactic that actually eliminates on-site backup infrastructure is backup software as a service (SaaS), which involves running the backup application and storing backup copies in the cloud. Another infrastructure elimination approach is to outsource IT services to a managed service provider (MSP), allowing the MSP to host production applications and manage IT infrastructure, including backup and recovery operations. There are also multiple ways to integrate on-premises backup functions with the cloud via a disk-to-disk-to-cloud (D2D2C) approach: leveraging a public cloud infrastructure as a service (IaaS), creating a virtual private cloud in a public cloud environment or developing a private cloud infrastructure. The sketch below illustrates the basic D2D2C idea.
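To make the D2D2C pattern concrete, here's a minimal sketch of the final hop: after the regular disk-to-disk backup completes, the backup set is copied to cloud object storage over an S3-style API. The bucket name, file path and use of the boto3 client are illustrative assumptions, not details of any particular vendor's product.

```python
# Hypothetical D2D2C final hop: push a completed local backup set
# to cloud object storage. Assumes the AWS boto3 SDK and an existing
# bucket; all names are placeholders.
import boto3

def replicate_backup_to_cloud(local_path: str, bucket: str, key: str) -> None:
    """Copy an on-premises backup file to an S3 bucket as the off-site tier."""
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)

replicate_backup_to_cloud(
    "/backups/fileserver-2012-04-01.img",  # local D2D backup set
    "example-offsite-backups",             # placeholder bucket name
    "fileserver/2012-04-01.img",           # object key in the cloud tier
)
```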
Public cloud infrastructure providers in this category include Amazon, Microsoft Azure, Nirvanix and Rackspace. Some of the backup vendors implementing this form of cloud integration include CA, CommVault, IBM, Quest, Symantec and Veeam.

Integrating with cloud DR services

If disaster recovery (DR) is the primary use case for using cloud services, there are a few scenarios to consider. A cloud can serve as a cold, warm or hot standby site.

Cold standby implementations maintain off-site copies of data on storage and can make available the infrastructure needed for bare-metal recovery, enabling recovery time within hours to days of an interruption at the primary site. Using cloud-based virtual machines (VMs) accelerates recovery time over acquiring, deploying and configuring physical equipment and restoring data.

Warm standby approaches maintain replicated systems in a dormant state, enabling recovery time objectives within minutes to hours of an outage. Virtual machine images that encapsulate the operating system, application, data and configuration settings make it simpler to synchronize between the primary and cloud-based warm standby site. In the event of a disaster, VMs are activated on demand and failover to an instance of the application occurs.

Hot standby is a scenario where applications and data are maintained off-site in a running state, enabling the most rapid recovery, usually within seconds to minutes. The hot site can immediately take over operations in the event of a primary site failure.

The tradeoff as you go up the recovery site temperature scale is cost and, often, complexity.

While the purchase of a cloud storage gateway appliance is an added cost, it can provide greater flexibility in IaaS vendor selection. This type of implementation, however, does have some drawbacks. First, IT organizations may have to adjust deduplication, compression and encryption settings; deduplication, compression and encryption performed by the backup application itself would be redundant to the services offered by a gateway appliance like Riverbed's. Also, retention settings for local and cloud storage are typically configured at the gateway appliance (not at the backup application), which can introduce a layer of management complexity. Lastly, the backup application only sees the gateway device as the local storage repository; it's not aware of copies replicated by the gateway appliance to the cloud tier, a situation that can delay recovery if the backup application requests data that resides only in the cloud.
Cloud backup integration models

- Backup SaaS: subscription service; shared off-site infrastructure; cloud-based backup application.
- D2D2C, public cloud: perpetually licensed backup software; dedicated on-site infrastructure; managed locally; additional off-site copy on shared off-site storage accessed via APIs.
- D2D2C, virtual private cloud: perpetually licensed backup software; dedicated on-site infrastructure; managed locally; additional off-site copy on virtually private shared off-site storage.
- D2D2C, private cloud: perpetually licensed backup software; dedicated on-site and/or shared off-site corporate infrastructure; managed locally.
- Managed services: perpetually licensed software; dedicated off-site infrastructure; managed by a third party from or at an off-site location.
… and maintains them on like infrastructure. For example, iland, an MSP, offers a hosted DR solution for VMware vSphere virtual machines based on Veeam Backup & Replication. Similarly, Verizon Terremark partners with NetApp to build out a cloud-based, multi-tenant backup solution based on NetApp's storage systems and data protection portfolio. EMC's cloud strategy is based on a similar model to deliver remote and replicated backup services. EMC MSPs use EMC Data Domain or EMC Avamar at the subscriber's site for local protection, and replicate copies to multi-tenant configurations of EMC Data Domain or EMC Avamar at the MSP's site for cloud copies.
… vendor's cloud facilitates disaster recovery. SunGard's Recover2Cloud for Server Replication replicates physical and virtual systems to the cloud, while Recover2Cloud for Vaulting copies backup sets to the SunGard cloud. EVault Plug-n-Protect is an on-premises appliance that combines with EVault Offsite Replication Service, which replicates the on-site vault to the EVault cloud to create a cloud-integrated solution. In addition to the "one throat to choke" benefits of dealing with a single vendor for an end-to-end solution, what stands out with these vendors is that they also offer recovery services where teams at the cloud data center can facilitate recovery in the cloud infrastructure.
Integrating on-site backup with public, private or virtual private clouds is only feasible if uplink bandwidth is sufficient. A daily incremental backup of 100 GB of data at a 10 Mbps transfer rate could take nearly 24 hours to complete. Upgrading to a 100 Mbps connection reduces transfer time significantly, to a little more than two hours; however, bandwidth costs are often doubled. That's why it's important to take advantage of bandwidth optimization features, such as deduplication and compression. The only gotcha is that data in a deduplicated or compressed state in the cloud still has to be reconstituted and restored to be recognized by the production application. The data isn't in a usable state for a cloud-based DR scenario, and there could be a time delay if a bulk transfer of data from cloud storage needs to occur for on-site recovery, either over a bandwidth connection or via shipped portable media. One remedy to this dilemma is to recover in the cloud.
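Those transfer-time figures are easy to reproduce; here's a quick back-of-envelope sketch, using only the article's own data sizes and link speeds and assuming full link utilization:

```python
# Rough cloud backup transfer times: data volume over uplink speed.
def transfer_hours(data_gb: float, link_mbps: float) -> float:
    """Hours to move data_gb (decimal GB) over a link_mbps uplink at 100% utilization."""
    bits = data_gb * 8e9                    # GB -> bits
    return bits / (link_mbps * 1e6) / 3600  # bits / (bits per second) -> hours

print(f"{transfer_hours(100, 10):.1f} hours")   # ~22.2: nearly a full day at 10 Mbps
print(f"{transfer_hours(100, 100):.1f} hours")  # ~2.2: a bit over two hours at 100 Mbps
```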
Implementing a D2D2C strategy for a whole system in the cloud, not just the data, improves recovery time objectives. For subscribers who have virtualized workloads at their primary site, this scenario is straightforward. The portability of a virtual machine encapsulating an application instance streamlines whole-system backup and recovery processes. Nearly all backup vendors support backup and recovery of virtual systems, so it's just a matter of contracting for the compatible cloud resources to create a failover site.

Some vendors, namely Arkeia and Zmanda, offer a virtual backup appliance. This allows customers to run the backup server in the cloud and replicate data between the on-premises backup server and the cloud-based one. Data can be restored in the cloud, or the cloud storage can be mounted for on-premises backup services. Other products, such as AppAssure and Symantec Backup Exec, can perform recoveries in the cloud. Symantec's solution is limited to virtual environments, but AppAssure can protect both physical and virtual systems. Replication between the on-premises AppAssure backup server and an AppAssure core instance running in the Amazon Elastic Compute Cloud (EC2) makes it possible to recover in the cloud on demand. The ability to run the backup application in the cloud also helps organizations protect cloud-resident production applications.

Cloud infrastructure offers tremendous advantages to reduce costs and simplify recovery operations, especially for integrated backup. The on-demand, pay-as-you-go characteristics of cloud storage services are a perfect match for the D2D2C use cases of reducing or eliminating tape media and facilitating disaster recovery.
Lauren Whitehouse has more than 25 years of experience covering backup, replication and other data protection technologies.
Virtual servers

By Chris Evans

It can still be a struggle at times, but managing storage in virtual server environments is better understood today, with tighter integration and more effective management tools available.
Storage management has developed into a discipline in its own right, driven by the growth of data and the emergence of standards such as Fibre Channel (FC), iSCSI and NFS, which have enabled the centralization and standardization of storage systems. As virtualization has become the main technology for server and desktop optimization, storage has been a key component in delivering highly scalable virtualized solutions. Without centralized storage, certain features such as nondisruptive virtual machine (VM) migration wouldn't have been possible.
However, while storage has provided significant benefits, it also poses new challenges for both storage and virtualization administrators. Virtualization adds another layer of complexity in understanding the relationship between a server and the storage it uses. That layer of abstraction makes it difficult to translate storage-centric concepts such as logical unit numbers (LUNs), RAID groups and disks into virtual objects such as virtual hard disks (VHDs) and virtual machine disks (VMDKs). Storage administrators need to take a new approach when delivering storage to virtual environments.
The challenges
Virtualization creates new operational headaches. Because many VMs can exist on a single storage LUN, the I/O profile of virtual servers and desktops tends to be more random and unpredictable in nature. The functionality of today's hypervisors enables large amounts of I/O to be generated when moving virtual machines around the storage infrastructure through the use of features such as VMware Inc.'s Storage vMotion and Microsoft Corp.'s Hyper-V Live Migration. Virtualization may also impact heavily on storage utilization, as virtual machines are copied, cloned or otherwise replicated across the environment.

We must also consider the operational structures that have been built up in many large organizations. As IT infrastructure has grown, the component technologies have tended to split into silos covering the disciplines of storage, networking, servers and databases. Once, it was possible for storage admins to go about their business with little regard for the operation of other parts of the infrastructure. But virtualization has changed that world and made it necessary for those isolated silos to integrate like never before.
Choosing a strategy
Efficient storage management in virtual environments entails meeting two basic metrics: capacity and performance. While this could also be said of nonvirtualized environments, performance is the primary consideration in virtual storage designs as it has more of an impact on the operation of a virtual infrastructure. Slow response times from a single LUN are likely to affect only a single host in nonvirtualized environments; however, poor responses from a large LUN supporting many virtual machines can have a much wider impact. This is especially so with a virtual desktop infrastructure (VDI). There are a number of strategies a storage administrator should consider.
… array. As the leading hypervisor vendor, VMware has developed a number of APIs, including vStorage APIs for Data Protection (VADP) and vStorage APIs for Storage Awareness (VASA). VASA is of increasing importance in the delivery of scalable storage environments, providing configuration information to the hypervisor about storage LUNs, including replication and performance metrics.
When delivering I/O to virtual environments, performance is everything. Typically, virtual environments create more random workloads, making the work of optimizing I/O workloads much harder for the array. There are techniques that can be employed to ensure performance is delivered optimally, including:

Wide striping. This involves spreading I/O across as many physical disk spindles as possible (a toy illustration follows this list). Wide striping can be achieved by using large RAID groups (being mindful of rebuild times for disk failures) or by concatenating RAID groups into storage pools. This technique is applicable to both file- and block-based storage platforms.

Dynamic tiering. Like any storage environment, virtual servers will have I/O hotspots: data that generates a large proportion of the I/O workload. Hotspot areas can be difficult to predict, so platforms that offer dynamic tiering provide an automated way to ensure the hottest data stays on the fastest disk. This technique is particularly useful where virtual machines have been cloned from a single master image.
Thin provisioning. This ensures that disk space is consumed only by data that's written to the disk by the host, rather than reserving a fixed image for each VM. The feature can be implemented in the hypervisor and is a common option with most storage platforms.
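As a rough illustration of why wide striping evens out random I/O, here's a toy model; the spindle count and chunk size are arbitrary assumptions, and real arrays stripe with RAID protection rather than simple round-robin:

```python
from collections import Counter

# Toy wide-striping model: logical block addresses (LBAs) are mapped
# round-robin in fixed-size chunks across many spindles, so a hot LUN's
# I/O is spread out instead of landing on a single disk.
NUM_SPINDLES = 16    # arbitrary
CHUNK_BLOCKS = 256   # blocks per stripe chunk, arbitrary

def spindle_for(lba: int) -> int:
    """Which spindle holds a given logical block."""
    return (lba // CHUNK_BLOCKS) % NUM_SPINDLES

load = Counter(spindle_for(lba) for lba in range(1_000_000))
print(load.most_common(3))  # each spindle carries roughly 1/16 of the blocks
```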
- Configure storage for performance, then capacity.
- Use performance-enhancing features such as wide striping.
- Deploy storage that uses hardware acceleration via APIs.
- Don't use traditional backup methods for virtual machines; look for snapshots and third-party software that use hypervisor backup APIs.
- Use vendor plug-ins to management frameworks such as VMware vCenter and Microsoft System Center Virtual Machine Manager (SCVMM).
- Consider custom solutions.
- Move toward automation.
In addition to plug-ins for virtualization management tools, there are a number of storage and third-party vendor tools to consider. Some examples include:

- EMC Corp. Virtual Storage Integrator (VSI). This VMware vCenter plug-in from EMC provides a rich range of storage information directly within the vCenter console and can integrate with Citrix XenDesktop in virtual desktop infrastructure (VDI) environments.
- iWave Software LLC Storage Automator. iWave's Storage Automator allows policy-based deployment of storage for virtual environments, managing the workflow processes of delivering virtual servers in public cloud-type environments.
- NetApp Inc. SANscreen VM Insight. NetApp's SANscreen platform provides storage visualization, with integration into VMware VirtualCenter, operating on NetApp or heterogeneous storage configurations.
- SolarWinds Storage Manager. SolarWinds has a variety of tools that both visualize and help in the optimization of storage for virtual environments, including virtual machine to storage mapping.
Automate it
Managing dynamically changing virtual environments to optimize capacity and performance can be a time-consuming process. As virtual environments scale and mature, there's a need to move toward more automation of manual optimization processes. Hypervisor vendors are starting to include capabilities in their products that allow some of these features to be semi-automated, reducing the onus on the administrator to continually tune the storage environment.

In vSphere 5, VMware introduced Storage Distributed Resource Scheduler (SDRS), which provides some degree of automation of storage allocations. SDRS provides automated initial placement of VMDKs and automated migrations of virtual machines to meet capacity and performance goals, as well as affinity rules, ensuring, for example, that high-I/O virtual machines are placed on separate hardware.

The move to more automated storage management will be an absolute requirement as virtual infrastructures scale and become more service oriented in their delivery. Already, storage vendors are coming to the market with new products that provide provisioning APIs to hook directly into virtual server automation.
… the backup and restore process. In block storage deployments, traditional backups use the host itself to back up data. This is because the storage array has no awareness of the format of data on a LUN; the host places the file system onto the LUN, so the backup software relies on the host to provide a stream of files for backup.

On all virtual platforms, a VM is stored as a file or series of files, even when using block-based storage arrays. This makes the backup process easier, as backups can be taken simply by taking a copy of the files that make up the virtual machine. Some hypervisor vendors, such as VMware, offer APIs that allow third-party software to view changed block data within the virtual machine itself, providing a highly efficient way of backing up only the data that has changed since the last backup was taken. All hypervisor vendors also provide the ability to snapshot virtual machines. Although this results in a crash-consistent copy, in some instances, with agent software, the snapshots can be coordinated with quiescing the host file system to allow consistent snapshots to be taken.
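To show the idea behind those changed-block APIs, here's a minimal sketch of changed-block detection. It's an illustrative stand-in only: real hypervisor APIs return the modified-block list directly rather than requiring a full scan of the virtual disk file.

```python
import hashlib

BLOCK_SIZE = 1 << 20  # 1 MiB blocks, an arbitrary choice

def block_digests(path: str) -> list:
    """Hash every fixed-size block of a virtual disk file."""
    digests = []
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK_SIZE):
            digests.append(hashlib.sha256(chunk).digest())
    return digests

def changed_blocks(path: str, previous_digests: list) -> list:
    """Indexes of blocks that differ from the last backup; only these get copied."""
    current = block_digests(path)
    return [i for i, d in enumerate(current)
            if i >= len(previous_digests) or previous_digests[i] != d]
```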
Storage for big data

By Eric Slack

Big data analytics will place new burdens on storage systems. Here are some of the key features those systems will need to meet the challenges of big data.
Big data refers to data sets that are too large to be captured, handled, analyzed or stored in an appropriate timeframe using traditional infrastructures. "Big" is, of course, a term relative to the size of the organization and, more importantly, to the scope of the IT infrastructure that's in place. Big data also implies analysis, driven by the expectation that there's value in all the information businesses are accumulating, if there was just a way to pull that value out. Perhaps it follows from the notion that storage capacity is cheap, but in the effort to cull business intelligence from the
mountains of new data created every day, organizations are saving more of it. They're also saving data that's already been analyzed, which could potentially be used for trending against future data collections.
Requirements ultimately drive hardware functionality and, in this case, big data analytics are impacting the development of data storage infrastructures. This could mean an opportunity for storage and IT infrastructure companies. As data sets continue to grow with both structured and unstructured data, and analysis of that data gets more diverse, current storage system designs will be less able to meet the needs of big data. Big data has outgrown its own infrastructure, and it's driving the development of storage, networking and compute systems designed to handle these specific new challenges. Storage vendors have begun to respond with block- and file-based systems designed to accommodate many of these requirements. Here's a listing of some of the characteristics big data storage infrastructures need to incorporate to meet the challenges presented by big data.

Capacity. "Big" often translates into petabytes of data, so big data storage systems certainly need to be able to scale. But they also need to scale easily, adding capacity in modules or arrays transparently to users, or at least without taking the system down. Scale-out storage is becoming a popular alternative for this use case. Scale-out's clustered architecture features nodes of storage capacity with embedded processing power and connectivity that can grow seamlessly, avoiding the silos of storage that traditional systems can create.

Big data also means a large number of files. Managing the accumulation of metadata for file systems at this level can reduce scalability and impact performance, a situation that can be a problem for traditional NAS systems. Object-based storage architectures, on the other hand, can allow big data storage systems to expand file counts into the billions without suffering the overhead problems that traditional file systems encounter. Object-based storage systems can also scale geographically, enabling large infrastructures to be spread across multiple locations.
Latency. Big data may also have a real-time component, especially in use cases involving web transactions or finance. For example, tailoring web advertising to each user's browsing history requires real-time analytics. Storage systems must be able to grow to the aforementioned proportions while maintaining performance, because latency can produce stale data. Here, too, scale-out architectures enable the cluster of storage nodes to increase in processing power and connectivity as they grow in capacity. Object-based storage systems can parallelize data streams, further improving throughput.

Many big data environments will need to provide high IOPS performance, such as those in high-performance computing (HPC) environments. Server virtualization will drive high IOPS requirements, just as it does in traditional IT environments. To meet these challenges, solid-state storage devices can be implemented in many different formats, from a simple server-based cache to all-flash-based scalable storage systems.
Access. As companies get better at understanding the potential of big data analysis, the need to compare differing data sets will bring more people into the data sharing loop. In the quest to create business value, firms are looking at more ways to cross-reference different data objects from various platforms. Storage infrastructures that include global file systems can help address this issue, as they allow multiple users on multiple hosts to access files from many different back-end storage systems in multiple locations.
Security. Financial data, medical information and government intelligence carry their own security standards and requirements. While these may not be different from what current IT managers must accommodate, big data analytics may need to cross-reference data that may not have been co-mingled in the past, which may create some new security considerations.
Cost. "Big" can also mean expensive. And at the scale many organizations are operating their big data environments, cost containment will be an imperative. This means more efficiency within the box, as well as less expensive components. Storage deduplication has already entered the primary storage market and, depending on the data types involved, could bring some value for big data storage systems. The ability to reduce capacity consumption on the back end, even by a few percentage points, can provide a significant return on investment as data sets grow. Thin provisioning, snapshots and clones may also provide some efficiencies depending on the data types involved.

Many big data storage systems will include an archive component, especially for those organizations dealing with historical trending or long-term retention requirements. Tape is still the most economical storage medium from a capacity/dollar standpoint, and archive systems that support multiterabyte cartridges are becoming the de facto standard in many of these environments.

What may have the biggest impact on cost containment is the use of commodity hardware. It's clear that big data infrastructures won't be able to rely on the "big iron" enterprises have traditionally turned to. Many of the first and largest big data users have developed their own white-box systems that leverage a commodity-oriented, cost-saving strategy. But more storage products are now coming out in the form of software that can be installed on existing systems or common, off-the-shelf hardware. In addition, many of these companies are selling their software technologies as commodity appliances or partnering with hardware manufacturers to produce similar offerings.
Persistence. Many big data applications involve regulatory compliance that dictates data be saved for years or decades. Medical information is often saved for the life of the patient. Financial information is typically saved for seven years. But big data users are also saving data longer because it's part of an historical record or used for time-based analysis. This requirement for longevity means storage manufacturers need to include ongoing integrity checks and other long-term reliability features, as well as address the need for data-in-place upgrades.
Flexibility. Because big data storage infrastructures usually get very large, care must be taken in their design so they can grow and evolve along with the analytics component of the mission. Data migration is essentially a thing of the past in the big data world, especially since data may be in multiple locations. A big data storage infrastructure is essentially fixed once you begin to fill it, so it must be able to accommodate different use cases and data scenarios as it evolves.

An example of value creation from big data is Internet and social media. Companies like Facebook are mining their users' personal preferences and creating profiles that can show advertisers which products they're most interested in, thus increasing ad revenue. Not to be outdone, Google recently announced a change in its privacy policies, informing the public that it will begin more aggressive cross-referencing of the data collected within its businesses. Essentially, Google is going to use data from Google searches, Gmail accounts, Android users' activity, plus data from YouTube and its other websites to gain insights into user behavior. This translates into improved advertising effectiveness and increased revenue.
Application awareness. Some of the first big data implementations involved application-specific infrastructures, such as systems developed for government projects or the white-box systems invented by large Internet services companies. Application awareness is becoming more common in mainstream storage systems as a way to improve efficiency or performance, and it's a technology that should apply to big data environments.
Smaller users. As a business requirement, big data will trickle down to organizations that are much smaller than what some storage infrastructure marketing departments may associate with big data analytics. It's not only for the lunatic fringe or oddball use cases anymore, so storage vendors playing in the big data space would do well to provide smaller configurations while focusing on the cost requirements.
Eric Slack is a senior analyst at Storage Switzerland.
The oldest cloud storage services have matured into a variety of data protection offerings that can meet the needs of most enterprises.
"Get your data out of the building" is the best advice I can give about data protection, and it's the part of the process many organizations still struggle with. Too many companies fall short when a crisis hits because they were either still planning or had made their data protection solution so convoluted that it failed when they tried to put it into use.
Backup as a service
With all the talk about the cloud, cloud backup, or backup as a service (BaaS), seems like a natural solution to the problem of getting data off-site. It may be true in some cases, but you need to be very clear on a few key points:
… or from the cloud. In some cases, that third tier is maintained at the vendor's data center(s); in others, the application uses a public storage cloud infrastructure, such as Amazon Web Services (AWS). For many, this is an attractive alternative. If your current backup app already supports cloud-based storage as a media layer, then all your agents stay as they are. Your existing backup server(s) remain in place to perform fast recoveries from local storage, and you now have an additional copy of your data out of the building. (Read our feature on integrated cloud backup.)
If your data truly coexists in two locations, such as with the iSCSI-extending technology offered by Riverbed with its Granite product, you already have data at both a branch and a data center, using storage that enables point-in-time recoveries.
If your data is based in the cloud but lives locally, such as with gateway appliances like those from Nasuni, StorSimple and other vendors, where local site filers are synchronized using
Amazon Simple Storage Service (S3) or other cloud storage services, each site has native resiliency, with disaster recovery as easy as spinning up a clean virtual machine and remounting the cloud-based LUN. You could also use an on-premises backup solution to protect your on-premises filer for additional copies.

However you decide to leverage cloud backup services, the goal is still the same: Get that data out of the building.
Jason Buffington is a senior analyst for data protection at Milford, Mass.-based Enterprise Strategy Group (ESG). He has more than 20 years of experience in the IT industry.
With hard disk drive prices rising and some models tough to find, there are steps to take to reduce your dependence on hard drives while gaining other benefits along the way.
The hard disk drive (HDD) shortage caused by the flooding in Thailand has many information technology (IT) people and vendors worried: Should they just pay the higher prices to get the drives they need, or turn to alternatives such as solid-state drives (SSDs)? Necessity being the mother of invention, I believe this is a great opportunity for IT to step back, take a holistic look and perhaps solve several problems at once. Of course, some needs can't wait, so paying heavy premiums may be your only recourse. But if your needs aren't that urgent, consider the following:
Optimize and archive. Are you currently using primary storage capacity optimization tools that help you move data from primary storage to an archive platform such as tape or secondary disk? Effective archiving can ease the burden on your primary disk and help defer new HDD purchases. As an added bonus, applications will likely run faster because of lower capacity utilization, and backups will be faster with less data to back up. And you won't continue to back up the same unused data over and over, saving space on whatever backup media you use.
Time to dedupe. On the backup side, if you haven't started using deduplication, now is the time; dedupe products can reduce disk requirements by a factor of 20 or more (the arithmetic below shows why the ratio matters). If you only back up to tape right now, dedupe may help shorten backup windows, improve backup reliability and provide faster, more reliable recoveries, but this may mean buying additional disk. If you're already backing up to disk, deduplication is a must. If you're currently using deduplication, make sure it's used with all backup sets.
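As a rough sketch of what a dedupe ratio means for physical capacity (the 100 TB retained-backup figure is a made-up example; only the 20:1 factor comes from the text above):

```python
# Physical disk needed = logical (pre-dedupe) backup data / dedupe ratio.
def physical_tb(logical_tb: float, ratio: float) -> float:
    return logical_tb / ratio

print(physical_tb(100, 20))  # 100 TB of retained backups at 20:1 -> 5.0 TB of disk
print(physical_tb(100, 1))   # the same retention with no dedupe -> 100.0 TB
```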
Primary data reduction. Primary storage data reduction using tools such as compression and deduplication can reduce disk capacities by a ratio in the range of 3-to-1 to 5-to-1, depending on data types. However, I'm reluctant to overemphasize data reduction, as the products are still new in the marketplace (except for IBM's Real-time Compression [RTC] and Dell's Ocarina-based products) and some may impact performance. You should either apply them post-process, as in the case of NetApp's built-in data reduction, or do them inline as with IBM RTC. You don't want to trade application performance for capacity savings. Smart archiving should be a higher priority.
Think thin provisioning. If you haven't taken advantage of thin provisioning in your storage arrays, this is a good time to turn it on. Most popular arrays offer this functionality, and many offer a way to convert thick logical unit numbers (LUNs) to thin LUNs online and totally nondisruptively. This same technology also applies to clones. See if your current array supports this functionality and thin those LUNs out. (A toy model of the idea follows.)
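Here's a purely illustrative toy model of the thin provisioning idea: the LUN presents its full size to the host, but pool capacity is consumed only as blocks are first written.

```python
# Toy thin-provisioned LUN: capacity is drawn from the shared pool
# on first write, not at creation time.
class ThinLun:
    def __init__(self, size_blocks: int):
        self.size_blocks = size_blocks  # size presented to the host
        self._written = set()           # blocks that actually consume pool space

    def write(self, block: int) -> None:
        if not 0 <= block < self.size_blocks:
            raise ValueError("block outside LUN")
        self._written.add(block)

    @property
    def pool_blocks_consumed(self) -> int:
        return len(self._written)

lun = ThinLun(size_blocks=1_000_000)  # host sees a 1M-block LUN
for b in range(10_000):               # host has written only 10K blocks
    lun.write(b)
print(lun.pool_blocks_consumed)       # 10000 consumed from the pool, not 1000000
```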
Trim snapshots. It's also a good idea to check on how many snapshots you keep. You might find that paring them down has no effect on the level of protection an app needs and you can regain some disk space. Of course, you should make sure the snapshot technology you use generates space-efficient snapshots.
Use the cloud. If you've been thinking about moving some data to the cloud, this is a good time to act. Archiving and disaster recovery-type applications are natural candidates for the initial move of data to the cloud. But you should consider primary storage, too. There are vendors, like StorSimple, who offer excellent tier 2 primary storage with cloud as the back end. In effect, they make cloud storage look and behave like primary storage, using caching, SSDs, deduplication, wide-area network (WAN) optimization and other technologies. With these cloud-integrated storage solutions, you effectively shift the HDD shortage problem to the cloud provider.
Solid-state options. This is a great time to figure out what your SSD strategy should be. With SSDs, you can reduce HDD purchases while accelerating applications that are starved for I/O. But you need to develop an SSD strategy rather than just look at solid-state as a hard disk drive alternative. Not all applications need SSDs, and not all SSDs are alike, so some research is required to understand which applications would benefit and what type of flash product would be appropriate. A server-based PCI Express (PCIe) card can provide storage and pump up the performance of a single physical server. SSDs can be placed in the HDD slots in servers or arrays, and intelligent software can automatically move data from flash to the hard disk drive and back, depending on its heat factor. There are many solid-state drive options and, while they may be good replacements for hard disks, some planning and a strategy are required.

If you look around your shop, I'm sure you can come up with other best practices that may help reduce your company's hard disk requirements. With the storage industry in a jam relative to HDDs, it could be a great time to rise above the panic and take steps that are strategic and consistent with good storage management.
Arun Taneja is founder and president at Taneja Group, an analyst and consulting group focused on storage and storage-centric server technologies.
Snapshot
What's the main reason why your company hasn't deployed any network-attached storage (NAS) systems yet?
- Standalone file servers sufficient: 23%
- Don't need networked storage: 20%
- Will install this year: 17%
- We have a SAN: 17%
- Will install multiprotocol this year: 13%
- Evaluating: 10%

[Chart: On average, what is the rate of utilization (based on capacity) of your company's NAS systems? Response categories: More than 75%; 51% to 75%; 26% to 50%; Less than 25%.]

[Chart: What companies use NAS for (multiple selections permitted). Response categories: user shares, non-mission-critical apps, hosting virtual servers, mission-critical apps, backup, other.]

32%: Companies running their NAS systems on 10 Gbps Ethernet networks

"We are a small shop and use a low-cost virtual SAN . . . not high performance, but cheap, reliable and fast enough." (Survey respondent)