Dalit Naor, IBM Haifa Research

Introduction to Cloud:
Cloud and Cloud Storage
Lecture 2
Dr. Dalit Naor
IBM Haifa Research
Storage Systems

Advanced Topics in Storage Systems for Big Data - Spring 2014, Tel-Aviv University

http://www.eng.tau.ac.il/semcom

Content
- What is the Cloud paradigm
- Cloud principles and virtualization
- Cloud Storage and cost models
- How is it done?
  - Cloud-based file systems
  - Cloud object stores

What is a cloud and why is it interesting?


Cloud computing is a model for enabling convenient, on-demand
network access to a shared pool of configurable computing
resources (e.g., networks, servers, storage, applications, and
services) that can be rapidly provisioned and released with
minimal management effort or service provider interaction.
Source: US National Institute of Standards and Technology (NIST), Information Technology Laboratory

Key features of cloud:
- On-demand
- Shared
- Automated
- Network access

Benefits of cloud:
- Speed and agility
- Cost savings
  - Economies of scale, utilization improvement and standardization
- Pay-as-you-go for usage

Common Themes in all Definitions
- Infrastructure as a service
- Pay-per-use model: utility computing
- Scale/elasticity
  - Scale up and scale down (!!)
- Ease of use and management
  - Highly automated management of resource pools
- Lower cost through economies of scale

Public and Private Clouds: Business and Operational Models
- Public Cloud
  - Owned and operated by companies that offer computing resources to others
  - Used as pay-as-you-go
  - No need to own hardware or software -> OPEX vs CAPEX
  - Examples: Amazon Web Services, IBM SoftLayer, Microsoft Azure, Google App Engine
- Private Cloud
  - Owned and operated by a single company for its internal use
  - Internal datacenters
  - Takes advantage of cloud efficiencies such as elasticity, virtualization, and cost
- Hybrid Cloud: the reality!
  - Uses a private cloud foundation combined with public cloud services
  - Uses public cloud for some types of IT services, and standard legacy IT for mission-critical applications
  - Supports an evolution model, legacy

Infrastructure, Platform and Software as a Service

Source: R. Paul Singh's Blog

Virtualization
- Abstraction of the physical layers and resources
  - Widely used in computer systems
  - Memory, storage, compute, networking
- Virtual machines
  - Date back to IBM's mainframes and IBM AIX/Power systems
  - Revolution: x86 virtualization
    - VMware
    - Linux KVM, Xen
- Virtual machine technology is the enabler of cloud computing
- Note: there are also bare-metal clouds, so the cloud model holds even without virtual machine technology

Different cloud workloads need different classes of storage
- High-performance, co-located storage for XaaS
  - E.g. Amazon EBS, OpenStack Nova
  - Blocks/files to support compute
- General-purpose data center NAS extension
  - Files
- Fixed content depot
  - E.g. Amazon S3, OpenStack Swift
  - Objects

Cloud (Online) Storage
- Networked online storage
- Data is stored in virtualized pools of storage
  - May span multiple data centers
- Typically hosted by a third party
- Customers use it to store files or data objects
- Cloud object storage protocols
  - WAN (cloud)
  - Web-based HTTP protocol
  - Put/Get operations, for fixed content
  - Enable new extensions: integrity, dedup
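
As a rough illustration of why Put/Get on fixed content makes integrity checking and dedup natural, here is a minimal in-memory sketch. The class `TinyObjectStore` and the object names are hypothetical, and SHA-256 is an assumed choice of digest; no real cloud provider's API is being modeled.

```python
import hashlib


class TinyObjectStore:
    """In-memory stand-in for a Put/Get object store (hypothetical, for illustration)."""

    def __init__(self):
        self._blobs = {}  # content digest -> bytes (each distinct content kept once)
        self._names = {}  # object name -> content digest

    def put(self, name: str, data: bytes) -> str:
        """Store immutable content under a name and return its SHA-256 digest."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # dedup: identical content is stored once
        self._names[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        """Return the content after verifying it against the digest recorded at put time."""
        digest = self._names[name]
        data = self._blobs[digest]
        if hashlib.sha256(data).hexdigest() != digest:
            raise IOError(f"integrity check failed for {name!r}")
        return data

    def stored_blobs(self) -> int:
        """Number of distinct contents physically held."""
        return len(self._blobs)


store = TinyObjectStore()
store.put("scan-001", b"x-ray image bytes")
store.put("scan-001-copy", b"x-ray image bytes")  # same content, different name
```

Because objects are written whole and never updated in place, a digest computed once at put time stays valid for the object's lifetime, which is what lets the same digest serve both as an integrity check and as a dedup key.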

[Figure: diagram with the labels "Doctor" and "Patient"]

Source: http://aws.amazon.com/s3/
Costs
- Cost is typically a combination of
  - Used capacity
  - Network data transfer
  - Number of requests
- E.g. storage pricing for Amazon S3

Storage pricing for Amazon S3 - http://aws.amazon.com/s3/
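
The three cost components listed on the Costs slide (used capacity, network data transfer, number of requests) can be combined into a simple monthly bill. The sketch below is only a model: the `Prices` class, the function name and every number in `example` are made-up placeholders, not actual Amazon S3 rates, and real price lists are usually tiered and region-specific.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Hypothetical flat unit prices in USD (placeholders, not a real price list)."""
    per_gb_month: float         # stored capacity
    per_gb_transfer_out: float  # network data transfer
    per_1000_requests: float    # request count


def monthly_cost(prices: Prices, stored_gb: float, transfer_out_gb: float, requests: int) -> float:
    """Sum the three components of a monthly storage bill."""
    storage = stored_gb * prices.per_gb_month
    transfer = transfer_out_gb * prices.per_gb_transfer_out
    request_fee = requests / 1000 * prices.per_1000_requests
    return storage + transfer + request_fee


# Placeholder prices chosen only to make the arithmetic easy to follow.
example = Prices(per_gb_month=0.10, per_gb_transfer_out=0.10, per_1000_requests=0.01)
bill = monthly_cost(example, stored_gb=500, transfer_out_gb=50, requests=200_000)
print(f"Monthly bill: ${bill:.2f}")
```

With these placeholder prices the storage term dominates; a workload with many small reads would shift the balance toward the transfer and request terms.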

Collect, Store, Organize, Analyze
- Data TCO: Total Cost of Ownership
  - Usable vs raw capacity
  - Redundancy and durability levels, e.g. AWS regions and availability zones
  - Fixed costs: administrators, power, floor space
  - Optimization: tiering, backups
  - Security
- "... if we have a 2 Terabyte model it would cost $155.65 per month in US-West and US-East standard Reduced Redundancy Storage on Amazon S3 storage. At that point, you may as well treat yourself to the standard storage option which would run $194.56 per month for the same 2 Terabytes. Over three years, that is over $7,000 to keep 2 Terabytes in the public storage cloud. Most on-premise storage systems would cost less, but in the disaster recovery use case the abstraction that cloud storage brings is priceless. But how much power, cooling, and operational expense would be avoided?"
  Source: "How to determine if cloud storage is a cost savings", TechRepublic, March 4, 2013
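
The three-year figure in the quote can be checked with a few lines of arithmetic. The two monthly prices are taken from the quoted text; treating 2 Terabytes as 2048 GB to derive a per-GB rate is an assumption about the article's units.

```python
MONTHS = 36  # three years

reduced_redundancy_per_month = 155.65  # USD for 2 TB, as quoted
standard_per_month = 194.56            # USD for 2 TB, as quoted

rrs_total = reduced_redundancy_per_month * MONTHS
standard_total = standard_per_month * MONTHS

# Implied flat rate, assuming 2 TB = 2048 GB (an assumption, not stated in the quote).
standard_per_gb = standard_per_month / 2048

print(f"Reduced redundancy over 3 years: ${rrs_total:,.2f}")
print(f"Standard over 3 years:           ${standard_total:,.2f}")
print(f"Implied standard rate:           ${standard_per_gb:.3f} per GB-month")
```

The standard-storage total comes to roughly $7,004, consistent with the quote's "over $7,000". The comparison against on-premise storage still depends on the other TCO items listed above (raw vs usable capacity, administrators, power, floor space), which this arithmetic does not capture.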
Today, storage on the cloud is prevailing.
- Example: "Five Best Cloud Storage Providers", June 2013
  "Free cloud storage is easy to come by these days: anyone can give it out, and anyone can give out lots of it. However, the best cloud storage providers give you more than just storage."
  http://lifehacker.com/five-best-cloud-storage-providers-614393607
But: Limitations
- Data lock-in
- Security
  - Multi-tenancy
  - Secure delete
  - Data confidentiality and auditability
  - How vulnerable is the cloud infrastructure?
- Service Level Agreements (SLAs)
- Cost: is it REALLY cheaper?

How is it done? The Internals
- Cloud File Systems
- Cloud Object Stores

Scalable File Systems
- Different design points than traditional file systems
  - New architecture
  - New, relaxed protocols and system operations (I/O and management)
  - New solutions for resiliency and high availability based on replication rather than RAID
  - Support for computation
  - Designed for new workloads: large streaming, sequential writes, or analytics
- Assumptions
  - Based on commodity hardware
  - Components always fail
    - Need self-monitoring to detect, tolerate, and recover from failures
  - Optimized for large files
- Results
  - No POSIX API
  - Each chunk is replicated d times (a typical value is d = 3)
  - Smart placement of chunks
Scribed from: Clouddbms2011.pdf
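
To make "each chunk is replicated d times" and "smart placement of chunks" concrete, here is a minimal sketch that splits data into fixed-size chunks and assigns every chunk to d nodes on d different racks. The rotation policy, the function names and the rack/node names are assumptions made for illustration; this is not the actual placement policy of HDFS or GFS.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Cut data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks: int, nodes_by_rack: dict, d: int = 3) -> dict:
    """Map each chunk id to d nodes, each on a different rack (toy rotation policy)."""
    racks = sorted(nodes_by_rack)
    if d > len(racks):
        raise ValueError("need at least d racks to keep replicas in separate failure domains")
    placement = {}
    for chunk_id in range(num_chunks):
        replicas = []
        for r in range(d):
            # Rotate the starting rack per chunk so load spreads across racks.
            rack = racks[(chunk_id + r) % len(racks)]
            nodes = nodes_by_rack[rack]
            replicas.append(nodes[chunk_id % len(nodes)])
        placement[chunk_id] = replicas
    return placement


# Hypothetical cluster layout: rack name -> nodes in that rack.
cluster = {
    "rack-a": ["a1", "a2"],
    "rack-b": ["b1", "b2"],
    "rack-c": ["c1", "c2"],
    "rack-d": ["d1"],
}
chunks = split_into_chunks(b"x" * 1000, chunk_size=256)
placement = place_replicas(len(chunks), cluster, d=3)
for chunk_id, nodes in placement.items():
    print(chunk_id, nodes)
```

Placing replicas on distinct racks means a single rack failure removes at most one copy of any chunk. The price is capacity: with d = 3, every usable byte consumes three raw bytes.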

Examples
- Hadoop Distributed File System (HDFS, Yahoo), 2009

Source: The Hadoop Distributed File System: Architecture and Design by Dhruba Borthakur
HDFS Architecture

Source: NextGen Infrastructure for Big Data, IMEX Research


http://imexresearch.com/big_data_infrastructure.pdf

Examples
- Google File System (GFS), 2002

Source: http://en.wikipedia.org/wiki/File:GoogleFileSystemGFS.svg
Cloud Object Storage: OpenStack Swift
- RESTful APIs
- Swift storage URL: http://swift.company.com/v1/account/container/object
- Get/Put/Delete

Source: Swiftstack documentation http://swiftstack.com/openstack-swift/architecture/
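
The URL layout and the Get/Put/Delete verbs on this slide map directly onto HTTP requests. The sketch below only builds the requests with Python's standard library so that it runs offline; the helper names, the host `swift.company.com` (taken from the slide) and the `<token>` value are placeholders, and sending the token in an `X-Auth-Token` header reflects the usual Swift setup rather than anything shown on the slide.

```python
import urllib.request
from urllib.parse import quote


def object_url(endpoint: str, account: str, container: str, obj: str) -> str:
    """Build <endpoint>/v1/<account>/<container>/<object>, percent-encoding each part."""
    parts = [quote(account, safe=""), quote(container, safe=""), quote(obj, safe="/")]
    return endpoint.rstrip("/") + "/v1/" + "/".join(parts)


def build_request(method: str, url: str, token: str, data: bytes = None) -> urllib.request.Request:
    """Create (but do not send) an HTTP request carrying an auth token header."""
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("X-Auth-Token", token)
    return request


url = object_url("http://swift.company.com", "account", "container", "object")
put_req = build_request("PUT", url, token="<token>", data=b"hello")
get_req = build_request("GET", url, token="<token>")
delete_req = build_request("DELETE", url, token="<token>")

for req in (put_req, get_req, delete_req):
    print(req.get_method(), req.full_url)

# Against a live cluster each request would be sent with, for example:
#   with urllib.request.urlopen(put_req) as response:
#       print(response.status)
```

Keeping the whole address of an object in the URL path is what makes the interface stateless: any proxy server can handle any request without session context.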


OpenStack Swift architecture/installation

Source: http://docs.openstack.org/grizzly/openstack-compute/install/apt/content/example-object-storage-installation-architecture.html
Cloud Object Storage: OpenStack Swift
- Building blocks
  - Proxy servers: handle all incoming API requests
  - Rings: map logical names of data to locations on particular disks
  - Zones: failure domains
  - Storage nodes

Source: Swiftstack documentation http://swiftstack.com/openstack-swift/architecture/
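
The role of a ring, mapping a logical name to locations on particular disks in distinct zones, can be sketched as a hash followed by a table lookup. `ToyRing` below is a deliberately simplified, hypothetical model: the hash function, the partition count and the way the table is filled are assumptions, and Swift's real ring builder is considerably more elaborate.

```python
import hashlib


class ToyRing:
    """Simplified ring: object path -> partition -> devices in distinct zones."""

    def __init__(self, devices, part_power: int = 4, replicas: int = 3):
        # devices: list of (zone, device_name) pairs
        self.part_count = 2 ** part_power
        zones = sorted({zone for zone, _ in devices})
        if replicas > len(zones):
            raise ValueError("need at least as many zones as replicas")
        by_zone = {z: [dev for zone, dev in devices if zone == z] for z in zones}
        # Precompute the table: partition -> list of (zone, device), one per replica.
        self.table = []
        for part in range(self.part_count):
            row = []
            for r in range(replicas):
                zone = zones[(part + r) % len(zones)]
                devs = by_zone[zone]
                row.append((zone, devs[part % len(devs)]))
            self.table.append(row)

    def partition_for(self, account: str, container: str, obj: str) -> int:
        """Hash the logical path and reduce it to a partition number."""
        path = f"/{account}/{container}/{obj}".encode()
        digest = hashlib.sha256(path).digest()
        return int.from_bytes(digest[:4], "big") % self.part_count

    def locate(self, account: str, container: str, obj: str):
        """Return the (zone, device) pairs that should hold the object."""
        return self.table[self.partition_for(account, container, obj)]


# Hypothetical devices spread over four zones.
devices = [("z1", "sdb1"), ("z1", "sdc1"), ("z2", "sdb2"), ("z3", "sdb3"), ("z4", "sdb4")]
ring = ToyRing(devices)
print(ring.locate("account", "container", "object"))
```

Since the lookup is a pure function of the object's path and a precomputed table, every proxy server holding the same ring computes the same locations without consulting a central metadata service.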


Cloud Object Storage: OpenStack Swift
- Data model
  - Accounts: tenants
  - Containers: sets of objects
  - Objects: the data itself, mapped to files on the local file system
  - Partitions/Containers: manage locations where data lives in the cluster
- Replication
  - Everything is stored three times (by default)
  - Upon a disk failure, the data is replicated to other zones, ensuring three copies

Source: SwiftStack documentation http://swiftstack.com/openstack-swift/architecture/
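
The replication rule on this slide (after a disk failure, copy the data to another zone so that three copies exist again) can be sketched as a small selection function. The function `rereplicate`, the zone and device names, and the preference order are assumptions for illustration; the way Swift's replicator processes actually choose targets is not shown on the slide.

```python
def rereplicate(replicas, failed_device, devices, target=3):
    """Return the replica set after a device failure, topped up to `target` copies.

    replicas and devices are lists of (zone, device_name) pairs. New copies go
    first to zones that hold no surviving copy, then to any healthy device.
    """
    survivors = [r for r in replicas if r[1] != failed_device]
    healthy = [d for d in devices if d[1] != failed_device and d not in survivors]
    while len(survivors) < target and healthy:
        used_zones = {zone for zone, _ in survivors}
        # False sorts before True, so devices in unused zones are preferred.
        best = min(healthy, key=lambda d: (d[0] in used_zones, d))
        survivors.append(best)
        healthy.remove(best)
    return survivors


# Hypothetical cluster and one object's current replica locations.
all_devices = [("z1", "d1"), ("z2", "d2"), ("z3", "d3"), ("z4", "d4"), ("z1", "d5")]
current = [("z1", "d1"), ("z2", "d2"), ("z3", "d3")]
after_failure = rereplicate(current, failed_device="d2", devices=all_devices)
print(after_failure)
```

In this example the replacement copy lands in zone z4 rather than on the second device in z1, so the three copies remain in three separate failure domains.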


Summary
- Cloud is a paradigm shift
- Cloud storage is prevailing
- Cloud storage requires new storage architectures, e.g.
  - Cloud file systems
  - Cloud object stores
