Data Domain:
DeDuplication Types:
- File-based DeDuplication
- Fixed-Length Segment DeDuplication
- Variable-Length Segment DeDuplication
- Post-Process DeDuplication
- In-Line DeDuplication
Data Domain System Introduction
DeDuplicating hardware system
Inline
Variable-length segments
Fingerprints
Controller
Processors and RAM
Ethernet and Fibre Channel connections
Storage
-Low-cost SATA disk drives
-RAID 6 in software
-NVRAM used to protect unwritten data
Data Domain DeDuplication:
# Source-Based DeDuplication
Uses DD Boost with DSP (distributed segment processing)
# Target-Based DeDuplication
Accessible through the CIFS, NFS, and VTL protocols
Data Domain Global Compression:
# Global Compression: equivalent to DeDuplication; it cannot be turned off.
# Local Compression: compresses data segments before they are written to disk. It is equivalent to file
compression (it uses the lz, gz, and gzfast algorithms) and can be turned off.
Stream-Informed Segment Layout (SISL) scaling architecture:
# SISL architecture provides fast and efficient deduplication:
99% of duplicate data segments are identified inline in RAM before they are stored to disk.
System throughput increases directly as CPU performance increases.
Minimizes the disk footprint by minimizing disk access.
How Data Domain DeDuplication works:
1. Segment: data is sliced into segments.
2. Fingerprint: each segment is given a fingerprint ID (segment ID).
3. Filter: fingerprint IDs are compared to the fingerprints in cache.
   - If the fingerprint ID is new, continue.
   - If the fingerprint ID is a duplicate, reference the existing segment, then delete the duplicate.
4. Compress: groups of new segments are compressed using a common technique (lz, gz, or gzfast).
5. Write: segments (including fingerprints, metadata, and logs) are written to containers, and the
containers are written to disk.
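The five steps above can be sketched in a few lines of Python. This is only an illustration, not the DD OS implementation: the fixed 8-byte segmenter, the SHA-1 fingerprint, the use of zlib as the local compressor, and the names `split_fixed` and `ingest` are all assumptions made for the example (the real system uses variable-length segments).

```python
import hashlib
import zlib


def split_fixed(data: bytes, size: int = 8) -> list[bytes]:
    """Stand-in segmenter: fixed-size slices (assumption, not variable-length)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def ingest(data: bytes, index: dict[str, int], containers: list[bytes]) -> list[str]:
    """Run the five steps on one stream and return the stream's fingerprint list."""
    recipe = []        # ordered fingerprints that describe the stream
    new_segments = []  # unique segments waiting to be compressed
    for segment in split_fixed(data):              # 1. Segment
        fp = hashlib.sha1(segment).hexdigest()     # 2. Fingerprint
        recipe.append(fp)
        if fp in index:                            # 3. Filter: duplicate, keep reference only
            continue
        index[fp] = len(containers)                #    new: remember its container number
        new_segments.append(segment)
    if new_segments:
        blob = zlib.compress(b"".join(new_segments))  # 4. Compress the group of new segments
        containers.append(blob)                       # 5. Write the container
    return recipe
```

Ingesting the same data a second time adds fingerprints to the returned list but writes no new container, which is the effect the Filter step is meant to have.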
Data Invulnerability Architecture:
- Data Invulnerability Architecture is an important EMC Data Domain technology that provides safe and
reliable storage.
The EMC Data Domain operating system (DD OS) is built for data protection. Its elements comprise an
architectural design whose goal is data invulnerability. There are four technologies within the Data
Invulnerability Architecture that fight data loss:
1. End-to-end verification
# Verify stripe integrity
# Verify user data integrity
# Verify file system metadata integrity
2. Fault avoidance and containment
# New data never overwrites good data. (The system never puts existing data at risk.)
# There are fewer complex data structures
# The system includes non-volatile RAM (NVRAM) for fast, safe restarts
3. Continuous fault detection and healing
# Periodically rechecks the integrity of the RAID stripes and container logs
# Uses RAID system redundancy to heal faults
# Re-verifies data integrity during every read
# Heals any errors as they are encountered
4. File system recoverability
>Link Aggregation:
Link aggregation uses multiple Ethernet network cables, ports, and interfaces (links) in parallel to
increase network throughput across a LAN or LANs, until the maximum computer speed is reached.
>Link Aggregation Bonding Types:
1. Round robin: transmits packets in sequential order from the first available link through the last in
the aggregated group.
2. Balanced: data is sent over the interfaces as determined by the hash method you select.
3. LACP: similar to balanced, except that a control protocol communicates with the other end and
coordinates which links within the bond are available. It provides heartbeat failover.
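The difference between the round robin and balanced modes can be shown with a small sketch. The interface names and the choice of a CRC32 hash over the source and destination addresses are assumptions for illustration only; they are not the hash methods offered by DD OS.

```python
import zlib
from itertools import cycle


def round_robin(links: list[str]):
    """Return an iterator that hands out links in order, wrapping around."""
    return cycle(links)


def balanced(links: list[str], src: str, dst: str) -> str:
    """Pick a link from a hash of the flow, so one flow always uses the same link."""
    key = f"{src}->{dst}".encode()
    return links[zlib.crc32(key) % len(links)]
```

Round robin spreads consecutive packets over every link, while the balanced mode keeps each source and destination pair on a single link.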
>Link Failover Definition:
# Definition
A virtual interface may include both physical and virtual interfaces as members (called interface group
members).
# How It Works
Link failover is supported by a bonding driver on a Data Domain system. The bonding driver checks the
carrier signal on the active interface every 0.9 seconds. If the carrier signal is lost, the active interface is
changed to another standby interface. An Address Resolution Protocol (ARP) message is sent to indicate
that the data must flow to the new interface.
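The failover behavior described above can be modeled as a small state machine. The `Bond` class, its fields, and the interface names are hypothetical; only the 0.9 second check interval and the ARP notification come from the text.

```python
from dataclasses import dataclass, field

CHECK_INTERVAL_S = 0.9  # carrier check period quoted in the text


@dataclass
class Bond:
    members: list[str]                 # interfaces in the failover group
    carrier: dict[str, bool]           # carrier state per interface
    active: str = ""
    arp_log: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Default to the first member as the active interface.
        self.active = self.active or self.members[0]

    def check(self) -> str:
        """One carrier check: fail over to the first standby that has carrier."""
        if self.carrier.get(self.active, False):
            return self.active
        for name in self.members:
            if name != self.active and self.carrier.get(name, False):
                self.active = name
                self.arp_log.append(f"ARP: traffic now flows to {name}")
                break
        return self.active
```

A real driver would call `check()` every `CHECK_INTERVAL_S` seconds; the sketch leaves the timer out so the logic stays easy to follow.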
>Manage VLAN and IP Alias:
VLAN and IP alias network interfaces are used:
# For network security
# To segregate network traffic
# To speed up network traffic
# To organize a network
How It Works:
If you are not using VLANs, you can use IP aliases. IP aliases are easy to implement and are less
expensive than VLANs, but they are not true VLANs. For example, you can use one IP address for
management and another IP address to back up or archive data. You can combine VLANs and IP aliases.
Data Management:
>Snapshot:
Snapshot location: /data/col1/backup/
ex: /data/col1/backup/austin/.snapshot ; /data/col1/backup/scla/.snapshot
where, .snapshot is a directory
# Replication does not replicate the snapshots of a volume; snapshot replication has to be configured manually.
>Fast Copy:
A fast copy copies the files and directory trees of a source directory to a target directory on a Data
Domain system. You can use the fast copy operation to retrieve data stored in snapshots. Fast copy takes
space (it is like a clone).
>Retention Lock: Licensed feature
- Retention lock is an optional, licensed software feature that enables organizations to keep their data
in a non-writeable and non-erasable form for a specified length of time, up to 70 years.
Retention lock protects against:
Accidents and user errors
Malicious activity
- Data that has been locked with the retention lock feature is non-writeable and non-erasable. Files
cannot be modified even after the retention time for the file expires. The retention period of a
retention-locked file can be extended but not reduced.
- For a file to become locked, the file's access time (atime) must be set to a future date that is beyond
the minimum retention period configured on the Data Domain system.
- Setting the atime is the signal to the Data Domain system to lock the file. As soon as this value is
set, the file is locked and cannot be deleted or modified before that date.
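From a client, setting the atime is an ordinary file operation. The sketch below shows one way it might be done with Python's standard `os.utime`; the function names and the `min_days` parameter are assumptions for the example, and on a file system without retention lock the call only changes a timestamp.

```python
import os
import time

SECONDS_PER_DAY = 86_400


def lock_until(now: float, retain_days: int, min_days: int) -> float:
    """Return the atime to set, refusing periods below the configured minimum."""
    if retain_days < min_days:
        raise ValueError("retention period is shorter than the configured minimum")
    return now + retain_days * SECONDS_PER_DAY


def request_lock(path: str, retain_days: int, min_days: int = 1) -> float:
    """Move only the atime into the future and leave the mtime untouched."""
    st = os.stat(path)
    atime = lock_until(time.time(), retain_days, min_days)
    os.utime(path, (atime, st.st_mtime))
    return atime
```

Because the retention period can be extended but not reduced, calling `request_lock` again with a later date would be the way to extend a lock; an earlier date would be expected to be rejected by the system.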
Data sanitization:
- Data sanitization is sometimes referred to as electronic shredding.
With the data sanitization function, deleted files can be overwritten using a DoD/NIST-compliant
algorithm and procedures.
It removes every trace of deleted files, leaving no residual remains, which prevents normally deleted
data from being recovered.
5 phases of sanitization:
1. Merge
2. Analysis
3. Enumeration
4. Copy
5. Zero
Data Encryption:
- Also called inline data encryption
Protects data on a Data Domain system from unauthorized access or accidental exposure
Requires software license
When data is backed up, it enters via the NFS, CIFS, VTL, DD Boost, and NDMP Tape Server protocols. It
is then:
Segmented
Fingerprinted
Deduplicated (or globally compressed)
Grouped
Locally compressed
Encrypted
Important: encryption at a more granular level is not possible. Once encryption is enabled, all incoming
data is encrypted.
Delta Compression:
- Delta compression is a global compression algorithm that is applied after identity filtering. The
algorithm looks for previous similar segments using a sketch-like technique and sends only the
difference between the previous and new segments.
- Delta compression reduces the amount of data to be replicated over low-bandwidth WANs by
eliminating the transfer of redundant data found within replicated deduplicated data. This feature is
typically beneficial to remote sites with lower-end Data Domain models.
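The idea of finding a similar earlier segment and sending only the difference can be sketched as follows. This is a toy model and not the Data Domain algorithm: the sketch built from the smallest CRC32 values of 4-byte shingles, the use of `difflib.SequenceMatcher` to compute the delta, and all function names are assumptions chosen to keep the example short.

```python
import zlib
from difflib import SequenceMatcher
from typing import Optional


def sketch(segment: bytes, k: int = 4, size: int = 8) -> frozenset:
    """Tiny similarity sketch: the `size` smallest CRC32 values over k-byte shingles."""
    count = max(1, len(segment) - k + 1)
    hashes = {zlib.crc32(segment[i:i + k]) for i in range(count)}
    return frozenset(sorted(hashes)[:size])


def most_similar(new: bytes, previous: list[bytes]) -> Optional[bytes]:
    """Return the earlier segment whose sketch overlaps most, or None if none overlaps."""
    target = sketch(new)
    best = max(previous, key=lambda p: len(sketch(p) & target), default=None)
    if best is None or not (sketch(best) & target):
        return None
    return best


def make_delta(base: bytes, new: bytes) -> list[tuple]:
    """Encode `new` as copy-from-base ranges plus literal bytes."""
    ops = []
    matcher = SequenceMatcher(None, base, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))       # bytes the receiver already has
        elif j2 > j1:
            ops.append(("data", new[j1:j2]))   # bytes that must cross the WAN
    return ops


def apply_delta(base: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the new segment on the receiving side."""
    out = bytearray()
    for op in ops:
        out += base[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)
```

Only the `("data", ...)` entries carry payload, so when two segments differ by a few bytes the delta is much smaller than the segment itself.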
Resynchronize Recovered Data:
Resynchronization is the process of recovering (or bringing back into sync) the data between a source
and destination replication pair after a manual break in replication.
EMC DD Boost:
- EMC Data Domain Boost extends the backup optimization benefits of Data Domain deduplication
storage solutions by distributing parts of the deduplication process to the backup server or application
client. DD Boost dramatically increases throughput speeds, minimizes backup LAN load, and improves
backup server utilization.
- In a typical backup environment using in-line deduplication, client data is sent to a Data Domain system
where the data is identified in segments. These segments are identified to be unique data or duplicate
segments. If they are unique, they are compressed and written to the storage subsystem on the Data
Domain.
DD Boost Features:
- Centralized replication awareness and management: the backup application is aware of the replication
enabled on the Data Domain end, so data can easily be recovered from the copy residing on the failover node.
- Distributed segment processing (DSP)
- Advanced load balancing and failover via interface groups
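Distributed segment processing can be pictured as the client doing the segmenting, fingerprinting, and compression, while the Data Domain system only filters fingerprints and stores what is new. The sketch below is a simplified model under that assumption; `DataDomainStub`, `boost_backup`, the fixed 16-byte segments, SHA-1, and zlib are all hypothetical stand-ins and not the DD Boost API.

```python
import hashlib
import zlib


class DataDomainStub:
    """Hypothetical in-memory stand-in for the storage side."""

    def __init__(self):
        self.store: dict[str, bytes] = {}
        self.bytes_received = 0

    def filter_new(self, fingerprints: list[str]) -> set[str]:
        """Tell the client which fingerprints are not stored yet."""
        return {fp for fp in fingerprints if fp not in self.store}

    def write(self, fp: str, compressed: bytes) -> None:
        self.bytes_received += len(compressed)
        self.store[fp] = compressed


def boost_backup(data: bytes, server: DataDomainStub, size: int = 16) -> list[str]:
    """Client side: segment, fingerprint, ask the server, send only new segments."""
    segments = [data[i:i + size] for i in range(0, len(data), size)]
    fps = [hashlib.sha1(s).hexdigest() for s in segments]
    missing = server.filter_new(fps)
    for fp, seg in zip(fps, segments):
        if fp in missing:
            server.write(fp, zlib.compress(seg))  # compressed before crossing the LAN
            missing.discard(fp)                   # send each unique segment once
    return fps
```

Running the same backup twice sends no segment data the second time, which is how this arrangement reduces backup LAN load.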
DD Boost Deduplication and Distributed Segment Processing:
Steps:
1. License as required
2. Create devices and pools through the backup server management console
3. Configure backup policies and groups to use the Data Domain configured devices
4. Configure duplication to use the Data Domain configured devices on the desired Data Domain systems
Source DD:
1. License DD Boost
2. Enable DD Boost
3. Set a Data Domain local user as a DD Boost user
4. Create DD Boost storage units
Replica DD:
1. License DD Boost
2. Enable DD Boost
3. Set a Data Domain local user as a DD Boost user
4. Create DD Boost storage units
# NetBackup console: Configure Data Domain systems as disk storage servers
a. Install the Data Domain OST plug-in
b. Configure disk storage servers of type OST
c. Create a storage lifecycle policy
# Configure Data Domain systems (A and B) for Boost
a. Enable DD Boost
b. Set the user
c. Create a storage unit and CIFS share
# NBU console: Configure Backup Policy
a. Create a backup policy
b. Apply the storage lifecycle policy to the backup policy
# NBU console: Monitor Activity for Backup and Optimized Duplication
a. Start the backup policy and monitor activity
b. Monitor file replication
Enable VTL
Create a Library
Create Tapes
Import Tapes
Access groups allow clients to access only selected LUNs (media changers or virtual tape drives) on a
system. A client that is set up for an access group can access only devices that are in its access group.
Note: Avoid making access group changes on a Data Domain system during active backup or restore
jobs. A change may cause an active job to fail. The impact of changes during active jobs depends on a
combination of backup software and host configurations.
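The access group rule amounts to a lookup from an initiator to the devices it is allowed to see. A minimal model, assuming initiators are identified by WWPN and using hypothetical names throughout, might look like this:

```python
from dataclasses import dataclass, field


@dataclass
class AccessGroup:
    name: str
    initiators: set[str] = field(default_factory=set)       # client WWPNs
    devices: dict[int, str] = field(default_factory=dict)   # LUN -> device name


def visible_devices(groups: list[AccessGroup], wwpn: str) -> dict[int, str]:
    """Return only the LUNs of the access group that contains this initiator."""
    for group in groups:
        if wwpn in group.initiators:
            return dict(group.devices)
    return {}  # an initiator outside every group sees nothing
```

An initiator that belongs to no group gets an empty result, which matches the statement that a client can access only the devices in its own access group.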
View Access Group Information:
LUNs tab: LUN, Library, Device, In-Use Ports, Primary Ports, Secondary Ports
Initiators tab: Initiator, WWPN
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Data Domain Encryption: Encryption of Data at Rest (inline data encryption)
Protects data if the system or a drive is lost or stolen, accidentally exposed, or accessed by an intruder
Requires a license
Encrypts data on system drives or external storage as it is saved, so that the data is locked before the
storage is moved to another location
All ingested data is encrypted
Data that exists on the Data Domain system before encryption is enabled is not encrypted automatically,
but it can be encrypted later
Inline encryption happens during the Data Domain SISL process
The feature is enabled on the Encryption tab under File System, which also shows its status
The encryption passphrase is needed to lock or unlock the file system and to disable encryption; it is
imperative that you do not lose your passphrase
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
Data Domain DD860 Technical Specifications (Real Size: 64 TB)
- Applied: the backup read throughput we are getting is 100 GB/hour
Logical Capacity (Standard): 1.4 to 5.7 PB
Logical Capacity (Redundant): 7.1 to 28.5 PB
Max. Throughput (Other): 5.1 TB/hr (maximum throughput achieved using Symantec OpenStorage and 10 Gb
Ethernet)
Max. Throughput (DD Boost): 9.8 TB/hr
Power Dissipation: 608 W
Cooling Requirement: 2,075 BTU/hr
Data Domain DD7200 Technical Specifications (Real Size: 96 TB)
Capacity (Raw): Max. Usable: 428 TB; Max. Usable w/ DD Extended Retention: 1.7 PB
Logical Capacity (Standard): 4.2 to 21.4 PB
Logical Capacity (Redundant) w/ DD Extended Retention: 17.1 to 85.6 PB
Max. Throughput (Other): 11.9 TB/hr (maximum throughput achieved using NFS and 10 Gb Ethernet)
Max. Throughput (DD Boost): 26.0 TB/hr (maximum throughput achieved using DD Boost and 10 Gb
Ethernet)