#133

Get More Refcardz! Visit refcardz.com

Apache Hadoop Deployment: A Blueprint for Reliable Distributed Computing

By Eugene Ciurana

CONTENTS INCLUDE:
• Introduction
• Which Hadoop Distribution?
• Apache Hadoop Installation
• Hadoop Monitoring Ports
• Apache Hadoop Production Deployment
• Hot Tips and more...
INTRODUCTION

This Refcard presents a basic blueprint for deploying Apache Hadoop HDFS and MapReduce in development and production environments. Check out Refcard #117, Getting Started with Apache Hadoop, for basic terminology and for an overview of the tools available in the Hadoop Project.

Minimum Prerequisites

• Java 1.6 from Oracle, version 1.6 update 8 or later; identify your current JAVA_HOME
• sshd and ssh for managing Hadoop daemons across multiple systems
• rsync for file and directory synchronization across the nodes in the cluster
• Create a service account for user hadoop where $HOME=/home/hadoop
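The prerequisites above can be confirmed on a candidate node with a short shell check. This is only a sketch: the check_prereqs name, the /usr/sbin fallback for sshd, and the exact messages are choices of this example, not part of the Refcard.

```shell
#!/bin/sh
# check_prereqs: confirm the minimum Hadoop node prerequisites.
check_prereqs() {
  # Java 1.6u8+ must be installed and JAVA_HOME must point at it.
  if [ -n "$JAVA_HOME" ] && [ -x "$JAVA_HOME/bin/java" ]; then
    echo "java: OK ($JAVA_HOME)"
  else
    echo "java: JAVA_HOME not set or invalid"
  fi
  # ssh/sshd control the daemons; rsync synchronizes files across nodes.
  for tool in ssh sshd rsync; do
    if command -v "$tool" >/dev/null 2>&1 || [ -x "/usr/sbin/$tool" ]; then
      echo "$tool: OK"
    else
      echo "$tool: MISSING"
    fi
  done
}
check_prereqs
```

Run it as the hadoop service user on every machine you intend to add to the cluster.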
WHICH HADOOP DISTRIBUTION?

There are two basic Hadoop distributions:

• Apache Hadoop is the main open-source, bleeding-edge distribution from the Apache foundation.
• The Cloudera Distribution for Apache Hadoop (CDH) is an open-source, enterprise-class distribution for production-ready environments.

The decision to use one distribution or the other depends on the organization's desired objective.

• The Apache distribution is fine for experimental learning exercises and for becoming familiar with how Hadoop is put together.
• CDH removes the guesswork and offers an almost turnkey product for robustness and stability; it also offers some tools not available in the Apache distribution.

Listing 1 - Hadoop SSH Prerequisites

keyFile=$HOME/.ssh/id_rsa.pub
pKeyFile=$HOME/.ssh/id_rsa
authKeys=$HOME/.ssh/authorized_keys
if ! ssh localhost -C true ; then \
  if [ ! -e "$keyFile" ]; then \
    ssh-keygen -t rsa -b 2048 -P '' \
      -f "$pKeyFile"; \
  fi; \
  cat "$keyFile" >> "$authKeys"; \
  chmod 0640 "$authKeys"; \
  echo "Hadoop SSH configured"; \
else echo "Hadoop SSH OK"; fi

The key's passphrase in this example is left blank (-P ''). If this were to run on a public network, it could be a security hole. Distribute the public key from the master node to all other nodes for data exchange. All nodes are assumed to run in a secure network behind the firewall.
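Distributing the master's public key can be sketched with ssh-copy-id. The node names hadoop-node1 through hadoop-node3 are hypothetical placeholders for your cluster's hosts; the function echoes each command for review instead of running it, so remove the leading echo to execute for real.

```shell
#!/bin/sh
# distribute_key: push the master's public key to every worker node so
# the hadoop service account can ssh to them without a password.
# NOTE: the node names are hypothetical placeholders.
keyFile=$HOME/.ssh/id_rsa.pub
nodes="hadoop-node1 hadoop-node2 hadoop-node3"

distribute_key() {
  for node in $nodes; do
    # ssh-copy-id appends the key to ~hadoop/.ssh/authorized_keys on
    # each node; echoed here as a dry run for review before executing.
    echo ssh-copy-id -i "$keyFile" "hadoop@$node"
  done
}
distribute_key
```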
Hot Tip: Cloudera offers professional services and puts out an enterprise distribution of Apache Hadoop. Their toolset complements Apache's. Documentation about Cloudera's CDH is available from http://docs.cloudera.com.

The Apache Hadoop distribution assumes that the person installing it is comfortable with configuring a system manually. CDH, on the other hand, is designed as a drop-in component for all major Linux distributions.
Hot Tip: Linux is the supported platform for production systems. Windows is adequate but is not supported as a development platform.
APACHE HADOOP INSTALLATION

This Refcard is a reference for development and production deployment of the components shown in Figure 1. It includes the components available in the basic Hadoop distribution and the enhancements that Cloudera released.

[Figure 1 - Hadoop components]

Hot Tip: All the bash shell commands in this Refcard are available for cutting and pasting from: http://ciurana.eu/DeployingHadoopDZone

Enterprise: CDH Prerequisites

Cloudera simplified the installation process by offering packages for Ubuntu Server and Red Hat Linux distributions.

Hot Tip: CDH packages have names like CDH2, CDH3, and so on, corresponding to the CDH version.
Hot Tip: Whether the user intends to run Hadoop in non-distributed or distributed modes, it's best to install every required component on every machine in the computational network. Any computer may assume any role thereafter.

Listing 2 - Ubuntu Pre-Install Setup

DISTRO=$(lsb_release -c | cut -f 2)
REPO=/etc/apt/sources.list.d/cloudera.list
echo "deb \
http://archive.cloudera.com/debian \
$DISTRO-cdh3 contrib" > "$REPO"
echo "deb-src \
http://archive.cloudera.com/debian \
$DISTRO-cdh3 contrib" >> "$REPO"
apt-get update

CDH on Red Hat Pre-Install Setup

Run these commands as root or through sudo to add the Cloudera yum repository:

Listing 3 - Red Hat Pre-Install Setup

curl -sL http://is.gd/3ynKY7 | tee \
/etc/yum.repos.d/cloudera-cdh3.repo | \
awk '/^name/'
yum update yum

Ensure that all the pre-required software and configuration are installed on every machine intended to be a Hadoop node. Don't mix and match operating systems, distributions, Hadoop, or Java versions!

A non-trivial, basic Hadoop installation includes at least these components:

• Hadoop Common: the basic infrastructure necessary for running all components and applications
• HDFS: the Hadoop Distributed File System
• MapReduce: the framework for large data set distributed processing
• Pig: an optional, high-level language for parallel computation and data flow

Enterprise users often choose CDH because of:

• Flume: a distributed service for efficient large data transfers in real time
• Sqoop: a tool for importing relational databases into Hadoop clusters

Hadoop for Development

• Hadoop runs as a single Java process, in non-distributed mode, by default. This configuration is optimal for development and debugging.
• Hadoop also offers a pseudo-distributed mode, in which every Hadoop daemon runs in a separate Java process. This configuration is optimal for development and will be used for the examples in this guide.

Hot Tip: If you have an OS X or a Windows development workstation, consider using a Linux distribution hosted on VirtualBox for running Hadoop. It will help prevent support or compatibility headaches.

Hadoop for Production

• Production environments are deployed across a group of machines that make up the computational network. Hadoop must be configured to run in fully distributed, clustered mode.

Apache Hadoop Development Deployment

The steps in this section must be repeated for every node in a Hadoop cluster. Downloads, installation, and configuration could be automated with shell scripts. All these steps are performed as the service user hadoop, defined in the prerequisites section.

http://hadoop.apache.org/common/releases.html has the latest version of the common tools. This guide used version 0.20.2.

1. Download Hadoop from a mirror and unpack it in the /home/hadoop work directory.
2. Set the JAVA_HOME environment variable.
3. Set the run-time environment:
Listing 4 - Set the Hadoop Runtime Environment

version=0.20.2  # change if needed
identity="hadoop-dev"
runtimeEnv="runtime/conf/hadoop-env.sh"
ln -s hadoop-"$version" runtime
ln -s runtime/logs .
export HADOOP_HOME="$HOME"
cp "$runtimeEnv" "$runtimeEnv".org
echo "export HADOOP_SLAVES=$HADOOP_HOME/slaves" >> "$runtimeEnv"
mkdir "$HADOOP_HOME"/slaves
echo "export HADOOP_IDENT_STRING=$identity" >> "$runtimeEnv"
echo "export JAVA_HOME=$JAVA_HOME" >> "$runtimeEnv"
export PATH=$PATH:"$HADOOP_HOME"/runtime/bin
unset version; unset identity; unset runtimeEnv

Configuration

Pseudo-distributed operation (each daemon runs in a separate Java process) requires updates to core-site.xml, hdfs-site.xml, and mapred-site.xml. These files configure the master, the file system, and the MapReduce framework, and they live in the runtime/conf directory.

Listing 5 - Pseudo-Distributed Operation Config

<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

These files are documented in the Apache Hadoop Clustering reference at http://is.gd/E32L4s; some parameters are discussed in this Refcard's production deployment section.

Test the Hadoop Installation

Hadoop requires a formatted HDFS cluster to do its work:

hadoop namenode -format

The HDFS volume lives on top of the standard file system. The format command will show this upon successful completion:

/tmp/dfs/name has been successfully formatted.

Start the Hadoop processes and perform these operations to validate the installation:

• Use the contents of runtime/conf as known input
• Use Hadoop for finding all text matches in the input
• Check the output directory to ensure it works

Listing 6 - Testing the Hadoop Installation

start-all.sh ; sleep 5
hadoop fs -put runtime/conf input
hadoop jar runtime/hadoop-*-examples.jar \
grep input output 'dfs[a-z.]+'

Hot Tip: You may ignore any warnings or errors about a missing slaves file.

• View the output files in the HDFS volume and stop the Hadoop daemons to complete testing the install

Listing 7 - Job Completion and Daemon Termination

hadoop fs -cat output/*
stop-all.sh

That's it! Apache Hadoop is installed in your system and ready for development.

CDH Development Deployment

CDH removes a lot of grueling work from the Hadoop installation process by offering ready-to-go packages for mainstream Linux server distributions. Compare the instructions in Listing 8 against the previous section. CDH simplifies installation and configuration for huge time savings.

Listing 8 - Installing CDH

ver="0.20"
command="/usr/bin/aptitude"
if [ ! -e "$command" ];
then command="/usr/bin/yum"; fi
"$command" install \
hadoop-"$ver"-conf-pseudo
unset command ; unset ver

Leveraging some or all of the extra components in Hadoop or CDH is another good reason for using it over the Apache version. Install Pig, Flume, or Sqoop with the instructions in Listing 9.

Listing 9 - Adding Optional Components

apt-get install hadoop-pig
apt-get install flume
apt-get install sqoop

Test the CDH Installation

The CDH daemons are ready to be executed as services. There is no need to create a service account for executing them. They can be started or stopped like any other Linux service, as shown in Listing 10.

Listing 10 - Starting the CDH Daemons

for s in /etc/init.d/hadoop* ; do \
"$s" start; done

CDH will create an HDFS partition when its daemons start; this is another convenience it offers over regular Hadoop. Listing 11 shows how to validate the installation by:

• Listing the HDFS module
• Moving files to the HDFS volume
• Running an example job
• Validating the output
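A validation sequence along the lines of these bullets might look like the following dry-run sketch. It echoes each hadoop command instead of executing it (remove the leading echo to run against a live cluster); the exact example jar name and paths follow the development layout used earlier in this Refcard.

```shell
#!/bin/sh
# validate_install: dry-run sketch of the installation validation steps.
# Each line echoes the hadoop command to run; drop the echo to execute.
validate_install() {
  echo "hadoop fs -ls /"                       # list the HDFS volume
  echo "hadoop fs -put runtime/conf input"     # known input data
  echo "hadoop jar runtime/hadoop-*-examples.jar grep input output 'dfs[a-z.]+'"
  echo "hadoop fs -cat output/*"               # inspect the job output
}
validate_install
```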
[Figure 2 - NameNode status web UI]

The web interface can be used for monitoring the JobTracker, which dispatches tasks to specific nodes in a cluster, the namespaces, and the file nodes in the file system.
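A quick way to confirm that the status web UIs are up is to probe them from a shell. Ports 50070 (NameNode) and 50030 (JobTracker) are the classic Hadoop 0.20 defaults; adjust them if your site configuration overrides the web UI addresses. The check_ui helper below is an illustration, not part of the Refcard.

```shell
#!/bin/sh
# check_ui: probe the Hadoop status web UIs on a given host.
# 50070 (NameNode) and 50030 (JobTracker) are the Hadoop 0.20 defaults.
check_ui() {
  host=${1:-localhost}
  for port in 50070 50030; do
    # A 2-second timeout keeps the probe fast on unreachable hosts.
    if curl -s -m 2 -o /dev/null "http://$host:$port/"; then
      echo "$host:$port responding"
    else
      echo "$host:$port not reachable"
    fi
  done
}
check_ui localhost
```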
Listing 20 - Minimal MapReduce Config Update

<!-- mapred-site.xml -->
<property>
  <name>mapred.local.dir</name>
  <value>
    /data/1/mapred/local,
    /data/2/mapred/local
  </value>
  <final>true</final>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>
    /mapred/system
  </value>
  <final>true</final>
</property>

Start the JobTracker and all other nodes. You now have a working Hadoop cluster. Use the commands in Listing 11 to validate that it's operational.

The instructions in this Refcard result in a working development or production Hadoop cluster. Hadoop is a complex framework and requires attention to configure and maintain. Review the Apache Hadoop and Cloudera CDH documentation. Pay particular attention to the sections on:

• How to write MapReduce, Pig, or Hive applications
• Multi-node cluster management with ZooKeeper
• Hadoop ETL with Sqoop and Flume

Happy Hadoop computing!

STAYING CURRENT

Do you want to know about specific projects and use cases where Hadoop and data scalability are the hot topics? Join the scalability newsletter: http://ciurana.eu/scalablesystems

Thank You!

Thanks to all the technical reviewers, especially to Pavel Dovbush at http://dpp.su
DZone, Inc.
140 Preston Executive Dr., Suite 100
Cary, NC 27513
888.678.0399 / 919.678.0300

ISBN-13: 978-1-936502-03-5
ISBN-10: 1-936502-03-8
$7.95

DZone communities deliver over 6 million pages each month to more than 3.3 million software developers, architects and decision makers. DZone offers something for everyone, including news, tutorials, cheat sheets, blogs, feature articles, source code and more. "DZone is a developer's dream," says PC Magazine.

Refcardz Feedback Welcome: refcardz@dzone.com
Sponsorship Opportunities: sales@dzone.com

Copyright © 2011 DZone, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher. Version 1.0