Apache Hadoop for Windows Platform - CodeProject

Apache Hadoop for Windows Platform


Praba Prakash, 16 Jul 2014

Apache Hadoop 2.3 for Big Data Analytics

Check this Video for Apache Hadoop Installation in Windows

Posted 16 Apr 2014

1. Introduction
2. Hadoop 2.3 for Windows 7/8/8.1 - Specifically Built for Windows x64
a. Hadoop 2.3 for Windows (112.5 MB)
Finally Github Link:
https://github.com/prabaprakash/Hadoop-2.3
Box Link:
https://app.box.com/s/11fwozokqmc1ohttt117
Google Drive Link:
https://drive.google.com/file/d/0Bz7A6rJcTjx_Q0RDT0FrU3dUTDQ/edit?usp=sharing
Dropbox Link:
https://www.dropbox.com/s/p8xsfmx9g76pn0t/hadoop-2.3.0.tar.gz
b. Pre Configured file - https://github.com/prabaprakash/Hadoop-2.3-Config/archive/master.zip
c. Java SDK / Runtime 1.6 Mandatory
Download Links: http://download.oracle.com/otn/java/jdk/6u31-b05/jdk-6u31-windows-x64.exe
Reference Link: http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archivedownloads-javase6-419409.html
3. Map Reduce Jobs in Java
4. Redgate HDFS Explorer http://bigdatainstallers.azurewebsites.net/files/HDFS%20Explorer/beta/1/HDFS%20Explorer%20-%20beta.application

http://www.codeproject.com/Articles/757934/Apache-Hadoop-for-Windows-Platform[07/05/2016 14:53:15]

5. Eclipse Plugin for Hadoop MapReduce Jobs with Simple HDFS Explorer and Code Completion Configuration
like Visual Studio
a. Eclipse IDE - http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/kepler/SR2/eclipse-jee-kepler-SR2-win32-x86_64.zip
b. Hadoop MapReduce Plugin for Eclipse - https://github.com/winghc/hadoop2x-eclipse-plugin/archive/master.zip
6. Datasets
7. Recipe Samples
Source Code: https://github.com/prabaprakash/Hadoop-Map-Reduce-Code
Documentation:
https://github.com/prabaprakash/Hadoop-Map-Reduce-Code/blob/master/Recipe%20Sample/Recipe%20Documentation.docx
8. If You Need Hadoop 2.5.1 Native Built for Ubuntu 14.10
Setup: https://github.com/prabaprakash/Hadoop-2.5.1-Binary
Config: https://github.com/prabaprakash/Hadoop-2.5.1-Config-Files

1. Introduction
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a
distributed computing environment.
It is part of the Apache project sponsored by the Apache Software Foundation.
Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of
terabytes.
Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue
operating uninterrupted in case of a node failure.
This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become
inoperative.
Hadoop was inspired by Google's MapReduce, a software framework in which an application is broken down
into numerous small parts.
Any of these parts (also called fragments or blocks) can be run on any node in the cluster. Doug Cutting,
Hadoop's creator, named the framework after his child's stuffed toy elephant.
The current Apache Hadoop ecosystem consists of the Hadoop kernel, MapReduce, the Hadoop distributed file
system (HDFS) and a number of related projects such as Apache Hive, HBase and Zookeeper.
The Hadoop framework is used by major players including Google, Yahoo and IBM, largely for applications
involving search engines and advertising. The preferred operating systems are Windows and Linux, but Hadoop
can also work with BSD and OS X.
Apache Hadoop is easier to understand through interactive sessions, so here are some YouTube
playlists that explain it interactively:

Playlist 1 - By Lynn Langit


http://www.youtube.com/playlist?list=PL8C3359ECF726D473

Playlist 2 - By handsonerp
http://www.youtube.com/user/handsonerp/search?query=hadoop

Playlist 3 - By Edureka!
http://www.youtube.com/playlist?list=PL9ooVrP1hQOHpJj0DW8GoQqnkbptAsqjZ

Some Ways to Install Hadoop in Windows


1. Cygwin
a. http://sundersinghc.wordpress.com/2013/04/08/running-hadoop-on-cygwin-in-windows-single-nodecluster/
b. http://bigdata.globant.com/?p=7
c. http://alans.se/blog/2010/hadoop-hbase-cygwin-windows-7-x64/#.U0bamFerMiw
2. Azure HD Insight Emulator
a. http://azure.microsoft.com/en-us/documentation/articles/hdinsight-get-started-emulator/
3. Build Hadoop for Windows
a. By Apache Doc - https://svn.apache.org/viewvc/hadoop/common/branches/branch-2/BUILDING.txt?view=markup
b. Perfect Guide By Abhijit Ghosh from https://app.box.com/s/11fwozokqmc1ohttt117
4. HortonWorks for Windows (Hadoop 2.0) and also Sandbox images of Hadoop 2.0 for Hyper-V / VMware /
VirtualBox
a. HortonWorks for Windows - http://hortonworks.com/partner/microsoft/
b. Sandbox 2.0 - http://hortonworks.com/products/hortonworks-sandbox/
5. Cloudera VM
a. http://www.cloudera.com/content/support/en/downloads.html

Other Cloud Services


1. Azure HD Insight
2. Amazon Elastic Map Reduce
3. IBM Blue Mix - Hadoop Service

2. Hadoop 2.3 for Windows 7/8/8.1 - Specifically Built for Windows x64

I built Hadoop 2.3 for Windows x64 by following the steps provided by Abhijit Ghosh at
http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os
[INFO] Executed tasks
[INFO]
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ hadoop-dist ---
[INFO] Building jar: C:\hdp\hadoop-dist\target\hadoop-dist-2.3.0-javadoc.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [1.847s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [3.218s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [3.812s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.522s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [3.717s]
[INFO] Apache Hadoop Maven Plugins ....................... SUCCESS [6.613s]
[INFO] Apache Hadoop MiniKDC ............................. SUCCESS [7.117s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [5.104s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [4.230s]
[INFO] Apache Hadoop Common .............................. SUCCESS [3:18.829s]
[INFO] Apache Hadoop NFS ................................. SUCCESS [13.442s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.066s]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [2:45.070s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [40.280s]
[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SUCCESS [10.956s]
[INFO] Apache Hadoop HDFS-NFS ............................ SUCCESS [5.037s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.075s]
[INFO] hadoop-yarn ....................................... SUCCESS [0.070s]
[INFO] hadoop-yarn-api ................................... SUCCESS [1:12.357s]
[INFO] hadoop-yarn-common ................................ SUCCESS [46.634s]
[INFO] hadoop-yarn-server ................................ SUCCESS [0.071s]
[INFO] hadoop-yarn-server-common ......................... SUCCESS [10.907s]
[INFO] hadoop-yarn-server-nodemanager .................... SUCCESS [25.635s]
[INFO] hadoop-yarn-server-web-proxy ...................... SUCCESS [4.293s]
[INFO] hadoop-yarn-server-resourcemanager ................ SUCCESS [30.427s]
[INFO] hadoop-yarn-server-tests .......................... SUCCESS [3.817s]
[INFO] hadoop-yarn-client ................................ SUCCESS [7.340s]
[INFO] hadoop-yarn-applications .......................... SUCCESS [0.068s]
[INFO] hadoop-yarn-applications-distributedshell ......... SUCCESS [3.047s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS [2.346s]
[INFO] hadoop-yarn-site .................................. SUCCESS [0.101s]
[INFO] hadoop-yarn-project ............................... SUCCESS [4.986s]
[INFO] hadoop-mapreduce-client ........................... SUCCESS [0.137s]
[INFO] hadoop-mapreduce-client-core ...................... SUCCESS [51.554s]
[INFO] hadoop-mapreduce-client-common .................... SUCCESS [28.285s]
[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS [3.548s]
[INFO] hadoop-mapreduce-client-app ....................... SUCCESS [22.627s]
[INFO] hadoop-mapreduce-client-hs ........................ SUCCESS [12.972s]
[INFO] hadoop-mapreduce-client-jobclient ................. SUCCESS [51.921s]
[INFO] hadoop-mapreduce-client-hs-plugins ................ SUCCESS [2.340s]
[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS [9.765s]
[INFO] hadoop-mapreduce .................................. SUCCESS [3.397s]
[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS [16.817s]
[INFO] Apache Hadoop Distributed Copy .................... SUCCESS [37.303s]
[INFO] Apache Hadoop Archives ............................ SUCCESS [2.773s]
[INFO] Apache Hadoop Rumen ............................... SUCCESS [11.225s]
[INFO] Apache Hadoop Gridmix ............................. SUCCESS [7.554s]
[INFO] Apache Hadoop Data Join ........................... SUCCESS [3.982s]
[INFO] Apache Hadoop Extras .............................. SUCCESS [4.627s]
[INFO] Apache Hadoop Pipes ............................... SUCCESS [0.080s]
[INFO] Apache Hadoop OpenStack support ................... SUCCESS [8.620s]
[INFO] Apache Hadoop Client .............................. SUCCESS [8.964s]
[INFO] Apache Hadoop Mini-Cluster ........................ SUCCESS [0.186s]
[INFO] Apache Hadoop Scheduler Load Simulator ............ SUCCESS [16.472s]
[INFO] Apache Hadoop Tools Dist .......................... SUCCESS [7.326s]
[INFO] Apache Hadoop Tools ............................... SUCCESS [0.066s]
[INFO] Apache Hadoop Distribution ........................ SUCCESS [1:09.690s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17:47.469s
[INFO] Finished at: Sun Mar 23 18:01:41 IST 2014
[INFO] Final Memory: 131M/349M
[INFO] ------------------------------------------------------------------------

Steps to Install
1. Download Hadoop 2.3 for Windows (112.5 MB) from my Box account: https://app.box.com/s/11fwozokqmc1ohttt117
2. Also download the configuration files from GitHub: https://github.com/prabaprakash/Hadoop-2.3-Config/archive/master.zip
You should now have both files.
3. Open hadoop-2.3.0.tar.gz with WinRAR and extract it to the local disk (c:\hadoop-2.3.0 in this article).

4. Open config.rar with WinRAR.

Open the bin directory in WinRAR and extract the yarn.cmd file into the c:\hadoop-2.3.0\bin folder.

Open config\etc\hadoop and extract the following files:

1. yarn-site.xml
2. mapred-site.xml
3. httpfs-site.xml
4. hdfs-site.xml
5. hadoop-policy.xml
6. core-site.xml
7. capacity-scheduler.xml

into c:\hadoop-2.3.0\etc\hadoop, replacing the existing files.

5. Java 1.6 is mandatory, because the Apache developers built the Hadoop framework using Java 1.6, so we need
both the Java 1.6 SDK and the Java 1.6 runtime.
a. Download Java SDK 1.6.0_31
http://download.oracle.com/otn/java/jdk/6u31-b05/jdk-6u31-windows-x64.exe
Then install it.
6. Set the environment variables
Control Panel\System and Security\System
Open Advanced System Settings

Then add a new variable "HADOOP_HOME" with the value "c:\hadoop-2.3.0"
Also add a new variable "JAVA_HOME" with the value of your Java installation path

System Variables -> Path -> Edit

Add the Hadoop bin path and the Java 6 bin path -> click OK

7. Open hadoop-env.cmd in WordPad; it is located at C:\hadoop-2.3.0\etc\hadoop\hadoop-env.cmd

Set the JAVA_HOME path on line 25. Remember to use the JDK folder itself, not the JDK bin path.
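
The environment setup above is easy to get slightly wrong, so a quick check can help. The following is only a minimal sketch, not part of the original setup: the class name HadoopEnvCheck and the JDK path in the comment are hypothetical. It reports when HADOOP_HOME or JAVA_HOME is missing, or when JAVA_HOME points at a bin folder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): sanity-checks the variables set in step 6.
final class HadoopEnvCheck {

    /** Returns the problems found in the given environment; an empty list means it looks fine. */
    static List<String> findProblems(Map<String, String> env) {
        List<String> problems = new ArrayList<String>();

        String hadoopHome = env.get("HADOOP_HOME");
        if (hadoopHome == null || hadoopHome.trim().isEmpty()) {
            problems.add("HADOOP_HOME is not set");
        }

        String javaHome = env.get("JAVA_HOME");
        if (javaHome == null || javaHome.trim().isEmpty()) {
            problems.add("JAVA_HOME is not set");
        } else {
            // Drop trailing separators so "...\bin\" is treated like "...\bin".
            String trimmed = javaHome.trim();
            while (trimmed.endsWith("\\") || trimmed.endsWith("/")) {
                trimmed = trimmed.substring(0, trimmed.length() - 1);
            }
            String lower = trimmed.toLowerCase();
            if (lower.endsWith("\\bin") || lower.endsWith("/bin")) {
                problems.add("JAVA_HOME should be the JDK folder, not its bin folder");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Example of a good value (assumed path): JAVA_HOME=C:\Java\jdk1.6.0_31
        List<String> problems = findProblems(System.getenv());
        if (problems.isEmpty()) {
            System.out.println("HADOOP_HOME and JAVA_HOME look fine");
        }
        for (String problem : problems) {
            System.out.println(problem);
        }
    }
}
```

Run it from a new command prompt after saving the variables, because prompts that were already open do not see the changes.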

8. Let's Play with Apache Hadoop 2.3

a. Open cmd as administrator and run hadoop without arguments to confirm that it is found:

C:\Windows\system32>cd c:\hadoop-2.3.0\bin
c:\hadoop-2.3.0\bin>hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

c:\hadoop-2.3.0\bin>hadoop namenode -format

This creates an HDFS file system on your machine and formats it.

c:\hadoop-2.3.0\bin>cd..
c:\hadoop-2.3.0>cd sbin
c:\hadoop-2.3.0\sbin>start-dfs.cmd
c:\hadoop-2.3.0\sbin>start-yarn.cmd
starting yarn daemons

Now check that the NameNode, the DataNode, the YARN NodeManager and the YARN ResourceManager are all
running at the same time.
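
One way to check that the daemons are up, besides looking at their console windows, is to test whether their ports accept connections. The sketch below is an illustration only: the class name DaemonPortCheck is hypothetical, and the ports are assumptions taken from elsewhere in this article (9000 from the hdfs://127.0.0.1:9000 path in the Recipe.java comments, 8032 from the ResourceManager address in the job log). Your configuration files may use different values.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: reports whether something accepts TCP connections on a port.
final class DaemonPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing is listening there.
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if closing fails.
            }
        }
    }

    public static void main(String[] args) {
        // Assumed ports; adjust them to match core-site.xml and yarn-site.xml.
        System.out.println("NameNode (9000): " + isListening("127.0.0.1", 9000, 1000));
        System.out.println("ResourceManager (8032): " + isListening("127.0.0.1", 8032, 1000));
    }
}
```

An open port only shows that some process is listening there; it does not prove that the process is a healthy Hadoop daemon.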
OK, let's move on to MapReduce.

3. Some Map Reduce Jobs

Almost every programmer starts MapReduce programming with the simple WordCount program.
I was bored with that, so let's begin with recipes instead.
a. Download the recipeitems-latest.json.gz file (26 MB)
http://openrecipes.s3.amazonaws.com/recipeitems-latest.json.gz
b. Create a folder named hwork in c:\
Extract recipeitems-latest.json.gz into the c:\hwork folder. The extracted file is about 150 MB.
It contains about 150,000 (1.5 lakh) recipe items, one JSON object per line, such as:

{ "_id" : { "oid" : "5160756b96cc62079cc2db15" }, "name" : "Drop Biscuits and Sausage Gravy",
"ingredients" : "Biscuits\n3 cups All-purpose Flour\n2 Tablespoons Baking Powder\n1/2 teaspoon
Salt\n1-1/2 stick (3/4 Cup) Cold Butter, Cut Into Pieces\n1-1/4 cup Butermilk\n SAUSAGE GRAVY\n1
pound Breakfast Sausage, Hot Or Mild\n1/3 cup All-purpose Flour\n4 cups Whole Milk\n1/2 teaspoon
Seasoned Salt\n2 teaspoons Black Pepper, More To Taste", "url" :
"http://thepioneerwoman.com/cooking/2013/03/drop-biscuits-and-sausage-gravy/", "image" :
"http://static.thepioneerwoman.com/cooking/files/2013/03/bisgrav.jpg", "ts" : { "date" : 1365276011104
}, "cookTime" : "PT30M", "source" : "thepioneerwoman", "recipeYield" : "12", "datePublished" :
"2013-03-11", "prepTime" : "PT10M", "description" : "Late Saturday afternoon, after Marlboro Man had
returned home with the soccer-playing girls, and I had returned home with the..." }
c. Download the Gson library for Java to deserialize the JSON
https://code.google.com/p/google-gson/downloads/detail?name=google-gson-2.2.4-release.zip&can=2&q=
Extract the zip file, then copy all the jar files into the C:\hadoop-2.3.0\share\hadoop\common\lib
folder.
The JSON file holds approximately 150,000 recipe items; my goal is to count the number of items per
"cookTime", for example:
PT0H20M 25
PT0H25M 24
PT0H2M 3
PT0H30M 74
PT0H34M 1
PT0H35M 31
PT0H3M 1
PT0H40M 67
PT0H45M 74
PT0H50M 52
PT0H55M 10
PT0H5M 118
PT0H6M 1
PT0H7M 1
PT0H8M 6
PT0M 80
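
The cookTime keys listed above look like ISO 8601 durations (PT30M is 30 minutes). If you want to compare them numerically, the sketch below converts such a string to minutes with java.time.Duration. Note the assumptions: it needs Java 8 or later, so it is meant to run on its own rather than with the JDK 6 used for this Hadoop build, and the class name CookTimeMinutes is hypothetical. Values that Duration cannot parse, such as the bare "PT" that appears in the job output, are reported as -1.

```java
import java.time.Duration;
import java.time.format.DateTimeParseException;

// Hypothetical helper: converts an ISO 8601 duration string to whole minutes.
final class CookTimeMinutes {

    /** Returns the duration in minutes, or -1 if the text is null or not a parsable duration. */
    static long toMinutes(String cookTime) {
        if (cookTime == null) {
            return -1;
        }
        try {
            return Duration.parse(cookTime).toMinutes();
        } catch (DateTimeParseException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        String[] samples = { "PT30M", "PT0H20M", "P1DT6H", "PT" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + toMinutes(sample));
        }
    }
}
```

Grouping on minutes instead of the raw string would merge keys such as PT10M and PT0H10M, which the job in this article counts separately.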
d. Map Reduce Code
Recipe.java

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import com.google.gson.Gson;

public class Recipe {

    // Mapper: parses one JSON recipe per input line and emits (cookTime, 1).
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();
        private final Gson gson = new Gson();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            Roo roo = gson.fromJson(value.toString(), Roo.class);
            // Recipes without a cookTime (and empty lines, which Gson
            // parses to null) are counted under the key "none".
            if (roo != null && roo.cookTime != null) {
                word.set(roo.cookTime);
            } else {
                word.set("none");
            }
            context.write(word, one);
        }
    }

    // Reducer (also used as combiner): sums the counts for each cookTime.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: recipe <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated "new Job(conf, name)" constructor.
        Job job = Job.getInstance(conf, "Recipe");
        job.setJarByClass(Recipe.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // The paths come from the command line; they could also be hard-coded,
        // e.g. hdfs://127.0.0.1:9000/in and hdfs://127.0.0.1:9000/out.
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

// Plain data classes that Gson fills from the fields of each JSON recipe.
class Id {
    public String oid;
}

class Ts {
    public long date;
}

class Roo {
    public Id _id;
    public String name;
    public String ingredients;
    public String url;
    public String image;
    public Ts ts;
    public String cookTime;
    public String source;
    public String recipeYield;
    public String datePublished;
    public String prepTime;
    public String description;
}

By default, Hadoop reads the input file line by line and sends each line to the TokenizerMapper class.
In the TokenizerMapper class, we deserialize the JSON string into an instance of the Roo class.
From that roo object we get the cookTime and write it to the Mapper context.
In the IntSumReducer class, we count the number of items per key and write the total to the Reducer context.
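
To see the map and reduce steps described above without a cluster, the idea can be sketched in plain Java. This is a simplified illustration, not the job itself: the class name LocalCookTimeCount is hypothetical, and a regular expression stands in for Gson so that the example has no dependencies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical local stand-in for the Recipe job: no Hadoop or Gson involved.
final class LocalCookTimeCount {

    // Simplified replacement for Gson: extracts the "cookTime" string value.
    private static final Pattern COOK_TIME =
            Pattern.compile("\"cookTime\"\\s*:\\s*\"([^\"]*)\"");

    /** Map step: one input line in, one key out ("none" when cookTime is absent). */
    static String mapKey(String jsonLine) {
        Matcher matcher = COOK_TIME.matcher(jsonLine);
        return matcher.find() ? matcher.group(1) : "none";
    }

    /** Reduce step: adds 1 per occurrence of each key; TreeMap keeps the keys sorted. */
    static Map<String, Integer> count(List<String> jsonLines) {
        Map<String, Integer> totals = new TreeMap<String, Integer>();
        for (String line : jsonLines) {
            String key = mapKey(line);
            Integer current = totals.get(key);
            totals.put(key, current == null ? 1 : current + 1);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "{ \"name\" : \"A\", \"cookTime\" : \"PT30M\" }",
                "{ \"name\" : \"B\", \"cookTime\" : \"PT10M\" }",
                "{ \"name\" : \"C\", \"cookTime\" : \"PT30M\" }",
                "{ \"name\" : \"D\" }");
        System.out.println(count(lines)); // prints {PT10M=1, PT30M=2, none=1}
    }
}
```

In the real job the summing happens in two places, first in the combiner on the map side and then in the reducer, which is why the same IntSumReducer class can serve both roles.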
e. We need to compile it.
Create Recipe.java in the c:\Hwork folder, copy the code into it, then run the following command:

c:\Hwork>javac -classpath C:\hadoop-2.3.0\share\hadoop\common\hadoop-common-2.3.0.jar;C:\hadoop-2.3.0\share\hadoop\mapreduce\hadoop-mapreduce-client-core-2.3.0.jar;C:\hadoop-2.3.0\share\hadoop\common\lib\gson-2.2.4.jar;C:\hadoop-2.3.0\share\hadoop\common\lib\commons-cli-1.2.jar Recipe.java

The MapReduce program is now compiled. Next we need to create a jar file, because Hadoop
needs a jar file to run the job.
To build the jar, run the following command:

C:\Hwork>jar -cvf Recipe.jar *.class


added manifest
adding: Id.class(in = 217) (out= 179)(deflated 17%)
adding: Recipe$IntSumReducer.class(in = 1726) (out= 736)(deflated 57%)
adding: Recipe$TokenizerMapper.class(in = 1887) (out= 820)(deflated 56%)
adding: Recipe.class(in = 1861) (out= 1006)(deflated 45%)
adding: Roo.class(in = 435) (out= 293)(deflated 32%)
adding: Ts.class(in = 201) (out= 168)(deflated 16%)

We are ready to run the MapReduce program, but first we need to copy c:\Hwork\recipeitems-latest.json to the
Hadoop distributed file system:

c:\hadoop-2.3.0\sbin>hadoop fs -mkdir /in
c:\hadoop-2.3.0\sbin>hadoop fs -copyFromLocal c:\Hwork\recipeitems-latest.json /in

The file is now copied from the local disk to the Hadoop Distributed File System.
Run the MapReduce job:
c:\hadoop-2.3.0\sbin>hadoop jar c:\Hwork\Recipe.jar Recipe /in /out
14/04/12 00:52:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/04/12 00:52:03 INFO input.FileInputFormat: Total input paths to process : 1
14/04/12 00:52:03 INFO mapreduce.JobSubmitter: number of splits:1
14/04/12 00:52:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1397243723769_0001
14/04/12 00:52:04 INFO impl.YarnClientImpl: Submitted application application_1397243723769_0001
14/04/12 00:52:04 INFO mapreduce.Job: The url to track the job:
http://OmSkathi:8088/proxy/application_1397243723769_0001/
14/04/12 00:52:04 INFO mapreduce.Job: Running job: job_1397243723769_0001
14/04/12 00:52:16 INFO mapreduce.Job: Job job_1397243723769_0001 running in uber mode : false
14/04/12 00:52:16 INFO mapreduce.Job: map 0% reduce 0%
14/04/12 00:52:26 INFO mapreduce.Job: map 100% reduce 0%
14/04/12 00:52:33 INFO mapreduce.Job: map 100% reduce 100%
14/04/12 00:52:34 INFO mapreduce.Job: Job job_1397243723769_0001 completed successfully
14/04/12 00:52:34 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=3872
FILE: Number of bytes written=180889
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=119406749
HDFS: Number of bytes written=2871
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=7383
Total time spent by all reduces in occupied slots (ms)=5121
Total time spent by all map tasks (ms)=7383
Total time spent by all reduce tasks (ms)=5121
Total vcore-seconds taken by all map tasks=7383
Total vcore-seconds taken by all reduce tasks=5121

Total megabyte-seconds taken by all map tasks=7560192
Total megabyte-seconds taken by all reduce tasks=5243904
Map-Reduce Framework
Map input records=146949
Map output records=146949
Map output bytes=1387492
Map output materialized bytes=3872
Input split bytes=113
Combine input records=146949
Combine output records=293
Reduce input groups=293
Reduce shuffle bytes=3872
Reduce input records=293
Reduce output records=293
Spilled Records=586
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=70
CPU time spent (ms)=5108
Physical memory (bytes) snapshot=370135040
Virtual memory (bytes) snapshot=428552192
Total committed heap usage (bytes)=270860288
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=119406636
File Output Format Counters
Bytes Written=2871
The MapReduce job completed successfully. Now check the output folder "/out":
c:\hadoop-2.3.0\sbin>hadoop fs -ls /out
Windows_NT-amd64-64
Found 2 items
-rw-r--r--   1 PrabaKarthi supergroup          0 2014-04-12 00:52 /out/_SUCCESS
-rw-r--r--   1 PrabaKarthi supergroup       2871 2014-04-12 00:52 /out/part-r-00000

Open the output file and have a good look; you will enjoy the analytics work that Apache Hadoop does.
c:\hadoop-2.3.0\sbin>hadoop fs -cat /out/part-r-00000
P0D 121
P1D 2
P1DT6H 1
P4DT8H 1
PT 8491
PT0H10M 56
PT0H12M 1
PT0H14M 1
PT0H15M 55
PT0H1M 1
PT0H20M 25
PT0H25M 24
PT0H2M 3
PT0H30M 74
PT0H34M 1
PT0H35M 31
PT0H3M 1
PT0H40M 67
PT0H45M 74
PT0H50M 52
PT0H55M 10
PT0H5M 118
PT0H6M 1
PT0H7M 1
PT0H8M 6
PT0M 80
PT1008H 1
PT100M 1
PT10H 102
PT10H0M 2
PT10H10M 5
PT10H15M 4
PT10H20M 1
PT10H25M 1
PT10H30M 5
PT10H35M 1
PT10H40M 1
PT10H45M 1
PT10M 9982
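
The job output above is sorted by key, not by count, so the most common cook times are hard to spot. The sketch below is one possible way to rank them, assuming each output line holds a key and a count separated by whitespace (Hadoop's default text output writes a tab between them). The class name TopCookTimes is hypothetical, and the code avoids newer language features so that it also compiles with JDK 6.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: ranks "key<whitespace>count" lines by count, highest first.
final class TopCookTimes {

    static final class Entry {
        final String cookTime;
        final int count;

        Entry(String cookTime, int count) {
            this.cookTime = cookTime;
            this.count = count;
        }
    }

    /** Parses the output lines and returns at most n entries with the largest counts. */
    static List<Entry> topN(List<String> outputLines, int n) {
        List<Entry> entries = new ArrayList<Entry>();
        for (String line : outputLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // skip blank or unexpected lines
            }
            entries.add(new Entry(parts[0], Integer.parseInt(parts[1])));
        }
        Collections.sort(entries, new Comparator<Entry>() {
            public int compare(Entry a, Entry b) {
                // Descending order by count.
                return a.count > b.count ? -1 : (a.count == b.count ? 0 : 1);
            }
        });
        return new ArrayList<Entry>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // A few lines taken from the output shown above.
        List<String> lines = Arrays.asList("PT\t8491", "PT0H10M\t56", "PT10H\t102", "PT10M\t9982");
        for (Entry entry : topN(lines, 2)) {
            System.out.println(entry.cookTime + " " + entry.count);
        }
    }
}
```

For the four sample lines, the two entries printed are PT10M with 9982 and PT with 8491.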

So, we have completed a MapReduce job. Everyone knows how to do that, so next I am going to list some tools
that make your work easier than before.

4. Redgate HDFS Explorer


I got bored of copying local files to the Hadoop file system, and retrieving data from it, on the command line.
I found this open source software, which is much more fun to use. First download it (2.5 MB):
http://bigdatainstallers.azurewebsites.net/files/HDFS%20Explorer/beta/1/HDFS%20Explorer%20-%20beta.application
a. Install it. We already copied the configuration files for Hadoop 2.3, so our Hadoop file system will be
accessible remotely, including from web clients written in Java, C#, Python, etc.

b. Open HDFS Explorer
File->Add Connection

Browse the Hadoop file system in a graphical file explorer. Copy the input file from the local disk and paste it
into HDFS, or copy the output from HDFS and paste it onto your local disk; you can do every operation that a
traditional file explorer supports. Enjoy HDFS Explorer.
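
Remote access from a web client, as mentioned above, usually goes through Hadoop's WebHDFS REST API. The sketch below only builds the URL for a directory listing. It rests on assumptions that are not stated in this article: that WebHDFS is enabled in the configuration and that the NameNode web interface listens on port 50070, the Hadoop 2.x default. The class name WebHdfsUrl is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: builds WebHDFS REST URLs for a NameNode.
final class WebHdfsUrl {

    /** Returns the WebHDFS URL that lists the given absolute HDFS path. */
    static String listStatusUrl(String host, int httpPort, String hdfsPath) {
        if (hdfsPath == null || !hdfsPath.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must be absolute: " + hdfsPath);
        }
        try {
            // The multi-argument URI constructor escapes characters such as spaces.
            URI uri = new URI("http", null, host, httpPort,
                    "/webhdfs/v1" + hdfsPath, "op=LISTSTATUS", null);
            return uri.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URL for " + hdfsPath, e);
        }
    }

    public static void main(String[] args) {
        // Assumed host and port; prints http://127.0.0.1:50070/webhdfs/v1/out?op=LISTSTATUS
        System.out.println(listStatusUrl("127.0.0.1", 50070, "/out"));
    }
}
```

Opening the printed URL in a browser, or fetching it with java.net.HttpURLConnection, should return a JSON listing of /out when the cluster is running and WebHDFS is enabled.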

HDFS Explorer is good, but I was also bored of writing MapReduce code in Notepad++ without proper
IntelliSense and indentation. I found an Eclipse plugin for Hadoop MapReduce, which is the next topic.

5. Eclipse Plugin for Hadoop MapReduce Jobs with Simple HDFS Explorer and Auto Code Completion
Configuration like Visual Studio

Eclipse IDE - http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/kepler/SR2/eclipse-jee-kepler-SR2-win32-x86_64.zip
Hadoop MapReduce Plugin for Eclipse - https://github.com/winghc/hadoop2x-eclipse-plugin/archive/master.zip

To begin, download the Eclipse Kepler IDE (250 MB) and the Hadoop MapReduce plugin for Eclipse (23 MB) from
the links above.
a. Extract the Eclipse IDE
For this walkthrough, Eclipse is extracted to D:\eclipse
b. Open hadoop2x-eclipse-plugin-master.zip
Go to the "release" directory and extract the "hadoop-eclipse-kepler-plugin-2.2.0.jar" file into the
eclipse\plugins folder

Let's rock and roll.


c. Open Eclipse IDE (Run As Administrator)
Choose your own workspace location -> Click OK
Menu->Window->Open Perspective->Other->Map/Reduce

d. I am used to Visual Studio, so I want similar IntelliSense and code formatting in Eclipse. A few
settings make this easier:
Menu->Window->Preferences->Java->Editor->Content Assist->"Auto Activation"
check "Enable auto activation"
auto activation delay (ms)
auto activation triggers for Java: .(abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
auto activation triggers for Javadoc: @#
Apply->OK

e. Configure the HDFS and Map/Reduce connection

Map/Reduce Locations->New Hadoop location

Location name: "some name". Keep the other values as shown in the screenshot; don't modify them, because the
MapReduce and DFS addresses were already configured in the c:\hadoop-2.3.0\etc\hadoop folder.

Simple HDFS Explorer

f. File->New Project->Map/Reduce Project->Next

It shows an error because the Hadoop installation folder is not configured yet.

Now browse to the Hadoop installation directory and click Apply.

Click Next.

Important: this is where mistakes usually happen, because Hadoop 2.3 needs JDK 6 for both compilation and
runtime.

Click Add Library->JRE System Library

Click Installed JREs

Add->Standard VM

Browse to the JDK 1.6 location and click Finish

OK->OK->Finish->Finish

The Hadoop 2.3 libraries are now added, but the project still reports a JDK 1.7 error because it needs JDK 1.6.

Change to JDK 1.6->click OK

Finally, the Hadoop 2.3 libraries are set up along with JDK 1.6.

Your Eclipse is now configured for Hadoop MapReduce coding and execution, with IntelliSense-style code
completion. Let's code.
1. Add new -> Recipe.java in the src folder, then copy and paste the code above

2. Right click -> Recipe.java -> Run As->Run on Hadoop

3. The MapReduce job is running

4. Job completed

Examples
1. Hadoop : WordCount with Custom Record Reader of TextInputFormat
6. Datasets
1. Large Public Datasets
2. Free Large datasets to experiment with Hadoop
3. Explain patent data set in Hadoop example
4. 60,000+ Documented UFO Sightings With Text Descriptions And Metadata
5. Recipe-Items List

Reference Books
1. Hadoop Map Reduce CookBook - Srinath Perera
2. Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and
Graph
3. Hadoop: The Definitive Guide MapReduce for the Cloud

Reference Links
1. Searchcloudcomputing.techtarget.com
2. Hadoop: What it is, how it works, and what it can do
3. IBM: What is Hadoop?
4. Hadoop at Yahoo

Conclusion
I am sure this article will help beginner and intermediate programmers bootstrap Apache Hadoop (a Big Data analytics framework) in a Windows environment.
Yours friendly,
Prabakaran.A

License
This article, along with any associated source code and files, is
licensed under The Code Project Open License (CPOL)

About the Author


Praba Prakash

Student
India

Microsoft Student Partner (2014-2015),
MS Software Engineering (2011-2016),
VIT Chennai Campus - India
I am very curious to learn technologies.
"Curiosity is the key to creativity" - Akio Morita
"Luck is a dividend of sweat. The more you sweat, the luckier you get" - Ray Kroc


Last Updated 16 Jul 2014

Article Copyright 2014 by Praba Prakash


Everything else Copyright CodeProject, 1999-2016
