Apache Hadoop For Windows Platform

Praba Prakash, 16 Jul 2014
Apache Hadoop 2.3 for Big Data Analytics
Check this Video for Apache Hadoop Installation in Windows
Posted 16 Apr 2014
1. Introduction
2. Hadoop 2.3 for Windows 7/8/8.1 - Specifically Built for Windows x64
a. Hadoop 2.3 for Windows (112.5 MB)
GitHub Link:
https://github.com/prabaprakash/Hadoop-2.3
Box Link:
https://app.box.com/s/11fwozokqmc1ohttt117
Google Drive Link:
https://drive.google.com/file/d/0Bz7A6rJcTjx_Q0RDT0FrU3dUTDQ/edit?usp=sharing
Dropbox Link:
https://www.dropbox.com/s/p8xsfmx9g76pn0t/hadoop-2.3.0.tar.gz
b. Pre Configured file - https://github.com/prabaprakash/Hadoop-2.3-Config/archive/master.zip
c. Java SDK / Runtime 1.6 (Mandatory)
Download Link: http://download.oracle.com/otn/java/jdk/6u31-b05/jdk-6u31-windows-x64.exe
Reference Link: http://www.oracle.com/technetwork/java/javasebusiness/downloads/java-archivedownloads-javase6-419409.html
3. Map Reduce Jobs in Java
4. Redgate HDFS Explorer http://bigdatainstallers.azurewebsites.net/files/HDFS%20Explorer/beta/1/HDFS%20Explorer%20-%20beta.application
http://www.codeproject.com/Articles/757934/Apache-Hadoop-for-Windows-Platform[07/05/2016 14:53:15]
5. Eclipse Plugin for Hadoop MapReduce Jobs with Simple HDFS Explorer and Code Completion Configuration like Visual Studio
a. Eclipse IDE - http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/kepler/SR2/eclipse-jee-kepler-SR2-win32-x86_64.zip
b. Hadoop MapReduce Plugin for Eclipse - https://github.com/winghc/hadoop2x-eclipse-plugin/archive/master.zip
6. Datasets
7. Recipe Samples
Source Code: https://github.com/prabaprakash/Hadoop-Map-Reduce-Code
Documentation:
https://github.com/prabaprakash/Hadoop-Map-Reduce-Code/blob/master/Recipe%20Sample/Recipe%20Documentation.docx
8. If You Need a Native Build of Hadoop 2.5.1 for Ubuntu 14.10
Setup: https://github.com/prabaprakash/Hadoop-2.5.1-Binary
Config: https://github.com/prabaprakash/Hadoop-2.5.1-Config-Files
1. Introduction
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.

Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.

Hadoop was inspired by Google's MapReduce, a software framework in which an application is broken down into numerous small parts. Any of these parts (also called fragments or blocks) can be run on any node in the cluster. Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant.

The current Apache Hadoop ecosystem consists of the Hadoop kernel, MapReduce, the Hadoop distributed file system (HDFS) and a number of related projects such as Apache Hive, HBase and ZooKeeper.

The Hadoop framework is used by major players including Google, Yahoo and IBM, largely for applications involving search engines and advertising. The preferred operating systems are Windows and Linux, but Hadoop can also work with BSD and OS X.
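The idea of breaking an application into small parts that can run on any node can be tried on a single machine. The following is a minimal sketch in plain Java (no Hadoop classes): the input lines are split into fragments, each fragment is counted by a worker thread standing in for a cluster node, and the partial results are merged. The class and method names are hypothetical, and a real Hadoop job additionally ships data between machines, retries failed tasks and sorts keys, none of which is modeled here. It sticks to Java 6 language features, so it compiles with the JDK 1.6 used in this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration: count words in fragments processed by independent workers. */
class FragmentWordCount {

    /** The work one "node" does: count the words in its own fragment only. */
    static Map<String, Integer> countFragment(List<String> lines) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.length() == 0) {
                    continue; // a blank line splits into one empty token
                }
                Integer old = counts.get(word);
                counts.put(word, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    /** Splits the input into fragments, counts each on a worker thread, then merges the results. */
    static Map<String, Integer> count(List<String> lines, int fragments) {
        ExecutorService pool = Executors.newFixedThreadPool(fragments);
        try {
            int size = (lines.size() + fragments - 1) / fragments; // ceiling division
            List<Future<Map<String, Integer>>> partials = new ArrayList<Future<Map<String, Integer>>>();
            for (int start = 0; start < lines.size(); start += size) {
                final List<String> fragment = lines.subList(start, Math.min(start + size, lines.size()));
                partials.add(pool.submit(new Callable<Map<String, Integer>>() {
                    public Map<String, Integer> call() {
                        return countFragment(fragment);
                    }
                }));
            }
            // Merge step: add up the partial counts from every fragment.
            Map<String, Integer> total = new HashMap<String, Integer>();
            for (Future<Map<String, Integer>> partial : partials) {
                for (Map.Entry<String, Integer> e : partial.get().entrySet()) {
                    Integer old = total.get(e.getKey());
                    total.put(e.getKey(), old == null ? e.getValue() : old + e.getValue());
                }
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a fragment", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A fragment failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data on windows", "hadoop on windows", "big hadoop");
        // TreeMap only sorts the printout; prints {big=2, data=1, hadoop=2, on=2, windows=2}
        System.out.println(new TreeMap<String, Integer>(count(lines, 2)));
    }
}
```

Because each fragment is counted without looking at any other fragment, the fragments could just as well be processed on different machines; only the small merge step needs to see every partial result.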
Apache Hadoop is hard to understand without interactive sessions, so here are some YouTube playlists that explain Apache Hadoop interactively:
Playlist 1 - By Lynn Langit

http://www.youtube.com/playlist?list=PL8C3359ECF726D473
Playlist 2 - By handsonerp
http://www.youtube.com/user/handsonerp/search?query=hadoop
Playlist 3 - By Edureka!
http://www.youtube.com/playlist?list=PL9ooVrP1hQOHpJj0DW8GoQqnkbptAsqjZ
Some Ways to Install Hadoop in Windows

1. Cygwin
a. http://sundersinghc.wordpress.com/2013/04/08/running-hadoop-on-cygwin-in-windows-single-nodecluster/
b. http://bigdata.globant.com/?p=7
c. http://alans.se/blog/2010/hadoop-hbase-cygwin-windows-7-x64/#.U0bamFerMiw
2. Azure HD Insight Emulator
a. http://azure.microsoft.com/en-us/documentation/articles/hdinsight-get-started-emulator/
3. Build Hadoop for Windows
a. By Apache Doc - https://svn.apache.org/viewvc/hadoop/common/branches/branch-2/BUILDING.txt?view=markup
b. Perfect Guide By Abhijit Ghosh - http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os
4. Hortonworks for Windows (Hadoop 2.0), and also Sandbox images of Hadoop 2.0 for Hyper-V / VMware / VirtualBox
a. HortonWorks for Windows - http://hortonworks.com/partner/microsoft/

b. Sandbox 2.0 - http://hortonworks.com/products/hortonworks-sandbox/
5. Cloudera VM
a. http://www.cloudera.com/content/support/en/downloads.html
Other Cloud Services

1. Azure HD Insight
2. Amazon Elastic MapReduce
3. IBM Bluemix - Hadoop Service
2. Hadoop 2.3 for Windows 7/8/8.1 - Specifically Built for Windows x64
I built Hadoop 2.3 for Windows x64 with the help of the steps provided by Abhijit Ghosh at
http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os
[INFO] Executed tasks
[INFO]
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ hadoop-dist ---
[INFO] Building jar: C:\hdp\hadoop-dist\target\hadoop-dist-2.3.0-javadoc.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [1.847s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [3.218s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [3.812s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.522s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [3.717s]
[INFO] Apache Hadoop Maven Plugins ....................... SUCCESS [6.613s]
[INFO] Apache Hadoop MiniKDC ............................. SUCCESS [7.117s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [5.104s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [4.230s]
[INFO] Apache Hadoop Common .............................. SUCCESS [3:18.829s]
[INFO] Apache Hadoop NFS ................................. SUCCESS [13.442s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.066s]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [2:45.070s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [40.280s]
[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SUCCESS [10.956s]
[INFO] Apache Hadoop HDFS-NFS ............................ SUCCESS [5.037s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.075s]
[INFO] hadoop-yarn ....................................... SUCCESS [0.070s]
[INFO] hadoop-yarn-api ................................... SUCCESS [1:12.357s]
[INFO] hadoop-yarn-common ................................ SUCCESS [46.634s]
[INFO] hadoop-yarn-server ................................ SUCCESS [0.071s]
[INFO] hadoop-yarn-server-common ......................... SUCCESS [10.907s]
[INFO] hadoop-yarn-server-nodemanager .................... SUCCESS [25.635s]
[INFO] hadoop-yarn-server-web-proxy ...................... SUCCESS [4.293s]
[INFO] hadoop-yarn-server-resourcemanager ................ SUCCESS [30.427s]
[INFO] hadoop-yarn-server-tests .......................... SUCCESS [3.817s]
[INFO] hadoop-yarn-client ................................ SUCCESS [7.340s]
[INFO] hadoop-yarn-applications .......................... SUCCESS [0.068s]
[INFO] hadoop-yarn-applications-distributedshell ......... SUCCESS [3.047s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS [2.346s]
[INFO] hadoop-yarn-site .................................. SUCCESS [0.101s]
[INFO] hadoop-yarn-project ............................... SUCCESS [4.986s]
[INFO] hadoop-mapreduce-client ........................... SUCCESS [0.137s]
[INFO] hadoop-mapreduce-client-core ...................... SUCCESS [51.554s]
[INFO] hadoop-mapreduce-client-common .................... SUCCESS [28.285s]
[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS [3.548s]
[INFO] hadoop-mapreduce-client-app ....................... SUCCESS [22.627s]
[INFO] hadoop-mapreduce-client-hs ........................ SUCCESS [12.972s]
[INFO] hadoop-mapreduce-client-jobclient ................. SUCCESS [51.921s]
[INFO] hadoop-mapreduce-client-hs-plugins ................ SUCCESS [2.340s]
[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS [9.765s]
[INFO] hadoop-mapreduce .................................. SUCCESS [3.397s]
[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS [16.817s]
[INFO] Apache Hadoop Distributed Copy .................... SUCCESS [37.303s]
[INFO] Apache Hadoop Archives ............................ SUCCESS [2.773s]
[INFO] Apache Hadoop Rumen ............................... SUCCESS [11.225s]
[INFO] Apache Hadoop Gridmix ............................. SUCCESS [7.554s]
[INFO] Apache Hadoop Data Join ........................... SUCCESS [3.982s]
[INFO] Apache Hadoop Extras .............................. SUCCESS [4.627s]
[INFO] Apache Hadoop Pipes ............................... SUCCESS [0.080s]
[INFO] Apache Hadoop OpenStack support ................... SUCCESS [8.620s]
[INFO] Apache Hadoop Client .............................. SUCCESS [8.964s]
[INFO] Apache Hadoop Mini-Cluster ........................ SUCCESS [0.186s]
[INFO] Apache Hadoop Scheduler Load Simulator ............ SUCCESS [16.472s]
[INFO] Apache Hadoop Tools Dist .......................... SUCCESS [7.326s]
[INFO] Apache Hadoop Tools ............................... SUCCESS [0.066s]
[INFO] Apache Hadoop Distribution ........................ SUCCESS [1:09.690s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17:47.469s
[INFO] Finished at: Sun Mar 23 18:01:41 IST 2014
[INFO] Final Memory: 131M/349M
[INFO] ------------------------------------------------------------------------
Steps to Install
1. Download Hadoop 2.3 for Windows (112.5 MB) from my Box account: https://app.box.com/s/11fwozokqmc1ohttt117
2. Also download the configuration files from my GitHub account: https://github.com/prabaprakash/Hadoop-2.3-Config/archive/master.zip
Now you have both files.
3. Open hadoop-2.3.0.tar.gz with WinRAR and extract it to the local disk, so that the files end up in c:\hadoop-2.3.0.
4. Open config.rar with WinRAR.
Open the bin directory in WinRAR and extract the yarn.cmd file into the c:\hadoop-2.3.0\bin folder.
Open config\etc\hadoop and extract
1. yarn-site.xml
2. mapred-site.xml
3. httpfs-site.xml
4. hdfs-site.xml
5. hadoop-policy.xml
6. core-site.xml
7. capacity-scheduler.xml
to c:\hadoop-2.3.0\etc\hadoop, replacing the existing files.
5. Java 1.6 is mandatory: the Apache developers built the Hadoop framework with Java 1.6, so we need both the Java 1.6 SDK and the Java 1.6 runtime.
a. Download Java SDK 1.6.0_31
http://download.oracle.com/otn/java/jdk/6u31-b05/jdk-6u31-windows-x64.exe
Then install it.
6. Set the environment variables
Open Control Panel\System and Security\System, then Advanced System Settings -> Environment Variables.
Add a new variable HADOOP_HOME with the value c:\hadoop-2.3.0
Also add a new variable JAVA_HOME with your Java installation path as the value.
System Variables -> Path -> Edit

Add the Hadoop bin path and the Java 6 bin path -> click OK
7. Then open hadoop-env.cmd, located at C:\hadoop-2.3.0\etc\hadoop\hadoop-env.cmd, in WordPad.

Set the JAVA_HOME path on line 25. Remember: use the JDK installation folder, not the JDK bin path.
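Steps 6 and 7 are easy to get slightly wrong, for example by pointing JAVA_HOME at the JDK's bin folder. Below is a minimal sketch of a checker, assuming the layout used in this article (HADOOP_HOME contains etc\hadoop\hadoop-env.cmd, and JAVA_HOME is a JDK folder whose bin folder holds javac.exe). The class name EnvCheck and the specific checks are illustrative only; they are not part of Hadoop.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: sanity-checks HADOOP_HOME and JAVA_HOME values. */
class EnvCheck {

    /** Returns a list of problems; an empty list means both values look plausible. */
    static List<String> check(String hadoopHome, String javaHome) {
        List<String> problems = new ArrayList<String>();

        if (hadoopHome == null || hadoopHome.trim().length() == 0) {
            problems.add("HADOOP_HOME is not set");
        } else {
            File env = new File(hadoopHome,
                    "etc" + File.separator + "hadoop" + File.separator + "hadoop-env.cmd");
            if (!env.isFile()) {
                problems.add("HADOOP_HOME does not contain etc\\hadoop\\hadoop-env.cmd: " + hadoopHome);
            }
        }

        if (javaHome == null || javaHome.trim().length() == 0) {
            problems.add("JAVA_HOME is not set");
        } else {
            File home = new File(javaHome);
            File bin = new File(home, "bin");
            if ("bin".equalsIgnoreCase(home.getName())) {
                // The mistake warned about in step 7: the bin folder instead of the JDK root.
                problems.add("JAVA_HOME points at a bin folder; use the JDK root: " + javaHome);
            } else if (!new File(bin, "javac.exe").isFile() && !new File(bin, "javac").isFile()) {
                problems.add("JAVA_HOME does not look like a JDK (no bin\\javac): " + javaHome);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> problems = check(System.getenv("HADOOP_HOME"), System.getenv("JAVA_HOME"));
        if (problems.isEmpty()) {
            System.out.println("HADOOP_HOME and JAVA_HOME look OK");
        }
        for (String p : problems) {
            System.out.println("Problem: " + p);
        }
    }
}
```

Run it from a newly opened command prompt, because command windows that were already open keep the environment they started with.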
8. Let's Play with Apache Hadoop 2.3

a. Open cmd as administrator
C:\Windows\system32>cd c:\hadoop-2.3.0\bin
c:\hadoop-2.3.0\bin>hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
c:\hadoop-2.3.0\bin>hadoop namenode -format
This creates a new HDFS on your system and formats it.

c:\hadoop-2.3.0\bin>cd..
c:\hadoop-2.3.0>cd sbin
c:\hadoop-2.3.0\sbin>start-dfs.cmd
c:\hadoop-2.3.0\sbin>start-yarn.cmd
starting yarn daemons
Now check that the Apache NameNode and DataNode, and the YARN NodeManager and YARN ResourceManager, are all running at the same time.
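Besides watching the daemon windows (or listing the running Java processes with the JDK's jps tool), you can check that the daemons accept connections. Below is a minimal sketch, assuming the ports that appear elsewhere in this article: 9000 for the NameNode (from the hdfs://127.0.0.1:9000 paths in Recipe.java), 8032 for the ResourceManager and 8088 for its web UI (both from the job log). Your own core-site.xml and yarn-site.xml may use different ports, and the class name is hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: reports whether the local Hadoop daemons accept TCP connections. */
class DaemonPortCheck {

    /** Returns true if something accepts a TCP connection on host:port within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out or unreachable
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do if close fails
            }
        }
    }

    public static void main(String[] args) {
        // Ports taken from this article; adjust them to match your *-site.xml files.
        String[] names = {"NameNode RPC", "ResourceManager RPC", "ResourceManager web UI"};
        int[] ports = {9000, 8032, 8088};
        for (int i = 0; i < ports.length; i++) {
            boolean up = isListening("127.0.0.1", ports[i], 2000);
            System.out.println(names[i] + " (port " + ports[i] + "): " + (up ? "listening" : "NOT reachable"));
        }
    }
}
```

A port that answers only proves that some process is listening there; the daemon windows remain the place to look for startup errors.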
OK, let's go to MapReduce.
3. Some Map Reduce Jobs

Everywhere I look, programmers begin their first MapReduce program with the simple WordCount example.
I was bored of that, so let's begin with recipes.
a. Download the recipeitems-latest.json.gz file (26 MB)
http://openrecipes.s3.amazonaws.com/recipeitems-latest.json.gz
b. Create a folder named hwork in C:\.
Extract recipeitems-latest.json.gz into the c:\hwork folder. The extracted file is roughly 120 MB (the job log below reports 119,406,636 bytes read).
It contains about 1.5 lakh (roughly 150,000) recipe items, one JSON object per line, for example:
{ "_id" : { "oid" : "5160756b96cc62079cc2db15" }, "name" : "Drop Biscuits and Sausage Gravy", "ingredients" : "Biscuits\n3 cups All-purpose Flour\n2 Tablespoons Baking Powder\n1/2 teaspoon Salt\n1-1/2 stick (3/4 Cup) Cold Butter, Cut Into Pieces\n1-1/4 cup Butermilk\n SAUSAGE GRAVY\n1 pound Breakfast Sausage, Hot Or Mild\n1/3 cup All-purpose Flour\n4 cups Whole Milk\n1/2 teaspoon Seasoned Salt\n2 teaspoons Black Pepper, More To Taste", "url" : "http://thepioneerwoman.com/cooking/2013/03/drop-biscuits-and-sausage-gravy/", "image" : "http://static.thepioneerwoman.com/cooking/files/2013/03/bisgrav.jpg", "ts" : { "date" : 1365276011104 }, "cookTime" : "PT30M", "source" : "thepioneerwoman", "recipeYield" : "12", "datePublished" : "2013-03-11", "prepTime" : "PT10M", "description" : "Late Saturday afternoon, after Marlboro Man had returned home with the soccer-playing girls, and I had returned home with the..." }
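Hadoop's default text input hands the mapper one line at a time, so each recipe has to be a complete JSON object on its own line. Before uploading the file, a quick local check can count the lines (which should match the "Map input records" counter later) and how many of them look like single JSON objects. This is a minimal sketch; the class name is hypothetical and "looks like JSON" only means the trimmed line starts with { and ends with }.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

/** Hypothetical pre-flight check: counts lines and JSON-object-looking lines in a file. */
class JsonLinesCheck {

    /** Holds the two counts produced by scan(). */
    static final class Counts {
        long lines;    // should match the "Map input records" counter of the job
        long jsonLike; // lines that start with '{' and end with '}'
    }

    static Counts scan(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        try {
            Counts counts = new Counts();
            String line;
            while ((line = reader.readLine()) != null) {
                counts.lines++;
                String t = line.trim();
                if (t.startsWith("{") && t.endsWith("}")) {
                    counts.jsonLike++;
                }
            }
            return counts;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Counts c = scan(new FileInputStream(args[0])); // e.g. c:\hwork\recipeitems-latest.json
        System.out.println("lines = " + c.lines + ", JSON-object lines = " + c.jsonLike);
    }
}
```

If the two numbers differ, some lines are blank or are not standalone JSON objects, and the mapper will see them too.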
c. Download the Gson library for Java to deserialize the JSON:
https://code.google.com/p/google-gson/downloads/detail?name=google-gson-2.2.4-release.zip&can=2&q=
Extract the zip file, copy all the jar files and paste them into the C:\hadoop-2.3.0\share\hadoop\common\lib folder.
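If a required jar is not where the compiler or JVM looks, the job fails with class-not-found errors. A minimal sketch that reports whether the classes used by Recipe.java can be loaded; run it with the same -classpath value you later pass to javac (plus the current folder). The class name and the list of checked classes are illustrative assumptions.

```java
/** Hypothetical helper: reports whether classes can be loaded from the current classpath. */
class ClasspathCheck {

    /** True if the named class is on the classpath; the class is not initialized. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            return false; // present but unloadable, e.g. one of its dependencies is missing
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "com.google.gson.Gson",                 // gson-2.2.4.jar
            "org.apache.hadoop.conf.Configuration", // hadoop-common-2.3.0.jar
            "org.apache.hadoop.mapreduce.Job"       // hadoop-mapreduce-client-core-2.3.0.jar
        };
        for (String name : required) {
            System.out.println(name + ": " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```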
There are approximately 1.5 lakh recipe items in the JSON file, and my intention is to count the number of items per cookTime value. Part of the result looks like this:
PT0H20M 25
PT0H25M 24
PT0H2M 3
PT0H30M 74
PT0H34M 1
PT0H35M 31
PT0H3M 1
PT0H40M 67
PT0H45M 74
PT0H50M 52
PT0H55M 10
PT0H5M 118
PT0H6M 1
PT0H7M 1
PT0H8M 6
PT0M 80
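The cookTime values are ISO-8601 durations, so differently written keys can mean the same time (PT0H20M and PT20M are both 20 minutes), and the job counts them as separate keys. If you want one bucket per actual duration, the mapper could normalize the value first. This is a minimal sketch, assuming only the day/hour/minute/second forms seen in this data set (no years, months or fractional seconds); the class and method names are hypothetical, and the Recipe job in this article does not do this.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: converts ISO-8601 cookTime strings such as PT0H20M to minutes. */
final class CookTimes {

    // Optional days, then an optional T section with hours, minutes and whole seconds.
    private static final Pattern DURATION =
            Pattern.compile("P(?:(\\d+)D)?(?:T(?:(\\d+)H)?(?:(\\d+)M)?(?:(\\d+)S)?)?");

    private static final long[] UNIT_SECONDS = {86400L, 3600L, 60L, 1L}; // D, H, M, S

    private CookTimes() {
    }

    /** Returns the duration in whole minutes, or -1 if the value is missing or unusable. */
    static long toMinutes(String value) {
        if (value == null) {
            return -1;
        }
        Matcher m = DURATION.matcher(value.trim());
        if (!m.matches()) {
            return -1;
        }
        long seconds = 0;
        boolean anyPart = false; // "P" or "PT" alone carries no duration
        try {
            for (int group = 1; group <= 4; group++) {
                String digits = m.group(group);
                if (digits != null) {
                    seconds += Long.parseLong(digits) * UNIT_SECONDS[group - 1];
                    anyPart = true;
                }
            }
        } catch (NumberFormatException e) {
            return -1; // more digits than a long can hold
        }
        return anyPart ? seconds / 60 : -1;
    }

    /** Bucket key such as "20 min"; missing or unusable values become "none". */
    static String bucket(String cookTime) {
        long minutes = toMinutes(cookTime);
        return minutes < 0 ? "none" : minutes + " min";
    }
}
```

In TokenizerMapper this would mean writing CookTimes.bucket(roo.cookTime) instead of the raw cookTime. Note that the bare value PT, which the original job counts as its own key, would then fall into the none bucket.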
d. Map Reduce Code
Recipe.java
// Recipe.java: counts recipes per cookTime value in a file with one JSON recipe per line.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import com.google.gson.Gson;
import com.google.gson.JsonSyntaxException;

public class Recipe {

    // Mapper: receives one line (one JSON recipe) per call and emits (cookTime, 1).
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();
        private final Gson gson = new Gson();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            Roo roo;
            try {
                roo = gson.fromJson(value.toString(), Roo.class);
            } catch (JsonSyntaxException e) {
                return; // skip lines that are not valid JSON instead of failing the task
            }
            if (roo == null) {
                return; // blank line: Gson returns null, nothing to count
            }
            // Recipes without a cookTime are grouped under "none".
            word.set(roo.cookTime != null ? roo.cookTime : "none");
            context.write(word, one);
        }
    }

    // Reducer (also used as the combiner): sums the counts for each cookTime.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Removes generic Hadoop options (-D, -fs, ...) and leaves <in> <out>.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: recipe <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "Recipe");
        job.setJarByClass(Recipe.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // pre-sums counts on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Fully qualified alternatives:
        // FileInputFormat.addInputPath(job, new Path("hdfs://127.0.0.1:9000/in"));
        // FileOutputFormat.setOutputPath(job, new Path("hdfs://127.0.0.1:9000/out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

// Gson target classes mirroring the fields of one recipe record.
class Id {
    public String oid;
}

class Ts {
    public long date;
}

class Roo {
    public Id _id;
    public String name;
    public String ingredients;
    public String url;
    public String image;
    public Ts ts;
    public String cookTime;
    public String source;
    public String recipeYield;
    public String datePublished;
    public String prepTime;
    public String description;
}
By default, Hadoop reads the input file line by line and sends each line to the TokenizerMapper class.
In TokenizerMapper, we deserialize the JSON string into a Roo object. From that object we get the cookTime and write it, with a count of 1, to the mapper context.
In IntSumReducer, we add up the counts for each cookTime and write the total to the reducer context.
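What the framework does between the mapper and the reducer, including the combiner that Recipe.java registers with job.setCombinerClass, can be imitated in plain Java. This is a simplified single-process sketch, not Hadoop's implementation, and the class name is hypothetical: each map task's (cookTime, 1) pairs are combined locally, then all combined pairs are grouped by key in sorted order and summed as IntSumReducer does. Hadoop sorts Text keys by their bytes, which for these ASCII keys gives the same order as Java string comparison, so PT0H55M comes before PT0H5M exactly as in part-r-00000. In the job log below, the combiner shrinks 146,949 map output records to 293.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-process imitation of the Recipe job's data flow. */
class LocalCookTimeCount {

    /** Combiner step for one map task: sums the (key, 1) pairs that task emitted. */
    static Map<String, Integer> combine(List<String> mapOutputKeys) {
        Map<String, Integer> partial = new TreeMap<String, Integer>();
        for (String key : mapOutputKeys) {
            Integer old = partial.get(key);
            partial.put(key, old == null ? 1 : old + 1);
        }
        return partial;
    }

    /** Shuffle/sort plus reduce: groups combined pairs by key in sorted order and sums them. */
    static TreeMap<String, Integer> reduce(List<Map<String, Integer>> combinedTasks) {
        TreeMap<String, Integer> result = new TreeMap<String, Integer>();
        for (Map<String, Integer> task : combinedTasks) {
            for (Map.Entry<String, Integer> e : task.entrySet()) {
                Integer old = result.get(e.getKey());
                result.put(e.getKey(), old == null ? e.getValue() : old + e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // cookTime values as two map tasks might see them (null = field missing in the JSON).
        String[][] splits = {
            {"PT0H5M", "PT0H55M", null, "PT0H5M"},
            {"PT", "PT0H55M", "PT0H5M"}
        };
        List<Map<String, Integer>> combined = new ArrayList<Map<String, Integer>>();
        for (String[] split : splits) {
            List<String> keys = new ArrayList<String>();
            for (String cookTime : split) {
                keys.add(cookTime != null ? cookTime : "none"); // same rule as TokenizerMapper
            }
            combined.add(combine(keys));
        }
        for (Map.Entry<String, Integer> e : reduce(combined).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running main prints PT 1, PT0H55M 2, PT0H5M 3 and none 1, in that order.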
e. Compile the program.
Create Recipe.java in the c:\hwork folder, then run the following command:
c:\Hwork>javac -classpath C:\hadoop-2.3.0\share\hadoop\common\hadoop-common-2.3.0.jar;C:\hadoop-2.3.0\share\hadoop\mapreduce\hadoop-mapreduce-client-core-2.3.0.jar;C:\hadoop-2.3.0\share\hadoop\common\lib\gson-2.2.4.jar;C:\hadoop-2.3.0\share\hadoop\common\lib\commons-cli-1.2.jar Recipe.java
Our MapReduce program is now compiled. Hadoop runs jobs from a jar file, so next we package the classes into a jar.
To make the jar, run the following command:
C:\Hwork>jar -cvf Recipe.jar *.class

added manifest
adding: Id.class(in = 217) (out= 179)(deflated 17%)
adding: Recipe$IntSumReducer.class(in = 1726) (out= 736)(deflated 57%)
adding: Recipe$TokenizerMapper.class(in = 1887) (out= 820)(deflated 56%)
adding: Recipe.class(in = 1861) (out= 1006)(deflated 45%)
adding: Roo.class(in = 435) (out= 293)(deflated 32%)
adding: Ts.class(in = 201) (out= 168)(deflated 16%)
We are ready to run the MapReduce program, but first we need to copy c:\hwork\recipeitems-latest.json to the Hadoop distributed file system, using the steps below:
c:\hadoop-2.3.0\sbin>hadoop fs -mkdir /in
c:\hadoop-2.3.0\sbin>hadoop fs -copyFromLocal c:\Hwork\recipeitems-latest.json /in

Now the file is copied from the local disk to the Hadoop Distributed File System.
Run the MapReduce job:
c:\hadoop-2.3.0\sbin>hadoop jar c:\Hwork\Recipe.jar Recipe /in /out
14/04/12 00:52:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/04/12 00:52:03 INFO input.FileInputFormat: Total input paths to process : 1
14/04/12 00:52:03 INFO mapreduce.JobSubmitter: number of splits:1
14/04/12 00:52:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1397243723769_0001
14/04/12 00:52:04 INFO impl.YarnClientImpl: Submitted application application_1397243723769_0001
14/04/12 00:52:04 INFO mapreduce.Job: The url to track the job: http://OmSkathi:8088/proxy/application_1397243723769_0001/
14/04/12 00:52:04 INFO mapreduce.Job: Running job: job_1397243723769_0001
14/04/12 00:52:16 INFO mapreduce.Job: Job job_1397243723769_0001 running in uber mode : false
14/04/12 00:52:16 INFO mapreduce.Job: map 0% reduce 0%
14/04/12 00:52:34 INFO mapreduce.Job: Job job_1397243723769_0001 completed successfully
14/04/12 00:52:34 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=3872
FILE: Number of bytes written=180889
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=119406749
HDFS: Number of bytes written=2871
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=7383
Total time spent by all reduces in occupied slots (ms)=5121
Total time spent by all map tasks (ms)=7383
Total time spent by all reduce tasks (ms)=5121
Total vcore-seconds taken by all map tasks=7383
Total vcore-seconds taken by all reduce tasks=5121

Total megabyte-seconds taken by all map tasks=7560192
Total megabyte-seconds taken by all reduce tasks=5243904
Map-Reduce Framework
Map input records=146949
Map output records=146949
Map output bytes=1387492
Map output materialized bytes=3872
Input split bytes=113
Combine input records=146949
Combine output records=293
Reduce input groups=293
Reduce shuffle bytes=3872
Reduce input records=293
Reduce output records=293
Spilled Records=586
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=70
CPU time spent (ms)=5108
Physical memory (bytes) snapshot=370135040
Virtual memory (bytes) snapshot=428552192
Total committed heap usage (bytes)=270860288
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=119406636
File Output Format Counters
Bytes Written=2871
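The log reports "number of splits:1", and that follows from the input size. Below is a simplified sketch of how FileInputFormat sizes splits for one splittable, non-empty file, assuming the Hadoop 2.x default block size of 128 MB (dfs.blocksize = 134217728; the hdfs-site.xml used here may set something else) and the default minimum and maximum split sizes. It mirrors the 10% "slop" rule in FileInputFormat.getSplits, but it is not Hadoop's code, and the class name is hypothetical.

```java
/** Simplified split-count arithmetic for a single splittable file (hypothetical helper). */
class SplitMath {

    /** FileInputFormat lets the last split be up to 10% larger than the split size. */
    static final double SPLIT_SLOP = 1.1;

    /** splitSize = max(minSize, min(maxSize, blockSize)). */
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Number of input splits (and therefore map tasks) for a file of the given length. */
    static int countSplits(long fileLength, long splitSize) {
        if (fileLength <= 0 || splitSize <= 0) {
            throw new IllegalArgumentException("expects a non-empty file and a positive split size");
        }
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail, at most 1.1 x splitSize
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 134217728L;  // 128 MB, assumed dfs.blocksize
        long size = splitSize(blockSize, 1L, Long.MAX_VALUE);
        long inputBytes = 119406636L; // "Bytes Read" from the job log
        System.out.println("splits = " + countSplits(inputBytes, size)); // prints "splits = 1"
    }
}
```

With 119,406,636 bytes against a 134,217,728-byte split size the ratio is about 0.89, so the whole file becomes one split, which is why only one map task was launched.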
The MapReduce job completed successfully. Now check the output folder /out:
c:\hadoop-2.3.0\sbin>hadoop fs -ls /out
Found 2 items
-rw-r--r--   1 PrabaKarthi supergroup          0 2014-04-12 00:52 /out/_SUCCESS
-rw-r--r--   1 PrabaKarthi supergroup       2871 2014-04-12 00:52 /out/part-r-00000
Open the output file and have a good look; you will enjoy Hadoop analytics by Apache.
c:\hadoop-2.3.0\sbin>hadoop fs -cat /out/part-r-00000
P0D	121
P1D	2
P1DT6H	1
P4DT8H	1
PT	8491
PT0H10M	56
PT0H12M	1
PT0H14M	1
PT0H15M	55
PT0H1M	1
PT0H20M	25
PT0H25M	24
PT0H2M	3
PT0H30M	74
PT0H34M	1
PT0H35M	31
PT0H3M	1
PT0H40M	67
PT0H45M	74
PT0H50M	52
PT0H55M	10
PT0H5M	118
PT0H6M	1
PT0H7M	1
PT0H8M	6
PT0M	80
PT1008H	1
PT100M	1
PT10H	102
PT10H0M	2
PT10H10M	5
PT10H15M	4
PT10H20M	1
PT10H25M	1
PT10H30M	5
PT10H35M	1
PT10H40M	1
PT10H45M	1
PT10M	9982
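In the run above every input line produced one map output record (Map input records and Map output records are both 146,949), so the counts in part-r-00000 should add up to 146,949. A small sketch that checks this on a local copy of the output, for example one fetched with hadoop fs -get /out/part-r-00000, assuming the default TextOutputFormat layout of key, tab, value. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper: sums the value column of a key<TAB>count reducer output file. */
class OutputTotals {

    static long sumCounts(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        try {
            long total = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.length() == 0) {
                    continue;
                }
                // Split on the last tab: the count never contains one, so keys may contain spaces.
                int tab = line.lastIndexOf('\t');
                if (tab < 0) {
                    throw new IOException("No tab separator in line: " + line);
                }
                total += Long.parseLong(line.substring(tab + 1).trim());
            }
            return total;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        long total = sumCounts(new FileReader(args[0])); // e.g. part-r-00000
        System.out.println("total = " + total + " (compare with the Map input records counter)");
    }
}
```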
So, we have finished a MapReduce job. Everyone knows this part, but now I am going to list some tools that make the work easier than before.
4. Redgate HDFS Explorer

I got bored copying local files to the Hadoop file system with commands, and retrieving Hadoop file system data with commands too. I found this software, and it is a lot of fun. First download it (2.5 MB):
http://bigdatainstallers.azurewebsites.net/files/HDFS%20Explorer/beta/1/HDFS%20Explorer%20-%20beta.application
a. Install it. We already copied the configuration files for Hadoop 2.3, so our Hadoop file system is accessible remotely, including from web clients written in Java, C#, Python, etc.
b. Open HDFS Explorer.

File->Add Connection
Browse our Hadoop file system in a graphical file explorer. Copy the input file from the local disk and paste it into HDFS, or copy the output from HDFS and paste it onto your local disk; you can do every operation a traditional file explorer does. Enjoy HDFS Explorer!
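Remote clients in Java, C# or Python can reach HDFS over HTTP through the WebHDFS REST API. Below is a minimal Java sketch that lists a directory with op=LISTSTATUS using only HttpURLConnection. It assumes WebHDFS is enabled (dfs.webhdfs.enabled in hdfs-site.xml) and that the NameNode web UI listens on the Hadoop 2.x default port 50070; this article does not state either setting, and the class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical WebHDFS client sketch: lists an HDFS directory over HTTP. */
class WebHdfsList {

    /** Builds a URL such as http://localhost:50070/webhdfs/v1/out?op=LISTSTATUS (path not URL-encoded). */
    static String buildUrl(String host, int port, String hdfsPath, String op) {
        if (!hdfsPath.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must be absolute: " + hdfsPath);
        }
        return "http://" + host + ":" + port + "/webhdfs/v1" + hdfsPath + "?op=" + op;
    }

    /** Performs a GET and returns the JSON response body as a string. */
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
            throw new IOException("HTTP " + conn.getResponseCode() + " from " + url);
        }
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
        try {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            in.close();
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Prints the FileStatuses JSON document describing /out (assumed host and port).
        System.out.println(get(buildUrl("localhost", 50070, "/out", "LISTSTATUS")));
    }
}
```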
HDFS Explorer is good, but I was bored writing MapReduce code in Notepad++ without proper IntelliSense and indentation. I found an Eclipse plugin for Hadoop MapReduce. Let's go to the next topic.
5. Eclipse Plugin for Hadoop MapReduce Jobs with Simple HDFS Explorer and Auto Code Completion Configuration like Visual Studio
Eclipse IDE - http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/kepler/SR2/eclipse-jee-kepler-SR2-win32-x86_64.zip
Hadoop MapReduce Plugin for Eclipse - https://github.com/winghc/hadoop2x-eclipse-plugin/archive/master.zip

Let's begin. Download the Eclipse Kepler IDE above (250 MB), and also download the Hadoop MapReduce plugin for Eclipse (23 MB).
a. Extract the Eclipse IDE.
For example, extract the Eclipse IDE to D:\eclipse
b. Open hadoop2x-eclipse-plugin-master.zip
Go to the "release" directory and extract the "hadoop-eclipse-kepler-plugin-2.2.0.jar" file into the eclipse\plugins folder.
Let's rock and roll.

c. Open the Eclipse IDE (Run As Administrator)
Choose your own workspace location -> click OK
Menu -> Window -> Open Perspective -> Other -> Map/Reduce
d. I love Visual Studio, so I want IntelliSense-style code completion and code formatting in Eclipse too (more or less). Here is some configuration that makes the work easier:
Menu -> Window -> Preferences -> Java -> Editor -> Content Assist -> "Auto Activation"
Check "Enable auto activation"
Auto activation delay (ms)
Auto activation triggers for Java: .(abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Auto activation triggers for Javadoc: @#
Apply -> OK
e. Configure the HDFS and Map/Reduce connection

Map/Reduce Locations -> New Hadoop location
Location name: "some name"; leave the other fields as shown in the image below. Don't modify them, because we already configured the MapReduce address and the DFS address in the c:\hadoop-2.3.0\etc\hadoop folder.
Simple HDFS Explorer
f. File -> New Project -> Map/Reduce Project -> Next

It shows an error because the Hadoop installation folder is not configured yet.
Now browse to the installation directory and click Apply.
Click Next.
Important: this is where people usually make a mistake, because Hadoop 2.3 needs JDK 6 for both runtime and compilation.
Click Add Library -> JRE System Library
Click Installed JREs
Add -> Standard VM
Browse to the JDK 1.6 location and click Finish.

OK -> OK -> Finish -> Finish
Now the Hadoop 2.3 libraries are added. We get the JDK 1.7 error again, because we need JDK 1.6.
Change to JDK 1.6 -> click OK.
Finally, the Hadoop 2.3 libraries are in place along with JDK 1.6.
Your Eclipse is now configured for Hadoop MapReduce coding and execution, with IntelliSense. Let's code:
1. Add a new Recipe.java in the src folder, then copy and paste the code above.
2. Right-click Recipe.java -> Run As -> Run on Hadoop
3. The MapReduce job runs.
4. The job completes.
Examples
1. Hadoop : WordCount with Custom Record Reader of TextInputFormat
6. Datasets
1. Large Public Datasets
2. Free Large datasets to experiment with Hadoop
3. Explain patent data set in Hadoop example
4. 60,000+ Documented UFO Sightings With Text Descriptions And Metadata
5. Recipe-Items List
Reference Books
1. Hadoop MapReduce Cookbook - Srinath Perera
2. Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and
Graph
3. Hadoop: The Definitive Guide MapReduce for the Cloud
Reference Links
1. Searchcloudcomputing.techtarget.com
2. Hadoop: What it is, how it works, and what it can do
3. IBM: What is Hadoop?
4. Hadoop at Yahoo
Conclusion
I am sure this article will be helpful for beginner and intermediate programmers who want to bootstrap Apache Hadoop (a big data analytics framework) in a Windows environment.
Yours Friendly
Prabakaran.A
License
This article, along with any associated source code and files, is
licensed under The Code Project Open License (CPOL)
About the Author

Praba Prakash
Student
India
Microsoft Student Partner (2014-2015) ,

MS Software Engineering (2011-2016),
VIT Chennai Campus - India
I am very Curious to learn Technologies.....
Curiosity is the key to Creativity
- Akio Morita
"Luck is a dividend of sweat. The more you sweat, the luckier you get"
- Ray Kroc
Article Copyright 2014 by Praba Prakash
