
Copying a File from Local to HDFS using the Java API

In this session, we will learn how to copy a file from the local filesystem to HDFS. Before diving into the Java code that solves this problem, let's first get comfortable with the Java classes used for reading and writing files in HDFS.

The Java file for all three examples covered in this session can be downloaded from the following link.

The important Java classes in the HDFS API are:

FileSystem: An object of the FileSystem class treats the entire HDFS as a single disk, even though the files are stored across multiple nodes. To create a FileSystem object, call the static get() method defined in the FileSystem class; get() takes a Configuration object as its argument. In the following code, "conf" is an object of the Configuration class.

FileSystem fs = FileSystem.get(conf);

Configuration: An object of the Configuration class is used to set various configuration parameters of the Hadoop cluster, such as the location of the namenode, the block size, the replication factor, etc.

Path: An object of the Path class points to a file in HDFS. In the following line of code, which creates a Path object, "file" is a String holding the full path of the file in HDFS.

Path path = new Path(file);

FSDataInputStream: This class is similar to FileInputStream in the Java I/O API. An object of this class is used for reading a file present in HDFS. It is created by calling the open() method defined in the FileSystem class, which returns an object of FSDataInputStream.

FSDataInputStream in = fs.open(path);

FSDataOutputStream: This class is similar to FileOutputStream in the Java I/O API. An object of this class is used for writing contents into a file in HDFS. It is created by calling the create() method defined in the FileSystem class, which returns an object of FSDataOutputStream.

FSDataOutputStream out = fs.create(path);

The following are the steps involved in copying a file from the local filesystem to HDFS using the Java API:

First, the choice, source path, and destination path are read from the user using the BufferedReader class. Since the program uses a switch statement, the user should enter 1 to copy a file from local to HDFS. Note that in this example the source is the path of a file in the local filesystem, whereas the destination is the path of a directory in HDFS.

BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

choice = Integer.parseInt(br.readLine());

source = br.readLine();

dest = br.readLine();
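The same readLine()/parseInt() sequence can be tried out against an in-memory stream instead of System.in, which makes it easy to experiment without typing. The InputDemo class and readArgs() method below are hypothetical names, not part of the original program; the parsing logic is the same:

```java
import java.io.*;

public class InputDemo {
    // Parse "choice\nsource\ndest\n" the same way the program's main() does.
    // Taking an InputStream parameter lets System.in be swapped for any
    // stream when experimenting.
    static String[] readArgs(InputStream stdin) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(stdin));
        int choice = Integer.parseInt(br.readLine()); // e.g. 1 = local to HDFS
        String source = br.readLine();                // local file path
        String dest = br.readLine();                  // HDFS directory path
        return new String[] { Integer.toString(choice), source, dest };
    }

    public static void main(String[] args) throws IOException {
        String simulated = "1\n/home/cloudera/data.txt\n/user/cloudera\n";
        String[] parsed = readArgs(new ByteArrayInputStream(simulated.getBytes("UTF-8")));
        System.out.println(parsed[0] + " " + parsed[1] + " " + parsed[2]);
    }
}
```

In the real program the stream passed in would simply be System.in.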

After taking the input, we create a Configuration object. The Configuration object holds basic parameters such as the location of the namenode, the chunk or block size, the replication factor, etc. In this example, the Configuration object is used to set the HDFS default filesystem name. (Note that the "fs.default.name" property is deprecated in newer Hadoop releases in favor of "fs.defaultFS", though the old key still works.)

String hdfsPath = "hdfs://quickstart.cloudera:8020", source="", dest="";

Configuration conf;

conf = new Configuration();

conf.set("fs.default.name", hdfsPath);

Having done all this, the Configuration object is ready and is passed, along with the source and destination paths, to the addFile() method.

FileSystemOperationsDemo.addFile(source, dest, conf);

In the addFile() method, a FileSystem object is created.

FileSystem fileSystem = FileSystem.get(conf);


From the source path, only the filename is extracted and appended to the end of the destination path.

String filename = source.substring(source.lastIndexOf('/') + 1,source.length());

if (dest.charAt(dest.length() - 1) != '/')

dest = dest + "/" + filename;

else

dest = dest + filename;
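This base-name extraction and joining can be captured in a small, self-contained helper that runs without any Hadoop dependencies. HdfsPathUtil and resolveDest below are hypothetical names introduced here for illustration; the string logic matches the snippet above:

```java
public class HdfsPathUtil {
    // Mirrors the destination-path logic of addFile(): take the base name of
    // the source and append it to the destination directory, inserting a "/"
    // only when the directory does not already end with one.
    static String resolveDest(String source, String dest) {
        String filename = source.substring(source.lastIndexOf('/') + 1);
        return dest.endsWith("/") ? dest + filename : dest + "/" + filename;
    }

    public static void main(String[] args) {
        // Both calls print /user/cloudera/data.txt
        System.out.println(resolveDest("/home/cloudera/data.txt", "/user/cloudera"));
        System.out.println(resolveDest("/home/cloudera/data.txt", "/user/cloudera/"));
    }
}
```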

In this step, a Path object is created by passing the HDFS destination file name. If the path already exists, the code prints a message and returns at this point.

Path path = new Path(dest);

if (fileSystem.exists(path)) {
    System.out.println("File " + dest + " already exists");
    return;
}

If the path doesn't exist in the filesystem, an object of the class FSDataOutputStream is created for writing to a file located in HDFS. This is done by calling the create() method on the FileSystem object.

FSDataOutputStream out = fileSystem.create(path);

Similarly, in order to read a file from the local file system, an object of BufferedInputStream is created by
passing a File object pointing to the source file.

InputStream in = new BufferedInputStream(new FileInputStream(new File(source)));

Now, in a loop, the data is read by calling the read() method. The data read from the file in the local filesystem is stored temporarily in a byte-array buffer of 1024 bytes. The contents of this buffer are then written to the file in HDFS using the FSDataOutputStream object. The loop terminates when the end of the source file is reached.

byte[] b = new byte[1024];

int numBytes = 0;

while ((numBytes = in.read(b)) > 0) {
    out.write(b, 0, numBytes);
}
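Because this loop relies only on the java.io stream contract (read() returns the number of bytes read, or -1 at end of stream), it can be exercised without a Hadoop cluster by swapping in in-memory streams. CopyLoopDemo below is an illustrative sketch with hypothetical class and method names:

```java
import java.io.*;

public class CopyLoopDemo {
    // Same loop shape as the walkthrough: read up to 1024 bytes at a time
    // and write exactly the number of bytes that read() returned.
    static byte[] copy(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] b = new byte[1024];
        int numBytes;
        while ((numBytes = in.read(b)) > 0) {
            out.write(b, 0, numBytes);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] copied = copy(new ByteArrayInputStream("hello HDFS".getBytes("UTF-8")));
        System.out.println(new String(copied, "UTF-8")); // prints "hello HDFS"
    }
}
```

In the real program, `in` is the BufferedInputStream over the local file and the writes go to the FSDataOutputStream instead of a byte array.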

Finally, the input and output streams, along with the FileSystem object, are closed.

in.close();

out.close();

fileSystem.close();
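One caveat with closing the streams manually: if an exception is thrown mid-copy, the close() calls are skipped and the streams leak. A sketch of the safer try-with-resources pattern is shown below with plain java.io streams so it can run anywhere; SafeCopyDemo and copyToString are hypothetical names, but since the Hadoop stream classes also implement Closeable, the same pattern applies to them:

```java
import java.io.*;

public class SafeCopyDemo {
    // Copy with try-with-resources: both streams are closed automatically
    // when the block exits, even if an exception is thrown mid-copy.
    static String copyToString(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(data);
             OutputStream out = new BufferedOutputStream(sink)) {
            byte[] b = new byte[1024];
            int numBytes;
            while ((numBytes = in.read(b)) > 0) {
                out.write(b, 0, numBytes);
            }
        } // out is flushed and closed here, before sink is read
        return sink.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(copyToString("closed automatically".getBytes("UTF-8")));
    }
}
```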

So, now that we have a good understanding of the Java code, let's run it and check that it works. The complete program is given below.

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class FileSystemOperationsDemo {

    // Copies a file from the local filesystem to an HDFS directory.
    public static void addFile(String source, String dest, Configuration conf) throws IOException {
        FileSystem fileSystem = FileSystem.get(conf);

        // Extract the file name from the source path and append it
        // to the destination directory.
        String filename = source.substring(source.lastIndexOf('/') + 1, source.length());
        if (dest.charAt(dest.length() - 1) != '/') {
            dest = dest + "/" + filename;
        } else {
            dest = dest + filename;
        }

        Path path = new Path(dest);
        if (fileSystem.exists(path)) {
            System.out.println("File " + dest + " already exists");
            return;
        }

        FSDataOutputStream out = fileSystem.create(path);
        InputStream in = new BufferedInputStream(new FileInputStream(new File(source)));

        byte[] b = new byte[1024];
        int numBytes = 0;
        while ((numBytes = in.read(b)) > 0) {
            out.write(b, 0, numBytes);
        }

        in.close();
        out.close();
        fileSystem.close();
    }

    // Copies a file from HDFS to the current local working directory.
    public static void readFile(String file, Configuration conf) throws IOException {
        FileSystem fileSystem = FileSystem.get(conf);

        Path path = new Path(file);
        if (!fileSystem.exists(path)) {
            System.out.println("File " + file + " does not exist");
            return;
        }

        FSDataInputStream in = fileSystem.open(path);
        String filename = file.substring(file.lastIndexOf('/') + 1, file.length());
        OutputStream out = new BufferedOutputStream(new FileOutputStream(new File(filename)));

        byte[] b = new byte[1024];
        int numBytes = 0;
        while ((numBytes = in.read(b)) > 0) {
            out.write(b, 0, numBytes);
        }

        in.close();
        out.close();
        fileSystem.close();
    }

    // Deletes a file (or a directory, recursively) from HDFS.
    public static void deleteFile(String file, Configuration conf) throws IOException {
        FileSystem fileSystem = FileSystem.get(conf);

        Path path = new Path(file);
        if (!fileSystem.exists(path)) {
            System.out.println("File " + file + " does not exist");
            return;
        }

        fileSystem.delete(path, true);
        fileSystem.close();
    }
}

public class FileSystemOperationsTest {

    public static void main(String[] a) throws Exception {
        String hdfsPath = "hdfs://quickstart.cloudera:8020", source = "", dest = "";
        Configuration conf;
        int choice;
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        while (true) {
            System.out.println("Enter 1 for Local to HDFS");
            System.out.println("Enter 2 for HDFS to local");
            System.out.println("Enter 3 for deletion from HDFS");
            System.out.println("Enter 4 for exit...");
            choice = Integer.parseInt(br.readLine());

            switch (choice) {
                case 1:
                    System.out.println("Enter local source and HDFS destination paths...");
                    source = br.readLine();
                    dest = br.readLine();
                    conf = new Configuration();
                    conf.set("fs.default.name", hdfsPath);
                    FileSystemOperationsDemo.addFile(source, dest, conf);
                    break;
                case 2:
                    System.out.println("Enter HDFS source...");
                    source = br.readLine();
                    conf = new Configuration();
                    conf.set("fs.default.name", hdfsPath);
                    FileSystemOperationsDemo.readFile(source, conf);
                    break;
                case 3:
                    System.out.println("Enter HDFS source to be deleted...");
                    source = br.readLine();
                    conf = new Configuration();
                    conf.set("fs.default.name", hdfsPath);
                    FileSystemOperationsDemo.deleteFile(source, conf);
                    break;
                default:
                    System.out.println("Exiting...");
                    return;
            }
        }
    }
}