
Reading and Writing Text Files in Java


John Lamertina
(Deitel, Java 5.0, Chapters 14, 19, 29)
April 2007
Content
• Reading and Writing Data Files (Chp 14)
• String Tokenizer to Parse Data (Chp 29)
• Comma Separated Value (CSV) Files – an exercise which applies:
  - Multi-dimensional arrays (Chp 7)
  - Exception Handling (Chp 13)
  - Files (Chp 14)
  - ArrayList Collection (Chp 19)
  - Tokenizer (Chp 29)
Data Hierarchy
• Field – a group of characters or bytes that conveys meaning
• Record – a group of related fields
• File – a group of related records
• Record key – identifies a record as belonging to a particular person or entity; used for easy retrieval of specific records
• Sequential file – a file in which records are stored in order by the record-key field

Reading & Writing Files 3


Java Streams and Files
• Each file is a sequential stream of bytes
• The operating system provides a mechanism to determine the end of a file:
  - an end-of-file marker, or
  - a count of the total bytes in the file
• A Java program processing a stream of bytes receives an indication from the operating system when it reaches the end of the stream
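In Java code, the end-of-stream indication appears as a return value: InputStream.read() returns -1 once no more bytes are available. The minimal sketch below counts bytes up to that point. It reads from an in-memory ByteArrayInputStream instead of a real file so it can run anywhere; the class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class EndOfStreamDemo {

    // Counts bytes until read() signals the end of the stream by returning -1
    static int countBytes(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abc\n".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(data);
        System.out.println("Bytes read: " + countBytes(in)); // prints "Bytes read: 4"
    }
}
```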
File - Object - Stream
• Java opens a file by creating an object and associating a stream with it
• Standard streams – each can be redirected:
  - System.in – standard input stream object; redirect with method System.setIn
  - System.out – standard output stream object; redirect with method System.setOut
  - System.err – standard error stream object; redirect with method System.setErr
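Redirecting a standard stream is a single call to System.setOut (or System.setIn / System.setErr). As a minimal sketch, the hypothetical helper captureOutput below temporarily points System.out at an in-memory buffer, runs some code, and then restores the original stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class RedirectDemo {

    // Runs the task with System.out redirected to a buffer and returns what it printed
    static String captureOutput(Runnable task) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true));
        try {
            task.run();
        } finally {
            System.setOut(original); // always restore the real standard output
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String captured = captureOutput(() -> System.out.print("hello"));
        System.out.println("Captured: " + captured); // prints "Captured: hello"
    }
}
```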



Classes related to Files
• java.io classes
  - FileInputStream and FileOutputStream – byte-based I/O
  - FileReader and FileWriter – character-based I/O
  - ObjectInputStream and ObjectOutputStream – used for input and output of objects or variables of primitive data types
  - File – useful for obtaining information about files and directories
• Classes Scanner and Formatter (java.util)
  - Scanner – can be used to easily read data from a file
  - Formatter – can be used to easily write data to a file



File Class
• Common File methods
  - exists – returns true if the file or directory exists at the specified path
  - isFile – returns true if the File object represents a file, not a directory
  - isDirectory – returns true if the File object represents a directory
  - getPath – returns the file path as a String
  - list – returns the names of the entries in a directory (as a String array)
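The methods listed above can be combined into a small utility. This is a sketch only: the class name FileInfo and the describe method are hypothetical, and the program inspects the path given on the command line (defaulting to the current directory). Note that list returns null if the directory cannot be read, so the result is checked before use.

```java
import java.io.File;

public class FileInfo {

    // Builds a short description of a path using the common File methods
    static String describe(String path) {
        File file = new File(path);
        if (!file.exists()) {
            return path + " does not exist";
        }
        StringBuilder info = new StringBuilder();
        info.append("Path: ").append(file.getPath()).append('\n');
        info.append("Is file: ").append(file.isFile()).append('\n');
        info.append("Is directory: ").append(file.isDirectory()).append('\n');
        if (file.isDirectory()) {
            String[] names = file.list(); // names of the entries in the directory
            if (names != null) {
                for (String name : names) {
                    info.append("  ").append(name).append('\n');
                }
            }
        }
        return info.toString();
    }

    public static void main(String[] args) {
        // Default to the current directory if no path is given on the command line
        String path = args.length > 0 ? args[0] : ".";
        System.out.print(describe(path));
    }
}
```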



Write with Formatter Class
• The Formatter class can be used to open a text file for writing
  - Pass the name of the file to the constructor
  - If the file does not exist, it is created
  - If the file already exists, its contents are truncated (discarded)
  - Use method format to write formatted text to the file
  - Use method close to close the Formatter object (if close is not called, the OS normally closes the file when the program exits, but output still buffered by the Formatter may never reach the file)
• Example: see Figure 14.7 (p 686-7)
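A minimal sketch of the Formatter steps above, assuming a hypothetical output file named clients.txt and illustrative record data; it is not the textbook's Figure 14.7. It opens the file, writes one comma-separated line per record with format, and always calls close so buffered output reaches the disk. String.join requires Java 8 or later.

```java
import java.io.FileNotFoundException;
import java.util.Formatter;

public class FormatterWriteDemo {

    // Writes each row as one comma-separated line; returns false if the file could not be opened
    static boolean writeRecords(String fileName, String[][] rows) {
        Formatter output;
        try {
            output = new Formatter(fileName); // creates the file, or truncates an existing one
        } catch (SecurityException e) {
            System.err.println("No write permission: " + fileName);
            return false;
        } catch (FileNotFoundException e) {
            System.err.println("Cannot open or create: " + fileName);
            return false;
        }
        try {
            for (String[] row : rows) {
                output.format("%s%n", String.join(",", row));
            }
        } finally {
            output.close(); // flushes buffered text and releases the file
        }
        return true;
    }

    public static void main(String[] args) {
        String[][] rows = { {"987", "Thomas", "Jefferson"}, {"413", "Martha", "Washington"} };
        boolean ok = writeRecords("clients.txt", rows);
        System.out.println(ok ? "Wrote clients.txt" : "Write failed");
    }
}
```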



Possible Exceptions
• SecurityException – occurs when opening a file with a Formatter object if the user does not have permission to write data to the file
• FileNotFoundException – occurs when opening a file with a Formatter object if the file cannot be found and a new file cannot be created
• NoSuchElementException – occurs when a Scanner object has no more input to read; its subclass InputMismatchException occurs when the input read does not match the expected type (e.g. letters read by nextInt)
• FormatterClosedException – occurs when an attempt is made to write to a file using a Formatter object that has already been closed
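The last two exceptions are easy to trigger without touching the file system. The sketch below (class and method names are illustrative) writes to a Formatter after closing it, and reads integers from a Scanner over a string. Because InputMismatchException is a subclass of NoSuchElementException, it must be caught first.

```java
import java.util.Formatter;
import java.util.FormatterClosedException;
import java.util.InputMismatchException;
import java.util.NoSuchElementException;
import java.util.Scanner;

public class ExceptionDemo {

    // Returns the simple name of the exception thrown by writing to a closed Formatter
    static String writeAfterClose() {
        Formatter formatter = new Formatter(new StringBuilder());
        formatter.close();
        try {
            formatter.format("%d", 42);
            return "no exception";
        } catch (FormatterClosedException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Reads one int from the text; bad or missing input both raise a NoSuchElementException type
    static String readInt(String text) {
        Scanner scanner = new Scanner(text);
        try {
            return "read " + scanner.nextInt();
        } catch (InputMismatchException e) {
            return "invalid input";   // a token is present but it is not an int
        } catch (NoSuchElementException e) {
            return "no more input";   // nothing left to read
        } finally {
            scanner.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAfterClose()); // FormatterClosedException
        System.out.println(readInt("42"));     // read 42
        System.out.println(readInt("abc"));    // invalid input
        System.out.println(readInt(""));       // no more input
    }
}
```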



Read with Scanner Class
• A Scanner object can be used to read data sequentially from a text file
  - Pass a File object representing the file to be read to the Scanner constructor
  - FileNotFoundException occurs if the file cannot be found
  - Data is read from the file using the same methods as for keyboard input – nextInt, nextDouble, next, etc.
  - IllegalStateException occurs if an attempt is made to read from a closed Scanner object
• Example: see Figure 14.11 (p 690-1)
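A minimal sketch of reading a text file line by line with Scanner, assuming a hypothetical default file name clients.txt (a different file can be passed on the command line). It reports a missing file instead of crashing and closes the Scanner when done; it is not the textbook's Figure 14.11.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerReadDemo {

    // Reads every line of a text file; returns an empty list if the file cannot be found
    static List<String> readLines(String fileName) {
        List<String> lines = new ArrayList<String>();
        Scanner input;
        try {
            input = new Scanner(new File(fileName));
        } catch (FileNotFoundException e) {
            System.err.println("File not found: " + fileName);
            return lines;
        }
        try {
            while (input.hasNextLine()) {
                lines.add(input.nextLine());
            }
        } finally {
            input.close(); // reading after this point would throw IllegalStateException
        }
        return lines;
    }

    public static void main(String[] args) {
        String fileName = args.length > 0 ? args[0] : "clients.txt";
        for (String line : readLines(fileName)) {
            System.out.println(line);
        }
    }
}
```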



Tokens: Fields of a Record
• Tokenization breaks a statement, sentence, or line of data into individual pieces
• Tokens are the individual pieces:
  - Words from a sentence
  - Keywords, identifiers, operators from a Java statement
  - Individual data items or fields of a record (that were separated by white space, tab, new line, comma, or other delimiter)

String Tokenizer 11
String Classes
• Class java.lang.String
• Class java.lang.StringBuffer
• Class java.util.StringTokenizer

StringTokenizer
• Breaks a string into component tokens
• Default delimiters: " \t\n\r\f"
  - space, tab, new line, carriage return, or form feed
• Specify other delimiter(s) at construction or in method nextToken:
  String delimiter = " ,\n";
  StringTokenizer tokens = new StringTokenizer(sentence, delimiter);
  -or-
  String newDelimiterString = "|,";
  tokens.nextToken(newDelimiterString);
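The delimiter set passed to nextToken(String) replaces the old one for that call and for later calls as well. The sketch below, using made-up input data and an illustrative class name, reads the first token using only "|" and then switches to "|,".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DelimiterSwitchDemo {

    // Reads the first token with "|" as the delimiter, then switches to "|," for the rest
    static List<String> split(String line) {
        List<String> result = new ArrayList<String>();
        StringTokenizer tokens = new StringTokenizer(line, "|");
        result.add(tokens.nextToken());      // first token uses "|" only
        result.add(tokens.nextToken("|,"));  // the delimiter set changes here...
        while (tokens.hasMoreTokens()) {
            result.add(tokens.nextToken());  // ...and stays "|," for later calls
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(split("2007|April,May,June")); // [2007, April, May, June]
    }
}
```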

Example 29.18
import java.util.Scanner;
import java.util.StringTokenizer;

public class TokenTest {

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        System.out.println("Enter a sentence to tokenize and press Enter:");
        String sentence = scan.nextLine();

        // default delimiters are " \t\n\r\f"; this version uses space, comma, and new line instead
        String delimiter = " ,\n";
        StringTokenizer tokens = new StringTokenizer(sentence, delimiter);
        System.out.printf("Number of elements: %d%n", tokens.countTokens());

        System.out.println("The tokens are:");
        while (tokens.hasMoreTokens()) {
            System.out.println(tokens.nextToken());
        }
    }
}

(Refer to p 1378)
Comma Separated Value (CSV) Data Files
• Fields are separated by commas
• Used for data exchange between disparate systems
• A pseudo-standard used by Microsoft Excel and other systems

Comma Separated Values 15


CSV File Format Rules
1. Each record is one line
2. Fields are separated by comma delimiters
3. Leading and trailing white space in a field is ignored unless the field is enclosed in double quotes
4. The first record in a CSV file may be a header of field names. A CSV application needs some boolean indication of whether the first record is a header.
5. Empty fields are indicated by consecutive comma delimiters. Thus every record should have the same number of delimiters.
6. Fields with embedded commas must be enclosed in double quotes

For more information:


http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
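As a sketch of what rules 2, 3, 5 and 6 mean in code, the hypothetical parse method below walks a record character by character instead of using a tokenizer. It assumes a double quote only ever surrounds a whole field and does not handle escaped quotes inside a quoted field, so treat it as a starting point rather than a complete CSV parser.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineParser {

    // Splits one CSV record into fields following rules 2, 3, 5 and 6.
    // Assumption: quotes enclose whole fields only; doubled ("") quotes are not handled.
    static List<String> parse(String record) {
        List<String> fields = new ArrayList<String>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false; // between an opening and a closing quote
        boolean quoted = false;   // the current field was enclosed in quotes

        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    inQuotes = false;      // closing quote
                } else {
                    field.append(c);       // keep everything inside quotes, commas included
                }
            } else if (c == '"') {
                inQuotes = true;
                quoted = true;
                field.setLength(0);        // drop white space before the opening quote
            } else if (c == ',') {
                fields.add(finish(field, quoted));
                field.setLength(0);
                quoted = false;
            } else if (!quoted) {
                field.append(c);           // text after a closing quote is ignored
            }
        }
        fields.add(finish(field, quoted)); // the last field has no trailing comma
        return fields;
    }

    // Unquoted fields lose leading and trailing white space (rule 3)
    private static String finish(StringBuilder field, boolean quoted) {
        return quoted ? field.toString() : field.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(parse("123, John ,, Doe"));                    // [123, John, , Doe]
        System.out.println(parse("456 , \"King , the Gorilla\" , Kong")); // [456, King , the Gorilla, Kong]
    }
}
```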
CSV Format vs StringTokenizer
• StringTokenizer with a comma delimiter will read most CSV files, but does not account for empty fields or quoted fields with embedded commas:
  - Empty fields in a CSV file are indicated by consecutive commas. Example:
    123, John ,, Doe   (Middle Name field is blank)
  - Fields with embedded commas are enclosed in double quotes. Example:
    456 , "King , the Gorilla" , Kong
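The empty-field problem is easy to reproduce. In the sketch below (illustrative class and method names), StringTokenizer treats the two adjacent commas as a single delimiter and returns 3 tokens, while String.split with a negative limit keeps the empty field and returns 4. Note that split still breaks a quoted field at its embedded commas, so it addresses only the first of the two problems.

```java
import java.util.StringTokenizer;

public class EmptyFieldDemo {

    // StringTokenizer treats consecutive commas as one delimiter, so the empty field disappears
    static int countWithTokenizer(String record) {
        return new StringTokenizer(record, ",").countTokens();
    }

    // String.split with a negative limit keeps empty fields, including trailing ones
    static int countWithSplit(String record) {
        return record.split(",", -1).length;
    }

    public static void main(String[] args) {
        String record = "123, John ,, Doe";
        System.out.println("StringTokenizer: " + countWithTokenizer(record)); // 3
        System.out.println("split(\",\", -1): " + countWithSplit(record));   // 4
    }
}
```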



Exercise Part 1
• Develop and test classes to read and write CSV data files, satisfying the first four "CSV File Format Rules" (listed on a previous slide). Your completed classes must:
  - Handle the usual possible file exceptions
  - Read CSV-formatted data from one or more files into a single array
  - Print the data array
  - Write data from the array to a single file in CSV format
• Test your CSV reader by reading and printing the sample files:
  - TestFile1.csv
  - TestFile2.csv



Multi-dimensional Arrays
• Java implements multi-dimensional arrays as arrays of one-dimensional arrays.
• Rows can therefore have different numbers of columns. Example:
int b[][];
b = new int[ 2 ][ ]; // create 2 rows
b[ 0 ] = new int[ 5 ]; // create 5 columns for row 0
b[ 1 ] = new int[ 3 ]; // create 3 columns for row 1

(Refer to p 311-315)
Array Dimension: Length
• Recall that for a one-dimensional array:
int a[ ] = new int[ 10 ];
int size = a.length;

• For a two-dimensional array:
int b[][] = new int[ 10 ][ 20 ];
int size1 = b.length;      // number of rows
int size2 = b[ i ].length; // number of columns in row i (for a valid row index i)
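A small sketch that puts the two array slides together: it builds the ragged array b from the previous slide and uses b.length and b[i].length to report the number of columns in each row. The class and method names are illustrative.

```java
public class JaggedArrayDemo {

    // Returns the number of columns in each row of a (possibly ragged) 2-D array
    static int[] rowLengths(int[][] b) {
        int[] lengths = new int[b.length];   // b.length = number of rows
        for (int i = 0; i < b.length; i++) {
            lengths[i] = b[i].length;        // b[i].length = columns in row i
        }
        return lengths;
    }

    public static void main(String[] args) {
        int[][] b = new int[2][];   // create 2 rows
        b[0] = new int[5];          // 5 columns for row 0
        b[1] = new int[3];          // 3 columns for row 1
        int[] lengths = rowLengths(b);
        for (int i = 0; i < lengths.length; i++) {
            System.out.printf("Row %d has %d columns%n", i, lengths[i]);
        }
    }
}
```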



TestFile1.csv
987, Thomas ,Jefferson,7 Estate Ave.,Loretto, PA, 15940
413, Martha,Washington,1600 Penna Ave,Washington, DC,20002
123, Martin , Martina ,777 Williams Ct.,Smallville, PA,15990
990, Shelby, Roosevelt,15 Jackson Pl,NYC,NY, 12345

TestFile2.csv
ID, FName, LName, StreetAddress, City, State, Zip
123, John ,Dozer,120 Main st.,Loretto, PA, 15940
107, Jane,Washington,220 Hobokin Ave.,Philadelphia, PA,0911
123, William , Adams ,120 Jefferson St.,Johnstown, PA,15904
451, Brenda, Bronson,127 Terrace Road,Barrows,AK, 99789
729, Brainfield,Blanktowm, PA, 16600



Exercise Part 2
• Develop an application that uses your CSV reader and writer classes
• Read the test files (or create your own test files) and perform data validity checks, displaying an appropriate error message and the offending record(s):
  - If any fields are missing
  - If extra fields are found
  - If any records have duplicate IDs
  - If any record has an invalid zip code (i.e. not exactly 5 digits)
• Write all records to a single CSV file (i.e. concatenate the multiple test files into a single file)
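One possible shape for the validity checks, offered as a sketch rather than a model solution. It assumes the records have already been parsed into trimmed String arrays with any header row removed, that every record should have the 7 fields named in the TestFile2.csv header, and that the zip code is the last of them; the class RecordValidator is hypothetical. The sample data reuses rows from TestFile2.csv.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RecordValidator {

    // Assumed layout: ID, FName, LName, StreetAddress, City, State, Zip
    static final int EXPECTED_FIELDS = 7;

    // Returns one message per problem found, each followed by the offending record
    static List<String> validate(String[][] records) {
        List<String> errors = new ArrayList<String>();
        Set<String> seenIds = new HashSet<String>();
        for (String[] record : records) {
            String shown = String.join(",", record);
            if (record.length < EXPECTED_FIELDS) {
                errors.add("Missing fields: " + shown);
            } else if (record.length > EXPECTED_FIELDS) {
                errors.add("Extra fields: " + shown);
            }
            // Set.add returns false when the ID was already seen
            if (record.length > 0 && !seenIds.add(record[0])) {
                errors.add("Duplicate ID " + record[0] + ": " + shown);
            }
            // Only check the zip when the field count is right, so index 6 really is the zip
            if (record.length == EXPECTED_FIELDS && !record[6].matches("\\d{5}")) {
                errors.add("Invalid zip code: " + shown);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        String[][] records = {
            {"123", "John", "Dozer", "120 Main st.", "Loretto", "PA", "15940"},
            {"107", "Jane", "Washington", "220 Hobokin Ave.", "Philadelphia", "PA", "0911"},
            {"123", "William", "Adams", "120 Jefferson St.", "Johnstown", "PA", "15904"},
            {"729", "Brainfield", "Blanktowm", "PA", "16600"}
        };
        for (String error : validate(records)) {
            System.out.println(error);
        }
    }
}
```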



Exercise Part 3 (extra credit)
• Extend your classes to be fully compliant with the "CSV File Format Rules".
• Hint: Review some existing CSV Java libraries online.



Hints 1.a
CSVFile
- boolean hasHeaderRow;
- String fileName;
- Scanner input;
- List<String> records;
- String data[][];
- int numRecords;
- int maxNumFields;
+ CSVFile(String fileName)
+ CSVFile(boolean headerRow, String fileName)
+ boolean getHasHeaderRow()
+ String getFileName()
+ int getNumRecords()
+ int getMaxNumFields()
+ void getData(String a[][])
+ void openFile()
+ void readRecords()
+ void parseFields()
+ void printData()



Hints 1.b
import java.io.File;
import java.util.Scanner;
import java.io.FileNotFoundException;
import java.lang.IllegalStateException; // optional: java.lang classes are imported automatically
import java.util.NoSuchElementException;
import java.util.List;
import java.util.ArrayList;
import java.util.StringTokenizer;



Hints 1.c
public void openFile() {
    try {
        input = new Scanner(new File(fileName));
    }
    catch (FileNotFoundException fileNotFound) {
        ...

public void readRecords() {
    // Read all lines (records) from the file into an ArrayList
    records = new ArrayList<String>();
    try {
        while (input.hasNextLine())   // hasNextLine pairs with nextLine
            records.add(input.nextLine());
        ...



Hints 1.d
public void parseFields() {
    String delimiter = ",\n";

    // Create two-dimensional array to hold data (see Deitel, p 313-315)
    int rows = records.size();   // #rows for array = #lines in file
    data = new String[rows][];   // create the rows for the array
    int row = 0;

    for (String record : records) {
        StringTokenizer tokens = new StringTokenizer(record, delimiter);
        int cols = tokens.countTokens();
        data[row] = new String[cols];                 // create columns for current row
        maxNumFields = Math.max(maxNumFields, cols);  // widest row seen so far
        int col = 0;
        while (tokens.hasMoreTokens()) {
            data[row][col] = tokens.nextToken().trim(); // rule 3: drop surrounding white space
            col++;
        }
        row++;   // without this, every record would overwrite row 0
    }
    numRecords = rows;
}



Hints 1.e
public static void main(String[] args) {
    // TestFile1.csv has no header row, so pass false (TestFile2.csv would need true)
    CSVFile file1 = new CSVFile(false, "TestFile1.csv");

    file1.openFile();
    file1.readRecords();
    file1.parseFields();
    file1.printData();
    String fileData[][] =
        new String[file1.getNumRecords()][file1.getMaxNumFields()];
    file1.getData(fileData);
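The hints stop before the writing half of Part 1. A minimal sketch of one approach, using hypothetical method names toCsvLine and writeData and a hypothetical output file output.csv: each row is joined with commas, a field containing a comma is wrapped in double quotes (rule 6), and null cells, which appear when a short row is copied into the rectangular fileData array above, are written as empty fields.

```java
import java.io.FileNotFoundException;
import java.util.Formatter;

public class CsvWriterSketch {

    // Joins fields with commas, wrapping any field that contains a comma in double quotes
    static String toCsvLine(String[] fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            String field = fields[i] == null ? "" : fields[i]; // unfilled cells become empty fields
            if (field.indexOf(',') >= 0) {
                line.append('"').append(field).append('"');
            } else {
                line.append(field);
            }
        }
        return line.toString();
    }

    // Writes every row of the array as one CSV record
    static void writeData(String fileName, String[][] data) throws FileNotFoundException {
        Formatter output = new Formatter(fileName);
        try {
            for (String[] row : data) {
                output.format("%s%n", toCsvLine(row));
            }
        } finally {
            output.close();
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        String[][] data = { {"456", "King , the Gorilla", "Kong"}, {"123", "John", "", "Doe"} };
        writeData("output.csv", data);
        System.out.println(toCsvLine(data[0])); // 456,"King , the Gorilla",Kong
    }
}
```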



CSV Libraries
• http://ostermiller.org/utils/CSV.html
• http://opencsv.sourceforge.net/
