Write a Java Program That Reads Sales Data From File
Reading files in Java is the cause of a lot of confusion. There are multiple ways of accomplishing the same job and it's often not clear which file reading method is best to use. Something that's quick and dirty for a small sample file might not be the best method to use when you need to read a very large file. Something that worked in an earlier Java version might not be the preferred method anymore.
This article aims to be the definitive guide for reading files in Java 7, 8 and 9. I'm going to cover all the ways you can read files in Java. Too often, you'll read an article that tells you one way to read a file, only to find later there are other ways to do it. I'm actually going to cover 15 different ways to read a file in Java, using the core Java libraries as well as two third party libraries.
But that's not all – what good is knowing how to do something in multiple ways if you don't know which way is best for your situation?
I also put each of these methods to a real performance test and document the results. That way, you will have some hard data on the performance characteristics of each method.
Methodology
JDK Versions
Java code samples don't live in isolation, especially when it comes to Java I/O, as the API keeps evolving. All code for this article has been tested on:
- Java SE 7 (jdk1.7.0_80)
- Java SE 8 (jdk1.8.0_162)
- Java SE 9 (jdk-9.0.4)
When there is an incompatibility, it will be stated in that section. Otherwise, the code works unaltered across the different Java versions. The main incompatibility is the use of lambda expressions, which were introduced in Java 8.
Java File Reading Libraries
There are multiple ways of reading from files in Java. This article aims to be a comprehensive collection of all the different methods. I will cover:
- java.io.FileReader.read()
- java.io.BufferedReader.readLine()
- java.io.FileInputStream.read()
- java.io.BufferedInputStream.read()
- java.nio.file.Files.readAllBytes()
- java.nio.file.Files.readAllLines()
- java.nio.file.Files.lines()
- java.util.Scanner.nextLine()
- org.apache.commons.io.FileUtils.readLines() – Apache Commons
- com.google.common.io.Files.readLines() – Google Guava
Closing File Resource
Prior to JDK 7, when opening a file in Java, all file resources needed to be closed manually using a try-catch-finally block. JDK 7 introduced the try-with-resources statement, which simplifies the process of closing streams. You no longer need to write explicit code to close streams because the JVM will automatically close the stream for you, whether an exception occurred or not. All examples in this article use the try-with-resources statement for opening, reading and closing files.
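To illustrate the difference, here is a minimal sketch (using the same assumed sample file path as the rest of the examples) comparing the pre-JDK 7 manual close with a try-with-resources block:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class TryWithResourcesExample {

  public static void main(String[] args) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    // Pre-JDK 7 style: the reader must be closed manually in a finally block
    BufferedReader oldStyle = new BufferedReader(new FileReader(fileName));
    try {
      System.out.println(oldStyle.readLine());
    } finally {
      oldStyle.close();
    }

    // JDK 7+ style: the reader is closed automatically, even if an exception is thrown
    try (BufferedReader newStyle = new BufferedReader(new FileReader(fileName))) {
      System.out.println(newStyle.readLine());
    }
  }
}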
File Location
All examples read test files from C:\temp.
Encoding
Character encoding is not explicitly saved with text files, so Java makes assumptions about the encoding when reading them. Usually the assumption is correct, but sometimes you want to be explicit when instructing your programs to read from files. When the encoding isn't correct, you'll see funny characters appear when reading files.
All examples for reading text files use two encoding variations:
the default system encoding, where no encoding is specified, and explicitly setting the encoding to UTF-8.
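As a quick sanity check, this small sketch prints the platform default charset that the implicit-encoding examples rely on, alongside the UTF-8 constant used by the explicit-encoding examples:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

  public static void main(String[] args) {
    // the charset Java falls back to when none is specified
    System.out.println("Default encoding: " + Charset.defaultCharset());

    // the constant used by the explicit-encoding examples in this article
    System.out.println("Explicit encoding: " + StandardCharsets.UTF_8);
  }
}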
Download Code
All code files are available on GitHub.
Code Quality and Code Encapsulation
There is a difference between writing code for your personal or work project and writing code to explain and teach concepts.
If I were writing this code for my own project, I would apply proper object-oriented principles like encapsulation, abstraction, polymorphism, etc. But I wanted to make each example stand alone and be easily understood, which meant that some of the code has been copied from one example to the next. I did this on purpose because I didn't want the reader to have to figure out all the encapsulation and object structures I so cleverly created. That would take away from the examples.
For the same reason, I chose not to write these examples with a unit testing framework like JUnit or TestNG, because that's not the purpose of this article. It would add another library for the reader to understand that has nothing to do with reading files in Java. That's why all the examples are written inline inside the main method, without extra methods or classes.
My main purpose is to make the examples as easy to understand as possible, and I believe that extra unit testing and encapsulation code would not help with this. That doesn't mean that's how I would encourage you to write your own personal code. It's just the way I chose to write the examples in this article to make them easier to understand.
Exception Handling
All examples declare any checked exceptions in the method declaration via a throws clause.
The purpose of this article is to show all the different ways to read from files in Java – it's not meant to show how to handle exceptions, which will be very specific to your situation.
So instead of creating unhelpful try-catch blocks that just print exception stack traces and clutter up the code, every example declares any checked exceptions in the calling method. This makes the code cleaner and easier to understand without sacrificing any functionality.
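For comparison, here is a minimal sketch (using the same assumed sample file path) of the two styles – the declared-exception style used throughout the article versus the catch-and-print style it avoids:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ExceptionStyleExample {

  // style used in this article: checked exceptions are declared and propagated
  public static void declared() throws IOException {
    Files.readAllBytes(Paths.get("c:\\temp\\sample-10KB.txt"));
  }

  // the alternative the article avoids: a catch block that only prints the stack trace
  public static void caught() {
    try {
      Files.readAllBytes(Paths.get("c:\\temp\\sample-10KB.txt"));
    } catch (IOException e) {
      e.printStackTrace();
    }
  }

  public static void main(String[] args) throws IOException {
    declared();
    caught();
  }
}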
Future Updates
As Java file reading evolves, I will update this article with any required changes.
File Reading Methods
I organized the file reading methods into three groups:
- Classic I/O classes that have been part of Java since before JDK 1.7. This includes the java.io and java.util packages.
- New Java I/O classes that have been part of Java since JDK 1.7. This covers the java.nio.file.Files class.
- Third party I/O classes from the Apache Commons and Google Guava projects.
Classic I/O – Reading Text
1a) FileReader – Default Encoding
FileReader reads in one character at a time, without any buffering. It's meant for reading text files. It uses the default character encoding on your system, so I have provided examples for both the default case and for specifying the encoding explicitly.
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_FileReader_Read {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    try (FileReader fileReader = new FileReader(fileName)) {
      int singleCharInt;
      char singleChar;
      while ((singleCharInt = fileReader.read()) != -1) {
        singleChar = (char) singleCharInt;
        // display one character at a time
        System.out.print(singleChar);
      }
    }
  }
}
1b) FileReader – Explicit Encoding (InputStreamReader)
It's actually not possible to set the encoding explicitly on a FileReader, so you have to use its parent class, InputStreamReader, and wrap it around a FileInputStream:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadFile_FileReader_Read_Encoding {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    FileInputStream fileInputStream = new FileInputStream(fileName);

    // specify UTF-8 encoding explicitly
    try (InputStreamReader inputStreamReader =
        new InputStreamReader(fileInputStream, "UTF-8")) {

      int singleCharInt;
      char singleChar;
      while ((singleCharInt = inputStreamReader.read()) != -1) {
        singleChar = (char) singleCharInt;
        System.out.print(singleChar); // display one character at a time
      }
    }
  }
}
2a) BufferedReader – Default Encoding
BufferedReader reads an entire line at a time, instead of one character at a time like FileReader. It's meant for reading text files.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_BufferedReader_ReadLine {

  public static void main(String[] args) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    FileReader fileReader = new FileReader(fileName);

    try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
      String line;
      while ((line = bufferedReader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
2b) BufferedReader – Explicit Encoding
In a similar way to how we set the encoding explicitly for FileReader, we need to create a FileInputStream, wrap it inside an InputStreamReader with an explicit encoding, and pass that to BufferedReader:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadFile_BufferedReader_ReadLine_Encoding {

  public static void main(String[] args) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    FileInputStream fileInputStream = new FileInputStream(fileName);

    // specify UTF-8 encoding explicitly
    InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream, "UTF-8");

    try (BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {
      String line;
      while ((line = bufferedReader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
Classic I/O – Reading Bytes
1) FileInputStream
FileInputStream reads in one byte at a time, without any buffering. While it's meant for reading binary files such as images or sound files, it can still be used to read text files. It's similar to reading with FileReader in that you're reading one byte at a time as an integer, and you need to cast that int to a char to see the ASCII character.
By default, it uses the default character encoding on your system, so I have provided examples for both the default case and for specifying the encoding explicitly.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class ReadFile_FileInputStream_Read {

  public static void main(String[] pArgs) throws FileNotFoundException, IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);

    try (FileInputStream fileInputStream = new FileInputStream(file)) {
      int singleCharInt;
      char singleChar;
      while ((singleCharInt = fileInputStream.read()) != -1) {
        singleChar = (char) singleCharInt;
        System.out.print(singleChar);
      }
    }
  }
}
2) BufferedInputStream
BufferedInputStream reads a set of bytes all at once into an internal byte array buffer. The buffer size can be set explicitly or left at the default, which is what we'll demonstrate in our example (a sketch of the explicit-size constructor follows the example below). The default buffer size appears to be 8KB, but I have not explicitly verified this. All performance tests used the default buffer size, so it will automatically re-size the buffer when it needs to.
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class ReadFile_BufferedInputStream_Read {

  public static void main(String[] pArgs) throws FileNotFoundException, IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);
    FileInputStream fileInputStream = new FileInputStream(file);

    try (BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream)) {
      int singleCharInt;
      char singleChar;
      while ((singleCharInt = bufferedInputStream.read()) != -1) {
        singleChar = (char) singleCharInt;
        System.out.print(singleChar);
      }
    }
  }
}
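If you want to size the buffer yourself instead of relying on the default, BufferedInputStream has a constructor that takes an explicit buffer size. Here is a minimal sketch; the 64KB value is chosen purely for illustration:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ReadFile_BufferedInputStream_ExplicitBuffer {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";

    // the second constructor argument is the buffer size in bytes (64KB here, an arbitrary example value)
    try (BufferedInputStream bufferedInputStream =
        new BufferedInputStream(new FileInputStream(fileName), 64 * 1024)) {
      int singleCharInt;
      while ((singleCharInt = bufferedInputStream.read()) != -1) {
        System.out.print((char) singleCharInt);
      }
    }
  }
}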
New I/O – Reading Text
1a) Files.readAllLines() – Default Encoding
The Files class is part of the new Java I/O classes introduced in JDK 1.7. It contains only static utility methods for working with files and directories.
The readAllLines() overload that doesn't take an explicit character encoding was introduced in JDK 1.8, so this example will not work in Java 7.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

public class ReadFile_Files_ReadAllLines {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);

    List<String> fileLinesList = Files.readAllLines(file.toPath());
    for (String line : fileLinesList) {
      System.out.println(line);
    }
  }
}
1b) Files.readAllLines() – Explicit Encoding
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

public class ReadFile_Files_ReadAllLines_Encoding {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);

    // use UTF-8 encoding
    List<String> fileLinesList = Files.readAllLines(file.toPath(), StandardCharsets.UTF_8);
    for (String line : fileLinesList) {
      System.out.println(line);
    }
  }
}
2a) Files.lines() – Default Encoding
This code was tested to work in Java 8 and 9. It didn't run in Java 7 because of the lack of support for lambda expressions.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);

    try (Stream<String> linesStream = Files.lines(file.toPath())) {
      linesStream.forEach(line -> {
        System.out.println(line);
      });
    }
  }
}
2b) Files.lines() – Explicit Encoding
Just like the previous example, this code was tested and works in Java 8 and 9 but not in Java 7.
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines_Encoding {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);

    try (Stream<String> linesStream = Files.lines(file.toPath(), StandardCharsets.UTF_8)) {
      linesStream.forEach(line -> {
        System.out.println(line);
      });
    }
  }
}
3a) Scanner – Default Encoding
The Scanner class was introduced in JDK 1.5 and can be used to read from files or from the console (user input).
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadFile_Scanner_NextLine {

  public static void main(String[] pArgs) throws FileNotFoundException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);

    try (Scanner scanner = new Scanner(file)) {
      String line;
      while (scanner.hasNextLine()) {
        line = scanner.nextLine();
        System.out.println(line);
      }
    }
  }
}
3b) Scanner – Explicit Encoding
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadFile_Scanner_NextLine_Encoding {

  public static void main(String[] pArgs) throws FileNotFoundException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);

    // use UTF-8 encoding
    try (Scanner scanner = new Scanner(file, "UTF-8")) {
      String line;
      while (scanner.hasNextLine()) {
        line = scanner.nextLine();
        System.out.println(line);
      }
    }
  }
}
New I/O – Reading Bytes
Files.readAllBytes()
Even though the documentation for this method states that it is "not intended for reading in large files", I found this to be the absolute best performing file reading method, even on files as large as 1GB.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ReadFile_Files_ReadAllBytes {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);

    byte[] fileBytes = Files.readAllBytes(file.toPath());
    char singleChar;
    for (byte b : fileBytes) {
      singleChar = (char) b;
      System.out.print(singleChar);
    }
  }
}
Third Party I/O – Reading Text
Commons – FileUtils.readLines()
Apache Commons IO is an open source Java library that comes with utility classes for reading and writing text and binary files. I listed it in this article because it can be used instead of the built-in Java libraries. The class we're using is FileUtils.
For this article, version 2.6 was used, which is compatible with JDK 1.7+.
Note that you need to specify the encoding explicitly; the method that uses the default encoding has been deprecated.
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.commons.io.FileUtils;

public class ReadFile_Commons_FileUtils_ReadLines {

  public static void main(String[] pArgs) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);

    List<String> fileLinesList = FileUtils.readLines(file, "UTF-8");
    for (String line : fileLinesList) {
      System.out.println(line);
    }
  }
}
Guava – Files.readLines()
Google Guava is an open source library that comes with utility classes for common tasks like collections handling, cache management, IO operations and string processing.
I listed it in this article because it can be used instead of the built-in Java libraries, and I wanted to compare its performance with them.
For this article, version 23.0 was used.
I'm not going to examine all the different ways to read files with Guava, since this article is not meant for that. For a more detailed look at all the different ways to read and write files with Guava, take a look at Baeldung's in-depth article.
When reading a file, Guava requires that the character encoding be set explicitly, just like Apache Commons.
Compatibility note: this code was tested successfully on Java 8 and 9. I couldn't get it to work on Java 7 and kept getting an "Unsupported major.minor version 52.0" error. Guava has a separate API doc for Java 7 which uses a slightly different version of the Files.readLines() method. I thought I could get it to work, but I kept getting that error.
import java.io.File;
import java.io.IOException;
import java.util.List;

import com.google.common.base.Charsets;
import com.google.common.io.Files;

public class ReadFile_Guava_Files_ReadLines {

  public static void main(String[] args) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);

    List<String> fileLinesList = Files.readLines(file, Charsets.UTF_8);
    for (String line : fileLinesList) {
      System.out.println(line);
    }
  }
}
Performance Testing
Since there are so many ways to read from a file in Java, a natural question is "Which file reading method is best for my situation?" So I decided to test each of these methods against the others, using sample data files of different sizes and timing the results.
Each code sample in this article reads the contents of the file and displays it to the console (System.out). However, during the performance tests the System.out line was commented out, since it would seriously slow down each method.
Each performance test measures the time it takes to read in the file – line by line, character by character, or byte by byte – without displaying anything to the console. I ran each test 5-10 times and took the average, so as not to let any outliers influence the results. I also ran the default-encoding version of each file reading method – i.e. I didn't specify the encoding explicitly.
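The timings were collected with a simple wall-clock approach along these lines – a minimal sketch, not the exact harness used for the published numbers:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadFileTimer {

  public static void main(String[] args) throws IOException {

    String fileName = "c:\\temp\\sample-10KB.txt";
    int runs = 10;
    long totalMillis = 0;

    for (int i = 0; i < runs; i++) {
      long start = System.nanoTime();

      // the file reading method under test goes here, with the System.out call removed
      Files.readAllBytes(Paths.get(fileName));

      totalMillis += (System.nanoTime() - start) / 1_000_000;
    }

    System.out.println("Average time (ms): " + (totalMillis / runs));
  }
}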
Dev Setup
The dev environment used for these tests:
- Intel Core i7-3615QM @ 2.3 GHz, 8GB RAM
- Windows 8 x64
- Eclipse IDE for Java Developers, Oxygen.2 Release (4.7.2)
- Java SE 9 (jdk-9.0.4)
Data Files
GitHub doesn't allow pushing files larger than 100 MB, so I couldn't find a practical way to store my large test files to let others replicate my tests. So instead of storing them, I'm providing the tools I used to generate them so you can create test files that are similar in size to mine. Obviously they won't be identical, but you'll be able to generate files similar in size to the ones I used in my performance tests.
Random String Generator was used to generate sample text, which I then copy-pasted to create larger versions of the file. When the file started getting too large to manage inside a text editor, I used the command line to merge multiple text files into a larger text file:
copy *.txt sample-1GB.txt
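If you prefer to stay inside Java, a sketch along these lines (assuming you already have a small seed file such as sample-10KB.txt) will concatenate it repeatedly into a larger test file; the names and the 10MB target are illustrative only:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class GenerateTestFile {

  public static void main(String[] args) throws IOException {

    Path seed = Paths.get("c:\\temp\\sample-10KB.txt");   // assumed small seed file
    Path target = Paths.get("c:\\temp\\sample-10MB.txt"); // file to generate

    byte[] seedBytes = Files.readAllBytes(seed);
    Files.deleteIfExists(target);
    Files.createFile(target);

    // append the seed contents until the target reaches roughly 10MB
    while (Files.size(target) < 10L * 1024 * 1024) {
      Files.write(target, seedBytes, StandardOpenOption.APPEND);
    }
  }
}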
I created the following seven data files to test each file reading method across a range of file sizes:
- 1KB
- 10KB
- 100KB
- 1MB
- 10MB
- 100MB
- 1GB
Performance Summary
There were some surprises and some expected results from the performance tests.
As expected, the worst performers were the methods that read in a file character by character or byte by byte. But what surprised me was that the native Java IO libraries outperformed both third party libraries – Apache Commons IO and Google Guava.
What's more – both Google Guava and Apache Commons IO threw a java.lang.OutOfMemoryError when trying to read in the 1GB test file. This also happened with the Files.readAllLines(Path) method, but the remaining seven methods were able to read in all test files, including the 1GB test file.
The following table summarizes the average time (in milliseconds) each file reading method took to complete. I highlighted the top three methods in green, the average performing methods in yellow and the worst performing methods in red:
The following chart summarizes the above table but with the following changes:
- I removed java.io.FileInputStream.read() from the chart because its performance was so bad it would skew the entire chart and you wouldn't see the other lines properly.
- I summarized the data from 1KB to 1MB because after that the chart would get too skewed with so many underperformers, and some methods threw a java.lang.OutOfMemoryError at 1GB.
The Winners
The new Java I/O libraries (java.nio) produced the overall winner, java.nio.Files.readAllBytes(), but it was followed closely by BufferedReader.readLine(), which was also a proven top performer across the board. The other excellent performer was java.nio.Files.lines(Path), which had slightly worse numbers for smaller test files but really excelled with the larger test files.
The absolute fastest file reader across all data tests was java.nio.Files.readAllBytes(Path). It was consistently the fastest, and even reading a 1GB file only took about one second.
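Since readAllBytes() returns raw bytes, a common follow-up when the file is text is to decode the whole array into a String in one step – a minimal sketch, assuming a UTF-8 encoded file:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadFile_Files_ReadAllBytes_ToString {

  public static void main(String[] args) throws IOException {

    byte[] fileBytes = Files.readAllBytes(Paths.get("c:\\temp\\sample-10KB.txt"));

    // decode the whole byte array at once instead of casting byte by byte
    String contents = new String(fileBytes, StandardCharsets.UTF_8);
    System.out.println(contents);
  }
}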
The following chart compares performance for a 100KB test file:
You can see that the lowest times were for Files.readAllBytes(), BufferedInputStream.read() and BufferedReader.readLine().
The following chart compares performance for reading a 10MB file. I didn't bother including the bar for FileInputStream.read() because its performance was so bad it would skew the entire chart and you couldn't tell how the other methods performed relative to each other:
Files.readAllBytes() really outperforms all other methods, and BufferedReader.readLine() is a distant second.
The Losers
As expected, the absolute worst performer was java.io.FileInputStream.read(), which was orders of magnitude slower than its rivals for most tests. FileReader.read() was also a poor performer for the same reason – reading files byte by byte (or character by character) instead of with buffers drastically degrades performance.
Both the Apache Commons IO FileUtils.readLines() and Guava Files.readLines() crashed with an OutOfMemoryError when trying to read the 1GB test file, and they were mostly average in performance for the remaining test files.
java.nio.Files.readAllLines() also crashed when trying to read the 1GB test file, but it performed quite well for smaller file sizes.
Performance Rankings
Here's a ranked list of how well each file reading method did, in terms of speed and handling of large files, as well as compatibility with different Java versions.
Rank | File Reading Method |
---|---|
1 | java.nio.file.Files.readAllBytes() |
2 | java.io.BufferedReader.readLine() |
3 | java.nio.file.Files.lines() |
4 | java.io.BufferedInputStream.read() |
5 | java.util.Scanner.nextLine() |
6 | java.nio.file.Files.readAllLines() |
7 | org.apache.commons.io.FileUtils.readLines() |
8 | com.google.common.io.Files.readLines() |
9 | java.io.FileReader.read() |
10 | java.io.FileInputStream.read() |
Conclusion
I tried to present a comprehensive set of methods for reading files in Java, both text and binary. We looked at 15 different ways of reading files in Java and ran performance tests to see which methods are the fastest.
The new Java IO library (java.nio) proved to be a great performer, but so was the classic BufferedReader.
Source: https://funnelgarden.com/java_read_file/