Python Read Big File Example

Reading a file in Python is simple: the read() and readlines() methods each do it in a single call. There are pitfalls in using them on large files, however, and this article explains how to use them correctly.

1. read() and readlines().

Tutorials on reading and writing files in Python usually introduce read() and readlines() together, so code like the following is common.

with open(file_path, 'rb') as f:
    for line in f.readlines():
        print(line)

with open(file_path, 'rb') as f:
    print(f.read())

This works without problems on small files, but with a large file both snippets load the entire content into memory at once, which can exhaust the available memory and raise MemoryError.

1.1 read([size]).

The read([size]) method reads up to size bytes from the current position in the file. If size is not specified, it reads until the end of the file. All the data is returned in a single object: a bytes object when the file is opened in binary mode ('rb'), or a str in text mode.
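
The behavior of the size argument can be checked with a minimal sketch like the one below. It assumes a 10-byte temporary file created only for the demonstration; the helper name read_demo and the file contents are illustrative and not part of the original examples.

```python
import os
import tempfile


def read_demo():
    """Show what read(size) and read() return on a 10-byte binary file."""
    # Create a throwaway file holding exactly 10 bytes (illustrative data).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
        path = tmp.name
    try:
        with open(path, 'rb') as f:
            first = f.read(4)   # at most 4 bytes from the current position
            rest = f.read()     # no size given: everything up to end of file
            at_eof = f.read(4)  # already at end of file: empty bytes object
    finally:
        os.remove(path)
    return first, rest, at_eof


first, rest, at_eof = read_demo()
print(first, rest, at_eof)  # b'0123' b'456789' b''
```

The empty result at end of file is what a chunked reading loop can use as its stop condition.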

1.2 readlines().

The readline() method reads one line at a time, so it uses little memory and is suitable for large files. readlines(), by contrast, reads every line and constructs a list object that stores all of them. The whole file therefore ends up in memory, and a MemoryError may occur on large files.
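
To make the list that readlines() builds concrete, here is a small sketch that compares it with a single readline() call. The three-line temporary file and the helper name compare_line_readers are assumptions made for illustration.

```python
import os
import tempfile


def compare_line_readers():
    """Return the results of readlines() and readline() on the same file."""
    # Create a throwaway file with three short lines (illustrative data).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"a\nb\nc\n")
        path = tmp.name
    try:
        with open(path, 'rb') as f:
            all_lines = f.readlines()  # every line of the file, in one list
        with open(path, 'rb') as f:
            one_line = f.readline()    # only the first line
    finally:
        os.remove(path)
    return all_lines, one_line


all_lines, one_line = compare_line_readers()
print(all_lines)  # [b'a\n', b'b\n', b'c\n']
print(one_line)   # b'a\n'
```

The list returned by readlines() grows with the file, while readline() only ever holds one line.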

2. How To Correctly Use read and readlines.

Code like the above is risky in a production system, because the size of the input is often outside your control. So let's see how to use these methods correctly.

2.1 Read a binary file.

If it's a binary file, the recommended way is to specify how many bytes to read into the buffer on each call. A larger buffer generally means fewer read calls and a faster read, at the cost of more memory per chunk.

with open(file_path, 'rb') as f:
    while True:
        buf = f.read(1024)  # read at most 1024 bytes per call
        if buf:
            ...  # process the chunk here
        else:
            break  # an empty result means end of file
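
The same loop can be wrapped in a generator so that callers simply iterate over chunks. The sketch below is one possible arrangement, not a definitive implementation: the names read_in_chunks and sha256_of_file, the 64 KiB default chunk size, and the 200,000-byte demonstration file are all assumptions chosen for illustration.

```python
import hashlib
import os
import tempfile


def read_in_chunks(file_path, chunk_size=64 * 1024):
    """Yield successive chunks of at most chunk_size bytes from a binary file."""
    with open(file_path, 'rb') as f:
        while True:
            buf = f.read(chunk_size)
            if not buf:  # empty bytes object: end of file reached
                break
            yield buf


def sha256_of_file(file_path):
    """Hash a file without ever holding more than one chunk in memory."""
    digest = hashlib.sha256()
    for chunk in read_in_chunks(file_path):
        digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a throwaway 200,000-byte file (illustrative data).
data = b"x" * 200_000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    demo_path = tmp.name
try:
    chunk_sizes = [len(c) for c in read_in_chunks(demo_path)]
    file_digest = sha256_of_file(demo_path)
finally:
    os.remove(demo_path)

print(chunk_sizes)  # [65536, 65536, 65536, 3392]
print(file_digest == hashlib.sha256(data).hexdigest())  # True
```

Hashing is used here only as an example of work that can be done chunk by chunk; any processing that does not need the whole file at once fits the same pattern.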

2.2 Read a text file.

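If it is a text file, you can call the readline method in a loop or iterate over the file object directly. Both approaches read one line at a time, so memory use stays low, although this is generally slower than reading the whole file in a single call.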

with open(file_path, 'r') as f:  # text mode: each line is a str
    while True:
        line = f.readline()
        if line:
            print(line)
        else:
            break  # an empty string means end of file
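
Iterating over the file object directly gives the same line-by-line behavior with less code. The following is a minimal sketch, assuming a UTF-8 encoded file; the helper name count_lines and the three-line demonstration file are illustrative.

```python
import os
import tempfile


def count_lines(file_path, encoding='utf-8'):
    """Count lines by iterating over the file object, one line at a time."""
    count = 0
    with open(file_path, 'r', encoding=encoding) as f:
        for _line in f:  # the file object yields one line per iteration
            count += 1
    return count


# Demonstration on a throwaway three-line text file (illustrative data).
with tempfile.NamedTemporaryFile('w', encoding='utf-8', delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    demo_path = tmp.name
try:
    line_count = count_lines(demo_path)
finally:
    os.remove(demo_path)

print(line_count)  # 3
```

Only the current line is held in memory at any time, so this pattern works for files much larger than the available RAM, provided no single line is itself enormous.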