Reading and writing files is the most common I/O operation. Python has built-in functions for reading and writing files, and their usage is similar to C's. The ability to read and write files on disk is provided by the operating system, because modern operating systems do not allow programs to operate directly on disks. So, to read or write a file, a program asks the operating system to open a file object (usually called a file descriptor), and then, through the interface provided by the operating system, reads data from the file object (read file) or writes data to the file object (write file).
1. Read File Example.
To open a file object in file read mode, use Python’s built-in
open() function, passing in the file name and the operating mode identifier.
>>> f = open('/Users/jerry/hello.txt', 'r')
'r' means the file is opened in read-only mode.
If the file does not exist, the open() function raises a FileNotFoundError (a subclass of OSError, for which IOError is an alias in Python 3) with an error code and a detailed message telling you that the file does not exist.
>>> f=open('/Users/jerry/hellooo.txt', 'r')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/Users/jerry/hellooo.txt'
If the file opens successfully, you can use the
read() method to read the entire contents of the file at once. Python reads the content into memory and uses a string object to save the content.
>>> f.read()
'Hello, world!'
The final step is to call the close() method to close the file. The file must be closed after it is used, because the file object occupies operating system resources, and the number of files that the operating system can have open at the same time is limited.
Because an error such as IOError can occur while a file is being read or written, f.close() may never be called. To make sure that the file is closed correctly whether or not something goes wrong, we can use a try... finally code block.
f = None  # Defined up front so the finally block works even if open() fails.
try:
    f = open('/Document/python/test.txt', 'r')
    print(f.read())
finally:
    if f:
        f.close()
But it’s too verbose to do this every time, so Python introduced the
with statement to automatically call the
close() method for us.
with open('/Document/python/abc.txt', 'r') as f:
    print(f.read())
This is the same as the previous try... finally code block, but the code is simpler, and you don't have to call the f.close() method.
Calling the read() method reads the entire contents of the file at once, so if the file is 10 gigabytes, it may exhaust the available memory. To be safe, you can call the read(size) method repeatedly, reading up to size characters at a time (or size bytes in binary mode). In addition, readline() reads one line at a time, and readlines() reads all lines at once and returns a list in which each item is one line of text.
If the file is small, calling read() once is the easiest way to get all of its content. If you cannot determine the file size, it is safer to call read(size) repeatedly. If it's a configuration file, it is convenient to call readlines():
for line in f.readlines():
    print(line.strip())  # Remove '\n' at the line tail.
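The read(size) approach described above can be sketched as a small generator. This is only a minimal illustration: the helper name read_in_chunks, the chunk size, and the temporary sample file are assumptions made for the example, not part of the original text.

```python
import os
import tempfile


def read_in_chunks(path, chunk_size=1024):
    """Yield pieces of a text file, each at most chunk_size characters long."""
    with open(path, 'r', encoding='utf-8') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # read() returns an empty string at end of file.
                break
            yield chunk


# Create a small throwaway file so the sketch can run anywhere.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('first line\nsecond line\nthird line\n')
    sample_path = tmp.name

# Read the file 8 characters at a time.
chunks = list(read_in_chunks(sample_path, chunk_size=8))

# Iterating over the file object yields one line at a time,
# without building the full list that readlines() returns.
with open(sample_path, 'r', encoding='utf-8') as f:
    stripped_lines = [line.strip() for line in f]

os.remove(sample_path)
```

Only one chunk or one line is held in memory at a time, so the same loop works for a file of any size.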
1.1 File-like Object
Objects that have a read() method, such as those returned by the open() function, are called file-like objects in Python. Besides files on disk, a file-like object can be a byte stream in memory, a network stream, a custom stream, and so on. File-like objects do not need to inherit from a particular class. They are only required to have a read() method.
StringIO is a file-like object created in memory, often used as a temporary buffer.
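As a rough sketch of what "file-like" means in practice, the function below accepts anything with a read() method. The names count_lines and UpperCaseStream are hypothetical and exist only for this illustration.

```python
from io import StringIO


def count_lines(stream):
    """Count lines from any object that provides a read() method."""
    return len(stream.read().splitlines())


class UpperCaseStream:
    """Hypothetical custom stream: no base class, just a read() method."""

    def __init__(self, text):
        self._text = text

    def read(self):
        return self._text.upper()


# StringIO behaves like a text file that lives in memory.
buffer = StringIO('alpha\nbeta\ngamma\n')
buffer_line_count = count_lines(buffer)

# The custom class works too, because only read() is required.
custom_line_count = count_lines(UpperCaseStream('one\ntwo'))
```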
1.2 Binary File
All of the above examples read text files, and they assume UTF-8 encoded text. To read binary files, such as images and video, open the file in 'rb' mode.
>>> f = open('/Document/Images/test.jpg', 'rb')
>>> f.read()
b'\xdd\xf8\xee\xf1\xff\x18Exif\xff\xee...' # Hexadecimal bytes
1.3 Character Encoding
To read a text file that is not UTF-8 encoded, pass the encoding parameter to the open() function, for example, to read a GBK-encoded file.
>>> f = open('/Document/text/gbk_test.txt', 'r', encoding='gbk')
>>> f.read()
'开发'
You may encounter a UnicodeDecodeError when you come across files that are not encoded properly, because the text file may contain some illegally encoded characters. For this case, the open() function also has an errors parameter, which indicates what to do when an encoding error is encountered. The easiest way is to just ignore the error.
>>> f = open('/Document/dev/gbk.txt', 'r', encoding='gbk', errors='ignore')
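To see the difference the errors parameter makes, here is a small sketch that builds a deliberately damaged GBK file and reads it twice. The stray byte 0xFF and the temporary file are assumptions chosen for the demonstration.

```python
import os
import tempfile

# Two GBK-encoded characters, one byte (0xFF) that is not valid GBK, then ASCII.
raw = '开发'.encode('gbk') + b'\xff' + b'!'

with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(raw)
    sample_path = tmp.name

# The default error handling ('strict') raises on the bad byte.
try:
    with open(sample_path, 'r', encoding='gbk') as f:
        f.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='ignore' silently drops the bytes that cannot be decoded.
with open(sample_path, 'r', encoding='gbk', errors='ignore') as f:
    ignored_text = f.read()

os.remove(sample_path)
```

Keep in mind that ignoring errors discards data without any warning, so it is best kept for cases where losing a few characters is acceptable.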
1.4 How to read the last line or specified lines of data in a file with Python.
- When processing a file, a common requirement is to read a specific line of the file, so how can it be implemented?
- Reading a specific line with the source code below is not ideal, because if the file is large, the call lines = fp.readlines() costs a lot of time and memory.
with open('a.log', 'r') as fp:
    lines = fp.readlines()
    last_line = lines[-1]
- The solution is to move the file pointer toward the end of the file with the file object's seek method, and then read backward from the end until a complete last line is found. The file object's seek method has two parameters. The first parameter is the offset, which specifies how far to move the file pointer. The second parameter has three possible values: 0 means the offset is an absolute position counted from the beginning of the file, 1 means the offset is relative to the current pointer position, and 2 means the offset is relative to the end of the file, in which case the offset should be zero or negative (-10 means move to the 10th byte from the end of the file). In text mode, seeking relative to the end only accepts an offset of 0, which is why the code opens the file in 'rb' mode.
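The three values of the second seek parameter can be tried out on a ten-byte file. This is a minimal sketch: the temporary file and its contents are assumptions for the example, and os.SEEK_SET, os.SEEK_CUR and os.SEEK_END are the standard named constants for 0, 1 and 2.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b'0123456789')
    sample_path = tmp.name

with open(sample_path, 'rb') as fp:
    # 0 (os.SEEK_SET): the offset is counted from the beginning of the file.
    fp.seek(3, os.SEEK_SET)
    from_start = fp.read(2)    # Reads the bytes at positions 3 and 4.

    # 1 (os.SEEK_CUR): the offset is relative to the current position (5 -> 7).
    fp.seek(2, os.SEEK_CUR)
    from_current = fp.read(1)  # Reads the byte at position 7.

    # 2 (os.SEEK_END): a negative offset counts back from the end of the file.
    fp.seek(-4, os.SEEK_END)
    from_end = fp.read()       # Reads the last four bytes.

os.remove(sample_path)
```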
import os

def get_file_last_line(filename):
    '''
    Return the last line of a file as bytes.
    If the file is empty or does not exist, return None.
    '''
    try:
        # Get the file size.
        file_size = os.path.getsize(filename)
        # A size of 0 means the file is empty.
        if file_size == 0:
            return None
        # Open the file in 'rb' mode for binary reading.
        with open(filename, 'rb') as fp:
            # Initial offset. It is negative because we read from the end of the file toward the beginning.
            offset = -10
            # The offset doubles on each pass, but its magnitude must stay below the file size.
            while -offset < file_size:
                # Move the file pointer to the given offset from the end of the file.
                fp.seek(offset, 2)
                # Read the lines from that position to the end of the file.
                lines = fp.readlines()
                # Two or more lines means the last one is complete.
                if len(lines) >= 2:
                    # Return the last line of the file.
                    return lines[-1]
                # Double the offset to read more data on the next pass.
                offset *= 2
            # The loop above did not find the last line, so seek to the beginning of the file.
            fp.seek(0)
            # Read all lines, which may use a lot of memory.
            lines = fp.readlines()
            # Return the last line of the file.
            return lines[-1]
    except FileNotFoundError:
        print(filename + ' not found!')
        return None
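For the other half of the requirement, reading one specified line, a possible sketch is to iterate over the file lazily and stop at the wanted line, so the whole file never has to be loaded. The function name get_file_line and the sample file are hypothetical, and itertools.islice is a swapped-in standard library technique rather than something taken from the code above.

```python
import os
import tempfile
from itertools import islice


def get_file_line(filename, line_number):
    """Return line line_number (1-based) without its newline, or None if the file is shorter."""
    if line_number < 1:
        raise ValueError('line_number must be 1 or greater')
    with open(filename, 'r', encoding='utf-8') as fp:
        # islice skips the first line_number - 1 lines and yields at most one line.
        for line in islice(fp, line_number - 1, line_number):
            return line.rstrip('\n')
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'a.log')
    with open(sample_path, 'w', encoding='utf-8') as fp:
        fp.write('line 1\nline 2\nline 3\n')
    second_line = get_file_line(sample_path, 2)
    missing_line = get_file_line(sample_path, 10)
```

This still reads the file from the beginning, so it takes longer for lines near the end, but memory use stays constant.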
2. Write File Example.
Writing a file is similar to reading a file, except that when you call the open() function, you pass in the mode identifier 'w' to write text or 'wb' to write binary data.
>>> f = open('/Users/jerry/hello_world.txt', 'w')
>>> f.write('Hello, world!')
>>> f.close()
You can call write() repeatedly to write a file, but be sure to call f.close() to close the file. When we write a file, the data is usually not written to disk immediately. It is buffered in memory first and written out later. Calling close() flushes any data that is still buffered, so the consequence of forgetting to call close() is that some unwritten data may be lost. It is therefore safer to use the with statement:
with open('/Users/jerry/hello_world.txt', 'w') as f:
    f.write('Hello, world!')
To write a string to a text file in a specific encoding, pass the encoding parameter to the open() function, and it will automatically convert the string to the specified encoding.
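A short sketch of writing with an explicit encoding is shown below. The file name gbk_out.txt and the temporary directory are assumptions for the example. Reading the file back in binary mode shows the bytes that were actually stored.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'gbk_out.txt')

    # The string is converted to GBK bytes as it is written.
    with open(sample_path, 'w', encoding='gbk') as f:
        f.write('开发')

    # Binary mode shows the raw bytes stored on disk.
    with open(sample_path, 'rb') as f:
        raw_bytes = f.read()

    # Reading with the same encoding restores the original string.
    with open(sample_path, 'r', encoding='gbk') as f:
        round_trip = f.read()
```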
When writing a file in 'w' mode, if the file already exists, it is overwritten (equivalent to deleting the file and writing a new one). What if we want to append text to the end of the file? You can pass in 'a' mode to append text to the end of the file.
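The difference between the two modes can be checked with a small experiment. The file name notes.txt and the temporary directory are assumptions for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    notes_path = os.path.join(tmp_dir, 'notes.txt')

    with open(notes_path, 'w', encoding='utf-8') as f:
        f.write('first\n')

    # Opening in 'w' mode again discards the previous content.
    with open(notes_path, 'w', encoding='utf-8') as f:
        f.write('second\n')
    with open(notes_path, 'r', encoding='utf-8') as f:
        after_overwrite = f.read()

    # Opening in 'a' mode keeps the content and writes at the end.
    with open(notes_path, 'a', encoding='utf-8') as f:
        f.write('third\n')
    with open(notes_path, 'r', encoding='utf-8') as f:
        after_append = f.read()
```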