
How to Read Data From a File in Python

A professional engineer and writer helping people find creative ways to solve everyday problems.


This article walks you through using Python to read a table of data from a text file. It includes a complete, ready-to-run code snippet, so you should be able to apply the technique within about 10 minutes of reading.

If you are new to Python programming, we will first introduce the advanced data structures that we will use to store and process the data from the file.

Advanced Data-Structures in Python

Two widely used packages, NumPy and pandas, give Python two advanced and powerful data structures: the ndarray and the DataFrame. They make Python well suited to numerical, data-intensive applications and a strong competitor to MATLAB.

ndarrays and dataframes


NumPy Arrays

NumPy arrays, or ndarrays, can have any number ('n') of dimensions. In this tutorial we use them as two-dimensional arrays, which is how matrices are represented. The NumPy package also provides an extensive library of functions for numerical and linear-algebra operations.
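As a quick illustration (a minimal sketch separate from the tutorial's file-reading code, with arbitrary values), the snippet below builds a two-dimensional ndarray, checks its number of dimensions and shape, multiplies it by its transpose, and creates a three-dimensional array to show that more than two dimensions are possible.

```python
import numpy

# A 2 x 3 matrix stored as a two-dimensional ndarray (arbitrary example values)
M = numpy.array([[1, 2, 3],
                 [4, 5, 6]])

print(M.ndim)    # number of dimensions -> 2
print(M.shape)   # (rows, columns) -> (2, 3)

# Matrix multiplication with the @ operator: (2 x 3) @ (3 x 2) -> 2 x 2
P = M @ M.T
print(P.tolist())  # [[14, 32], [32, 77]]

# ndarrays are not limited to two dimensions: a 2 x 3 x 4 array of zeros
T = numpy.zeros((2, 3, 4))
print(T.ndim)    # 3
```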

Pandas Data-Frames

Data-frames build on the two-dimensional ndarray and add extra functionality. Every row now has an index label, and every column has a header by which it can be addressed individually. More importantly, each column can hold a different data type (int, float, or string).
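A minimal sketch of a DataFrame whose columns hold three different data types. The values are made up, and the column names other than BusNumber and BusLoadMW (which come from this article's code) are hypothetical.

```python
import pandas

# Three columns, each with its own data type (made-up example values)
df = pandas.DataFrame({
    'BusNumber': [1, 2, 3],            # integers
    'BusLoadMW': [0.0, 10.5, 20.25],   # floats
    'Name': ['Bus1', 'Bus2', 'Bus3'],  # strings (hypothetical column)
})

print(df.dtypes)        # one dtype per column: integer, float, and a text dtype
print(df['BusLoadMW'])  # a single column, addressed by its header
print(df.index)         # the row index: labels 0, 1, 2
```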

Text File to be Read by Python Code

Let's move on to the tutorial and introduce the demonstration text file.

It is a 14-row by 20-column data table saved as a text file named BusData.txt. It contains data of all three types: integers, floats, and strings.

Fig 1: Data file that is to be parsed with our code.


The code snippet below reads this file; the following section explains it line by line.

Python Code to Read Data From Text File

# Copyright © Ali Khan (Author)
# Permitted to use with attribution

import numpy
import pandas

def Read():

    global BusData, BusDataList, BusDataArray, BusDataReshaped

    with open('C:/Users/user/OneDrive - Washington State University (email.wsu.edu)/EE - 521/BusData.txt', 'r') as X:  # replace with the path to your own copy
        BusData = X.read()  # the whole file as one string; X is closed when the with block ends

    BusDataList = BusData.split()  # split at any whitespace -> flat list of tokens
    BusDataArray = numpy.array(BusDataList)  # 1D array of strings
    BusDataReshaped = BusDataArray.reshape(14,20) # Make a 14 x 20 matrix out of the 1D array

Read()

BusDataFrame = pandas.DataFrame(BusDataReshaped, columns =['BusNumber', 'Bus', 'Busx', 'BusClass', 'Unused1', 'Unused2', 'BusType', 'Unused3', 'Unused4', 'BusLoadMW', 'BusLoadMVAR', 'BusGenMW', 'BusGenMVAR', 'InitialVoltAngle', 'InitialVolt', 'Qlimit+', 'Qlimit-', 'Unused5', 'Shunt', 'Unused'])

print(BusDataFrame.BusClass[6])
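As an alternative to splitting and reshaping by hand (a swapped-in technique, not the author's method), pandas.read_csv() can parse a whitespace-separated file directly and infers a numeric type for each column on its own. This sketch assumes the file has no header row and the same number of whitespace-separated fields on every line, as the reshape(14,20) call implies. To stay self-contained it reads a made-up two-line sample from an io.StringIO object, with four hypothetical column names, instead of BusData.txt.

```python
import io
import pandas

# Made-up sample standing in for BusData.txt: two rows, four whitespace-separated fields.
# With the real file you would pass its path instead of this StringIO object.
sample = io.StringIO(
    "1 Bus1 1.00 0.0\n"
    "2 Bus2 0.98 12.5\n"
)

# sep=r'\s+' splits at any run of spaces or tabs; header=None because there is no header row.
# names= supplies the column headers (hypothetical names for this four-column sample).
df = pandas.read_csv(sample, sep=r'\s+', header=None,
                     names=['BusNumber', 'Name', 'Voltage', 'LoadMW'])

print(df.dtypes)  # BusNumber is parsed as int, Voltage and LoadMW as float, Name stays text
```

For the real file, the call would take the path to BusData.txt and the list of 20 headers used in the listing above.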

Code Explanation

Here is an explanation of what is going on in the code.

Initialization: Import Numpy and Pandas

Line 4: Import the numpy package into the script.

Line 5: Import the pandas package into the script.

Line 7: Start the definition of the function Read(). Breaking your code into functions is good practice.

Line 9: Declare the variables as global.

In Spyder, the variable explorer lists only global variables, and only global variables can be referenced outside the function that creates them. For demonstration, I have declared all four as global; otherwise, only the BusDataReshaped variable would need to be global.
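A common alternative to global variables is to return the result from the function. The sketch below assumes only the final matrix is needed outside the function; read_table() is a hypothetical name, and it takes the file path and the table dimensions as parameters. The demo writes a small made-up file to a temporary directory so it can run anywhere.

```python
import os
import tempfile
import numpy

def read_table(path, rows, cols):
    """Read a whitespace-separated text file and return it as a rows x cols array of strings."""
    with open(path, 'r') as f:        # the file is closed automatically when the block ends
        tokens = f.read().split()     # whole file -> flat list of tokens
    return numpy.array(tokens).reshape(rows, cols)

# Demo with a made-up 2 x 3 file instead of BusData.txt
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    with open(path, 'w') as f:
        f.write("1 a 0.5\n2 b 1.5\n")
    table = read_table(path, 2, 3)

print(table.shape)  # (2, 3)
```

Because table is assigned at the top level of the script, it is itself a global variable and would still appear in Spyder's variable explorer.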

Fig 2: Spyder variable explorer showing details of every variable.


Open and Read the Target File

Line 11: The open() function opens the file BusData.txt at the given path; the 'r' argument opens it in read mode. The resulting file object is assigned to the variable X (the name itself is arbitrary).

Line 12: The read() method reads the entire file as a single string and assigns it to the variable BusData. Fig 2 shows that BusData is now a string of 1792 characters.
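A minimal sketch of the open-and-read step using a with statement, which closes the file automatically once the block ends. It first writes a small made-up file to a temporary directory; the file name and contents are placeholders for BusData.txt.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'BusData_sample.txt')   # hypothetical stand-in for BusData.txt
    with open(path, 'w') as f:
        f.write("1 Bus1 1.06\n2 Bus2 1.045\n")

    # 'r' opens the file for reading in text mode; read() returns the whole file as one str
    with open(path, 'r') as X:
        BusData = X.read()
    closed_after_block = X.closed   # True: the with statement has closed the file

print(type(BusData))  # <class 'str'>
print(len(BusData))   # 25 characters, newlines included
```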

Split the String at Whitespace

Line 14: Called with no arguments, split() splits the string at every run of whitespace (spaces, tabs, and line breaks) and returns a list. The data is now a list of 280 elements (14 x 20), assigned to the variable BusDataList. See Fig 2.
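A small made-up string shows why the line breaks disappear: split() with no argument treats spaces, tabs, and newlines alike, and collapses consecutive whitespace.

```python
# Made-up text as read() might return it: double space, a tab, and newlines mixed in
text = "1  Bus1\t1.06\n2 Bus2 1.045\n"

tokens = text.split()          # no argument: split at any run of whitespace
print(tokens)                  # ['1', 'Bus1', '1.06', '2', 'Bus2', '1.045']
print(len(tokens))             # 6 = 2 rows x 3 columns

# By contrast, split(' ') splits at every single space, leaving empty strings behind
print("1  Bus1".split(' '))    # ['1', '', 'Bus1']
```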

Convert to Numpy Array

Line 15: The list is converted into a numpy array by the numpy.array() function. Fig 2 shows that BusDataArray is now an array of datatype string and has 280 elements.
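An ndarray stores one data type for all of its elements. The sketch below (made-up tokens) shows that a list of strings becomes an array of NumPy's fixed-width Unicode string type, even for strings that look like numbers, and that astype() can convert an array whose strings are all numeric.

```python
import numpy

tokens = ['1', 'Bus1', '1.06', '2', 'Bus2', '1.045']   # made-up tokens, as returned by split()
arr = numpy.array(tokens)

print(arr.dtype)        # Unicode strings up to 5 characters (shown as <U5 on most machines)
print(arr.dtype.kind)   # 'U': every element is a string, including '1' and '1.06'
print(arr.size)         # 6

# Only an array whose strings are all numeric can be converted as a whole
numbers = numpy.array(['1.06', '1.045']).astype(float)
print(numbers.dtype)    # float64
```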

However, this one-dimensional array still does not look like the original data table in the file, so it needs to be reshaped.

Line 16: The array's reshape() method (the method form of numpy.reshape()) rearranges the 280 elements into our desired dimensions of 14 x 20. Fig 2 shows that BusDataReshaped is now an ndarray with dimensions 14 x 20.
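By default reshape() fills the new shape row by row, so the first 20 tokens become the first row, the next 20 the second row, and so on; this is why the flat list lines up with the original table. The total number of elements must match the new shape exactly, otherwise NumPy raises a ValueError. A small sketch with made-up values:

```python
import numpy

flat = numpy.array(['1', 'Bus1', '1.06', '2', 'Bus2', '1.045'])  # made-up 6-element array

table = flat.reshape(2, 3)   # 2 rows x 3 columns, filled row by row
print(table[0])              # first row: '1', 'Bus1', '1.06'
print(table[1, 2])           # row 1, column 2: 1.045

try:
    flat.reshape(4, 2)       # 8 slots for 6 elements
except ValueError as err:
    reshape_error = str(err)
    print("reshape failed:", reshape_error)
```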

As we can see, every value is still a string, although the original file also contains integers and floats. An ndarray holds a single data type, so to let each column have its own data type we convert the array into a pandas DataFrame.

Fig 3: BusDataFrame, showing its first 13 columns in variable explorer


Converting a Numpy Array to Data-frame

Line 20: This line converts the array of strings into a pandas DataFrame.

The pandas.DataFrame() constructor takes the reshaped NumPy array and, through its columns argument, the names of all 20 column headers. Fig 3 shows the resulting DataFrame, and Fig 2 confirms it in the variable explorer.
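A minimal sketch of the same constructor call on a made-up 2 x 3 array of strings, with hypothetical column names. The columns list must contain exactly one name per array column; otherwise pandas raises a ValueError.

```python
import numpy
import pandas

data = numpy.array([['1', 'Bus1', '1.06'],
                    ['2', 'Bus2', '1.045']])             # made-up 2 x 3 string array

df = pandas.DataFrame(data, columns=['BusNumber', 'Name', 'Voltage'])  # hypothetical headers
print(df.shape)          # (2, 3)
print(list(df.columns))  # ['BusNumber', 'Name', 'Voltage']

mismatch_raised = False
try:
    pandas.DataFrame(data, columns=['BusNumber', 'Name'])  # only 2 names for 3 columns
except ValueError as err:
    mismatch_raised = True
    print("constructor failed:", err)
```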

Fig 4: Print result


Referencing Values of a Pandas Dataframe

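Line 22: Values in the DataFrame can be accessed conveniently with the dataframe.columnheader[index] syntax. Here, BusDataFrame.BusClass[6] selects the BusClass column and returns the value in the row labelled 6, which with the default index is the seventh row.

A check of the column data types, however, shows that the DataFrame does not convert anything by itself: because it was built from an array of strings, every column, including the integer and float columns, still holds strings. The numeric columns have to be converted explicitly, for example with pandas.to_numeric().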
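Attribute access such as BusDataFrame.BusClass works only when the header is a valid Python identifier that does not clash with an existing DataFrame attribute; a header such as 'Qlimit+' must be accessed with brackets. Also, [6] selects by index label, while .iloc selects strictly by position. A sketch on a made-up frame:

```python
import pandas

df = pandas.DataFrame({'BusClass': ['A', 'B', 'C'],
                       'Qlimit+': [10.0, 20.0, 30.0]})   # made-up values

print(df.BusClass[1])          # attribute access + index label 1 -> B
print(df['BusClass'][1])       # equivalent bracket access         -> B
print(df['Qlimit+'][2])        # brackets required: 'Qlimit+' is not a valid identifier -> 30.0
print(df.loc[1, 'BusClass'])   # row label 1, column 'BusClass'    -> B
print(df.iloc[0, 1])           # row position 0, column position 1 -> 10.0
```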
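A minimal sketch of the explicit conversion. It assumes a helper, to_numeric_if_possible() (a hypothetical name), that converts a column with pandas.to_numeric() when every value parses as a number and otherwise leaves the column unchanged. Applied to a made-up frame built from a string array, it turns the number-like columns into integer and float columns.

```python
import numpy
import pandas

def to_numeric_if_possible(col):
    """Convert a column to a numeric dtype if all values parse as numbers; else return it unchanged."""
    try:
        return pandas.to_numeric(col)
    except (ValueError, TypeError):
        return col

data = numpy.array([['1', 'Bus1', '1.06'],
                    ['2', 'Bus2', '1.045']])             # made-up string data
df = pandas.DataFrame(data, columns=['BusNumber', 'Name', 'Voltage'])  # hypothetical headers

print(df.dtypes)   # every column holds strings at this point

# Convert column by column so each column keeps its own resulting dtype
for name in df.columns:
    df[name] = to_numeric_if_possible(df[name])

print(df.dtypes)   # BusNumber -> integer, Voltage -> float, Name unchanged (text)
```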


© 2022 StormsHalted