8 Ways to Use Python With Excel - Owlcation - Education
Updated date:

8 Ways to Use Python With Excel

Kevin is data science and data engineer. He works for a large consulting company in Montreal, Canada. He has over 20 years experience.

Python and Excel are both powerful tools for data exploration and analysis. They're both powerful, and even more so together. There are different libraries that have been created over the past several years to integrate Excel and Python or vice versa. This article will describe them, provide details to acquire and install them and finally brief instructions to help you get started using them. The libraries are listed below.

Excel/Python Integration Options

  1. Openpyxl
  2. Pyxll
  3. Xlrd
  4. Xlwt
  5. Xlutils
  6. Pandas
  7. Pywin32
  8. Xlsxwriter

1. Openpyxl

Openpyxl is an open source library that supports OOXML standard. OOXML standards for open office extensible markup language. Openpyxl can be used with any version of Excel that supports this standard; meaning Excel 2010 (2007) to the present (currently Excel 2016). I haven’t tried or tested Openpyxl with Office 365. However, alternative spreadsheet application like Office Libre Calc or Open Office Calc which support the OOXML standard can also use the library to work with xlsx files.

Openpyxl support most Excel functionality or APIs, including read and write to files, charting, working with pivot tables, parsing formulas, using filters and sorts, creating tables, styling to name a few of the most used. In terms of data wrangling, the library works with datasets both large and small, however, you will see a performance degradation on very large datasets. To work with very large datasets, you will need to use the openpyxl.worksheet._read_only.ReadOnlyWorksheet API.

openpyxl.worksheet._read_only.ReadOnlyWorksheet is read only

Depending on memory availability of your computer, you can use this function to load large datasets into memory or into Anaconda or Jupyter notebook for data analysis or data wrangling. You cannot interface with Excel directly or interactively.

To write back your very large dataset, you use the openpyxl.worksheet._write_only.WriteOnlyWorksheet API to dump the data back into Excel.

Openpyxl can be installed into any Python support editor or IDE, like Anaconda or IPython, Jupyter or any other that you currently use. Openpyxl cannot be used directly inside of Excel.

Note: for this examples I am using Jupyter from the Anaconda suite which can be downloaded and installed from this address: https://www.anaconda.com/distribution/ or you can install just the Jupyter editor from : https://jupyter.org/

Installation

To install from command line (command or powershell on Windows, or Terminal on OSX):

Pip install openpyxl

Create Workbook

To use to create an Excel workbook and worksheet:

from openpyxl import Workbook

#create workbook
wb = Workbook()

#create excel file
xl_file = 'tut_openpyxl.xlsx'

#get the active worksheet (e.g. sheet1)
ws1 = wb.active

#add content to the sheet
for row in range(1, 40):
    ws1.append(range(600))

#save the file
wb.save(filename = xl_file)
  • In the code above, we start by importing the Workbook object from the openpyxl library
  • Next we define a workbook object
  • Then we create an Excel file to store our data
  • From the open excel Workbook, we get a handle on the active Worksheet (ws1)
  • Afterwards, add some content using a “for” loop
  • And finally save the file.

The two following screenshots show the execution of the tut_openpyxl.py file and save.

Fig 1: Code

Fig 1: Code

Fig2: Output in Excel

Fig2: Output in Excel

Read Data From Excel

The next example will demonstrate opening and reading data from an Excel file

from openpyxl import load_workbook

#get handle on existing file
wk = load_workbook(filename='countries.xlsx')
#get active worksheet or wk['some_worksheet']
ws = wk.active

#loop through range values
for t in range(1,20): 
    range = 'A'+str(t)    
    print(ws[range].value)
  • This is a basic example to read from an Excel file
  • Import the load_workbook class from the openpyxl library
  • Get a handle on the open workbook
  • Get the active worksheet or a named worksheet using workbook[‘some_worksheet’]
  • Finally, loop through the values on the sheet
Fig 3: Read in Data

Fig 3: Read in Data

2. Pyxll

The pyxll package is a commercial offering that can be added or integrated into Excel. A bit like VBA. The pyxll package cannot be installed like other standard Python packages since pyxll is an Excel add-in. Pyxll supports Excel versions from 97-2003 up to the present.

Installation

Installation instruction are located here: https://www.pyxll.com/docs/userguide/installation.html

Usage

The pyxll website contains several examples on using pyxll in Excel. They make use of decorators and functions to interact with a worksheet, menu and other objects in a workbook.

3. Xlrd

Another library is xlrd and its companion xlwt below. Xlrd is used to read data from an Excel Workbook. Xlrd was designed to work with older versions of Excel with the “xls” extension.

Installation

Installation of the xlrd library is done with pip as:

pip install xlrd
Import xlrd
xlrd.open_workbook(excelFilePath)
sheetNames = xlWorkbook.sheet_names()
xlSheet = xlWorkbook.sheet_by_name(sheetNames[0])

# Or grab the first sheet by index 
xlSheet = xlWorkbook.sheet_by_index(0)

# Get the first row of data
row = xlSheet.row(0)

#to enumerate through all columns and rows
#get the number of rows in the sheet
numColumns = xlSheet.ncols 
  
for rowIdx in range(0, xlSheet.nrows):    # Iterate through rows
    
    print ('Row: %s' % rowIdx) # Print the row number
    for colIdx in range(0, numColumns):  # Iterate through the columns
        cell = xlSheet.cell(rowIdx, colIdx) # Get cell object by row, col
        print ('Column: [%s] cell: [%s]' % (colIdx, cell))

Usage

To open a workbook to read in the data from a worksheet follow these simple steps as in the code snippet below. The excelFilePath parameter is the path to the Excel file. The path value should be listed in double-quotes.

This brief example only covers the basic principle of opening a workbook and reading the data. The complete documentation can be found here: https://xlrd.readthedocs.io/en/latest/index.html

Of course, xlrd, as the name suggests, can only read in data from an Excel workbook. The library doesn’t provide the APIs to write to an Excel file. Luckily, xlrd has a partner called xlwt which is the next library to discuss.

4. Xlwt

The xlwt is designed to work with Excel files versions 95 through to 2003, which was the binary format prior to the OOXML (Open Office XML) format that was introduced with Excel 2007. The xlwt library works in candem with the xlrd library dscussed above.

Installation

The installation process is simple and straightforward. As with most other Python libraries, you can install using the pip utility as follows:

pip install xlwt

Usage

The following code snippet, adapted from the Read the Docs site on xlwt, provides the basic instructions on writing data to an Excel Worksheet, adding styling and using a formula. The syntax is easy to follow.

import xlwt
from datetime import datetime

style0 = xlwt.easyxf('font: name Times New Roman, color-index red, bold on',
    num_format_str='#,##0.00')
style1 = xlwt.easyxf(num_format_str='D-MMM-YY')

wb = xlwt.Workbook()
ws = wb.add_sheet('Hello world')
ws.write(0, 0, 999.99, style0)
ws.write(1, 0, datetime.now(), style1)
ws.write(2, 0, 1)
ws.write(2, 1, 1)
ws.write(2, 2, xlwt.Formula("A3+B3"))
wb.save(HW.xls')

The write function, write(r, c, label='', style=<xlwt.Style.XFStyle object>) , provides the main capabilities to write content to an Excel spreadsheet, with formulas being an alternate method. The “r” and “c” parameters are for the row and column respectfully.

Complete documentation on using this Python package is located here: https://xlwt.readthedocs.io/en/latest/ . As I mentioned in the opening paragraph, xlwt and xlrd for that matter, are for xls Excel formats (95-2003). For Excel OOXML, you should use other libraries discussed in this article.

5. Xlutils

The xlutils Python is a continuation of xlrd and xlwt. The package provides more extensive setr of APIs for working with xls based Excel files. Documentation on the package is found here: https://pypi.org/project/xlutils/. To use the package you also need to install the xlrd and xlwt packages.

Installation

The xlutils package is installed using pip:

pip install xlutils

6. Pandas

Pandas is a very powerful Python library used for data analysis, manipulation and exploration. It is one of the pillars of data engineering and data science. Once of the main tools or API in Pandas is the DataFrame, which is an in-memory table of data. Pandas can output the contents of the DataFrame to Excel using either openpyxl or xlsxwriter for OOXML files and xlwt (above) for xls file formats as its writing engine. You need to install these packages to work with Pandas. You don’t have to import them into your Python script to use them.

Installation

To install pandas, execute this command from the command line interface window, or terminal if you are using OSX:

pip install xlsxwriterp
pip install pandas

Usage

import pandas as pd

# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
# Create a Pandas Excel writer using XlsxWriter as the engine or openpyxl and xlwt for older versions of Excel.
writer = pd.ExcelWriter('pandas xl test.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Test')
# Close the Pandas Excel writer and output the Excel file.
writer.save()

Here is a screenshot of the script, the VS Code execution and Excel file that is created as a result.

Fig 4: Pandas script in VS Code

Fig 4: Pandas script in VS Code

Fig 5: Pandas output in Excel

Fig 5: Pandas output in Excel

7. Xlsxwriter

The xlsxwriter package supports the OOXML format Excel, which means 2007 onwards. It is a full feature package including formatting, cell manipulation, formulas, pivot tables, charts, filters, data validation and drop-down list, memory optimization and images to name of the extensive features.

As mentioned previously, it is integrated with Pandas as well which makes it a wicked combination.

The complete documentation is located at their site here: https://xlsxwriter.readthedocs.io/

Installation

pip install xlsxwriter

Usage

import xlsxwriter

# create a Excel file
xlWb = xlsxwriter.Workbook('simpleXl.xlsx')
xlWks = xlWb.add_worksheet()

# add some data
groceries = (
    ['apples', 2.00],
    ['bread',  1.00],
    ['meat',  15.00],
    ['celery', 1.00],
)
row = 0
col = 0
# add groceries data to sheet
for item, cost in (groceries):
    xlWks.write(row, col,     item)
    xlWks.write(row, col + 1, cost)
    row += 1

# Write a total using a formula.
xlWks.write(row, 0, 'Total')
xlWks.write(row, 1, '=SUM(B1:B4)')

xlWb.close()
xlWb.close()

The following script starts by importing the xlsxwriter package from PYPI repository using pip. Next, define and create a workbook and Excel file. Then we define a worksheet object, xlWks, and add it to the Workbook.

For the sake of the example, I define a dictionary object, but can be anything like a list, a Pandas dataframe, data imported from some external source. I add the data to the Worksheet using an interation and add a simple SUM formula before saving and closing the file.

The following screenshot is the result in Excel.

Fig 6: XLSXWriter in Excel

Fig 6: XLSXWriter in Excel

8. Pywin32

This final Python package is not specifically for Excel. Rather, it is a Python wrapper for the Windows API which provides access to COM (Common Object Model). COM is a common interface to all Windows based applications, Microsoft Office including Excel.

Documentation on the pywin32 package is located here: https://github.com/mhammond/pywin32 and here as well: http://timgolden.me.uk/pywin32-docs/contents.html

Installation

pip install pywin32

Usage

This is a simple example of using COM to automate the creation of an Excel file, adding a worksheet and some data as well as adding a formula and saving the file.

import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')


wb = excel.Workbooks.Add()
wks = wb.Sheets.Add()
wks.Name = "test"
groceries = (
    ['apples', 2.00],
    ['bread',  1.00],
    ['meat',  15.00],
    ['celery', 1.00],
)

row=1
col=1
for item, cost in (groceries):
    wks.Cells(row,col).Value = item
    wks.Cells(row,col+1).Value = cost
   
    row += 1

wks.Cells(row, 1).Value = "Total"
wks.Cells(row, 2).Value =  '=SUM(B1:B4)'
wb.SaveAs('C:\\Users\\kevin\\dev\\pyInExcel\\simplewin32.xlsx')
excel.Application.Quit()
Fig 7: Pywin32 output in Excel

Fig 7: Pywin32 output in Excel

Conclusion

There you have it: eight different Python packages for interfacing with Excel.

This article is accurate and true to the best of the author’s knowledge. Content is for informational or entertainment purposes only and does not substitute for personal counsel or professional advice in business, financial, legal, or technical matters.

© 2020 Kevin Languedoc

Comments

Kevin Languedoc (author) from Canada on March 31, 2020:

There are different ways to approach this. You can use an ORM library like https://www.sqlalchemy.org/ or a full stack framework like Python or Flask.

platinumOwl4 on March 30, 2020:

This is a great article packed with fantastic information. However, what I need is a library that will help me create a database for books and music.

Thanks in advance platimumowl4