8 Ways to Use Python With Excel

Updated on April 13, 2020
klanguedoc profile image

Kevin is data science and data engineer. He works for a large consulting company in Montreal, Canada. He has over 20 years experience.

Python and Excel are both powerful tools for data exploration and analysis. They're both powerful, and even more so together. There are different libraries that have been created over the past several years to integrate Excel and Python or vice versa. This article will describe them, provide details to acquire and install them and finally brief instructions to help you get started using them. The libraries are listed below.

Excel/Python Integration Options

  1. Openpyxl
  2. Pyxll
  3. Xlrd
  4. Xlwt
  5. Xlutils
  6. Pandas
  7. Pywin32
  8. Xlsxwriter

1. Openpyxl

Openpyxl is an open source library that supports OOXML standard. OOXML standards for open office extensible markup language. Openpyxl can be used with any version of Excel that supports this standard; meaning Excel 2010 (2007) to the present (currently Excel 2016). I haven’t tried or tested Openpyxl with Office 365. However, alternative spreadsheet application like Office Libre Calc or Open Office Calc which support the OOXML standard can also use the library to work with xlsx files.

Openpyxl support most Excel functionality or APIs, including read and write to files, charting, working with pivot tables, parsing formulas, using filters and sorts, creating tables, styling to name a few of the most used. In terms of data wrangling, the library works with datasets both large and small, however, you will see a performance degradation on very large datasets. To work with very large datasets, you will need to use the openpyxl.worksheet._read_only.ReadOnlyWorksheet API.

openpyxl.worksheet._read_only.ReadOnlyWorksheet is read only

Depending on memory availability of your computer, you can use this function to load large datasets into memory or into Anaconda or Jupyter notebook for data analysis or data wrangling. You cannot interface with Excel directly or interactively.

To write back your very large dataset, you use the openpyxl.worksheet._write_only.WriteOnlyWorksheet API to dump the data back into Excel.

Openpyxl can be installed into any Python support editor or IDE, like Anaconda or IPython, Jupyter or any other that you currently use. Openpyxl cannot be used directly inside of Excel.

Note: for this examples I am using Jupyter from the Anaconda suite which can be downloaded and installed from this address: https://www.anaconda.com/distribution/ or you can install just the Jupyter editor from : https://jupyter.org/

Installation

To install from command line (command or powershell on Windows, or Terminal on OSX):

Pip install openpyxl

Create Workbook

To use to create an Excel workbook and worksheet:

from openpyxl import Workbook

#create workbook
wb = Workbook()

#create excel file
xl_file = 'tut_openpyxl.xlsx'

#get the active worksheet (e.g. sheet1)
ws1 = wb.active

#add content to the sheet
for row in range(1, 40):
    ws1.append(range(600))

#save the file
wb.save(filename = xl_file)
  • In the code above, we start by importing the Workbook object from the openpyxl library
  • Next we define a workbook object
  • Then we create an Excel file to store our data
  • From the open excel Workbook, we get a handle on the active Worksheet (ws1)
  • Afterwards, add some content using a “for” loop
  • And finally save the file.

The two following screenshots show the execution of the tut_openpyxl.py file and save.

Fig 1: Code
Fig 1: Code
Fig2: Output in Excel
Fig2: Output in Excel

Read Data From Excel

The next example will demonstrate opening and reading data from an Excel file

from openpyxl import load_workbook

#get handle on existing file
wk = load_workbook(filename='countries.xlsx')
#get active worksheet or wk['some_worksheet']
ws = wk.active

#loop through range values
for t in range(1,20): 
    range = 'A'+str(t)    
    print(ws[range].value)
  • This is a basic example to read from an Excel file
  • Import the load_workbook class from the openpyxl library
  • Get a handle on the open workbook
  • Get the active worksheet or a named worksheet using workbook[‘some_worksheet’]
  • Finally, loop through the values on the sheet

Fig 3: Read in Data
Fig 3: Read in Data

2. Pyxll

The pyxll package is a commercial offering that can be added or integrated into Excel. A bit like VBA. The pyxll package cannot be installed like other standard Python packages since pyxll is an Excel add-in. Pyxll supports Excel versions from 97-2003 up to the present.

Installation

Installation instruction are located here: https://www.pyxll.com/docs/userguide/installation.html

Usage

The pyxll website contains several examples on using pyxll in Excel. They make use of decorators and functions to interact with a worksheet, menu and other objects in a workbook.

3. Xlrd

Another library is xlrd and its companion xlwt below. Xlrd is used to read data from an Excel Workbook. Xlrd was designed to work with older versions of Excel with the “xls” extension.

Installation

Installation of the xlrd library is done with pip as:

pip install xlrd
Import xlrd
xlrd.open_workbook(excelFilePath)
sheetNames = xlWorkbook.sheet_names()
xlSheet = xlWorkbook.sheet_by_name(sheetNames[0])

# Or grab the first sheet by index 
xlSheet = xlWorkbook.sheet_by_index(0)

# Get the first row of data
row = xlSheet.row(0)

#to enumerate through all columns and rows
#get the number of rows in the sheet
numColumns = xlSheet.ncols 
  
for rowIdx in range(0, xlSheet.nrows):    # Iterate through rows
    
    print ('Row: %s' % rowIdx) # Print the row number
    for colIdx in range(0, numColumns):  # Iterate through the columns
        cell = xlSheet.cell(rowIdx, colIdx) # Get cell object by row, col
        print ('Column: [%s] cell: [%s]' % (colIdx, cell))

Usage

To open a workbook to read in the data from a worksheet follow these simple steps as in the code snippet below. The excelFilePath parameter is the path to the Excel file. The path value should be listed in double-quotes.

This brief example only covers the basic principle of opening a workbook and reading the data. The complete documentation can be found here: https://xlrd.readthedocs.io/en/latest/index.html

Of course, xlrd, as the name suggests, can only read in data from an Excel workbook. The library doesn’t provide the APIs to write to an Excel file. Luckily, xlrd has a partner called xlwt which is the next library to discuss.

4. Xlwt

The xlwt is designed to work with Excel files versions 95 through to 2003, which was the binary format prior to the OOXML (Open Office XML) format that was introduced with Excel 2007. The xlwt library works in candem with the xlrd library dscussed above.

Installation

The installation process is simple and straightforward. As with most other Python libraries, you can install using the pip utility as follows:

pip install xlwt

Usage

The following code snippet, adapted from the Read the Docs site on xlwt, provides the basic instructions on writing data to an Excel Worksheet, adding styling and using a formula. The syntax is easy to follow.

import xlwt
from datetime import datetime

style0 = xlwt.easyxf('font: name Times New Roman, color-index red, bold on',
    num_format_str='#,##0.00')
style1 = xlwt.easyxf(num_format_str='D-MMM-YY')

wb = xlwt.Workbook()
ws = wb.add_sheet('Hello world')
ws.write(0, 0, 999.99, style0)
ws.write(1, 0, datetime.now(), style1)
ws.write(2, 0, 1)
ws.write(2, 1, 1)
ws.write(2, 2, xlwt.Formula("A3+B3"))
wb.save(HW.xls')

The write function, write(r, c, label='', style=<xlwt.Style.XFStyle object>) , provides the main capabilities to write content to an Excel spreadsheet, with formulas being an alternate method. The “r” and “c” parameters are for the row and column respectfully.

Complete documentation on using this Python package is located here: https://xlwt.readthedocs.io/en/latest/ . As I mentioned in the opening paragraph, xlwt and xlrd for that matter, are for xls Excel formats (95-2003). For Excel OOXML, you should use other libraries discussed in this article.

5. Xlutils

The xlutils Python is a continuation of xlrd and xlwt. The package provides more extensive setr of APIs for working with xls based Excel files. Documentation on the package is found here: https://pypi.org/project/xlutils/. To use the package you also need to install the xlrd and xlwt packages.

Installation

The xlutils package is installed using pip:

pip install xlutils

6. Pandas

Pandas is a very powerful Python library used for data analysis, manipulation and exploration. It is one of the pillars of data engineering and data science. Once of the main tools or API in Pandas is the DataFrame, which is an in-memory table of data. Pandas can output the contents of the DataFrame to Excel using either openpyxl or xlsxwriter for OOXML files and xlwt (above) for xls file formats as its writing engine. You need to install these packages to work with Pandas. You don’t have to import them into your Python script to use them.

Installation

To install pandas, execute this command from the command line interface window, or terminal if you are using OSX:

pip install xlsxwriterp
pip install pandas

Usage

import pandas as pd

# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
# Create a Pandas Excel writer using XlsxWriter as the engine or openpyxl and xlwt for older versions of Excel.
writer = pd.ExcelWriter('pandas xl test.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Test')
# Close the Pandas Excel writer and output the Excel file.
writer.save()

Here is a screenshot of the script, the VS Code execution and Excel file that is created as a result.

Fig 4: Pandas script in VS Code
Fig 4: Pandas script in VS Code
Fig 5: Pandas output in Excel
Fig 5: Pandas output in Excel

7. Xlsxwriter

The xlsxwriter package supports the OOXML format Excel, which means 2007 onwards. It is a full feature package including formatting, cell manipulation, formulas, pivot tables, charts, filters, data validation and drop-down list, memory optimization and images to name of the extensive features.

As mentioned previously, it is integrated with Pandas as well which makes it a wicked combination.

The complete documentation is located at their site here: https://xlsxwriter.readthedocs.io/

Installation

pip install xlsxwriter

Usage

import xlsxwriter

# create a Excel file
xlWb = xlsxwriter.Workbook('simpleXl.xlsx')
xlWks = xlWb.add_worksheet()

# add some data
groceries = (
    ['apples', 2.00],
    ['bread',  1.00],
    ['meat',  15.00],
    ['celery', 1.00],
)
row = 0
col = 0
# add groceries data to sheet
for item, cost in (groceries):
    xlWks.write(row, col,     item)
    xlWks.write(row, col + 1, cost)
    row += 1

# Write a total using a formula.
xlWks.write(row, 0, 'Total')
xlWks.write(row, 1, '=SUM(B1:B4)')

xlWb.close()
xlWb.close()

The following script starts by importing the xlsxwriter package from PYPI repository using pip. Next, define and create a workbook and Excel file. Then we define a worksheet object, xlWks, and add it to the Workbook.

For the sake of the example, I define a dictionary object, but can be anything like a list, a Pandas dataframe, data imported from some external source. I add the data to the Worksheet using an interation and add a simple SUM formula before saving and closing the file.

The following screenshot is the result in Excel.

Fig 6: XLSXWriter in Excel
Fig 6: XLSXWriter in Excel

8. Pywin32

This final Python package is not specifically for Excel. Rather, it is a Python wrapper for the Windows API which provides access to COM (Common Object Model). COM is a common interface to all Windows based applications, Microsoft Office including Excel.

Documentation on the pywin32 package is located here: https://github.com/mhammond/pywin32 and here as well: http://timgolden.me.uk/pywin32-docs/contents.html

Installation

pip install pywin32

Usage

This is a simple example of using COM to automate the creation of an Excel file, adding a worksheet and some data as well as adding a formula and saving the file.

import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')


wb = excel.Workbooks.Add()
wks = wb.Sheets.Add()
wks.Name = "test"
groceries = (
    ['apples', 2.00],
    ['bread',  1.00],
    ['meat',  15.00],
    ['celery', 1.00],
)

row=1
col=1
for item, cost in (groceries):
    wks.Cells(row,col).Value = item
    wks.Cells(row,col+1).Value = cost
   
    row += 1

wks.Cells(row, 1).Value = "Total"
wks.Cells(row, 2).Value =  '=SUM(B1:B4)'
wb.SaveAs('C:\\Users\\kevin\\dev\\pyInExcel\\simplewin32.xlsx')
excel.Application.Quit()
Fig 7: Pywin32 output in Excel
Fig 7: Pywin32 output in Excel

Conclusion

There you have it: eight different Python packages for interfacing with Excel.

This article is accurate and true to the best of the author’s knowledge. Content is for informational or entertainment purposes only and does not substitute for personal counsel or professional advice in business, financial, legal, or technical matters.

© 2020 Kevin Languedoc

Comments

    0 of 8192 characters used
    Post Comment
    • klanguedoc profile imageAUTHOR

      Kevin Languedoc 

      2 months ago from Canada

      There are different ways to approach this. You can use an ORM library like https://www.sqlalchemy.org/ or a full stack framework like Python or Flask.

    • platinumOwl4 profile image

      platinumOwl4 

      2 months ago

      This is a great article packed with fantastic information. However, what I need is a library that will help me create a database for books and music.

      Thanks in advance platimumowl4

    working

    This website uses cookies

    As a user in the EEA, your approval is needed on a few things. To provide a better website experience, owlcation.com uses cookies (and other similar technologies) and may collect, process, and share personal data. Please choose which areas of our service you consent to our doing so.

    For more information on managing or withdrawing consents and how we handle data, visit our Privacy Policy at: https://maven.io/company/pages/privacy

    Show Details
    Necessary
    HubPages Device IDThis is used to identify particular browsers or devices when the access the service, and is used for security reasons.
    LoginThis is necessary to sign in to the HubPages Service.
    Google RecaptchaThis is used to prevent bots and spam. (Privacy Policy)
    AkismetThis is used to detect comment spam. (Privacy Policy)
    HubPages Google AnalyticsThis is used to provide data on traffic to our website, all personally identifyable data is anonymized. (Privacy Policy)
    HubPages Traffic PixelThis is used to collect data on traffic to articles and other pages on our site. Unless you are signed in to a HubPages account, all personally identifiable information is anonymized.
    Amazon Web ServicesThis is a cloud services platform that we used to host our service. (Privacy Policy)
    CloudflareThis is a cloud CDN service that we use to efficiently deliver files required for our service to operate such as javascript, cascading style sheets, images, and videos. (Privacy Policy)
    Google Hosted LibrariesJavascript software libraries such as jQuery are loaded at endpoints on the googleapis.com or gstatic.com domains, for performance and efficiency reasons. (Privacy Policy)
    Features
    Google Custom SearchThis is feature allows you to search the site. (Privacy Policy)
    Google MapsSome articles have Google Maps embedded in them. (Privacy Policy)
    Google ChartsThis is used to display charts and graphs on articles and the author center. (Privacy Policy)
    Google AdSense Host APIThis service allows you to sign up for or associate a Google AdSense account with HubPages, so that you can earn money from ads on your articles. No data is shared unless you engage with this feature. (Privacy Policy)
    Google YouTubeSome articles have YouTube videos embedded in them. (Privacy Policy)
    VimeoSome articles have Vimeo videos embedded in them. (Privacy Policy)
    PaypalThis is used for a registered author who enrolls in the HubPages Earnings program and requests to be paid via PayPal. No data is shared with Paypal unless you engage with this feature. (Privacy Policy)
    Facebook LoginYou can use this to streamline signing up for, or signing in to your Hubpages account. No data is shared with Facebook unless you engage with this feature. (Privacy Policy)
    MavenThis supports the Maven widget and search functionality. (Privacy Policy)
    Marketing
    Google AdSenseThis is an ad network. (Privacy Policy)
    Google DoubleClickGoogle provides ad serving technology and runs an ad network. (Privacy Policy)
    Index ExchangeThis is an ad network. (Privacy Policy)
    SovrnThis is an ad network. (Privacy Policy)
    Facebook AdsThis is an ad network. (Privacy Policy)
    Amazon Unified Ad MarketplaceThis is an ad network. (Privacy Policy)
    AppNexusThis is an ad network. (Privacy Policy)
    OpenxThis is an ad network. (Privacy Policy)
    Rubicon ProjectThis is an ad network. (Privacy Policy)
    TripleLiftThis is an ad network. (Privacy Policy)
    Say MediaWe partner with Say Media to deliver ad campaigns on our sites. (Privacy Policy)
    Remarketing PixelsWe may use remarketing pixels from advertising networks such as Google AdWords, Bing Ads, and Facebook in order to advertise the HubPages Service to people that have visited our sites.
    Conversion Tracking PixelsWe may use conversion tracking pixels from advertising networks such as Google AdWords, Bing Ads, and Facebook in order to identify when an advertisement has successfully resulted in the desired action, such as signing up for the HubPages Service or publishing an article on the HubPages Service.
    Statistics
    Author Google AnalyticsThis is used to provide traffic data and reports to the authors of articles on the HubPages Service. (Privacy Policy)
    ComscoreComScore is a media measurement and analytics company providing marketing data and analytics to enterprises, media and advertising agencies, and publishers. Non-consent will result in ComScore only processing obfuscated personal data. (Privacy Policy)
    Amazon Tracking PixelSome articles display amazon products as part of the Amazon Affiliate program, this pixel provides traffic statistics for those products (Privacy Policy)
    ClickscoThis is a data management platform studying reader behavior (Privacy Policy)