Get Started Now

Python and and Excel Reader, XLRD

The python module, XLRD, is a developer tool for reading Microsoft Excel formats to 2003 using a library reverse-engineered in OpenOffice.org. The description states it is Unicode-aware, and I would like to add to that it is very, very Unicode-aware meaning you may have issues with files formatted with different language packs.

In my experience version 5.2 worked the best for exporting Excel columns and rows to ansi text. As of this writing, the home page needs to be updated, as it still has the latest version as 0.6.1, when PyPI has 0.7.1 as the lastest version. This may have changed as of the writing of this blog, but I do not keep track of the XLRD development. Google is probably the best bet in tracking the latest developments.

Just for reference, files ending with .xls are Excel formated files up to Excel 2003, and files ending with .xlsx are Excel files formated with Excel 2007 (pc) and Excel 2008 (apple).

Example Usage [via Activestate] : Recipe 483742: Easy Cross Platform Excel Parsing With Xlrd

The class allows you to create a generator which returns excel data one row at a time as either a list or dictionary.

Below is a simple example I wrote which will only collect data from a column if a condition is met in another column, i.e. the latter column contains positive whole number integers. I am including two functions called before and positivity which may be useful in removing undesirable results from your data, as I had to do in my full implementation. Please be pragmatic when using python modules other than the default modules included in CPython (the core python modules).

USING XLRD 5.2

book = xlrd.open_workbook(parser)
sh = book.sheet_by_name(wb)

for r in range(sh.nrows)[1:]:
    # The item is the field column converted list
    item = sh.row(r)[special_info]

    # Example : To remove the last quote
    before = len(str(item)) - 1

    # Transform ouput to integer only (remove decimals)
    ad = str(sh.row(r)[addcol])

    # Example : To make sure the add column doesn't contain negative number
    positivity = re.search(("[\-\][0-9]*[0-9]"), ad)

    # If not a number, with the prefix identifier "text", then skip
    if re.search('text', ad) :
        pass # Skip line - Found junk data in badly formated Excel Spreadsheet
    else  :
        print final_output

UPDATE: This is in no way an endorsement of XLRD as a solution.

I would not recommend building an application off of XLRD if you had a database to use for importing, such as MS SQL or Oracle. I used XLRD as a solution, only because of the lack of a database at the time I started development.

  • Share/Bookmark

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>