The python module, XLRD, is a developer tool for reading Microsoft Excel formats to 2003 using a library reverse-engineered in OpenOffice.org. The description states it is Unicode-aware, and I would like to add to that it is very, very Unicode-aware meaning you may have issues with files formatted with different language packs.
In my experience version 5.2 worked the best for exporting Excel columns and rows to ansi text. As of this writing, the home page needs to be updated, as it still has the latest version as 0.6.1, when PyPI has 0.7.1 as the lastest version. This may have changed as of the writing of this blog, but I do not keep track of the XLRD development. Google is probably the best bet in tracking the latest developments.
Just for reference, files ending with .xls are Excel formated files up to Excel 2003, and files ending with .xlsx are Excel files formated with Excel 2007 (pc) and Excel 2008 (apple).
Example Usage [via Activestate] : Recipe 483742: Easy Cross Platform Excel Parsing With Xlrd
The class allows you to create a generator which returns excel data one row at a time as either a list or dictionary.
Below is a simple example I wrote which will only collect data from a column if a condition is met in another column, i.e. the latter column contains positive whole number integers. I am including two functions called before and positivity which may be useful in removing undesirable results from your data, as I had to do in my full implementation. Please be pragmatic when using python modules other than the default modules included in CPython (the core python modules).
USING XLRD 5.2
book = xlrd.open_workbook(parser)
sh = book.sheet_by_name(wb)
for r in range(sh.nrows)[1:]:
# The item is the field column converted list
item = sh.row(r)[special_info]
# Example : To remove the last quote
before = len(str(item)) - 1
# Transform ouput to integer only (remove decimals)
ad = str(sh.row(r)[addcol])
# Example : To make sure the add column doesn't contain negative number
positivity = re.search(("[\-\][0-9]*[0-9]"), ad)
# If not a number, with the prefix identifier "text", then skip
if re.search('text', ad) :
pass # Skip line - Found junk data in badly formated Excel Spreadsheet
else :
print final_output
UPDATE: This is in no way an endorsement of XLRD as a solution.
I would not recommend building an application off of XLRD if you had a database to use for importing, such as MS SQL or Oracle. I used XLRD as a solution, only because of the lack of a database at the time I started development.
