BUG: how to convert .xls to .xlsx because Pandas failed to open .xls files #58470

Chirawat3987 · 2024-04-29T08:35:36Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.read_excel('ZARR2001.xls')
df.to_excel('ZARR2001.xlsx')

Issue Description

The error that I get from pandas is:

ValueError Traceback (most recent call last)
Cell In[40], line 3
1 import pandas as pd
----> 3 df = pd.read_excel('ZARR2001.xls')
4 df.to_excel('ZARR2001.xlsx')

File /opt/conda/miniconda3/lib/python3.8/site-packages/pandas/util/_decorators.py:299, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
294 msg = (
295 f"Starting with Pandas version {version} all arguments of "
296 f"{func.__name__}{arguments} will be keyword-only"
297 )
298 warnings.warn(msg, FutureWarning, stacklevel=stacklevel)
--> 299 return func(*args, **kwargs)

File /opt/conda/miniconda3/lib/python3.8/site-packages/pandas/io/excel/_base.py:336, in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, storage_options)
334 if not isinstance(io, ExcelFile):
335 should_close = True
--> 336 io = ExcelFile(io, storage_options=storage_options, engine=engine)
337 elif engine and engine != io.engine:
338 raise ValueError(
339 "Engine should not be specified when passing "
340 "an ExcelFile - ExcelFile already has the engine set"
341 )

File /opt/conda/miniconda3/lib/python3.8/site-packages/pandas/io/excel/_base.py:1080, in ExcelFile.__init__(self, path_or_buffer, engine, storage_options)
1078 ext = "xls"
1079 else:
-> 1080 ext = inspect_excel_format(
1081 content=path_or_buffer, storage_options=storage_options
1082 )
1084 if ext == "ods":
1085 engine = "odf"

File /opt/conda/miniconda3/lib/python3.8/site-packages/pandas/io/excel/_base.py:974, in inspect_excel_format(path, content, storage_options)
972 return "xls"
973 elif not peek.startswith(ZIP_SIGNATURE):
--> 974 raise ValueError("File is not a recognized excel file")
976 # ZipFile typing is overly-strict
977 # python/typeshed#4212
978 zf = zipfile.ZipFile(stream) # type: ignore[arg-type]

ValueError: File is not a recognized excel file

Expected Behavior

I expected pandas to read the .xls file so that I could write it out as .xlsx. Please help.

Installed Versions

Replace this line with the output of pd.show_versions()

Aloqeely · 2024-04-29T13:21:57Z

Thanks for the report. You checked the box "I have confirmed this bug exists on the latest version of pandas", but I am able to read xls files on v2.2.2. Could you please check whether you are on the latest version, or send the output of pd.show_versions()?

Chirawat3987 · 2024-04-29T15:17:57Z

Here is the output of pd.show_versions():

INSTALLED VERSIONS

commit : 7c48ff4
python : 3.8.15.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.0-0.deb10.16-cloud-amd64
Version : #1 SMP Debian 5.10.127-2~bpo10+1 (2022-07-28)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.5
numpy : 1.19.5
pytz : 2024.1
dateutil : 2.9.0
pip : 24.0
setuptools : 59.8.0
Cython : 0.29.37
pytest : None
hypothesis : None
sphinx : 7.1.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.12.2
pandas_datareader: None
bs4 : 4.12.3
bottleneck : None
fsspec : 0.9.0
fastparquet : 0.5.0
gcsfs : 0.8.0
matplotlib : 3.4.3
numexpr : 2.8.3
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 15.0.2
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : 1.4.49
tables : 3.6.1
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
xlwt : None
numba : 0.53.1

Aloqeely · 2024-04-29T15:57:41Z

You are still using pandas 1.2.5, which I don't think is supported anymore. Could you try upgrading to the latest version of pandas and see if that fixes your problem?
If you used pip to install pandas, you can update it by running: pip install --upgrade pandas

Chirawat3987 · 2024-04-29T16:04:22Z

How can I upgrade? This is the output I get from pip:

Requirement already satisfied: pandas in /opt/conda/miniconda3/lib/python3.8/site-packages (2.0.3)
Requirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/miniconda3/lib/python3.8/site-packages (from pandas) (2.9.0)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/miniconda3/lib/python3.8/site-packages (from pandas) (2024.1)
Requirement already satisfied: tzdata>=2022.1 in /opt/conda/miniconda3/lib/python3.8/site-packages (from pandas) (2024.1)
Requirement already satisfied: numpy>=1.20.3 in /opt/conda/miniconda3/lib/python3.8/site-packages (from pandas) (1.24.4)
Requirement already satisfied: six>=1.5 in /opt/conda/miniconda3/lib/python3.8/site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Note: you may need to restart the kernel to use updated packages.

Aloqeely · 2024-04-29T16:20:35Z

It seems like you are using conda. I am not familiar with it, but I think you can run: conda install -c conda-forge pandas
After doing that, you can run print(pd.__version__) to check whether pandas was updated.
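
The pip output earlier in this thread reports pandas 2.0.3 in site-packages, while pd.show_versions() reported 1.2.5. One possible explanation (an assumption, not something confirmed in this thread) is that the running kernel had not been restarted, or that pip and the notebook point at different environments. A minimal sketch for checking which interpreter is running and which version is installed for it, using only the standard library; the helper names installed_version and environment_report are made up for this example:

```python
import sys
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string for `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(package: str = "pandas") -> dict:
    """Collect the interpreter path and the installed version of `package`."""
    return {
        "interpreter": sys.executable,  # which Python this kernel is running
        "package": package,
        "version": installed_version(package),  # read from installed metadata
    }


print(environment_report())
```

If the version printed here differs from pd.__version__ in the same session, the session most likely imported pandas before the upgrade and needs a restart.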

rhshadrach · 2024-04-29T22:01:01Z

@Chirawat3987 - can you try doing

df = pd.read_excel('ZARR2001.xls', engine='xlrd')

pandas will try to automatically detect the Excel file format, but this is failing for your XLS file (it does not match the XLS signatures that the file should start with). But you can specify which engine to use, and pandas will succeed if the engine can open it.
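
To see for yourself what the file starts with, you can compare its first bytes against the well-known container signatures. This is only a rough sketch of the idea: the function names are invented for this example, and the check inside pandas may cover more signatures than the two shown here.

```python
# Leading bytes of the two container formats used by Excel files.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 compound file)
ZIP_SIGNATURE = b"PK\x03\x04"  # .xlsx and other ZIP-based formats


def sniff_container(head: bytes) -> str:
    """Classify the first bytes of a file as 'ole2', 'zip', or 'unknown'."""
    if head.startswith(OLE2_SIGNATURE):
        return "ole2"
    if head.startswith(ZIP_SIGNATURE):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as fh:
        return sniff_container(fh.read(8))
```

Calling sniff_file('ZARR2001.xls') and getting 'unknown' would mean the file is neither of these containers, whatever its extension says.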

Chirawat3987 · 2024-04-30T02:12:38Z

I tried that, but I get an error message. Please help.

rhshadrach · 2024-04-30T21:07:54Z

xlrd is not able to read the XLS file, so there is nothing that pandas can do. You can also try engine="calamine" after installing the python-calamine package; it is the only other engine built into pandas that can read xls files.

rhshadrach · 2024-04-30T21:08:41Z

Ah, but python-calamine support was only added recently. You will need to upgrade pandas to try it out, as @Aloqeely has advised.

Chirawat3987 · 2024-05-01T15:28:19Z

I have already upgraded to v2.2.2.

Then I ran: df = pd.read_excel('D:/test/ZARR2001.xls', engine='xlrd')

Error message:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'No.\tAcco'

Chirawat3987 · 2024-05-01T15:59:49Z

I upgraded pandas to the latest version and ran:
df = pd.read_excel('D:/test/ZARR2001.xls', engine='calamine')

Error message:
CalamineError: Cannot detect file format

asishm · 2024-05-01T16:49:45Z

Are you sure it's actually an xls file and not just a text file with a .xls extension? Try opening it in a text editor.
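
The bytes quoted in the xlrd error, b'No.\tAcco', look like the start of tab-separated text rather than a binary workbook. If that turns out to be the case, something along these lines might work. It is a sketch under assumptions: the tab delimiter and UTF-8 encoding are guesses about this particular file, the ASCII-only heuristic will reject text in other scripts, and both function names are made up for this example. Writing .xlsx with to_excel also needs an engine such as openpyxl to be installed.

```python
def looks_like_text(head: bytes) -> bool:
    """Heuristic: non-empty, no NUL bytes, only printable ASCII, tabs, or newlines."""
    if not head or b"\x00" in head:
        return False
    return all(32 <= byte < 127 or byte in (9, 10, 13) for byte in head)


def text_xls_to_xlsx(src: str, dst: str, sep: str = "\t", encoding: str = "utf-8") -> None:
    """Read a delimited text file that carries an .xls extension and write it as .xlsx."""
    # Imported here so the heuristic above can be used without pandas installed.
    import pandas as pd

    df = pd.read_csv(src, sep=sep, encoding=encoding)  # parse as plain delimited text
    df.to_excel(dst, index=False)  # write a real .xlsx workbook


# Example usage (paths taken from the thread):
# with open("ZARR2001.xls", "rb") as fh:
#     if looks_like_text(fh.read(64)):
#         text_xls_to_xlsx("ZARR2001.xls", "ZARR2001.xlsx")
```

If the columns come out wrong, the delimiter or encoding is probably different from what is assumed here, and opening the file in a text editor should show which one to use.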

rhshadrach · 2024-05-01T20:45:28Z

Thanks for the responses @Chirawat3987. pandas uses other packages to open Excel files (e.g. xlrd and calamine). Neither can open your file - so this is not a pandas issue. Closing.

Chirawat3987 added the Bug and Needs Triage labels Apr 29, 2024

rhshadrach added the IO Excel and Needs Info labels and removed the Needs Triage label Apr 29, 2024

rhshadrach added the Closing Candidate label and removed the Needs Info label Apr 30, 2024

rhshadrach closed this as completed May 1, 2024
