BUG: how to convert .xls to .xlsx because Pandas failed to open .xls files #58470

Closed
2 of 3 tasks
Chirawat3987 opened this issue Apr 29, 2024 · 13 comments
Labels
Bug Closing Candidate May be closeable, needs more eyeballs IO Excel read_excel, to_excel

Comments

@Chirawat3987

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.read_excel('ZARR2001.xls')
df.to_excel('ZARR2001.xlsx')

Issue Description

The error that I get from pandas is:


ValueError Traceback (most recent call last)
Cell In[40], line 3
1 import pandas as pd
----> 3 df = pd.read_excel('ZARR2001.xls')
4 df.to_excel('ZARR2001.xlsx')

File /opt/conda/miniconda3/lib/python3.8/site-packages/pandas/util/_decorators.py:299, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
294 msg = (
295 f"Starting with Pandas version {version} all arguments of "
296 f"{func.__name__}{arguments} will be keyword-only"
297 )
298 warnings.warn(msg, FutureWarning, stacklevel=stacklevel)
--> 299 return func(*args, **kwargs)

File /opt/conda/miniconda3/lib/python3.8/site-packages/pandas/io/excel/_base.py:336, in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, storage_options)
334 if not isinstance(io, ExcelFile):
335 should_close = True
--> 336 io = ExcelFile(io, storage_options=storage_options, engine=engine)
337 elif engine and engine != io.engine:
338 raise ValueError(
339 "Engine should not be specified when passing "
340 "an ExcelFile - ExcelFile already has the engine set"
341 )

File /opt/conda/miniconda3/lib/python3.8/site-packages/pandas/io/excel/_base.py:1080, in ExcelFile.__init__(self, path_or_buffer, engine, storage_options)
1078 ext = "xls"
1079 else:
-> 1080 ext = inspect_excel_format(
1081 content=path_or_buffer, storage_options=storage_options
1082 )
1084 if ext == "ods":
1085 engine = "odf"

File /opt/conda/miniconda3/lib/python3.8/site-packages/pandas/io/excel/_base.py:974, in inspect_excel_format(path, content, storage_options)
972 return "xls"
973 elif not peek.startswith(ZIP_SIGNATURE):
--> 974 raise ValueError("File is not a recognized excel file")
976 # ZipFile typing is overly-strict
977 # python/typeshed#4212
978 zf = zipfile.ZipFile(stream) # type: ignore[arg-type]

ValueError: File is not a recognized excel file

Expected Behavior

pandas should read the .xls file so that I can save it as .xlsx. Please help.

Installed Versions

Replace this line with the output of pd.show_versions()

@Chirawat3987 Chirawat3987 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 29, 2024
@Aloqeely
Contributor

Thanks for the report. You checked the box "I have confirmed this bug exists on the latest version of pandas", but I am able to read xls files on v2.2.2. Could you please check whether you are on the latest version, or send the output of pd.show_versions()?

@Chirawat3987
Author

Thanks for the report. You checked the box I have confirmed this bug exists on the latest version of pandas. but I am able to read xls files on v2.2.2, could you please check if you are on the latest version? Or send the output of pd.show_versions()

This is the result:

INSTALLED VERSIONS

commit : 7c48ff4
python : 3.8.15.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.0-0.deb10.16-cloud-amd64
Version : #1 SMP Debian 5.10.127-2~bpo10+1 (2022-07-28)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.5
numpy : 1.19.5
pytz : 2024.1
dateutil : 2.9.0
pip : 24.0
setuptools : 59.8.0
Cython : 0.29.37
pytest : None
hypothesis : None
sphinx : 7.1.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.12.2
pandas_datareader: None
bs4 : 4.12.3
bottleneck : None
fsspec : 0.9.0
fastparquet : 0.5.0
gcsfs : 0.8.0
matplotlib : 3.4.3
numexpr : 2.8.3
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 15.0.2
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : 1.4.49
tables : 3.6.1
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
xlwt : None
numba : 0.53.1

@Aloqeely
Contributor

You are still using pandas 1.2.5, which I don't think is supported anymore. Could you try upgrading to the latest version of pandas and see if that fixes your problem?
If you used pip to install pandas, you can run this to update it: pip install --upgrade pandas

@Chirawat3987
Author

You are still using pandas 1.2.5 which I don't think is supported anymore, could you try upgrading to the latest version of pandas and see if that fixes your problem? If you used pip to install pandas, you can run this to update it: pip install --upgrade pandas

How can I upgrade? This is what pip reports:

Requirement already satisfied: pandas in /opt/conda/miniconda3/lib/python3.8/site-packages (2.0.3)
Requirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/miniconda3/lib/python3.8/site-packages (from pandas) (2.9.0)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/miniconda3/lib/python3.8/site-packages (from pandas) (2024.1)
Requirement already satisfied: tzdata>=2022.1 in /opt/conda/miniconda3/lib/python3.8/site-packages (from pandas) (2024.1)
Requirement already satisfied: numpy>=1.20.3 in /opt/conda/miniconda3/lib/python3.8/site-packages (from pandas) (1.24.4)
Requirement already satisfied: six>=1.5 in /opt/conda/miniconda3/lib/python3.8/site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Note: you may need to restart the kernel to use updated packages.

@Aloqeely
Contributor

Aloqeely commented Apr 29, 2024

It seems like you are using conda. I am not familiar with it, but I think you can run conda install -c conda-forge pandas
After doing that, you can do print(pd.__version__) to see if pandas was updated.
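
Earlier in this thread, pd.show_versions() lists pandas 1.2.5 while pip lists 2.0.3 as already installed, and pip notes that the kernel may need a restart. One possible explanation is that the running kernel still holds an older import. A minimal sketch for comparing the version loaded in the current process with the version installed on disk, using only the standard library, might look like this. The helper name `describe_package` is hypothetical and not part of pandas.

```python
import importlib.metadata
import importlib.util
import sys


def describe_package(name):
    """Compare what this process has loaded with what is installed on disk.

    Hypothetical helper for illustration only; it is not a pandas API.
    """
    # Version of the module object already imported in this process, if any.
    module = sys.modules.get(name)
    loaded = getattr(module, "__version__", None) if module is not None else None

    # Version recorded in the installed distribution metadata, if any.
    try:
        installed = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        installed = None

    # Where an import of this name would come from.
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    location = spec.origin if spec is not None else None

    return {
        "python": sys.executable,
        "loaded_version": loaded,
        "installed_version": installed,
        "location": location,
    }


# Usage, after `import pandas` in the same session:
#     print(describe_package("pandas"))
```

If `loaded_version` and `installed_version` differ, restarting the kernel is a reasonable first step. If `python` points at an interpreter other than the one pip or conda updated, the upgrade went into a different environment.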

@rhshadrach
Member

rhshadrach commented Apr 29, 2024

@Chirawat3987 - can you try doing

df = pd.read_excel('ZARR2001.xls', engine='xlrd')

pandas will try to automatically detect the Excel file format, but this is failing for your XLS file (it does not match the XLS signatures that the file should start with). But you can specify which engine to use, and pandas will succeed if the engine can open it.
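
To see what the file actually starts with, a rough sketch of a signature check is shown below. It is not pandas' own detection code: it tests only the OLE2 compound-file signature used by binary .xls workbooks and the ZIP signature used by .xlsx, whereas pandas also accepts some older XLS signatures. The function name `sniff_spreadsheet` is hypothetical.

```python
# Well-known magic numbers; pandas checks more XLS variants than this sketch does.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # binary .xls container
ZIP_SIGNATURE = b"PK\x03\x04"  # .xlsx, .xlsm and .ods are ZIP archives


def sniff_spreadsheet(path, n=64):
    """Return a rough format guess and the first n bytes of the file.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as fh:
        head = fh.read(n)

    if head.startswith(OLE2_SIGNATURE):
        kind = "ole2"
    elif head.startswith(ZIP_SIGNATURE):
        kind = "zip"
    else:
        kind = "unknown"
    return kind, head


# Usage:
#     kind, head = sniff_spreadsheet("ZARR2001.xls")
#     print(kind, head)
```

If the result is "unknown" and the leading bytes look like readable text, the file is probably not a real Excel workbook, whatever its extension says.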

@rhshadrach rhshadrach added IO Excel read_excel, to_excel Needs Info Clarification about behavior needed to assess issue and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 29, 2024
@Chirawat3987
Author

@Chirawat3987 - can you try doing

df = pd.read_excel('ZARR2001.xls', engine='xlrd')

pandas will try to automatically detect the Excel file format, but this is failing for your XLS file (it does not match the XLS signatures that the file should start with). But you can specify which engine to use, and pandas will succeed if the engine can open it.

I get this error message. Please help.

image

@rhshadrach
Member

xlrd is not able to read the XLS file, so there is nothing that pandas can do. You can also try engine="calamine" after installing the package python-calamine. It is the only other engine built into pandas that can read xls files.

@rhshadrach rhshadrach added Closing Candidate May be closeable, needs more eyeballs and removed Needs Info Clarification about behavior needed to assess issue labels Apr 30, 2024
@rhshadrach
Member

Ah, but python-calamine was only added recently. You will need to upgrade pandas to try it out, as @Aloqeely has advised.

@Chirawat3987
Author

It seems like you are using conda, I am not familiar with that but I think you can run conda install -c conda-forge pandas After doing that, you can do print(pd.__version__) to see if pandas was updated.

I have already upgraded to v2.2.2:
image

Then I ran this command: df = pd.read_excel('D:/test/ZARR2001.xls', engine='xlrd')
Error message:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'No.\tAcco'

@Chirawat3987
Author

Ah -but python-calamine was only added recently. You will need to upgrade pandas to try it out, as @Aloqeely has advised.

I upgraded pandas to the latest version and ran this command:
df = pd.read_excel('D:/test/ZARR2001.xls', engine='calamine')

Error message:
CalamineError: Cannot detect file format

@asishm
Contributor

asishm commented May 1, 2024

Are you sure it's actually an xls file and not just a text file with a .xls extension? Try opening it in a text editor.
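
The XLRDError above reports that the file begins with b'No.\tAcco', which is consistent with tab-separated text rather than a binary workbook. If that turns out to be the case, a sketch of the conversion could look like the following. The tab delimiter and the default encoding are assumptions that should be checked against the real file, and both function names are hypothetical.

```python
import pandas as pd


def read_text_as_table(path, sep="\t", encoding=None):
    """Parse a delimited text file, whatever its extension, into a DataFrame.

    Assumes a tab delimiter by default; adjust sep and encoding for the real file.
    """
    return pd.read_csv(path, sep=sep, encoding=encoding)


def convert_text_to_xlsx(src, dst, sep="\t", encoding=None):
    """Read delimited text from src and write it to dst as .xlsx.

    Writing .xlsx requires an Excel writer such as openpyxl to be installed.
    """
    df = read_text_as_table(src, sep=sep, encoding=encoding)
    df.to_excel(dst, index=False)
    return df


# Usage:
#     convert_text_to_xlsx("ZARR2001.xls", "ZARR2001.xlsx")
```

If the columns come out merged or the characters look wrong, the delimiter or encoding guess is off; opening the file in a text editor, as suggested above, shows which values to pass.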

@rhshadrach
Member

Thanks for the responses @Chirawat3987. pandas uses other packages to open Excel files (e.g. xlrd and calamine). Neither can open your file - so this is not a pandas issue. Closing.
