BUG: Forcing an int dtype on DataFrame construction raises an odd error #58586

Open
3 tasks done
amanlai opened this issue May 6, 2024 · 4 comments · May be fixed by #58685
Labels
Bug, Needs Triage (issue that has not been reviewed by a pandas team member)

amanlai commented May 6, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame([['1', '2'], ['3', '4']], dtype='int32')

Issue Description

The above code raises ValueError: Trying to coerce float values to integers on the development version (it is also raised on 2.1.1). However, if we request the extension dtype 'Int32' instead, there is no error and the construction works as expected.

Expected Behavior

There should probably be no error in the first place, since .astype('int32') works as expected and extension dtypes also work as expected. Even if an error should be raised, the message is misleading because the values are string representations of integers, not floats, so perhaps the error message should be ValueError: Trying to coerce values to integers. Try astype instead. or something along those lines.
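The two alternatives mentioned above can be put side by side with the failing call. This is only a sketch: the variable names are illustrative, and whether the direct construction raises depends on the installed pandas version, so the code records the outcome instead of assuming one.

```python
import pandas as pd

data = [["1", "2"], ["3", "4"]]

# Alternative 1: build the frame first, then cast with astype.
via_astype = pd.DataFrame(data).astype("int32")

# Alternative 2: request the nullable extension dtype at construction time.
via_extension = pd.DataFrame(data, dtype="Int32")

# The construction from the report. It raised ValueError on the versions
# named in this issue; other versions may behave differently.
try:
    pd.DataFrame(data, dtype="int32")
    outcome = "constructed without error"
except (ValueError, TypeError) as err:
    outcome = f"{type(err).__name__}: {err}"

print(via_astype.dtypes.tolist())
print(via_extension.dtypes.tolist())
print(outcome)
```

Until the constructor behaves consistently, casting with astype after construction appears to be the least surprising route to a NumPy-backed int32 frame.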

Installed Versions

commit : ea7bcd1
python : 3.12.0
OS : Windows 10
pandas : 3.0.0.dev0+880.gea7bcd14c8
numpy : 2.1.0.dev0+git20240402.e191a5f

amanlai added the Bug and Needs Triage labels May 6, 2024
Aloqeely (Member) commented May 6, 2024

I agree this should work; the same construction on a Series works as expected:

ser = pd.Series(['1', '2', '3', '4'], dtype='int32')

As for the error message, it says "Trying to coerce float values" when it is actually trying to coerce string (object) values, which is weird.

I will open a PR to improve the error message; PRs to fix the main issue will be appreciated.
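The point about the message can be checked directly. A minimal sketch, assuming only the public pandas.api.types.infer_dtype helper and the Series call quoted in this comment: it shows that the inputs are classified as strings, not floats, and that the one-dimensional construction accepts the same dtype.

```python
import pandas as pd

values = ["1", "2", "3", "4"]

# infer_dtype reports what kind of scalars the list holds.
kind = pd.api.types.infer_dtype(values)

# The one-dimensional construction quoted in the comment above.
ser = pd.Series(values, dtype="int32")

print(kind)
print(ser.dtype)
```

If the inferred kind is reported as string, a message that mentions float values is describing something other than the actual input.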

amanlai (Author) commented May 6, 2024

I can work on the PR to fix the main issue.

rajat315315 linked a pull request May 12, 2024 that will close this issue
rajat315315 commented

Apologies for jumping in on this, @amanlai.
Does my contribution look okay? Is there anything I can improve?

amanlai (Author) commented May 14, 2024

@rajat315315 I was actually working on a PR. Didn't think anybody would come in this fast lol. Anyway, let me know if you want to collaborate. I'm not a pandas dev so I don't have auth to merge or test or anything.
