Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Adding table name parameter to pandas.read_excel #58464

Closed
wants to merge 10 commits into from

Conversation

iangainey
Copy link
Contributor

@iangainey
Copy link
Contributor Author

As far as I can tell this should be passing all of the tests. Some pre-commit hooks seem to not be passing in this PR but some make no sense, saying there is no attribute, yet there is. Tons of "not indexable errors", that are indexable, and were existing functionality prior to opening this so I'm not certain how it's failing.
Any guidance on getting these to pass would be great.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. This issue is tagged as Needs Discussion, and I am hesitant to add this functionality to pandas' already complicated Excel reading code. This is especially true because, as the implementation currently stands, it's only supported by openpyxl.

That said, it seems to me the move of code in parse to it's own method (currently - your parse_mutliindex) is a good cleanup anyways (without the reading tables feature), and it would simplify the diff here. Would you consider doing that refactor as a pre-cursor?

I'm also curious as to why parse_multiindex since (I think) there isn't necessarily a MultiIndex. I would suggest parse_sheet instead.

@mroeschke
Copy link
Member

Agreed with the hesitancy to implement this feature as of now. Going to close since I think more buy-in needs to happen in the issue first, but happy to have a separate PR with the cleanups

@mroeschke mroeschke closed this Apr 30, 2024
@iangainey
Copy link
Contributor Author

iangainey commented Apr 30, 2024

I guess I'm a bit confused, as I had tried to get more of a discussion on the issue but could not get any response from you (the development team), only from other contributors. The functionality only adds complexity when the user wants to read in tables, and there are plenty of other functionality in the excel io area of pandas that only works with one of the engines. How does getting more buy-in happen? @mroeschke @rhshadrach

@rhshadrach
Copy link
Member

rhshadrach commented Apr 30, 2024

The functionality only adds complexity when the user wants to read in tables

Each feature adds a certain amount of technical debt - it increases the surface area of pandas requiring tests, documentation, dealing with new bug reports. It would be unmaintainable to accept all feature requests, therefore we must carefully evaluate what features we do accept.

there are plenty of other functionality in the excel io area of pandas that only works with one of the engines.

This is something we would like to reduce.

That said - I would take a look at the diff here after #58497 is merged.

@iangainey
Copy link
Contributor Author

Thank you for the explanations, those certainly make sense.
In regards to looking at the diff here after #58497 is merged, since I can't update this PR and wanted to improve it/resolve any errors I opened at draft PR #58500 that will have a more accurate diff if you would be so kind to look at that.
Apologies for the closed out #58499-had set up working on this project incorrectly initially and was working on the main branch of my fork, finally got these split out into branches properly and linked with the right PR's.

Please let me know any thoughts, issues, or improvements. I would like to understand the discussion as to if or how this functionality should look in pandas.

@rhshadrach
Copy link
Member

For the future - we can reopen the PR. But let's continue this in #58500.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: pd.read_excel with table parameter
3 participants