"source"
encoding for datasets opened from fsspec
objects
#8923
base: main
Conversation
Could use `getattr(filename_or_obj, "path", filename_or_obj)` to avoid `isinstance` checks.
Without knowing much (I generally …
Shouldn't …
The main use case is indeed to extract additional data, which you'd do immediately after …
As far as I can tell, they only convert path-likes to strings (which these objects are not: they are file-like, not path-like). Are you suggesting we should change that?
I think this is fine, but our long-term goal is to delete …
My impression of that discussion was that we wanted to either return the encoding in a separate object or somehow remove the encoding after the first operation (i.e., not carry it around). Either way would be fine with me, since I would still have access to it immediately after opening.
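The path-like versus file-like distinction raised in the discussion can be checked with the standard library alone: `os.fspath` converts `str`, `bytes`, and `os.PathLike` objects (such as `pathlib.Path`) to a plain path, but raises `TypeError` for file-like objects. The following is a minimal sketch; `is_path_like` is a hypothetical helper, and `io.BytesIO` merely stands in for an opened remote file.

```python
import io
import os
import pathlib


def is_path_like(obj) -> bool:
    # str, bytes, and objects implementing __fspath__ (e.g. pathlib.Path)
    return isinstance(obj, (str, bytes, os.PathLike))


candidates = ["air.nc", pathlib.Path("air.nc"), io.BytesIO(b"")]
for obj in candidates:
    if is_path_like(obj):
        # Path-likes can be normalized to a plain path string.
        print(type(obj).__name__, "->", os.fspath(obj))
    else:
        # File-likes have no __fspath__, so os.fspath would raise TypeError.
        print(type(obj).__name__, "is file-like, not path-like")
```

Converting path-likes to strings therefore leaves file objects untouched, which is why they need a separate route to a `"source"` value.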
When opening files from path-like objects (`str`, `pathlib.Path`), the backend machinery (`_dataset_from_backend_dataset`) sets the `"source"` encoding. This is useful if we need the original path for additional processing, such as writing to a similarly named file or extracting additional metadata. It would be just as useful when using `fsspec` to open remote files.

In this PR, I'm extracting the `path` attribute that most `fsspec` objects have and using it to set that value. I've considered using `isinstance` checks instead of the `getattr`-with-default, but the list of potential classes is too long to be practical (at least 4 classes within `fsspec` alone).

If this sounds like a good idea, I'll update the documentation of the `"source"` encoding to mention this feature.

whats-new.rst
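The following is a minimal sketch of the `getattr`-with-default approach as a standalone helper. `infer_source` and `RemoteFileStub` are hypothetical names, not xarray or fsspec API. The stub only mimics the `path` attribute that most fsspec file objects expose. The actual code path in `_dataset_from_backend_dataset` may differ, for example in how local paths are normalized.

```python
import os
import pathlib
from typing import Any, Optional


class RemoteFileStub:
    """Hypothetical stand-in for an fsspec file object; only mimics ``.path``."""

    def __init__(self, path: str) -> None:
        self.path = path


def infer_source(filename_or_obj: Any) -> Optional[str]:
    """Hypothetical helper: the value to store as the "source" encoding.

    Uses the object's ``path`` attribute when present, otherwise the object
    itself; only str/path-like results are accepted, so buffers yield None.
    """
    # getattr-with-default: no list of fsspec classes to maintain.
    path = getattr(filename_or_obj, "path", filename_or_obj)
    if isinstance(path, (str, os.PathLike)):
        return os.fspath(path)
    return None


encoding: dict = {}
source = infer_source(RemoteFileStub("bucket/run-01/air.nc"))
if source is not None:
    encoding["source"] = source

# Example downstream use: derive a similarly named output path.
output = pathlib.PurePosixPath(encoding["source"]).with_suffix(".zarr")
print(output)  # bucket/run-01/air.zarr
```

Because the type check runs on the result of `getattr`, path-likes pass through unchanged. Objects without a usable `path`, such as an in-memory buffer, simply leave `"source"` unset.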