Potential regression in Dataset.from_dataframe() not preserving timezone #9026

Open
5 tasks done
Aloqeely opened this issue May 14, 2024 · 6 comments · May be fixed by #9042
Labels
bug, needs triage (issue that has not been reviewed by an xarray team member)

Comments

@Aloqeely

What happened?

Converting a pandas DataFrame that has a timezone-aware datetime column to an xarray Dataset does not preserve the timezone. This only breaks in version 2024.5.

What did you expect to happen?

I would expect the timezone info to be preserved, as was the case before.

Minimal Complete Verifiable Example

import pandas as pd
import xarray as xr

df1 = pd.DataFrame(
    {"A": pd.date_range("20130101", periods=4, tz="US/Eastern"), "B": [1, 2, 3, 4]}
)
dataset = xr.Dataset.from_dataframe(df1)
df2 = dataset.to_dataframe()

print(df1.dtypes, dataset.dtypes, df2.dtypes, sep="\n\n")

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

# On xarray 2024.5.0:
A    datetime64[ns, US/Eastern]
B                         int64
dtype: object

Frozen({'A': dtype('<M8[ns]'), 'B': dtype('int64')})

A    datetime64[ns]
B             int64
dtype: object

# ---------------------------
#  On previous versions:

A    datetime64[ns, US/Eastern]
B                         int64
dtype: object

Frozen({'A': dtype('O'), 'B': dtype('int64')})

A    datetime64[ns, US/Eastern]
B                         int64
dtype: object

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.1 (tags/v3.12.1:2305ca5, Dec 7 2023, 22:03:25) [MSC v.1937 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_United States', '1252')
libhdf5: 1.14.2
libnetcdf: None

xarray: 2024.5.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.10.0
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.3
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.2.0
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.22.2
sphinx: 7.2.6

Aloqeely added the bug and needs triage labels on May 14, 2024

welcome bot commented May 14, 2024

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@dcherian
Contributor

@ilan-gold are you able to take a look here please? I suspect it's related to extension array stuff

@ilan-gold
Contributor

Is dtype('O') from previous versions correct though?

@ilan-gold
Contributor

Ah, ok, I see: previously this was an array of Timestamp objects, and now it is being converted into a NumPy array with a "proper" datatype
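
To make the two representations concrete, here is a minimal sketch using only pandas and NumPy. It does not reproduce xarray's internal conversion path (that is an assumption left unverified here); it only shows why an object array of Timestamps keeps the timezone while a native datetime64 array cannot.

```python
import numpy as np
import pandas as pd

# Same kind of column as in the report: timezone-aware datetimes.
s = pd.Series(pd.date_range("20130101", periods=4, tz="US/Eastern"))

# Object array: each element is a pd.Timestamp that still carries its timezone.
as_objects = s.to_numpy(dtype=object)

# Native datetime64 array: values are converted to UTC and the timezone is dropped,
# because NumPy's datetime64 has no slot for timezone information.
as_datetime64 = s.to_numpy(dtype="datetime64[ns]")

print(as_objects.dtype, type(as_objects[0]).__name__, as_objects[0].tz)
print(as_datetime64.dtype, as_datetime64[0])
```

The first form matches the `dtype('O')` seen on previous versions, the second matches the `<M8[ns]` seen on 2024.5.0.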

@dcherian
Contributor

dcherian commented May 22, 2024

It's possible the previous behaviour was unintentional and this one is more "correct"/consistent ... Some exploration and reporting would be very helpful.

@ilan-gold
Contributor

Ok, so the problem is that a timezone-aware datetime is an extension array dtype: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.types.is_extension_array_dtype.html

I will look into properly preserving the dtype then, although I suspect there is something else going on regarding datetimes (or the testing is not specific enough to cover this case)
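
A quick way to check this on the DataFrame from the report is sketched below. It only uses pandas' public `is_extension_array_dtype` and `DatetimeTZDtype`; the variable names are illustrative and nothing here is taken from xarray's code.

```python
import pandas as pd
from pandas.api.types import is_extension_array_dtype

df = pd.DataFrame(
    {"A": pd.date_range("20130101", periods=4, tz="US/Eastern"), "B": [1, 2, 3, 4]}
)

# The timezone-aware column uses pandas' DatetimeTZDtype, which is an extension dtype.
tz_aware = is_extension_array_dtype(df["A"].dtype)
is_tz_dtype = isinstance(df["A"].dtype, pd.DatetimeTZDtype)

# Dropping the timezone gives a plain NumPy datetime64 dtype, which is not.
naive = is_extension_array_dtype(df["A"].dt.tz_localize(None).dtype)

# The integer column is a plain NumPy dtype as well.
plain_int = is_extension_array_dtype(df["B"].dtype)

print(tz_aware, is_tz_dtype, naive, plain_int)
```

So only column `A` would go through whatever path handles extension arrays, which is consistent with `B` being unaffected in the log output.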
