Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

idx1 = pd.date_range("2024-01-01", periods=24 * 12, freq="5min", unit="us")
idx2 = pd.date_range("2024-01-02", periods=24 * 12, freq="5min", unit="us")
idx3 = pd.date_range("2024-01-03", periods=24 * 12, freq="5min", unit="us")
ts1 = pd.Series(range(len(idx1)), index=idx1)
ts2 = pd.Series(range(len(idx2)), index=idx2)
ts3 = pd.Series(range(len(idx3)), index=idx3)
df = pd.concat([ts1, ts2, ts3], axis=1)

print(ts1.index)
# DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 00:05:00',
#                '2024-01-01 00:10:00', '2024-01-01 00:15:00',
#                '2024-01-01 00:20:00', '2024-01-01 00:25:00',
#                '2024-01-01 00:30:00', '2024-01-01 00:35:00',
#                '2024-01-01 00:40:00', '2024-01-01 00:45:00',
#                ...
#                '2024-01-01 23:10:00', '2024-01-01 23:15:00',
#                '2024-01-01 23:20:00', '2024-01-01 23:25:00',
#                '2024-01-01 23:30:00', '2024-01-01 23:35:00',
#                '2024-01-01 23:40:00', '2024-01-01 23:45:00',
#                '2024-01-01 23:50:00', '2024-01-01 23:55:00'],
#               dtype='datetime64[us]', length=288, freq='5min')

print(ts2.index)
# DatetimeIndex(['2024-01-02 00:00:00', '2024-01-02 00:05:00',
#                '2024-01-02 00:10:00', '2024-01-02 00:15:00',
#                '2024-01-02 00:20:00', '2024-01-02 00:25:00',
#                '2024-01-02 00:30:00', '2024-01-02 00:35:00',
#                '2024-01-02 00:40:00', '2024-01-02 00:45:00',
#                ...
#                '2024-01-02 23:10:00', '2024-01-02 23:15:00',
#                '2024-01-02 23:20:00', '2024-01-02 23:25:00',
#                '2024-01-02 23:30:00', '2024-01-02 23:35:00',
#                '2024-01-02 23:40:00', '2024-01-02 23:45:00',
#                '2024-01-02 23:50:00', '2024-01-02 23:55:00'],
#               dtype='datetime64[us]', length=288, freq='5min')

print(ts3.index)
# DatetimeIndex(['2024-01-03 00:00:00', '2024-01-03 00:05:00',
#                '2024-01-03 00:10:00', '2024-01-03 00:15:00',
#                '2024-01-03 00:20:00', '2024-01-03 00:25:00',
#                '2024-01-03 00:30:00', '2024-01-03 00:35:00',
#                '2024-01-03 00:40:00', '2024-01-03 00:45:00',
#                ...
#                '2024-01-03 23:10:00', '2024-01-03 23:15:00',
#                '2024-01-03 23:20:00', '2024-01-03 23:25:00',
#                '2024-01-03 23:30:00', '2024-01-03 23:35:00',
#                '2024-01-03 23:40:00', '2024-01-03 23:45:00',
#                '2024-01-03 23:50:00', '2024-01-03 23:55:00'],
#               dtype='datetime64[us]', length=288, freq='5min')

print(df.index)
# DatetimeIndex(['2024-01-01 00:00:00', '2024-01-02 00:00:00',
#                '2024-01-03 00:00:00', '2024-01-03 00:05:00',
#                '2024-01-03 00:10:00', '2024-01-03 00:15:00',
#                '2024-01-03 00:20:00', '2024-01-03 00:25:00',
#                '2024-01-03 00:30:00', '2024-01-03 00:35:00',
#                ...
#                '2024-01-03 23:20:00', '2024-01-03 23:25:00',
#                '2024-01-03 23:30:00', '2024-01-03 23:35:00',
#                '2024-01-03 23:40:00', '2024-01-03 23:45:00',
#                '2024-01-03 23:50:00', '2024-01-03 23:55:00',
#                '2024-01-04 11:20:00', '2024-01-05 11:20:00'],
#               dtype='datetime64[us]', length=292, freq=None)

print(df.to_string())
#                        0    1      2
# 2024-01-01 00:00:00  0.0  NaN    NaN
# 2024-01-02 00:00:00  NaN  0.0    NaN
# 2024-01-03 00:00:00  NaN  NaN    0.0
# 2024-01-03 00:05:00  NaN  NaN    1.0
# 2024-01-03 00:10:00  NaN  NaN    2.0
# 2024-01-03 00:15:00  NaN  NaN    3.0
# 2024-01-03 00:20:00  NaN  NaN    4.0
# 2024-01-03 00:25:00  NaN  NaN    5.0
# 2024-01-03 00:30:00  NaN  NaN    6.0
# 2024-01-03 00:35:00  NaN  NaN    7.0
# 2024-01-03 00:40:00  NaN  NaN    8.0
# 2024-01-03 00:45:00  NaN  NaN    9.0
# 2024-01-03 00:50:00  NaN  NaN   10.0
# ...
# 2024-01-03 22:50:00  NaN  NaN  274.0
# 2024-01-03 22:55:00  NaN  NaN  275.0
# 2024-01-03 23:00:00  NaN  NaN  276.0
# 2024-01-03 23:05:00  NaN  NaN  277.0
# 2024-01-03 23:10:00  NaN  NaN  278.0
# 2024-01-03 23:15:00  NaN  NaN  279.0
# 2024-01-03 23:20:00  NaN  NaN  280.0
# 2024-01-03 23:25:00  NaN  NaN  281.0
# 2024-01-03 23:30:00  NaN  NaN  282.0
# 2024-01-03 23:35:00  NaN  NaN  283.0
# 2024-01-03 23:40:00  NaN  NaN  284.0
# 2024-01-03 23:45:00  NaN  NaN  285.0
# 2024-01-03 23:50:00  NaN  NaN  286.0
# 2024-01-03 23:55:00  NaN  NaN  287.0
# 2024-01-04 11:20:00  NaN  NaN    NaN
# 2024-01-05 11:20:00  NaN  NaN    NaN

df2 = pd.concat([ts1, ts2], axis=1)
print(df2.index)
# DatetimeIndex(['2024-01-01 00:00:00', '2024-01-02 00:00:00',
#                '2024-01-04 11:20:00', '2024-01-05 11:20:00'],
#               dtype='datetime64[us]', freq=None)

print(df2.to_string())
#                        0    1
# 2024-01-01 00:00:00  0.0  NaN
# 2024-01-02 00:00:00  NaN  0.0
# 2024-01-04 11:20:00  NaN  NaN
# 2024-01-05 11:20:00  NaN  NaN
Issue Description
When concatenating several non-overlapping time series column-wise (axis=1), if the unit of the original indexes is not 'ns', the resulting DataFrame is missing data (and loses its frequency, in case that is relevant).
The example above concatenates 3 time series of 288 rows each. The result has 292 rows instead of the expected 864: only the first row of the first and second series survives, and two timestamps that exist in none of the inputs (2024-01-04 11:20:00 and 2024-01-05 11:20:00) are appended as all-NaN rows. When concatenating only 2 series, we end up with almost no data at all: 4 rows instead of 576.
If we set the unit to 'ns', everything works as expected: the resulting DataFrame has all the data and keeps freq='5min'. Every other unit I tried failed with results similar to the example.
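One way to quantify the data loss is to compare the labels of the concatenated result against the labels of the inputs. The sketch below is illustrative only: `missing_labels` is a hypothetical helper, not part of pandas, and it builds the expected labels with plain Python sets so that the check does not depend on pandas' own index set operations. It is shown with the default nanosecond resolution, which the report says behaves correctly.

```python
import pandas as pd


def missing_labels(series_list):
    """Return index labels present in the inputs but absent after concat.

    Hypothetical helper for this report; not a pandas API.
    """
    result = pd.concat(series_list, axis=1)
    expected = set()
    for s in series_list:
        # Collect labels with a plain Python set, independent of Index.union.
        expected.update(s.index)
    return sorted(expected - set(result.index))


# Default (nanosecond) resolution: per the report, no rows should be lost here.
idx_a = pd.date_range("2024-01-01", periods=24 * 12, freq="5min")
idx_b = pd.date_range("2024-01-02", periods=24 * 12, freq="5min")
ts_a = pd.Series(range(len(idx_a)), index=idx_a)
ts_b = pd.Series(range(len(idx_b)), index=idx_b)

lost = missing_labels([ts_a, ts_b])
print(len(lost))
```

Passing `unit="us"` to the two `date_range` calls and rerunning should show whether a given pandas build is affected: a non-empty `lost` list means input rows were dropped from the result.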
I am not sure this is related only to the unit. Here is an example with ns (the same happens with us). The same thing happens with freq="MS", but changing it to freq="D" fixes it.
Also notice that in both outputs the freq attribute of the index changes from ME to None simply by reordering the time series.
Expected Behavior