DEPR: nested parse_dates in read_csv #58410

Closed
10 changes: 0 additions & 10 deletions asv_bench/benchmarks/io/csv.py
@@ -445,16 +445,6 @@ def setup(self, engine):
         data = data.format(*two_cols)
         self.StringIO_input = StringIO(data)

-    def time_multiple_date(self, engine):
-        read_csv(
-            self.data(self.StringIO_input),
-            engine=engine,
-            sep=",",
-            header=None,
-            names=list(string.digits[:9]),
-            parse_dates=[[1, 2], [1, 3]],
-        )
-
     def time_baseline(self, engine):
         read_csv(
             self.data(self.StringIO_input),
77 changes: 1 addition & 76 deletions doc/source/user_guide/io.rst
@@ -267,13 +267,10 @@ skip_blank_lines : boolean, default ``True``
 Datetime handling
 +++++++++++++++++

-parse_dates : boolean or list of ints or names or list of lists or dict, default ``False``.
+parse_dates : boolean or list of ints or names, default ``False``.
   * If ``True`` -> try parsing the index.
   * If ``[1, 2, 3]`` -> try parsing columns 1, 2, 3 each as a separate date
     column.
-  * If ``[[1, 3]]`` -> combine columns 1 and 3 and parse as a single date
-    column.
-  * If ``{'foo': [1, 3]}`` -> parse columns 1, 3 as date and call result 'foo'.

   .. note::
      A fast-path exists for iso8601-formatted dates.
@@ -828,74 +825,6 @@ The simplest case is to just pass in ``parse_dates=True``:
    # These are Python datetime objects
    df.index

-It is often the case that we may want to store date and time data separately,
-or store various date fields separately. the ``parse_dates`` keyword can be
-used to specify a combination of columns to parse the dates and/or times from.
-
-You can specify a list of column lists to ``parse_dates``, the resulting date
-columns will be prepended to the output (so as to not affect the existing column
-order) and the new column names will be the concatenation of the component
-column names:
-
-.. ipython:: python
-   :okwarning:
-
-   data = (
-       "KORD,19990127, 19:00:00, 18:56:00, 0.8100\n"
-       "KORD,19990127, 20:00:00, 19:56:00, 0.0100\n"
-       "KORD,19990127, 21:00:00, 20:56:00, -0.5900\n"
-       "KORD,19990127, 21:00:00, 21:18:00, -0.9900\n"
-       "KORD,19990127, 22:00:00, 21:56:00, -0.5900\n"
-       "KORD,19990127, 23:00:00, 22:56:00, -0.5900"
-   )
-
-   with open("tmp.csv", "w") as fh:
-       fh.write(data)
-
-   df = pd.read_csv("tmp.csv", header=None, parse_dates=[[1, 2], [1, 3]])
-   df
-
-By default the parser removes the component date columns, but you can choose
-to retain them via the ``keep_date_col`` keyword:
-
-.. ipython:: python
-   :okwarning:
-
-   df = pd.read_csv(
-       "tmp.csv", header=None, parse_dates=[[1, 2], [1, 3]], keep_date_col=True
-   )
-   df
-
-Note that if you wish to combine multiple columns into a single date column, a
-nested list must be used. In other words, ``parse_dates=[1, 2]`` indicates that
-the second and third columns should each be parsed as separate date columns
-while ``parse_dates=[[1, 2]]`` means the two columns should be parsed into a
-single column.
-
-You can also use a dict to specify custom name columns:
-
-.. ipython:: python
-   :okwarning:
-
-   date_spec = {"nominal": [1, 2], "actual": [1, 3]}
-   df = pd.read_csv("tmp.csv", header=None, parse_dates=date_spec)
-   df
-
-It is important to remember that if multiple text columns are to be parsed into
-a single date column, then a new column is prepended to the data. The ``index_col``
-specification is based off of this new set of columns rather than the original
-data columns:
-
-
-.. ipython:: python
-   :okwarning:
-
-   date_spec = {"nominal": [1, 2], "actual": [1, 3]}
-   df = pd.read_csv(
-       "tmp.csv", header=None, parse_dates=date_spec, index_col=0
-   )  # index is the nominal column
-   df
-
 .. note::
    If a column or index contains an unparsable date, the entire column or
    index will be returned unaltered as an object data type. For non-standard
@@ -908,10 +837,6 @@ data columns:
    for your data to store datetimes in this format, load times will be
    significantly faster, ~20x has been observed.

-.. deprecated:: 2.2.0
-   Combining date columns inside read_csv is deprecated. Use ``pd.to_datetime``
-   on the relevant result columns instead.
-

 Date parsing functions
 ++++++++++++++++++++++
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -225,6 +225,7 @@ Removal of prior version deprecations/changes
 - Disallow calling :meth:`Series.replace` or :meth:`DataFrame.replace` without a ``value`` and with non-dict-like ``to_replace`` (:issue:`33302`)
 - Disallow constructing a :class:`arrays.SparseArray` with scalar data (:issue:`53039`)
 - Disallow indexing an :class:`Index` with a boolean indexer of length zero, it now raises ``ValueError`` (:issue:`55820`)
+- Disallow nested sequences for 'parse_dates' in :func:`read_csv`, combine the desired columns using :func:`to_datetime` after parsing instead (:issue:`55569`)
 - Disallow non-standard (``np.ndarray``, :class:`Index`, :class:`ExtensionArray`, or :class:`Series`) to :func:`isin`, :func:`unique`, :func:`factorize` (:issue:`52986`)
 - Disallow passing a pandas type to :meth:`Index.view` (:issue:`55709`)
 - Disallow units other than "s", "ms", "us", "ns" for datetime64 and timedelta64 dtypes in :func:`array` (:issue:`53817`)
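The whatsnew entry tells users to combine columns with `to_datetime` after parsing, but the user-guide example that showed the nested form is deleted without a replacement. The following is a minimal sketch of what the migration could look like for the `[[1, 2], [1, 3]]` case, reusing rows from the removed example. The column names `nominal` and `actual`, the `format` string, and the use of `skipinitialspace` are assumptions chosen to fit this sample data, not something this PR prescribes.

```python
from io import StringIO

import pandas as pd

# A few rows from the user-guide example removed in this PR.
data = (
    "KORD,19990127, 19:00:00, 18:56:00, 0.8100\n"
    "KORD,19990127, 20:00:00, 19:56:00, 0.0100\n"
    "KORD,19990127, 21:00:00, 20:56:00, -0.5900\n"
)

# Read the date and time fields as plain strings, with no parse_dates at all.
# skipinitialspace drops the blank that follows each comma in this file.
df = pd.read_csv(
    StringIO(data),
    header=None,
    dtype={1: str, 2: str, 3: str},
    skipinitialspace=True,
)

# Build each combined column explicitly. The format string is an assumption
# that matches this sample ("19990127 19:00:00").
fmt = "%Y%m%d %H:%M:%S"
df["nominal"] = pd.to_datetime(df[1] + " " + df[2], format=fmt)
df["actual"] = pd.to_datetime(df[1] + " " + df[3], format=fmt)

# Rough equivalent of the old default keep_date_col=False: drop the components.
df = df.drop(columns=[1, 2, 3])
print(df)
```

Unlike the old behavior, the combined columns end up appended rather than prepended; reorder them explicitly if column order matters.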
11 changes: 5 additions & 6 deletions pandas/io/parsers/readers.py
@@ -814,12 +814,11 @@ def read_csv(
         ):
             depr = True
         if depr:
-            warnings.warn(
-                "Support for nested sequences for 'parse_dates' in pd.read_csv "
-                "is deprecated. Combine the desired columns with pd.to_datetime "
-                "after parsing instead.",
-                FutureWarning,
-                stacklevel=find_stack_level(),
+            raise ValueError(
+                # GH#55569
+                "Nested sequences for 'parse_dates' is no longer supported. "
+                "Combine the desired columns with pd.to_datetime after parsing "
+                "instead."
             )

     if infer_datetime_format is not lib.no_default:
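The removed user-guide text also noted that `index_col` was resolved against the column set after the combined date column had been prepended, so `index_col=0` selected the new `nominal` column. Once nested `parse_dates` raises, callers relying on that need to set the index themselves. A possible sketch is below; `read_with_nominal_index` is a hypothetical helper written for illustration, and the format string is again an assumption that fits the sample rows.

```python
from io import StringIO

import pandas as pd

data = (
    "KORD,19990127, 21:00:00, 21:18:00, -0.9900\n"
    "KORD,19990127, 22:00:00, 21:56:00, -0.5900\n"
    "KORD,19990127, 23:00:00, 22:56:00, -0.5900\n"
)


def read_with_nominal_index(buf):
    """Hypothetical helper: combine columns 1 and 2 and use the result as the index."""
    df = pd.read_csv(
        buf,
        header=None,
        dtype={1: str, 2: str},
        skipinitialspace=True,
    )
    # Assumed layout: column 1 holds YYYYMMDD, column 2 holds HH:MM:SS.
    nominal = pd.to_datetime(df[1] + " " + df[2], format="%Y%m%d %H:%M:%S")
    # Drop the component columns, then index by the combined timestamps.
    return df.drop(columns=[1, 2]).set_index(nominal.rename("nominal"))


result = read_with_nominal_index(StringIO(data))
print(result)
```

Setting the index after the read keeps the two steps independent, so `index_col` always refers to columns as they appear in the file.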