BUG: improve pd.io.json_normalize #57811

slavanorm · 2024-03-11T17:48:55Z

closes BUG: pd.json_normalize improvement #57810
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/v2.2.2.rst file if fixing a bug or adding a new feature.

datapythonista · 2024-03-12T12:51:50Z

@slavanorm in case you haven't seen it, your changes are making the tests fail: https://github.com/pandas-dev/pandas/actions/runs/8239141116/job/22531710877?pr=57811#step:8:53

WillAyd · 2024-03-13T02:38:57Z

I'm not sure about this change - the point of the errors argument is to ignore missing keys. Shouldn't the test case you added still create the column but with all empty data?
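To make the expectation above concrete, here is a minimal pure-Python sketch of a meta lookup that honours an `errors` flag. `pull_meta` is a hypothetical helper written only for this illustration; it is not the function in `pandas/io/json/_normalize.py`. It mimics the idea that, under `errors="ignore"`, a missing key yields NaN, so the column still exists but holds empty data.

```python
import math


def pull_meta(record, path, errors="raise"):
    """Walk `path` (a list of keys) through nested dicts.

    Hypothetical stand-in for a meta lookup: a missing key either
    re-raises KeyError or yields NaN, depending on `errors`.
    """
    current = record
    try:
        for key in path:
            current = current[key]
    except KeyError:
        if errors == "ignore":
            return math.nan
        raise
    return current


records = [
    {"meta": "foo", "nested_meta": {"leaf": 1}},
    {"meta": "bar", "nested_meta": {}},  # the "leaf" key is missing here
]

# One entry per record: the column is always produced, missing keys become NaN.
column = [pull_meta(r, ["nested_meta", "leaf"], errors="ignore") for r in records]
```

With `errors="raise"` (the default in this sketch) the second record raises `KeyError` instead of producing NaN.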

slavanorm · 2024-03-13T08:41:04Z

Yes, it should, but it creates rows only for the dictionaries that have the record path.
I will edit the fixture and the assertion so that the test exercises this branch of the `if`.

slavanorm · 2024-03-13T13:53:56Z

This docs check is failing, and it's not fixable. I wonder what we should do now.

doc/source/whatsnew/v2.2.2.rst

pandas/tests/io/json/test_normalize.py

slavanorm · 2024-03-14T10:02:35Z

pandas/io/json/_normalize.py

@WillAyd
Please see lines 449-452. This code is closely tied to errors="ignore".

WillAyd · 2024-03-14

Yea, but why should it not work for errors="raise"? The errors argument is specifically for testing the presence of a key, which exists here. My major point is that we are mixing up two different types of errors if we don't separate the concepts of missing keys versus empty values.

slavanorm · 2024-03-15

Hm, that is not very different from the current code, which traverses and stops early, returning N/A.
I guess you propose raising errors only for missing keys, and missing values should always be safe?

I guess if we add such a feature, it should let the user choose which case raises.

WillAyd · 2024-03-19

Yea, I think the error should just be for missing keys. There should be no such thing as a missing value in well-formed JSON.

slavanorm · 2024-03-19

I just wanted to point out that "key" versus "value" is arbitrary in our case.
The value is just the end of the JSON walk we defined in meta or record_path.
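As a rough illustration of the distinction debated in this thread, the sketch below separates a key that is absent from a key that is present but holds `None` (JSON `null`). `classify_path` and its three labels are hypothetical names invented for this example, and the helper assumes every intermediate level is a dict or `None`; it is not how pandas implements the walk.

```python
def classify_path(obj, path):
    """Return ("found", value), ("missing_key", key) or ("null_value", key).

    Hypothetical helper: assumes each level on the way down is a dict
    or None. The reported key is the one that could not be resolved.
    """
    current = obj
    for key in path:
        if current is None:
            # An earlier level exists but is null, so we cannot descend further.
            return ("null_value", key)
        if key not in current:
            return ("missing_key", key)
        current = current[key]
    return ("found", current)


found = classify_path({"nested_meta": {"leaf": 1}}, ["nested_meta", "leaf"])
missing = classify_path({"nested_meta": {}}, ["nested_meta", "leaf"])
null = classify_path({"nested_meta": None}, ["nested_meta", "leaf"])
```

A caller could then decide per label whether to raise or to fill with NaN, which is the kind of user choice suggested above.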

pandas/io/json/_normalize.py

slavanorm · 2024-03-26T14:50:15Z

I fixed the code and would appreciate a review.

pandas/tests/io/json/test_normalize.py

slavanorm · 2024-04-01T10:52:27Z

Sorry, I closed it by mistake and have just reopened it.

slavanorm · 2024-04-08T12:21:01Z

@WillAyd could you please review the code again? It's been two weeks, I think.

WillAyd · 2024-04-09T03:56:23Z

pandas/tests/io/json/test_normalize.py

@@ -663,6 +676,22 @@ def test_missing_nested_meta(self):
                errors="raise",
            )

+    def test_missing_nested_meta_traverse_empty_list_errors_ignore(self):
+        # If errors="ignore" and nested metadata is nullable, return nan
+        data = {"meta": "foo", "nested_meta": [], "value": [{"rec": 1}, {"rec": 2}]}


Suggested change

-        data = {"meta": "foo", "nested_meta": [], "value": [{"rec": 1}, {"rec": 2}]}
+        data = {"meta": "foo", "nested_meta": {}, "value": [{"rec": 1}, {"rec": 2}]}

I have lost track a bit of what we are trying to accomplish, but it doesn't seem right to ignore an error when the meta argument starts peering into an array instead of an object. Is this a TypeError on main or a KeyError? I feel like we are mixing the two up when we should not be.
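For reference, the two exception types come straight from plain Python indexing: a string key on a dict that lacks it raises `KeyError`, while a string index on a list raises `TypeError`. The sketch below only demonstrates that language-level difference for the two fixtures; `lookup_error` is a hypothetical helper, and the snippet makes no claim about which exception `json_normalize` surfaces on main.

```python
def lookup_error(container, key):
    """Return the name of the exception raised by container[key], or None."""
    try:
        container[key]
    except (KeyError, TypeError) as exc:
        return type(exc).__name__
    return None


as_object = {"meta": "foo", "nested_meta": {}, "value": [{"rec": 1}, {"rec": 2}]}
as_array = {"meta": "foo", "nested_meta": [], "value": [{"rec": 1}, {"rec": 2}]}

object_error = lookup_error(as_object["nested_meta"], "leaf")  # "KeyError"
array_error = lookup_error(as_array["nested_meta"], "leaf")    # "TypeError"
```

The two inputs are therefore not interchangeable: only the object form produces the missing-key condition that an `errors` flag is meant to cover.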

mroeschke · 2024-05-15T17:54:17Z

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

slavanorm · 2024-05-15T21:05:25Z

Thank you too.
The code is now obsolete relative to the main branch, but that's OK.

On second thought, the code does not actually need to be merged into the pandas library.
It went stale for lack of interaction.
Everything was done on my side and all requests were addressed, but they were not reviewed in time.

Best regards

slavanorm added 5 commits March 11, 2024 21:02

Update _normalize.py

606e96d

GH pandas-dev#57810

Update test_normalize.py

cc7aec8

Update v2.2.2.rst

0793c40

Merge branch 'pandas-dev:main' into main

9647d1d

Update _normalize.py

391ff1a

datapythonista added the Bug and IO JSON (read_json, to_json, json_normalize) labels Mar 12, 2024

slavanorm and others added 5 commits March 12, 2024 20:13

Update test_normalize.py

a3a22e8

Merge branch 'pandas-dev:main' into main

15901a9

Update _normalize.py

abc9974

precommit fix

3769b1f

precommit fix

0b91678

slavanorm added 2 commits March 13, 2024 11:37

test assertions

dc0d6f8

test assertions

d7bd075

slavanorm and others added 4 commits March 13, 2024 12:46

test assertion

03f1dc1

fix test

6b5d760

fix test

9a72855

Merge branch 'main' into main

194ff85

WillAyd requested changes Mar 14, 2024

doc/source/whatsnew/v2.2.2.rst (outdated)

pandas/tests/io/json/test_normalize.py

slavanorm commented Mar 14, 2024

slavanorm and others added 3 commits March 14, 2024 14:03

Update v2.2.2.rst

eb71e1b

testcase

dfc6376

Merge branch 'main' of https://www.github.com/slavanorm/pandas

61e8dcc

slavanorm commented Mar 14, 2024

pandas/io/json/_normalize.py (outdated)

Merge branch 'pandas-dev:main' into main

aeae288

slavanorm requested a review from WillAyd March 14, 2024 19:46

slavanorm and others added 8 commits March 25, 2024 23:51

precommit

f24dc1b

Merge branch 'pandas-dev:main' into main

69c5243

fix tests

296ccc5

Merge branch 'pandas-dev:main' into main

51179b1

Update _normalize.py

1e8d43a

Merge branch 'main' of https://github.com/slavanorm/pandas

6849ffe

fix tests

fde9565

precommit

7e10e5a

WillAyd requested changes Mar 26, 2024

pandas/tests/io/json/test_normalize.py (outdated)

slavanorm and others added 2 commits March 27, 2024 21:09

precommit

2ef598d

Merge branch 'main' into main

ef857ba

slavanorm requested a review from WillAyd March 28, 2024 10:17

merge

9caace9

slavanorm requested review from rhshadrach and datapythonista as code owners March 29, 2024 15:12

slavanorm closed this Mar 29, 2024

merge

a73e49f

slavanorm reopened this Apr 1, 2024

slavanorm added 3 commits April 1, 2024 14:52

Merge branch 'pandas-dev:main' into main

0db9759

Merge branch 'pandas-dev:main' into main

a27d7a9

Merge branch 'main' into main

51062d7

WillAyd requested changes Apr 9, 2024

slavanorm added 2 commits April 12, 2024 16:25

Merge branch 'main' into main

750a270

Merge branch 'pandas-dev:main' into main

11138d8

mroeschke closed this May 15, 2024

