merge baseurl and pattern for scraper clients (#7077) #7227

nabobalis · 2023-10-09T20:14:34Z

This is the main PR to update the scraper. We want this in for 5.1, but we have two questions to solve:

1. How do we handle the breaking changes?
   a. We can create a "Scraper 2" and deprecate the old class?
   b. We could YOLO it but I do not want to do that.
2. Need a migration guide of some sort.

Cadair · 2023-10-10T10:43:03Z

The windows test fails look real fwiw.

nabobalis · 2023-10-10T10:44:02Z

> The windows test fails look real fwiw.

Oh yeah, I forgot. I will debug that and see what the problem is.

nabobalis · 2023-10-10T10:54:30Z

Outside of that, what are your thoughts on my questions/todo list in the main body?
Also on the code changes?

dstansby

I think we really have to go through a deprecation period here for the old behaviour. One way I can see of doing this is:

Make pattern optional (i.e. pattern=None)
Add a new keyword-only argument, format, to take the new format
Raise an error if both pattern and format are provided
Use old behaviour, and raise a deprecation warning saying to use the new format=... instead, if only pattern is passed
Use new behaviour if only format is passed.
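
For illustration, the steps above could be sketched roughly as follows. This is only a minimal sketch under assumptions: `ScraperSketch` is a hypothetical stand-in and not sunpy's actual class, the standard-library `DeprecationWarning` is used where the project may prefer its own warning class, and raising an error when neither argument is given is an extra assumption not stated in the comment.

```python
import warnings


class ScraperSketch:
    """Hypothetical stand-in for the scraper class; not sunpy's actual API."""

    def __init__(self, pattern=None, *, format=None):
        # Passing both arguments is ambiguous, so refuse it outright.
        if pattern is not None and format is not None:
            raise ValueError("Pass either 'pattern' or 'format', not both.")
        # Assumption: passing neither is also treated as an error.
        if pattern is None and format is None:
            raise ValueError("One of 'pattern' or 'format' is required.")

        if pattern is not None:
            # Old behaviour: still supported, but the caller is warned.
            warnings.warn(
                "'pattern' is deprecated; use the keyword-only 'format=...' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            self.pattern = pattern
            self.use_new_format = False
        else:
            # New behaviour: only 'format' was given.
            self.pattern = format
            self.use_new_format = True
```

Downstream packages that still pass `pattern` would keep working during the deprecation period and see the warning, while new code opts in with `format=...`.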

samaloney

Similar to what @dstansby said, we need a small migration guide; even just pulling some examples of the changes from this PR could work.
We also need a clean way to deprecate, as I know of at least two packages that use scraper outside of sunpy core.

samaloney · 2024-02-21T12:46:14Z

sunpy/net/scraper_utils.py

+
+def extract_timestep(directoryPattern):
+    """
+    Obtain the smaller time step for the given pattern.


Suggested change

Obtain the smaller time step for the given pattern.

Obtain the smallest time step for the given pattern.

samaloney · 2024-02-21T14:48:40Z

sunpy/net/scraper_utils.py

+    date_parts = [int(p) for p in date.strftime('%Y,%m,%d,%H,%M,%S').split(',')]
+    date_parts[-1] = date_parts[-1] % 60
+    date = datetime(*date_parts)
+    orig_time_tup = date.timetuple()
+    time_tup = [orig_time_tup.tm_year, orig_time_tup.tm_mon, orig_time_tup.tm_mday,
+                orig_time_tup.tm_hour, orig_time_tup.tm_min, orig_time_tup.tm_sec]
+    if timestep == relativedelta(minutes=1):
+        time_tup[-1] = 0
+    elif timestep == relativedelta(hours=1):
+        time_tup[-2:] = [0, 0]
+    elif timestep == relativedelta(days=1):
+        time_tup[-3:] = [0, 0, 0]
+    elif timestep == relativedelta(months=1):
+        time_tup[-4:] = [1, 0, 0, 0]
+    elif timestep == relativedelta(years=1):
+        time_tup[-5:] = [1, 1, 0, 0, 0]


I think I'm originally responsible for this mess. I think it can be done in a cleaner way, something like:

    def date_floor(date, step):
        floor_date = date.copy()
        if step >= relativedelta(minutes=1):
            floor_date.replace(minutes=0)
        if step >= relativedelta(hours=1):
            floor_date.replace(hours=0)
        ...
        return floor_date
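
The suggestion above is pseudocode: `datetime` objects have no `copy()` method, `datetime.replace` returns a new object and takes singular field names (`minute`, `hour`), and as far as I know `relativedelta` does not define ordering comparisons such as `>=`. A runnable sketch of the same flooring idea might look like the following. It is an illustration only: the string step names and the `_STEPS` tuple are hypothetical and replace the `relativedelta` steps used in the PR, and microseconds are zeroed as an assumption.

```python
from datetime import datetime

# Hypothetical step names, ordered from finest to coarsest.
_STEPS = ("minute", "hour", "day", "month", "year")


def date_floor(date, step):
    """Truncate ``date`` to the start of the given step (sketch only)."""
    level = _STEPS.index(step)  # raises ValueError for an unknown step
    # Every supported step drops seconds (and, by assumption, microseconds).
    fields = {"second": 0, "microsecond": 0}
    if level >= 1:  # hour or coarser: also reset the minutes
        fields["minute"] = 0
    if level >= 2:  # day or coarser: also reset the hours
        fields["hour"] = 0
    if level >= 3:  # month or coarser: go to the first day of the month
        fields["day"] = 1
    if level >= 4:  # year: go to January
        fields["month"] = 1
    # datetime is immutable; replace() returns a new object.
    return date.replace(**fields)
```

For example, `date_floor(datetime(2024, 2, 21, 14, 48, 40), "hour")` gives `datetime(2024, 2, 21, 14, 0)`, which matches what the `time_tup[-2:] = [0, 0]` branch in the diff produces for an hourly step.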

* merge baseurl and pattern for scraper clients (#7077)
* precommit fixes
* added some bit of error handling in the scraper
* Update sunpy/net/dataretriever/client.py Co-authored-by: Nabil Freij <nabil.freij@gmail.com>
* Update sunpy/net/scraper.py Co-authored-by: Nabil Freij <nabil.freij@gmail.com>
* made some minor changes
* added some changes to the scraper and also added the tentative tests
* modified tests
* refactored the tests to scraper test files and restored the scraper code
* clean ups
* made tests offline
* Update sunpy/net/tests/test_scraper.py Co-authored-by: Nabil Freij <nabil.freij@gmail.com>
* added urlerror test
* resolved conflicts
* parametrized
* added explanation
* added 404 test
* parametrized the tests
* log level set
* log level using caplog
---------
Co-authored-by: Akshit Tyagi <37214399+exitflynn@users.noreply.github.com>
Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

exitflynn · 2024-04-25T14:02:45Z

Hey everyone, I got super busy with some other stuff for a while, but I'd love to get involved again. I just figured out why the Windows tests were failing, gonna make a quick PR to the scraper branch for that.

fix failing windows tests

exitflynn · 2024-04-28T06:56:31Z

I tried fixing the failing doctests, but I cannot seem to figure them out. Let me know if I can help out in any other way with this feature!

nabobalis · 2024-04-28T07:20:31Z

Let us see if merging in main helps.

nabobalis added the Needs Review, net, and Whats New? labels Oct 9, 2023

nabobalis requested a review from a team as a code owner October 9, 2023 20:14

nabobalis added this to the 5.1.0 milestone Oct 9, 2023

nabobalis added the No Changelog Entry Needed label Oct 9, 2023

dstansby requested changes Oct 17, 2023

dstansby marked this pull request as draft October 22, 2023 17:47

dstansby removed this from the 5.1.0 milestone Oct 24, 2023

nabobalis force-pushed the scraper_rewrite branch from f883a0d to 07514ae Compare October 31, 2023 04:10

samaloney suggested changes Feb 21, 2024

wtbarnes removed the Needs Review label Mar 14, 2024

merge baseurl and pattern for scraper clients (#7077)

ea6a405

nabobalis force-pushed the scraper_rewrite branch from 07514ae to b66235d Compare April 4, 2024 23:33

precommit fixes

b393491

nabobalis force-pushed the scraper_rewrite branch from b66235d to b393491 Compare April 4, 2024 23:36

exitflynn and others added 2 commits April 25, 2024 19:47

fix failing windows tests

9900ab9

Merge pull request #7601 from exitflynn/scraper_rewrite

c46ab8d

fix failing windows tests

Merge branch 'main' into scraper_rewrite

7c7f600
