Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge baseurl and pattern for scraper clients (#7077) #7227

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

nabobalis
Copy link
Contributor

This is the main PR to update the scarper, we want this in for 5.1 but we have two questions to solve:

  1. How do we handle the breaking changes?
    a. We can create a "Scraper 2" and deprecate the old class?
    b. We could YOLO it but I do not want to do that.
  2. Need a migration guide of somesort.

@nabobalis nabobalis added Needs Review Needs reviews before merge net Affects the net submodule Whats New? Needs a section added to the current Whats New? page. labels Oct 9, 2023
@nabobalis nabobalis requested a review from a team as a code owner October 9, 2023 20:14
@nabobalis nabobalis added this to the 5.1.0 milestone Oct 9, 2023
@Cadair
Copy link
Member

Cadair commented Oct 10, 2023

The windows test fails look real fwiw.

@nabobalis
Copy link
Contributor Author

The windows test fails look real fwiw.

Oh yeah, I forgot. I will debug that and see what the problem is.

@nabobalis
Copy link
Contributor Author

Outside of that, what are your thoughts on my questions/todo list in the main body?
Also on the code changes?

Copy link
Member

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we really have to go through a deprecation period here for the old behaviour. One way I can see of doing this is:

  • Make pattern optional (ie. , pattern=None)
  • Add a new keyword-only argument, format, to take the new format
  • Raise an error if both pattern and format are provided
  • Use old behaviour, and raise a deprecation warning saying to use new format=... instead if only pattern passed
  • Use new behaviour if only format passed.

@dstansby dstansby marked this pull request as draft October 22, 2023 17:47
@dstansby dstansby removed this from the 5.1.0 milestone Oct 24, 2023
Copy link
Contributor

@samaloney samaloney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to what @dstansby said need a small migration guide even just pulling some examples of the changes from this PR could work.
Also need to clean way to deprecate as I know of at at least two packages that use scraper outside of sunpy core.


def extract_timestep(directoryPattern):
"""
Obtain the smaller time step for the given pattern.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Obtain the smaller time step for the given pattern.
Obtain the smallest time step for the given pattern.

Comment on lines +61 to +76
date_parts = [int(p) for p in date.strftime('%Y,%m,%d,%H,%M,%S').split(',')]
date_parts[-1] = date_parts[-1] % 60
date = datetime(*date_parts)
orig_time_tup = date.timetuple()
time_tup = [orig_time_tup.tm_year, orig_time_tup.tm_mon, orig_time_tup.tm_mday,
orig_time_tup.tm_hour, orig_time_tup.tm_min, orig_time_tup.tm_sec]
if timestep == relativedelta(minutes=1):
time_tup[-1] = 0
elif timestep == relativedelta(hours=1):
time_tup[-2:] = [0, 0]
elif timestep == relativedelta(days=1):
time_tup[-3:] = [0, 0, 0]
elif timestep == relativedelta(months=1):
time_tup[-4:] = [1, 0, 0, 0]
elif timestep == relativedelta(years=1):
time_tup[-5:] = [1, 1, 0, 0, 0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm originally responsible for this mess I think it can be done in cleaner way something like

def date_floor(date, step):
    floor_date = date.copy()
    if step >= relativedelta(minutes=1):
        floor_date.replace(minutes=0)
    if step >= relativedelta(hours=1):
        floor_date.replace(hours=0)
    ...
    return floor_date

@wtbarnes wtbarnes removed the Needs Review Needs reviews before merge label Mar 14, 2024
* merge baseurl and pattern for scraper clients (#7077)

* precommit fixes

* added some bit of error handling in the scraper

* Update sunpy/net/dataretriever/client.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* made some minor changes

* Update sunpy/net/scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* added some changes to the scraper and also added the tentative tests

* modified tests

* added some bit of error handling in the scraper

* made some minor changes

* Update sunpy/net/dataretriever/client.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* added some changes to the scraper and also added the tentative tests

* Update sunpy/net/scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* modified tests

* refactored the tests to scraper test files and restored the scraper code

* Update sunpy/net/scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* clean ups

* Update sunpy/net/scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* made tests offline

* Update sunpy/net/scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* added urlerror test

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* resolved conflicts

* clean ups

* parametrized

* clean ups

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* clean ups

* added explanation

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* added 404 test

* parametrized the tests

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* log level set

* Update sunpy/net/scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* log level using caplog

* log level using caplog

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

* Update sunpy/net/tests/test_scraper.py

Co-authored-by: Nabil Freij <nabil.freij@gmail.com>

---------

Co-authored-by: Akshit Tyagi <37214399+exitflynn@users.noreply.github.com>
Co-authored-by: Nabil Freij <nabil.freij@gmail.com>
@exitflynn
Copy link
Contributor

hey everyone, got superbusy with some other stuff for a while but i'd love to get involved again. i just figured out why the windows tests were failing, gonna make a quick PR to the scraper branch for that

@exitflynn
Copy link
Contributor

I tried fixing the failing doctests but i cannot seem to figure them out. Let me know if i can help out in any other way with this feature!

@nabobalis
Copy link
Contributor Author

Let us see if merging in main helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
net Affects the net submodule No Changelog Entry Needed Whats New? Needs a section added to the current Whats New? page.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants