Something that I often do is the following:

1. Check the latest date that I have data for in a given org/repo
2. Choose the next day after the latest date
3. Only download the remaining data so I don't double up on data
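The steps above could be wrapped in a small helper; this is a minimal sketch assuming the existing dates are available as `datetime.date` objects (the function name is hypothetical, not part of watchtower's API):

```python
from datetime import date, timedelta

def next_fetch_date(existing_dates):
    """Return the first date that still needs downloading, i.e. the
    day after the latest date we already have data for
    (illustrative helper, not watchtower's API)."""
    if not existing_dates:
        # No local data yet: signal that a full download is needed.
        return None
    return max(existing_dates) + timedelta(days=1)
```

For example, `next_fetch_date([date(2020, 1, 5)])` gives `date(2020, 1, 6)`, the first day worth requesting.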
I think it'd be useful if either:
1. We made it easier to get the answer to 1, so that people could programmatically update data by choosing a date that doesn't make redundant API calls
2. We added an option (e.g. `date='latest'`) that will check for the last datetime in an org/repo and only download new data after that.
The goal of both of these would be to make it easier to incrementally update datasets instead of re-downloading them, since you start to hit API limits when you download lots at once.
Reading the documentation actually might. I've also never reached the rate limit, so I'm not sure it's worth spending much time on this unless we see it becoming a problem.
From my perspective it's actually less about API limits, and more about time. E.g. if I already have the last 2 years of data from a project, then it would be useful for watchtower to do something like "hey, you've already got this data, I won't waste the time to re-download it since it's already there".
Isn't the "date" field always either `created_at` or `date`? We only have 4 kinds of objects we need to care about, so it shouldn't be that hard to handle the date for each one.
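Concretely, that per-kind handling could just be a lookup table; the object kinds and column names below are assumptions for illustration, not watchtower's actual schema:

```python
# Assumed mapping from object kind to its date column; the kinds and
# column names here are illustrative, not taken from watchtower.
DATE_FIELDS = {
    "issues": "created_at",
    "comments": "created_at",
    "events": "created_at",
    "commits": "date",
}

def latest_date(records, kind):
    """Return the newest date among `records` of the given kind,
    using the column that kind stores its date in."""
    field = DATE_FIELDS[kind]
    return max(rec[field] for rec in records)
```

With ISO-8601 date strings, `max` works directly because they sort lexicographically.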
e.g., it could be a function that'd exist in each submodule (e.g. `comments_`) and would follow a pattern like this:

```python
from datetime import timedelta

def update_comments_from_latest(org, repo):
    # Current data we've got
    current_comments = comments_.load_comments(org, repo)
    # Find last day of data
    last_date = current_comments['created_at'].max()
    # Two-day overlap
    from_date = last_date - timedelta(days=2)
    # Update since that day
    comments_.update_comments(org, repo, since=from_date)
```
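Since the two-day overlap re-downloads some records, the merge step would need to drop duplicates; a minimal sketch, assuming each record carries a unique `id` (the helper name is hypothetical):

```python
def merge_without_duplicates(existing, fresh, key="id"):
    """Combine already-saved records with freshly downloaded ones,
    keeping a single copy of anything re-fetched during the
    overlap window (sketch, not watchtower's implementation)."""
    seen = {rec[key] for rec in existing}
    return existing + [rec for rec in fresh if rec[key] not in seen]
```

With pandas DataFrames, the equivalent would be `pd.concat([existing, fresh]).drop_duplicates()`.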
What do you think @NelleV ?