fdsn client get_availability? #3002
…rking get_availability method on FDSN client
this is definitely something i would like to see; actually i thought this already existed
I'd like to try to do this. There is a beginning here: master...tcths:fdsn-client-getavailability
👍 Feel free to open a PR right away, so people can give feedback
PR opened; comments welcome
…ent-getavailability
obspy/clients/fdsn/client.py
availability = self._download(url, return_string=True).decode()
lines = [line.split() for line in availability.strip().split('\n')[1:]]  # skip header
extents = [(line[0], line[1], line[2], line[3], UTCDateTime(line[6]), UTCDateTime(line[7])) for line in lines]
I think what the return type looks like is one of the most important things to decide. There might not be a more sophisticated way to return this, I guess, so this simple structure might be fine.
yes. I am looking at this and also at the API specification here as examples.
Relatedly, I notice that there are optional parameters in the specification ('merge', 'show') that change the number of fields included in the results from FDSN.
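Since 'merge' and 'show' can change which columns come back, one way to cope is to read the field names from the header line instead of hard-coding column indices. The following is only a minimal sketch: the function name `parse_availability` is hypothetical, the header text in `SAMPLE` is an assumption about the text format, and `datetime` stands in for `UTCDateTime` so the example runs without ObsPy.

```python
from datetime import datetime, timezone

# Assumed column names; adjust to whatever the service header actually contains.
TIME_FIELDS = {"earliest", "latest", "updated"}
FLOAT_FIELDS = {"samplerate"}


def _parse_time(text):
    # Stdlib stand-in for obspy.UTCDateTime(text).
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def _convert(name, value):
    if name in TIME_FIELDS:
        return _parse_time(value)
    if name in FLOAT_FIELDS:
        return float(value)
    return value


def parse_availability(text):
    """Return (field_names, rows) with one tuple per data line."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    # The header decides how many fields each row has.
    fields = [name.lower() for name in lines[0].lstrip("#").split()]
    rows = []
    for line in lines[1:]:
        values = line.split()
        if len(values) != len(fields):
            raise ValueError("unexpected number of columns: %r" % line)
        rows.append(tuple(_convert(n, v) for n, v in zip(fields, values)))
    return fields, rows


# Header line is an assumption; the data line is the one quoted in this thread.
SAMPLE = """#Network Station Location Channel Quality SampleRate Earliest Latest
IU ANMO 00 BHZ M 20.0 1998-10-26T20:35:58.310050Z 1998-10-26T20:37:31.610050Z
"""

fields, rows = parse_availability(SAMPLE)
```

Returning the field names alongside the rows lets callers see which optional columns were included without guessing from the tuple length.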
I have two comments, above. Otherwise this looks pretty much ready, besides the many PEP8 fails, missing changelog etc.
a list of unimplemented functionality possibly tbd:
Do we even need that? It's just tuples of strings for the most part, and its easy for people to write it out should that need ever arise. If you are thinking of e.g.
So, I had a look at those optional parameters, and I saw that the availability service actually has two endpoints "query" and "extent". It looks like you only use "query", so it would be good to discuss how we treat this part. It looks like "extent" is just for a general overview, basically more or less the same as what
Personally, for me it would be OK to have the return type vary depending on what options are specified, e.g. having more fields in the tuples when asking for the "last update time".
Ah, should've read to the end first.. see above. To be honest, I'm quite confused by this "extent" thing. It seems to me it should be the same as "station" query with "matchtimeseries=true"? To me this "extent" thing seems like an obscure middle ground between going full detail with "availability/query" or going pure metadata with "station/query" (and what is that timespan with sampling rate
http://service.iris.edu/fdsnws/availability/1/extent?network=IU&station=ANMO&channel=BHZ&location=00
Probably.. why would there be authenticated requests if it showed restricted stations in unauthenticated requests. But on server side they sure could have two layers of obscurity, i.e. have some restricted stations show info on unauthenticated requests but others not.
queryauth should be trivial to add, tbh. All the authentication is already handled.
1446     def _build_url(self, service, resource_type, parameters={}):
1447         """
1448         Builds the correct URL.
1449
1450         Replaces "query" with "queryauth" if client has authentication
1451         information.
1452         """
1453         # authenticated dataselect queries have different target URL
1454         if self.user is not None:
1455             if service == "dataselect" and resource_type == "query":
1456                 resource_type = "queryauth"
1457         return build_url(self.base_url, service, self.major_versions[service],
1458                          resource_type, parameters,
1459                          service_mappings=self._service_mappings,
1460                          subpath=self.url_subpath)
Just have to change line 1455
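For illustration, the one-line change suggested above might look roughly like the sketch below. It pulls the condition out into a standalone function so it can be run on its own; `resolve_resource_type` and `AUTH_SERVICES` are hypothetical names, not part of ObsPy, and whether a given data center actually serves an authenticated availability endpoint is an assumption.

```python
# Hypothetical: services whose "query" endpoint is swapped for "queryauth"
# when the client holds credentials.
AUTH_SERVICES = ("dataselect", "availability")


def resolve_resource_type(service, resource_type, user=None):
    """Return "queryauth" instead of "query" for authenticated requests."""
    if user is not None:
        # Widened version of the check on line 1455.
        if service in AUTH_SERVICES and resource_type == "query":
            return "queryauth"
    return resource_type
```

With that in place, `resolve_resource_type("availability", "query", user="someone")` gives `"queryauth"`, while unauthenticated calls and other services such as "station" are left untouched.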
One possible use case for the availability/extent endpoint is to obtain a list of streams for which availability information is available. I suppose this could be inferred from the station/query endpoint, but availability/extent seems to provide the information a little more cleanly and closer to the source of truth. On the other hand, relying on the availability/query endpoint for this information could be very inefficient, for example where a client wants to provide a list of stations and let a user select from them for full availability information. This is just what I have observed, I do not speculate concerning the intent :)
Regarding writing to file, I agree that if there is not a compelling reason to include that in the get_* method then it is nicer to exclude it.
So the list of things to do:
We could go ahead and implement all this with list of variable length tuples as the return type and then see if there is time available to think about the UTCTimeSpan objects? Even if we eventually decided to move forward with the UTCTimeSpan objects there would not be a lot of wasted effort, a few unit tests maybe.
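To make the two endpoints discussed above concrete, here is a small sketch that only builds the request URLs, without any network access. The base URL and parameters are the ones quoted in this thread; the helper name `availability_url` is hypothetical and is not how the ObsPy client builds URLs internally.

```python
from urllib.parse import urlencode

# Base URL as quoted in this thread.
BASE = "http://service.iris.edu/fdsnws/availability/1"


def availability_url(endpoint, **params):
    """Build a URL for the "extent" (overview) or "query" (detail) endpoint."""
    if endpoint not in ("query", "extent"):
        raise ValueError("unknown endpoint: %r" % endpoint)
    return "%s/%s?%s" % (BASE, endpoint, urlencode(params))


# Step 1: cheap overview of which streams have availability information.
overview = availability_url("extent", network="IU", station="ANMO",
                            channel="BHZ", location="00")
# Step 2: full time span detail only for the stream a user picked.
detail = availability_url("query", network="IU", station="ANMO",
                          channel="BHZ", location="00")
```

The split mirrors the use case described above: one extent request to populate a list of streams, followed by targeted query requests only where full detail is wanted.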
I had another look, how about the following.. add a kwarg It seems that all
I think this can be unified like above, e.g. we could just raise an exception if
I see three options..
from obspy import UTCDateTime
import collections

Availability = collections.namedtuple(
    'Availability',
    ['network', 'station', 'location', 'channel', 'quality', 'sampling_rate',
     'earliest', 'latest', 'updated', 'time_spans', 'restriction'],
    defaults=(None, None, None))

line = 'IU ANMO 00 BHZ M 20.0 1998-10-26T20:35:58.310050Z 1998-10-26T20:37:31.610050Z'
items = line.split()
items[5] = float(items[5])
items[6] = UTCDateTime(items[6])
items[7] = UTCDateTime(items[7])
x = Availability(*items)

I'm kinda thinking maybe we should just add a

class UTCTimeSpan(object):
    def __init__(self, start=None, end=None):
        self.start = start
        self.end = end

    @property
    def start(self):
        return self._start

    @start.setter
    def start(self, value):
        if value is None:
            self._start = None
        else:
            self._start = UTCDateTime(value)

    @property
    def end(self):
        return self._end

    @end.setter
    def end(self, value):
        if value is None:
            self._end = None
        else:
            self._end = UTCDateTime(value)

We could a) use that directly and just set fields as needed for the return types or b) go all out and define some
Other opinions? Maybe @trichter?
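As a rough illustration of the namedtuple idea above: because the last three fields have defaults, a row from the "query" endpoint with only eight columns still yields an object with all eleven attributes, so callers would not need to check tuple lengths. The helper `to_availability` is a hypothetical name, and timestamps are left as plain strings here so the sketch runs without ObsPy.

```python
import collections

Availability = collections.namedtuple(
    "Availability",
    ["network", "station", "location", "channel", "quality", "sampling_rate",
     "earliest", "latest", "updated", "time_spans", "restriction"],
    defaults=(None, None, None))  # applies to the last three fields


def to_availability(line):
    """Turn one whitespace-separated text row into an Availability tuple."""
    items = line.split()
    items[5] = float(items[5])
    # Missing trailing columns fall back to the None defaults.
    return Availability(*items)


row = to_availability(
    "IU ANMO 00 BHZ M 20.0 "
    "1998-10-26T20:35:58.310050Z 1998-10-26T20:37:31.610050Z")
```

Here `row.updated`, `row.time_spans` and `row.restriction` are all `None`, while `row.sampling_rate` is the float `20.0`. A row with more than eleven columns would raise a `TypeError`, which may or may not be the behaviour wanted.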
Tempted to bump this to next version, to speed up getting 1.4.0 out.. I think there were still some things unclear about how to do some of the implementation? What do you think @tcths? Anybody wanna offer opinions on the above implementation discussion?
Any updates on this? Following with interest!
This pull request has been mentioned on ObsPy Forum. There might be relevant details there: https://discourse.obspy.org/t/determining-station-data-availability-from-inventory/1895/1
Hello,
I am wondering if there is any interest in including the fdsn availability service in the retrieval capabilities of the fdsn client. I notice that the earthworm client has a get_availability method, and that get_availability and get_availability_extent methods are included in the API for obspy.clients.filesystem.tsindex.Client.
I have in mind something similar here, although I notice that the JSON retrieval capability (for example, https://service.iris.edu/fdsnws/availability/1/query?network=IU&station=ANMO&channel=BHZ&format=json) is already very handy. The get_availability methods mentioned above seem to prefer a list(tuple) return value.
I notice also some previous discussion here and here and here, but am not certain of the relationship between that and this.