fdsn client get_availability? #3002

tcths · 2022-03-22T16:45:17Z (edited by megies)

Hello,

I am wondering if there is any interest in including the fdsn availability service in the retrieval capabilities of the fdsn client. I notice that the earthworm client has a get_availability method, and that get_availability and get_availability_extent methods are included in the API for obspy.clients.filesystem.tsindex.Client.

I have something similar in mind here, although I notice that the JSON retrieval capability (for example, https://service.iris.edu/fdsnws/availability/1/query?network=IU&station=ANMO&channel=BHZ&format=json) is already very handy. The get_availability methods mentioned above seem to return a list of tuples.

I also notice some previous discussion here and here and here, but I am not certain how it relates to this.

PR Checklist


filefolder · 2022-03-07T22:47:06Z

This is definitely something I would like to see; actually, I thought this already existed.

tcths · 2022-03-10T14:56:57Z

I'd like to try to do this. There is a beginning here: master...tcths:fdsn-client-getavailability

megies · 2022-03-10T16:03:08Z

👍 Feel free to open a PR right away, so people can give feedback

tcths · 2022-03-22T16:48:28Z

PR opened; comments welcome


megies · 2022-03-23T14:38:21Z

obspy/clients/fdsn/client.py

+
+        availability = self._download(url, return_string=True).decode()
+        lines = [line.split() for line in availability.strip().split('\n')[1:]] # skip header 
+        extents = [(line[0], line[1], line[2], line[3], UTCDateTime(line[6]), UTCDateTime(line[7])) for line in lines]
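A standalone sketch of the same parsing step that runs without a server. It assumes the whitespace-separated text layout used in the examples in this thread (Network Station Location Channel Quality SampleRate Earliest Latest), skips header lines starting with "#" instead of dropping the first line, and assumes no field is blank (a blank location code would shift the columns). `parse_query_text` and `parse_time` are hypothetical names, and stdlib `datetime` stands in for obspy's `UTCDateTime`.

```python
from datetime import datetime, timezone


def parse_time(value):
    """Parse e.g. 1998-10-26T20:35:58.310050Z (stand-in for UTCDateTime)."""
    value = value.rstrip("Z")
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in value else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


def parse_query_text(text):
    """Turn availability/query text into (net, sta, loc, cha, start, end) tuples."""
    extents = []
    for line in text.strip().splitlines():
        if line.startswith("#"):  # header line
            continue
        fields = line.split()
        # quality (index 4) and sample rate (index 5) are dropped, as in the diff
        extents.append((fields[0], fields[1], fields[2], fields[3],
                        parse_time(fields[6]), parse_time(fields[7])))
    return extents


sample = """#Network Station Location Channel Quality SampleRate Earliest Latest
IU ANMO 00 BHZ M 20.0 1998-10-26T20:35:58.310050Z 1998-10-26T20:37:31.610050Z
"""
print(parse_query_text(sample))
```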


I think what the return type looks like is one of the most important things to decide, but I guess there might not be more sophisticated ways to return this, so this simple structure might be fine.

Yes. I am looking at this and also at the API specification here as examples.

Relatedly, I notice that there are optional parameters in the specification ('merge', 'show') that change the number of fields included in the results from FDSN.

megies · 2022-03-23T14:45:00Z

I have two comments above. Otherwise this looks pretty much ready, apart from the many PEP8 failures, the missing changelog entry, etc.

tcths · 2022-03-24T23:08:04Z

A list of unimplemented functionality, possibly TBD:

* writing to file is not implemented
* optional parameters (merge, show) that change the result set length are not implemented
  - should they be implemented?
  - does this require a more 'sophisticated' return type?
* extent endpoint request is not implemented
  - note: 'show' and 'mergegaps' optional params are defined only for the query endpoint
* optional queryauth and extentauth endpoint requests are not implemented
  - probably the include_restricted FDSNWS parameter is only relevant if these are implemented?

megies · 2022-03-30T09:43:21Z

a list of unimplemented functionality possibly tbd:
* writing to file is not implemented

Do we even need that? It's just tuples of strings for the most part, and it's easy for people to write them out should that need ever arise. If you are thinking of e.g. get_waveforms() having that option, there is a more profound reason there: writing directly avoids reading and re-writing the MSEED data, which would lose quite a bit of low-level information at the miniSEED block level.
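To illustrate how little is needed if someone does want a file, a minimal sketch that writes a list of tuples with the standard `csv` module. The tuple layout and the `write_extents` helper are assumptions for illustration, and plain `datetime` objects stand in for `UTCDateTime`.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical result in the list-of-tuples shape discussed in this thread.
extents = [
    ("IU", "ANMO", "00", "BHZ",
     datetime(1998, 10, 26, 20, 35, 58, 310050, tzinfo=timezone.utc),
     datetime(1998, 10, 26, 20, 37, 31, 610050, tzinfo=timezone.utc)),
]


def write_extents(rows, fileobj):
    """Write (net, sta, loc, cha, start, end) tuples as CSV with ISO 8601 times."""
    writer = csv.writer(fileobj)
    writer.writerow(["network", "station", "location", "channel", "start", "end"])
    for net, sta, loc, cha, start, end in rows:
        writer.writerow([net, sta, loc, cha, start.isoformat(), end.isoformat()])


# io.StringIO stands in for open("availability.csv", "w", newline="")
buffer = io.StringIO()
write_extents(extents, buffer)
print(buffer.getvalue())
```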

* optional parameters (merge, show) that change resultset length are not implemented
  
  * should they be implemented?

So, I had a look at those optional parameters, and I saw that the availability service actually has two endpoints, "query" and "extent". It looks like you only use "query", so it would be good to discuss how we treat this part. It looks like "extent" just gives a general overview, more or less the same as what the "station" web service would return in text form? If that's the case, we can probably ignore the "extent" endpoint and keep things as they are, using only the "query" endpoint?

  * does this require a more 'sophisticated' return type?

Personally, I would be OK with the return type varying depending on which options are specified, e.g. having more fields in the tuples when asking for the "last update time".
On the other hand, I have thought before that it would be nice to have something like a UTCTimeSpan object, with a start and end time and functionality for merging etc. But that might be a much bigger endeavor and would need to be thought through carefully.
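As a rough idea of the kind of merging functionality meant here, a minimal sketch that merges overlapping or nearly touching (start, end) pairs. It is a plain standard-library illustration, not an existing ObsPy API; `merge_spans` and its `tolerance` parameter are hypothetical, and `datetime` stands in for `UTCDateTime`.

```python
from datetime import datetime


def merge_spans(spans, tolerance=0.0):
    """Merge (start, end) pairs that overlap or have gaps <= tolerance seconds."""
    merged = []
    for start, end in sorted(spans):
        if merged and (start - merged[-1][1]).total_seconds() <= tolerance:
            # overlaps or nearly touches the previous span: extend that span
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


spans = [
    (datetime(2022, 1, 1, 0, 0), datetime(2022, 1, 1, 6, 0)),
    (datetime(2022, 1, 1, 5, 0), datetime(2022, 1, 1, 8, 0)),
    (datetime(2022, 1, 2, 0, 0), datetime(2022, 1, 2, 1, 0)),
]
print(merge_spans(spans))
```

A tolerance in seconds is one simple way to decide whether two spans separated by a tiny gap should count as continuous.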

* extent endpoint request is not implemented

Ah, should've read to the end first... see above. To be honest, I'm quite confused by this "extent" thing. It seems to me it should be the same as a "station" query with "matchtimeseries=true"? To me, "extent" seems like an obscure middle ground between going full detail with "availability/query" and going pure metadata with "station/query" (and what is that time span with sampling rate 0.0??). I don't know why I would want to use it, but if anybody sees a reason for it, other opinions would be good to hear. Seems like bad design by the FDSN working group to have multiple ways of trying to do the same thing:

* station/query w/ and w/o includeavailability and/or matchtimeseries
* availability/extent
* availability/query
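For side-by-side comparison, a small sketch that builds requests for the three approaches with `urllib.parse.urlencode`, using the IRIS base URL and parameters from the example requests in this thread. The `fdsn_url` helper is hypothetical and hard-codes major version 1 of each service.

```python
from urllib.parse import urlencode

BASE = "http://service.iris.edu/fdsnws"  # data center used in this thread's examples
nslc = {"network": "IU", "station": "ANMO", "channel": "BHZ", "location": "00"}


def fdsn_url(service, resource, **params):
    """Build <base>/<service>/1/<resource>?<params> (major version 1 assumed)."""
    return "{}/{}/1/{}?{}".format(BASE, service, resource, urlencode(params))


urls = {
    "station/query": fdsn_url("station", "query", **nslc, level="channel",
                              format="text", matchtimeseries="true"),
    "availability/extent": fdsn_url("availability", "extent", **nslc),
    "availability/query": fdsn_url("availability", "query", **nslc),
}
for name, url in urls.items():
    print(name, url)
```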

http://service.iris.edu/fdsnws/availability/1/extent?network=IU&station=ANMO&channel=BHZ&location=00

#Network Station Location Channel Quality SampleRate Earliest Latest Updated TimeSpans Restriction
IU ANMO 00 BHZ M 0.0 2002-08-28T18:17:51.000000Z 2008-05-23T23:09:24.000000Z 2017-12-06T03:42:35Z 9051 OPEN
IU ANMO 00 BHZ M 20.0 1998-10-26T20:35:58.310050Z 2018-07-09T20:45:47.369538Z 2021-09-10T07:10:56Z 2238 OPEN
IU ANMO 00 BHZ M 40.0 2018-07-09T20:46:40.594538Z 2022-03-28T23:59:59.994538Z 2022-03-29T06:56:15Z 24 OPEN
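The extent output has eleven whitespace-separated columns, three more than the query output (Updated, TimeSpans, Restriction). A minimal parsing sketch that maps them to snake_case versions of the header names; it assumes no blank fields and leaves the times as strings. `parse_extent_text` and `EXTENT_COLUMNS` are hypothetical names.

```python
EXTENT_COLUMNS = ["network", "station", "location", "channel", "quality",
                  "sample_rate", "earliest", "latest", "updated",
                  "time_spans", "restriction"]

EXTENT_TEXT = """#Network Station Location Channel Quality SampleRate Earliest Latest Updated TimeSpans Restriction
IU ANMO 00 BHZ M 0.0 2002-08-28T18:17:51.000000Z 2008-05-23T23:09:24.000000Z 2017-12-06T03:42:35Z 9051 OPEN
IU ANMO 00 BHZ M 20.0 1998-10-26T20:35:58.310050Z 2018-07-09T20:45:47.369538Z 2021-09-10T07:10:56Z 2238 OPEN
IU ANMO 00 BHZ M 40.0 2018-07-09T20:46:40.594538Z 2022-03-28T23:59:59.994538Z 2022-03-29T06:56:15Z 24 OPEN
"""


def parse_extent_text(text):
    """Return one dict per extent row, with numeric columns converted."""
    rows = []
    for line in text.strip().splitlines():
        if line.startswith("#"):  # header line
            continue
        row = dict(zip(EXTENT_COLUMNS, line.split()))
        row["sample_rate"] = float(row["sample_rate"])
        row["time_spans"] = int(row["time_spans"])
        rows.append(row)
    return rows


for row in parse_extent_text(EXTENT_TEXT):
    print(row["sample_rate"], row["earliest"], row["latest"], row["time_spans"])
```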

http://service.iris.edu/fdsnws/station/1/query?network=IU&station=ANMO&channel=BHZ&location=00&level=channel&format=text&matchtimeseries=true

#Network | Station | Location | Channel | Latitude | Longitude | Elevation | Depth | Azimuth | Dip | SensorDescription | Scale | ScaleFreq | ScaleUnits | SampleRate | StartTime | EndTime
IU|ANMO|00|BHZ|34.9459|-106.4572|1700.0|150.0|0.0|-90.0|Geotech KS-54000 Borehole Seismometer|8.64679E8|0.02|m/s|20.0|1998-10-26T20:00:00.0000|2000-10-19T16:00:00.0000
IU|ANMO|00|BHZ|34.9502|-106.4602|1743.0|96.0|0.0|-90.0|Geotech KS-54000 Borehole Seismometer|8.64679E8|0.02|m/s|20.0|2000-10-19T16:00:00.0000|2002-11-19T21:07:00.0000
IU|ANMO|00|BHZ|34.945981|-106.457133|1671.0|145.0|0.0|-90.0|Geotech KS-54000 Borehole Seismometer|8.11548E8|0.02|m/s|20.0|2002-11-19T21:07:00.0000|2008-06-30T00:00:00.0000
IU|ANMO|00|BHZ|34.945981|-106.457133|1671.0|145.0|0.0|-90.0|Geotech KS-54000 Borehole Seismometer|8.1872E8|0.02|m/s|20.0|2008-06-30T00:00:00.0000|2008-06-30T20:00:00.0000
IU|ANMO|00|BHZ|34.945981|-106.457133|1671.0|145.0|0.0|-90.0|Geotech KS-54000 Borehole Seismometer|3.27511E9|0.02|m/s|20.0|2008-06-30T20:00:00.0000|2011-02-18T19:11:00.0000
IU|ANMO|00|BHZ|34.945981|-106.457133|1671.0|145.0|0.0|-90.0|Geotech KS-54000 Borehole Seismometer|3.27511E9|0.02|m/s|20.0|2011-02-18T19:11:00.0000|2012-03-12T20:28:00.0000
IU|ANMO|00|BHZ|34.945981|-106.457133|1671.0|145.0|0.0|-90.0|Geotech KS-54000 Borehole Seismometer|3.27511E9|0.02|m/s|20.0|2012-03-12T20:28:00.0000|2014-12-17T18:40:00.0000
IU|ANMO|00|BHZ|34.94591|-106.4572|1671.0|145.0|0.0|-90.0|Geotech KS-54000 Borehole Seismometer|3.40413E9|0.02|m/s|20.0|2014-12-17T18:40:00.0000|2018-07-09T20:45:00.0000
IU|ANMO|00|BHZ|34.94591|-106.4572|1632.7|188.0|0.0|-90.0|Streckeisen STS-6A VBB Seismometer|1.98475E9|0.02|m/s|40.0|2018-07-09T20:45:00.0000|
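For comparison, the station text output is pipe-separated and leaves EndTime empty for the currently open epoch. A minimal sketch that reads it using the header line shown above; `parse_station_text` is a hypothetical name and times are left as strings. These rows are channel metadata epochs, so their boundaries need not match the Earliest/Latest values of the extent rows (e.g. 1998-10-26T20:00:00 here vs. 1998-10-26T20:35:58 in the extent output).

```python
STATION_TEXT = """#Network | Station | Location | Channel | Latitude | Longitude | Elevation | Depth | Azimuth | Dip | SensorDescription | Scale | ScaleFreq | ScaleUnits | SampleRate | StartTime | EndTime
IU|ANMO|00|BHZ|34.9459|-106.4572|1700.0|150.0|0.0|-90.0|Geotech KS-54000 Borehole Seismometer|8.64679E8|0.02|m/s|20.0|1998-10-26T20:00:00.0000|2000-10-19T16:00:00.0000
IU|ANMO|00|BHZ|34.94591|-106.4572|1632.7|188.0|0.0|-90.0|Streckeisen STS-6A VBB Seismometer|1.98475E9|0.02|m/s|40.0|2018-07-09T20:45:00.0000|
"""


def parse_station_text(text):
    """Return one dict per channel epoch; an empty EndTime becomes None."""
    lines = text.strip().splitlines()
    header = [name.strip() for name in lines[0].lstrip("#").split("|")]
    epochs = []
    for line in lines[1:]:
        row = dict(zip(header, line.split("|")))
        row["EndTime"] = row["EndTime"] or None  # empty: epoch still open
        epochs.append(row)
    return epochs


for epoch in parse_station_text(STATION_TEXT):
    print(epoch["SampleRate"], epoch["StartTime"], epoch["EndTime"])
```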

  * note: 'show' and 'mergegaps' optional params are defined only for query endpoint

* optional queryauth and extentauth endpoint requests are not implemented
  
  * probably include_restricted FDSNWS parameter is only relevant if these are implemented?

Probably... Why would there be authenticated requests if restricted stations were shown in unauthenticated requests? But on the server side they could certainly have two layers of obscurity, i.e. have some restricted stations show info on unauthenticated requests but not others.

queryauth should be trivial to add, tbh. All the authentication is already handled.

1446     def _build_url(self, service, resource_type, parameters={}):
1447         """
1448         Builds the correct URL.
1449 
1450         Replaces "query" with "queryauth" if client has authentication
1451         information.
1452         """
1453         # authenticated dataselect queries have different target URL
1454         if self.user is not None:
1455             if service == "dataselect" and resource_type == "query":
1456                 resource_type = "queryauth"
1457         return build_url(self.base_url, service, self.major_versions[service],
1458                          resource_type, parameters,
1459                          service_mappings=self._service_mappings,
1460                          subpath=self.url_subpath)

Just have to change line 1455

1455             if service in ("dataselect", "availability") and resource_type == "query":
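A standalone sketch of the same substitution written as a lookup table, which would also make it easy to cover the extentauth endpoint mentioned earlier in this thread. `resolve_resource` and `AUTH_RESOURCES` are hypothetical names, not part of the ObsPy client.

```python
# (service, resource) pairs that have an authenticated counterpart.
AUTH_RESOURCES = {
    ("dataselect", "query"): "queryauth",
    ("availability", "query"): "queryauth",
    ("availability", "extent"): "extentauth",
}


def resolve_resource(service, resource_type, user=None):
    """Swap in the authenticated resource if the client has credentials."""
    if user is None:
        return resource_type
    return AUTH_RESOURCES.get((service, resource_type), resource_type)


print(resolve_resource("availability", "query", user="someone"))
print(resolve_resource("station", "query", user="someone"))
```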

tcths · 2022-04-19T14:51:59Z

One possible use case for the availability/extent endpoint is to obtain a list of streams for which availability information exists. I suppose this could be inferred from the station/query endpoint, but availability/extent seems to provide the information a little more cleanly and closer to the source of truth. On the other hand, relying on the availability/query endpoint for this information could be very inefficient, for example when a client wants to present a list of stations and let a user pick from them to get full availability information. This is just what I have observed; I won't speculate about the intent :)
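A minimal sketch of that use case: reducing availability/extent text output to a de-duplicated list of NET.STA.LOC.CHA stream IDs. It reuses the extent output for IU.ANMO.00.BHZ shown earlier in this thread; `list_streams` is a hypothetical name and the parsing assumes no blank fields.

```python
EXTENT_TEXT = """#Network Station Location Channel Quality SampleRate Earliest Latest Updated TimeSpans Restriction
IU ANMO 00 BHZ M 0.0 2002-08-28T18:17:51.000000Z 2008-05-23T23:09:24.000000Z 2017-12-06T03:42:35Z 9051 OPEN
IU ANMO 00 BHZ M 20.0 1998-10-26T20:35:58.310050Z 2018-07-09T20:45:47.369538Z 2021-09-10T07:10:56Z 2238 OPEN
IU ANMO 00 BHZ M 40.0 2018-07-09T20:46:40.594538Z 2022-03-28T23:59:59.994538Z 2022-03-29T06:56:15Z 24 OPEN
"""


def list_streams(extent_text):
    """Return sorted, unique NET.STA.LOC.CHA ids found in extent output."""
    streams = set()
    for line in extent_text.strip().splitlines():
        if line.startswith("#"):  # header line
            continue
        net, sta, loc, cha = line.split()[:4]
        streams.add(".".join((net, sta, loc, cha)))
    return sorted(streams)


print(list_streams(EXTENT_TEXT))
```

The three rows differ only in sample rate, so they collapse to a single stream ID.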

Regarding writing to file, I agree that if there is not a compelling reason to include that in the get_* method then it is nicer to exclude it.

So the list of things to do:

* (possibly) implement extent endpoint request (discussion still open)
* implement optional parameters
  1. query endpoint
     1. show
     2. merge
  2. extent endpoint
     1. merge
  3. return types (discussion still open)
     1. option: a sequence of variable-length sequences (e.g. a list of tuples)
     2. option: a sequence of UTCTimeSpan objects
* implement auth endpoints (should be trivial, as noted above)

We could go ahead and implement all this with a list of variable-length tuples as the return type and then see if there is time to think about UTCTimeSpan objects. Even if we eventually decided to move forward with UTCTimeSpan objects, there would not be a lot of wasted effort, a few unit tests maybe.

megies · 2022-04-25T09:58:19Z

(possibly) implement extent endpoint request (discussion still open)

I had another look; how about the following: add a kwarg details=True, use the query endpoint by default, and fall back to the extent endpoint when details=False is set. That's the impression I get from these endpoints semantically, in any case.

It seems that all extent does is merge time spans that have the same SEED ID (but with some instrument changes or sampling rate changes etc.), and it adds three fields: Updated, TimeSpans and Restriction.

implement optional parameters

I think this can be unified like above, e.g. we could just raise an exception if orderby gets one of the values that are only specified for the extent endpoint when requesting details=True. Same with mergegaps and show: just raise an exception if these are used with details=False.
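A rough sketch of that proposal: a hypothetical `details` switch that picks the endpoint and rejects parameters the chosen endpoint does not define. The set of extent-only orderby values is left empty as a placeholder, since it would have to be filled in from the fdsnws-availability specification; `availability_request` and the constant names are illustrative only.

```python
# Parameters defined only for the query endpoint (as noted in this thread).
QUERY_ONLY_PARAMS = {"mergegaps", "show"}
# Placeholder: orderby values defined only for the extent endpoint would be
# listed here, taken from the fdsnws-availability specification.
EXTENT_ONLY_ORDERBY = set()


def availability_request(details=True, **params):
    """Choose query (details=True) or extent (details=False); validate params."""
    if details:
        if params.get("orderby") in EXTENT_ONLY_ORDERBY:
            raise ValueError(
                "orderby=%r requires details=False" % params["orderby"])
        return "query", params
    invalid = QUERY_ONLY_PARAMS.intersection(params)
    if invalid:
        raise ValueError(
            "parameter(s) %s require details=True" % ", ".join(sorted(invalid)))
    return "extent", params


print(availability_request(network="IU", station="ANMO", mergegaps=1.0))
print(availability_request(details=False, network="IU", station="ANMO"))
```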

iii. return types (discussion still open)

I see three options:

* plain tuples
  - positive: KISS
  - negative: variable-length return types are kinda ugly
* namedtuple
  - positive: simple, but fields can be accessed by name. Additional fields from extent / "details=False" could be handled the same way for both result types (with defaults of None in the one case)
  - negative: if we later decide to have a time span object, we cannot replace the result type of these methods without breaking people's code
* create a TimeSpan object right now
  - positive: most canonical; additional functionality (like merging time spans etc.) could be added later without breaking people's code
  - negative: if we overlook any basic concepts needed for this new object, we might set it in stone in a way that we need to change later on (breaking people's code), but if we start out really simple we should be able to expand on it later

from obspy import UTCDateTime
import collections

Availability = collections.namedtuple(
    'Availability',
    ['network', 'station', 'location', 'channel', 'quality', 'sampling_rate',
     'earliest', 'latest', 'updated', 'time_spans', 'restriction'],
    defaults=(None, None, None))

line = 'IU ANMO 00 BHZ M 20.0 1998-10-26T20:35:58.310050Z 1998-10-26T20:37:31.610050Z'
items = line.split()
items[5] = float(items[5])
items[6] = UTCDateTime(items[6])
items[7] = UTCDateTime(items[7])

x = Availability(*items)

I'm kinda thinking maybe we should just add a UTCTimeSpan object right now and just keep it as simple as possible for now so we can keep working on it later on..?
We could have the results be a list of such objects and even later if we decide we want more functionality on top we could replace it with some object type UTCTimeSpans(list) without breaking things.

from obspy import UTCDateTime


class UTCTimeSpan(object):
    def __init__(self, start=None, end=None):
        self.start = start
        self.end = end

    @property
    def start(self):
        return self._start

    @start.setter
    def start(self, value):
        if value is None:
            self._start = None
        else:
            self._start = UTCDateTime(value)

    @property
    def end(self):
        return self._end

    @end.setter
    def end(self, value):
        if value is None:
            self._end = None
        else:
            self._end = UTCDateTime(value)

We could a) use that directly and just set fields as needed for the return types or b) go all out and define some class Availability(UTCTimeSpan) right away, defining all the fields in it. Either way would be fine I think and if we start simple (a) we could expand to option (b) later on without breaking things, most likely.
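A rough sketch of option (b), built on a stripped-down copy of the UTCTimeSpan class above. So that it runs on its own, a hypothetical `_to_datetime` helper using stdlib `datetime` replaces `UTCDateTime`; the `Availability` field names follow the namedtuple example above, and `seed_id` is an illustrative property, not an existing API.

```python
from datetime import datetime, timezone


def _to_datetime(value):
    """Stand-in for UTCDateTime(value); None and datetime pass through."""
    if value is None or isinstance(value, datetime):
        return value
    value = value.rstrip("Z")
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in value else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


class UTCTimeSpan(object):
    """Minimal time span; start and end are converted on assignment."""

    def __init__(self, start=None, end=None):
        self.start = start
        self.end = end

    @property
    def start(self):
        return self._start

    @start.setter
    def start(self, value):
        self._start = _to_datetime(value)

    @property
    def end(self):
        return self._end

    @end.setter
    def end(self, value):
        self._end = _to_datetime(value)


class Availability(UTCTimeSpan):
    """Option (b): a time span that also carries stream and extent fields."""

    def __init__(self, network, station, location, channel, start=None,
                 end=None, quality=None, sampling_rate=None, updated=None,
                 time_spans=None, restriction=None):
        super().__init__(start, end)
        self.network = network
        self.station = station
        self.location = location
        self.channel = channel
        self.quality = quality
        self.sampling_rate = (None if sampling_rate is None
                              else float(sampling_rate))
        self.updated = _to_datetime(updated)
        self.time_spans = None if time_spans is None else int(time_spans)
        self.restriction = restriction

    @property
    def seed_id(self):
        return ".".join((self.network, self.station, self.location,
                         self.channel))


# One row of the availability/extent output shown earlier in this thread.
fields = ("IU ANMO 00 BHZ M 20.0 1998-10-26T20:35:58.310050Z "
          "2018-07-09T20:45:47.369538Z 2021-09-10T07:10:56Z 2238 OPEN").split()
a = Availability(*fields[:4], start=fields[6], end=fields[7],
                 quality=fields[4], sampling_rate=fields[5],
                 updated=fields[8], time_spans=fields[9],
                 restriction=fields[10])
print(a.seed_id, a.start, a.end, a.sampling_rate, a.time_spans)
```

Because `Availability` only adds attributes on top of `UTCTimeSpan`, code written against plain time spans keeps working if results later switch from option (a) to (b).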

Other opinions? Maybe @trichter?

megies · 2022-11-15T12:50:25Z

Tempted to bump this to the next version, to speed up getting 1.4.0 out. I think there were still some things unclear about how to do parts of the implementation? What do you think, @tcths? Does anybody want to offer opinions on the implementation discussion above?

iandkelly · 2024-03-09T00:24:14Z

Any updates on this? Following with interest!

obspy-bot · 2024-03-09T00:54:05Z

This pull request has been mentioned on ObsPy Forum. There might be relevant details there:

https://discourse.obspy.org/t/determining-station-data-availability-from-inventory/1895/1

tcths added 2 commits February 28, 2022 17:08

first commit on branch; availability WADL parsing; basic tests and working get_availability method on FDSN client

fdf05f5

remove pytestdebug.log

0771ea6

megies added the .clients.fdsn label Mar 8, 2022

megies added this to the 1.4.0 milestone Mar 10, 2022

tcths marked this pull request as draft March 22, 2022 17:24

Merge branch 'master' of https://github.com/obspy/obspy into fdsn-client-getavailability

17b3a02

trichter added the test_network label (tell github actions to also run network tests for this PR) Mar 22, 2022

megies reviewed Mar 23, 2022

obspy/clients/fdsn/client.py (outdated)

tcths added 3 commits March 23, 2022 12:17

autopep8

3ffeb29

Implement suggestion to use BytesIO.readlines

aa471f5

modify optional params to conform to the spec

f5bd6d7

add optional mergegaps parameter

dd975a7

queryauth

373bb95

megies modified the milestones: 1.4.0, 1.5.0 Nov 17, 2022

megies mentioned this pull request Mar 15, 2023

Adding capacity to use fdsnws-availability webservice #3278 (closed)

vpet98 mentioned this pull request Mar 26, 2024

Availability webservice support, continuing #3002 #3423 (open)
