Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Support more metadata for templates and each template trace #506

Open
wants to merge 11 commits into
base: develop
Choose a base branch
from

Conversation

flixha
Copy link
Collaborator

@flixha flixha commented Aug 2, 2022

What does this PR do?

Implement first ideas / attempts for adding more metadata to templates, including trace-specific metadata.

Main points:

  • trace.stats.extra is populated with trace metadata during template generation / template reading
  • to read / write the trace metadata in a tribe tar-archive, the values are collected from the traces and stored in an AttribDict in event.extra.
  • trace metadata added for now: weight, peak_snr, rms_snr, starttime, endtime, length_npts (i.e., the actual starttime / endtime / npts of each trace BEFORE writing, to allow checks when reading)

Why was it initiated? Any relevant Issues?

We've been mentioning for a while that we'd like to save more metadata for each template, and also each trace in the templates. We didn't yet have a way of handling trace-specific metadata that do not fit into miniseed-supported header fields.

Saving metadata for each trace should allow some more tasks:

Possible downsides:

  • saving trace metadata in the event works well with QuakeML files, but it would not work so easily if we'd like to support saving tribes with Nordic format event files. I don't know how it would be for sc3ml files yet.

@calum-chamberlain , @d-chambers: What do you think about saving trace metadata like this? I went through a few other ideas, but they all seemed to have bigger downsides.

Example:

from eqcorrscan.core.match_filter import Tribe
from obspy.clients.fdsn import Client

client = Client('NCEDC')
catalog = client.get_events(eventid='72572665', includearrivals=True)
catalog[0].picks = catalog[0].picks[0:5]
tribe = Tribe().construct(
    method='from_client', catalog=catalog, client_id='NCEDC', lowcut=2.0,
    highcut=8.0,  samp_rate=20.0, filt_order=4, length=6.0, prepick=0.1,
    swin='all', process_len=3600, all_horiz=True)

tribe.write('test_tribe.tgz')
tribe_in = Tribe().read('test_tribe.tgz')
tribe[0].st[0].stats.extra
tribe_in[0].st[0].stats.extra

PR Checklist

  • develop base branch selected?
  • This PR is not directly related to an existing issue (which has no PR yet).
  • All tests still pass.
  • Any new features or fixed regressions are be covered via new tests.
  • Any new or changed features have are fully documented.
  • Significant changes have been added to CHANGES.md.
  • First time contributors have added your name to CONTRIBUTORS.md.

@flixha
Copy link
Collaborator Author

flixha commented Aug 2, 2022

tests fail due to Scipy 1.9.0; I was hoping that scipy>=0.18,<1.9.0 would fix it for now but doesn't seem to work...

@calum-chamberlain
Copy link
Member

event.extra seems like a good place for this - but if you are concerned about supporting other formats for event files then would a separate file that contains the extra Template-specific metadata be worthwhile. We already write a csv with some barebones info with Tribe._par_write - we could change this to writing a json file that has more meta-info and would allow for expansion in the future. I imagine that you might have thought of this and the potential downsides already?

@cjhopp might have opinions on what he would like included in template metadata as well.

@flixha
Copy link
Collaborator Author

flixha commented Aug 3, 2022

Well, I have to admit that I didn't consider Tribe._par_write as much; and indeed from what I can think of now it should work just as well, especially if we write it to a json.

I guess I was focusing more on where to keep the trace-stats - and came to the conclusion that they have to be added to the traces themselves because the two other options I saw, [1: referencing by resource ID (not thread safe), 2: referencing by waveform ID or ID + pick (not unique / cumbersome)] seemed less optimal.

Even though it works ok now in the QuakeML file, one thing I don't like perfectly there is that the list of parameters for each trace is read in as a string, and then needs to be evaluated to a list again:

if isinstance(event.extra.trace_ids.value, str):
event.extra.trace_ids.value = ast.literal_eval(
event.extra.trace_ids.value)

So any opinions on the choice:

  • write to QuakeML?
  • write to csv?
  • write to new json file, instead of the csv?

@calum-chamberlain
Copy link
Member

I'm fairly keen to write to a json, because I think it will allow us more expansion later, without having to do strange string mangling out of extras. Internally though, I think you are right to have the metadata associated with the trace itself, that definitely makes sense. It may make sense to have a phase-hint associated with each trace at some point as well when we end up having multiple traces from a single channel?

@calum-chamberlain calum-chamberlain added this to the 0.5.0 milestone Aug 5, 2022
This was referenced Aug 5, 2022
@flixha
Copy link
Collaborator Author

flixha commented Aug 8, 2022

Thanks for the feedback; i'll move the trace-metadata to a json together with the previously existing template metadata. Will need to wait until late next week when I'm back from the field though

@flixha
Copy link
Collaborator Author

flixha commented Mar 16, 2023

Just a note that this PR is not forgotten; I just haven't gotten around to writing it such that the metadata are written to a json file.
Note to self:

  • the extended metadata-file should include more information on the trace processing, either the whole processing history from tr.stats.processing, or a user-provided dict with any processing steps that are not covered by the current processing-metadata.
  • in particular, it's important to save whether a trace had its response removed in case the traces are used for relative amplitude / magnitude calculation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants