
Expand dataset conversion tutorial #153

Open
3 tasks
yetinam opened this issue Dec 17, 2022 · 4 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@yetinam
Member

yetinam commented Dec 17, 2022

The dataset conversion tutorial should be extended. So far, the following points come to mind:

  • Add a very general intro on the abstract main steps for converting a dataset
  • Be clearer about the dangers of writing one pick per trace. Possibly add the grouping operation as well, since separate picks per trace are a clear anti-pattern.
  • Add a comment on dataset chunking
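The grouping operation from the second bullet could be sketched in plain Python. This is a minimal illustration with hypothetical pick tuples rather than real SeisBench/ObsPy objects:

```python
from collections import defaultdict

# Hypothetical picks: (station_code, phase_hint, time_in_seconds)
picks = [
    ("STA1", "P", 10.0),
    ("STA1", "S", 14.5),
    ("STA2", "P", 12.0),
]

# Group picks by station so that each trace carries all of its picks.
station_groups = defaultdict(list)
for station, phase, time in picks:
    station_groups[station].append((phase, time))

# STA1 now holds both its P and S pick; writing each pick as a separate
# trace would duplicate the waveform data (the anti-pattern above).
```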
@maihao14

Hi @yetinam, regarding question #164: tutorial 03b adds traces for P and S individually (not in one trace). At the end of cell [9] (above writer.add_trace({**event_params, **trace_params}, data)), adding the following script might group the picks that exist in the current waveform (not tested exhaustively):

# group other picks on the same station into this trace
for pick_candidate in event.picks:
    # skip the pick itself
    if pick.resource_id == pick_candidate.resource_id:
        continue
    # same station
    if pick.waveform_id.station_code == pick_candidate.waveform_id.station_code:
        arrival_candidate = pick_candidate.time
        if waveforms[0].stats.starttime < arrival_candidate < waveforms[0].stats.endtime:
            sample = (arrival_candidate - actual_t_start) * sampling_rate
            trace_params[f"trace_{pick_candidate.phase_hint}_arrival_sample"] = int(sample)
            trace_params[f"trace_{pick_candidate.phase_hint}_status"] = pick_candidate.evaluation_mode

@yetinam
Member Author

yetinam commented Jan 25, 2023

Hi @maihao14 ,

thanks for having a look at this. I think this is one option for doing it. From a first glance, it seems to group the picks correctly, but it will write duplicate rows that differ only in the waveform window selection. The cleaner strategy is to group the picks first and then select an appropriate window around all picks. An example for this can be found here.
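The window-selection strategy described above (one window around all grouped picks) can be sketched with plain numbers. The pick times and margin here are hypothetical, standing in for ObsPy UTCDateTime values:

```python
# Hypothetical pick times (seconds, relative to some reference) for one station
pick_times = [12.0, 17.5]  # e.g. a P and an S pick on the same station

# Pad the window so it covers context before the first and after the last pick
margin = 10.0  # seconds; the appropriate value depends on the dataset
t_start = min(pick_times) - margin
t_end = max(pick_times) + margin

# A single waveform window [t_start, t_end] now contains all picks,
# so each station-event pair is written exactly once.
```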

Would you be interested in creating a PR updating this tutorial?

@maihao14

Hi @yetinam,
I'd be happy to create a PR for this. I definitely agree with you about the script above: it causes duplication (and I received warnings from SeisBench). I've tried the example in the link, but my script below doesn't seem to produce the correct groups of picks. Could you take a look at it? I might have misunderstood some steps.

# Iterate over events and picks, write to SeisBench format
with sbd.WaveformDataWriter(metadata_path, waveforms_path) as writer:
    # Define data format
    writer.data_format = {
        "dimension_order": "CW",
        "component_order": "ZNE",
        "measurement": "velocity",
        "unit": "counts",
        "instrument_response": "not restituted",
    }
    
    for event in catalog:
        event_params = get_event_params(event)
        # group all picks from the same station, e.g., the P and S phases of one event
        station_groups = defaultdict(list)
        for pick in event.picks:
            if pick.phase_hint is None:
                continue

            station_groups[
                waveform_id_to_network_station_location(pick.waveform_id.id)
            ].append(pick)

        for picks in station_groups.values():

            t_start = min(pick.time for pick in picks) 
            t_end = max(pick.time for pick in picks)

            trace_params = get_trace_params(pick)
            waveforms = get_waveforms(pick, trace_params,t_start,t_end)

            # rotate_stream_to_zne(waveforms, inv)

            if len(waveforms) == 0:
                # No waveform data available
                continue

            sampling_rate = waveforms[0].stats.sampling_rate
            trace_params[
                "trace_name"
            ] = f"{event_params['source_id']}_{waveform_id_to_network_station_location(picks[0].waveform_id.id)}"

            actual_t_start, data, _ = sbu.stream_to_array(
                waveforms,
                component_order=writer.data_format["component_order"],
            )

            trace_params["trace_sampling_rate_hz"] = sampling_rate
            trace_params["trace_start_time"] = str(actual_t_start)
            #write picks
            for pick in picks:
                sample = (pick.time - actual_t_start) * sampling_rate
                trace_params[f"trace_{pick.phase_hint}_arrival_sample"] = int(
                    sample
                )
                trace_params[
                    f"trace_{pick.phase_hint}_status"
                ] = pick.evaluation_mode
            writer.add_trace({**event_params, **trace_params}, data)

@yetinam
Member Author

yetinam commented Feb 1, 2023

Hi @maihao14 ,

thanks for giving this a go. My suspicion is that the issue is with these lines:

trace_params = get_trace_params(pick)
waveforms = get_waveforms(pick, trace_params,t_start,t_end)

Here you reference the variable pick, which hasn't been defined in this scope. However, it's still bound from the previous for loop, so it will use the last pick processed in the event. Could you check if that's the issue?
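The scoping behavior behind this bug can be reproduced in isolation: in Python, a for-loop variable stays bound after the loop ends, so a later reference to it silently picks up the last element iterated over.

```python
picks = ["P-pick", "S-pick"]

# First loop, e.g. the pick-grouping loop in the script above
for pick in picks:
    pass

# After the loop, `pick` is still bound to the last element. A subsequent
# call like get_trace_params(pick) therefore uses the final pick of the
# previous loop instead of one pick from the current station group.
```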

Disclaimer: I just eyeballed the code and did not try running it yet.
