
Expand dataset conversion tutorial #153

Open
3 tasks
yetinam opened this issue Dec 17, 2022 · 4 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@yetinam
Member

yetinam commented Dec 17, 2022

The dataset conversion tutorial should be extended. So far, the following points come to mind:

  • Add a very general intro on the abstract main steps for converting a dataset
  • Be clearer about the dangers of writing one pick per trace. Possibly add the grouping operation as well, since separate picks per trace are a clear anti-pattern.
  • Add a comment on dataset chunking
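The grouping operation from the second bullet could be sketched in plain Python. This is a minimal illustration with hypothetical pick tuples rather than real SeisBench/ObsPy objects:

```python
from collections import defaultdict

# Hypothetical picks: (station_code, phase_hint, time_in_seconds)
picks = [
    ("STA1", "P", 10.0),
    ("STA1", "S", 14.5),
    ("STA2", "P", 12.0),
]

# Group picks by station so that each trace carries all of its picks.
station_groups = defaultdict(list)
for station, phase, time in picks:
    station_groups[station].append((phase, time))

# STA1 now holds both its P and S pick; writing each pick as a separate
# trace would duplicate the waveform data (the anti-pattern above).
```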
@maihao14

Hi @yetinam, regarding question #164: tutorial 03b adds traces for P and S individually (not in one trace). At the end of cell [9] (above writer.add_trace({**event_params, **trace_params}, data)), adding the following script might group the picks that exist in the current waveform (not tested exhaustively):

# group other picks on the same station into this trace
for pick_candidate in event.picks:
    # skip the pick itself
    if pick.resource_id == pick_candidate.resource_id:
        continue
    # same station
    if pick.waveform_id.station_code == pick_candidate.waveform_id.station_code:
        arrival_candidate = pick_candidate.time
        if waveforms[0].stats.starttime < arrival_candidate < waveforms[0].stats.endtime:
            sample = (arrival_candidate - actual_t_start) * sampling_rate
            trace_params[f"trace_{pick_candidate.phase_hint}_arrival_sample"] = int(sample)
            trace_params[f"trace_{pick_candidate.phase_hint}_status"] = pick_candidate.evaluation_mode

@yetinam
Member Author

yetinam commented Jan 25, 2023

Hi @maihao14 ,

thanks for having a look at this. I think this is one option for doing it. From a first glance, it seems to group the picks correctly, but it will write duplicate rows that differ only in the waveform window selection. The cleaner strategy is to group the picks first and then select an appropriate window around all picks. An example for this can be found here.
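The window-selection strategy described above (one window around all grouped picks) can be sketched with plain numbers. The pick times and margin here are hypothetical, standing in for ObsPy UTCDateTime values:

```python
# Hypothetical pick times (seconds, relative to some reference) for one station
pick_times = [12.0, 17.5]  # e.g. a P and an S pick on the same station

# Pad the window so it covers context before the first and after the last pick
margin = 10.0  # seconds; the appropriate value depends on the dataset
t_start = min(pick_times) - margin
t_end = max(pick_times) + margin

# A single waveform window [t_start, t_end] now contains all picks,
# so each station-event pair is written exactly once.
```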

Would you be interested in creating a PR updating this tutorial?

@maihao14

Hi @yetinam,
I'd be happy to create a PR for this. I definitely agree with you about the script above: it causes duplication (and I received warnings from SeisBench). I've tried the example in the link, but my script below doesn't seem to produce the correct groups of picks. Could you take a look at it? I might have misunderstood some steps.

# Iterate over events and picks, write to SeisBench format
with sbd.WaveformDataWriter(metadata_path, waveforms_path) as writer:
    # Define data format
    writer.data_format = {
        "dimension_order": "CW",
        "component_order": "ZNE",
        "measurement": "velocity",
        "unit": "counts",
        "instrument_response": "not restituted",
    }
    
    for event in catalog:
        event_params = get_event_params(event)
        # group all picks from the same station, e.g., the P and S phases of one event
        station_groups = defaultdict(list)
        for pick in event.picks:
            if pick.phase_hint is None:
                continue

            station_groups[
                waveform_id_to_network_station_location(pick.waveform_id.id)
            ].append(pick)

        for picks in station_groups.values():

            t_start = min(pick.time for pick in picks) 
            t_end = max(pick.time for pick in picks)

            trace_params = get_trace_params(pick)
            waveforms = get_waveforms(pick, trace_params,t_start,t_end)

            # rotate_stream_to_zne(waveforms, inv)

            if len(waveforms) == 0:
                # No waveform data available
                continue

            sampling_rate = waveforms[0].stats.sampling_rate
            trace_params[
                "trace_name"
            ] = f"{event_params['source_id']}_{waveform_id_to_network_station_location(picks[0].waveform_id.id)}"

            actual_t_start, data, _ = sbu.stream_to_array(
                waveforms,
                component_order=writer.data_format["component_order"],
            )

            trace_params["trace_sampling_rate_hz"] = sampling_rate
            trace_params["trace_start_time"] = str(actual_t_start)
            #write picks
            for pick in picks:
                sample = (pick.time - actual_t_start) * sampling_rate
                trace_params[f"trace_{pick.phase_hint}_arrival_sample"] = int(
                    sample
                )
                trace_params[
                    f"trace_{pick.phase_hint}_status"
                ] = pick.evaluation_mode
            writer.add_trace({**event_params, **trace_params}, data)

@yetinam
Member Author

yetinam commented Feb 1, 2023

Hi @maihao14 ,

thanks for giving this a go. My suspicion is that the issue is with these lines:

trace_params = get_trace_params(pick)
waveforms = get_waveforms(pick, trace_params,t_start,t_end)

Here you reference the variable pick, which hasn't been defined in this scope. However, it's still bound from the previous for loop, so it will use the last pick processed in the event. Could you check if that's the issue?
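The scoping behavior behind this bug can be reproduced in isolation: in Python, a for-loop variable stays bound after the loop ends, so a later reference to it silently picks up the last element iterated over.

```python
picks = ["P-pick", "S-pick"]

# First loop, e.g. the pick-grouping loop in the script above
for pick in picks:
    pass

# After the loop, `pick` is still bound to the last element. A subsequent
# call like get_trace_params(pick) therefore uses the final pick of the
# previous loop instead of one pick from the current station group.
```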

Disclaimer: I just eyeballed the code and did not try running it yet.
