Batch trigger by Julian day #67

Open
6 of 8 tasks
hemmelig opened this issue May 29, 2020 · 1 comment

hemmelig commented May 29, 2020

Issue statement
Trigger currently runs on a single core, despite being well suited to multi-processing. The current workaround is to split the time window of interest into batches in an external script. However, the output files from each trigger run then overwrite any existing files.

Proposition
Build the batching into Trigger itself. This also requires handling the overwriting of triggered event files, which will be done by adding the Julian day to the triggered events filename.

Primary issue

  • Bake batching process into Trigger.trigger()
    • Have TriggeredEvent.csv files write with the relevant Julian day
    • Ensure the process of reading these TriggeredEvent files for locate can handle the separate Julian days
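As a rough illustration of the two sub-tasks above, the sketch below builds a per-day file name and collects the per-day files that exist for a locate window. The naming pattern, `trigger_filename()`, and `files_for_window()` are all hypothetical names chosen for this example; they are not taken from the codebase.

```python
from datetime import date, datetime, timedelta
from pathlib import Path
from typing import List


def trigger_filename(run_name: str, day: date) -> str:
    """Build a per-day file name (hypothetical naming pattern).

    %Y is the four-digit year and %j is the zero-padded day of the year.
    """
    return f"{run_name}_{day:%Y}_{day:%j}_TriggeredEvents.csv"


def files_for_window(directory: Path, run_name: str,
                     starttime: datetime, endtime: datetime) -> List[Path]:
    """List the per-day trigger files that exist between two timestamps."""
    found = []
    day = starttime.date()
    # Visit every calendar day touched by the window, including both ends.
    while day <= endtime.date():
        candidate = directory / trigger_filename(run_name, day)
        if candidate.exists():
            found.append(candidate)
        day += timedelta(days=1)
    return found
```

Because each day gets its own file, separate trigger runs over different days no longer write to the same path, and days with no file are simply skipped when reading.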

Future

  • Multi-process the batched trigger
  • Allow for multi-processing arbitrary periods of time by breaking into batches with non-Julian day lengths.
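One possible shape for these future tasks, sketched under assumptions: split the window into fixed-length batches (not aligned to midnight) and hand each batch to a worker through a `concurrent.futures` executor. `split_fixed()`, `run_batches()`, and the worker signature are hypothetical and only illustrate the idea.

```python
from concurrent.futures import Executor
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

Batch = Tuple[datetime, datetime]


def split_fixed(starttime: datetime, endtime: datetime,
                length: timedelta) -> List[Batch]:
    """Split [starttime, endtime) into consecutive batches of a fixed length.

    The final batch is shorter if the window is not a whole multiple of length.
    """
    if length <= timedelta(0):
        raise ValueError("length must be positive")
    batches = []
    batch_start = starttime
    while batch_start < endtime:
        batch_end = min(batch_start + length, endtime)
        batches.append((batch_start, batch_end))
        batch_start = batch_end
    return batches


def run_batches(batches: List[Batch], worker: Callable[[Batch], object],
                executor: Executor) -> list:
    """Apply worker to every batch via the executor, keeping batch order."""
    return list(executor.map(worker, batches))
```

Passing a `concurrent.futures.ProcessPoolExecutor` as `executor` would spread the batches across cores, provided the worker is a picklable, module-level function. Events that straddle a batch boundary would need separate handling, which this sketch does not attempt.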

Additional tasks

  • Break the _trigger() method into two stages
    • _trigger_candidates() - Identify all of the instances of the (normalised) coalescence exceeding the chosen detection threshold
    • _refine_candidates() - Merge events for which the marginal windows overlap with the minimum inter-event time.
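The two stages can be illustrated with a minimal pure-Python sketch. It uses `itertools.groupby` in place of any pandas machinery, and it assumes that "overlap" means the gap between consecutive candidates is shorter than the minimum inter-event time. The function names mirror the bullets above, but the signatures and the merge rule are assumptions made for this example.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Window = Tuple[float, float]


def identify_candidates(times: Sequence[float], values: Sequence[float],
                        threshold: float) -> List[Window]:
    """Return (start, end) times of contiguous runs where values > threshold."""
    candidates = []
    samples = zip(times, values)
    # groupby yields consecutive runs of samples that share the same key,
    # so each run with key True is one uninterrupted exceedance.
    for above, run in groupby(samples, key=lambda s: s[1] > threshold):
        if above:
            run = list(run)
            candidates.append((run[0][0], run[-1][0]))
    return candidates


def refine_candidates(candidates: List[Window],
                      min_event_interval: float) -> List[Window]:
    """Merge time-ordered candidates separated by less than the interval."""
    merged: List[Window] = []
    for start, end in candidates:
        if merged and start - merged[-1][1] < min_event_interval:
            # Too close to the previous candidate: extend it instead.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Keeping the stages separate means the thresholding and the merging rule can each be tested and changed on their own.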

Result
A clearer, multi-processed codebase that produces results faster.

Reach
Trigger files generated using the development branch prior to this change will no longer be compatible with locate(starttime, endtime). However, it is still possible to locate the events in this file using locate(trigger_file).

hemmelig added the enhancement (New feature or request) label May 29, 2020
hemmelig added this to the Ahead of publication milestone May 29, 2020
hemmelig changed the title from "[Enhancement] Batch trigger by Julian day" to "Batch trigger by Julian day" May 29, 2020
hemmelig added a commit that referenced this issue Jun 2, 2020
Split the internal trigger method into a series of methods that each
capture a specific stage. In doing so, also update some of the
implementation.

_trigger_events() has been split into:
    _get_threshold() - determine an array of values to use as a threshold
    _identify_candidates() - find distinct periods of time for which the
                             maximum (normalised) coalescence trace
                             exceeds the chosen threshold
    _refine_candidates() - merge candidate events for which the marginal
                           windows overlap with the minimum inter-event
                           time
    _filter_events() - remove events within the padding time and/or
                       within a specific geographical region

In both the _identify_candidates() and _refine_candidates() methods, the
pandas.DataFrame.groupby() method has been used to remove the confusing
index twiddling.

This partially addresses some of the tasks in Issue #67.
hemmelig added a commit that referenced this issue Jun 2, 2020
The Trigger.trigger() method now internally batches the specified time
period into Julian days (with possible partial days on either end).
Consequently, the names of files output by trigger also include the year and
Julian day.

Triggered event files for locate between two timestamps are now read
by looping over the Julian days.

This addresses the primary issue in Issue #67

Trigger files generated prior to this change will no longer be compatible
with locate between two timestamps. However, they can still be used in
locate using locate(trigger_file="path/to/old_trigger_file").
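
The batching described in this commit message might look roughly like the sketch below, which cuts a time window at each midnight so that the first and last batches may be partial days. `split_by_julian_day()` is a hypothetical helper written for illustration, and it assumes naive `datetime` objects in a single time zone.

```python
from datetime import datetime, time, timedelta
from typing import List, Tuple


def split_by_julian_day(starttime: datetime,
                        endtime: datetime) -> List[Tuple[datetime, datetime]]:
    """Split [starttime, endtime) into batches that never cross midnight."""
    batches = []
    batch_start = starttime
    while batch_start < endtime:
        # Midnight at the start of the day after batch_start.
        next_midnight = datetime.combine(
            batch_start.date() + timedelta(days=1), time.min
        )
        batch_end = min(next_midnight, endtime)
        batches.append((batch_start, batch_end))
        batch_start = batch_end
    return batches
```

Each batch start falls on a single day of the year, available as `batch_start.timetuple().tm_yday`, which is the value that would go into the output file name alongside the year.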
hemmelig mentioned this issue Jun 2, 2020
TomWinder reopened this Jun 18, 2020
@TomWinder

Re-opened pending addition of multi-processing to trigger.
