What does *analysis_duration* control? #338

Open
seismolab-uct opened this issue Apr 8, 2024 · 1 comment

@seismolab-uct

Hi everyone,

I have been using MSNoise on nodal datasets for the past few weeks.

While running compute_cc on a dataset with 85 stations, the process would finish without error but leave thousands of jobs marked "I" (in progress).

Following the suggestions in #196, I reset the in-progress cross-correlation jobs (msnoise reset CC) and ran compute_cc with the verbose option in serial mode (no -t integer option). I noticed that a job would write to disk only when all slices for the current day are processed.
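
For reference, the commands I ran were roughly the following (the verbose flag and its placement may differ between MSNoise versions, so take this as a sketch rather than exact syntax):

    # reset the cross-correlation jobs left "I"n progress
    msnoise reset CC

    # re-run the correlations serially (no -t option); add your
    # version's verbose flag to see the per-slice progress
    msnoise compute_cc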

I would like to know if the writing to disk is controlled by the parameter analysis_duration? If so, could I reduce it to, e.g., 3600 s to reduce the memory usage?

At the moment I am running correlations with the following key parameters, which I understand might not be standard for dv/v, but I am not going further than the reference stack in my processing.

cc_sampling_rate: 100.0 Hz
analysis_duration: 86400 s [default]
maxlag: 20 s
corr_duration: 60 s
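
In case it is relevant, this is roughly how I inspect and change these values; the exact config subcommand syntax varies between MSNoise versions (older releases use msnoise config --set name=value), and msnoise admin is the version-independent alternative:

    # print the full configuration (the dump further below comes from this)
    msnoise info

    # example of changing a parameter, e.g. the 3600 s value asked about above
    # (syntax assumed for recent versions; older releases use --set)
    msnoise config set analysis_duration=3600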

Thank you for your time and for any explanations.
Best

In case it helps:
My full configuration values are below:

Configuration values:
| Normal colour indicates that the default value is used
| Green indicates "M"odified values
M data_folder: /home/seismolab/Projects/Rt13/Data_Folder
output_folder: CROSS_CORRELATIONS
M data_structure: BUD
archive_format: ''
network: *
channels: *
M startdate: 2011-09-23
M enddate: 2011-10-04
analysis_duration: 86400
M cc_sampling_rate: 100.0
resampling_method: Lanczos
M preprocess_lowpass: 46.0
M preprocess_highpass: 2
preprocess_max_gap: 10.0
preprocess_taper_length: 20.0
remove_response: N
M response_format: inventory
response_path: inventory
response_prefilt: (0.005, 0.006, 30.0, 35.0)
M maxlag: 20
M corr_duration: 60.
overlap: 0.0
M windsorizing: 0
M whitening: N
whitening_type: B
stack_method: linear
pws_timegate: 10.0
pws_power: 2.0
crondays: 1
components_to_compute: ZZ
cc_type: CC
M components_to_compute_single_station: ZZ
cc_type_single_station_AC: CC
cc_type_single_station_SC: CC
autocorr: N
keep_all: N
keep_days: Y
M ref_begin: 2011-09-23
M ref_end: 2011-10-04
mov_stack: 5
export_format: MSEED
sac_format: doublets
dtt_lag: static
dtt_v: 1.0
dtt_minlag: 5.0
dtt_width: 30.0
dtt_sides: both
dtt_mincoh: 0.65
dtt_maxerr: 0.1
dtt_maxdt: 0.1
plugins: ''
hpc: N
stretching_max: 0.01
stretching_nsteps: 1000

Filters:
ID: [low:high] [mwcs_low:mwcs_high] mwcs_wlen mwcs_step Used?
1: [4.000:40.000] [4.000:40.000] 12 4 Y

@ThomasLecocq
Member

> I noticed that a job would write to disk only when all slices for the current day are processed.

Yes, this is how the code is structured for now.

> I would like to know if the writing to disk is controlled by the parameter analysis_duration? If so, could I reduce it to, e.g., 3600 s to reduce the memory usage?

No, the analysis_duration parameter is not (yet) used in MSNoise; it doesn't control anything.

You would benefit strongly from using the development version (master on GitHub), because:

  • it spawns the preprocessing (a little slower, but it effectively clears the garbage collector after reading the raw data)
  • it has a better way of handling the 2D arrays of correlations
  • you could try using decimation instead of Lanczos resampling (this saves time & memory too, during the preprocessing phase); see the sketch just below
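
For the last point, a possible starting point (assuming your version accepts a "Decimate" value for resampling_method; check the parameter's help text in msnoise admin before relying on it):

    # switch the resampling from Lanczos to plain decimation;
    # decimation only works cleanly when the original sampling rate is an
    # integer multiple of cc_sampling_rate (e.g. 500 Hz -> 100 Hz)
    msnoise config set resampling_method=Decimate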

⚠️ the new dev version is NOT compatible with the existing database! You'll have to redo all the init steps!
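
For completeness, the re-initialisation sequence is roughly the one below; option names can differ slightly between versions, so double-check against the workflow documentation:

    # create a fresh project database
    msnoise db init

    # declare the stations and scan the data archive again
    msnoise populate
    msnoise scan_archive --init

    # regenerate the cross-correlation jobs, then compute
    msnoise new_jobs --init
    msnoise compute_cc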
