What does *analysis_duration* control? #338

Open
seismolab-uct opened this issue Apr 8, 2024 · 1 comment

@seismolab-uct

Hi everyone,

I have been using MSNoise on nodal datasets for the past few weeks.

While running compute_cc on a dataset with 85 stations, the process would finish without error but leave thousands of jobs marked "I" (in progress).

Following the suggestions in #196, I reset the in-progress cross-correlation jobs (msnoise reset CC) and ran compute_cc with the verbose option in serial mode (no -t integer option). I noticed that a job would write to disk only when all slices for the current day are processed.
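
For reference, the commands I ran were roughly the following (the verbose flag and its placement may differ between MSNoise versions, so take this as a sketch rather than exact syntax):

    # reset the cross-correlation jobs left "I"n progress
    msnoise reset CC

    # re-run the correlations serially (no -t option); add your
    # version's verbose flag to see the per-slice progress
    msnoise compute_cc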

I would like to know if the writing to disk is controlled by the parameter analysis_duration? If so, could I reduce it to, e.g., 3600 s to reduce the memory usage?

At the moment I am running correlations with the following key parameters, which I understand might not be standard for dv/v, but I am not going further than the reference stack in my processing.

cc_sampling_rate: 100.0 Hz
analysis_duration: 86400 s [default]
maxlag: 20 s
corr_duration: 60 s
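
In case it is relevant, this is roughly how I inspect and change these values; the exact config subcommand syntax varies between MSNoise versions (older releases use msnoise config --set name=value), and msnoise admin is the version-independent alternative:

    # print the full configuration (the dump further below comes from this)
    msnoise info

    # example of changing a parameter, e.g. the 3600 s value asked about above
    # (syntax assumed for recent versions; older releases use --set)
    msnoise config set analysis_duration=3600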

Thank you for your time and for any explanations.
Best

In case it helps:
My full configuration values are below:

Configuration values:
| Normal colour indicates that the default value is used
| Green indicates "M"odified values
M data_folder: /home/seismolab/Projects/Rt13/Data_Folder
output_folder: CROSS_CORRELATIONS
M data_structure: BUD
archive_format: ''
network: *
channels: *
M startdate: 2011-09-23
M enddate: 2011-10-04
analysis_duration: 86400
M cc_sampling_rate: 100.0
resampling_method: Lanczos
M preprocess_lowpass: 46.0
M preprocess_highpass: 2
preprocess_max_gap: 10.0
preprocess_taper_length: 20.0
remove_response: N
M response_format: inventory
response_path: inventory
response_prefilt: (0.005, 0.006, 30.0, 35.0)
M maxlag: 20
M corr_duration: 60.
overlap: 0.0
M windsorizing: 0
M whitening: N
whitening_type: B
stack_method: linear
pws_timegate: 10.0
pws_power: 2.0
crondays: 1
components_to_compute: ZZ
cc_type: CC
M components_to_compute_single_station: ZZ
cc_type_single_station_AC: CC
cc_type_single_station_SC: CC
autocorr: N
keep_all: N
keep_days: Y
M ref_begin: 2011-09-23
M ref_end: 2011-10-04
mov_stack: 5
export_format: MSEED
sac_format: doublets
dtt_lag: static
dtt_v: 1.0
dtt_minlag: 5.0
dtt_width: 30.0
dtt_sides: both
dtt_mincoh: 0.65
dtt_maxerr: 0.1
dtt_maxdt: 0.1
plugins: ''
hpc: N
stretching_max: 0.01
stretching_nsteps: 1000

Filters:
ID: [low:high] [mwcs_low:mwcs_high] mwcs_wlen mwcs_step Used?
1: [4.000:40.000] [4.000:40.000] 12 4 Y

@ThomasLecocq
Member

> I noticed that a job would write to disk only when all slices for the current day are processed.

Yes, this is how the code is structured for now.

> I would like to know if the writing to disk is controlled by the parameter analysis_duration? If so, could I reduce it to, e.g., 3600 s to reduce the memory usage?

No, the analysis_duration parameter is not (yet) used in MSNoise; it doesn't control anything.

You would benefit strongly from using the development version (master on GitHub), because:

  • it spawns the preprocessing (a little slower, but it effectively clears the garbage collector after reading the raw data)
  • it has a better way of handling the 2D arrays of correlations
  • you could try using decimation instead of Lanczos resampling (this saves time & memory too, during the preprocessing phase); see the sketch just below
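
For the last point, a possible starting point (assuming your version accepts a "Decimate" value for resampling_method; check the parameter's help text in msnoise admin before relying on it):

    # switch the resampling from Lanczos to plain decimation;
    # decimation only works cleanly when the original sampling rate is an
    # integer multiple of cc_sampling_rate (e.g. 500 Hz -> 100 Hz)
    msnoise config set resampling_method=Decimate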

⚠️ the new dev version is NOT compatible with the existing database! You'll have to redo all the init steps!
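
For completeness, the re-initialisation sequence is roughly the one below; option names can differ slightly between versions, so double-check against the workflow documentation:

    # create a fresh project database
    msnoise db init

    # declare the stations and scan the data archive again
    msnoise populate
    msnoise scan_archive --init

    # regenerate the cross-correlation jobs, then compute
    msnoise new_jobs --init
    msnoise compute_cc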
