MSEED: accumulating record-level micro gaps/overlaps over long time spans #3447

raciner · 2024-05-16T10:29:55Z

Avoid duplicates

I searched existing issues

Bug Summary

The file linked here contains data from 2024-05-14T23:52:07 to 2024-05-16T00:04:33. However, when using

import obspy

st = obspy.core.read('file.mseed')
print(st)

this is shown:

XX.XDR01.02.VKD | 2024-05-14T23:52:07.000000Z - 2024-05-15T23:57:37.000000Z | 0.1 Hz, 8674 samples

The problem is that the actual delta is slightly higher than the nominal 10.0 s due to the behaviour of the datalogger. At each record boundary the difference is within the default tolerance of obspy, so the records are merged, and the difference between the time used in obspy and the actual timestamp grows to about seven minutes by the end of the day. The time record for each individual miniseed record is correct. Other software such as qmerge handles this differently. Wouldn't it make more sense to accumulate the time difference between the time indicated in the header and the time that obspy computes based on the time stamp of the first header + delta*number of seen data points and introduce a break once this gets bigger than the tolerance instead of not accumulating it and checking it for each boundary between records individually?
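
To put numbers on the drift, here is a small sketch that uses only the times quoted in this report. It assumes that the stated end of the file (00:04:33) is the time of the last sample, which the report does not say explicitly.

```python
from datetime import datetime, timedelta

npts = 8674             # sample count printed by obspy
nominal_delta = 10.0    # seconds, from the 0.1 Hz header value
start = datetime(2024, 5, 14, 23, 52, 7)
actual_end = datetime(2024, 5, 16, 0, 4, 33)  # assumed: time of the last sample

# End time obtained by extrapolating from the first sample only.
# N samples span N - 1 sampling intervals.
nominal_end = start + timedelta(seconds=(npts - 1) * nominal_delta)

# Accumulated mismatch between extrapolated and actual end time
drift = (actual_end - nominal_end).total_seconds()

# Mean sampling interval that would make the trace end at actual_end
mean_delta = (actual_end - start).total_seconds() / (npts - 1)

print(nominal_end.isoformat())  # 2024-05-15T23:57:37
print(drift)                    # 416.0
print(round(mean_delta, 3))     # 10.048
```

The extrapolated end time matches what obspy prints, and the 416 s mismatch corresponds to a mean sampling interval of roughly 10.048 s instead of 10 s.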

Code to Reproduce

No response

Error Traceback

No response

ObsPy Version?

1.4.1

Operating System?

No response

Python Version?

3.10.12

Installation Method?

pip


megies · 2024-05-16T12:25:54Z

I'm a little bit confused. From what I can see, the sampling interval is stated in the file as exactly 10 s, but the individual records' start times don't line up, i.e. there are gaps and overlaps on a subsample scale between every pair of records? Is that correct, or am I missing something?

We merge records that have gaps/overlaps on a subsample scale, otherwise you would end up with 420 individual Trace objects.

Am I correct that you want obspy to assume the file is without gaps and compute an odd sampling rate and treat it as a gapless trace?

I'm pretty sure that would mean touching our C code wrapper for libmseed around here, maybe passing in a flag from Python to change behavior in the wrapper. I'm no C wizard, so definitely somebody else would need to do that.

megies · 2024-05-16T12:33:12Z

> The time record for each individual miniseed record is correct.

I am doubtful of that statement. Note how the last digit of the seconds in the individual records' start times jumps around "7", yet the fractional part is always ".000000"?!

msi version: 0.9.6
XX_XDR01_01_VKD, 000001, D
             start time: 2024,135,23:52:07.000000
      number of samples: 14
     sample rate factor: -10  (0.1 samples per second)
 sample rate multiplier: -1
   number of blockettes: 1
        time correction: 0
            data offset: 56
 first blockette offset: 48
         BLOCKETTE 1000: (Data Only SEED)
              next blockette: 0
                    encoding: IEEE double precision float (val:5)
                  byte order: Big endian (val:1)
               record length: 512 (val:9)
XX_XDR01_01_VKD, 000001, D
             start time: 2024,135,23:54:28.000000
      number of samples: 3
     sample rate factor: -10  (0.1 samples per second)
 sample rate multiplier: -1
   number of blockettes: 1
        time correction: 0
            data offset: 56
 first blockette offset: 48
         BLOCKETTE 1000: (Data Only SEED)
              next blockette: 0
                    encoding: IEEE double precision float (val:5)
                  byte order: Big endian (val:1)
               record length: 512 (val:9)
XX_XDR01_01_VKD, 000001, D
             start time: 2024,135,23:54:56.000000
      number of samples: 3
     sample rate factor: -10  (0.1 samples per second)
 sample rate multiplier: -1
   number of blockettes: 1
        time correction: 0
            data offset: 56
 first blockette offset: 48
         BLOCKETTE 1000: (Data Only SEED)
              next blockette: 0
                    encoding: IEEE double precision float (val:5)
                  byte order: Big endian (val:1)
               record length: 512 (val:9)
XX_XDR01_01_VKD, 000001, D
             start time: 2024,135,23:55:27.000000
      number of samples: 21
     sample rate factor: -10  (0.1 samples per second)
 sample rate multiplier: -1
   number of blockettes: 1
        time correction: 0
            data offset: 56
 first blockette offset: 48
         BLOCKETTE 1000: (Data Only SEED)
              next blockette: 0
                    encoding: IEEE double precision float (val:5)
                  byte order: Big endian (val:1)
               record length: 512 (val:9)
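
For reference, the mismatch at each of these record boundaries can be worked out by hand. The sketch below copies the start times and sample counts from the msi output above (day 135 of 2024 is May 14) and compares each header start time with the time extrapolated from the previous record. The helper name `boundary_offsets` is made up for this example.

```python
from datetime import datetime, timedelta

delta = 10.0  # seconds per sample, from "sample rate factor: -10"

# (start time, number of samples) copied from the msi output
records = [
    (datetime(2024, 5, 14, 23, 52, 7), 14),
    (datetime(2024, 5, 14, 23, 54, 28), 3),
    (datetime(2024, 5, 14, 23, 54, 56), 3),
    (datetime(2024, 5, 14, 23, 55, 27), 21),
]


def boundary_offsets(records, delta):
    """Return (header start - extrapolated start) in seconds per boundary.

    Positive values are small gaps, negative values small overlaps.
    """
    offsets = []
    for (start, npts), (next_start, _) in zip(records, records[1:]):
        expected = start + timedelta(seconds=npts * delta)
        offsets.append((next_start - expected).total_seconds())
    return offsets


offsets = boundary_offsets(records, delta)
print(offsets)  # [1.0, -2.0, 1.0]
```

Each offset is well below half a sample interval (5 s), which I understand to be the default time tolerance in libmseed (an assumption, not checked against the wrapper code), so a per-boundary check accepts every one of these records as contiguous.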

megies · 2024-05-16T12:36:26Z

> accumulate the time difference between the time indicated in the header and the time that obspy computes based on the time stamp of the first header + delta*number of seen data points and introduce a break once this gets bigger than the tolerance instead of not accumulating it and checking it for each boundary between records individually?

That could make sense as a switch, but it would mean adding a lot of logic to our libmseed wrapper code, and somebody else would have to do it.

megies · 2024-05-16T13:13:17Z

compare code from mseed2sac:

https://github.com/EarthScope/mseed2sac/blob/356fe94244189a2bd45bcb32e6e8950debc9b82a/src/mseed2sac.c#L336-L372

raciner · 2024-05-16T13:42:00Z

The basic issue is that the datalogger is supposed to make one measurement every 10 s. However, due to the poor quality of its implementation, this is not stable, so there is a slow drift. From one miniseed record to the next it is always within the tolerance, but over a whole day it adds up to about seven minutes, which means that the timestamps are completely wrong after reading the file and writing it back to disk. A less surprising behaviour would be to declare a gap or overlap whenever the accumulated time difference is larger than the tolerance.
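
The accumulate-then-break idea can be sketched in plain Python. This only illustrates the proposal and is not obspy or libmseed code: the function name `split_on_accumulated_drift`, the `(start, npts)` record tuples, the synthetic records and the 5 s tolerance are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def split_on_accumulated_drift(records, delta, tolerance):
    """Group (start, npts) records into contiguous segments.

    A new segment begins whenever a record's header start time differs by
    more than `tolerance` seconds from the time extrapolated from the
    first sample of the current segment.
    """
    segments = []
    seg_start = None
    seg_npts = 0
    for start, npts in records:
        if seg_start is not None:
            predicted = seg_start + timedelta(seconds=seg_npts * delta)
            drift = (start - predicted).total_seconds()
            if abs(drift) > tolerance:
                # Accumulated drift too large: close the segment here
                segments.append((seg_start, seg_npts))
                seg_start = None
        if seg_start is None:
            seg_start, seg_npts = start, 0
        seg_npts += npts
    if seg_start is not None:
        segments.append((seg_start, seg_npts))
    return segments


# Synthetic data: 7 records of 10 samples, each starting 2 s late
t0 = datetime(2024, 5, 14, 23, 52, 7)
records = [(t0 + timedelta(seconds=102 * i), 10) for i in range(7)]

segments = split_on_accumulated_drift(records, delta=10.0, tolerance=5.0)
print([npts for _, npts in segments])  # [30, 30, 10]
```

In this synthetic case every individual boundary is only 2 s off, so a per-boundary check with a 5 s tolerance would merge all seven records into one trace. The accumulated check breaks once the drift reaches 6 s, so no timestamp within a segment is ever off by more than the tolerance.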

megies · 2024-05-16T15:35:46Z

> A less surprising behaviour would be to declare a gap or overlap whenever the accumulated time difference is larger than the tolerance

I agree that makes sense. I might not be able to work on it anytime soon though; I'm trying to finish up some loose ends, and more of those appear than get closed anyway.

I'd still say, though, that the underlying problem comes from a quite bad file and not from us, and that it's an edge case.

megies · 2024-05-16T15:48:05Z

What you can do in the meantime is this:

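from obspy import read
from obspy.io.mseed.util import get_start_and_end_time

path = '/tmp/proof.mseed'
st = read(path)
# Only meaningful if all records were merged into a single trace
assert len(st) == 1
tr = st[0]
# Time of the first sample and of the last sample in the file
start, end = get_start_and_end_time(path)
# N samples span N - 1 sampling intervals
tr.stats.sampling_rate = (len(tr) - 1) / (end - start)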

This isn't safe though, and relies heavily on lots of assumptions (ordering of records, no multiplexing, constant record size, ..., see docs for that helper routine).
To do this safely in Python would mean to sacrifice some read speed because it would mean looking at every record header.
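
Looking at every record header, as mentioned above, could look roughly like the following. It is a stdlib-only sketch rather than obspy API: the helper names are made up, and it assumes big-endian miniSEED 2 records of fixed length (512 bytes, as in the msi output earlier in this thread), a single channel, and no time correction that still needs to be applied.

```python
import struct
from datetime import datetime, timedelta


def nominal_rate(factor, multiplier):
    """Sample rate in Hz from the SEED factor/multiplier pair."""
    if factor > 0:
        rate = float(factor)
    elif factor < 0:
        rate = -1.0 / factor
    else:
        rate = 0.0
    if multiplier > 0:
        rate *= multiplier
    elif multiplier < 0:
        rate = -rate / multiplier
    return rate


def read_record_headers(path, record_length=512):
    """Yield (start, npts, rate) from the fixed header of every record."""
    with open(path, "rb") as fh:
        while True:
            rec = fh.read(record_length)
            if len(rec) < 48:  # the fixed header section is 48 bytes
                break
            # BTIME at byte 20: year, day of year, hour, minute, second,
            # one unused byte, fractional seconds in units of 0.0001 s
            year, doy, hour, minute, second, fract = struct.unpack_from(
                ">HHBBBxH", rec, 20)
            npts, factor, mult = struct.unpack_from(">Hhh", rec, 30)
            start = datetime(year, 1, 1) + timedelta(
                days=doy - 1, hours=hour, minutes=minute,
                seconds=second, microseconds=fract * 100)
            yield start, npts, nominal_rate(factor, mult)


def effective_sampling_rate(path, record_length=512):
    """Mean sampling rate between the first and the last sample."""
    headers = list(read_record_headers(path, record_length))
    if not headers:
        raise ValueError("no records found")
    starts = [h[0] for h in headers]
    if starts != sorted(starts):
        raise ValueError("records are not in time order")
    total = sum(h[1] for h in headers)
    last_start, last_npts, last_rate = headers[-1]
    # Last sample time, using the nominal rate inside the last record only
    end = last_start + timedelta(seconds=(last_npts - 1) / last_rate)
    span = (end - starts[0]).total_seconds()
    if total < 2 or span <= 0:
        raise ValueError("not enough data to estimate a sampling rate")
    return (total - 1) / span
```

With these helpers, `tr.stats.sampling_rate = effective_sampling_rate(path)` would replace the last line of the snippet above. The explicit ordering check removes one of the assumptions, at the cost of reading every header.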

raciner added the bug-unconfirmed reported bug that still needs to be confirmed label May 16, 2024

megies added the .io.mseed label May 16, 2024

megies changed the title ~~Miniseed file not fully read~~ MSEED: accumulating record-level micro gaps/overlaps over long time spans May 16, 2024
