
Generate ShortTermFeatures for an audio signal of 0.1 seconds of a mono, 16KHz and PCM wave file #396

prashant-saxena opened this issue Mar 25, 2024 · 3 comments


@prashant-saxena

Hello,

First of all, thank you for providing a great library.
I would like to extract short-term features from a signal of 0.1 seconds (1600 samples).

F, f_names = ShortTermFeatures.feature_extraction(x[0:1600], Fs, 160, 160, deltas=False)

It throws an error. When I try slightly bigger values:

F, f_names = ShortTermFeatures.feature_extraction(x[0:1600], Fs, 200, 200, deltas=False)

Now there are no errors, but there are <= 8 points in the features, which I believe is too few for uniqueness. Using python_speech_features, I can successfully generate 20 points, but I think the resulting MFCC is not distinctive enough in the presence of noise.

mfcc(x[0:1600],         # the audio signal (N*1 array) from which to compute features
     16000,             # the sample rate of the signal we are working with
     numcep=NUM_CEP,    # the number of cepstra to return, default 13
     winlen=160/16000,  # the length of the analysis window in seconds
     winstep=160/16000, # the step between successive windows in seconds
     nfilt=20)

How do you generate features with more points (20-40) that stay distinctive for a short signal with a fair amount of noise?
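For context, the number of feature vectors ("points") is governed by the window/step framing rather than by the MFCC settings alone. A minimal sketch, assuming the usual sliding-window formula `n_frames = 1 + (n_samples - window) // step` (the helper name `n_frames` is mine):

```python
# Sketch of how window/step choices control how many feature vectors
# a 1600-sample (0.1 s at 16 kHz) clip yields.
# Assumption: frame count follows n_frames = 1 + (n_samples - window) // step.

def n_frames(n_samples, window, step):
    # number of full analysis windows that fit in the signal
    return 1 + (n_samples - window) // step

print(n_frames(1600, 200, 200))  # 8  -> matches the "<= 8 points" above
print(n_frames(1600, 160, 80))   # 19 -> 50% overlap roughly doubles the count
print(n_frames(1600, 64, 40))    # 39 -> shorter windows give 20-40+ points
```

Overlapping windows (step < window) are a common way to get more feature vectors from a short clip without shrinking the window below what the FFT-based features need.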

@Caparrini

Caparrini commented Mar 28, 2024

Hi!

I’d like to help with the feature extraction issue you're facing. To do so, I need a bit more info:

  • Python Version: Which version are you using?
  • Libraries: Could you list the libraries and their versions you're working with?
  • Error Message: What error comes up with the smaller values?
  • Context: Any other details about your setup might be helpful.

This will help me understand the problem better and find a solution for you.

Thanks!

@prashant-saxena

prashant-saxena commented Mar 29, 2024

Hi,
Windows 10
Python 3.10.0

customtkinter==5.2.1
dm-tree==0.1.8
dtaidistance==2.3.11
eyed3==0.9.7
fastdtw==0.3.4
fCWT==0.1.18
fqdn==1.5.1
google-auth-oauthlib==1.2.0
isoduration==20.11.0
jsonpointer==2.4
lesscpy==0.15.1
noisereduce==3.0.2
notebook==7.1.2
pandas==2.2.1
pipdeptree==2.16.1
pyAudioAnalysis==0.3.14
pydub==0.25.1
python-speech-features==0.6
resampy==0.4.3
tensorflow==2.16.1
tensorflow-estimator==2.15.0
tkinterdnd2==0.3.0
toml==0.10.2
uri-template==1.3.0
webcolors==1.13
wurlitzer==3.0.3
xlwt==1.3.0

Error when using

F, f_names = ShortTermFeatures.feature_extraction(x[0:1600], Fs, 160, 160, deltas=False)
---------------------------------------------------------------------------
File D:\projects\vrt\.venv\lib\site-packages\pyAudioAnalysis\ShortTermFeatures.py:662, in feature_extraction(signal, sampling_rate, window, step, deltas)
    657 feature_vector[n_time_spectral_feats:mffc_feats_end, 0] = \
    658     mfcc(fft_magnitude, fbank, n_mfcc_feats).copy()
    660 # chroma features
    661 chroma_names, chroma_feature_matrix = \
--> 662     chroma_features(fft_magnitude, sampling_rate, num_fft)
    663 chroma_features_end = n_time_spectral_feats + n_mfcc_feats + \
    664                       n_chroma_feats - 1
    665 feature_vector[mffc_feats_end:chroma_features_end] = \
    666     chroma_feature_matrix

File D:\projects\vrt\.venv\lib\site-packages\pyAudioAnalysis\ShortTermFeatures.py:293, in chroma_features(signal, sampling_rate, num_fft)
    291     I = np.nonzero(num_chroma > num_chroma.shape[0])[0][0]
    292     C = np.zeros((num_chroma.shape[0],))
--> 293     C[num_chroma[0:I - 1]] = spec
    294     C /= num_freqs_per_chroma
    295 final_matrix = np.zeros((12, 1))

ValueError: shape mismatch: value array of shape (80,) could not be broadcast to indexing result of shape (27,)
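The error itself is a plain NumPy fancy-indexing mismatch: the index selects fewer slots than the value array provides. A minimal reproduction, with hypothetical sizes taken from the shapes in the traceback:

```python
import numpy as np

# Hypothetical sizes from the traceback: the index selects 27 slots,
# but the value array has 80 elements, so the assignment cannot broadcast.
C = np.zeros(27)
idx = np.arange(27)   # indexing result of shape (27,)
spec = np.ones(80)    # value array of shape (80,)
try:
    C[idx] = spec     # ValueError: shape mismatch
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```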

I need a distinctive sound feature for my CNN-based project to train a model. The frame size is 1600 samples (0.1 seconds).

[Plot: MFCCs computed from 7 different wave files containing similar sounds]

In the plot above, you can see 7 MFCCs generated from 7 different wave files. All the wave files contain a similar sound. The whole idea is to make the features as similar as possible for similar types of data, so that a good prediction score can be achieved.

Caparrini added a commit to Caparrini/pyAudioAnalysis that referenced this issue Mar 31, 2024
@Caparrini

Caparrini commented Mar 31, 2024

Hello again,

I conducted a small experiment and was able to replicate the issue you described. It appears that there isn't sufficient information to compute chroma features. To address this and ensure the code functions (even if it means the chroma feature values are zeroes), I've implemented a fix. I plan to submit a pull request for this fix, pending the library author's approval.

For testing, I took the following approach (I recommend using fractions of the sampling rate, Fs, rather than sample counts, but the choice is yours. In my tests, I used an Fs of 44100):

from pyAudioAnalysis import ShortTermFeatures
from pyAudioAnalysis import audioBasicIO


def extract_features(frac_second, samples_features, Fs, x):
    # samples per analysis window for the chosen fraction of a second
    samples_frac_second = frac_second * Fs
    # number of non-overlapping windows that fit in the analysed slice
    samples_windows = samples_features // samples_frac_second

    F, f_names = ShortTermFeatures.feature_extraction(
        x[:samples_features], Fs, samples_frac_second, samples_frac_second,
        deltas=False)

    print(f"In {frac_second} there are {samples_frac_second} samples")
    print(f"In {samples_features} there are {samples_windows} windows")
    print(len(F[0]))
    print(len(f_names))

    return F, f_names


def issue_396():
    [Fs, x] = audioBasicIO.read_audio_file('./audio/limbo_mono.wav')

    for frac_second in [0.1, 0.05, 0.025, 0.01, 0.0036, 0.0018]:
        print(f"Experiment with {frac_second} frac of second")
        F, f_names = extract_features(frac_second, 16000, Fs, x)


if __name__ == '__main__':
    issue_396()

Output generated:

Experiment with 0.1 frac of second
In 0.1 there are 4410.0 samples
In 16000 there are 3.0 windows
3
34
Experiment with 0.05 frac of second
In 0.05 there are 2205.0 samples
In 16000 there are 7.0 windows
7
34
Experiment with 0.025 frac of second
In 0.025 there are 1102.5 samples
In 16000 there are 14.0 windows
14
34
Experiment with 0.01 frac of second
In 0.01 there are 441.0 samples
In 16000 there are 36.0 windows
36
34
Experiment with 0.0036 frac of second
In 0.0036 there are 158.76 samples
In 16000 there are 100.0 windows
101
34
Experiment with 0.0018 frac of second
In 0.0018 there are 79.38 samples
In 16000 there are 201.0 windows
202
34
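The window counts above follow directly from non-overlapping framing: with `window = frac_second * Fs` samples per window, roughly `16000 // window` windows fit in the analysed slice. A quick check, assuming Fs = 44100 as in the experiment:

```python
# Reproduce the window counts from the output above (Fs = 44100, 16000 samples).
Fs = 44100
n_samples = 16000
counts = {}
for frac in [0.1, 0.05, 0.025, 0.01]:
    window = frac * Fs                 # samples per window (may be fractional)
    counts[frac] = int(n_samples // window)
print(counts)  # {0.1: 3, 0.05: 7, 0.025: 14, 0.01: 36}
```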

Fix: in the method chroma_features inside the file ShortTermFeatures.py, adapt the following part as follows:

    else:
        I = np.nonzero(num_chroma > num_chroma.shape[0])[0][0]
        C = np.zeros((num_chroma.shape[0],))
        if I > 1:
            # if I <= 1 there are no chroma features that can be extracted
            C[num_chroma[0:I - 1]] = spec[num_chroma[0:I - 1]]
            C /= num_freqs_per_chroma
    final_matrix = np.zeros((12, 1))

I'm submitting a pull request (https://github.com/Caparrini/pyAudioAnalysis), although I'm uncertain if it aligns with the expected behavior. I've uploaded it here for your convenience, should you prefer this over modifying your local library directly. Please choose whichever option suits you best.

Best regards,
