How to extract formant center frequencies (or other acoustic features) from speech data with opensmile in python

actually i moved this series to blog.syntheticspeech.de mainly because i found it tedious to not be able to post in markdown

There is a framework called OpenSMILE published on Github that can be used to extract high level acoustic features from audio signals and I’d like to show you how to use it with Python.

I’ve set up a notebook for this here.

First you need to install opensmile

pip install opensmile

General procedure

There are two ways to extract a specific acoustic feature with opensmile:

  1. Use an existing config that contains your target feature and filter it from the results
  2. Write your own config file and extract only your target feature directly

Method 1 is easier but obviously not resource efficient, 2 is better but then to learn the opensmile config syntax and all the existing modules is not trivial.

Using one example for an acoustic feature: formants, we’ll do both ways. The documentation for the python wrapper of opensmile is here

Method 1): Use an existing config file that includes the first three formant frequencies

We start with instantiating the main extractor class, Smile, with a configuration that includes formants. The GeMAPSv01b features set has been derived from the GeMAPS feature set

smile = opensmile.Smile(
feature_set=opensmile.FeatureSet.GeMAPSv01b,
feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
)

# Extact this for our test sentence, out comes a pandas dataframe
result_df = smile.process_file(testfp)
print(result_df.shape)

# Now us only the three center formant frequencies
centerformantfreqs = [‘F1frequency_sma3nz’, ‘F2frequency_sma3nz’, ‘F3frequency_sma3nz’]
formant_df = result_df[centerformantfreqs]
formant_df.head()

Method 2): Write your own config file

# An opensmile config is a text file:
formant_conf_str = '''
[componentInstances:cComponentManager]
instance[dataMemory].type=cDataMemory

;;; default source
[componentInstances:cComponentManager]
instance[dataMemory].type=cDataMemory

;;; source

\{\cm[source{?}:include external source]}

;;; main section

[componentInstances:cComponentManager]
instance[framer].type = cFramer
instance[win].type = cWindower
instance[fft].type = cTransformFFT
instance[resamp].type = cSpecResample
instance[lpc].type = cLpc
instance[formant].type = cFormantLpc

[framer:cFramer]
reader.dmLevel = wave
writer.dmLevel = frames
copyInputName = 1
frameMode = fixed
frameSize = 0.025000
frameStep = 0.010000
frameCenterSpecial = left
noPostEOIprocessing = 1

[win:cWindower]
reader.dmLevel=frames
writer.dmLevel=win
winFunc=gauss
gain=1.0

[fft:cTransformFFT]
reader.dmLevel=win
writer.dmLevel=fft

[resamp:cSpecResample]
reader.dmLevel=fft
writer.dmLevel=outpR
targetFs = 11000

[lpc:cLpc]
reader.dmLevel=outpR
writer.dmLevel=lpc
p=11
method=acf
lpGain=1
saveLPCoeff=1
residual=0
forwardFilter=0
lpSpectrum=0
lpSpecBins=128

[formant:cFormantLpc]
reader.dmLevel=lpc
writer.dmLevel=formant
saveIntensity=1
saveBandwidths=0
maxF=5500.0
minF=50.0
nFormants=3
useLpSpec=0
medianFilter=0
octaveCorrection=0

;;; sink

\{\cm[sink{?}:include external sink]}
'''
with open('formant.conf', 'w') as fp:
fp.write(formant_conf_str)
# Now we reinstantiate our smile object with the custom config
smile = opensmile.Smile(
feature_set=’formant.conf’,
feature_level=’formant’,
)
# and extract again
formant_df_2 = smile.process_file(testfp)
formant_df_2.head()

Voila! The output is shown below the title

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store