Audio DSP at the Edge!
An approach to implementing a simple sound/no-sound detection feature using amplitude thresholding.
Review our I2S process and how to process the signal
- How are we reading audio blocks?
- We capture a block size of 256.
- They are read into our mem slab.
- We have the filled blocks shortly after calling i2s_trigger to start.
- We should likely free our blocks after calling the same trigger to stop.
- What is the capture format?
- A 24-bit sample in a 32-bit aligned container, sometimes known as S24_LE, 24 bit in 32 bit, or S32_LE; the LE here means little-endian.
- How many samples per block or frame should we analyze?
- Our detection window will be 256 samples; each sample carries 3 bytes of audio data (this fits into the 32-bit container format above).
- At our sample rate of 44.1 kHz, that means a 5.8 ms window, or about 172 windows per second.
time_per_sample = 1/sample_rate = 1/44100 ≈ 22.7 µs
window_time = samples/sample_rate = 256/44100 ≈ 5.8 ms
For faster detection we can use a smaller window; for richer detection we can use a bigger window.
- Add a thread and queue to do this in the background; we can start by doing it in main.
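The window arithmetic above can be checked with a few integer helpers. This is a minimal sketch under stated assumptions: the function and macro names are hypothetical, the block is assumed to hold 256 mono samples, and each sample is assumed to occupy a 4-byte container.

```c
#include <stdint.h>

/* Assumed capture parameters taken from the notes above. */
#define SAMPLE_RATE_HZ    44100u
#define SAMPLES_PER_BLOCK 256u
#define BYTES_PER_SAMPLE  4u   /* 24-bit sample in a 32-bit container */

/* Duration of one detection window in microseconds (rounded down). */
uint32_t window_duration_us(uint32_t samples, uint32_t sample_rate_hz)
{
    return (uint32_t)(((uint64_t)samples * 1000000u) / sample_rate_hz);
}

/* Number of complete detection windows per second. */
uint32_t windows_per_second(uint32_t samples, uint32_t sample_rate_hz)
{
    return sample_rate_hz / samples;
}

/* Memory needed for one block of samples, in bytes. */
uint32_t block_size_bytes(uint32_t samples)
{
    return samples * BYTES_PER_SAMPLE;
}
```

With the assumed values, one window lasts about 5804 µs, 172 complete windows fit in a second, and one block needs 1024 bytes. Halving the window to 128 samples would halve the latency at the cost of a noisier level estimate.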
The Detection Algorithm
- Define the detection metric:
- We will use the root mean square (RMS) of the signal. Because every sample is squared, the sign drops out, so the result measures both the AC and DC components in this system:
a) DC accounts for any constant offset or interference in our signal's environment.
b) AC accounts for the actual sound signal itself.
This should enable us to do a more robust threshold detection.
- Computing the metric for each buffer:
- Every time we have a memory block, we need to calculate the number of samples in the block. Each sample occupies 4 bytes (a signed 24-bit value in a 32-bit container), so
num_samples = num_bytes / 4
- Let's use CMSIS-DSP to compute the RMS we want from above:
a) The f32 functions in CMSIS-DSP work on floating-point arrays, so we need to convert our 24-bit values to float.
b) We need to normalize our values by dividing by 2^23 = 8388608, the full scale of a signed 24-bit sample.
    #define SCALER 8388608.0f  /* 2^23: full scale of a signed 24-bit sample */

    int sample_count = BLOCK_SIZE / sizeof(int32_t);
    int32_t *samples = (int32_t *)block;
    float32_t buffer[sample_count];

    for (int i = 0; i < sample_count; i++) {
        /* assumes each sample is already sign-extended to 32 bits */
        buffer[i] = (float32_t)samples[i] / SCALER;
    }
- After this we can include arm_math.h in our code and get the RMS with arm_rms_f32(buffer, sample_count, &rms_value).
- Plan the comparison:
- Compare the computed metric against the threshold to determine if sound is present.
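The metric and comparison steps above can be sketched in portable C. This is an illustration under stated assumptions, not the final firmware: `RMS_THRESHOLD` is a hypothetical value that would need tuning, the function names are made up for this example, and the samples are assumed to be sign-extended 24-bit values. To stay runnable on a host without CMSIS-DSP, the sketch computes the mean square in a plain loop and compares it against the squared threshold, which gives the same decision as comparing the RMS against the threshold. On the target, arm_rms_f32 could replace the loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Full scale of a signed 24-bit sample (2^23). */
#define S24_FULL_SCALE 8388608.0f

/* Hypothetical threshold: RMS level, as a fraction of full scale, treated as sound. */
#define RMS_THRESHOLD 0.02f

/* Mean of the squared, normalized samples. The RMS is the square root of this. */
float block_mean_square(const int32_t *samples, size_t count)
{
    float sum = 0.0f;

    if (count == 0) {
        return 0.0f;
    }
    for (size_t i = 0; i < count; i++) {
        /* Assumes each sample is already sign-extended to 32 bits. */
        float x = (float)samples[i] / S24_FULL_SCALE;
        sum += x * x;
    }
    return sum / (float)count;
}

/* True when the block's RMS exceeds RMS_THRESHOLD (compared in squared form). */
bool sound_present(const int32_t *samples, size_t count)
{
    return block_mean_square(samples, count) > RMS_THRESHOLD * RMS_THRESHOLD;
}
```

Comparing squared values avoids a square root per block, which fits the goal of keeping the per-block work to simple arithmetic.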
Testing Strategy
- Define test scenarios:
- Quiet environment to verify that silence is correctly detected.
- Introduce controlled sounds (clap, voice) to verify detection.
- Plan how to observe results:
- Can we detect audio and send a BLE notification?
- Can we advance towards identifying the audio?
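Before testing with real claps or voice, the same scenarios could be rehearsed on a host with synthetic blocks. The following is a rough sketch under stated assumptions: the square-wave generator, the names, and the threshold value are all hypothetical, and a square wave only stands in for a controlled sound.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TEST_BLOCK_SAMPLES 256
#define TEST_FULL_SCALE    8388608.0f  /* 2^23 */
#define TEST_THRESHOLD_SQ  0.0004f     /* hypothetical: (0.02)^2 */

/* Fill a block with a square wave of the given peak amplitude in 24-bit
 * counts. An amplitude of 0 produces silence. */
void fill_square_wave(int32_t *block, size_t count,
                      int32_t amplitude, size_t half_period)
{
    if (half_period == 0) {
        half_period = 1;  /* avoid dividing by zero */
    }
    for (size_t i = 0; i < count; i++) {
        bool high = ((i / half_period) % 2) == 0;
        block[i] = high ? amplitude : -amplitude;
    }
}

/* Same decision rule as the detector: mean square against a squared threshold. */
bool block_has_sound(const int32_t *block, size_t count)
{
    float sum = 0.0f;

    if (count == 0) {
        return false;
    }
    for (size_t i = 0; i < count; i++) {
        float x = (float)block[i] / TEST_FULL_SCALE;
        sum += x * x;
    }
    return (sum / (float)count) > TEST_THRESHOLD_SQ;
}
```

A silent block and a very quiet block should both report no sound, while a block at roughly 12% of full scale should report sound. Real recordings will still be needed to pick a threshold that separates room noise from a clap.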
Integration and Optimization
- Plan how to integrate detection into the main loop without disrupting I²S reads.
- Decide how frequently detection should run (e.g., once per block or at fixed intervals).
- Identify opportunities to optimize performance:
- Use simple arithmetic operations.
- Minimize memory usage by processing in place.
- Prepare for future enhancements:
- Optionally, maintain a running average over multiple blocks.
- Consider adding hysteresis to avoid rapid toggling between sound and silence.
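The two future enhancements above, a running average and hysteresis, could be combined as in the sketch below. Everything here is an assumption for illustration: the struct, the function name, the smoothing weight, and both threshold levels are hypothetical, and the running average is implemented as an exponential moving average rather than a fixed-length window.

```c
#include <stdbool.h>

/* Hypothetical tuning values, in normalized mean-square units. */
#define LEVEL_ON  0.0004f  /* switch to "sound" above this level */
#define LEVEL_OFF 0.0001f  /* switch back to "silence" below this level */
#define SMOOTHING 0.25f    /* weight given to the newest block */

struct sound_detector {
    float level;  /* smoothed block level */
    bool active;  /* true while sound is considered present */
};

/* Feed one block level; returns the detector state after the update. */
bool detector_update(struct sound_detector *d, float block_level)
{
    /* Exponential moving average over successive blocks. */
    d->level += SMOOTHING * (block_level - d->level);

    /* Two thresholds: the state only changes when the level leaves the
     * band between LEVEL_OFF and LEVEL_ON. */
    if (!d->active && d->level > LEVEL_ON) {
        d->active = true;
    } else if (d->active && d->level < LEVEL_OFF) {
        d->active = false;
    }
    return d->active;
}
```

Because the on and off levels differ, a signal hovering near a single threshold does not toggle the state on every block. The price is a short delay before the detector reports silence again after a sound ends.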
Outcome:
We have a structured approach to implement amplitude threshold-based sound detection in our I²S application, including where to integrate it, how to measure amplitude, and how to test and tune it.
References:
- Zephyr Device Tree