
SAMPLING,

REMOVAL OF SILENCE AND


NOISE IN AUDIO SIGNAL

A PROJECT REPORT

Submitted by

N.S.SWETHA (211420106266)

S.THANUSHA (211420106268)

R.SOWMYA (211420106244)

In partial fulfillment for the award of the degree

of

BACHELOR OF ENGINEERING

In

ELECTRONICS AND COMMUNICATION ENGINEERING

PANIMALAR ENGINEERING COLLEGE, CHENNAI

NOVEMBER 2022
ACKNOWLEDGEMENT

We would like to express our deep gratitude to our beloved Secretary and Correspondent Dr. P. CHINNADURAI, M.A., Ph.D., for his kind words and enthusiastic motivation, which inspired us a lot in completing this project. We also express our sincere thanks to our directors Mrs. VIJAYARAJESHWARI and Mr. C. SAKTHIKUMAR, M.E., for providing us with the necessary facilities for the completion of this project.

We also express our gratitude to our Principal Dr. K. MANI, M.E., Ph.D., who has been a source of constant encouragement and support.

We would also like to express our gratitude to our internal guide Mrs. R. RAJALAKSHMI, Professor, Electronics and Communication Engineering, for her valuable guidance, ideas and encouragement towards the successful completion of the project.

We take this opportunity to thank our beloved parents, teachers and friends for their constant support and encouragement.
SAMPLING
In signal processing, sampling is the reduction of a continuous-time signal to
a discrete-time signal. A common example is the conversion of a sound wave to a
sequence of "samples". A sample is a value of the signal at a point in time and/or
space; this definition differs from the usage in statistics, which refers to a set of
such values.
A sampler is a subsystem or operation that extracts samples from a continuous
signal. A theoretical ideal sampler produces samples equivalent to the
instantaneous value of the continuous signal at the desired points.
The original signal can be reconstructed from a sequence of samples, up to the Nyquist limit, by passing the sequence of samples through a type of low-pass filter called a reconstruction filter.
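
As a small illustration of these two steps (a sketch added here for clarity, not part of the original report; the tone frequency, sampling rate and interpolation factor are arbitrary choices), the MATLAB fragment below samples a 5 Hz sine wave at 100 Hz and then uses resample as a simple reconstruction filter to rebuild a denser waveform from the samples.

% Sketch: sampling a continuous-time tone and reconstructing it (illustrative values)
f0 = 5;                       % tone frequency in Hz (assumed for the example)
fs = 100;                     % sampling rate in Hz, well above the Nyquist rate 2*f0
t  = 0:1/fs:1;                % one second of sample instants
x  = sin(2*pi*f0*t);          % the discrete-time samples
% Rebuild a denser waveform from the samples; resample applies a low-pass
% (anti-imaging) filter, acting here as a simple reconstruction filter
y  = resample(x, 10, 1);      % 10 times more points between the original samples
ty = (0:length(y)-1)/(10*fs); % time axis of the reconstructed waveform
figure; plot(ty, y); hold on; stem(t, x); hold off;
legend('reconstructed waveform', 'original samples');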

Audio sampling
Digital audio uses pulse-code modulation (PCM) and digital signals for
sound reproduction. This includes analog-to-digital conversion (ADC), digital-to-
analog conversion (DAC), storage, and transmission. In effect, the system
commonly referred to as digital is in fact a discrete-time, discrete-level analog of a
previous electrical analog. While modern systems can be quite subtle in their
methods, the primary usefulness of a digital system is the ability to store, retrieve
and transmit signals without any loss of quality.
When it is necessary to capture audio covering the entire 20–20,000 Hz
range of human hearing, such as when recording music or many types of acoustic
events, audio waveforms are typically sampled at 44.1 kHz (CD), 48 kHz,
88.2 kHz, or 96 kHz. The approximately double-rate requirement is a consequence
of the Nyquist theorem. Sampling rates higher than about 50 kHz to 60 kHz cannot
supply more usable information for human listeners. Early professional
audio equipment manufacturers chose sampling rates in the region of 40 to 50 kHz
for this reason.
There has been an industry trend towards sampling rates well beyond the basic requirements, such as 96 kHz and even 192 kHz. Even though ultrasonic frequencies are inaudible to humans, recording and mixing at higher sampling rates is effective in eliminating the distortion that can be caused by foldback aliasing. Conversely, ultrasonic sounds may interact with and modulate the audible part of the frequency spectrum (intermodulation distortion), degrading the fidelity. One advantage of higher sampling rates is that they can relax the low-pass filter design requirements for ADCs and DACs, but with modern oversampling sigma-delta converters this advantage is less important.
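
The foldback effect can be demonstrated with a short MATLAB sketch (illustrative only; the 30 kHz tone and 48 kHz rate are assumed values): a tone above half the sampling rate reappears in the sampled spectrum at a lower, audible frequency.

% Sketch: foldback aliasing of an ultrasonic tone (illustrative values)
fs = 48000;                         % sampling rate in Hz
f0 = 30000;                         % tone above fs/2 = 24 kHz
t  = 0:1/fs:0.1;                    % 100 ms of samples
x  = sin(2*pi*f0*t);                % sampled ultrasonic tone
N  = length(x);
X  = abs(fft(x))/N;
f  = (0:N-1)*fs/N;
figure; plot(f(1:floor(N/2)), X(1:floor(N/2)));
xlabel('f (Hz)'); ylabel('|X(f)|');
% The spectral peak appears at fs - f0 = 18 kHz, not at 30 kHz:
% the ultrasonic tone has folded back into the audible band.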
The Audio Engineering Society recommends 48 kHz sampling rate for most
applications but gives recognition to 44.1 kHz for Compact Disc (CD) and other
consumer uses, 32 kHz for transmission-related applications, and 96 kHz for
higher bandwidth or relaxed anti-aliasing filtering. Both Lavry Engineering and J. Robert Stuart state that the ideal sampling rate would be about 60 kHz, but since this is not a standard frequency, they recommend 88.2 kHz or 96 kHz for recording purposes.

Speech sampling
Speech signals, i.e., signals intended to carry only human speech, can usually be
sampled at a much lower rate. For most phonemes, almost all of the energy is
contained in the 100 Hz–4 kHz range, allowing a sampling rate of 8 kHz. This is
the sampling rate used by nearly all telephony systems, which use
the G.711 sampling and quantization specifications.
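
As an example (a sketch under the assumption that the recording 'count.wav' used later in this report is available), a recording made at a higher rate can be converted to the 8 kHz telephony rate with MATLAB's resample function, which also applies the required anti-aliasing low-pass filter:

% Sketch: converting a speech recording to the 8 kHz telephony rate
[x, fs] = audioread('count.wav');   % original recording and its sampling rate
x8 = resample(x, 8000, fs);         % resample to 8000 Hz (includes anti-aliasing filtering)
sound(x8, 8000);                    % play back at the new rate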

SILENCE REMOVAL IN SPEECH SIGNAL


In speech recording, click noises (not to be confused with click consonants) result from tongue movements, swallowing, and mouth and saliva noises.[8] While click noises are undesirable in voice-over recordings, they can be used as a close-miking sound effect in ASMR and pop music.

The silence removal block is used to eliminate the unvoiced and silent portions of the speech signal. For this purpose, the input signal is divided into small segments (frames), and the root mean square (RMS) value of each individual segment is calculated and compared with a specific threshold value. The total length of each individual segment is equal to the product of the time duration of the segment and the sampling frequency:

Segment length = Segment duration × Fs    (2)

The accuracy and performance of the silence removal block depend on the total number of segments. The total number of segments is obtained by dividing the total length N of the input signal by the length of an individual segment:

Total segments = N / Segment length    (3)

The RMS value of each segment is calculated and compared with the threshold value. The RMS value of an individual segment of length n is calculated from equation (4):

RMS_segment = sqrt( (1/n) Σ segment(i)² )    (4)

The threshold value for this block is computed from equation (5):

Rth = (μ + v) / 2    (5)

where v is the minimum RMS value of the K voiced signals and μ is the mean RMS value of the K unvoiced signals.
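
A compact MATLAB sketch of equations (2) to (5) is given below. It is an illustration of the described steps rather than the exact code of this report, and the numerical threshold Rth is an assumed value standing in for the value computed from the K voiced and unvoiced training segments in equation (5).

% Sketch: frame-based silence removal using the RMS of each segment
[x, Fs] = audioread('count.wav');
x = x(:,1);                                   % use a single channel
seg_dur = 0.025;                              % segment duration in seconds
seg_len = round(seg_dur * Fs);                % equation (2): segment length
total_segs = floor(length(x) / seg_len);      % equation (3): total number of segments
Rth = 0.02;                                   % assumed threshold; equation (5) gives Rth = (mu + v)/2
voiced = [];
for k = 1:total_segs
    seg = x((k-1)*seg_len + 1 : k*seg_len);
    rms_seg = sqrt(mean(seg.^2));             % equation (4): RMS of the segment
    if rms_seg > Rth                          % compare with the threshold
        voiced = [voiced; seg];               % keep only voiced segments
    end
end
sound(voiced, Fs);                            % speech with the silent segments removed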
Silence removal is a very helpful part of the proposed technique: it reduces the processing time and increases the performance of the system by eliminating unvoiced segments from the input signal. A novel idea is used to set the threshold value for silence removal; it eliminates 97.2% of the unvoiced segments from the speech signal.
After the elimination of the silent segments, the new (remaining) signal is passed to the endpoint detector block. The length of the new signal is always less than the length of the original signal. The endpoint detector is used to compute the stop point of the signal, i.e. the point where the magnitude of the signal drops to zero. Endpoint detection is an important part of speech processing; it plays an important role in speaker and speech recognition for the identification of individuals.
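
Continuing from the silence removal sketch above, a rough sketch of such an endpoint detector (an assumed implementation, since the report does not list its code) simply takes the last sample whose magnitude exceeds a small tolerance:

% Sketch: locating the endpoint of the silence-removed signal
tol = 1e-3;                          % magnitude treated as "zero" (assumed tolerance)
idx = find(abs(voiced) > tol);       % samples that still carry energy
endpoint = idx(end);                 % last such sample = stop point of the signal
trimmed = voiced(1:endpoint);        % signal up to the detected endpoint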
USES OF SILENCE REMOVAL:
 Remove background noises
 Create segments of spoken recordings
 Create segments for drum loops
 Optimize synchronization
 Optimize files and regions
 Use Remove Silence to extract audio files
REMOVE BACKGROUND NOISES
The most common use for Remove Silence is the simulation of the classic noise gate effect. When used on long recordings with numerous gaps, such as vocals or instrumental solos, you can obtain better results by setting a low threshold value. Background noise is removed without affecting the main signal.
 For short percussive regions (drum loops), you can simulate time
compression/expansion by simply altering the tempo.
 You can even quantize the individual segments in an audio recording.

CREATE SEGMENTS OF SPOKEN RECORDINGS


 You can use Remove Silence to divide long spoken passages into several
convenient segments, like sentences, words, or syllables. For film
synchronization or jingles, you can move or reposition the speech segments by
simply dragging them around in the Tracks area.
 Tempo changes allow you to simulate a time compression or expansion effect,
as the syllables automatically move closer together, or farther apart.

CREATE SEGMENTS FOR DRUM LOOPS


 Dividing drum loops into small segments is a good way to perfectly
synchronize them. For example, in audio passages where the bass drum and
snare are completely separate, you can often use Remove Silence to isolate
each individual beat.

OPTIMIZE SYNCHRONIZATION
 Different computers, different synchronization sources (internal or SMPTE
code), different tape machines, and—in theory—different samplers or hard disk
recording systems will exhibit slight variations in clock speed. Changing just
one component can lead to a loss of synchronization between recorded audio
material and MIDI. This is particularly applicable to long audio regions.
 This is another situation where the Remove Silence function can help, by
creating several shorter audio regions, with more trigger points between the
audio and MIDI events.
 For example, you can use this method to roughly split up a whole audio file,
and then divide the new regions, using different parameters. The new regions
can then be processed again with the Remove Silence function.
OPTIMIZE FILES AND REGIONS
 You can use Remove Silence to automatically create regions from an audio file
that contains silent passages, such as a single vocal take that runs the length of a
project. The unused regions or portions of the audio file can be deleted, saving
hard disk space, and simplifying (file and) region management.

USE REMOVE SILENCE TO EXTRACT AUDIO FILES


 Many sample library discs (CD or DVD) contain thousands of audio recordings
stored as AIFF or WAV files. You can use Remove Silence to split these into
individual regions, which can be used directly in the Tracks area. In addition,
you can convert regions into individual audio files (samples), which can be
used in the Sampler.

NOISE REDUCTION
Noise reduction is the process of removing noise from a signal. Noise
reduction techniques exist for audio and images. Noise reduction algorithms
may distort the signal to some degree. Noise rejection is the ability of a circuit to
isolate an undesired signal component from the desired signal component, as
with common-mode rejection ratio.
All signal processing devices, both analog and digital, have traits that make
them susceptible to noise. Noise can be random with an even frequency
distribution (white noise), or frequency-dependent noise introduced by a device's
mechanism or signal processing algorithms.
In electronic systems, a major type of noise is hiss created by random
electron motion due to thermal agitation. These agitated electrons rapidly add and
subtract from the output signal and thus create detectable noise.
In the case of photographic film and magnetic tape, noise (both visible and
audible) is introduced due to the grain structure of the medium. In photographic
film, the size of the grains in the film determines the film's sensitivity, more
sensitive film having larger-sized grains. In magnetic tape, the larger the grains of
the magnetic particles (usually ferric oxide or magnetite), the more prone the
medium is to noise. To compensate for this, larger areas of film or magnetic tape
may be used to lower the noise to an acceptable level.
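
As a minimal, illustrative example of noise reduction (not the method used in the MATLAB program below; the noise level and filter length are assumed values), the following sketch adds white, hiss-like noise to the recording and attenuates it with a simple moving-average low-pass filter, which, as noted above, also distorts the signal slightly:

% Sketch: reducing additive white noise with a simple moving-average filter
[x, fs] = audioread('count.wav');
x = x(:,1);
noisy = x + 0.02*randn(size(x));        % add white (hiss-like) noise
M = 5;                                  % filter length (illustrative choice)
b = ones(M,1)/M;                        % moving-average (low-pass) filter coefficients
cleaned = filter(b, 1, noisy);          % attenuates high-frequency noise, slightly smears the signal
sound(cleaned, fs);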
MATLAB PROGRAM:
clear all; close all; clc
% REMOVAL OF SILENCE AND NOISE IN SPEECH SIGNAL
% Read the speech recording and play it back
[data, fs] = audioread('count.wav');
sound(data, fs);
figure(1); plot(data(:,1)); title('input speech signal');
% Normalise the signal to a peak amplitude of 1
data = data / max(abs(data(:)));
% Split the signal into 25 ms frames
fd = 0.025;                       % frame duration in seconds
f_size = round(fd * fs);          % samples per frame
n = floor(length(data)/f_size);   % number of complete frames
temp = 0;
for i = 1 : n
    frames(i,:) = data(temp + 1 : temp + f_size, 1).';
    temp = temp + f_size;
end

% Example frames: frame 110 is voiced, frame 40 is an unvoiced/silent part
figure(2); plot(frames(110,:)); title('voiced frame');
figure(3); plot(abs(fft(frames(110,:)))); title('spectrum of voiced frame');
figure(4); plot(frames(40,:)); title('unvoiced/silence part');
figure(5); plot(abs(fft(frames(40,:)))); title('spectrum of unvoiced frame');

% Silence removal: keep only the frames whose peak amplitude exceeds a threshold
m_amp = max(abs(frames), [], 2);   % peak amplitude of each frame
id = find(m_amp > 0.03);           % indices of the voiced frames
fr_ws = frames(id,:);              % frames without silence
data_r = reshape(fr_ws', 1, []);   % reassemble the kept frames into one signal
figure(6); plot(data_r); title('speech without silence');

% Compare the original and the silence-removed signals
figure(7); hold on; plot(data(:,1)); plot(data_r); hold off;
legend('original', 'silence removed');
sound(data_r, fs);

% SAMPLING OF SPEECH SIGNAL
[y, fs] = audioread('count.wav');
info = audioinfo('count.wav');                % information about the audio file

% Time domain analysis
t = 0:seconds(1/fs):seconds(info.Duration);   % time array
t = t(1:end-1);                               % match the number of samples

% Time domain plot
figure(8); subplot(2,1,1); plot(t, y);
xlabel('Time'); ylabel('Audio Signal')

% Compute the Fourier transform of the signal
Y = fft(y);
L = info.TotalSamples;
P2 = abs(Y/L);                    % two-sided spectrum
P1 = P2(1:floor(L/2)+1);          % single-sided spectrum
P1(2:end-1) = 2*P1(2:end-1);
f = fs*(0:floor(L/2))/L;          % frequency axis of the single-sided spectrum

% Frequency domain plot
figure(8); subplot(2,1,2); plot(f, P1)
title('Single-Sided Amplitude Spectrum of y(t)')
xlabel('f (Hz)'); ylabel('|P1(f)|')

q = 1; p = 2;                    % resampling factor q/p: here the rate is halved

[y1, af1] = resample(y, q, p);   % downsample (includes anti-aliasing filtering)
[y2, af2] = resample(y1, p, q);  % interpolate back to the original rate

% Plot showing the sampling information
figure(9);
t1 = 0:(1/fs):(info.Duration);
t1 = t1(1:end-1);
t2 = (0:(length(y1)-1))*p/(q*fs);   % time axis of the downsampled signal
plot(t1(1:100), y(1:100,1), '-b*', t2(1:(100/p)+1), y1(1:100/p+1,1), '-ro')
title('Sampling Info')
legend('Original', 'Resampled', 'Location', 'South')
sound(y1, fs*q/p);               % play the downsampled signal at its own rate
figure(10);
freqz(af1)                       % response of the anti-aliasing filter used by resample

% Echo in sound using convolution
[x, fs] = audioread('count.wav');
figure(11); subplot(3,1,1);
plot(x); title('original signal');
s = x(:,1);                 % use a single channel
xlen = length(s);
d = 75000;                  % echo delay in samples
h = zeros(xlen, 1);         % impulse response: direct sound plus one echo
h(10) = 0.6;
h(d) = 0.6;
subplot(3,1,2);
plot(h); title('impulse response');
z = conv(s, h);             % convolution adds the delayed copy (echo)
subplot(3,1,3);
plot(z); title('signal with echo');
sound(z, fs);

GRAPH OUTPUT:
[Figure: Sampling]
[Figure: Removal of silence and noise - input signal]
[Figure: Removal of silence and noise - output signal]
[Figure: Comparison of the input and output audio signals]
[Figure: Adding echo in the silence part]

RESULT:
Hence, the program has been successfully executed and the output has been verified.
