Chapter 7: Audio analysis

We use a variety of audio analysis techniques to examine, understand and interpret the content of recorded sound signals. In some cases these lead to visualisation methods, whilst in other cases they may be useful in specifying further processing or measurement of the audio signal.

Most of the basic techniques are explored and illustrated in this chapter. Some additional material is presented below.


7.1.4 Spectral measures

Here we devise a measure to identify speech based on a simple spectral comparison. First, we take recordings of the two letters 'c' and 'r' being spoken, and obtain a spectrum of each;

%Lengths of the two recordings
Nc=length(speech_letter_c);
Nr=length(speech_letter_r);
%Magnitude spectrum of each, keeping the first half (up to fs/2)
fft_c=fft(speech_letter_c);
fft_c=abs(fft_c(1:Nc/2));
fft_r=fft(speech_letter_r);
fft_r=abs(fft_r(1:Nr/2));

Next, we devise a measure based on the average spectral magnitude in the low and high frequency regions respectively;

c_lowf=sum(fft_c(1:Nc/4))/(Nc/4); 
c_highf=sum(fft_c(1+Nc/4:Nc/2))/(Nc/4);
r_lowf=sum(fft_r(1:Nr/4))/(Nr/4); 
r_highf=sum(fft_r(1+Nr/4:Nr/2))/(Nr/4);

We then compute the ratio of the two;

c_ratio=c_highf/c_lowf;
r_ratio=r_highf/r_lowf;

Given an unknown recording of a single letter, we can make a pretty good guess whether it is a 'c' or an 'r' by computing the FFT of the recording, and then computing the low and high frequency sums and in turn the ratio;

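Nu=2*length(fft_U);  %fft_U is the half-spectrum magnitude of the unknown recording, obtained as for fft_c
U_lowf=sum(fft_U(1:Nu/4))/(Nu/4);
U_highf=sum(fft_U(1+Nu/4:Nu/2))/(Nu/4);
U_ratio=U_highf/U_lowf;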

Looking at the final result, 'U_ratio', we would compare that to 'c_ratio' and 'r_ratio' to determine which is the closest match.
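
The comparison described above can be sketched end to end as follows. This is only a minimal illustration: the three signals are synthetic tone mixtures standing in for real recordings, and the names ref_c, ref_r, unknown, hl_ratio and guess, the 8 kHz sample rate and the nearest-ratio decision rule are all assumptions made for the example rather than part of the original listings.

```matlab
% Synthetic stand-ins for the recordings (NOT real speech): each mixes a
% 300 Hz tone (low band) with a 3000 Hz tone (high band).
fs = 8000;                          % assumed sample rate, Hz
t  = (0:799)/fs;                    % 800 samples, so both tones fall exactly on FFT bins
lo = sin(2*pi*300*t);
hi = sin(2*pi*3000*t);
ref_c   = 0.3*lo + 1.0*hi;          % hypothetical 'c': mostly high-frequency content
ref_r   = 1.0*lo + 0.1*hi;          % hypothetical 'r': mostly low-frequency content
unknown = 0.5*lo + 1.0*hi;          % hypothetical unknown recording

% High/low ratio of the mean spectral magnitude; floor() guards odd lengths
mag      = @(x) abs(fft(x));
ratio    = @(m, N) mean(m(floor(N/4)+1:floor(N/2))) / mean(m(1:floor(N/4)));
hl_ratio = @(x) ratio(mag(x), length(x));

c_ratio = hl_ratio(ref_c);
r_ratio = hl_ratio(ref_r);
U_ratio = hl_ratio(unknown);

% Pick whichever reference ratio lies closest to the unknown's ratio
if abs(U_ratio - c_ratio) < abs(U_ratio - r_ratio)
    guess = 'c';
else
    guess = 'r';
end
disp(['Closest match: ', guess]);
```

With these made-up mixtures the unknown signal lands nearer the 'c' reference. For real recordings, a single pair of reference ratios is a fairly coarse template, so averaging the ratio over several examples of each letter may give a more dependable comparison.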


7.1.5 Cepstral analysis

MATLAB code to analyse voiced speech, leading to Figure 7.6, is reproduced below;

len=length(segment);
%Take the cepstrum
ps=log(abs(fft(segment))); 
cep=ifft(ps);
%Perform the filtering
cut =30; 
cep2=zeros(1,len);
cep2(1:cut-1)=cep(1:cut-1)*2; 
cep2(1)=cep(1);
cep2(cut)=cep(cut);
%Convert to frequency domain
env=real(fft(cep2)); 
act=real(fft(cep));
%Plot the result
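%env and act are natural-log magnitudes, so scale by 20/log(10) to give dB
pl1=env(1:len/2)*20/log(10);
pl2=act(1:len/2)*20/log(10);
span=(0:len/2-1)*fs/len;  %frequency of each FFT bin, Hz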
plot(span,pl1,'k-.',span,pl2,'b'); 
xlabel('Frequency, Hz');
ylabel('Amplitude, dB');
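
The listing above assumes that a speech array named segment and a sample rate fs already exist. To try the liftering step without a recording, a synthetic voiced-like segment can stand in, as in the minimal sketch below. The pulse-train excitation, the single resonance and all of their parameter values are hypothetical choices for illustration, and the rough helper is simply an assumed way of quantifying how much smoother the liftered envelope is than the full log spectrum.

```matlab
% Synthetic voiced-like segment (NOT real speech): a pulse train passed
% through one two-pole resonance. All values here are assumptions.
fs  = 8000;                       % assumed sample rate, Hz
len = 512;                        % assumed segment length, samples
excitation = zeros(1,len);
excitation(1:80:len) = 1;         % one pulse every 80 samples, i.e. 100 Hz pitch
r  = 0.95;                        % hypothetical pole radius
f0 = 700;                         % hypothetical resonance frequency, Hz
a  = [1, -2*r*cos(2*pi*f0/fs), r^2];
segment = filter(1, a, excitation);

% Cepstrum of the segment (real() discards rounding-level imaginary parts)
ps  = log(abs(fft(segment)));
cep = real(ifft(ps));

% Keep only the low-quefrency part, as in the listing above
cut = 30;
cep2 = zeros(1,len);
cep2(1:cut-1) = 2*cep(1:cut-1);
cep2(1)   = cep(1);
cep2(cut) = cep(cut);

% Back to the frequency domain: smoothed envelope and full log spectrum
env = real(fft(cep2));
act = real(fft(cep));

% Sum of squared bin-to-bin differences (circular) as a roughness measure
rough = @(y) sum((y([2:end 1]) - y).^2);
fprintf('Roughness: envelope %.2f, full spectrum %.2f\n', rough(env), rough(act));
```

The envelope scores lower on this measure because the pitch-related ripple, which sits at higher quefrencies than the cut point, has been removed. Moving cut up or down trades envelope detail against residual pitch ripple.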

7.5.1 Analysis of music

if(~exist('violin','var'))
    disp('Ensure an array named violin has been recorded');
end

%Plots a simple single spectrum of a segment of the violin music
P=16;
Ws=256;  %window size=256 samples
vseg=violin(1:Ws).*hamming(Ws);
a=lpc(vseg, P);
w=lpc_lsp(a);
lpcsp(a, w);

The following code plots a waterfall track of the LSP values over successive analysis windows;

[audio,Fs]=audioread('music_recording.wav');
w=256;		%Window size
order=18;	%LPC analysis order
c=0;
for index=1:w:length(audio)-w
    c=c+1;
    a=lpc(audio(index:index+w-1),order);
    lsp(c,1:order)=lpc_lsp(a);
end
plot(lsp*Fs/pi);
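
The loop above steps through the recording in non-overlapping windows. The framing arithmetic on its own can be sketched as below, which may help when labelling the horizontal axis of the track plot in seconds rather than in frame numbers. The recording length and sample rate are made-up values, and the names starts, stops, t and nframes are introduced only for this example.

```matlab
% Framing arithmetic for non-overlapping analysis windows.
% Fs and nsamples are assumed values, not taken from a real recording.
Fs       = 8000;                 % assumed sample rate, Hz
nsamples = 1000;                 % assumed recording length, samples
w        = 256;                  % window size, samples

starts  = 1:w:nsamples-w;        % first sample of each frame (same stepping as the loop)
stops   = starts + w - 1;        % last sample, so each frame holds exactly w samples
t       = (starts - 1)/Fs;       % start time of each frame, seconds
nframes = length(starts);

fprintf('%d frames of %d samples each\n', nframes, w);
```

Since t has one entry per frame, it can be passed as the first argument to plot so that each row of the track matrix is drawn against its frame start time.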

We can also explore the effect of different analysis orders, as in Figure 7.10, as follows;

c=1;
for i=12:12:48
    subplot(4,1,c);
    a=lpc(violin,i);
    lpcsp(a,lpc_lsp(a));
    xlabel('Frequency');
    ylabel('Amplitude');
    legend(['Order=',num2str(i)]);
    c=c+1;
end