We use a variety of audio analysis techniques to examine, understand and interpret the content of recorded sound signals. In some cases these lead to visualisation methods, whilst in other cases they may be useful in specifying further processing or measurement of the audio signal.

Most of the basic techniques are explored and illustrated in this chapter. Some additional material is presented below.

Consider devising a measure to identify spoken letters based on a simple spectral comparison. First, we take some examples of the two letters 'c' and 'r' being spoken in turn, and then obtain a spectrum of each;

Nc=length(speech_letter_c);
Nr=length(speech_letter_r);
fft_c=fft(speech_letter_c);
fft_c=abs(fft_c(1:Nc/2)); %magnitude of the positive-frequency half
fft_r=fft(speech_letter_r);
fft_r=abs(fft_r(1:Nr/2));

Next, we devise a measure based on the mean energy in the low and high frequency regions respectively;

c_lowf=sum(fft_c(1:Nc/4))/(Nc/4);
c_highf=sum(fft_c(1+Nc/4:Nc/2))/(Nc/4);
r_lowf=sum(fft_r(1:Nr/4))/(Nr/4);
r_highf=sum(fft_r(1+Nr/4:Nr/2))/(Nr/4);

And compute the ratio of these;

c_ratio=c_highf/c_lowf;
r_ratio=r_highf/r_lowf;

Given an unknown recording of a single letter, we can make a reasonably good guess as to whether it is a 'c' or an 'r' by computing the FFT of the recording, then the low and high frequency sums, and in turn their ratio;

%fft_U is the FFT magnitude of the unknown recording, obtained as above
U_lowf=sum(fft_U(1:Nr/4))/(Nr/4);
U_highf=sum(fft_U(1+Nr/4:Nr/2))/(Nr/4);
U_ratio=U_highf/U_lowf;

Looking at the final result, 'U_ratio', we would compare that to 'c_ratio' and 'r_ratio' to determine which is the closest match.
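This closest-match decision could be sketched as follows (a minimal illustration, assuming the ratio variables computed above are in the workspace);

```matlab
%Decide which reference ratio U_ratio lies closest to
if abs(U_ratio-c_ratio) < abs(U_ratio-r_ratio)
    disp('Unknown letter is most likely a ''c''');
else
    disp('Unknown letter is most likely an ''r''');
end
```

More than two reference letters could be handled in the same way by taking the minimum absolute difference over a vector of stored ratios.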

MATLAB code to analyse voiced speech, leading to Figure 7.6, is reproduced below;

len=length(segment);
%Take the cepstrum
ps=log(abs(fft(segment)));
cep=ifft(ps);
%Perform the filtering
cut=30;
cep2=zeros(1,len);
cep2(1:cut-1)=cep(1:cut-1)*2;
cep2(1)=cep(1);
cep2(cut)=cep(cut);
%Convert to frequency domain
env=real(fft(cep2));
act=real(fft(cep));
%Plot the result
pl1=20*log10(env(1:len/2));
pl2=20*log10(act(1:len/2));
span=[1:fs/len:fs/2];
plot(span,pl1,'k-.',span,pl2,'b');
xlabel('Frequency, Hz');
ylabel('Amplitude, dB');

Similarly, a simple single LPC spectrum of a segment of violin music can be plotted as follows;

if(~exist('violin'))
    disp('Ensure an array named violin has been recorded');
end
%Plots a simple single spectrum of a segment of the violin music
P=16;   %LPC analysis order
Ws=256; %window size=256 samples
vseg=violin(1:Ws).*hamming(Ws);
a=lpc(vseg, P);
w=lpc_lsp(a);
lpcsp(a, w);

And the plotting of a waterfall of LSP tracks;

[audio,Fs]=audioread('music_recording.wav');
w=256;     %Window size
order=18;  %LPC analysis order
c=0;
for index=1:w:length(audio)-w
    c=c+1;
    a=lpc(audio(index:index+w),order);
    lsp(c,1:order)=lpc_lsp(a);
end
plot(lsp*Fs/pi);

We can also explore the effect of different analysis orders, as in Figure 7.10, as follows;

c=1;
for i=12:12:48
    subplot(4,1,c);
    a=lpc(violin,i);
    lpcsp(a,lpc_lsp(a));
    xlabel('Frequency');
    ylabel('Amplitude');
    legend(['Order=',num2str(i)]);
    c=c+1;
end