Reading tones from *.wav files?

**Jose_VB** · Dec 1st, 2014, 09:42 AM

Well, I've been looking for some information about reading individual frequencies from *.wav files.
I know that the wav file is structured in 1 chunk and 2 sub-chunks (format and data) as we can see in the following picture:
[Attached image: waveRiff.png]

The part I still don't understand is the raw data. The last part of the file is where the audio data is stored. My question is how it is possible to read individual frequencies from it, one by one; I mean, how to transform the byte data into frequencies in hertz.
Can someone explain it?

**passel** · Dec 1st, 2014, 09:52 AM

I'm not going to try to explain it, as it would be a fairly complex task.
A wave file's data are amplitudes of the signal at successive points in time, spaced according to the sampling frequency.
So, if you have a wave file that is sampled 44000 times per second, then each sample represents the sound pressure (or amplitude) of the sound at that particular instant.
Pulling frequencies out of amplitude data involves spectrum analysis, normally using Fast Fourier Transforms to convert amplitude changes over time into frequency components.
Not something I've done.
You can try looking up those terms to get more information.

**Jose_VB** · Dec 1st, 2014, 03:26 PM

So you mean that if I have a file sampled 44,000 times per second, each second takes 44,000 bytes of information inside the raw audio data chunk?

I don't know whether I've understood it correctly. But imagine that at one point in time there is a frequency of 154,87 hertz. I think 154,87 hertz cannot be expressed using just 1 byte of information. I'm speaking of the simplest example: a mono audio file, not a stereo one.

**jmsrickland** · Dec 1st, 2014, 04:47 PM

Do you need sample code to do what you don't understand?

**some1uk03** · Dec 1st, 2014, 04:50 PM

Depends if Mono / Stereo or 8 bit or 16 Bit etc...

8 Bit are represented as single bytes
16 Bit are represented as 2 byte Integers

Nice Link: https://ccrma.stanford.edu/courses/4...ts/WaveFormat/

**passel** · Dec 1st, 2014, 10:45 PM

Originally Posted by Jose_VB

So you mean that if I have a file sampled 44,000 times per second, each second takes 44,000 bytes of information inside the raw audio data chunk?

I don't know whether I've understood it correctly. But imagine that at one point in time there is a frequency of 154,87 hertz. I think 154,87 hertz cannot be expressed using just 1 byte of information. I'm speaking of the simplest example: a mono audio file, not a stereo one.

Yes, to the first part, sort of. Most of the wave files I've seen use 16-bit samples, so there would be 88000 bytes of information per second, representing 44000 samples.

As for the second part, there is no frequency information stored as data in a wave file (other than the sampling frequency in the header). The data is the amplitude of the sound wave at a given point in time, not frequency.
So, your 154,87 hertz tone, if it was a sine wave for instance, would go from 0 to some maximum value, then fall back through 0 to some minimum value and back to 0 (to complete one cycle) 154,87 times per second. If the amplitude information was stored in a single byte, then one cycle of the wave would be about 284 samples, or 284 bytes (44000 / 154,87 = 284,10925 samples per cycle).
If the waveform was a single tone, then it wouldn't be hard to determine the frequency, given the sampling frequency. You would just loop through the data bytes looking for an easy-to-find transition, and count the number of samples between those transitions.
For instance, look for the highest value: when you see the amplitude start to fall from the highest value, count samples until you see the next rise to the highest level.
Then divide the sampling frequency by the number of samples between the peaks to get the wave frequency.
You could use the lowest value, or the transition through 0 in a particular direction; it really doesn't matter as long as you can identify a point that represents the same point in each cycle of the waveform.
If, on the other hand, you are trying to identify frequencies of sounds in a wave file that contains multiple tones of different frequencies simultaneously, then you would need much more complicated math to break those out.
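
A minimal sketch of the counting idea described above, written in Python rather than VB and not taken from the thread. It uses the "transition through 0 in a particular direction" variant and averages over many cycles; the function name and the generated test tone are assumptions made for illustration, and the samples are assumed to be already centered on 0.

```python
import math

def estimate_tone_frequency(samples, sample_rate):
    """Estimate the frequency of a single tone from its samples.

    Finds the upward zero crossings (one per cycle) and divides the
    sampling rate by the average number of samples between them.
    """
    crossings = [
        i for i in range(1, len(samples))
        if samples[i - 1] < 0 <= samples[i]
    ]
    if len(crossings) < 2:
        raise ValueError("need at least two upward zero crossings")
    samples_per_cycle = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / samples_per_cycle

# Hypothetical test signal: one second of a 154.87 Hz sine at 44000 samples/s.
SAMPLE_RATE = 44000
tone = [math.sin(2 * math.pi * 154.87 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
estimate = estimate_tone_frequency(tone, SAMPLE_RATE)
print(round(estimate, 2))
```

Averaging over the whole second matters: a single cycle can only be measured to the nearest sample, while the average spacing over many cycles gets much closer to the true 284,1 samples per cycle. This only works for a single clean tone, as the post says.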

**Jose_VB** · Dec 2nd, 2014, 01:53 AM

Originally Posted by some1uk03

Depends if Mono / Stereo or 8 bit or 16 Bit etc...

8 Bit are represented as single bytes
16 Bit are represented as 2 byte Integers

Nice Link: https://ccrma.stanford.edu/courses/4...ts/WaveFormat/

Thanks for the link. I've just read it and I think I understand it a little better.
The main question that remains at the moment is about bit depth. After reading the link you suggested, my understanding is that bit depth (8 or 16) is a way of representing the information and that it doesn't affect the sound quality (I don't know whether this last part is right or wrong).

So in an 8-bit sample system, the information is stored in bytes ranging from 0 to 255. For example, in a mono 8-bit system each sample is one byte in the raw audio data sub-chunk. Right?

In a 16-bit system (mono) each sample is represented using 2 bytes of information. And here is where I have a doubt... in your link I can read:

16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767

From that I understand that in a 16-bit system the stored value is a number from -32768 to 32768, and since that cannot be stored using only 1 byte, 2 bytes are used.

Another question: I don't fully understand the terms "signed integers" and "unsigned bytes" in the text. Can you explain more about that? Especially the signed/unsigned part, because I don't know what the difference is between an unsigned and a signed byte.

Originally Posted by jmsrickland

Do you need sample code to do what you don't understand?

At the moment I think that, once I understand the mechanism a little better, I won't need code, since I've found some on the net. The problem is that I don't fully understand the mechanism behind it all. With the help of all of you I'm starting to understand it better. Thanks anyway.

I've also been looking in the forum for similar questions, and some users suggested reading the following link (http://en.wikipedia.org/wiki/Pulse-code_modulation), which has helped me understand the mechanism a little better.

Well, I asked how to represent information in hertz using the bytes from the raw audio data sub-chunk of the *.wav file.
Reading all the information that you and the other users are posting, I think I understand it a little better.

Depending mainly on the bit depth (8 or 16), the samples/sec and the number of channels (mono/stereo), I need to interpret the information in one way or another.
1. Bit depth: in an 8-bit system each sample is stored in 1 byte. In a 16-bit system it's stored in 2 bytes.
2. The sample rate is the number of points of information (samples) stored for 1 second of time. Usually 44,100 samples per second (44,000 in the examples here).
3. One channel means that each sample is stored right after the previous one. In stereo files the raw audio data holds the left channel sample followed by the right channel sample.
Example: in 16-bit stereo there are 44,000 (samples/sec) * 2 (bytes per 16-bit sample) * 2 (stereo) = 176,000 bytes of information per second. Right?
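
The arithmetic in that example can be checked with a few lines of Python (a sketch added for illustration, not code from the thread; the function name is an assumption). It is the same product that the WAV header stores in its byte-rate field.

```python
def bytes_per_second(sample_rate, bits_per_sample, channels):
    """Raw PCM data rate: samples per second x bytes per sample x channels."""
    return sample_rate * (bits_per_sample // 8) * channels

mono_8bit = bytes_per_second(44000, 8, 1)      # one byte per sample, one channel
stereo_16bit = bytes_per_second(44000, 16, 2)  # two bytes per sample, two channels
print(mono_8bit, stereo_16bit)                 # prints "44000 176000"
```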

And as you say, I can now see more easily that the data are points in time that represent points of the wave.
Like this: http://dwellangle.files.wordpress.co...pling-dots.png

**some1uk03** · Dec 2nd, 2014, 07:45 AM

Another question: I don't fully understand the terms "signed integers" and "unsigned bytes" in the text. Can you explain more about that? Especially the signed/unsigned part, because I don't know what the difference is between an unsigned and a signed byte.

Signed integers = values that can be below 0 (down to -32768) as well as above 0 (up to +32768)
Unsigned bytes = the numbers are positive only, 0 to 255. No negative values.

**passel** · Dec 2nd, 2014, 09:21 AM

Originally Posted by some1uk03

Signed integers = values that can be below 0 (down to -32768) as well as above 0 (up to +32768)
Unsigned bytes = the numbers are positive only, 0 to 255. No negative values.

In your post, and in post #7, the range -32768 to 32768 has been mentioned, but the maximum value is 32767, not 32768.
With 16 bits you have 65536 possible states, aka numbers.
If you divide that in half, you would have two ranges of 32768 values.
In one's complement, the "lower" half of the range would be -0 to -32767 (32768 values).
The upper half would have the range 0 to 32767 (32768 values).
But since people are not used to dealing with negative zero, two's complement was invented to remove negative 0.
So, the negative range goes from -1 to -32768 (32768 values), while zero is still included in the "positive" range, 0 to 32767.
So, the total range for a 16-bit two's complement value is -32768 to 32767.

Some languages support a signed byte, so its range would go -128 to 127.
Likewise, some support 16-bit unsigned values, so the range would be 0 to 65535.

It doesn't matter which way you interpret the bits, signed or unsigned, as long as you know what the middle value is and that the wave amplitude is centered on that range.
So, for a byte which is unsigned the zero level of the wave would be 128.
If you had 16-bit unsigned values, the zero level of the wave would be 32768, with positive wave amplitude above that point and negative amplitude below that point.
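
To make the signed/unsigned distinction concrete, here is a small Python sketch (an illustration added here, not from the thread; `center_unsigned` is a made-up helper name). It reads the same two bytes both ways and then shifts unsigned values so that the middle of their range becomes 0.

```python
import struct

raw = bytes([0x00, 0x80])  # two example bytes, low byte first

# The same 16 bits read as unsigned and as two's-complement signed
# (little-endian, the byte order WAV files use).
as_unsigned = struct.unpack("<H", raw)[0]
as_signed = struct.unpack("<h", raw)[0]

def center_unsigned(value, bits):
    """Shift an unsigned sample so that the midpoint of its range becomes 0."""
    return value - (1 << (bits - 1))

print(as_unsigned, as_signed)      # prints "32768 -32768"
print(center_unsigned(128, 8))     # 8-bit zero level 128 becomes 0
print(center_unsigned(32768, 16))  # 16-bit unsigned zero level becomes 0
```

The same bit pattern gives different numbers under the two readings, so you still need to know which convention the data was written with before centering it.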

**Jose_VB** · Dec 3rd, 2014, 12:49 AM

I also want to know how to calculate the decibel level of a specific frequency.
Any ideas?

**Jose_VB** · Dec 4th, 2014, 01:53 AM

Just one more question...
If I want to read one individual sample... how can I do that?
Maybe in this way?

Mono, 8 bit unsigned PCM

Code:

Dim IndividualSample as Integer
Dim TheFile as String

TheFile = CommonDialog1.Filename

Open TheFile for Binary Access Read as #1
Get #1, 45, IndividualSample
Close #1

MsgBox IndividualSample

**Navion** · Dec 4th, 2014, 08:53 AM

No.

8 bits mono is one byte.

Code:

Dim IndividualSample as Integer
Dim Onebyte as byte

...
Get #1, 45, Onebyte
...

IndividualSample = OneByte-128

Also... you are assuming that the WAV header is 44 bytes. That is most often the case, but not always. The WAV header uses a chunk system and has to be read chunk by chunk. Code examples you find on the internet usually handle only the most standard 44-byte header. Headers of WAV files recorded by the MCI are always 44 bytes. Headers of WAV files recorded with commercial recording software most often are not.
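
A hedged sketch of what "reading in chunks" can look like, in Python rather than VB and not taken from the thread. The function name and the in-memory test file (with an extra LIST chunk before the data) are assumptions for illustration; the layout follows the RIFF convention of a 4-byte id, a 4-byte little-endian size, then the chunk body.

```python
import struct

def find_data_chunk(wav_bytes):
    """Return (offset, size) of the sample data inside a RIFF/WAVE byte string.

    Walks the sub-chunks one by one instead of assuming a 44-byte header.
    """
    if wav_bytes[0:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # the first sub-chunk starts after "RIFF", the size, and "WAVE"
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        (chunk_size,) = struct.unpack("<I", wav_bytes[pos + 4:pos + 8])
        if chunk_id == b"data":
            return pos + 8, chunk_size
        # Skip this chunk; chunks are padded to an even number of bytes.
        pos += 8 + chunk_size + (chunk_size & 1)
    raise ValueError("no data chunk found")

# Hypothetical file built in memory: fmt chunk, an extra LIST chunk, then data.
fmt = struct.pack("<HHIIHH", 1, 1, 44000, 44000, 1, 8)  # PCM, mono, 8-bit
samples = bytes([128, 160, 128, 96])
body = (b"WAVE"
        + b"fmt " + struct.pack("<I", len(fmt)) + fmt
        + b"LIST" + struct.pack("<I", 4) + b"INFO"
        + b"data" + struct.pack("<I", len(samples)) + samples)
wav = b"RIFF" + struct.pack("<I", len(body)) + body
offset, size = find_data_chunk(wav)
print(offset, size)  # prints "56 4": the extra chunk pushed the data past byte 44
```

The offset here is 0-based. VB's `Get` counts file positions from 1, so a 0-based offset of 44 corresponds to position 45 in the VB snippets.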

**Jose_VB** · Dec 5th, 2014, 03:44 AM

https://ccrma.stanford.edu/courses/4...ts/WaveFormat/

8-bit samples are stored as unsigned bytes, ranging from 0 to 255.

When I try the code you've shown, I get integer values below 0. I mean, negative numbers.
There is something that I don't understand.

**passel** · Dec 5th, 2014, 04:58 AM

OneByte - 128
changes the range 0 to 255 (unsigned range) to -128 to 127 (0 - 128 = -128, 255 - 128 = 127).

That will give you a signed (+/-) wave sample. If you wanted to treat it as a 16-bit signed sample, you could now simply multiply by 256 to scale it into the range -32768 to 32767 (the largest value you actually reach is 127 * 256 = 32512).

**Navion** · Dec 5th, 2014, 12:23 PM

In an 8-bit WAV, numbers are stored as 0 to 255 because there is not (in most languages anyway) a signed byte data format, which would range from -128 to 127. There is only an unsigned byte format that ranges from 0 to 255, which must be converted by subtracting 128 from each element to make it useful data.

For 16-bit wave samples, the standard signed integer data type is used. It ranges from -32768 to 32767, so the data being read can be used as is, without a conversion step.

A waveform is like a sine wave (the simplest, purest waveform). It ranges from -amplitude to +amplitude. The exact amplitude is determined by programming need.

You might want to see it another way. The wave file data is a unitless format that ranges from -100% to 100%. The fact that numbers within a certain range are used does not make them a "UNIT". They are always a percentage of a unit to be decided later.

In 8-bit format, that -100% and 100% corresponds to -128 to 127; in 16-bit signed integer format, to -32768 to 32767.

Let's assume that you have a file containing at least 1001 bytes of 8-bit wave data and you want to plot the first 1001 samples (indices 0 to 1000). You also have a form and a horizontally elongated picturebox. The following bit of code will do that.

Code:

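' Buffer for 1001 samples (indices 0 to 1000)
ReDim bytes(1000) As Byte

filenum = FreeFile
Open TheFile For Binary As filenum
Seek filenum, 45          ' skip the header (assumes the standard 44-byte header)
Get filenum, , bytes      ' read the whole array in one call
Close filenum

' Coordinate system: x = sample index, y = signed amplitude
Picture1.Scale (0, 127)-(1000, -128)
Picture1.PSet (0, bytes(0) - 128)
For i = 1 To 1000
    Picture1.Line -(i, bytes(i) - 128)   ' draw from the previous point to this sample
Next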

**Jose_VB** · Dec 6th, 2014, 02:00 AM

Originally Posted by passel

OneByte - 128
changes the range 0 to 255 (unsigned range) to -128 to 127 (0 - 128 = -128, 255 - 128 = 127).

That will give you a signed (+/-) wave sample. If you wanted to treat it as a 16 bit signed sample you could simply now multiply by 256 to put it in the range -32768 to 32767.

Of course, but how can I read a 16 bit wave file?

16 Bit are represented as 2 byte Integers

As you and others have explained, 8-bit mono can be read as 1 unsigned byte per sample.
The stored value is 0 to 255, and subtracting 128 gives the signed range -128 to 127. Of course.

What about 16 bits? How should I interpret the information contained in the wav file?
Do I have to multiply the first integer by the second integer to get a value from -32768 to 32767?
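
For what it's worth, the two bytes of a 16-bit sample are not multiplied together. A sketch in Python (added for illustration, not from the thread; the function name and test bytes are assumptions): the second byte is the high byte, so it is weighted by 256 and added to the first, and values in the upper half of the range wrap around to negative numbers.

```python
import struct

def decode_16bit_mono(data):
    """Turn raw 16-bit little-endian PCM bytes into signed integers."""
    samples = []
    for i in range(0, len(data) - 1, 2):
        low, high = data[i], data[i + 1]
        value = low + 256 * high   # low byte comes first, high byte second
        if value >= 32768:         # two's complement: the upper half is negative
            value -= 65536
        samples.append(value)
    return samples

raw = bytes([0x00, 0x00, 0xFF, 0x7F, 0x00, 0x80, 0xFF, 0xFF])
decoded = decode_16bit_mono(raw)
same = list(struct.unpack("<4h", raw))  # struct does the same conversion in one call
print(decoded)  # prints "[0, 32767, -32768, -1]"
```

In VB6 the equivalent shortcut is to `Get` directly into an `Integer` variable or array, since that type is already a 16-bit signed value, which is what Navion's remark about using the data "as is" refers to.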

**Jose_VB** · Dec 6th, 2014, 02:03 AM

Navion, after reading your post I understand it better. I'm going to practise with the information you've given.

**Jose_VB** · Jan 31st, 2015, 01:17 PM

Hello again,
After reading more about the subject I understand it better. The data chunk of a raw *.wav file contains amplitudes. These amplitudes are the samples of the original sound pressure (acoustic wave) that has been stored in the file. The file does not contain frequencies to represent the sound. It contains an audio signal that is composed of samples. The audio signal is a non-periodic, complex function in the time domain.
To determine the frequencies of that complex signal (data chunk) I have to use an FFT, which breaks the complex audio signal down into single components called harmonics. After breaking down the complex audio signal, I would be able to obtain the individual frequencies, the harmonics, that the signal is composed of.
Is this right?

**Navion** · Jan 31st, 2015, 06:36 PM

Yes... Eventually you will end up with something like this.

The top picture shows a synthesizer tone and its harmonics. You can clearly see the frequencies and the frequency shift upward as glide is applied to the tone.

For multi-signal samples, like music (bottom picture), things get more complex very fast. This is a one second sample of some musical audio track.

Note that audio samples are real numbers only (no complex numbers involved), since there is no phase information in the input (unlike, say, data acquired from rotating mechanical parts).

[Attached images: revo.jpg, music.jpg]

**Jose_VB** · Feb 1st, 2015, 02:13 AM

Yes, I see. A pure tone is the simplest case because it's a periodic function and it's very easy to calculate.
The calculations get harder when there is more than a single tone. Usually any audio data contains a combination of multiple tones (each with its own amplitude) that make up the final audio signal.

In the data chunk of the *.wav file I can find a function of time F(t) that is the audio signal. This function is a combination of multiple periodic functions that have been mixed into one. So there is no way (or maybe it's very difficult) to measure the frequencies directly without an algorithm. For that reason I have to use the fast Fourier transform algorithm to decompose the data chunk signal into its basic elements (a sum of pure tones, each with its own amplitude).
If I sum all those single tones, I get the original audio signal back. I think I've understood this part well.

The main question right now is: what is the mechanism for decomposing the original audio signal into single tones?
I've heard about butterfly calculations, sines, imaginary and real parts... I know (or I think) those tools are related to the FFT algorithm. But at the moment I don't see the whole picture of the mechanism involved.

**Jose_VB** · Jul 21st, 2015, 03:57 AM

Hello again,
I'm still working on this question and I understand it a little better.

At this moment, I can read the single bytes (8-bit, mono) of the *.wav file. So I understand that each byte (point) represents a wave amplitude at a given time. There are no frequencies in the *.wav file. There are wave amplitudes.

Of course, the next step is to determine the frequency content of that signal using FFT, DFT or Goertzel (for one frequency at a time).
So, at this moment my doubt is how to obtain the frequency content of the signal using the 8-bit mono bytes from the sound file. I've read some source code and tutorials on FFT and DFT but I don't see the full picture.
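
For the "one frequency at a time" case, here is a minimal sketch of the Goertzel recurrence in Python (added for illustration, not from the thread). The function name, the 8000 samples/s rate and the generated 440 Hz test tone are assumptions; the 8-bit bytes are centered by subtracting 128 first, as discussed earlier in the thread.

```python
import math

def goertzel_magnitude(samples, sample_rate, target_hz):
    """Magnitude of one frequency component, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * target_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2
    return math.sqrt(max(power, 0.0))

# Hypothetical 8-bit mono data: 800 samples at 8000 samples/s holding a 440 Hz tone.
SAMPLE_RATE = 8000
raw = bytes(round(128 + 100 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
            for n in range(800))
centered = [b - 128 for b in raw]  # unsigned 0..255 -> signed -128..127
at_440 = goertzel_magnitude(centered, SAMPLE_RATE, 440)
at_1000 = goertzel_magnitude(centered, SAMPLE_RATE, 1000)
print(at_440 > at_1000)  # the tone that is present gives by far the larger value
```

For a tone that fits a whole number of cycles into the block, the magnitude comes out close to amplitude x number of samples / 2 (here about 100 x 800 / 2), so dividing by N / 2 recovers the amplitude.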

About DFT I understand the following:
1. Select one point of the original wav signal and compare it with sine and cosine periodic waves.
2. From that comparison (correlation), I get a correlation measurement.
3. That correlation measurement has 2 components: a real number (cosine wave correlation result) and an imaginary number (sine wave correlation result).
4. The magnitude of the correlation result (a complex number, real + imaginary) is used to calculate the magnitude spectrum of the signal.

Well, I don't know how to compare (correlate) the original wave function with the sine and cosine basis functions.
Does anyone know?
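
One way to picture the correlation step, as a slow but direct Python sketch (added for illustration, not from the thread; names and the test signal are assumptions). Note that every sample of the block takes part, not a single point: for each candidate frequency, each sample is multiplied by the cosine and sine wave at the same position and the products are added up.

```python
import math

def dft_magnitudes(samples):
    """Magnitude spectrum by direct correlation (the slow O(N^2) DFT)."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):          # k = number of cycles per block
        real = 0.0                       # correlation with the cosine wave
        imag = 0.0                       # correlation with the sine wave
        for i, x in enumerate(samples):
            angle = 2.0 * math.pi * k * i / n
            real += x * math.cos(angle)  # multiply point by point, accumulate
            imag -= x * math.sin(angle)
        magnitudes.append(math.hypot(real, imag))
    return magnitudes

# Hypothetical input: 64 samples holding 5 cycles of a sine of amplitude 100.
signal = [100.0 * math.sin(2.0 * math.pi * 5 * i / 64) for i in range(64)]
spectrum = dft_magnitudes(signal)
peak_bin = spectrum.index(max(spectrum))
print(peak_bin)  # prints "5"
```

Bin k corresponds to the frequency k x sample rate / N in hertz, so with real audio the peak bin can be converted back to a frequency. An FFT produces the same numbers with far fewer multiplications.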

**namrekka** · Jul 21st, 2015, 07:18 AM

I advise you to have a look here:

http://www.dspguide.com/pdfbook.htm

In chapter 8 you will find the DFT

**Jose_VB** · Jul 21st, 2015, 08:17 AM

I've been reading part of the website that you suggested.
One question... when we speak about correlating the waves, what does that mean?
From what I've seen, correlation may be about superimposing one sinusoidal basis function on the original input and examining how much they have in common.
Is this right?

**Jose_VB** · Jul 21st, 2015, 11:40 AM

I've found this code: http://www.planet-source-code.com/vb...68324&lngWId=1

I'm going to load the byte array of the 8-bit mono wav file into it and see what happens.

**Jose_VB** · Jul 21st, 2015, 12:56 PM

I've found a module, and I have an output of unsigned values (0 to 255) that corresponds to microphone audio. I would like to know how I can pass those microphone bytes to the FFT module and interpret the output.

VBFFT.bas

**The trick** · Jul 21st, 2015, 01:20 PM

I showed you a couple of my projects that use the Fast Fourier Transform.
Vocoder uses the transform for filtering (equalising) and for multiplying spectra. That project contains a useful class for the transform, clsFFT.
Visualizer uses the transform to separate tones (what you wanted) and visualizes them as one turn per octave (watch the related video for an explanation).

**The trick** · Jul 21st, 2015, 01:41 PM

Also, the article about the vocoder contains a bit of explanation of the Fourier transform.

**Jose_VB** · Jul 21st, 2015, 03:28 PM

I'm reading it

**Jose_VB** · Jul 21st, 2015, 03:56 PM

Originally Posted by The trick

Also, the article about the vocoder contains a bit of explanation of the Fourier transform.

I'm reading your posts and I understand a little better. But I have a big problem understanding things, and I have to ask a lot of questions to fully understand the concept.

Well, imagine that I have 8-bit mono audio data. 8-bit mono bytes range from 0 to 255. Each byte represents a point in time.
As far as I know, to obtain the frequency domain information I have to use an FFT.

To perform that task I have to correlate (compare) the time domain signal with harmonics (1, 2, 3... N) and find out how much each harmonic has in common with the time domain signal. I have to perform 2 correlations: first the real part (cosine) and then the imaginary part (sine). From sine and cosine I have to obtain the absolute value |cos + sin|. Right?

Another question: how can I obtain the correlation value of the sine or cosine?

I've seen this picture and I don't know how to obtain a value from the graph. I mean, in the sine or cosine correlation graph there is a function (correlation values) and I have to obtain a number from that function. How can I obtain it?
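
For reference, the two correlations and the resulting magnitude can be written out as follows. The notation is assumed here rather than taken from the thread: x[n] are the N centered samples and k is the harmonic number. Each correlation is a single number, the sum of the point-by-point products, and the magnitude is the length of the (real, imaginary) pair rather than |cos + sin|.

```latex
\begin{aligned}
\operatorname{Re} X[k] &= \sum_{n=0}^{N-1} x[n]\,\cos\!\left(\frac{2\pi k n}{N}\right) \\
\operatorname{Im} X[k] &= -\sum_{n=0}^{N-1} x[n]\,\sin\!\left(\frac{2\pi k n}{N}\right) \\
\lvert X[k] \rvert &= \sqrt{\left(\operatorname{Re} X[k]\right)^{2} + \left(\operatorname{Im} X[k]\right)^{2}}
\end{aligned}
```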

**namrekka** · Jul 22nd, 2015, 09:38 AM

On this page there is an example, in good old Basic, of DFT:

http://www.dspguide.com/ch8/6.htm

The magic happens at line numbers 340 and 350. It's just an accumulator.

**The trick** · Jul 22nd, 2015, 02:40 PM

Use the correlation theorem. Multiply the two spectra, but first take the complex conjugate of one of them. Then calculate the inverse Fourier transform of that product. That gives you a circular correlation. If you want the usual (linear) correlation, you should zero-pad each input sequence first. In my vocoder I calculated the spectra of the carrier and modulation signals, then multiplied those spectra together and also multiplied them by an equalization function.
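
A small Python sketch of the correlation theorem as described in that post (added for illustration, not The trick's code; the names and the tiny test sequences are assumptions). A slow direct DFT stands in for the FFT so that the example needs no extra libraries.

```python
import cmath

def dft(x):
    """Direct (slow) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Direct (slow) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_correlation(a, b):
    """r[m] = sum over t of a[t] * b[(t + m) mod N], computed via the spectra."""
    spec_a, spec_b = dft(a), dft(b)
    # Conjugate one spectrum, multiply bin by bin, then transform back.
    product = [spec_a[k].conjugate() * spec_b[k] for k in range(len(a))]
    return [value.real for value in idft(product)]

a = [1.0, 2.0, 3.0, 0.0]
b = [0.0, 1.0, 2.0, 3.0]   # the same shape as a, delayed by one sample
result = circular_correlation(a, b)
best_shift = result.index(max(result))
print(best_shift)  # prints "1": the correlation peaks at the delay between a and b
```

To get the usual (non-circular) correlation, both sequences would be extended with zeros to a combined length of at least len(a) + len(b) - 1 before transforming, as the post says.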
