mean and standard deviation


tink
Aug 14th, 2001, 10:30 AM
Hi, I'm new to VB6 and need help writing a program to calculate the mean and standard deviation. Basic terminology, please.

reply to abbo123@aol.com

nishantp
Aug 14th, 2001, 01:02 PM
Well, the mean is very simple. Suppose you have an array of numbers:

Function Mean(ByRef Nums() As Double) As Double
    Dim i As Long
    Dim Temp As Double   ' running total
    ' Add up elements 1 through UBound(Nums)
    For i = 1 To UBound(Nums)
        Temp = Temp + Nums(i)
    Next
    ' Divide the total by UBound(Nums)
    Mean = Temp / UBound(Nums)
End Function

However, I'm not sure about standard deviation. You could always take the easy way out and get Excel to do it for you. :D

chrisf
Aug 15th, 2001, 08:06 AM
This will return the standard deviation:

Function StDev(ByRef Nums() As Double, ByRef Mean As Double) As Double
    Dim i As Long
    Dim Temp As Double    ' running sum of squared deviations
    Dim Temp2 As Double   ' deviation of one element from the mean
    For i = 1 To UBound(Nums)
        Temp2 = Nums(i) - Mean
        Temp = Temp + Temp2 * Temp2
    Next
    StDev = Sqr(Temp / UBound(Nums))
End Function

nishantp
Aug 15th, 2001, 01:28 PM
Damn, I thought it was harder than that... :D

Kaverin
Aug 15th, 2001, 04:15 PM
You need to be careful about which S formula you use. This probably isn't much of a concern in your case, but there is one formula for treating the data as a population and another for treating it as a sample of a larger population. The difference is in the step where you calculate the variance (variance = standard_deviation^2). For a population, you divide by the number of items in the set, but for a sample, you divide by that number - 1. The effect on the final S usually looks slight, but it can be large when the set is small.

I can't tell which one those two blocks use, as it depends on whether the arrays were 0-based or 1-based. If they were 0-based, UBound gives N-1 and the loop starting at 1 skips element 0, which throws off the mean as well. I posted a module a few weeks ago in another forum that you may want to look at. I had it track some other things and save them into a UDT.

http://www.vbforums.com/showthread.php?s=&threadid=89377&highlight=standard

If you want to use this:

Dim Info As STAT_ANALYSIS
Info = Analyze(yourdataarray, True) 'treat as a whole population

Then Info contains information about the set of data.
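
For anyone who does not want to pull in that module, the two points above (array bounds, and population versus sample) can be sketched in a few lines. This is not the code from the linked thread; the name StDevOf and the AsPopulation flag are made up for illustration. It takes the count from LBound and UBound together, so it should come out right for 0-based and 1-based arrays alike.

```vb
' Hypothetical helper, NOT the Analyze function from the linked module.
' The count comes from LBound and UBound, so the array base does not matter.
Function StDevOf(ByRef Nums() As Double, ByVal AsPopulation As Boolean) As Double
    Dim i As Long
    Dim N As Long
    Dim Total As Double
    Dim Avg As Double
    Dim SumSq As Double

    N = UBound(Nums) - LBound(Nums) + 1   ' number of elements

    ' First pass: the mean
    For i = LBound(Nums) To UBound(Nums)
        Total = Total + Nums(i)
    Next
    Avg = Total / N

    ' Second pass: sum of squared deviations from the mean
    For i = LBound(Nums) To UBound(Nums)
        SumSq = SumSq + (Nums(i) - Avg) * (Nums(i) - Avg)
    Next

    If AsPopulation Then
        StDevOf = Sqr(SumSq / N)          ' population: divide by N
    Else
        StDevOf = Sqr(SumSq / (N - 1))    ' sample: divide by N - 1 (needs N >= 2)
    End If
End Function
```

As a check by hand: for the eight values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the squared deviations add up to 32, so the population figure is Sqr(32 / 8) = 2 and the sample figure is Sqr(32 / 7), roughly 2.14.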

Muddy
Aug 16th, 2001, 11:22 PM
I'm not sure how picky you need to be, but to properly find the mean and std dev of a set of numbers, you have to first fit the data to several types of distributions (Normal, LogNormal, Weibull ...) using whatever method you prefer (rank regression, Maximum Likelihood Estimation), then apply some tests to see which distribution is the best fit. Then you can use the parameters of the "best fit" to ascertain the mean and standard deviation.

You can see big errors from data sets sampled from distributions other than Normal and Uniform (Weibull, LogNormal) if you just divide the sum of the values by "n" to find the mean.

Guv
Aug 17th, 2001, 11:08 AM
Muddy: While not always meaningful, the mean and standard deviation are defined for any set of numerical data. No analysis is required to determine them.

BTW: The posted formula for the standard deviation is not generally used in practice to compute it. There is an equivalent formula which uses the sum and the sum of the squares of the variables. It allows the use of one loop to calculate the two sums, instead of requiring one loop to calculate the mean and another to calculate the deviation.

I do not remember the formula, but it might be the following or something similar. If I had to write a program, I would look it up in some statistics text.

Mean = Sum/N
Variance = SumSquares/N - Mean^2
Deviation = SquareRoot(Variance).

Kaverin
Aug 17th, 2001, 11:42 AM
In the module in that link I posted, I get the parts needed for S in a single loop.

variance = (sumofthesquareofeachelement - (squareofsumofelements / N)) / N
stdev = sqr(variance)
where the final N (the divisor at the end of the variance line) means the set was treated as a population; N - 1 would be used there for a sample.
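
A rough sketch of that single-loop approach, for readers who cannot reach the link. It is not the module itself; the function name, the AsPopulation flag, and the variable names are all assumptions made for illustration.

```vb
' Hypothetical single-pass version; names are illustrative only.
Function StDevOnePass(ByRef Nums() As Double, ByVal AsPopulation As Boolean) As Double
    Dim i As Long
    Dim N As Long
    Dim Total As Double      ' sum of the elements
    Dim TotalSq As Double    ' sum of the square of each element
    Dim Variance As Double

    N = UBound(Nums) - LBound(Nums) + 1

    ' One loop gathers both sums
    For i = LBound(Nums) To UBound(Nums)
        Total = Total + Nums(i)
        TotalSq = TotalSq + Nums(i) * Nums(i)
    Next

    If AsPopulation Then
        Variance = (TotalSq - Total * Total / N) / N
    Else
        Variance = (TotalSq - Total * Total / N) / (N - 1)
    End If

    ' Rounding can leave a near-zero variance slightly negative; clamp before Sqr
    If Variance < 0 Then Variance = 0
    StDevOnePass = Sqr(Variance)
End Function
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the two sums are 40 and 232, so the population variance is (232 - 1600 / 8) / 8 = 4 and the standard deviation is 2.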

Muddy
Aug 17th, 2001, 01:51 PM
Originally posted by Guv
Muddy: While not always meaningful, the mean and standard deviation are defined for any set of numerical data. No analysis is required to determine them.

You have to know the parent distribution of the data (or at least approximate it) to properly calculate the mean. Sum/N works if the parent population is normal or exponential. This may or may not be a good assumption.

So if you are looking at experimental error or human life expectancy you are probably OK. If you are looking at a steep wear-out phenomenon then you could be very wrong.

At least this is the way I understand it.

Guv
Aug 17th, 2001, 04:12 PM
Kaverin: Your formula looks equivalent to the one I posted. Your post makes me feel better about my formula.
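
The equivalence can be checked with a little algebra. Writing S for the sum of the values and Q for the sum of their squares (symbols introduced here for brevity, not taken from the posts above), Kaverin's population form reduces to the Sum/N and SumSquares/N form, and both match the definition of variance as the average squared deviation from the mean:

```latex
\[
\frac{Q - S^2/N}{N} \;=\; \frac{Q}{N} - \left(\frac{S}{N}\right)^2 \;=\; \frac{Q}{N} - \bar{x}^2,
\qquad \bar{x} = \frac{S}{N}
\]
\[
\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2
\;=\; \frac{1}{N}\left(Q - 2\bar{x}S + N\bar{x}^2\right)
\;=\; \frac{Q}{N} - \bar{x}^2
\]
```

The last step uses S = N times the mean, so the middle term is twice the size of the last one with the opposite sign. This covers the population case only; the sample version divides Q - S^2/N by N - 1 instead.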

Muddy: It is difficult for me to understand what you are saying. Given N values, the mean is clearly defined as the sum of the N values divided by N. Are you aware that mean is statistician jargon for average?

Can you tell me another formula that is applicable to some set of values, and why it would be used?

I am aware that for some sets of data, the mean is not a representative value. Some small colleges quote the average income of the last five graduating classes as an inducement to enrolling. If a few graduates are from incredibly wealthy families, the mean is misleading. To take an extreme case, consider 99 people who are unemployed and one with an income of ten million per year. Mean income is one hundred thousand, which is misleading. The fact that it is misleading does not suggest that it is not the mean.

Muddy
Aug 17th, 2001, 04:53 PM
Originally posted by Guv
Kaverin: Your formula looks equivalent to the one I posted. Your post makes me feel better about my formula.

Muddy: It is difficult for me to understand what you are saying. Given N values, the mean is clearly defined as the sum of the N values divided by N. Are you aware that mean is statistician jargon for average?

Can you tell me another formula that is applicable to some set of values, and why it would be used?

Actually, it's kind of a moot point (I'm fairly sure this is not a problem in this case). Remember, I said in my first post that I was getting picky with this.

Say you have a group of bearings that have an increasing failure rate with respect to time. You record failures on them for the first hundred hours of operation and calculate a mean time to failure by dividing the total run time by the number of failures.

Now, because you only sampled early in bearing life, you would have an optimistic estimate of mean time to failure. A curve fitting method would recognize the increasing failure rate (with respect to time) and give you a mean that more accurately estimates the mean life over an infinite time interval. I have seen cases with substantial differences.

Prolly nothing to do with this guy's problem and to be honest I am not sure why I even brought it up.