Thread: mean and standard deviation

  1. #1

    Thread Starter
    Lively Member
    Join Date
    Aug 2001
    Location
    cornwall, England
    Posts
    110

    mean and standard deviation

    Hi, I'm new to VB6 and need help writing a program to calculate the mean and standard deviation. Basic terminology, please.

    reply to [email protected]
    Last edited by tink; Aug 14th, 2001 at 10:34 AM.

  2. #2
    Frenzied Member nishantp's Avatar
    Join Date
    Jan 2001
    Location
    Where you least expect me to be
    Posts
    1,375
    Well, the mean is very simple. Suppose you have an array of numbers:
    VB Code:
    Function Mean(ByRef Nums() As Double) As Double
        ' Sums elements 1..UBound(Nums); this covers every element only if Nums is 1-based.
        Dim i As Long
        Dim Temp As Double   ' running total
        For i = 1 To UBound(Nums)
            Temp = Temp + Nums(i)
        Next
        Mean = Temp / UBound(Nums)
    End Function
    However, I'm not sure about standard deviation. You could always take the easy way out and get Excel to do it for you.
    You just proved that sig advertisements work.

  3. #3
    Addicted Member
    Join Date
    Mar 2001
    Posts
    157
    This will return the standard deviation:

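    Function StDev(ByRef Nums() As Double, ByRef Mean As Double) As Double
        ' Mean is the precomputed mean of Nums, passed in by the caller.
        Dim i As Long
        Dim Temp As Double   ' running sum of squared deviations
        Dim Temp2 As Double  ' deviation of the current element from the mean
        For i = 1 To UBound(Nums)
            Temp2 = Nums(i) - Mean
            Temp = Temp + Temp2 * Temp2
        Next
        StDev = Sqr(Temp / UBound(Nums))
    End Function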

  4. #4
    Frenzied Member nishantp's Avatar
    Join Date
    Jan 2001
    Location
    Where you least expect me to be
    Posts
    1,375
    Damn, I thought it was harder than that...

  5. #5
    Fanatic Member Kaverin's Avatar
    Join Date
    Oct 2000
    Posts
    930
    You need to be careful about which S formula you use. This probably isn't much of a concern in your case, but there is one formula for treating the data as a population, and another for treating it as a sample of a larger population. The difference is in the step where you calculate the variance (variance = standard_deviation^2). For a population, you divide by the number of items in the set, but for a sample, you divide by the number - 1. The effect on the final S often appears slight, but it can be large depending on the data itself.
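    To put the two formulas side by side (standard notation, not taken from the module linked in this post): sigma is the population form and s the sample form, with N items x_1 ... x_N.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
\qquad
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}
```

    As a made-up two-item example, the set {2, 4} has mean 3 and squared deviations 1 + 1 = 2, so the population form gives sqrt(2/2) = 1 while the sample form gives sqrt(2/1), roughly 1.414. The gap shrinks as N grows.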

    I can't tell which is being used in those two blocks, as it depends on whether those arrays were 0-based or 1-based. If they were 0-based, UBound gives N - 1 rather than N, and the loop starting at 1 skips element 0, which will throw off the mean as well. I posted a module a few weeks ago in another forum that you may want to look at. I had it track some other things and save them into a UDT.
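    One way to sidestep the array-base question is to loop from LBound to UBound and count the items explicitly. The following is only a sketch under that assumption; MeanOf, StDevOf and the AsSample flag are hypothetical names made up for illustration, not the module linked below.

```vb
Option Explicit

' Hypothetical helpers (not from the linked module).
' LBound/UBound make the result independent of the array base.
Function MeanOf(ByRef Nums() As Double) As Double
    Dim i As Long
    Dim Total As Double
    For i = LBound(Nums) To UBound(Nums)
        Total = Total + Nums(i)
    Next i
    MeanOf = Total / (UBound(Nums) - LBound(Nums) + 1)
End Function

' AsSample = False divides by N (population); True divides by N - 1 (sample).
Function StDevOf(ByRef Nums() As Double, ByVal AsSample As Boolean) As Double
    Dim i As Long
    Dim N As Long
    Dim Avg As Double
    Dim Diff As Double
    Dim SumSq As Double
    N = UBound(Nums) - LBound(Nums) + 1
    Avg = MeanOf(Nums)
    For i = LBound(Nums) To UBound(Nums)
        Diff = Nums(i) - Avg
        SumSq = SumSq + Diff * Diff
    Next i
    If AsSample Then
        StDevOf = Sqr(SumSq / (N - 1))   ' needs at least two values
    Else
        StDevOf = Sqr(SumSq / N)
    End If
End Function

Sub DemoStDevOf()
    Dim Data(0 To 7) As Double   ' 0-based on purpose; sample values are made up
    Data(0) = 2: Data(1) = 4: Data(2) = 4: Data(3) = 4
    Data(4) = 5: Data(5) = 5: Data(6) = 7: Data(7) = 9
    Debug.Print MeanOf(Data)           ' 40 / 8 = 5
    Debug.Print StDevOf(Data, False)   ' Sqr(32 / 8) = 2
    Debug.Print StDevOf(Data, True)    ' Sqr(32 / 7), roughly 2.138
End Sub
```

    With the same eight values in a 1-based array the results are identical, which is the point of using LBound instead of a hard-coded 1.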

    http://www.vbforums.com/showthread.p...light=standard

    If you want to use this:
    VB Code:
    Dim Info As STAT_ANALYSIS
    Info = Analyze(yourdataarray, True) 'treat as a whole population
    Then Info contains information about the set of data.
    Last edited by Kaverin; Aug 15th, 2001 at 04:29 PM.
    I'm baaaack...
    VB5 Professional Edition, VC++ 6
    Using a 1 gHz Thunderbird, 256 mb RAM, 40 gb HD system with Win98se

    I feel special because I finally figured out how to loop midis: Post link
    I'm a fanatic too

  6. #6
    PowerPoster
    Join Date
    Feb 2001
    Location
    Crossroads
    Posts
    3,046
    I'm not sure how picky you need to be, but to properly find the mean and std dev of a set of numbers, you have to first fit the data to several types of distributions (Normal, LogNormal, Weibull ...) using whatever method you prefer (rank regression, Maximum Likelihood Estimation), then apply some tests to see which distribution is the best fit. Then you can use the parameters of the "best fit" to ascertain the mean and standard deviation.

    You can see big errors from data sets sampled from distributions other than Normal and Uniform (Weibull, LogNormal) if you just divide the sum of the values by "n" to find the mean.
    Last edited by Muddy; Aug 17th, 2001 at 12:05 AM.

  7. #7
    Frenzied Member
    Join Date
    Jul 1999
    Location
    Huntingdon Valley, PA 19006
    Posts
    1,151

    Well defined.

    Muddy: While not always meaningful, the mean and standard deviation are defined for any set of numerical data. No analysis is required to determine them.

    BTW: The posted formula for the standard deviation is not generally used in practice to compute it. There is an equivalent formula which uses the sum and the sum of the squares of the variables. It allows the use of one loop to calculate the two sums, instead of requiring one loop to calculate the mean and another to calculate the deviation.

    I do not remember the formula, but it might be the following or something similar. If I had to write a program, I would look it up in some statistics text.

    Mean = Sum/N
    Variance = SumSquares/N - Mean^2
    Deviation = SquareRoot(Variance).
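    For what it's worth, the formula above can be checked by expanding the square in the population definition of variance. Here mu stands for Mean and N for the number of values; the notation is standard, not from this post.

```latex
\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2
  = \frac{1}{N}\sum_{i=1}^{N} x_i^2 \;-\; 2\mu\cdot\frac{1}{N}\sum_{i=1}^{N} x_i \;+\; \mu^2
  = \frac{1}{N}\sum_{i=1}^{N} x_i^2 \;-\; 2\mu^2 + \mu^2
  = \frac{\mathrm{SumSquares}}{N} - \mathrm{Mean}^2
```

    This is the divide-by-N (population) form; the sample form would need a different final divisor.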
    Live long & prosper.

    The Dinosaur from prehistoric era prior to computers.

    Eschew obfuscation!
    If a billion people believe a foolish idea, it is still a foolish idea!
    VB.net 2010 Express
    64Bit & 32Bit Windows 7 & Windows XP. I run 4 operating systems on a single PC.

  8. #8
    Fanatic Member Kaverin's Avatar
    Join Date
    Oct 2000
    Posts
    930
    In the module in that link I posted, I get the parts needed for S in a single loop.
    Code:
    variance = (sumofthesquareofeachelement - (squareofsumofelements / N)) / N
    stdev = sqr(variance)
    where the final divisor N (bold in the original post) means the set was treated as a population; N - 1 would be used in its place for a sample.
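    A minimal sketch of how that single loop might look, assuming nothing about the linked module beyond the formula quoted here. StDevOnePass and the AsPopulation flag are hypothetical names chosen for illustration.

```vb
Option Explicit

' Hypothetical function; implements the single-loop formula quoted above.
Function StDevOnePass(ByRef Nums() As Double, ByVal AsPopulation As Boolean) As Double
    Dim i As Long
    Dim N As Long
    Dim Total As Double     ' sum of the elements
    Dim TotalSq As Double   ' sum of the square of each element
    Dim Variance As Double
    N = UBound(Nums) - LBound(Nums) + 1
    For i = LBound(Nums) To UBound(Nums)
        Total = Total + Nums(i)
        TotalSq = TotalSq + Nums(i) * Nums(i)
    Next i
    Variance = TotalSq - (Total * Total) / N
    If AsPopulation Then
        Variance = Variance / N
    Else
        Variance = Variance / (N - 1)   ' sample: needs at least two values
    End If
    If Variance < 0 Then Variance = 0   ' rounding can push a tiny result below zero
    StDevOnePass = Sqr(Variance)
End Function

Sub DemoStDevOnePass()
    Dim Data(1 To 8) As Double   ' made-up sample values
    Data(1) = 2: Data(2) = 4: Data(3) = 4: Data(4) = 4
    Data(5) = 5: Data(6) = 5: Data(7) = 7: Data(8) = 9
    ' Sum = 40, sum of squares = 232, so 232 - 1600 / 8 = 32
    Debug.Print StDevOnePass(Data, True)   ' Sqr(32 / 8) = 2
End Sub
```

    One caveat with the one-pass form: when the values are large compared with their spread, subtracting two nearly equal sums loses precision, so the two-loop version earlier in the thread can be the safer choice for such data.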

  9. #9
    PowerPoster
    Join Date
    Feb 2001
    Location
    Crossroads
    Posts
    3,046

    Re: Well defined.

    Originally posted by Guv
    Muddy: While not always meaningful, the mean and standard deviation are defined for any set of numerical data. No analysis is required to determine them.
    You have to know the parent distribution of the data (or at least approximate it) to properly calculate the mean. Sum/N works if the parent population is normal or exponential. This may or may not be a good assumption.

    So if you are looking at experimental error or human life expectancy, you are probably OK. If you are looking at a steep wear-out phenomenon, then you could be very wrong.

    At least this is the way I understand it.

  10. #10
    Frenzied Member
    Join Date
    Jul 1999
    Location
    Huntingdon Valley, PA 19006
    Posts
    1,151
    Kaverin: Your formula looks equivalent to the one I posted. Your post makes me feel better about my formula.
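    The equivalence can be confirmed in one line for the population case, by distributing the outer division by N in Kaverin's expression (standard notation, not from either post):

```latex
\frac{1}{N}\left(\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}\right)
  = \frac{\sum_{i=1}^{N} x_i^2}{N} - \left(\frac{\sum_{i=1}^{N} x_i}{N}\right)^2
  = \frac{\mathrm{SumSquares}}{N} - \mathrm{Mean}^2
```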

    Muddy: It is difficult for me to understand what you are saying. Given N values, the mean is clearly defined as the sum of the N values divided by N. Are you aware that mean is statistician jargon for average?

    Can you tell me another formula that is applicable to some set of values, and why it would be used?

    I am aware that for some sets of data, the mean is not a representative value. Some small colleges quote the average income of the last five graduating classes as an inducement to enrolling. If a few graduates are from incredibly wealthy families, the mean is misleading. To take an extreme case, consider 99 people who are unemployed and one with an income of ten million per year. Mean income is one hundred thousand, which is misleading. The fact that it is misleading does not suggest that it is not the mean.

  11. #11
    PowerPoster
    Join Date
    Feb 2001
    Location
    Crossroads
    Posts
    3,046
    Originally posted by Guv
    Kaverin: Your formula looks equivalent to the one I posted. Your post makes me feel better about my formula.

    Muddy: It is difficult for me to understand what you are saying. Given N values, the mean is clearly defined as the sum of the N values divided by N. Are you aware that mean is statistician jargon for average?

    Can you tell me another formula that is applicable to some set of values, and why it would be used?
    Actually, it's kind of a moot point (I'm fairly sure this is not a problem in this case). Remember, I said in my first post that I was getting picky with this.

    Say you have a group of bearings that have an increasing failure rate with respect to time. You record failures on them for the first hundred hours of operation and calculate a mean time to failure by dividing the total run time by the number of failures.

    Now, because you only sampled early in bearing life, you would have an optimistic estimate of mean time to failure. A curve-fitting method would recognize the increasing failure rate (with respect to time) and give you a mean that more accurately estimates the mean life over an infinite time interval. I have seen cases with substantial differences.

    Probably nothing to do with this guy's problem, and to be honest I am not sure why I even brought it up.
    Last edited by Muddy; Aug 17th, 2001 at 05:00 PM.
