Thread: Any Statisticians here?

  1. #1

    Thread Starter
    Hyperactive Member Foxer's Avatar
    Join Date
    Oct 2001
    Location
    Australia
    Posts
    278

    Any Statisticians here?

    Statistics are not my strong point, but I've got a bunch of figures here and I need to be able to predict the next N numbers with X% accuracy.

    If I had a series of numbers :-

    20
    25
    21
    22
    20
    26
    22
    24
    19
    21
    22

    whatever, is there some formula that tells me
    "Based on the last x numbers, there is a y% chance the next number will be Z." or
    "Based on the last x numbers, there is a y% chance the next number will be in the range of Z -> W."

    Note the numbers are not entirely random. The number before has some influence on the number that follows. Given that premise, I'm trying to determine to what degree that influence is and if I can use that to predict the next number and how accurate that prediction will be.
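
    One rough way to put a number on "the number before has some influence on the number that follows" is the lag-1 autocorrelation of the series. The sketch below is only an illustration: the function name is made up, and it uses the eleven figures listed above as-is.

```python
def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation: co-movement of neighbours over total variation."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    # Sum of squared deviations (zero for a constant series, which this sketch does not handle)
    denominator = sum(d * d for d in deviations)
    # Products of each deviation with the one that follows it
    numerator = sum(deviations[i] * deviations[i + 1] for i in range(n - 1))
    return numerator / denominator


numbers = [20, 25, 21, 22, 20, 26, 22, 24, 19, 21, 22]
r1 = lag1_autocorrelation(numbers)
print(f"lag-1 autocorrelation: {r1:.3f}")
```

    A value near +1 would mean high numbers tend to follow high numbers, and a value near -1 would mean highs tend to be followed by lows. For these figures it comes out at about -0.42. A common rule of thumb treats anything inside roughly ±2/sqrt(n) (about ±0.60 for eleven points) as indistinguishable from noise, so this is a hint rather than proof.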

    Oh, and if I do pick the next lottery numbers, I'll give you a big reward (j/k)
    Rate my response if I helped

    Go Hard Or Go Home


  2. #2
    Fanatic Member alkatran's Avatar
    Join Date
    Apr 2002
    Location
    Canada
    Posts
    860
    Z =~ 2*23 - X + random
    Don't pay attention to this signature, it's contradictory.

  3. #3
    I don't do your homework! opus's Avatar
    Join Date
    Jun 2000
    Location
    Good Old Europe
    Posts
    3,863
    Looking at those numbers, it seems like they have some sort of median, or middle, or whatever you want to call it.
    Each time the old number is 22 or above, the next one is lower, and if it is 22 or below, the next is greater.
    The problem is to determine the range the next step could take.
    From your example the steps range from 1 to 6, but we do not really have enough examples to come up with a good statistic.
    You're welcome to rate this post!
    If your problem is solved, please use the Mark thread as resolved button


    Wait, I'm too old to hurry!

  4. #4

    Thread Starter
    Hyperactive Member Foxer's Avatar
    Join Date
    Oct 2001
    Location
    Australia
    Posts
    278
    The numbers are not random, but there is no pattern to them either.

    The numbers represent thousands of widgets produced by my company each month. We are trying to analyse if high output one month results in an equally high output the month after, or does burn out/sickness/absenteeism produce lower figures the following months. We are trying to better predict our forecast sales.

    I'm looking for a formula (or formulas) that considers the data on hand and how wildly it fluctuates and then, allowing for a certain degree of uncertainty (margin of error), predicts what the next month's output will likely be.

    It's almost an average. Perhaps a better question would be "given the available data, what are the chances of the next number being the same as the average for the specified range?"
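
    A baseline for "average plus margin of error" is a prediction interval around the mean. The sketch below assumes the months are independent and roughly normally distributed, which is exactly the premise in doubt here, so treat it as a starting point only. The function name is illustrative, and 2.228 is the two-sided 95% Student t critical value for 10 degrees of freedom (11 observations minus 1), hard-coded because the Python standard library has no t distribution.

```python
import math
import statistics


def prediction_interval(series, t_critical):
    """Interval expected to contain the next observation, assuming i.i.d. normal data."""
    n = len(series)
    mean = statistics.mean(series)
    s = statistics.stdev(series)  # sample standard deviation (n - 1 in the denominator)
    # The sqrt(1 + 1/n) factor covers both the spread of a new value
    # and the uncertainty in the estimated mean
    half_width = t_critical * s * math.sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width


monthly_output = [20, 25, 21, 22, 20, 26, 22, 24, 19, 21, 22]
T_95_DF10 = 2.228  # assumption: 95% two-sided level, 10 degrees of freedom
low, high = prediction_interval(monthly_output, T_95_DF10)
print(f"95% prediction interval: {low:.1f} to {high:.1f}")
```

    On the eleven figures posted earlier this gives roughly 16.9 to 27.1 (thousand widgets), which is wider than the whole observed range. That is what eleven points and this much fluctuation buy you.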


  5. #5
    Frenzied Member yrwyddfa's Avatar
    Join Date
    Aug 2001
    Location
    England
    Posts
    1,253
    Perhaps a polynomial trend line would help out . . .

    http://www.vbforums.com/showthread.p...76#post1830676

    I haven't done it yet but you can perform an analysis of the best estimates of y, and get a confidence value . . .

    Then again some 'random' datasets are intractable to analysis
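
    As the simplest case of a trend line (a degree-1 polynomial rather than a higher-order one), here is a sketch of an ordinary least-squares straight-line fit against the month number. It is not the code from the linked thread; the function name and the choice of months numbered 1 to 11 are assumptions made for illustration.

```python
def fit_line(ys):
    """Least-squares line y = intercept + slope * x, with x = 1, 2, ..., len(ys)."""
    n = len(ys)
    xs = list(range(1, n + 1))
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r_squared = (sxy * sxy) / (sxx * syy)  # share of the variation explained by the line
    return slope, intercept, r_squared


monthly_output = [20, 25, 21, 22, 20, 26, 22, 24, 19, 21, 22]
slope, intercept, r_squared = fit_line(monthly_output)
next_month = intercept + slope * (len(monthly_output) + 1)
print(f"slope={slope:.3f}, R^2={r_squared:.3f}, month 12 estimate={next_month:.2f}")
```

    For the figures in this thread the slope is about -0.055 per month and R^2 is about 0.007, so a straight line explains almost none of the variation.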

  6. #6
    Member
    Join Date
    Oct 2005
    Posts
    38

    Re: Any Statisticians here?

    Gary,

    The first thing I would say about your approach is that I must assume you do not have an equal number of production hours each month (i.e. holidays, weekends, vacation trends...), and you would presumably not want these factors affecting what you are trying to deduce. I also assume that there is some trend in the data; hopefully, you are always producing more widgets! And I assume that there are both cyclical and seasonal factors. Take car production, for example: if you analyze the data over decades, you will find a trend, a cycle, a seasonal and a random component to their production.

    So, I would "normalize" my production number by some measure of available production...paid production worker hours (not actual pay which has inflation and overtime issues...just the hours spent producing), perhaps? After all, if I make 22k widgets on 2200 hours month 1 but 23k widgets on 2400 hours month 2 I actually have a DECREASE in productivity.

    Then you would have 22/2200 = 0.010000 for month 1 and 23/2400 = 0.009583 for month 2 (in thousands of widgets per hour). This would, theoretically, eliminate the need to "seasonally adjust" your numbers. It would also probably eliminate the need to correct for a "trend" or "cycle" in the data, because you are no longer talking about "production"; you are instead talking about "productivity".
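
    The normalization is just a per-month division. A minimal sketch using only the two months quoted above (the variable and function names are illustrative, and output is kept in thousands of widgets):

```python
def productivity(output_thousands, hours_worked):
    """Thousands of widgets produced per paid production hour, month by month."""
    return [out / hrs for out, hrs in zip(output_thousands, hours_worked)]


output_thousands = [22, 23]   # month 1, month 2 (thousands of widgets)
hours_worked = [2200, 2400]   # paid production hours in the same months
per_hour = productivity(output_thousands, hours_worked)
for month, value in enumerate(per_hour, start=1):
    print(f"month {month}: {value:.6f}")
```

    Output went up from month 1 to month 2, but output per hour went down, which is the point of normalizing first.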

    Once your data is normalized, I would say that although it is appealing, it is probably counter-productive to try to come up with an equation that predicts productivity for the next month based on prior months, no matter how much data you have. This is something of a theoretical issue.

    If we normalize the data and therefore do not have a trend or cyclical/seasonal effect in the data, we should be left with only productivity itself and random influences.

    If you want to determine if one month's productivity might affect another month there are a number of "nonparametric" statistics that you might employ based on the nature of your situation.

    First, we need to establish what your "Null Hypothesis" is. Then we gather statistics to see if we have enough evidence to reject that hypothesis.

    It sounds like one question you are attempting to answer is: "Does high productivity in a given month lead to higher or lower productivity in the following month?"

    I might try something like a simple "runs" test where I arrange my normalized data in order by month, then I assign a + if the productivity is up from the previous month or - if it is down.

    Then you state the Null Hypothesis that "We assume that productivity is random (there will be no patterns to the +- data), is there sufficient evidence to conclude that we must reject that hypothesis?"

    Then you count the number of "runs" in the data (e.g. in +++--++-+---- there are 6 runs out of a possible 13). If the number of runs is far enough from what you would expect by chance, you have enough evidence to reject the null hypothesis and conclude that there is some dependence between months: too few runs suggests that good (or bad) months come in streaks, while too many suggests that a high month tends to be followed by a low one. If the number of runs is too close to random chance, you cannot reject the null hypothesis; you just don't have enough evidence to the contrary to convince the jury.
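
    The counting and the comparison with chance can be sketched as below. This uses the usual large-sample form of the runs up-and-down test, where a random series of n values is expected to produce (2n - 1)/3 runs with variance (16n - 29)/90. The function names are made up, the normal approximation is shaky for a series this short, and the raw monthly figures from earlier in the thread stand in for normalized productivity data, which has not been posted.

```python
import math


def difference_signs(series):
    """'+' if a value is above the previous one, '-' if below (ties are dropped)."""
    signs = []
    for previous, current in zip(series, series[1:]):
        if current > previous:
            signs.append("+")
        elif current < previous:
            signs.append("-")
    return "".join(signs)


def count_runs(signs):
    """A run is a maximal stretch of identical signs."""
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def runs_up_down_z(series):
    """Observed runs, expected runs under randomness, and the z-score (assumes no ties)."""
    n = len(series)
    runs = count_runs(difference_signs(series))
    expected = (2 * n - 1) / 3
    variance = (16 * n - 29) / 90
    return runs, expected, (runs - expected) / math.sqrt(variance)


monthly_output = [20, 25, 21, 22, 20, 26, 22, 24, 19, 21, 22]
runs, expected, z = runs_up_down_z(monthly_output)
print(difference_signs(monthly_output), runs, expected, round(z, 2))
```

    For those figures the signs are +-+-+-+-++, giving 9 runs against 7 expected and a z-score of about 1.56. That is short of the usual 1.96 cut-off for a two-sided 5% test, so the excess of runs is suggestive of "high month, then low month" but not enough to reject randomness.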

    If you need some help with analyzing the data once you determine the runs, post your results here..or google "Non-Parametric Statistics Runs Test".

    -MagicT

  7. #7
    Addicted Member Rassis's Avatar
    Join Date
    Jun 2004
    Location
    Lisbon
    Posts
    248

    Re: Any Statisticians here?

    Hi Gary,

    The "nonparametric" stats method that I use to check whether a set of data is random or not (based on the number of times the data change chronologically from increasing to decreasing) tells me that the data are random, although I should have at least 26 observations to be sufficiently confident in that conclusion. On the other hand, an attempt could be made to find a trend and a cyclical effect using the Holt-Winters formulas, but there are too few data points for the conclusions to be statistically significant. If this were not the case, you could estimate the value of the next observation within an interval, given a certain level of confidence.
    I am sorry that I can't provide you with more help.
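
    For anyone who wants to try the smoothing approach once more data is available, here is a sketch of the level-plus-trend part only (Holt's linear method). The seasonal component of Holt-Winters is left out because eleven months cannot cover even one full yearly cycle. The function name, the start-up values and the smoothing constants alpha and beta are all assumptions for illustration, not tuned values.

```python
def holt_forecast(series, alpha, beta, steps):
    """Holt's linear exponential smoothing; returns forecasts for the next `steps` periods."""
    level = series[0]
    trend = series[1] - series[0]  # simple start-up estimate of the trend
    for value in series[1:]:
        previous_level = level
        # Blend the new observation with what the previous level and trend predicted
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        # Blend the latest change in level with the previous trend estimate
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, steps + 1)]


monthly_output = [20, 25, 21, 22, 20, 26, 22, 24, 19, 21, 22]
forecast = holt_forecast(monthly_output, alpha=0.3, beta=0.1, steps=3)
print([round(f, 2) for f in forecast])
```

    Values of alpha and beta closer to 1 make the forecast chase the most recent months, while values closer to 0 make it smoother. With this little data, any choice is a guess.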
    Last edited by Rassis; Nov 27th, 2005 at 09:38 AM.
    ...este projecto dos Deuses que os homens teimam em arruinar...

  8. #8
    Addicted Member Rassis's Avatar
    Join Date
    Jun 2004
    Location
    Lisbon
    Posts
    248

    Re: Any Statisticians here?

    There is another "nonparametric" stats method, known as the Spearman rank correlation test, which might help if you suspect that a cause-effect relation exists between your data above (the effect) and some other data (the cause). If you track the behaviour of the latter, an estimate of the former could be obtained.
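
    A sketch of the Spearman coefficient itself: rank both series (tied values share the average of their ranks) and take the ordinary Pearson correlation of the ranks. The function names are illustrative, and the overtime hours are invented purely to have a second series to correlate against; they are not real data from this thread.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values all receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Ordinary product-moment correlation."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


monthly_output = [20, 25, 21, 22, 20, 26, 22, 24, 19, 21, 22]
overtime_hours = [30, 55, 32, 40, 28, 60, 41, 50, 25, 35, 38]  # hypothetical "cause" data
print(round(spearman(overtime_hours, monthly_output), 3))
```

    A coefficient near +1 or -1 means the two series rise and fall together (or in opposition) in rank order. Whether that reflects cause and effect is a separate question the coefficient cannot answer.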
