Any Statisticians here?

**Foxer** · Oct 19th, 2004, 02:07 PM

Statistics are not my strong point, but I've got a bunch of figures here and I need to be able to predict the next N numbers with X% accuracy.

If I had a series of numbers :-

20
25
21
22
20
26
22
24
19
21
22

whatever, is there some formula that tells me
"Based on the last x numbers, there is a y% chance the next number will be Z." or
"Based on the last x numbers, there is a y% chance the next number will be in the range of Z -> W."

Note the numbers are not entirely random. The number before has some influence on the number that follows. Given that premise, I'm trying to determine to what degree that influence is and if I can use that to predict the next number and how accurate that prediction will be.
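One way to put a number on "how much the previous value influences the next" is the lag-1 autocorrelation. The snippet below is only a sketch using the eleven figures above; the function name `lag1_autocorrelation` is made up for illustration, and eleven points are far too few for a firm conclusion.

```python
from statistics import mean

# The eleven figures from the post above.
series = [20, 25, 21, 22, 20, 26, 22, 24, 19, 21, 22]

def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation: how strongly each value tracks the one before it."""
    m = mean(values)
    deviations = [v - m for v in values]
    # Sum of products of each deviation with the deviation that follows it.
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator

r1 = lag1_autocorrelation(series)
print(f"lag-1 autocorrelation: {r1:.3f}")
```

A value near +1 would mean a high number tends to be followed by another high number, a value near -1 would mean high tends to be followed by low, and a value near 0 would mean little influence. As a rough rule of thumb, anything inside about ±2/sqrt(n) is hard to distinguish from noise.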

Oh, and if I do pick the next lottery numbers, I'll give you a big reward (j/k)

**alkatran** · Oct 19th, 2004, 02:29 PM

Z =~ 2*23 - X + random

**opus** · Oct 20th, 2004, 02:21 AM

Looking at those numbers, it seems like they have some sort of median or middle value, or whatever you want to call it.
Each time the old number is 22 or above, the next one is lower, and if it is 22 or below, the next is greater.
The problem is to determine the range the next step could take.
From your example the steps range from 1 to 6, but we do not really have enough examples to come up with a good statistic.
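That observation can be checked mechanically. The sketch below assumes 22 as the middle value and treats a month sitting exactly at 22 as fitting the rule whichever way it moves (that tie-breaking choice is an assumption; the names `pivot` and `follows_rule` are made up here).

```python
# The eleven figures from the first post.
series = [20, 25, 21, 22, 20, 26, 22, 24, 19, 21, 22]
pivot = 22  # the "middle" value suggested above

# Signed change from each number to the next.
steps = [b - a for a, b in zip(series, series[1:])]

def follows_rule(current, following):
    """True if the move away from `current` is back towards (or across) the pivot."""
    if current > pivot:
        return following < current
    if current < pivot:
        return following > current
    # Exactly at the pivot: the rule as stated allows either direction.
    return following != current

consistent = [follows_rule(a, b) for a, b in zip(series, series[1:])]
step_sizes = [abs(s) for s in steps]

print("steps:", steps)
print("rule holds for every transition:", all(consistent))
print("smallest and largest step:", min(step_sizes), max(step_sizes))
```

With only ten transitions, a rule that holds every time is suggestive rather than proven.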

**Foxer** · Oct 20th, 2004, 11:12 PM

The numbers are not random, but there is no pattern to them either.

The numbers represent thousands of widgets produced by my company each month. We are trying to analyse if high output one month results in an equally high output the month after, or does burn out/sickness/absenteeism produce lower figures the following months. We are trying to better predict our forecast sales.

I'm looking for a formula (or formulas) that considers the data on hand, considers how wildly it fluctuates, and then, assuming a certain degree of uncertainty (margin of error), predicts what the next month's output will likely be.

It's almost an average. Perhaps a better question would be "given the available data, what are the chances of the next number being the same as the average for the specified range?"
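If the months are treated as independent and roughly normally distributed (a strong assumption, and one that ignores any month-to-month influence), the simplest version of that formula is "average plus or minus a multiple of the standard deviation". The sketch below uses the normal approximation rather than Student's t, so with only eleven points the interval comes out a little too narrow; `prediction_interval` is a made-up helper name.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# The eleven figures from the first post (thousands of widgets per month).
series = [20, 25, 21, 22, 20, 26, 22, 24, 19, 21, 22]

def prediction_interval(values, confidence=0.95):
    """Approximate range for the NEXT single value, assuming independent normal data."""
    n = len(values)
    centre = mean(values)
    # The sqrt(1 + 1/n) factor covers both the month-to-month scatter
    # and the uncertainty in the estimated average.
    spread = stdev(values) * sqrt(1 + 1 / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * spread, centre + z * spread

low, high = prediction_interval(series)
print(f"average {mean(series):.1f}, 95% range for next month: {low:.1f} to {high:.1f}")
```

Note that this answers "what range will the next number fall in", not "what is the chance it equals the average exactly"; for whole-number data the second question needs a distribution over individual values.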

**yrwyddfa** · Nov 5th, 2004, 07:27 AM

Perhaps a polynomial trend line would help out . . .

http://www.vbforums.com/showthread.p...76#post1830676

I haven't done it yet but you can perform an analysis of the best estimates of y, and get a confidence value . . .

Then again some 'random' datasets are intractable to analysis

**MagicT** · Nov 25th, 2005, 12:52 AM

Gary,

The first thing I would say about your approach is that I must assume you do not have an equal number of production hours each month (i.e. holidays, weekends, vacation trends...), and you would presumably not want these factors affecting what you are trying to deduce. I also assume that there is some trend in the data; hopefully, you are always producing more widgets! And I assume that there are both cyclical and seasonal factors. Take car production, for example: if you analyze the data over decades, you will find a trend, a cycle, a seasonal and a random component to their production.

So, I would "normalize" my production number by some measure of available production...paid production worker hours (not actual pay which has inflation and overtime issues...just the hours spent producing), perhaps? After all, if I make 22k widgets on 2200 hours month 1 but 23k widgets on 2400 hours month 2 I actually have a DECREASE in productivity.

Then you would have 22/2200 = .010000 for month 1 and 23/2400 = .009583 for month 2 (thousands of widgets per hour). This would, theoretically, eliminate the need to "seasonally adjust" your numbers. It would also probably eliminate the need to correct for a "trend" or "cycle" in the data, because you are no longer talking about "production"; you are instead talking about "productivity".
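As a quick sketch of that normalization, using only the two example months above (the `productivity` helper is a made-up name, and real data would need one hours figure per month):

```python
def productivity(widgets_thousands, hours):
    """Thousands of widgets produced per paid production hour."""
    return widgets_thousands / hours

# (thousands of widgets, production hours) for month 1 and month 2 of the example.
months = [(22, 2200), (23, 2400)]
rates = [productivity(widgets, hours) for widgets, hours in months]

for number, rate in enumerate(rates, start=1):
    print(f"month {number}: {rate:.6f}")

# Output rose from 22k to 23k, yet productivity per hour fell.
print("productivity dropped:", rates[1] < rates[0])
```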

Once your data is normalized, I would say that although it is appealing, it is probably counter-productive to try to come up with an equation that predicts productivity for the next month based on prior months, no matter how much data you have. This is something of a theoretical issue.

If we normalize the data and therefore do not have a trend or cyclical/seasonal effect in the data, we should be left with only productivity itself and random influences.

If you want to determine if one month's productivity might affect another month there are a number of "nonparametric" statistics that you might employ based on the nature of your situation.

First, we need to establish what your "Null Hypothesis" is. Then we gather statistics to see if we have enough evidence to reject that hypothesis.

It sounds like one question you are attempting to answer is: "Does high productivity in a given month lead to higher or lower productivity in the following month?"

I might try something like a simple "runs" test where I arrange my normalized data in order by month, then I assign a + if the productivity is up from the previous month or - if it is down.

Then you state the Null Hypothesis that "We assume that productivity is random (there will be no patterns to the +- data), is there sufficient evidence to conclude that we must reject that hypothesis?"

Then you count the number of "runs" in the data (e.g. in +++--++-+---- there are 6 runs out of a total possible 13 runs). If the number of runs is markedly fewer (or markedly more) than what you would expect by chance, you would say that you have enough evidence to reject the null hypothesis and conclude that there is some dependence between months. If the number of runs is too close to random chance, you cannot reject the null hypothesis; you just don't have enough evidence to the contrary to convince the jury.
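The counting can be sketched as follows, applied here to the original (un-normalized) figures from the first post, since no hours data were given. It uses the standard large-sample result for runs up and down, where n observations give an expected (2n - 1)/3 runs with variance (16n - 29)/90; the normal approximation is rough for a series this short, ties between neighbouring months are simply dropped, and the function names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

series = [20, 25, 21, 22, 20, 26, 22, 24, 19, 21, 22]

def up_down_signs(values):
    """'+' where a value is above its predecessor, '-' where below (ties dropped)."""
    return ["+" if b > a else "-" for a, b in zip(values, values[1:]) if a != b]

def count_runs(signs):
    """Number of unbroken stretches of identical signs."""
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def runs_up_down_test(values):
    """Return (runs, expected runs, z score, two-sided p) under the randomness hypothesis."""
    signs = up_down_signs(values)
    n = len(signs) + 1  # observations actually used
    runs = count_runs(signs)
    expected = (2 * n - 1) / 3
    variance = (16 * n - 29) / 90
    z = (runs - expected) / sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return runs, expected, z, p

runs, expected, z, p = runs_up_down_test(series)
print("".join(up_down_signs(series)))
print(f"runs={runs}, expected={expected:.1f}, z={z:.2f}, p={p:.3f}")
```

A strongly negative z (too few runs) points to months that drift together, while a strongly positive z (too many runs) points to a zigzag where a high month tends to be followed by a low one.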

If you need some help with analyzing the data once you determine the runs, post your results here..or google "Non-Parametric Statistics Runs Test".

-MagicT

**Rassis** · Nov 26th, 2005, 04:09 PM

Hi Gary,

The "nonparametric" stats method that I use to check whether a set of data is random or not (based on the number of times the data change chronologically from increasing to decreasing) tells me that your data are random, although I should have at least 26 observations to be sufficiently confident in that conclusion. On the other hand, an attempt could be made to find a trend and a cyclical effect using the Holt-Winters formulas, but there are too few data points for the conclusions to be statistically significant. If this were not the case, you could estimate the value of the next observation within an interval at a given level of confidence.
I am sorry I can’t provide you with more help.

**Rassis** · Nov 26th, 2005, 04:20 PM

There is another "nonparametric" stats method, known as the Spearman correlation test, which might be of help if you suspect that a cause-effect relationship exists between your data above (effect) and other data (cause). If you track the behaviour of the latter, an estimate of the former could be obtained.
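A minimal sketch of the Spearman coefficient follows: rank both series (tied values share the average rank), then take the ordinary correlation of the ranks. No "cause" data appear in this thread, so purely as a stand-in the example pairs each month with the month before it; with real data you would pass the suspected cause as the first argument. The helper names are made up, and the significance test itself is not shown.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(xs, ys):
    """Ordinary product-moment correlation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

series = [20, 25, 21, 22, 20, 26, 22, 24, 19, 21, 22]
# Stand-in "cause": the previous month's figure (an assumption for illustration only).
rho = spearman(series[:-1], series[1:])
print(f"Spearman rho, previous month vs. next month: {rho:.2f}")
```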
