A horse race is going to take place with six runners.
The race is over 5 furlongs (about 1000 meters) and for each of the six contestants the probable time at this distance is known:

But, as is always the case in horse races, these times are uncertain, so the outcome is unknown.
In fact each of the above times is accurate to plus or minus 0.50 seconds, i.e. for horse "1" there is a Gaussian distribution with mean 57.00 and standard deviation 0.5, for horse "2" there is a Gaussian with mean 57.20 and standard deviation 0.5, and so on.

What is the probability of each horse winning the race?

There is an easy (but maybe a little slow) answer by Monte Carlo simulation using random numbers, but it's not what I'm asking for.
Does anyone know a functional approximation for the winner's pdf?

If you compare each horse with the most probable winner (horse 6), you get the following chances that they can win the race (or that horse 6 loses to each one of the other horses):

What are these formulas?
I don't understand.
Are you trying to do it pairwise? I don't think it works.

The probability for horse 1 should be of the order of 40% and the probability for horse 6 less than 1%, and they should add up to 100% of course.
You can sample numbers from the normal distribution with mean 0 and standard deviation 1 and change their scale (it's in all mathematical handbooks) to find the numerical answer.
There is a so-called Weibull distribution; maybe it approximates the solution somewhat.

You are right, I solved the problem incorrectly: I mistakenly took horse 6 as the most probable winner instead of horse 1.

The formulas that I used come from the difference of two Normal distributions: one representing the random time taken by a given horse and the other the random time taken by horse 1. The difference of two independent Normal variables X and Y is itself Normal, with mean equal to the difference of the means of X and Y and standard deviation equal to the square root of the sum of the squares of their standard deviations.
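The pairwise comparison described above can be sketched in a few lines of Python. This is only an illustration, assuming independent Normal times; the function names `normal_cdf` and `prob_faster` are made up for this example, and the two means are the ones quoted for horses 1 and 2 in the first post.

```python
import math

def normal_cdf(z):
    """Standard Normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_faster(mean_x, sd_x, mean_y, sd_y):
    """P(X < Y) for independent Normal times X and Y.

    D = X - Y is Normal with mean (mean_x - mean_y) and standard
    deviation sqrt(sd_x^2 + sd_y^2), so P(X < Y) = P(D < 0).
    """
    diff_mean = mean_x - mean_y
    diff_sd = math.sqrt(sd_x ** 2 + sd_y ** 2)
    return normal_cdf((0.0 - diff_mean) / diff_sd)

# Chance that horse 2 (mean 57.20) runs a faster time than horse 1 (mean 57.00).
p_2_beats_1 = prob_faster(57.20, 0.5, 57.00, 0.5)
print(p_2_beats_1)
```

Note that this gives only head-to-head probabilities; on their own they are not the probabilities of winning against the whole field.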

As there is no closed form for the cumulative Normal distribution, I took an easier route and used the built-in Excel function NORMDIST(x;mean;standard deviation;1).

The solutions are:

If you compare each horse with the most probable winner (horse 1), you get the following chances that they can win the race (or that horse 1 loses to each one of the other horses):

I'm going to check what you've done.
But did you try to verify your method using random number simulation?
Random number simulation gives the correct answer to, say, one decimal place (its drawback being that it is slow).
Another application of this is in component failure statistics.

And hey, those numbers of yours add up to 0.88801, not 1 (?)

RANDOM NUMBERS
------------------
To generate normally distributed numbers with mean zero and standard deviation 1, first generate two uniformly distributed random numbers x,y in the interval [0,1] (using the rnd function); x must be greater than 0 so that log(x) is defined.

The normally distributed number is computed by the formula:

u = sqrt (- 2 . log(x) ) . cos ( 2 . π . y )

(π is pi, in case the Greek letter is not visible to you).

u is adjusted to mean m and standard deviation s by:

v = m + s . u
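A minimal Python sketch of the recipe above, including the usual rescaling m + s . u to mean m and standard deviation s. The function name `box_muller` and the `rng` argument are assumptions made for this example; Python's `random.random()` stands in for the rnd function.

```python
import math
import random

def box_muller(m=0.0, s=1.0, rng=None):
    """Return one Normal(m, s) sample using the cosine form of Box-Muller."""
    rng = rng or random
    x = 1.0 - rng.random()   # uniform in (0, 1], so log(x) is always defined
    y = rng.random()         # uniform in [0, 1)
    u = math.sqrt(-2.0 * math.log(x)) * math.cos(2.0 * math.pi * y)
    return m + s * u         # shift and scale the standard Normal value

if __name__ == "__main__":
    rng = random.Random(42)
    # One simulated time for horse 1 (mean 57.00, standard deviation 0.5).
    print(box_muller(57.00, 0.5, rng))
```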

The random number program wasn't terribly slow, but I don't like it all that much.

Hello, I am studying Monte Carlo simulation and found this post interesting. Would you mind showing me how to get 0.47...0.29.. etc.? Sorry, I am very new to this area.

I also used simulation along with the numerical method that I attached in my previous post and, of course, both results coincide. The Box-Muller method for generating random values from a Normal distribution, SQRT(-2 ln r1) . cos(2.pi.r2), where r1 and r2 are uniform random numbers (0 < r1 <= 1, 0 <= r2 <= 1), can be used. If you choose Excel to perform the simulation, you can use the built-in function NORMINV(RAND(); mean; standard deviation) instead.

Doing this, you will arrive at the same results as I did.
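For readers without Excel, the same kind of simulation can be sketched in Python, with `NormalDist.inv_cdf` applied to a uniform random number playing the role of NORMINV(RAND(); mean; standard deviation). Only the means 57.00 and 57.20 come from the first post; the other four means below are hypothetical placeholders, and the names `simulate_win_probs` and `uniform_open` are made up for this example.

```python
import random
from statistics import NormalDist

def uniform_open(rng):
    """Uniform number strictly inside (0, 1), as inv_cdf requires."""
    p = rng.random()
    while p == 0.0:
        p = rng.random()
    return p

def simulate_win_probs(means, sd=0.5, n_races=20000, seed=0):
    """Estimate each horse's win probability by simulating races."""
    rng = random.Random(seed)
    dists = [NormalDist(mu, sd) for mu in means]
    wins = [0] * len(means)
    for _ in range(n_races):
        # Draw one finishing time per horse by inverse-CDF sampling.
        times = [d.inv_cdf(uniform_open(rng)) for d in dists]
        wins[times.index(min(times))] += 1   # lowest time wins
    return [w / n_races for w in wins]

if __name__ == "__main__":
    # Horses 1 and 2 as in the post; horses 3 to 6 are HYPOTHETICAL means.
    means = [57.00, 57.20, 57.40, 57.60, 57.80, 58.00]
    for horse, p in enumerate(simulate_win_probs(means), start=1):
        print(horse, round(p, 3))
```

The estimates sum to 1 by construction, which is a quick check that pairwise head-to-head numbers do not provide.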

...this project of the Gods that men insist on ruining...

I am sorry, but I didn't read your post to the end and missed the solutions you got from simulation. I know now what type of answer you were expecting, and yes, I agree with it. I constructed a simple Monte Carlo simulation model in Excel myself and I arrived at the same results as you did. But my aim is to reach the very same results analytically. I am still working on it. I will let you know as soon as I succeed.

I took a look at the discussion between you and EnumaElish on the Physics forum and couldn't see any clues on how to proceed. The calculation of the integrals of the Gaussian function presents no problem if you use the CDF function built into Excel. What I can't understand is EnumaElish's attempt to integrate the Gaussian function between 0 and t (what limit is this?).

The only thing I have managed so far is the probability of each horse losing to any of the others (confirmed by simulation). I attach a table with these. I am almost certain that the probability of each horse winning the race can be derived from these. What do you think?

Last edited by Rassis; Sep 7th, 2005 at 02:52 AM.

I finally reached an analytical solution. Because it is too lengthy in the case of six horses, let me simplify it to three horses only (the first three).

The solution can be found in the attached Excel file. If you are in the mood to apply the method to the six horses, please feel free! (It will take you quite a while, for sure.)

Thanks for the opportunity to review probability theory.

Last edited by Rassis; Sep 10th, 2005 at 06:07 PM.

First you perform pairwise comparisons and calculate the probability that each horse loses to (takes more time than) each of the others. Call the horses A, B, C, D, E and F for notation purposes and you get:

You do this the way I did in the Excel file attached to my post dated 09-01-05 for horse 1 (A) and will reach the results that I showed in the Excel file attached to my post dated 09-03-05 for all the horses.

Then you calculate the probabilities that each horse is the absolute 1st, 2nd, 3rd, 4th, 5th and 6th. For instance, the probability that horse 1 (A) is second is equal to the probability that it takes more time than horse 2 (B) and less time than any one of the others or that it takes more time than horse 3 (C) and less time than any one of the others and so forth.

Finding the probability that horse 1 (A) is the absolute third is far more complicated. While in the previous case you have 5 terms in the equation, now you have 24 terms.

It would be too monotonous and boring to write down all the equations and perform all the calculations. That is why I decided to simplify the case to three horses only.
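As a cross-check for the three-horse case, here is a hedged Python sketch. It is not the spreadsheet method described above; it uses a standard alternative, namely integrating the density of one horse's time multiplied by the probability that every other horse is slower, P(i wins) = integral of f_i(t) x product over j of (1 - F_j(t)) dt, evaluated with the trapezoidal rule. The third mean (57.40) is hypothetical, and the function names are made up for this example.

```python
import math

def pdf(t, mu, sd):
    """Normal density."""
    z = (t - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def cdf(t, mu, sd):
    """Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sd * math.sqrt(2.0))))

def win_probability(i, means, sd=0.5, steps=4000):
    """Trapezoidal estimate of P(horse i has the lowest time)."""
    lo = min(means) - 8.0 * sd      # tails beyond 8 sd are negligible
    hi = max(means) + 8.0 * sd
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        t = lo + k * h
        value = pdf(t, means[i], sd)
        for j, mu in enumerate(means):
            if j != i:
                value *= 1.0 - cdf(t, mu, sd)   # horse j is slower than t
        total += value * (0.5 if k in (0, steps) else 1.0)
    return total * h

if __name__ == "__main__":
    means = [57.00, 57.20, 57.40]   # third mean is a HYPOTHETICAL value
    probs = [win_probability(i, means) for i in range(len(means))]
    print(probs, sum(probs))
```

With only two horses the integral reduces to the pairwise probability, which gives a simple way to validate the sketch.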

So you are in effect saying that all the possible orderings have to be considered separately (n! cases) and the integrations (exp x erf) performed for each.
That looks likely.
But where is the formula in those xls files? Is there somewhere else I have to look?

I've been using an approximation like

p = C . exp ( - a . (T - Tbest) ^ n ) --------------------------------- (1)

where a and n are experimental constants and C is a normalization constant.
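A small Python sketch of approximation (1), assuming T is each horse's mean time, Tbest is the smallest of those means, and C is chosen so that the probabilities sum to 1. The values of a and n used below are arbitrary placeholders, not fitted constants, and the third mean is hypothetical.

```python
import math

def approx_win_probs(times, a, n):
    """Approximation (1): p = C * exp(-a * (T - Tbest) ** n), normalized."""
    t_best = min(times)
    raw = [math.exp(-a * (t - t_best) ** n) for t in times]
    c = 1.0 / sum(raw)              # normalization constant C
    return [c * r for r in raw]

if __name__ == "__main__":
    # a and n are PLACEHOLDER values chosen only to make the example run.
    print(approx_win_probs([57.00, 57.20, 57.40], a=3.0, n=1.5))
```

In practice a and n would have to be fitted, for example against simulation results.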

Maybe a somewhat better approximation can be obtained by fixing 3 parameters, which might take more time:

First sort the means in ascending order, then do:

P(i) = (1 - P(i-1) - P(i-2) - ... - P(1)) x PRODUCT OF ( 1 - a x exp (-b . SUM OF (t(i) - t(j) ) ^ c ) ^ d

Here the product runs from i + 1 to n, and the sum also runs from i + 1 to n.
There appear to be four constants here (a, b, c, d), but one is eliminated by the condition that if all the t's under the sum are equal then PRODUCT = 1 / (n - i).

Maybe the improvement by taking it further than eq. (1) is marginal.

Finally, highbeam.com (a subscription site) may have something on this.

I didn't perform the integrations directly; I used the Excel function NORMDIST instead, as I described in my two posts dated 09-01-05. I used this function in cells J6, J7, J9, J11, J13 and J14 in the Excel file named Horse race_7, which I attached yesterday.

With regard to the numerical method that you describe, I can see no reason to go that way, unless it takes less time. It is an approximation, and you have to find the values of all those experimental constants, so I wouldn't do it. There are instances when numerical methods are superior and practically unavoidable, but this doesn't look like one of them to me.