Least squares line of regression


th_05
Apr 16th, 2010, 08:55 AM
Hi folks, not sure I'm in the right topic but here goes. I'm massively stuck on working out this least squares line of regression question, and any help will be greatly appreciated. (I'm using E as my summation symbol as I have no idea how to type the correct one, and ^2 means squared.)



EX=63 EY=2340 EX^2=535 EXY=19792 EY^2=752388

Calculate the least squares line of regression of Y on X in the form Y=a+bX and the correlation coefficient r. Show your working.

I have the following formula for 'b' but it doesn't seem to work and gives me a high figure of something like 35.09. (n = 8, by the way!)

b = (nEXY - EX*EY) / (nEX^2 - (EX)^2)

jemidiah
Apr 17th, 2010, 12:22 AM
It's been a while since I've done stats, but your formula looks correct from this page (http://en.wikipedia.org/wiki/Linear_least_squares#Computation). I'm interpreting the E's as capital sigmas, i.e. summation symbols from i = 1 to n, instead of as expectation values. I also get your ~35.09 value for b. Intuitively, X doesn't seem to be varying nearly as much as Y: sqrt(nEX^2 - (EX)^2) is only about 17.64 while sqrt(nEY^2 - (EY)^2) is about 737.2 (these are n times the standard deviations of X and Y), a factor of ~42 apart. That ratio is an upper bound on the size of the slope, because b equals the correlation coefficient r times the ratio and |r| <= 1. If the regression fits quite well, you would expect the slope to be near this value.
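As a rough check on the numbers in this post, here is a minimal Python sketch that works only from the sums in the question. The variable names (sum_x, sxx and so on) are my own labels, not anything from the textbook.

```python
import math

# Summary statistics from the question (E read as a summation sign).
n = 8
sum_x, sum_y = 63, 2340
sum_x2, sum_xy, sum_y2 = 535, 19792, 752388

# Building blocks of the slope formula.
sxx = n * sum_x2 - sum_x ** 2      # n*EX^2 - (EX)^2
sxy = n * sum_xy - sum_x * sum_y   # n*EXY - EX*EY
syy = n * sum_y2 - sum_y ** 2      # n*EY^2 - (EY)^2

# Slope of the regression of Y on X.
b = sxy / sxx

# Spread of X and Y on the same scale (n times the standard deviation),
# and their ratio, which bounds the size of the slope.
spread_x = math.sqrt(sxx)
spread_y = math.sqrt(syy)
ratio = spread_y / spread_x

print(f"b = {b:.4f}")
print(f"spread of X = {spread_x:.2f}, spread of Y = {spread_y:.1f}")
print(f"ratio = {ratio:.2f}")
```

The slope comes out a little below the ratio, which is what you would expect when the fit is decent but not perfect.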

Do you have a reason to distrust a slope of ~35.09?


The Wikipedia page lists another similar formula for the y-intercept a. As for the correlation coefficient, the usual (Pearson) formula needs only the sums you were given, r = (nEXY - EX*EY) / sqrt((nEX^2 - (EX)^2) * (nEY^2 - (EY)^2)), and it is probably listed in your book.
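For the correlation coefficient, one common choice is Pearson's product-moment r, which can be computed from the same sums. The sketch below assumes that is the r the question wants; the variable names are mine.

```python
import math

# Summary statistics from the question.
n = 8
sum_x, sum_y = 63, 2340
sum_x2, sum_xy, sum_y2 = 535, 19792, 752388

# Pearson's r from raw sums:
# r = (n*EXY - EX*EY) / sqrt((n*EX^2 - (EX)^2) * (n*EY^2 - (EY)^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator

print(f"r = {r:.3f}")
```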

th_05
Apr 18th, 2010, 07:02 AM
Thanks for that. The reason I don't trust that answer is that when I use the formula for a:

a = EY/n - bEX/2

Then I get 2340/8 - 35.1 (b) * 63/2

292.5 - 1105.65 = -813.15

This surely can't be correct for a??

jemidiah
Apr 18th, 2010, 04:27 PM
I believe your formula for a is wrong. The Wikipedia page lists, using your notation,

a = EY/n - bEX/n

Using this with the unrounded b, I get a ~= 16.09 (or ~16.17 if b is first rounded to 35.09), which seems fine.
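Here is a short Python sketch of the intercept calculation, again using only the sums from the question. It also shows how much the answer moves if b is rounded before being plugged in; the names a_exact and a_rounded are just labels for this example.

```python
# Summary statistics from the question.
n = 8
sum_x, sum_y = 63, 2340
sum_x2, sum_xy = 535, 19792

# Slope, kept at full precision.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept: a = EY/n - b*EX/n (mean of Y minus b times mean of X).
a_exact = sum_y / n - b * sum_x / n

# Same formula, but with b rounded to 35.09 first.
a_rounded = sum_y / n - 35.09 * sum_x / n

print(f"a with unrounded b   = {a_exact:.2f}")
print(f"a with b set to 35.09 = {a_rounded:.2f}")
```

Carrying b at full precision until the last step avoids the small discrepancy between the two values.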

th_05
Apr 18th, 2010, 05:37 PM
A-ha! That will be where I am going wrong. Thanks very much, that's fantastic.

th_05
Apr 19th, 2010, 10:11 AM
I guess I really should have paid attention in regression classes. :D I have one more question on the topic. Following the calculation, I am asked:

'Interpret the value of b in the context of this situation and explain why it is not sensible to interpret a?'

Any help would be great, thanks.

jemidiah
Apr 19th, 2010, 06:21 PM
I can make guesses, but they're only guesses.

You should probably find that the correlation coefficient is reasonably close to 1 (about 0.84 from your sums). That means that, as x changes, y changes, and that the relationship is fairly linear: if x changes by some amount dx, y will change by some amount dy, where dy is proportional to dx. The constant of proportionality is just b, since dy/dx = b. So b is fairly well determined, and you can be reasonably confident that your two variables, x and y, are in fact related in this way, through b.

It's not terribly sensible to interpret a, the predicted y at x = 0, since it has a high error margin. The slope is large compared to the computed value of a, and the x values are not centered around 0 (their mean is 63/8, about 7.9). If the slope were just slightly off, your a value would be wildly different.
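To put a number on that sensitivity, here is a small Python sketch. It takes the means from the sums in the question and recomputes a for a few trial slopes; the trial values 34.1, 35.1 and 36.1 and the helper name intercept are made up purely for illustration.

```python
# Means of X and Y from the sums in the question.
n = 8
sum_x, sum_y = 63, 2340
x_bar = sum_x / n   # 7.875
y_bar = sum_y / n   # 292.5


def intercept(b):
    """Intercept of the line through (x_bar, y_bar) with slope b."""
    return y_bar - b * x_bar


# Hypothetical slopes one unit either side of roughly the fitted value.
for trial_b in (34.1, 35.1, 36.1):
    print(f"b = {trial_b:.1f}  ->  a = {intercept(trial_b):.2f}")
```

Each change of 1 in b moves a by the mean of x, about 7.9, which is roughly half the size of a itself.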

th_05
Apr 20th, 2010, 06:38 AM
I think it's finally gone in and I understand it. :) Thanks very much for all your help, it has been priceless!