Testing the Significance of the Correlation Coefficient

The correlation coefficient, r, tells us about the strength and direction of the linear relationship between x and y. However, the reliability of the linear model also depends on how many observed data points are in the sample. We need to look at both the value of the correlation coefficient r and the sample size n, together.

We perform a hypothesis test of the “significance of the correlation coefficient” to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.

The sample data are used to compute r, the correlation coefficient for the sample. If we had data for the entire population, we could find the population correlation coefficient. But because we have only sample data, we cannot calculate the population correlation coefficient. The sample correlation coefficient, r, is our estimate of the unknown population correlation coefficient.

- The symbol for the population correlation coefficient is ρ, the Greek letter “rho.”
- ρ = population correlation coefficient (unknown).
- r = sample correlation coefficient (known; calculated from sample data).

The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is “close to zero” or “significantly different from zero”. We decide this based on the sample correlation coefficient r and the sample size n. If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is “significant.”

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are satisfied. The premise of this test is that the data are a sample of observed points taken from a larger population. We have not examined the entire population because it is not possible or feasible to do so. We are examining the sample to draw a conclusion about whether the linear relationship that we see between x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best-fit line for our particular sample. We want to use this best-fit line for the sample as an estimate of the best-fit line for the population. Examining the scatterplot and testing the significance of the correlation coefficient helps us determine if it is appropriate to do this.

The assumptions underlying the test of significance are:

1. There is a linear relationship in the population that models the average value of y for varying values of x. In other words, the expected value of y for each particular x value lies on a straight line in the population. (We do not know the equation for the line for the population. Our regression line from the sample is our best estimate of this line in the population.)
2. The y values for any particular x value are normally distributed about the line. This implies that there are more y values scattered closer to the line than are scattered farther away. Assumption (1) implies that these normal distributions are centered on the line: the means of these normal distributions of y values lie on the line.
3. The standard deviations of the population y values about the line are equal for each value of x. In other words, each of these normal distributions of y values has the same shape and spread about the line.
4. The residual errors are mutually independent (no pattern).
5. The data are produced from a well-designed, random sample or randomized experiment.

Linear regression is a procedure for fitting a straight line of the form ŷ = a + bx to data. In summary, the conditions are:

- Linear: In the population, there is a linear relationship that models the average value of y for different values of x.
- Independent: The residuals are assumed to be independent.
- Normal: The y values are distributed normally for any value of x.
- Equal variance: The standard deviation of the y values is equal for each x value.
- Random: The data are produced from a well-designed random sample or randomized experiment.

The slope b and intercept a of the least-squares line estimate the slope β and intercept α of the population (true) regression line. To estimate the population standard deviation of y, σ, use the standard deviation of the residuals, s.
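To make the test concrete, here is a minimal Python sketch of one common way to carry it out. It relies on the standard test statistic t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom, which this page does not state and which holds only under the assumptions of the test. The function names and the small data set are hypothetical, chosen purely for illustration. The sketch also computes the least-squares estimates a and b and the residual standard deviation s.

```python
import math


def _sums(x, y):
    """Return n, the means, and the sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return n, mx, my, sxx, syy, sxy


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=10000):
    """Two-sided p-value: 1 - 2 * (area under the t density from 0 to |t|).

    The area is approximated with Simpson's rule; steps must be even.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


def correlation_test(x, y):
    """Return (r, t, p) for the null hypothesis that rho equals 0."""
    n, _, _, sxx, syy, sxy = _sums(x, y)
    if n <= 2:
        raise ValueError("need more than two data points")
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1:
        raise ValueError("t is undefined when the points lie exactly on a line")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return r, t, two_sided_p(t, df)


def least_squares(x, y):
    """Return (a, b, s): intercept, slope, and residual standard deviation."""
    n, mx, my, sxx, _, sxy = _sums(x, y)
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # n - 2 because a and b are both estimated
    return a, b, s


# Hypothetical data, for illustration only.
x_data = [1, 2, 3, 4, 5]
y_data = [2, 4, 5, 4, 5]

r, t, p = correlation_test(x_data, y_data)
a, b, s = least_squares(x_data, y_data)
print(f"r = {r:.4f}, t = {t:.4f}, p = {p:.4f}")
print(f"y-hat = {a:.4f} + {b:.4f}x, s = {s:.4f}")
```

If the p-value is below the chosen significance level (0.05 is a common choice), we would call the correlation coefficient “significant”. With very few data points, even a fairly large r may not reach significance, which is why r and n must be considered together.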