Can anyone help me understand the Pearson correlation formula? The sample $r$ is the mean of the products of the standard scores of the variables $X$ and $Y$.
I sort of understand why they need to standardize $X$ and $Y$, but how should I understand the product of the two z-scores?
This formula is also called the "product-moment correlation coefficient", but what is the rationale for taking the product? I'm not sure my question is clear, but I just want to be able to remember the formula intuitively.
correlation
descriptive-statistics
pearson-r
Aaron Lu
Answer:
In the comments, 15 ways of understanding the correlation coefficient were suggested:
The 13 ways discussed in the article by Rodgers and Nicewander (The American Statistician, February 1988) are
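A Function of Raw Scores and Means,
$$r = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum (X_i - \bar X)^2 \sum (Y_i - \bar Y)^2}}$$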
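A minimal Python sketch of the raw-score formula, useful as a numerical check. The function name `pearson_raw` and the five-point data set are made up for illustration; for these data $r = \sqrt{0.6} \approx 0.7746$.

```python
import math

def pearson_raw(xs, ys):
    """Pearson r from raw scores and their means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(pearson_raw(xs, ys))  # about 0.7746, i.e. sqrt(0.6) for these data
```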
Standardized Covariance,
$$r = \frac{s_{XY}}{s_X s_Y}$$
where $s_{XY}$ is the sample covariance and $s_X$ and $s_Y$ are the sample standard deviations.
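The same check written as a standardized covariance. This sketch (helper names `sample_cov` and `pearson_from_cov` are hypothetical, data made up) uses the $n - 1$ denominator throughout; because that factor appears in both numerator and denominator, it cancels and any consistent choice gives the same $r$.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson_from_cov(xs, ys):
    s_xy = sample_cov(xs, ys)
    s_x = math.sqrt(sample_cov(xs, xs))  # sample standard deviation of X
    s_y = math.sqrt(sample_cov(ys, ys))  # sample standard deviation of Y
    return s_xy / (s_x * s_y)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(pearson_from_cov(xs, ys))
```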
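Standardized Slope of the Regression Line,
$$r = b_{Y\cdot X}\,\frac{s_X}{s_Y} = b_{X\cdot Y}\,\frac{s_Y}{s_X}$$
where $b_{Y\cdot X}$ and $b_{X\cdot Y}$ are the slopes of the regression lines (of $Y$ on $X$ and of $X$ on $Y$, respectively).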
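The Geometric Mean of the Two Regression Slopes,
$$r = \pm\sqrt{b_{Y\cdot X}\, b_{X\cdot Y}}$$
where the sign is the common sign of the two slopes.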
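Both slope-based formulas can be checked with one short sketch on made-up data. It computes the two least-squares slopes from sums of squares and cross-products (the $n - 1$ factors in the standard deviations cancel, so sums of squares are used directly); variable names are illustrative.

```python
import math

def deviations(vs):
    m = sum(vs) / len(vs)
    return [v - m for v in vs]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
dx, dy = deviations(xs), deviations(ys)
s_xy = sum(a * b for a, b in zip(dx, dy))
s_xx = sum(a * a for a in dx)
s_yy = sum(b * b for b in dy)

b_yx = s_xy / s_xx  # least-squares slope of Y regressed on X
b_xy = s_xy / s_yy  # least-squares slope of X regressed on Y

# Standardized slope: rescale either slope by the ratio of standard deviations
r_from_b_yx = b_yx * math.sqrt(s_xx / s_yy)
r_from_b_xy = b_xy * math.sqrt(s_yy / s_xx)

# Geometric mean of the two slopes, carrying their common sign
r_geo = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(b_yx, b_xy, r_from_b_yx, r_from_b_xy, r_geo)
```

For these data the two slopes are 0.6 and 1.0, and all three estimates agree at about 0.7746.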
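The Square Root of the Ratio of Two Variances (Proportion of Variability Accounted For),
$$r = \sqrt{\frac{SS_{\text{REG}}}{SS_{\text{TOT}}}} = \sqrt{1 - \frac{SS_{\text{RES}}}{SS_{\text{TOT}}}}$$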
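A sketch of the variance-ratio form on made-up data: fit the least-squares line of $Y$ on $X$, split the total sum of squares into regression and residual parts, and take the square root. The square root only yields $|r|$, so this sketch attaches the sign of the fitted slope.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Ordinary least-squares fit of Y on X
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * x for x in xs]

ss_tot = sum((y - mean_y) ** 2 for y in ys)
ss_reg = sum((f - mean_y) ** 2 for f in fitted)
ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))

# The square root gives |r|; the sign is taken from the slope
r_reg = math.copysign(math.sqrt(ss_reg / ss_tot), slope)
r_res = math.copysign(math.sqrt(1 - ss_res / ss_tot), slope)
print(r_reg, r_res)
```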
The Mean Cross-Product of Standardized Variables,
$$r = \frac{\sum z_X z_Y}{N}$$
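This is the formula the question asks about, so a sketch may help with the intuition. Each product $z_X z_Y$ is positive for a point lying above-right or below-left of the point of means, negative for a point above-left or below-right, and large when the point is far from the means in both directions; averaging the products measures which pattern dominates. The sketch standardizes with the divide-by-$N$ standard deviation (`statistics.pstdev`), which matches dividing the sum by $N$; if the z-scores used the $n - 1$ standard deviation, the sum would be divided by $n - 1$ instead. Data are made up.

```python
import statistics

def z_scores(vs):
    """Standardize with the population (divide-by-N) standard deviation."""
    m = statistics.fmean(vs)
    sd = statistics.pstdev(vs)
    return [(v - m) / sd for v in vs]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
zx, zy = z_scores(xs), z_scores(ys)

# Positive products: point is on the same side of both means.
# Negative products: point is above one mean and below the other.
products = [a * b for a, b in zip(zx, zy)]
r = statistics.fmean(products)
print([round(p, 3) for p in products])
print(r)
```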
A Function of the Angle Between the Two Standardized Regression Lines. The two regression lines (of $Y$ vs. $X$ and $X$ vs. $Y$) are symmetric about the diagonal. Let the angle between the two lines be $\beta$. Then
$$r = \sec(\beta) \pm \tan(\beta)$$
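A sketch of the angle identity on made-up data. In standardized coordinates both least-squares slopes equal $r$, so when both lines are drawn in the same $(z_X, z_Y)$ plane they have slopes $r$ and $1/r$. This sketch takes $\beta$ as the nonnegative angle between them and assumes $r > 0$, in which case the minus sign applies: $r = \sec\beta - \tan\beta$. It breaks down at $r = 0$, where the lines are perpendicular.

```python
import math
import statistics

def z_scores(vs):
    m, sd = statistics.fmean(vs), statistics.pstdev(vs)
    return [(v - m) / sd for v in vs]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
zx, zy = z_scores(xs), z_scores(ys)
cross = sum(a * b for a, b in zip(zx, zy))

# In standardized coordinates both least-squares slopes equal r
slope_y_on_x = cross / sum(a * a for a in zx)
slope_x_on_y = cross / sum(b * b for b in zy)

# Drawn in the (z_X, z_Y) plane, the X-on-Y line has slope 1 / slope_x_on_y
beta = abs(math.atan(1 / slope_x_on_y) - math.atan(slope_y_on_x))

# Assumes r > 0; for r < 0 this expression returns |r| instead
r_from_angle = 1 / math.cos(beta) - math.tan(beta)
print(math.degrees(beta), r_from_angle)
```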
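A Function of the Angle Between the Two Variable Vectors,
$$r = \cos(\alpha)$$
where $\alpha$ is the angle between the mean-centered vectors of $X$ values and $Y$ values.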
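A sketch of the vector view on made-up data: center each variable, treat the $N$ centered values as a vector in $N$-dimensional space, and take the cosine of the angle between the two vectors. The helper names are illustrative; the clamp guards against rounding pushing the cosine slightly outside $[-1, 1]$.

```python
import math

def centered(vs):
    m = sum(vs) / len(vs)
    return [v - m for v in vs]

def vector_angle(u, v):
    """Angle in radians between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
alpha = vector_angle(centered(xs), centered(ys))
print(math.degrees(alpha), math.cos(alpha))
```

Centering matters: the cosine of the angle between the raw, uncentered vectors is generally not $r$.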
A Rescaled Variance of the Difference Between Standardized Scores. Letting $z_Y - z_X$ be the difference between the standardized $Y$ and $X$ variables for each observation,
$$r = 1 - \frac{s^2_{(z_Y - z_X)}}{2}$$
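The reasoning behind this one is a one-line expansion: $\operatorname{Var}(z_Y - z_X) = 1 + 1 - 2r$, so $r = 1 - \operatorname{Var}/2$. A sketch on made-up data; it uses the divide-by-$N$ variance (`statistics.pvariance`) to match the divide-by-$N$ standard deviation used for the z-scores.

```python
import statistics

def z_scores(vs):
    m, sd = statistics.fmean(vs), statistics.pstdev(vs)
    return [(v - m) / sd for v in vs]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
diff = [b - a for a, b in zip(z_scores(xs), z_scores(ys))]

# Var(z_Y - z_X) = 1 + 1 - 2r, hence r = 1 - Var / 2.
# pvariance matches the divide-by-N convention used in z_scores.
r = 1 - statistics.pvariance(diff) / 2
print(r)
```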
Estimated from the "Balloon" Rule,
$$r \approx \sqrt{1 - \left(\frac{h}{H}\right)^2}$$
where $H$ is the vertical range of the entire $X$-$Y$ scatterplot and $h$ is the range through the "center of the distribution on the $X$ axis" (that is, through the point of means).
In Relation to the Bivariate Ellipses of Isoconcentration,
$$r = \frac{D^2 - d^2}{D^2 + d^2}$$
where $D$ and $d$ are the major and minor axis lengths, respectively. $r$ also equals the slope of the tangent line of an isocontour (in standardized coordinates) at the point where the contour crosses the vertical axis.
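A sketch checking both ellipse facts, assuming standardized bivariate normal variables so that the covariance matrix is $\begin{pmatrix}1 & r\\ r & 1\end{pmatrix}$ and every contour has the form $z_X^2 - 2r z_X z_Y + z_Y^2 = c$. The axes of a contour are proportional to the square roots of the eigenvalues $1 \pm |r|$, so the axis formula recovers $|r|$ (the sign comes from whether the major axis runs along $z_Y = z_X$ or $z_Y = -z_X$). The value of $r$ is plugged in from a made-up example.

```python
import math

def eigenvalues_2x2_symmetric(a, b, c):
    """Eigenvalues of [[a, b], [b, c]], largest first."""
    mid = (a + c) / 2
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return mid + half_gap, mid - half_gap

r = 0.6 ** 0.5  # example correlation (made up)

# Covariance matrix of standardized variables is [[1, r], [r, 1]]
lam_major, lam_minor = eigenvalues_2x2_symmetric(1.0, r, 1.0)

# Axis lengths are proportional to sqrt(eigenvalue), so D**2 and d**2
# are proportional to lam_major and lam_minor; the constant cancels.
D2, d2 = lam_major, lam_minor
r_from_axes = (D2 - d2) / (D2 + d2)  # equals |r|

# Implicit differentiation of z_x**2 - 2*r*z_x*z_y + z_y**2 = c gives
# dz_y/dz_x = (r*z_y - z_x) / (z_y - r*z_x); evaluate where z_x = 0.
z_x, z_y = 0.0, 1.0  # a point on the contour with c = 1
tangent_slope = (r * z_y - z_x) / (z_y - r * z_x)
print(r_from_axes, tangent_slope)
```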
A Function of Test Statistics from Designed Experiments,
$$r = \frac{t}{\sqrt{t^2 + n - 2}}$$
where $t$ is the test statistic in a two-independent-sample $t$ test for a designed experiment with two treatment conditions (coded as $X = 0, 1$) and $n$ is the combined total number of observations in the two treatment groups.
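A sketch comparing the two routes on made-up data: compute the pooled-variance two-sample $t$ (group 1 minus group 0), convert it with the formula above, and compare with the Pearson $r$ between the 0/1 treatment code and the response. The group values and helper names are hypothetical.

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pooled_t(group0, group1):
    """Two-independent-sample t with pooled variance (group1 minus group0)."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss0 = sum((y - m0) ** 2 for y in group0)
    ss1 = sum((y - m1) ** 2 for y in group1)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))

group0 = [2.0, 3.0, 4.0, 3.5]  # hypothetical responses under X = 0
group1 = [5.0, 6.0, 8.0]       # hypothetical responses under X = 1
t = pooled_t(group0, group1)
n = len(group0) + len(group1)
r_from_t = t / math.sqrt(t ** 2 + n - 2)

# Direct Pearson r between the 0/1 treatment code and the response
codes = [0] * len(group0) + [1] * len(group1)
r_direct = pearson(codes, group0 + group1)
print(r_from_t, r_direct)
```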
The Ratio of Two Means. Assume bivariate normality and standardize the variables. Select some arbitrarily large value $X_c$ of $X$. Then
$$r = \frac{E(Y \mid X > X_c)}{E(X \mid X > X_c)}$$
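A Monte Carlo sketch of the ratio-of-means form. It simulates standardized bivariate normal pairs with a chosen correlation (here a hypothetical $\rho = 0.6$), keeps the draws with $X > X_c$, and divides the two conditional means. Under bivariate normality $E(Y \mid X) = \rho X$ exactly, so the ratio equals $\rho$ in expectation for any cutoff; this sketch uses $X_c = 1$ so that enough draws survive.

```python
import math
import random

random.seed(12345)
rho = 0.6        # hypothetical population correlation
x_c = 1.0        # cutoff on X
n_draws = 200_000

sum_x = sum_y = 0.0
count = 0
for _ in range(n_draws):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    x = z1
    y = rho * z1 + math.sqrt(1 - rho ** 2) * z2  # standard normal, corr(x, y) = rho
    if x > x_c:
        sum_x += x
        sum_y += y
        count += 1

ratio = (sum_y / count) / (sum_x / count)
print(count, ratio)  # ratio should be close to rho
```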
(Most of this is verbatim, with very slight changes in some of the notation.)
Some other methods (perhaps original to this site) are
Via circles. $r$ is the slope of the regression line in standardized coordinates. This line can be characterized in various ways, including geometric ones, such as minimizing the total area of circles drawn between the line and the data points in a scatterplot.
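A sketch of the circle idea under one assumed construction (not necessarily the one in the original post): each data point gets a circle whose radius is its vertical distance to a candidate line, so its area is $\pi \times$ (residual)$^2$ and minimizing total area is ordinary least squares. In standardized coordinates the least-squares line passes through the origin (the point of means), so only the slope is searched, with a crude grid over $[-1, 1]$. Data are made up.

```python
import math
import statistics

def z_scores(vs):
    m, sd = statistics.fmean(vs), statistics.pstdev(vs)
    return [(v - m) / sd for v in vs]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
zx, zy = z_scores(xs), z_scores(ys)

def total_circle_area(slope):
    """Assumed construction: radius = vertical distance from each point
    to the line z_y = slope * z_x."""
    return sum(math.pi * (b - slope * a) ** 2 for a, b in zip(zx, zy))

# Grid search over slopes; a standardized least-squares slope lies in [-1, 1]
candidates = [k / 10_000 for k in range(-10_000, 10_001)]
best_slope = min(candidates, key=total_circle_area)

r = statistics.fmean([a * b for a, b in zip(zx, zy)])
print(best_slope, r)
```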
By coloring rectangles. Covariance can be assessed by coloring rectangles in a scatterplot (that is, by summing the signed areas of rectangles). When the scatterplot is standardized, the net amount of color (the total signed error) is $r$.
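A sketch of the rectangle idea on made-up data: every pair of points spans a rectangle whose signed area $(x_i - x_j)(y_i - y_j)$ is positive when the pair rises from left to right and negative when it falls. Summed over all pairs, the signed area equals $N \sum (x_i - \bar x)(y_i - \bar y)$, so for divide-by-$N$ standardized data it equals $N^2 r$. The $N^2$ normalization here is one choice that makes the identity exact; the original post may scale the picture differently.

```python
import itertools
import statistics

def z_scores(vs):
    m, sd = statistics.fmean(vs), statistics.pstdev(vs)
    return [(v - m) / sd for v in vs]

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
points = list(zip(z_scores(xs), z_scores(ys)))
n = len(points)

# Signed area of the rectangle spanned by each pair of points
signed_area = sum((x1 - x2) * (y1 - y2)
                  for (x1, y1), (x2, y2) in itertools.combinations(points, 2))

# Summed over all pairs: signed area = n * sum(z_x * z_y) = n**2 * r
r = signed_area / n ** 2
print(r)
```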