I have some triangulated 3D meshes. The statistics for the triangle areas are:
- Min 0.000
- Max 2341.141
- Mean 56.317
- Std dev 98.720
So, does a standard deviation like this tell me anything particularly useful, or does it suggest there is a bug in how it was calculated, when the figures work out like the above? The areas are certainly far from normally distributed.
And as someone mentioned in one of the responses below, what really surprised me was that it takes only one SD below the mean for the numbers to go negative and thus out of the legal domain.
Thanks
distributions
mean
standard-deviation
Andy Dent
Answers:
Nothing says that the standard deviation has to be smaller or larger than the mean. Given a set of data, you can keep the mean the same but change the standard deviation to an arbitrary degree by adding to and subtracting from the values appropriately.
Using @whuber's example dataset from his comment on the question: {2, 2, 2, 202}. As @whuber stated, the mean is 52 and the standard deviation is 100.
Now perturb each element of the data as follows: {22, 22, 22, 142}. The mean is still 52, but the standard deviation is now 60.
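The two figures above can be checked directly. The following is a minimal Python sketch (the thread's own snippets are in R) that assumes the sample standard deviation, dividing by n - 1, is the one intended; the helper name `summarize` is made up for this illustration.

```python
from statistics import mean, stdev

original = [2, 2, 2, 202]
# Add 20 to each of the three small values and subtract 60 from the large one,
# so the total (and therefore the mean) is unchanged.
perturbed = [22, 22, 22, 142]

def summarize(data):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return mean(data), stdev(data)

print("original :", summarize(original))
print("perturbed:", summarize(perturbed))
```

Both datasets sum to 208, so only the spread changes between them.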
Of course, these are independent parameters. You can set up simple explorations in R (or another tool you may prefer).
Similarly, you standardize the data you are looking at by subtracting the mean and dividing by the standard deviation.
Edit: Following @whuber's idea, here is one of an infinity of data sets that come close to your four measurements:
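As a rough illustration of that independence, the sketch below standardizes a sample and then maps it to any requested mean and standard deviation. It is written in Python rather than R, and the function name `rescale`, the seed, and the target values are arbitrary choices made for this example.

```python
import random
from statistics import mean, stdev

def rescale(data, target_mean, target_sd):
    """Standardize data (mean 0, sd 1), then shift and scale to the requested mean and sd."""
    m, s = mean(data), stdev(data)
    return [target_mean + target_sd * (x - m) / s for x in data]

random.seed(0)  # arbitrary seed, only for reproducibility
raw = [random.gauss(0, 1) for _ in range(1000)]

small_spread = rescale(raw, 50, 5)    # sd far below the mean
large_spread = rescale(raw, 50, 500)  # sd far above the mean

for name, d in (("small", small_spread), ("large", large_spread)):
    print(f"{name}: mean = {mean(d):.3f}, sd = {stdev(d):.3f}")
```

Note that the second sample necessarily contains negative values; the point is only that the mean places no constraint on the standard deviation.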
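One hypothetical way to build such a data set (not necessarily the construction the answer used) is to take a single 0, a single maximum, and fill the rest with one constant chosen so that the mean is exact. The Python sketch below then scans candidate sample sizes for the one whose sample standard deviation lands closest to the reported value; the names and the search range are assumptions of this example.

```python
from statistics import mean, stdev

TARGET_MIN, TARGET_MAX = 0.0, 2341.141
TARGET_MEAN, TARGET_SD = 56.317, 98.720

def build_dataset(n):
    """One minimum, one maximum, and n - 2 identical filler values that make the mean exact."""
    filler = (TARGET_MEAN * n - TARGET_MIN - TARGET_MAX) / (n - 2)
    return [TARGET_MIN, TARGET_MAX] + [filler] * (n - 2)

# Scan a few hundred sample sizes and keep the closest match on the standard deviation.
best = min(
    (build_dataset(n) for n in range(300, 801)),
    key=lambda d: abs(stdev(d) - TARGET_SD),
)

print(f"n = {len(best)}, min = {min(best)}, max = {max(best)}")
print(f"mean = {mean(best):.3f}, sd = {stdev(best):.3f}")
```

Almost all of the spread in this artificial data set comes from the single large value, which is the same mechanism as in the {2, 2, 2, 202} example.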
I am not sure why @Andy is surprised at this result, but I know he is not alone. Nor am I sure what the normality of the data has to do with the fact that the sd is higher than the mean. It is quite simple to generate a data set that is normally distributed where this is the case; indeed, the standard normal has a mean of 0 and an sd of 1. It would be hard to get a normally distributed data set of all positive values with sd > mean; indeed, it ought not to be possible (but it depends on the sample size and which test of normality you use; with a very small sample, odd things happen).
However, once you remove the stipulation of normality, as @Andy did, there is no reason why the sd should be larger or smaller than the mean, even for all positive values. A single outlier will do this, e.g.
x <- runif(100, 1, 200)  # 100 uniform draws between 1 and 200
x <- c(x, 2000)          # append a single large outlier
mean(x)
sd(x)
gives mean of 113 and sd of 198 (depending on seed, of course).
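For readers without R, a rough Python equivalent of the same experiment is sketched below. The seed is an arbitrary assumption, so the printed figures will differ from the 113 and 198 quoted above while showing the same pattern.

```python
import random
from statistics import mean, stdev

random.seed(1)  # arbitrary; the exact figures change with the seed
x = [random.uniform(1, 200) for _ in range(100)]  # all values are positive
x.append(2000)  # a single large outlier

print(f"min = {min(x):.1f}, mean = {mean(x):.1f}, sd = {stdev(x):.1f}")
```

Every value is positive, yet the one outlier is enough to push the standard deviation above the mean.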
But a bigger question is why this surprises people.
I don't teach statistics, but I wonder what about the way statistics is taught makes this notion common.
Just adding a generic point that, from a calculus perspective,
Perhaps the OP is surprised that the mean - 1 S.D. is a negative number (especially where the minimum is 0).
Here are two examples that may clarify.
Suppose you have a class of 20 first graders, where 18 are 6 years old, 1 is 5, and 1 is 7. Now add in the 49-year-old teacher. The average age is about 8.0, while the sample standard deviation is about 9.39.
You might be thinking: the one-standard-deviation range for this class runs from about -1.3 to 17.4 years. You might be surprised that it includes negative ages, which seems unreasonable.
You don't have to worry about the negative age (or about the interval for the triangle areas extending below the minimum of 0.0). The interval is only a symmetric summary of spread around the mean, not a claim that such values occur. Most of the data still lies within 1 S.D. of the mean: here 20 of the 21 ages, about 95%, do, and only the teacher lies outside, even at 2 S.D.
When the data takes on a non-normal distribution, you will see surprising results like this.
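The classroom example can be recomputed directly from the 21 ages as listed. This Python sketch assumes the sample (n - 1) formula; rounding or the population formula gives slightly different figures.

```python
from statistics import mean, stdev

ages = [6] * 18 + [5, 7, 49]  # 20 first graders plus the teacher

m = mean(ages)
s = stdev(ages)  # sample standard deviation (divides by n - 1)

# Count how many ages fall within one and two standard deviations of the mean.
within_1sd = sum(m - s <= a <= m + s for a in ages)
within_2sd = sum(m - 2 * s <= a <= m + 2 * s for a in ages)

print(f"mean = {m:.3f}, sd = {s:.3f}")
print(f"1 S.D. interval: {m - s:.3f} to {m + s:.3f}")
print(f"within 1 S.D.: {within_1sd} of {len(ages)}; within 2 S.D.: {within_2sd} of {len(ages)}")
```

The lower end of the interval is negative even though no age is, because a single extreme value inflates the standard deviation symmetrically around the mean.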
Second example. In his book, Fooled by Randomness, Nassim Taleb sets up the thought experiment of a blindfolded archer shooting at a wall of infinite length. The archer can shoot at any angle between +90 degrees and -90 degrees.
Every once in a while, the archer will shoot the arrow parallel to the wall, and it will never hit. Consider the distance by which the arrow misses the target as the distribution of numbers. The standard deviation for this scenario would be infinite.
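A small simulation gives a feel for this. The sketch below assumes the archer stands at unit distance from the wall and picks the angle uniformly, so the signed miss distance is the tangent of the angle; these modelling choices and the function name `miss_distances` are assumptions of this example, not details from the book.

```python
import math
import random
from statistics import median, stdev

def miss_distances(n, seed=0):
    """Signed miss distances for n shots, angle uniform between -90 and +90 degrees."""
    rng = random.Random(seed)
    return [math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(n)]

for n in (100, 10_000, 100_000):
    d = miss_distances(n)
    typical = median(abs(v) for v in d)
    print(f"n = {n:>7}: median |miss| = {typical:.3f}, sample sd = {stdev(d):.1f}")
```

The median miss stays close to 1, while the sample standard deviation is dominated by the rare near-parallel shots and does not settle down as more shots are added.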
A gamma random variable X with density
R
to get a feeling about this. Here are examples with
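As a hedged illustration of the gamma idea (in Python rather than R, and not a reconstruction of the answer's own examples): a gamma distribution with shape (m/s)^2 and scale s^2/m has mean m and standard deviation s, and it only produces positive values. The function name `gamma_sample`, the seed, and the sample size are choices made for this sketch.

```python
import random
from statistics import mean, stdev

def gamma_sample(m, s, n, seed=0):
    """Draw n values from a gamma distribution with mean m and standard deviation s."""
    shape = (m / s) ** 2  # alpha
    scale = s ** 2 / m    # mean = shape * scale, variance = shape * scale**2
    rng = random.Random(seed)
    return [rng.gammavariate(shape, scale) for _ in range(n)]

# One case with sd > mean (the question's figures) and one with sd < mean.
for m, s in ((56.317, 98.720), (100.0, 10.0)):
    x = gamma_sample(m, s, 100_000)
    print(f"target mean {m}, sd {s}: sample mean = {mean(x):.2f}, "
          f"sample sd = {stdev(x):.2f}, min = {min(x):.3g}")
```

Python's `random.gammavariate(alpha, beta)` takes the scale as its second argument, which is why the scale rather than the rate is computed here.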
As pointed out in the other answers, the mean x̄ and standard deviation σ_x are essentially unrelated in that it is not necessary for the standard deviation to be smaller than the mean. However, if the data are nonnegative, taking on values in [0, c], say, then, for large data sets (where the distinction between dividing by n or by n−1 does not matter very much), the following inequality holds:
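The inequality itself is missing from the text at this point. As a hedged stand-in rather than the author's exact statement, one bound that certainly holds for data in [0, c], with the variance computed by dividing by n, is the following (sometimes called the Bhatia–Davis inequality):

```latex
0 \le x_i \le c
\;\Longrightarrow\; x_i^2 \le c\,x_i
\;\Longrightarrow\; \frac{1}{n}\sum_{i=1}^{n} x_i^2 \le c\,\bar{x},
\qquad\text{so}\qquad
\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2
\;\le\; \bar{x}\,(c - \bar{x}) \;\le\; \frac{c^2}{4}.
```

With the question's figures, c = 2341.141 and x̄ = 56.317, this gives σ_x ≤ sqrt(56.317 × 2284.824) ≈ 358.7, so the reported 98.720 is comfortably allowed. More generally, the bound leaves room for σ_x > x̄ whenever x̄ < c/2.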
What you seem to have in mind implicitly is a prediction interval that would bound the occurrence of new observations. The catch is: you must postulate a statistical distribution compliant with the fact that your observations (triangle areas) must remain non-negative. Normal won't help, but log-normal might be just fine. In practical terms, take the log of observed areas, calculate the mean and standard deviation, form a prediction interval using the normal distribution, and finally evaluate the exponential for the lower and upper limits -- the transformed prediction interval won't be symmetric around the mean, and is guaranteed to not go below zero. This is what I think the OP actually had in mind.
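The recipe in the paragraph above can be sketched in a few lines of Python. The area values are invented for illustration, the function name `lognormal_prediction_interval` is made up, and zero areas are dropped because their logarithm is undefined; that last choice is an assumption of this sketch, not something the answer specifies.

```python
import math
from statistics import NormalDist, mean, stdev

def lognormal_prediction_interval(areas, level=0.95):
    """Approximate interval for a new area, assuming log(area) is normally distributed."""
    logs = [math.log(a) for a in areas if a > 0]  # log(0) is undefined, so zeros are skipped
    mu, sigma = mean(logs), stdev(logs)
    z = NormalDist().inv_cdf(0.5 + level / 2)     # about 1.96 for a 95% interval
    # Back-transform the symmetric interval on the log scale.
    return math.exp(mu - z * sigma), math.exp(mu + z * sigma)

areas = [0.0, 3.2, 12.5, 40.1, 56.3, 88.0, 150.7, 2341.1]  # hypothetical triangle areas
low, high = lognormal_prediction_interval(areas)
print(f"95% interval for a new area: {low:.2f} to {high:.2f}")
```

Because the limits are exponentials, the lower limit is always positive and the interval is asymmetric on the original scale. A stricter prediction interval would also account for the uncertainty in the estimated mean and standard deviation, which this sketch ignores, as the paragraph above does.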
Felipe Nievinski points to a real issue here. It makes no sense to talk in normal distribution terms when the distribution is clearly not a normal distribution. All-positive values with a relatively small mean and relatively large standard deviation cannot have a normal distribution. So, the task is to figure out what sort of distribution fits the situation. The original post suggests that a normal distribution (or some such) was clearly in mind. Otherwise negative numbers would not come up. Log normal, Rayleigh, Weibull come to mind ... I don't know but wonder what might be best in a case like this?