Saya ingin membuat kode untuk memplot ACF dan PACF dari data time-series. Sama seperti plot yang dihasilkan ini dari minitab (di bawah).
Saya sudah mencoba mencari formula, tetapi saya masih belum memahaminya dengan baik. Maukah Anda memberi tahu saya formula dan bagaimana menggunakannya? Apa garis merah horizontal pada plot ACF dan PACF di atas? Apa formulanya?
Terima kasih,
correlation
data-visualization
autocorrelation
partial-correlation
Surya Dewangga
sumber
sumber
Jawaban:
Autokorelasi
Korelasi antara dua variabely1,y2 didefinisikan sebagai:
di mana E adalah operator ekspektasi,μ1 dan μ2 adalah sarana masing-masing untuk y1 dan y2 dan σ1,σ2 adalah standar deviasi mereka.
Dalam konteks variabel tunggal, yaitu auto -correlation,y1 adalah seri asli dan y2 adalah versi lagged. Setelah definisi di atas, autocorrelations sampel agar k=0,1,2,... dapat diperoleh dengan menghitung ekspresi berikut dengan diamati seri yt , t=1,2,...,n :
di manay¯ adalah mean sampel dari data.
Autokorelasi parsial
Autokorelasi parsial mengukur ketergantungan linier dari satu variabel setelah menghilangkan efek variabel lain yang mempengaruhi kedua variabel. Sebagai contoh, autokorelasi parsial langkah-langkah agar efek (ketergantungan linear) dariyt−2 pada yt setelah mengeluarkan efek yt−1 pada kedua yt dan yt−2 .
Setiap autokorelasi parsial dapat diperoleh sebagai serangkaian regresi bentuk:
di manay~t adalah seri asli dikurangi mean sampel, yt−y¯ . Estimasi ϕ22 akan memberikan nilai autokorelasi parsial order 2. Memperluas regresi dengan k lag tambahan, estimasi term terakhir akan memberikan autokorelasi parsial order k .
whereρ(⋅) are the sample autocorrelations. This mapping between the sample autocorrelations and the partial autocorrelations is known as the
Durbin-Levinson recursion.
This approach is relatively easy to implement for illustration. For example, in the R software, we can obtain the partial autocorrelation of order 5 as follows:
Confidence bands
Confidence bands can be computed as the value of the sample autocorrelations±z1−α/2n√ , where z1−α/2 is the quantile 1−α/2 in the Gaussian distribution, e.g. 1.96 for 95% confidence bands.
Sometimes confidence bands that increase as the order increases are used. In this cases the bands can be defined as±z1−α/21n(1+2∑ki=1ρ(i)2)−−−−−−−−−−−−−−−−√ .
sumber
Although the OP is a bit vague, it may possibly be more targeted to a "recipe"-style coding formulation than a linear algebra model formulation.
The ACF is rather straightforward: we have a time series, and basically make multiple "copies" (as in "copy and paste") of it, understanding that each copy is going to be offset by one entry from the prior copy, because the initial data containst data points, while the previous time series length (which excludes the last data point) is only t−1 . We can make virtually as many copies as there are rows. Each copy is correlated to the original, keeping in mind that we need identical lengths, and to this end, we'll have to keep on clipping the tail end of the initial data series to make them comparable. For instance, to correlate the initial data to tst−3 we'll need to get rid of the last 3 data points of the original time series (the first 3 chronologically).
Example:
We'll concoct a times series with a cyclical sine pattern superimposed on a trend line, and noise, and plot the R generated ACF. I got this example from an online post by Christoph Scherber, and just added the noise to it:
Ordinarily we would have to test the data for stationarity (or just look at the plot above), but we know there is a trend in it, so let's skip this part, and go directly to the de-trending step:
Now we are ready to takle this time series by first generating the ACF with the
acf()
function in R, and then comparing the results to the makeshift loop I put together:OK. That was successful. On to the PACF. Much more tricky to hack... The idea here is to again clone the initial ts a bunch of times, and then select multiple time points. However, instead of just correlating with the initial time series, we put together all the lags in-between, and perform a regression analysis, so that the variance explained by the previous time points can be excluded (controlled). For example, if we are focusing on the PACF ending at timetst−4 , we keep tst , tst−1 , tst−2 and tst−3 , as well as tst−4 , and we regress tst∼tst−1+tst−2+tst−3+tst−4 through the origin and keeping only the coefficient for tst−4 :
And finally plotting again side-by-side, R-generated and manual calculations:
That the idea is correct, beside probable computational issues, can be seen comparing
PACF
topacf(st.y, plot = F)
.code here.
sumber
Well, in the practise we found error (noise) which is represented byet the confidence bands help you to figure out if a level can be considerate as only noise (because about the 95% times will be into the bands).
sumber
Here is a python code to compute ACF:
sumber