Does correlation assume stationarity of data?

Intermarket analysis is a method of modeling market behavior by finding relationships between different markets. Often, a correlation is computed between two markets, say the S&P 500 and 30-year US Treasuries. These computations are more often than not based on price data, which, as everyone can see, does not fit the definition of a stationary time series.

Possible workarounds aside (using returns instead), is the computation of correlation whose data is non-stationary even a valid statistical calculation?

Would you say that such a correlation calculation is somewhat unreliable, or just plain nonsense?

correlation stationarity Milktrader

What do you mean by "valid statistical calculation"? You should say a valid statistical calculation (estimate) of something. Here the something is very important. Correlation is a valid calculation of the linear relationship between two sets of data. I don't see why you need stationarity; do you mean autocorrelation?

robin girard

There is a new site that might suit your question better: quant.stackexchange.com. Now, you are clearly confusing calculation with interpretation.

mpiktas

@mpiktas, the quant community has settled on using returns rather than prices because returns are stationary and prices are not. I'm asking here for something more than an intuitive explanation of why this should be so.

Milktrader

@robin, there are several things that might make you question a statistical analysis. Sample size comes to mind, as do more obvious things like manipulated data. Does non-stationarity of the data call a correlation calculation into question?

Milktrader

Not the calculation; maybe the interpretation, if the correlation is not high. If it is high, it means high correlation (i.e. a strong linear relationship). Two non-stationary time series, say $(X_t)$ and $(Y_t)$, can potentially be highly correlated (for example when $X_t=Y_t$).

robin girard

Answers:

Correlation measures linear relationship. In an informal context, a relationship means something stable. When we calculate the sample correlation for stationary variables and increase the number of available data points, this sample correlation tends to the true correlation.

It can be shown that for prices, which usually are random walks, the sample correlation tends to a random variable. This means that no matter how much data we have, the result will always be different.

Note: I tried expressing the mathematical intuition without the mathematics. From the mathematical point of view the explanation is very clear: sample moments of stationary processes converge in probability to constants. Sample moments of random walks converge to integrals of Brownian motion, which are random variables. Since a relationship is usually expressed as a number and not as a random variable, the reason for not calculating the correlation for non-stationary variables becomes evident.
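The contrast between returns and prices can be seen in a small simulation. The following is only a sketch under stated assumptions: the increments are independent standard Gaussian draws (so the true correlation is zero), and the helper names `pearson`, `cumsum` and `share_large` are made up for this illustration. It reports how often the absolute sample correlation exceeds 0.5, first for the increments ("returns") and then for their cumulative sums ("prices").

```python
import random
from math import sqrt

def pearson(x, y):
    """Plain sample (Pearson) correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cumsum(steps):
    """Turn increments ("returns") into a random-walk path ("prices")."""
    total, path = 0.0, []
    for s in steps:
        total += s
        path.append(total)
    return path

def share_large(T, reps, on_levels, rng, threshold=0.5):
    """Fraction of replications with |sample correlation| above threshold."""
    hits = 0
    for _ in range(reps):
        u = [rng.gauss(0, 1) for _ in range(T)]
        v = [rng.gauss(0, 1) for _ in range(T)]  # drawn independently of u
        r = pearson(cumsum(u), cumsum(v)) if on_levels else pearson(u, v)
        hits += abs(r) > threshold
    return hits / reps

rng = random.Random(0)  # fixed seed so the run is repeatable
for T in (100, 1000):
    returns_share = share_large(T, 200, False, rng)
    prices_share = share_large(T, 200, True, rng)
    print(T, returns_share, prices_share)
```

For the increments the share should be essentially zero at both sample sizes, while for the cumulated series it should stay substantial however large `T` is made.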

Update. Since we are interested in the correlation between two variables, assume first that they come from a stationary process $Z_t=(X_t,Y_t)$. Stationarity implies that $EZ_t$ and $cov(Z_t,Z_{t-h})$ do not depend on $t$. So the correlation

$$corr(X_t,Y_t)=\frac{cov(X_t,Y_t)}{\sqrt{DX_tDY_t}}$$

also does not depend on $t$, since all the quantities in the formula come from the matrix $cov(Z_t)$, which does not depend on $t$. So the calculation of the sample correlation

$$\hat{\rho}=\frac{\frac{1}{T}\sum_{t=1}^T(X_t-\bar{X})(Y_t-\bar{Y})}{\sqrt{\frac{1}{T^2}\sum_{t=1}^T(X_t-\bar{X})^2\sum_{t=1}^T(Y_t-\bar{Y})^2}}$$

makes sense, since we may reasonably hope that the sample correlation estimates $\rho=corr(X_t,Y_t)$. It turns out that this hope is not unfounded: for stationary processes satisfying certain conditions, $\hat{\rho}\to\rho$ in probability as $T\to\infty$. Furthermore, $\sqrt{T}(\hat{\rho}-\rho)\to N(0,\sigma_{\rho}^2)$ in distribution, so we can test hypotheses about $\rho$.
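To illustrate the stationary case numerically, here is a minimal sketch. It assumes the simplest setting, i.i.d. bivariate Gaussian pairs with unit variances and correlation `rho`, which is a special case of the conditions mentioned above; the function names are invented for the example. It estimates the standard deviation of $\sqrt{T}(\hat{\rho}-\rho)$ over repeated samples for several values of $T$.

```python
import random
from math import sqrt

def pearson(x, y):
    """Plain sample (Pearson) correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlated_noise(T, rho, rng):
    """i.i.d. Gaussian pairs (U_t, V_t), unit variances, correlation rho."""
    u, v = [], []
    for _ in range(T):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        u.append(a)
        v.append(rho * a + sqrt(1 - rho ** 2) * b)
    return u, v

def scaled_error_sd(T, rho, reps, rng):
    """Standard deviation of sqrt(T) * (rho_hat - rho) across replications."""
    errs = []
    for _ in range(reps):
        u, v = correlated_noise(T, rho, rng)
        errs.append(sqrt(T) * (pearson(u, v) - rho))
    m = sum(errs) / reps
    return sqrt(sum((e - m) ** 2 for e in errs) / (reps - 1))

rng = random.Random(0)  # fixed seed so the run is repeatable
rho = 0.5
for T in (100, 400, 1600):
    print(T, round(scaled_error_sd(T, rho, 300, rng), 3))
```

If the convergence holds, the printed values should be roughly the same for every `T`, which is another way of saying that the error of $\hat{\rho}$ itself shrinks like $1/\sqrt{T}$. For this particular Gaussian i.i.d. case the textbook value of $\sigma_\rho$ is $1-\rho^2$, i.e. 0.75 here.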

Now suppose that $Z_t$ is not stationary. Then $corr(X_t,Y_t)$ can depend on $t$. So when we observe a sample of size $T$ we potentially need to estimate $T$ different correlations $\rho_t$. This is of course infeasible, so in the best-case scenario we can only estimate some functional of $\rho_t$, such as its mean or variance. But the result may not have a sensible interpretation.

Now let us examine what happens with the correlation of what is probably the most studied non-stationary process, the random walk. We call the process $Z_t=(X_t,Y_t)$ a random walk if $Z_t=\sum_{s=1}^t(U_s,V_s)$, where $C_t=(U_t,V_t)$ is a stationary process. For simplicity assume that $EC_t=0$. Then

$$corr(X_t,Y_t)=\frac{EX_tY_t}{\sqrt{DX_tDY_t}}=\frac{E\sum_{s=1}^tU_s\sum_{s=1}^tV_s}{\sqrt{D\sum_{s=1}^tU_s\,D\sum_{s=1}^tV_s}}$$

To simplify matters further, assume that $C_t=(U_t,V_t)$ is white noise. This means that all the correlations $E(C_tC_{t+h})$ are zero for $h>0$. Note that this does not restrict $corr(U_t,V_t)$ to zero.

Then

$$corr(X_t,Y_t)=\frac{tEU_tV_t}{\sqrt{t^2DU_tDV_t}}=corr(U_0,V_0).$$
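The claim that the theoretical correlation of the random walk at a fixed time equals the correlation of its increments can be checked by simulating many independent paths and correlating the endpoints across paths (an ensemble average, not a time average). This is only a sketch: it assumes Gaussian white-noise increments with unit variances and correlation `rho`, and the names `endpoint` and `ensemble_corr` are invented for the example.

```python
import random
from math import sqrt

def pearson(x, y):
    """Plain sample (Pearson) correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def endpoint(t, rho, rng):
    """(X_t, Y_t): sums of t white-noise pairs with corr(U, V) = rho."""
    x = y = 0.0
    for _ in range(t):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        x += a
        y += rho * a + sqrt(1 - rho ** 2) * b
    return x, y

def ensemble_corr(t, rho, paths, rng):
    """Correlation of X_t and Y_t taken across independent paths."""
    pts = [endpoint(t, rho, rng) for _ in range(paths)]
    return pearson([p[0] for p in pts], [p[1] for p in pts])

rng = random.Random(0)  # fixed seed so the run is repeatable
for t in (10, 100, 300):
    print(t, round(ensemble_corr(t, 0.5, 1000, rng), 3))
```

Across paths the estimate should sit near `rho` for every `t`. The difficulty discussed in this answer arises only when a single path is used and the averaging is done over time.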

So far so good: although the process is not stationary, the correlation makes sense, although we had to make restrictive assumptions to get there.

Now, to see what happens to the sample correlation, we need to use the following fact about random walks, called the functional central limit theorem:

$$\frac{1}{\sqrt{T}}Z_{[Ts]}=\frac{1}{\sqrt{T}}\sum_{t=1}^{[Ts]}C_t\to (cov(C_0))^{1/2}W_s,$$

in distribution, where $s\in[0,1]$ and $W_s=(W_{1s},W_{2s})$ is bivariate Brownian motion (a two-dimensional Wiener process). For convenience, introduce the definition $M_s=(M_{1s},M_{2s})=(cov(C_0))^{1/2}W_s$.

Again for simplicity, let us define the sample correlation as

$$\hat{\rho}=\frac{\frac{1}{T}\sum_{t=1}^TX_tY_t}{\sqrt{\frac{1}{T}\sum_{t=1}^TX_t^2\,\frac{1}{T}\sum_{t=1}^TY_t^2}}$$

Let us start with the variances. We have

$$E\frac{1}{T}\sum_{t=1}^TX_t^2=\frac{1}{T}E\sum_{t=1}^T\left(\sum_{s=1}^tU_s\right)^2=\frac{1}{T}\sum_{t=1}^Tt\sigma_U^2=\sigma_U^2\frac{T+1}{2}.$$
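The growth of this expectation with $T$ is easy to check numerically. The sketch below assumes Gaussian white-noise steps with standard deviation `sigma` (the names are invented for the example). It compares a Monte Carlo average of $\frac{1}{T}\sum_t X_t^2$ with the closed form $\sigma_U^2(T+1)/2$, which follows from $\frac{1}{T}\sum_{t=1}^T t\,\sigma_U^2$.

```python
import random

def mean_square_level(T, sigma, rng):
    """(1/T) * sum of X_t^2 along one random-walk path with step sd sigma."""
    x, acc = 0.0, 0.0
    for _ in range(T):
        x += rng.gauss(0, sigma)
        acc += x * x
    return acc / T

def expected_mean_square(T, sigma):
    """Closed form sigma^2 * (T + 1) / 2."""
    return sigma ** 2 * (T + 1) / 2

rng = random.Random(0)  # fixed seed so the run is repeatable
sigma = 2.0
for T in (10, 100, 1000):
    mc = sum(mean_square_level(T, sigma, rng) for _ in range(1000)) / 1000
    print(T, round(mc, 1), expected_mean_square(T, sigma))
```

The two columns should agree up to simulation noise, and both grow linearly in `T`, which is why the unnormalised sample variance has no finite limit.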

This goes to infinity as $T$ increases, so we hit the first problem: the sample variance does not converge. On the other hand, the continuous mapping theorem in conjunction with the functional central limit theorem gives us

$$\frac{1}{T^2}\sum_{t=1}^TX_t^2=\sum_{t=1}^T\frac{1}{T}\left(\frac{1}{\sqrt{T}}\sum_{s=1}^tU_s\right)^2\to \int_0^1M_{1s}^2ds$$

where the convergence is convergence in distribution, as $T\to \infty$.

Similarly we get

$$\frac{1}{T^2}\sum_{t=1}^TY_t^2\to \int_0^1M_{2s}^2ds$$

and

$$\frac{1}{T^2}\sum_{t=1}^TX_tY_t\to \int_0^1M_{1s}M_{2s}ds$$

So finally, for the sample correlation of our random walk we get

$$\hat{\rho}\to \frac{\int_0^1M_{1s}M_{2s}ds}{\sqrt{\int_0^1M_{1s}^2ds\int_0^1M_{2s}^2ds}}$$

in distribution as $T\to \infty$.

So although the correlation is well defined, the sample correlation does not converge towards it, as it does in the stationary case. Instead it converges to a certain random variable.
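One way to see that the limit is a genuine random variable is to look at the spread of $\hat{\rho}$ across replications as $T$ grows. The sketch below uses the uncentered sample correlation defined above and assumes Gaussian white-noise increments with correlation `rho`; the function names are invented for the example.

```python
import random
from math import sqrt

def walk_sample_corr(T, rho, rng):
    """Uncentered sample correlation of two random walks of length T."""
    x = y = 0.0
    sxy = sxx = syy = 0.0
    for _ in range(T):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        x += a                                  # X_t = X_{t-1} + U_t
        y += rho * a + sqrt(1 - rho ** 2) * b   # Y_t = Y_{t-1} + V_t
        sxy += x * y
        sxx += x * x
        syy += y * y
    return sxy / sqrt(sxx * syy)

def spread(T, rho, reps, rng):
    """Standard deviation of the sample correlation across replications."""
    vals = [walk_sample_corr(T, rho, rng) for _ in range(reps)]
    m = sum(vals) / reps
    return sqrt(sum((v - m) ** 2 for v in vals) / (reps - 1))

rng = random.Random(0)  # fixed seed so the run is repeatable
for T in (100, 400, 1600):
    print(T, round(spread(T, 0.5, 300, rng), 3))
```

If the sample correlation were consistent, the printed spread would shrink roughly like $1/\sqrt{T}$. For random walks it should stay at about the same level for every `T`.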

mpiktas

The mathematical point of view explanation is what I was looking for. It gives me something to contemplate and explore further. Thanks.

Milktrader

This response seems to sidestep the original question: Aren't you just saying that yes, calculating correlation makes sense for stationary processes?

whuber

@whuber, I was answering the question having in mind the comment, but I reread the question again and as far as I understand the OP asks about calculation of correlation for non-stationary data. Calculation of correlation for stationary processes makes sense, all the macroeconometric analysis (VAR, VECM) relies on that.

mpiktas

I'll try to clarify my question with a response.

whuber

@whuber my take away from the answer is that a correlation based on non-stationary data yields a random variable, which may or may not be useful. Correlation based on stationary data converges to a constant. This may explain why traders are attracted to "x-day rolling correlation" because the correlated behavior is fleeting and spurious. Whether "x-day rolling correlation" is valid or useful is for another question.

Milktrader

...is the computation of correlation whose data is non-stationary even a valid statistical calculation?

Let $W$ be a discrete random walk. Pick a positive number $h$ . Define the processes $P$ and $V$ by $P(0) = 1$ , $P(t+1) = -P(t)$ if $V(t) > h$ , and otherwise $P(t+1) = P(t)$ ; and $V(t) = P(t)W(t)$ . In other words, $V$ starts out identical to $W$ but every time $V$ rises above $h$ , it switches signs (otherwise emulating $W$ in all respects).

[Figure: plot of $W$ and $V$ against time]

(In this figure (for $h=5$ ) $W$ is blue and $V$ is red. There are four switches in sign.)

In effect, over short periods of time $V$ tends to be either perfectly correlated with $W$ or perfectly anticorrelated with it; however, using a correlation function to describe the relationship between $V$ and $W$ wouldn't be useful (a word that perhaps more aptly captures the problem than "unreliable" or "nonsense").

Mathematica code to produce the figure:

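With[{h=5},
(* one step: flip the sign p if the previous v exceeded h, then set v = p w *)
pv[{p_, v_}, w_] := With[{q=If[v > h, -p, p]}, {q, q w}];
(* random walk with steps drawn from {-1, 0, 1} *)
w = Accumulate[RandomInteger[{-1,1}, 25 h^2]];
(* fold the step over the walk; the result also contains the initial state {1,0} *)
{p,v} = FoldList[pv, {1,0}, w] // Transpose;
ListPlot[{w,v}, Joined->True]]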
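For readers without Mathematica, here is a rough Python port of the same construction. It is a sketch written for this illustration, not the code behind the figure, and the names `switching_pair` and `regime_correlations` are invented. Besides generating $W$ and $V$, it computes the correlation over the whole sample and separately within each stretch where the sign $P$ is constant.

```python
import random
from math import sqrt

def pearson(x, y):
    """Plain sample (Pearson) correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def switching_pair(n, h, rng):
    """Random walk W, sign process P and V = P * W, with P flipping after V > h."""
    w, v, p = [], [], []
    level, sign, prev_v = 0, 1, 0
    for _ in range(n):
        level += rng.choice((-1, 0, 1))  # steps in {-1, 0, 1}, as in the original
        if prev_v > h:                   # previous V exceeded h: switch sign
            sign = -sign
        prev_v = sign * level
        w.append(level)
        v.append(prev_v)
        p.append(sign)
    return w, v, p

def regime_correlations(w, v, p):
    """Correlation of W and V within each stretch where P is constant."""
    out, start = [], 0
    for i in range(1, len(p) + 1):
        if i == len(p) or p[i] != p[start]:
            ws, vs = w[start:i], v[start:i]
            if len(set(ws)) > 1:         # skip stretches where W never moves
                out.append(round(pearson(ws, vs), 6))
            start = i
    return out

h = 5
w, v, p = switching_pair(25 * h ** 2, h, random.Random(0))
print("whole sample:", round(pearson(w, v), 3))
print("per regime:  ", regime_correlations(w, v, p))
```

Within each regime the correlation is exactly $+1$ or $-1$ by construction, while the whole-sample figure is a blend whose value depends on how long each regime happened to last.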

whuber

It is good that your answer points that out, but I wouldn't say the processes are correlated; I would say they are dependent. This is the point. The calculation of correlation is valid, and here it will say "no correlation", and we all know this does not mean "no dependence".

robin girard

@robin That's a good point, but I constructed this example specifically so that for potentially long periods of time these two processes are perfectly correlated. The issue is not one of dependence versus correlation but inherently is related to a subtler phenomenon: that the relationship between the processes changes at random periods. That, in a nutshell, is exactly what can happen in real markets (or at least we ought to worry that it can happen!).

whuber

@whuber yes, and this is a very good example showing that there are processes that have very high correlation for potentially long periods of time and still are not correlated at all (but highly dependent) when viewed on the larger temporal scale.

robin girard

@robin girard, I think the key here is that for non-stationary processes the theoretical correlation varies with time, whereas for stationary processes it stays the same. So with the sample correlation, which is basically one number, it is impossible to capture the variation of the true correlations in the case of non-stationary processes.

mpiktas