Should parsimony really still be the gold standard?

31

Just a thought:

Parsimonious models have always been the go-to in model selection, but to what degree is this approach now outdated? I wonder how much of our tendency toward parsimony is a relic of the era of abacuses and slide rules (or, more seriously, of non-modern computers). Today's computational power lets us build ever more complex models with ever greater predictive capability. Given this rising ceiling in computing power, do we really still need to gravitate toward simplicity?

Sure, simpler models are easier to understand and interpret, but in the era of ever-growing data sets with larger numbers of variables, and a shift toward a greater focus on predictive capability, that may no longer even be achievable, or necessary.

Thoughts?

theforestecologist
source
4
With apologies to Richard Hamming: the purpose of modeling is insight, not numbers. Complicated models impede insight.
Eric Towers
12
Simplified models simplify insight even more.
Frank Harrell
6
That probably depends on the application; in physics, I think the argument for parsimony would be on strong ground. But many applications will have a host of small effects that can't be eliminated (consider models for political preference, for example). A number of workers suggest that using regularization (such as methods that produce shrinkage, or in many applications shrinkage of differences, or both) rather than variable elimination makes more sense; others lean toward combined selection and shrinkage (e.g. the LASSO, which does both).
Glen_b -Reinstate Monica
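A minimal sketch of that selection-plus-shrinkage point, using scikit-learn's Lasso on synthetic data (the data and the penalty value are illustrative assumptions, not anything from the comment):

    # The Lasso both shrinks coefficients and sets some exactly to zero,
    # i.e. it performs shrinkage and selection at once. Synthetic data.
    import numpy as np
    from sklearn.linear_model import Lasso, LinearRegression

    rng = np.random.default_rng(0)
    n, p = 200, 20
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:3] = [2.0, -1.5, 1.0]         # only 3 of the 20 predictors matter
    y = X @ beta + rng.normal(size=n)

    ols = LinearRegression().fit(X, y)
    lasso = Lasso(alpha=0.1).fit(X, y)  # alpha controls the amount of shrinkage

    print("nonzero OLS coefficients:  ", int(np.sum(np.abs(ols.coef_) > 1e-8)))
    print("nonzero Lasso coefficients:", int(np.sum(np.abs(lasso.coef_) > 1e-8)))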
3
Parsimonious models are not the "go-to" in model selection. Otherwise we would always model everything with the sample mean and call it a day.
shadowtalker
1
Also, some food for thought: Mease and Wyner (2008) recommend richer base learners in AdaBoost, which is somewhat counterintuitive. An open question in that line of research seems to be whether parsimonious base learners actually lead to parsimonious ensembles.
shadowtalker
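For the Mease and Wyner point, a hedged sketch (my own, not from the paper) comparing stumps against richer depth-4 base trees in AdaBoost; the dataset and settings are arbitrary illustrations, and the `estimator` argument assumes scikit-learn >= 1.2:

    # Parsimonious base learners (stumps) vs. richer ones (depth-4 trees).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=20,
                               n_informative=10, random_state=0)

    for depth in (1, 4):
        clf = AdaBoostClassifier(
            estimator=DecisionTreeClassifier(max_depth=depth),
            n_estimators=200, random_state=0)
        score = cross_val_score(clf, X, y, cv=5).mean()
        print(f"base-learner depth {depth}: CV accuracy {score:.3f}")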

Answers:

25

@Matt's original answer does a good job of illustrating one of the benefits of parsimony, but I don't think it really answers your question. In reality, parsimony is not the gold standard. Not now, and it never has been. The "gold standard" related to parsimony is generalization error. We want to develop models that don't overfit, that are as useful for prediction (or as interpretable, or with as little error) out of sample as they are in sample. It turns out (because of the things laid out above) that parsimony is actually quite a good proxy for generalization error, but it's by no means the only one.

Really, think about why we use cross-validation or the bootstrap or train/test splits. The goal is to build models with good generalization accuracy. Much of the time, these ways of estimating out-of-sample performance end up choosing models with lower complexity, but not always. As an extreme example, imagine an oracle offers us a terribly complex but excellent model and a very poor but parsimonious model. If parsimony really were our goal, then we would choose the second, but in reality, the first is what we would want to learn if we could. Unfortunately, much of the time that last phrase is the kicker: "if we could".
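A minimal sketch of that point (my own construction, on assumed synthetic data): cross-validation is asked to pick a polynomial degree, and nothing forces it to pick the simplest candidate; it picks whatever generalizes best.

    # Let estimated generalization error, not parsimony, choose complexity.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, size=200)
    y = x**3 - x + rng.normal(scale=0.5, size=x.size)  # the truth is cubic
    X = x.reshape(-1, 1)

    scores = {}
    for degree in range(1, 8):
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        scores[degree] = cross_val_score(model, X, y, cv=10,
                                         scoring="neg_mean_squared_error").mean()

    print("degree chosen by CV:", max(scores, key=scores.get))  # typically 3, not 1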

Nick Thieme
source
Which one is the "original answer"?
mattdm
:) Fair enough. Matt's comment.
Nick Thieme
22

Parsimonious models are desirable not just because of computational requirements, but also for generalization performance. It's impossible to achieve the ideal of infinite data that completely and accurately covers the sample space, which means that non-parsimonious models have the potential to overfit and model noise or idiosyncrasies in the sample population.

It's certainly possible to build a model with millions of variables, but you'd be using variables that have no impact on the output to model the system. You could achieve great predictive performance on your training dataset, but those irrelevant variables will more than likely decrease your performance on an unseen test set.

If an output variable truly is the result of a million input variables, then you would do well to put them all in your predictive model, but only if you have enough data. To accurately build a model of this size, you'd need several million data points, at minimum. Parsimonious models are nice because in many real-world systems, a dataset of this size simply isn't available, and furthermore, the output is largely determined by a relatively small number of variables.
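As a small illustration of the point about irrelevant variables (a sketch of my own, with made-up sizes): padding a regression with pure-noise columns drives the training error down while the test error goes up.

    # Irrelevant predictors: better in-sample fit, worse out-of-sample error.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n, p_relevant, p_noise = 150, 5, 120
    X_rel = rng.normal(size=(n, p_relevant))
    X_noise = rng.normal(size=(n, p_noise))      # pure noise columns
    y = X_rel @ rng.normal(size=p_relevant) + rng.normal(size=n)

    for name, X in [("relevant only   ", X_rel),
                    ("relevant + noise", np.hstack([X_rel, X_noise]))]:
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)
        fit = LinearRegression().fit(X_tr, y_tr)
        print(name,
              "train MSE %.2f" % mean_squared_error(y_tr, fit.predict(X_tr)),
              "test MSE %.2f" % mean_squared_error(y_te, fit.predict(X_te)))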

Nuclear Wang
source
5
+1. I suggest reading The Elements of Statistical Learning (freely available on the web), which discusses this problem in depth.
S. Kolassa - Reinstate Monica
3
On the other hand, when you have millions of variables and few objects, it is likely that purely by chance some variables explain the outcome better than the true interaction does. In such cases parsimony-based modelling will be more susceptible to overfitting than a brute-force approach.
@CagdasOzgenc For instance a large random subspace ensemble.
I feel like something like a Lasso approach could apply here.
theforestecologist
17

I think the previous answers do a good job of making important points:

  • Parsimonious models tend to have better generalization characteristics.
  • Parsimony is not truly a gold standard, but just a consideration.

I want to add a few comments that come out of my day to day job experience.

The generalization-of-predictive-accuracy argument is, of course, strong, but it is academically biased in its focus. In general, when producing a statistical model, the economics are not such that predictive performance is a completely dominant consideration. Very often there are large outside constraints on what a useful model looks like for a given application:

  • The model must be implementable within an existing framework or system.
  • The model must be understandable by a non-technical entity.
  • The model must be efficient computationally.
  • The model must be documentable.
  • The model must pass regulatory constraints.

In real application domains, many if not all of these considerations come before, not after, predictive performance - and the optimization of model form and parameters is constrained by these desires. Each of these constraints biases the scientist towards parsimony.

It may be true that in many domains these constraints are being gradually lifted. But it is the lucky scientist indeed who gets to ignore them and focus purely on minimizing generalization error.

This can be very frustrating for the first-time scientist, fresh out of school (it definitely was for me, and continues to be when I feel that the constraints placed on my work are not justified). But in the end, working hard to produce an unacceptable product is a waste, and that feels worse than the sting to your scientific pride.

Matthew Drury
source
2
No, parsimony is not just a consideration. A sound inference procedure MUST rank a parsimonious model over a non-parsimonious model if they explain the data equally well. Otherwise the total compressed code length of the model plus the data encoded by the model will not be the smallest. So yes, it is a gold standard.
Cagdas Ozgenc
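For readers unfamiliar with the code-length argument, the standard two-part minimum description length criterion behind it (textbook form, not taken from the comment) is:

    % Two-part MDL: choose the model minimizing total description length.
    % L(M) = bits to encode the model; L(D | M) = bits to encode the data given it.
    \hat{M} = \arg\min_{M \in \mathcal{M}} \bigl[ L(M) + L(D \mid M) \bigr]
    % If two models fit the data equally well, L(D | M) ties,
    % and the shorter (more parsimonious) model wins.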
3
Parsimony is NOT a "gold standard"! That statement is preposterous. If it were true, then why don't we always build models that fit nothing but the unconditional mean? We trade off bias and variance with reference to either a test set or, better still, completely new observations, and we do so within the constraints of our field, organization, and the law. Sometimes you only have enough information to make naive predictions. Sometimes you've got enough to add complexity.
Brash Equilibrium
1
@BrashEquilibrium I think what Cagdas is saying is, given the choice between equally predictive models, one should choose the most parsimonious one.
Matthew Drury
1
Ah. That's a different thing. Yes, in that case choose the most parsimonious model. I still don't think that amounts to parsimony being a "gold standard" though.
Brash Equilibrium
1
@MatthewDrury Brash, Cagdas. Interesting. Perhaps, parsimony is just one component of the gold standard; which is probably (or ought to be) better based around the notion of encompassing. A good exposition of this idea is provided in the following astrophysics lecture from Yale: oyc.yale.edu/astronomy/astr-160/lecture-11. 7:04 onwards. The idea also features in the econometric/forecasting literature by David Hendry and Grayham Mizon. They argue that encompassing is part of a progressive research strategy, of which parsimony is a single aspect.
Graeme Walsh
14

I think this is a very good question. In my opinion parsimony is overrated. Nature is rarely parsimonious, and so we shouldn't necessarily expect accurate predictive or descriptive models to be so either. Regarding the question of interpretability, if you choose a simpler model that only modestly conforms to reality merely because you can understand it, what exactly are you understanding? Assuming a more complex model had better predictive power, it would appear to be closer to the actual facts anyway.

dsaxton
source
8
Well said @dsaxton. There is a great misunderstanding of parsimony and a great under-appreciation of how volatile feature selection is. Parsimony is nice when it results from pre-specification. Most parsimony that results from data dredging is misleading and is only understood because it's wrong.
Frank Harrell
2
@FrankHarrell Would you elaborate on "only understood because it's wrong", or perhaps link to something you wrote previously about this? This is an interesting point that I would like to make sure I understand.
gui11aume
8
This is an extreme example but people who engage in racial profiling think they understand, with a single feature (e.g., skin color), what value someone has. To them the answer is simple. They only understand it because they are making a wrong judgment by oversimplifying. Parsimony is usually an illusion (except in Newtonian mechanics and a few other areas).
Frank Harrell
1
"Nature is rarely parsimonious": and one point where nature is particularly non-parsimonious is individuals (as opposed to our typical sample sizes!). Evolution uses a whole new population of new individuals each generation... IMHO the parsimony (Frank Harrell's pre-specified type - allowing any n of m available features into the model is in fact a very complex model - even if n << m, this is a non-so-small fraction of the original search space) is how we try to get at least something out of our far-too-small data sets.
cbeleites supports Monica
2

Parsimony is not a gold standard. It's one aspect of modeling. Modeling, and especially forecasting, cannot be scripted, i.e. you can't just hand a script to a modeler to follow. Rather, you define principles on which the modeling process must be based. Parsimony is one of those principles, and its application cannot be scripted (again!). A modeler will consider complexity when selecting a model.

Computational power has little to do with this. If you're in industry, your models will be consumed by business folks, product people, whatever you call them. You have to explain your model to them; it should make sense to them. Having parsimonious models helps in this regard.

For instance, say you're forecasting product sales. You should be able to describe what the drivers of sales are and how they work. These must relate to concepts the business operates with, and the correlations must be understood and accepted by the business. With complex models it can be very difficult to interpret the model's results or to attribute differences from actuals. If you can't explain your models to the business, you will not be valued by it.

One more thing that is particularly important for forecasting. Let's say your model is dependent on N exogenous variables. This means that you have to first obtain the forecasts of these variables in order to forecast your dependent variable. Having smaller N makes your life easier, so a simpler model is easier to use.

Aksakal
source
Although you mention forecasting, most of your answer seems to apply only to explanatory modeling.
rolando2
@rolando2, it sounds like that because in my domain you can't simply hand the forecast to users. We have to explain the forecast, link it to drivers, etc. When you get a weather forecast, you don't normally ask the forecaster to explain why exactly they think there's a 50% chance of rain. In my case I not only have to do that, but do it in a way that my consumers understand the results, by linking them to business drivers that they deal with daily. That's why parsimony is valuable in its own right.
Aksakal
1

Perhaps have a look at the Akaike Information Criterion, a concept I only discovered by serendipity yesterday. The AIC seeks to identify which model, and how many parameters, best explain the observations at hand, rather than relying on a blanket Occam's-razor or parsimony approach.
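For concreteness, AIC = 2k - 2 ln(L_max), where k is the number of fitted parameters and L_max is the maximized likelihood; a minimal sketch (assumed synthetic data, using statsmodels) of comparing two candidate regressions by AIC:

    # The model with the smaller AIC is preferred; the extra, irrelevant
    # parameter in the larger model is penalized. Illustrative data only.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)                  # irrelevant predictor
    y = 1.0 + 2.0 * x1 + rng.normal(size=n)

    small = sm.OLS(y, sm.add_constant(x1)).fit()
    large = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

    print("AIC, x1 only:   %.1f" % small.aic)
    print("AIC, x1 and x2: %.1f" % large.aic)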

Philip Oakley
source