Contoh ketika menggunakan akurasi sebagai ukuran hasil akan mengarah pada kesimpulan yang salah

8

Saya mencari berbagai ukuran kinerja untuk model prediksi. Banyak yang ditulis tentang masalah menggunakan akurasi, bukan sesuatu yang lebih berkelanjutan untuk mengevaluasi kinerja model. Frank Harrell http://www.fharrell.com/post/class-damage/ memberikan contoh ketika menambahkan variabel informatif ke model akan menyebabkan penurunan akurasi, kesimpulan yang jelas berlawanan dengan intuisi dan salah.

masukkan deskripsi gambar di sini

Namun, dalam kasus ini, ini tampaknya disebabkan oleh memiliki kelas yang tidak seimbang, dan karenanya dapat diselesaikan hanya dengan menggunakan akurasi yang seimbang sebagai gantinya ((sens + spec) / 2). Apakah ada beberapa contoh di mana menggunakan akurasi pada dataset yang seimbang akan menghasilkan beberapa kesimpulan yang jelas salah atau berlawanan dengan intuisi?

Edit

Saya mencari sesuatu di mana keakuratan akan turun bahkan ketika modelnya jelas lebih baik, atau bahwa menggunakan akurasi akan mengarah pada pilihan positif palsu dari beberapa fitur. Sangat mudah untuk membuat contoh negatif palsu, di mana akurasi sama untuk dua model di mana satu jelas lebih baik menggunakan kriteria lain.

rep_ho
sumber
4
Terkait, tetapi bukan duplikat: Mengapa akurasi bukan ukuran terbaik untuk menilai model klasifikasi?
Stephan Kolassa
2
Posting Stephan terhubung di atas adalah sumber yang bagus dan memiliki semua yang Anda butuhkan. Langkah awal Anda dengan asumsi bahwa klasifikasi paksa (keputusan prematur) diperlukan telah menyebabkan masalah. Dan (sens + spec) / 2 bukan skor akurasi yang tepat; mengoptimalkannya akan mengarah pada memilih fitur yang salah dan memberi mereka bobot yang salah, belum lagi mengabaikan informasi berguna yang berasal dari probabilitas, seperti zona "tidak ada keputusan".
Frank Harrell
Saya memahami nilai prediksi probabilistik, tetapi saya mencari contoh di mana hal-hal buruk yang Anda sebutkan benar-benar terjadi untuk data yang seimbang atau akurasi yang seimbang.
rep_ho
BTW: menurut Gneiting & Raftery 2007, akurasi tepat (meskipun tidak sepenuhnya tepat). ada komentar tentang itu? amstat.tandfonline.com/doi/abs/10.1198/...
rep_ho
Saya harap jawaban saya bermanfaat. (@FrankHarrell: komentar apa pun akan diterima.) Akurasi yang seimbang juga tidak akan membantu di sini. Mengenai akurasi sebagai aturan penilaian yang tepat tetapi tidak sepenuhnya tepat: Anda mungkin tertarik pada Apakah akurasi aturan penilaian yang tidak tepat dalam pengaturan klasifikasi biner? Jika utas itu tidak menjawab pertanyaan Anda, pertimbangkan untuk mengajukan pertanyaan baru (dan tunjukkan mengapa utas yang ada bukanlah jawaban sehingga tidak tertipu-tertutup).
Stephan Kolassa

Jawaban:

14

Saya akan curang.

Secara khusus, saya sering berdebat (misalnya, di sini ) bahwa bagian statistik pemodelan dan prediksi hanya mencakup membuat prediksi probabilistik untuk keanggotaan kelas (atau memberikan kepadatan prediksi, dalam hal peramalan numerik). Memperlakukan contoh spesifik seolah-olah itu milik kelas tertentu (atau prediksi titik dalam kasus numerik), bukan statistik yang benar lagi. Itu adalah bagian dari aspek teoretik keputusan .

Dan keputusan tidak hanya harus didasarkan pada prediksi probabilistik, tetapi juga pada biaya kesalahan klasifikasi, dan pada sejumlah tindakan yang mungkin lainnya . Misalnya, bahkan jika Anda hanya memiliki dua kelas yang memungkinkan, "sakit" vs. "sehat", Anda dapat memiliki sejumlah besar tindakan yang mungkin tergantung pada seberapa besar kemungkinan seorang pasien menderita penyakit, mengirimnya pulang karena ia hampir pasti sehat, memberinya dua aspirin, menjalankan tes tambahan, segera memanggil ambulans dan menempatkannya sebagai penunjang kehidupan.

Menilai akurasi mengandaikan keputusan seperti itu. Akurasi sebagai metrik evaluasi untuk klasifikasi adalah kesalahan kategori .

Jadi, untuk menjawab pertanyaan Anda, saya akan berjalan di jalur kesalahan kategori seperti itu. Kami akan mempertimbangkan skenario sederhana dengan kelas yang seimbang di mana klasifikasi tanpa memperhatikan biaya kesalahan klasifikasi memang akan menyesatkan kita.


Misalkan wabah Ganas Gutrot merajalela di populasi. Untungnya, kita dapat menyaring semua orang dengan mudah untuk beberapa sifatt (0t1), dan kita tahu bahwa kemungkinan pengembangan MG tergantung secara linear t, p=γt untuk beberapa parameter γ (0γ1). Sifat itut didistribusikan secara merata dalam populasi.

Untungnya, ada vaksinnya. Sayangnya, itu mahal, dan efek sampingnya sangat tidak nyaman. (Aku akan membiarkan imajinasimu memberikan perinciannya.) Namun, mereka lebih baik daripada menderita MG.

Demi kepentingan abstraksi, saya berpendapat bahwa memang hanya ada dua tindakan yang mungkin dilakukan untuk setiap pasien, mengingat nilai sifat mereka t: baik memvaksinasi, atau tidak memvaksinasi.

Dengan demikian, pertanyaannya adalah: bagaimana seharusnya kita memutuskan siapa yang akan divaksinasi dan siapa yang tidak t? Kami akan menjadi utilitarian tentang hal ini dan bertujuan memiliki total biaya terendah yang diharapkan. Jelas bahwa ini adalah memilih ambangθ dan untuk memvaksinasi semua orang dengan tθ.


Model dan keputusan 1 digerakkan oleh akurasi. Pas dengan model. Untungnya, kita sudah tahu modelnya. Pilih ambangnyaθyang memaksimalkan akurasi ketika mengklasifikasikan pasien , dan memvaksinasi semua orang dengantθ. Kita dengan mudah melihatnyaθ=12γ adalah angka ajaib - semua orang dengan tθmemiliki kemungkinan lebih tinggi tertular MG daripada tidak, dan sebaliknya, sehingga ambang probabilitas klasifikasi ini akan memaksimalkan akurasi. Dengan asumsi kelas seimbang,γ=1, kami akan memvaksinasi setengah populasi. Lucunya, kalauγ<12, kami tidak akan memvaksinasi siapa pun . (Kami sebagian besar tertarik pada kelas yang seimbang, jadi mari kita abaikan bahwa kita membiarkan sebagian dari populasi mati sebagai Kematian yang Mengerikan yang Mengerikan.)

Tidak perlu dikatakan, ini tidak memperhitungkan biaya diferensial dari kesalahan klasifikasi.


Model-dan-keputusan 2 memanfaatkan prediksi probabilistik kami ("mengingat sifat Anda t, probabilitas Anda terkena MG γt") dan struktur biaya.

Pertama, ini adalah grafik kecil. Sumbu horizontal memberikan sifat, sumbu vertikal probabilitas MG. Segitiga yang diarsir memberikan proporsi populasi yang akan mengontrak MG. Garis vertikal memberi beberapa tertentuθ. Garis putus-putus horisontal diγθakan membuat perhitungan di bawah ini sedikit lebih mudah diikuti. Kami berasumsiγ>12, hanya untuk membuat hidup lebih mudah.

classification

Mari kita beri nama biaya kita dan hitung kontribusinya pada total biaya yang diharapkan, diberikan θ dan γ (dan fakta bahwa sifat tersebut terdistribusi secara merata dalam populasi).

  • Membiarkan c++menunjukkan biaya untuk pasien yang divaksinasi dan akan mengontrak MG. Diberikanθ, proporsi populasi yang mengeluarkan biaya ini adalah trapesium yang diarsir di bagian kanan bawah area
    (1θ)γθ+12(1θ)(γγθ).
  • Membiarkan c+menunjukkan biaya untuk pasien yang divaksinasi dan tidak akan mengontrak MG. Diberikanθ, proporsi populasi yang mengeluarkan biaya ini adalah trapesium yang tidak diarsir di bagian kanan atas area
    (1θ)(1γ)+12(1θ)(γγθ).
  • Membiarkan cmenunjukkan biaya untuk pasien yang tidak divaksinasi dan tidak akan mengontrak MG. Diberikanθ, proporsi populasi yang mengeluarkan biaya ini adalah trapesium yang tidak diarsir di bagian kiri atas area
    θ(1γθ)+12θγθ.
  • Membiarkan c+menunjukkan biaya untuk pasien yang tidak divaksinasi dan akan mengontrak MG. Diberikanθ, proporsi populasi yang mengeluarkan biaya ini adalah segitiga berarsir di bagian kiri bawah dengan luas
    12θγθ.

(Di setiap trapesium, pertama-tama saya menghitung luas persegi panjang, lalu menambahkan luas segitiga.)

Total biaya yang diharapkan adalah

c++((1θ)γθ+12(1θ)(γγθ))+c+((1θ)(1γ)+12(1θ)(γγθ))+c(θ(1γθ)+12θγθ)+c+12θγθ.

Differentiating and setting the derivative to zero, we obtain that expected costs are minimized by

θ=c+cγ(c++c+c++c).

This is only equal to the accuracy maximizing value of θ for a very specific cost structure, namely if and only if

12γ=c+cγ(c++c+c++c),
or
12=c+cc++c+c++c.

As an example, suppose that γ=1 for balanced classes and that costs are

c++=1,c+=2,c+=10,c=0.
Then the accuracy maximizing θ=12 will yield expected costs of 1.875, whereas the cost minimizing θ=211 will yield expected costs of 1.318.

In this example, basing our decisions on non-probabilistic classifications that maximized accuracy led to more vaccinations and higher costs than using a decision rule that explicitly used the differential cost structures in the context of a probabilistic prediction.


Bottom line: accuracy is only a valid decision criterion if

  • there is a one-to-one relationship between classes and possible actions
  • and the costs of actions applied to classes follow a very specific structure.

In the general case, evaluating accuracy asks a wrong question, and maximizing accuracy is a so-called type III error: providing the correct answer to the wrong question.


R code:

rm(list=ls())
gamma <- 0.7

cost_treated_positive <- 1          # cost of treatment, side effects unimportant
cost_treated_negative <- 2          # cost of treatment, side effects unnecessary
cost_untreated_positive <- 10       # horrible, painful death
cost_untreated_negative <- 0        # nothing

expected_cost <- function ( theta ) {
    cost_treated_positive * ( (1-theta)*theta*gamma + (1-theta)*(gamma-gamma*theta)/2 ) +
    cost_treated_negative * ( (1-theta)*(1-gamma) + (1-theta)*(gamma-gamma*theta)/2 ) +
    cost_untreated_negative *( theta*(1-gamma*theta) + theta*gamma*theta/2 ) +
    cost_untreated_positive * theta*gamma*theta/2
}

(theta <- optim(par=0.5,fn=expected_cost,lower=0,upper=1,method="L-BFGS-B")$par)
(cost_treated_negative-cost_untreated_negative)/
    (gamma*(cost_treated_negative+cost_untreated_positive-cost_treated_positive-cost_untreated_negative))

plot(c(0,1),c(0,1),type="n",bty="n",xaxt="n",xlab="Trait t",yaxt="n",ylab="MG probability")
rect(0,0,1,1)
axis(1,c(0,theta,1),c(0,"theta",1),lty=0,line=-1)
axis(2,c(0,1),lty=0,line=-1,las=1)
axis(4,c(0,gamma,1),c(0,"gamma",1),lty=0,line=-1.8,las=1)
polygon(c(0,1,1),c(0,0,gamma),col="lightgray")
abline(v=theta,col="red",lwd=2)
abline(h=gamma*theta,lty=2,col="red",lwd=2)

expected_cost(1/(2*gamma))
expected_cost(theta)
Stephan Kolassa
sumber
1
Great post! Just logged in to express my thanks!
Wolfone
"but also on costs of misclassifications" I don't think its true: just as your calculation itself shows, it also (quite surprisingly!) depends on the cost of correct classifications as well!
Tamas Ferenci
I didn't want to edit your excellent answer that deeply, but it'd be perhaps instructive to note that the optimal threshold depends on the four cost, but only through cd+=c+c++ and cd=c+c: θ=cdγ(cd+cd+).
Tamas Ferenci
We could also make a plot of this: levelplot( thetastar ~ cdminus + cdplus, data = data.table( expand.grid( cdminus = seq( 0, 10, 0.01 ), cdplus = seq( 0, 10, 0.01 ) ) )[ , .( cdminus, cdplus, thetastar = cdminus/(cdminus + cdplus) ) ] )
Tamas Ferenci
"the optimal threshold depends on the four cost, but only through" More specifically, only through their ratio, sorry: θ=1/γ1+cd+cd.
Tamas Ferenci
4

It might worth adding another, perhaps more straightforward example to Stephen's excellent answer.

Let's consider a medical test, the result of which is normally distributed, both in sick and in healthy people, with different parameters of course (but for simplicity, let's assume homoscedasticity, i.e., that the variance is the same):

TDN(μ,σ2)TDN(μ+,σ2).
Let's denote the prevalence of the disease with p (i.e. DBern(p)), so this, together with the above, which are essentially conditional distributions, fully specifies the joint distribution.

Thus the confusion matrix with threshold b (i.e., those with test results above b are classified as sick) is

(DDTp(1Φ+(b))(1p)(1Φ(b))TpΦ+(b)(1p)Φ(b)).


Accuracy-based approach

The accuracy is

p(1Φ+(b))+(1p)Φ(b),

we take its derivative w.r.t. b, set it equal to 0, multiply with 1πσ2 and rearrange a bit:

pφ+(b)+φ(b)pφ(b)=0e(bμ)22σ2[(1p)pe2b(μμ+)+(μ+2μ2)2σ2]=0
The first term can't be zero, so the only way the product can be zero is if the second term is zero:
(1p)pe2b(μμ+)+(μ+2μ2)2σ2=02b(μμ+)+(μ+2μ2)2σ2=log1pp2b(μ+μ)+(μ2μ+2)=2σ2log1pp
So the solution is
b=(μ+2μ2)+2σ2log1pp2(μ+μ)=μ++μ2+σ2μ+μlog1pp.

Note that this - of course - doesn't depend on the costs.

If the classes are balanced, the optimum is the average of the mean test values in sick and healthy people, otherwise it is displaced based on the imbalance.


Cost-based approach

Using Stephen's notation, the expected overall cost is

c++p(1Φ+(b))+c+(1p)(1Φ(b))+c+pΦ+(b)+c(1p)Φ(b).
Take its derivate w.r.t b and set it equal to zero:
c++pφ+(b)c+(1p)φ(b)+c+pφ+(b)+c(1p)φ(b)==φ+(b)p(c+c++)+φ(b)(1p)(cc+)==φ+(b)pcd+φ(b)(1p)cd=0,
using the notation I introduced in my comments below Stephen's answer, i.e., cd+=c+c++ and cd=c+c.

The optimal threshold is therefore given by the solution of the equation

φ+(b)φ(b)=(1p)cdpcd+.
Two things should be noted here:

  1. This results is totally generic and works for any distribution of the test results, not only normal. (φ in that case of course means the probability density function of the distribution, not the normal density.)
  2. Whatever the solution for b is, it is surely a function of (1p)cdpcd+. (I.e., we immediately see how costs matter - in addition to class imbalance!)

I'd be really interested to see if this equation has a generic solution for b (parametrized by the φs), but I would be surprised.

Nevertheless, we can work it out for normal! 2πσ2s cancel on the left hand side, so we have

e12((bμ+)2σ2(bμ)2σ2)=(1p)cdpcd+(bμ)2(bμ+)2=2σ2log(1p)cdpcd+2b(μ+μ)+(μ2μ+2)=2σ2log(1p)cdpcd+
therefore the solution is
b=(μ+2μ2)+2σ2log(1p)cdpcd+2(μ+μ)=μ++μ2+σ2μ+μlog(1p)cdpcd+.

(Compare it the the previous result! We see that they are equal if and only if cd=cd+, i.e. the differences in misclassification cost compared to the cost of correct classification is the same in sick and healthy people.)


A short demonstration

Let's say c=0 (it is quite natural medically), and that c++=1 (we can always obtain it by dividing the costs with c++, i.e., by measuring every cost in c++ units). Let's say that the prevalence is p=0.2. Also, let's say that μ=9.5, μ+=10.5 and σ=1.

In this case:

library( data.table )
library( lattice )

cminusminus <- 0
cplusplus <- 1
p <- 0.2
muminus <- 9.5
muplus <- 10.5
sigma <- 1

res <- data.table( expand.grid( b = seq( 6, 17, 0.1 ),
                                cplusminus = c( 1, 5, 10, 50, 100 ),
                                cminusplus = c( 2, 5, 10, 50, 100 ) ) )
res$cost <- cplusplus*p*( 1-pnorm( res$b, muplus, sigma ) ) +
  res$cplusminus*(1-p)*(1-pnorm( res$b, muminus, sigma ) ) +
  res$cminusplus*p*pnorm( res$b, muplus, sigma ) +
  cminusminus*(1-p)*pnorm( res$b, muminus, sigma )

xyplot( cost ~ b | factor( cminusplus ), groups = cplusminus, ylim = c( -1, 22 ),
        data = res, type = "l", xlab = "Threshold",
        ylab = "Expected overall cost", as.table = TRUE,
        abline = list( v = (muplus+muminus)/2+
                         sigma^2/(muplus-muminus)*log((1-p)/p) ),
        strip = strip.custom( var.name = expression( {"c"^{"+"}}["-"] ),
                              strip.names = c( TRUE, TRUE ) ),
        auto.key = list( space = "right", points = FALSE, lines = TRUE,
                         title = expression( {"c"^{"-"}}["+"] ) ),
        panel = panel.superpose, panel.groups = function( x, y, col.line, ... ) {
          panel.xyplot( x, y, col.line = col.line, ... )
          panel.points( x[ which.min( y ) ], min( y ), pch = 19, col = col.line )
        } )

The result is (points depict the minimum cost, and the vertical line shows the optimal threshold with the accuracy-based approach):

Expected overall cost

We can very nicely see how cost-based optimum can be different than the accuracy-based optimum. It is instructive to think over why: if it is more costly to classify a sick people erroneously healthy than the other way around (c+ is high, c+ is low) than the threshold goes down, as we prefer to classify more easily into the category sick, on the other hand, if it is more costly to classify a healthy people erroneously sick than the other way around (c+ is low, c+ is high) than the threshold goes up, as we prefer to classify more easily into the category healthy. (Check these on the figure!)


A real-life example

Let's have a look at an empirical example, instead of a theoretical derivation. This example will be different basically from two aspects:

  • Instead of assuming normality, we will simply use the empirical data without any such assumption.
  • Instead of using one single test, and its results in its own units, we will use several tests (and combine them with a logistic regression). Threshold will be given to the final predicted probability. This is actually the preferred approach, see Chapter 19 - Diagnosis - in Frank Harrell's BBR.

The dataset (acath from the package Hmisc) is from the Duke University Cardiovascular Disease Databank, and contains whether the patient had significant coronary disease, as assessed by cardiac catheterization, this will be our gold standard, i.e., the true disease status, and the "test" will be the combination of the subject's age, sex, cholesterol level and duration of symptoms:

library( rms )
library( lattice )
library( latticeExtra )
library( data.table )

getHdata( "acath" )
acath <- acath[ !is.na( acath$choleste ), ]
dd <- datadist( acath )
options( datadist = "dd" )

fit <- lrm( sigdz ~ rcs( age )*sex + rcs( choleste ) + cad.dur, data = acath )

It worth plotting the predicted risks on logit-scale, to see how normal they are (essentially, that was what we assumed previously, with one single test!):

densityplot( ~predict( fit ), groups = acath$sigdz, plot.points = FALSE, ref = TRUE,
             auto.key = list( columns = 2 ) )

Distribution of predicted risks

Well, they're hardly normal...

Let's go on and calculate the expected overall cost:

ExpectedOverallCost <- function( b, p, y, cplusminus, cminusplus,
                                 cplusplus = 1, cminusminus = 0 ) {
  sum( table( factor( p>b, levels = c( FALSE, TRUE ) ), y )*matrix(
    c( cminusminus, cplusminus, cminusplus, cplusplus ), nc = 2 ) )
}

table( predict( fit, type = "fitted" )>0.5, acath$sigdz )

ExpectedOverallCost( 0.5, predict( fit, type = "fitted" ), acath$sigdz, 2, 4 )

And let's plot it for all possible costs (a computational note: we don't need to mindlessly iterate through numbers from 0 to 1, we can perfectly reconstruct the curve by calculating it for all unique values of predicted probabilities):

ps <- sort( unique( c( 0, 1, predict( fit, type = "fitted" ) ) ) )

xyplot( sapply( ps, ExpectedOverallCost,
                p = predict( fit, type = "fitted" ), y = acath$sigdz,
                cplusminus = 2, cminusplus = 4 ) ~ ps, type = "l", xlab = "Threshold",
        ylab = "Expected overall cost", panel = function( x, y, ... ) {
          panel.xyplot( x, y, ... )
          panel.points( x[ which.min( y ) ], min( y ), pch = 19, cex = 1.1 )
          panel.text( x[ which.min( y ) ], min( y ), round( x[ which.min( y ) ], 3 ),
                      pos = 3 )
        } )

Expected overall cost as a function of threshold

We can very well see where we should put the threshold to optimize the expected overall cost (without using sensitivity, specificity or predictive values anywhere!). This is the correct approach.

It is especially instructive to contrast these metrics:

ExpectedOverallCost2 <- function( b, p, y, cplusminus, cminusplus,
                                  cplusplus = 1, cminusminus = 0 ) {
  tab <- table( factor( p>b, levels = c( FALSE, TRUE ) ), y )
  sens <- tab[ 2, 2 ] / sum( tab[ , 2 ] )
  spec <- tab[ 1, 1 ] / sum( tab[ , 1 ] )
  c( `Expected overall cost` = sum( tab*matrix( c( cminusminus, cplusminus, cminusplus,
                                                   cplusplus ), nc = 2 ) ),
     Sensitivity = sens,
     Specificity = spec,
     PPV = tab[ 2, 2 ] / sum( tab[ 2, ] ),
     NPV = tab[ 1, 1 ] / sum( tab[ 1, ] ),
     Accuracy = 1 - ( tab[ 1, 1 ] + tab[ 2, 2 ] )/sum( tab ),
     Youden = 1 - ( sens + spec - 1 ),
     Topleft = ( 1-sens )^2 + ( 1-spec )^2
  )
}

ExpectedOverallCost2( 0.5, predict( fit, type = "fitted" ), acath$sigdz, 2, 4 )

res <- melt( data.table( ps, t( sapply( ps, ExpectedOverallCost2,
                                        p = predict( fit, type = "fitted" ),
                                        y = acath$sigdz,
                                        cplusminus = 2, cminusplus = 4 ) ) ),
             id.vars = "ps" )

p1 <- xyplot( value ~ ps, data = res, subset = variable=="Expected overall cost",
              type = "l", xlab = "Threshold", ylab = "Expected overall cost",
              panel=function( x, y, ... ) {
                panel.xyplot( x, y,  ... )
                panel.abline( v = x[ which.min( y ) ],
                              col = trellis.par.get()$plot.line$col )
                panel.points( x[ which.min( y ) ], min( y ), pch = 19 )
              }  )
p2 <- xyplot( value ~ ps, groups = variable,
              data = droplevels( res[ variable%in%c( "Expected overall cost",
                                                     "Sensitivity",
                                                     "Specificity", "PPV", "NPV" ) ] ),
              subset = variable%in%c( "Sensitivity", "Specificity", "PPV", "NPV" ),
              type = "l", xlab = "Threshold", ylab = "Sensitivity/Specificity/PPV/NPV",
              auto.key = list( columns = 3, points = FALSE, lines = TRUE ) )
doubleYScale( p1, p2, use.style = FALSE, add.ylab2 = TRUE )

Expected overall cost and traditional metrics as a function of threshold

We can now analyze those metrics that are sometimes specifically advertised as being able to come up with an optimal cutoff without costs, and contrast it with our cost-based approach! Let's use the three most often used metrics:

  • Accuracy (maximize accuracy)
  • Youden rule (maximize Sens+Spec1)
  • Topleft rule (minimize (1Sens)2+(1Spec)2)

(For simplicity, we will subtract the above values from 1 for the Youden and the Accuracy rule so that we have a minimization problem everywhere.)

Let's see the results:

p3 <- xyplot( value ~ ps, groups = variable,
              data = droplevels( res[ variable%in%c( "Expected overall cost", "Accuracy",
                                                     "Youden", "Topleft"  ) ] ),
              subset = variable%in%c( "Accuracy", "Youden", "Topleft"  ),
              type = "l", xlab = "Threshold", ylab = "Accuracy/Youden/Topleft",
              auto.key = list( columns = 3, points = FALSE, lines = TRUE ),
              panel = panel.superpose, panel.groups = function( x, y, col.line, ... ) {
                panel.xyplot( x, y, col.line = col.line, ... )
                panel.abline( v = x[ which.min( y ) ], col = col.line )
                panel.points( x[ which.min( y ) ], min( y ), pch = 19, col = col.line )
              } )
doubleYScale( p1, p3, use.style = FALSE, add.ylab2 = TRUE )

Choices to select the optimal cutoff

This of course pertains to one specific cost structure, c=0, c++=1, c+=2, c+=4 (this obviously matters only for the optimal cost decision). To investigate the effect of cost structure, let's pick just the optimal threshold (instead of tracing the whole curve), but plot it as a function of costs. More specifically, as we have already seen, the optimal threshold depends on the four costs only through the cd/cd+ ratio, so let's plot the optimal cutoff as a function of this, along with the typically used metrics that don't use costs:

res2 <- data.frame( rat = 10^( seq( log10( 0.02 ), log10( 50 ), length.out = 500 ) ) )
res2$OptThreshold <- sapply( res2$rat,
                             function( rat ) ps[ which.min(
                               sapply( ps, Vectorize( ExpectedOverallCost, "b" ),
                                       p = predict( fit, type = "fitted" ),
                                       y = acath$sigdz,
                                       cplusminus = rat,
                                       cminusplus = 1,
                                       cplusplus = 0 ) ) ] )

xyplot( OptThreshold ~ rat, data = res2, type = "l", ylim = c( -0.1, 1.1 ),
        xlab = expression( {"c"^{"-"}}["d"]/{"c"^{"+"}}["d"] ), ylab = "Optimal threshold",
        scales = list( x = list( log = 10, at = c( 0.02, 0.05, 0.1, 0.2, 0.5, 1,
                                                   2, 5, 10, 20, 50 ) ) ),
        panel = function( x, y, resin = res[ ,.( ps[ which.min( value ) ] ),
                                             .( variable ) ], ... ) {
          panel.xyplot( x, y, ... )
          panel.abline( h = resin[variable=="Youden"] )
          panel.text( log10( 0.02 ), resin[variable=="Youden"], "Y", pos = 3 )
          panel.abline( h = resin[variable=="Accuracy"] )
          panel.text( log10( 0.02 ), resin[variable=="Accuracy"], "A", pos = 3 )
          panel.abline( h = resin[variable=="Topleft"] )
          panel.text( log10( 0.02 ), resin[variable=="Topleft"], "TL", pos = 1 )
        } )

Optimal thresholds for different costs

Horizontal lines indicate the approaches that don't use costs (and are therefore constant).

Again, we nicely see that as the additional cost of misclassification in the healthy group rises compared to that of the diseased group, the optimal threshold increases: if we really don't want healthy people to be classified as sick, we will use higher cutoff (and the other way around, of course!).

And, finally, we yet again see why those methods that don't use costs are not (and can't!) be always optimal.

Tamas Ferenci
sumber