Bayesian Healthcare

This project was a consultation with a healthcare actuarial firm. The two functions below are used to calculate the potential distribution of costs for a procedure, and by extension to set insurance premiums. A bayesian approach was taken so that the firm could give concrete, intepretable probability ranges to shareholders.

Function 1: Estimating how often a procedure will be utilized by customers.

The bayesUtilization function calculates a beta distribution on the mean utilization of a procedure in 1000 insured persons per month. The “cases” variable is the total number of observed uses of the procedure, “population” is the total number of insured persons over the period, and “months” is the total time period that the data was collected over, in months. This allows for a standardization to the industry standard of reporting utilization in units of procedures per 1000 insured persons per month.

This function uses a conjugate beta prior for a gamma distribution. The gamma distribution offers superior support given the right-skewed potential of the distribution– which is the exact kind of skew that would cause the firm to lose money.

bayesUtilization<-function(cases,population,months){
  
  ###Set Distribution Parameters
    alpha<-cases
    beta<-population-cases
    
  ###Simulate Distribution for Graphical Output
    dist<-(rbeta(1000000,alpha,beta)/months)*1000
  
  ###Create Distribution histogram
    hist(dist,breaks=100,col=blues9,xlab = "Utilization (Per 1000 Persons Per Month)",
         main="Simulation Distribution of Mean Utilization",)
  
  ###Save Output  
    output<-c(qbeta(c(.025,.975),alpha,beta),
      alpha/(alpha+beta),
      ((alpha-1/3)/((alpha+beta)-2/3)),
      ((alpha-1)/((alpha+beta)-2)),
      sqrt((alpha*beta)/((alpha+beta)^2*(alpha+beta+1))))
    
    output<-round((output/months)*1000,digits = 5)
    names(output)<-c("95% Credible Interval Lower Bound","95% Credible Interval Upper Bound","Mean","Median","Mode","Standard Deviation")
  ###Print Output  
    output
}
out<-bayesUtilization(30,100000,1)

out
## 95% Credible Interval Lower Bound 95% Credible Interval Upper Bound 
##                           0.20242                           0.41647 
##                              Mean                            Median 
##                           0.30000                           0.29667 
##                              Mode                Standard Deviation 
##                           0.29001                           0.05476

Function 2: Estimate procedue cost per insured person.

The bayesCost function estimates a distribution for the average cost of a procedure. It requires a vector of observed values and four prior values:

  • priorSD: Estimate of the standard deviation of the cost of the procedure.
  • priorSDofSD: This is the estimated standard deviation OF the standard deviation. I.e., if “Theta” is the standard deviation of our prior, then priorSdofSd is the standard deviation of “Theta.”
  • priorMean: the estimated mean cost of the procedure.
  • priorSample: the prior “sample size.” If good rational can be given for a certain sample size for the prior, it can be entered here. If not, the default of 1 is used. This makes the entire prior carry much less weight.

We allow both the standard deviation and the mean to be random variables, meaning we use a Gamma prior for the standard deviation, and a normal prior for the mean. These Berkley lecture notes give a thorough but brief proof of the conjugate prior.

bayesCost<-function(dat,priorSd,priorSdofSd,priorMean,priorSample=1){

  ###Initial Prior Parameters
    alpha<-(priorSd)^2/priorSdofSd^2
    beta<-priorSd/priorSdofSd^2
    muNot<-priorMean
    nNot<-priorSample
    n<-length(dat)
  
    
  ###Gamma Posterior Parameters
    
    ###Alpha
    postAlpha<-alpha+n/2
    
    
    ###Beta
      ###calculate summation for beta formula
        datdiffs<-1:length(dat)
        meanval<-mean(dat)
        for(i in 1:length(dat)){
          datdiffs[i]<-(dat[i]-meanval)^2 
          }
    
      ###Output beta parameter
        postBeta<-beta+(1/2)*sum(datdiffs)+((n*nNot)/(2*(n+nNot)))*(mean(dat)-muNot)^2
    
  ###Draws of SD from Gamma
    tau<-as.matrix(rgamma(1000000,postAlpha,postBeta))
  
  ###Draws from Normal
    draws<-1:length(tau)
    
    
    for(i in 1:length(tau)){
      weightedsamp<-n*tau[i]
      totalsamp<-n*tau[i]+nNot*tau[i]
      priorsamp<-nNot*tau[i]
      
      draws[i]<-rnorm(1,mean=(weightedsamp/totalsamp)*meanval+(priorsamp/totalsamp)*priorMean,
                      sd=1/sqrt(weightedsamp+priorsamp))
    }
    
    
    draws<-sort(draws)
    
  ####Histogram of Data
      hist(draws,breaks=100,col=blues9,xlab = "Cost (US Dollars)",
           main="Distribution of Mean Cost",ylab = "Density")
  
  ####Save Output
      
      output<-c(draws[25000],draws[975000],mean(draws),draws[500000],sd(draws)) 
      names(output)<-c("95% Credible Interval Lower Bound","95% Credible Interval Upper Bound","Mean","Median","Standard Deviation")
  
      
  ####Print Output
      output
  
}
dat<-rep(c(5,6,7,8,9,8,7,6,5,4,5,6,7,8,9,8,7,6,5,6,7,8,9,8,9,8,7,6,5,6,5,6,7,6,5,4,3,4,5,4,5,5),40)
out<-bayesCost(dat,priorSd=1,priorSdofSd=5,priorMean=10,priorSample = 0)

out
## 95% Credible Interval Lower Bound 95% Credible Interval Upper Bound 
##                        6.21085631                        6.36053260 
##                              Mean                            Median 
##                        6.28576272                        6.28578029 
##                Standard Deviation 
##                        0.03817285