Bayesian Healthcare
Bayesian Healthcare
Tanner Phillips
2/12/2022
This project was a consultation with a healthcare actuarial firm. The two functions below are used to calculate the potential distribution of costs for a procedure, and by extension to set insurance premiums. A bayesian approach was taken so that the firm could give concrete, intepretable probability ranges to shareholders.
Function 1: Estimating how often a procedure will be utilized by customers.
The bayesUtilization function calculates a beta distribution on the mean utilization of a procedure in 1000 insured persons per month. The “cases” variable is the total number of observed uses of the procedure, “population” is the total number of insured persons over the period, and “months” is the total time period that the data was collected over, in months. This allows for a standardization to the industry standard of reporting utilization in units of procedures per 1000 insured persons per month.
This function uses a conjugate beta prior for a gamma distribution. The gamma distribution offers superior support given the right-skewed potential of the distribution– which is the exact kind of skew that would cause the firm to lose money.
bayesUtilization<-function(cases,population,months){
###Set Distribution Parameters
alpha<-cases
beta<-population-cases
###Simulate Distribution for Graphical Output
dist<-(rbeta(1000000,alpha,beta)/months)*1000
###Create Distribution histogram
hist(dist,breaks=100,col=blues9,xlab = "Utilization (Per 1000 Persons Per Month)",
main="Simulation Distribution of Mean Utilization",)
###Save Output
output<-c(qbeta(c(.025,.975),alpha,beta),
alpha/(alpha+beta),
((alpha-1/3)/((alpha+beta)-2/3)),
((alpha-1)/((alpha+beta)-2)),
sqrt((alpha*beta)/((alpha+beta)^2*(alpha+beta+1))))
output<-round((output/months)*1000,digits = 5)
names(output)<-c("95% Credible Interval Lower Bound","95% Credible Interval Upper Bound","Mean","Median","Mode","Standard Deviation")
###Print Output
output
}
out<-bayesUtilization(30,100000,1)
out
## 95% Credible Interval Lower Bound 95% Credible Interval Upper Bound
## 0.20242 0.41647
## Mean Median
## 0.30000 0.29667
## Mode Standard Deviation
## 0.29001 0.05476
Function 2: Estimate procedue cost per insured person.
The bayesCost function estimates a distribution for the average cost of a procedure. It requires a vector of observed values and four prior values:
- priorSD: Estimate of the standard deviation of the cost of the procedure.
- priorSDofSD: This is the estimated standard deviation OF the standard deviation. I.e., if “Theta” is the standard deviation of our prior, then priorSdofSd is the standard deviation of “Theta.”
- priorMean: the estimated mean cost of the procedure.
- priorSample: the prior “sample size.” If good rational can be given for a certain sample size for the prior, it can be entered here. If not, the default of 1 is used. This makes the entire prior carry much less weight.
We allow both the standard deviation and the mean to be random variables, meaning we use a Gamma prior for the standard deviation, and a normal prior for the mean. These Berkley lecture notes give a thorough but brief proof of the conjugate prior.
bayesCost<-function(dat,priorSd,priorSdofSd,priorMean,priorSample=1){
###Initial Prior Parameters
alpha<-(priorSd)^2/priorSdofSd^2
beta<-priorSd/priorSdofSd^2
muNot<-priorMean
nNot<-priorSample
n<-length(dat)
###Gamma Posterior Parameters
###Alpha
postAlpha<-alpha+n/2
###Beta
###calculate summation for beta formula
datdiffs<-1:length(dat)
meanval<-mean(dat)
for(i in 1:length(dat)){
datdiffs[i]<-(dat[i]-meanval)^2
}
###Output beta parameter
postBeta<-beta+(1/2)*sum(datdiffs)+((n*nNot)/(2*(n+nNot)))*(mean(dat)-muNot)^2
###Draws of SD from Gamma
tau<-as.matrix(rgamma(1000000,postAlpha,postBeta))
###Draws from Normal
draws<-1:length(tau)
for(i in 1:length(tau)){
weightedsamp<-n*tau[i]
totalsamp<-n*tau[i]+nNot*tau[i]
priorsamp<-nNot*tau[i]
draws[i]<-rnorm(1,mean=(weightedsamp/totalsamp)*meanval+(priorsamp/totalsamp)*priorMean,
sd=1/sqrt(weightedsamp+priorsamp))
}
draws<-sort(draws)
####Histogram of Data
hist(draws,breaks=100,col=blues9,xlab = "Cost (US Dollars)",
main="Distribution of Mean Cost",ylab = "Density")
####Save Output
output<-c(draws[25000],draws[975000],mean(draws),draws[500000],sd(draws))
names(output)<-c("95% Credible Interval Lower Bound","95% Credible Interval Upper Bound","Mean","Median","Standard Deviation")
####Print Output
output
}
dat<-rep(c(5,6,7,8,9,8,7,6,5,4,5,6,7,8,9,8,7,6,5,6,7,8,9,8,9,8,7,6,5,6,5,6,7,6,5,4,3,4,5,4,5,5),40)
out<-bayesCost(dat,priorSd=1,priorSdofSd=5,priorMean=10,priorSample = 0)
out
## 95% Credible Interval Lower Bound 95% Credible Interval Upper Bound
## 6.21085631 6.36053260
## Mean Median
## 6.28576272 6.28578029
## Standard Deviation
## 0.03817285