B Propensity Score Ranges
With regard to propensity score ranges, the range tends to shrink as the ratio of treatment-to-control increases. Figure 20 depicts the range and distribution of propensity scores (using logistic regression) with varying treatment-to-control ratios. The data used to create this figure is simulated and available in Appendix K. The psrange
and plot.psrange
functions are included in the multilevelPSA
R package. Propensity scores are estimated with a single covariate where the mean for the treatment and control are 0.6 and 0.4, respectively. The standard deviation for both is 0.4. There are 100 treatment units and 1,000 control units simulated. The goal in choosing these means and standard deviations is to have some separation between treatment and control. Each row in the figure represents the percentage of control units sampled before estimating the propensity scores, starting with 100% (i.e. all 1,000 control units) to 10% (100 of the control units). As the figure shows, as the ratio decreases to where there are equal treatment and control units, the range of the propensity scores becomes more normal. To calculate the ranges, each sampling step is bootstrapped so the green bar and black points represent each of the 20 bootstrap samples taken. The bars then represent the mean of the minimum and mean of the maximum for each step.
The “shrinking” of propensity score ranges as the ratio of treatment-to-control increases has implications for the interpretation of propensity scores. Typically, propensity scores are interpreted as the probability of being in the treatment. For studies where the number of treatment and control units are roughly equal, this interpretation is valid. However, in cases where the ratio of treatment-to-control is large, it best to simply interpret the propensity scores as adjustment scores and not probabilities. Since the matching and stratification procedures utilize standard scores (i.e. the propensity score divided by the standard deviation of the propensity scores), should only impact interpretation of the propensity scores and should not impact on the estimated treatment e↵ects. It appears this issue has not been explored in either the PSA or logistic regression literature and additional exploration of the topic appears to be warranted.
library(multilevelPSA)
getSimulatedData <- function(nvars = 3, ntreat = 100, treat.mean = 0.6, treat.sd = 0.5,
ncontrol = 1000, control.mean = 0.4, control.sd = 0.5) {
if (length(treat.mean) == 1) {
treat.mean = rep(treat.mean, nvars)
}
if (length(treat.sd) == 1) {
treat.sd = rep(treat.sd, nvars)
}
if (length(control.mean) == 1) {
control.mean = rep(control.mean, nvars)
}
if (length(control.sd) == 1) {
control.sd = rep(control.sd, nvars)
}
df <- c(rep(0, ncontrol), rep(1, ntreat))
for (i in 1:nvars) {
df <- cbind(df, c(rnorm(ncontrol, mean = control.mean[1], sd = control.sd[1]),
rnorm(ntreat, mean = treat.mean[1], sd = treat.sd[1])))
}
df <- as.data.frame(df)
names(df) <- c("treat", letters[1:nvars])
return(df)
}
1:10 (100 treatments, 1000 control units)
test.df1 <- getSimulatedData(ntreat = 100, ncontrol = 1000)
psranges1 <- psrange(test.df1, test.df1$treat, treat ~ ., samples = seq(100,
1000, by = 100), nboot = 20)
plot(psranges1)
summary(psranges1)
## p ntreat ncontrol ratio min.mean min.sd min.median min.se
## 1 10 100 100 1 0.15158085 0.0399247330 0.16675914 0.0089274417
## 21 20 100 200 2 0.08065862 0.0110939596 0.07987970 0.0024806848
## 41 30 100 300 3 0.05825915 0.0089598303 0.06117074 0.0020034790
## 61 40 100 400 4 0.03966169 0.0043536309 0.03993927 0.0009735015
## 81 50 100 500 5 0.03028191 0.0038363450 0.02969747 0.0008578328
## 101 60 100 600 6 0.02581532 0.0024130983 0.02594087 0.0005395852
## 121 70 100 700 7 0.02203176 0.0019850490 0.02179919 0.0004438705
## 141 80 100 800 8 0.01903728 0.0013822504 0.01864631 0.0003090806
## 161 90 100 900 9 0.01631016 0.0005450121 0.01627551 0.0001218684
## 181 100 100 1000 10 0.01453930 0.0000000000 0.01453930 0.0000000000
## min.min min.max max.mean max.sd max.median max.se max.min
## 1 0.06536710 0.19336279 0.8598281 0.04676196 0.8501328 0.010456291 0.7861258
## 21 0.05791970 0.10233899 0.7270197 0.04065439 0.7303578 0.009090597 0.6439753
## 41 0.03669016 0.07242601 0.6050233 0.05144961 0.5948197 0.011504482 0.5386225
## 61 0.03200825 0.05010319 0.5606233 0.02683686 0.5606224 0.006000904 0.5089432
## 81 0.02357618 0.03609465 0.5058454 0.02833751 0.4987880 0.006336460 0.4682137
## 101 0.02033462 0.02954242 0.4593096 0.02170712 0.4544798 0.004853860 0.4236357
## 121 0.01906877 0.02678785 0.4255822 0.02072414 0.4292869 0.004634059 0.3811220
## 141 0.01719583 0.02208829 0.3928887 0.01439171 0.3941605 0.003218084 0.3591961
## 161 0.01526377 0.01767972 0.3577524 0.01054775 0.3544386 0.002358548 0.3458710
## 181 0.01453930 0.01453930 0.3306953 0.00000000 0.3306953 0.000000000 0.3306953
## max.max
## 1 0.9361766
## 21 0.8123514
## 41 0.7304035
## 61 0.6127437
## 81 0.5618276
## 101 0.5129718
## 121 0.4548082
## 141 0.4260502
## 161 0.3798020
## 181 0.3306953
1:20 (100 treatments, 2000 control units)
test.df2 <- getSimulatedData(ncontrol = 2000)
psranges2 <- psrange(test.df2, test.df2$treat, treat ~ ., samples = seq(100,
2000, by = 100), nboot = 20)
plot(psranges2)
summary(psranges2)
## p ntreat ncontrol ratio min.mean min.sd min.median min.se
## 1 5 100 100 1 0.108479153 3.478953e-02 0.113600581 7.779176e-03
## 21 10 100 200 2 0.046371411 1.434092e-02 0.044963287 3.206727e-03
## 41 15 100 300 3 0.031643058 8.312285e-03 0.030554662 1.858683e-03
## 61 20 100 400 4 0.017812919 3.780810e-03 0.017660947 8.454147e-04
## 81 25 100 500 5 0.014711230 2.496677e-03 0.015022455 5.582738e-04
## 101 30 100 600 6 0.010719571 2.578118e-03 0.009863200 5.764847e-04
## 121 35 100 700 7 0.009514228 1.588785e-03 0.009334077 3.552632e-04
## 141 40 100 800 8 0.007691382 1.171925e-03 0.007477029 2.620504e-04
## 161 45 100 900 9 0.006889727 1.222477e-03 0.006726188 2.733541e-04
## 181 50 100 1000 10 0.006276889 5.128093e-04 0.006246931 1.146676e-04
## 201 55 100 1100 11 0.005318209 5.985626e-04 0.005257637 1.338427e-04
## 221 60 100 1200 12 0.004854049 3.641814e-04 0.004806155 8.143343e-05
## 241 65 100 1300 13 0.004545931 3.242687e-04 0.004520731 7.250869e-05
## 261 70 100 1400 14 0.004114851 2.213926e-04 0.004099411 4.950490e-05
## 281 75 100 1500 15 0.003827730 3.163611e-04 0.003794601 7.074050e-05
## 301 80 100 1600 16 0.003565950 1.937171e-04 0.003520204 4.331645e-05
## 321 85 100 1700 17 0.003321481 1.553915e-04 0.003348171 3.474661e-05
## 341 90 100 1800 18 0.003153977 7.759093e-05 0.003159270 1.734986e-05
## 361 95 100 1900 19 0.002936686 5.616000e-05 0.002935480 1.255776e-05
## 381 100 100 2000 20 0.002768276 0.000000e+00 0.002768276 0.000000e+00
## min.min min.max max.mean max.sd max.median max.se
## 1 0.058393439 0.170711051 0.8819158 0.032569683 0.8876059 0.007282803
## 21 0.028012247 0.080533570 0.8039199 0.038136538 0.8047723 0.008527589
## 41 0.019685674 0.053076019 0.7255648 0.038281827 0.7324588 0.008560077
## 61 0.012532975 0.025846915 0.6973759 0.039330036 0.6887384 0.008794463
## 81 0.009046485 0.020133140 0.6381940 0.032865186 0.6328090 0.007348879
## 101 0.008257497 0.019303295 0.6197714 0.029943565 0.6216856 0.006695585
## 121 0.006840465 0.013738017 0.5788401 0.029811196 0.5822845 0.006665986
## 141 0.005879615 0.011053336 0.5589409 0.043400845 0.5554842 0.009704724
## 161 0.005364998 0.011462546 0.5217990 0.027818443 0.5244945 0.006220393
## 181 0.005345154 0.007100721 0.4896128 0.022943577 0.4909629 0.005130340
## 201 0.004317742 0.006380071 0.4837967 0.023518184 0.4848799 0.005258826
## 221 0.004328371 0.005446223 0.4630265 0.022398047 0.4599904 0.005008355
## 241 0.004046610 0.005226542 0.4414052 0.023000235 0.4404226 0.005143009
## 261 0.003657161 0.004503838 0.4322617 0.021211389 0.4340543 0.004743011
## 281 0.003223369 0.004460980 0.4089992 0.015626515 0.4098423 0.003494195
## 301 0.003307291 0.003902424 0.3932721 0.015087435 0.3914948 0.003373653
## 321 0.002975258 0.003588740 0.3851770 0.011768615 0.3860391 0.002631542
## 341 0.003005001 0.003287277 0.3760762 0.009748364 0.3764867 0.002179801
## 361 0.002838330 0.003063846 0.3659962 0.003577472 0.3653889 0.000799947
## 381 0.002768276 0.002768276 0.3550118 0.000000000 0.3550118 0.000000000
## max.min max.max
## 1 0.8071012 0.9270047
## 21 0.7323388 0.9063155
## 41 0.6191583 0.7766993
## 61 0.5878796 0.7694392
## 81 0.5915755 0.6922287
## 101 0.5586192 0.6855857
## 121 0.5269503 0.6250264
## 141 0.4901740 0.6524009
## 161 0.4480063 0.5559810
## 181 0.4391822 0.5360344
## 201 0.4462260 0.5317380
## 221 0.4234709 0.5015449
## 241 0.3962534 0.4878237
## 261 0.3966437 0.4709061
## 281 0.3779369 0.4337704
## 301 0.3746633 0.4216166
## 321 0.3634942 0.4050216
## 341 0.3531399 0.3906495
## 361 0.3578767 0.3711270
## 381 0.3550118 0.3550118
100 treatments, 1000 control units, equal means and standard deviations
test.df3 <- getSimulatedData(ncontrol = 1000, treat.mean = 0.5, control.mean = 0.5)
psranges3 <- psrange(test.df3, test.df3$treat, treat ~ ., samples = seq(100,
1000, by = 100), nboot = 20)
plot(psranges3)
summary(psranges3)
## p ntreat ncontrol ratio min.mean min.sd min.median min.se
## 1 10 100 100 1 0.28350601 0.064087988 0.28885238 0.0143305097
## 21 20 100 200 2 0.16705507 0.031529985 0.16046048 0.0070503191
## 41 30 100 300 3 0.12344740 0.019802371 0.12273928 0.0044279448
## 61 40 100 400 4 0.09522531 0.011914719 0.09279009 0.0026642122
## 81 50 100 500 5 0.07563025 0.008418365 0.07552338 0.0018824037
## 101 60 100 600 6 0.06076604 0.008521491 0.06124068 0.0019054633
## 121 70 100 700 7 0.05269255 0.005924987 0.05247310 0.0013248673
## 141 80 100 800 8 0.04586865 0.002395134 0.04624844 0.0005355682
## 161 90 100 900 9 0.04048884 0.002059943 0.04014488 0.0004606172
## 181 100 100 1000 10 0.03634508 0.000000000 0.03634508 0.0000000000
## min.min min.max max.mean max.sd max.median max.se max.min
## 1 0.16192908 0.37939385 0.7132598 0.04361440 0.7129799 0.009752476 0.6394674
## 21 0.10563283 0.22609654 0.5202766 0.04031786 0.5161472 0.009015347 0.4561985
## 41 0.09125843 0.16864202 0.4143754 0.02875963 0.4181810 0.006430849 0.3529161
## 61 0.07550229 0.12048280 0.3532632 0.02511123 0.3510016 0.005615041 0.3137257
## 81 0.06256572 0.09642339 0.2941406 0.02138895 0.2973524 0.004782715 0.2463536
## 101 0.05119536 0.08325665 0.2668781 0.01733506 0.2681488 0.003876236 0.2279083
## 121 0.04216244 0.06796214 0.2363525 0.01411165 0.2348300 0.003155462 0.2141914
## 141 0.04222852 0.05135097 0.2190724 0.01032624 0.2199282 0.002309018 0.2030472
## 161 0.03701920 0.04386187 0.1978531 0.00701747 0.1986408 0.001569154 0.1863798
## 181 0.03634508 0.03634508 0.1835126 0.00000000 0.1835126 0.000000000 0.1835126
## max.max
## 1 0.7816167
## 21 0.6045684
## 41 0.4675561
## 61 0.3965734
## 81 0.3318606
## 101 0.2927092
## 121 0.2619002
## 141 0.2443409
## 161 0.2090405
## 181 0.1835126
100 treatments, 1000 control units, very little overlap
test.df4 <- getSimulatedData(ncontrol = 1000, treat.mean = 0.25, treat.sd = 0.3,
control.mean = 0.75, control.sd = 0.3)
psranges4 <- psrange(test.df4, test.df4$treat, treat ~ ., samples = seq(100,
1000, by = 100), nboot = 20)
plot(psranges4)
summary(psranges4)
## p ntreat ncontrol ratio min.mean min.sd min.median
## 1 10 100 100 1 1.525384e-05 2.837735e-05 2.299369e-06
## 21 20 100 200 2 2.386395e-06 4.263708e-06 6.796030e-07
## 41 30 100 300 3 7.854093e-07 1.369501e-06 3.322104e-07
## 61 40 100 400 4 1.139238e-07 9.950192e-08 9.888239e-08
## 81 50 100 500 5 4.970589e-08 7.085309e-08 1.910629e-08
## 101 60 100 600 6 9.456041e-08 8.805110e-08 6.876480e-08
## 121 70 100 700 7 4.562272e-08 3.277532e-08 4.271103e-08
## 141 80 100 800 8 4.834941e-08 1.732541e-08 4.683829e-08
## 161 90 100 900 9 3.734849e-08 1.141374e-08 3.473544e-08
## 181 100 100 1000 10 3.302817e-08 0.000000e+00 3.302817e-08
## min.se min.min min.max max.mean max.sd max.median
## 1 6.345368e-06 8.242210e-10 1.189959e-04 0.9999919 1.177109e-05 0.9999975
## 21 9.533941e-07 4.512240e-09 1.459366e-05 0.9999698 4.357035e-05 0.9999873
## 41 3.062298e-07 9.339904e-09 5.821350e-06 0.9999734 2.437621e-05 0.9999837
## 61 2.224931e-08 5.006341e-09 3.452425e-07 0.9999718 2.790507e-05 0.9999803
## 81 1.584323e-08 1.753035e-09 3.004791e-07 0.9999730 3.183032e-05 0.9999857
## 101 1.968882e-08 1.862385e-08 4.026231e-07 0.9999570 1.988995e-05 0.9999546
## 121 7.328785e-09 1.690112e-09 1.093181e-07 0.9999481 3.208612e-05 0.9999523
## 141 3.874079e-09 1.740209e-08 9.088383e-08 0.9999326 1.783491e-05 0.9999357
## 161 2.552189e-09 1.706567e-08 5.651540e-08 0.9999274 1.839133e-05 0.9999295
## 181 0.000000e+00 3.302817e-08 3.302817e-08 0.9999211 0.000000e+00 0.9999211
## max.se max.min max.max
## 1 2.632095e-06 0.9999539 1.0000000
## 21 9.742627e-06 0.9998202 0.9999998
## 41 5.450686e-06 0.9999165 0.9999973
## 61 6.239764e-06 0.9998793 0.9999976
## 81 7.117475e-06 0.9998587 0.9999988
## 101 4.447529e-06 0.9999103 0.9999906
## 121 7.174675e-06 0.9998882 0.9999962
## 141 3.988007e-06 0.9998969 0.9999700
## 161 4.112427e-06 0.9998986 0.9999602
## 181 0.000000e+00 0.9999211 0.9999211
100 treat, 1000 control, 10 covariates
test.df5 <- getSimulatedData(nvars = 10, ntreat = 100, ncontrol = 1000)
psranges5 <- psrange(test.df5, test.df5$treat, treat ~ ., samples = seq(100,
1000, by = 100), nboot = 20)
plot(psranges5)
summary(psranges5)
## p ntreat ncontrol ratio min.mean min.sd min.median
## 1 10 100 100 1 0.0101680007 5.571726e-03 0.0087507018
## 21 20 100 200 2 0.0032786858 1.986855e-03 0.0031712876
## 41 30 100 300 3 0.0016998723 8.777415e-04 0.0014178615
## 61 40 100 400 4 0.0012710429 7.476010e-04 0.0010189156
## 81 50 100 500 5 0.0009891488 3.613101e-04 0.0009233698
## 101 60 100 600 6 0.0006954162 1.699134e-04 0.0006825701
## 121 70 100 700 7 0.0005748031 1.098981e-04 0.0005927955
## 141 80 100 800 8 0.0005241338 7.766948e-05 0.0005085729
## 161 90 100 900 9 0.0004511794 4.957160e-05 0.0004584537
## 181 100 100 1000 10 0.0004033341 0.000000e+00 0.0004033341
## min.se min.min min.max max.mean max.sd max.median
## 1 1.245876e-03 0.0034055594 0.0218076912 0.9797889 0.007606976 0.9810769
## 21 4.442742e-04 0.0010920758 0.0092490802 0.9667801 0.012485108 0.9700905
## 41 1.962690e-04 0.0007561575 0.0045260935 0.9539731 0.013264618 0.9562029
## 61 1.671687e-04 0.0006177711 0.0040260207 0.9394156 0.010526497 0.9424796
## 81 8.079140e-05 0.0005352006 0.0019389389 0.9221930 0.014294291 0.9217427
## 101 3.799379e-05 0.0002784764 0.0009768850 0.9113210 0.017183853 0.9104470
## 121 2.457395e-05 0.0004073415 0.0007375938 0.9032366 0.010021744 0.9045313
## 141 1.736742e-05 0.0003929492 0.0006895618 0.8848322 0.011660804 0.8843501
## 161 1.108455e-05 0.0003240626 0.0005327837 0.8735074 0.008826874 0.8708373
## 181 0.000000e+00 0.0004033341 0.0004033341 0.8639081 0.000000000 0.8639081
## max.se max.min max.max
## 1 0.001700972 0.9612616 0.9912510
## 21 0.002791755 0.9357162 0.9860199
## 41 0.002966059 0.9106519 0.9720211
## 61 0.002353796 0.9129030 0.9566536
## 81 0.003196301 0.8929029 0.9531369
## 101 0.003842426 0.8850449 0.9491578
## 121 0.002240930 0.8856435 0.9235878
## 141 0.002607435 0.8583591 0.9084171
## 161 0.001973749 0.8641812 0.8983841
## 181 0.000000000 0.8639081 0.8639081