B Propensity Score Ranges

With regard to propensity score ranges, the range tends to shrink as the ratio of treatment-to-control increases. Figure 20 depicts the range and distribution of propensity scores (using logistic regression) with varying treatment-to-control ratios. The data used to create this figure is simulated and available in Appendix K. The psrange and plot.psrange functions are included in the multilevelPSA R package. Propensity scores are estimated with a single covariate where the mean for the treatment and control are 0.6 and 0.4, respectively. The standard deviation for both is 0.4. There are 100 treatment units and 1,000 control units simulated. The goal in choosing these means and standard deviations is to have some separation between treatment and control. Each row in the figure represents the percentage of control units sampled before estimating the propensity scores, starting with 100% (i.e. all 1,000 control units) to 10% (100 of the control units). As the figure shows, as the ratio decreases to where there are equal treatment and control units, the range of the propensity scores becomes more normal. To calculate the ranges, each sampling step is bootstrapped so the green bar and black points represent each of the 20 bootstrap samples taken. The bars then represent the mean of the minimum and mean of the maximum for each step.

The “shrinking” of propensity score ranges as the ratio of treatment-to-control increases has implications for the interpretation of propensity scores. Typically, propensity scores are interpreted as the probability of being in the treatment. For studies where the number of treatment and control units are roughly equal, this interpretation is valid. However, in cases where the ratio of treatment-to-control is large, it best to simply interpret the propensity scores as adjustment scores and not probabilities. Since the matching and stratification procedures utilize standard scores (i.e. the propensity score divided by the standard deviation of the propensity scores), should only impact interpretation of the propensity scores and should not impact on the estimated treatment e↵ects. It appears this issue has not been explored in either the PSA or logistic regression literature and additional exploration of the topic appears to be warranted.

library(multilevelPSA)
getSimulatedData <- function(nvars = 3, ntreat = 100, treat.mean = 0.6, treat.sd = 0.5, 
    ncontrol = 1000, control.mean = 0.4, control.sd = 0.5) {
    if (length(treat.mean) == 1) {
        treat.mean = rep(treat.mean, nvars)
    }
    if (length(treat.sd) == 1) {
        treat.sd = rep(treat.sd, nvars)
    }
    if (length(control.mean) == 1) {
        control.mean = rep(control.mean, nvars)
    }
    if (length(control.sd) == 1) {
        control.sd = rep(control.sd, nvars)
    }
    
    df <- c(rep(0, ncontrol), rep(1, ntreat))
    for (i in 1:nvars) {
        df <- cbind(df, c(rnorm(ncontrol, mean = control.mean[1], sd = control.sd[1]), 
            rnorm(ntreat, mean = treat.mean[1], sd = treat.sd[1])))
    }
    df <- as.data.frame(df)
    names(df) <- c("treat", letters[1:nvars])
    return(df)
}

1:10 (100 treatments, 1000 control units)

test.df1 <- getSimulatedData(ntreat = 100, ncontrol = 1000)
psranges1 <- psrange(test.df1, test.df1$treat, treat ~ ., samples = seq(100, 
    1000, by = 100), nboot = 20)
plot(psranges1)

summary(psranges1)

##       p ntreat ncontrol ratio   min.mean       min.sd min.median       min.se
## 1    10    100      100     1 0.15158085 0.0399247330 0.16675914 0.0089274417
## 21   20    100      200     2 0.08065862 0.0110939596 0.07987970 0.0024806848
## 41   30    100      300     3 0.05825915 0.0089598303 0.06117074 0.0020034790
## 61   40    100      400     4 0.03966169 0.0043536309 0.03993927 0.0009735015
## 81   50    100      500     5 0.03028191 0.0038363450 0.02969747 0.0008578328
## 101  60    100      600     6 0.02581532 0.0024130983 0.02594087 0.0005395852
## 121  70    100      700     7 0.02203176 0.0019850490 0.02179919 0.0004438705
## 141  80    100      800     8 0.01903728 0.0013822504 0.01864631 0.0003090806
## 161  90    100      900     9 0.01631016 0.0005450121 0.01627551 0.0001218684
## 181 100    100     1000    10 0.01453930 0.0000000000 0.01453930 0.0000000000
##        min.min    min.max  max.mean     max.sd max.median      max.se   max.min
## 1   0.06536710 0.19336279 0.8598281 0.04676196  0.8501328 0.010456291 0.7861258
## 21  0.05791970 0.10233899 0.7270197 0.04065439  0.7303578 0.009090597 0.6439753
## 41  0.03669016 0.07242601 0.6050233 0.05144961  0.5948197 0.011504482 0.5386225
## 61  0.03200825 0.05010319 0.5606233 0.02683686  0.5606224 0.006000904 0.5089432
## 81  0.02357618 0.03609465 0.5058454 0.02833751  0.4987880 0.006336460 0.4682137
## 101 0.02033462 0.02954242 0.4593096 0.02170712  0.4544798 0.004853860 0.4236357
## 121 0.01906877 0.02678785 0.4255822 0.02072414  0.4292869 0.004634059 0.3811220
## 141 0.01719583 0.02208829 0.3928887 0.01439171  0.3941605 0.003218084 0.3591961
## 161 0.01526377 0.01767972 0.3577524 0.01054775  0.3544386 0.002358548 0.3458710
## 181 0.01453930 0.01453930 0.3306953 0.00000000  0.3306953 0.000000000 0.3306953
##       max.max
## 1   0.9361766
## 21  0.8123514
## 41  0.7304035
## 61  0.6127437
## 81  0.5618276
## 101 0.5129718
## 121 0.4548082
## 141 0.4260502
## 161 0.3798020
## 181 0.3306953

1:20 (100 treatments, 2000 control units)

test.df2 <- getSimulatedData(ncontrol = 2000)
psranges2 <- psrange(test.df2, test.df2$treat, treat ~ ., samples = seq(100, 
    2000, by = 100), nboot = 20)
plot(psranges2)

summary(psranges2)

##       p ntreat ncontrol ratio    min.mean       min.sd  min.median       min.se
## 1     5    100      100     1 0.108479153 3.478953e-02 0.113600581 7.779176e-03
## 21   10    100      200     2 0.046371411 1.434092e-02 0.044963287 3.206727e-03
## 41   15    100      300     3 0.031643058 8.312285e-03 0.030554662 1.858683e-03
## 61   20    100      400     4 0.017812919 3.780810e-03 0.017660947 8.454147e-04
## 81   25    100      500     5 0.014711230 2.496677e-03 0.015022455 5.582738e-04
## 101  30    100      600     6 0.010719571 2.578118e-03 0.009863200 5.764847e-04
## 121  35    100      700     7 0.009514228 1.588785e-03 0.009334077 3.552632e-04
## 141  40    100      800     8 0.007691382 1.171925e-03 0.007477029 2.620504e-04
## 161  45    100      900     9 0.006889727 1.222477e-03 0.006726188 2.733541e-04
## 181  50    100     1000    10 0.006276889 5.128093e-04 0.006246931 1.146676e-04
## 201  55    100     1100    11 0.005318209 5.985626e-04 0.005257637 1.338427e-04
## 221  60    100     1200    12 0.004854049 3.641814e-04 0.004806155 8.143343e-05
## 241  65    100     1300    13 0.004545931 3.242687e-04 0.004520731 7.250869e-05
## 261  70    100     1400    14 0.004114851 2.213926e-04 0.004099411 4.950490e-05
## 281  75    100     1500    15 0.003827730 3.163611e-04 0.003794601 7.074050e-05
## 301  80    100     1600    16 0.003565950 1.937171e-04 0.003520204 4.331645e-05
## 321  85    100     1700    17 0.003321481 1.553915e-04 0.003348171 3.474661e-05
## 341  90    100     1800    18 0.003153977 7.759093e-05 0.003159270 1.734986e-05
## 361  95    100     1900    19 0.002936686 5.616000e-05 0.002935480 1.255776e-05
## 381 100    100     2000    20 0.002768276 0.000000e+00 0.002768276 0.000000e+00
##         min.min     min.max  max.mean      max.sd max.median      max.se
## 1   0.058393439 0.170711051 0.8819158 0.032569683  0.8876059 0.007282803
## 21  0.028012247 0.080533570 0.8039199 0.038136538  0.8047723 0.008527589
## 41  0.019685674 0.053076019 0.7255648 0.038281827  0.7324588 0.008560077
## 61  0.012532975 0.025846915 0.6973759 0.039330036  0.6887384 0.008794463
## 81  0.009046485 0.020133140 0.6381940 0.032865186  0.6328090 0.007348879
## 101 0.008257497 0.019303295 0.6197714 0.029943565  0.6216856 0.006695585
## 121 0.006840465 0.013738017 0.5788401 0.029811196  0.5822845 0.006665986
## 141 0.005879615 0.011053336 0.5589409 0.043400845  0.5554842 0.009704724
## 161 0.005364998 0.011462546 0.5217990 0.027818443  0.5244945 0.006220393
## 181 0.005345154 0.007100721 0.4896128 0.022943577  0.4909629 0.005130340
## 201 0.004317742 0.006380071 0.4837967 0.023518184  0.4848799 0.005258826
## 221 0.004328371 0.005446223 0.4630265 0.022398047  0.4599904 0.005008355
## 241 0.004046610 0.005226542 0.4414052 0.023000235  0.4404226 0.005143009
## 261 0.003657161 0.004503838 0.4322617 0.021211389  0.4340543 0.004743011
## 281 0.003223369 0.004460980 0.4089992 0.015626515  0.4098423 0.003494195
## 301 0.003307291 0.003902424 0.3932721 0.015087435  0.3914948 0.003373653
## 321 0.002975258 0.003588740 0.3851770 0.011768615  0.3860391 0.002631542
## 341 0.003005001 0.003287277 0.3760762 0.009748364  0.3764867 0.002179801
## 361 0.002838330 0.003063846 0.3659962 0.003577472  0.3653889 0.000799947
## 381 0.002768276 0.002768276 0.3550118 0.000000000  0.3550118 0.000000000
##       max.min   max.max
## 1   0.8071012 0.9270047
## 21  0.7323388 0.9063155
## 41  0.6191583 0.7766993
## 61  0.5878796 0.7694392
## 81  0.5915755 0.6922287
## 101 0.5586192 0.6855857
## 121 0.5269503 0.6250264
## 141 0.4901740 0.6524009
## 161 0.4480063 0.5559810
## 181 0.4391822 0.5360344
## 201 0.4462260 0.5317380
## 221 0.4234709 0.5015449
## 241 0.3962534 0.4878237
## 261 0.3966437 0.4709061
## 281 0.3779369 0.4337704
## 301 0.3746633 0.4216166
## 321 0.3634942 0.4050216
## 341 0.3531399 0.3906495
## 361 0.3578767 0.3711270
## 381 0.3550118 0.3550118

100 treatments, 1000 control units, equal means and standard deviations

test.df3 <- getSimulatedData(ncontrol = 1000, treat.mean = 0.5, control.mean = 0.5)
psranges3 <- psrange(test.df3, test.df3$treat, treat ~ ., samples = seq(100, 
    1000, by = 100), nboot = 20)
plot(psranges3)

summary(psranges3)

##       p ntreat ncontrol ratio   min.mean      min.sd min.median       min.se
## 1    10    100      100     1 0.28350601 0.064087988 0.28885238 0.0143305097
## 21   20    100      200     2 0.16705507 0.031529985 0.16046048 0.0070503191
## 41   30    100      300     3 0.12344740 0.019802371 0.12273928 0.0044279448
## 61   40    100      400     4 0.09522531 0.011914719 0.09279009 0.0026642122
## 81   50    100      500     5 0.07563025 0.008418365 0.07552338 0.0018824037
## 101  60    100      600     6 0.06076604 0.008521491 0.06124068 0.0019054633
## 121  70    100      700     7 0.05269255 0.005924987 0.05247310 0.0013248673
## 141  80    100      800     8 0.04586865 0.002395134 0.04624844 0.0005355682
## 161  90    100      900     9 0.04048884 0.002059943 0.04014488 0.0004606172
## 181 100    100     1000    10 0.03634508 0.000000000 0.03634508 0.0000000000
##        min.min    min.max  max.mean     max.sd max.median      max.se   max.min
## 1   0.16192908 0.37939385 0.7132598 0.04361440  0.7129799 0.009752476 0.6394674
## 21  0.10563283 0.22609654 0.5202766 0.04031786  0.5161472 0.009015347 0.4561985
## 41  0.09125843 0.16864202 0.4143754 0.02875963  0.4181810 0.006430849 0.3529161
## 61  0.07550229 0.12048280 0.3532632 0.02511123  0.3510016 0.005615041 0.3137257
## 81  0.06256572 0.09642339 0.2941406 0.02138895  0.2973524 0.004782715 0.2463536
## 101 0.05119536 0.08325665 0.2668781 0.01733506  0.2681488 0.003876236 0.2279083
## 121 0.04216244 0.06796214 0.2363525 0.01411165  0.2348300 0.003155462 0.2141914
## 141 0.04222852 0.05135097 0.2190724 0.01032624  0.2199282 0.002309018 0.2030472
## 161 0.03701920 0.04386187 0.1978531 0.00701747  0.1986408 0.001569154 0.1863798
## 181 0.03634508 0.03634508 0.1835126 0.00000000  0.1835126 0.000000000 0.1835126
##       max.max
## 1   0.7816167
## 21  0.6045684
## 41  0.4675561
## 61  0.3965734
## 81  0.3318606
## 101 0.2927092
## 121 0.2619002
## 141 0.2443409
## 161 0.2090405
## 181 0.1835126

100 treatments, 1000 control units, very little overlap

test.df4 <- getSimulatedData(ncontrol = 1000, treat.mean = 0.25, treat.sd = 0.3, 
    control.mean = 0.75, control.sd = 0.3)
psranges4 <- psrange(test.df4, test.df4$treat, treat ~ ., samples = seq(100, 
    1000, by = 100), nboot = 20)
plot(psranges4)

summary(psranges4)

##       p ntreat ncontrol ratio     min.mean       min.sd   min.median
## 1    10    100      100     1 1.525384e-05 2.837735e-05 2.299369e-06
## 21   20    100      200     2 2.386395e-06 4.263708e-06 6.796030e-07
## 41   30    100      300     3 7.854093e-07 1.369501e-06 3.322104e-07
## 61   40    100      400     4 1.139238e-07 9.950192e-08 9.888239e-08
## 81   50    100      500     5 4.970589e-08 7.085309e-08 1.910629e-08
## 101  60    100      600     6 9.456041e-08 8.805110e-08 6.876480e-08
## 121  70    100      700     7 4.562272e-08 3.277532e-08 4.271103e-08
## 141  80    100      800     8 4.834941e-08 1.732541e-08 4.683829e-08
## 161  90    100      900     9 3.734849e-08 1.141374e-08 3.473544e-08
## 181 100    100     1000    10 3.302817e-08 0.000000e+00 3.302817e-08
##           min.se      min.min      min.max  max.mean       max.sd max.median
## 1   6.345368e-06 8.242210e-10 1.189959e-04 0.9999919 1.177109e-05  0.9999975
## 21  9.533941e-07 4.512240e-09 1.459366e-05 0.9999698 4.357035e-05  0.9999873
## 41  3.062298e-07 9.339904e-09 5.821350e-06 0.9999734 2.437621e-05  0.9999837
## 61  2.224931e-08 5.006341e-09 3.452425e-07 0.9999718 2.790507e-05  0.9999803
## 81  1.584323e-08 1.753035e-09 3.004791e-07 0.9999730 3.183032e-05  0.9999857
## 101 1.968882e-08 1.862385e-08 4.026231e-07 0.9999570 1.988995e-05  0.9999546
## 121 7.328785e-09 1.690112e-09 1.093181e-07 0.9999481 3.208612e-05  0.9999523
## 141 3.874079e-09 1.740209e-08 9.088383e-08 0.9999326 1.783491e-05  0.9999357
## 161 2.552189e-09 1.706567e-08 5.651540e-08 0.9999274 1.839133e-05  0.9999295
## 181 0.000000e+00 3.302817e-08 3.302817e-08 0.9999211 0.000000e+00  0.9999211
##           max.se   max.min   max.max
## 1   2.632095e-06 0.9999539 1.0000000
## 21  9.742627e-06 0.9998202 0.9999998
## 41  5.450686e-06 0.9999165 0.9999973
## 61  6.239764e-06 0.9998793 0.9999976
## 81  7.117475e-06 0.9998587 0.9999988
## 101 4.447529e-06 0.9999103 0.9999906
## 121 7.174675e-06 0.9998882 0.9999962
## 141 3.988007e-06 0.9998969 0.9999700
## 161 4.112427e-06 0.9998986 0.9999602
## 181 0.000000e+00 0.9999211 0.9999211

100 treat, 1000 control, 10 covariates

test.df5 <- getSimulatedData(nvars = 10, ntreat = 100, ncontrol = 1000)
psranges5 <- psrange(test.df5, test.df5$treat, treat ~ ., samples = seq(100, 
    1000, by = 100), nboot = 20)
plot(psranges5)

summary(psranges5)

##       p ntreat ncontrol ratio     min.mean       min.sd   min.median
## 1    10    100      100     1 0.0101680007 5.571726e-03 0.0087507018
## 21   20    100      200     2 0.0032786858 1.986855e-03 0.0031712876
## 41   30    100      300     3 0.0016998723 8.777415e-04 0.0014178615
## 61   40    100      400     4 0.0012710429 7.476010e-04 0.0010189156
## 81   50    100      500     5 0.0009891488 3.613101e-04 0.0009233698
## 101  60    100      600     6 0.0006954162 1.699134e-04 0.0006825701
## 121  70    100      700     7 0.0005748031 1.098981e-04 0.0005927955
## 141  80    100      800     8 0.0005241338 7.766948e-05 0.0005085729
## 161  90    100      900     9 0.0004511794 4.957160e-05 0.0004584537
## 181 100    100     1000    10 0.0004033341 0.000000e+00 0.0004033341
##           min.se      min.min      min.max  max.mean      max.sd max.median
## 1   1.245876e-03 0.0034055594 0.0218076912 0.9797889 0.007606976  0.9810769
## 21  4.442742e-04 0.0010920758 0.0092490802 0.9667801 0.012485108  0.9700905
## 41  1.962690e-04 0.0007561575 0.0045260935 0.9539731 0.013264618  0.9562029
## 61  1.671687e-04 0.0006177711 0.0040260207 0.9394156 0.010526497  0.9424796
## 81  8.079140e-05 0.0005352006 0.0019389389 0.9221930 0.014294291  0.9217427
## 101 3.799379e-05 0.0002784764 0.0009768850 0.9113210 0.017183853  0.9104470
## 121 2.457395e-05 0.0004073415 0.0007375938 0.9032366 0.010021744  0.9045313
## 141 1.736742e-05 0.0003929492 0.0006895618 0.8848322 0.011660804  0.8843501
## 161 1.108455e-05 0.0003240626 0.0005327837 0.8735074 0.008826874  0.8708373
## 181 0.000000e+00 0.0004033341 0.0004033341 0.8639081 0.000000000  0.8639081
##          max.se   max.min   max.max
## 1   0.001700972 0.9612616 0.9912510
## 21  0.002791755 0.9357162 0.9860199
## 41  0.002966059 0.9106519 0.9720211
## 61  0.002353796 0.9129030 0.9566536
## 81  0.003196301 0.8929029 0.9531369
## 101 0.003842426 0.8850449 0.9491578
## 121 0.002240930 0.8856435 0.9235878
## 141 0.002607435 0.8583591 0.9084171
## 161 0.001973749 0.8641812 0.8983841
## 181 0.000000000 0.8639081 0.8639081

A Shiny Applications

C Methods for Estimating Propensity Scores