10 Effects of Tutoring on Course Grades

In the first example we will utilize observational data obtained to evaluate the effectiveness of tutoring services on course grades. Treatment students consisted of those students who used tutoring services while enrolled in a online writing course between 2008 and 2011. A comparison group was identified as students enrolled in a course section with a student who used tutoring services. The treatment group was then divided into two based upon the number of times they utilized tutoring services. “Novice” users are those who used the services once and “regular” users are those who used services two or more times. Covariates available for estimating propensity scores are gender, ethnicity, military status, English second language learner, educational level for mother and father, age at the beginning of the course, employment level at college enrollment, income level at college enrollment, number of transfer credits, and GPA at the start of the course.

names(tutoring)
##  [1] "treat"      "Course"     "Grade"      "Gender"     "Ethnicity" 
##  [6] "Military"   "ESL"        "EdMother"   "EdFather"   "Age"       
## [11] "Employment" "Income"     "Transfer"   "GPA"        "GradeCode" 
## [16] "Level"      "ID"         "treat2"

The courses represented here are structured such that the variation from section-to-section is minimal. However, the differences between courses is substantial and therefore we will utilize partial exact matching so that all matched students will have taken the same course.

table(tutoring$treat, tutoring$Course, useNA="ifany")
##          
##           ENG*101 ENG*201 HSC*310
##   Control     349     518      51
##   Treat1       22      36      76
##   Treat2       31      32      27

The first step of analysis is to estimate the propensity scores. The trips function will estimate three propensity score models, \(PS_1\), \(PS_2\), and \(PS_3\) as described above. Note that when specifying the formula the dependent variable, or treatment indicator, is not included. The trips function will replace the dependent variable as it estimates the three logistic regression models.

formu <- ~ Gender + Ethnicity + Military + ESL + EdMother + EdFather + 
Age + Employment + Income + Transfer + GPA
tutoring.tpsa <- trips(tutoring, tutoring$treat, formu)

Figure 10.1 is a triangle plot that depicts the propensity scores from the three models. Since each student has two propensity scores, their scores are connected with a line. The black line in Figure 10.1 represents one matched triplet estimated below.

plot(tutoring.tpsa)

The default for trimatch is to use the maximumTreat method retaining each treatment unit once with treatment one units matched more than once only if the corresponding treatment two unit would not be matched otherwise.

tutoring.matched <- trimatch(tutoring.tpsa, exact=tutoring[,c("Course")])

Setting the method parameter to NULL will result in caliper matching. All matched triplets within the specified caliper are retained. This will result in the largest number of matched triplets.

tutoring.matched.caliper <- trimatch(tutoring.tpsa, 
exact=tutoring[,c("Course")], method=NULL)

Lastly, we will use the OneToN method to retain a 2-to-1-to-n and 3-to-2-n matches.

tutoring.matched.2to1 <- trimatch(tutoring.tpsa, 
exact=tutoring[,c("Course")], method=OneToN, M1=2, M2=1)
tutoring.matched.3to2 <- trimatch(tutoring.tpsa, 
exact=tutoring[,c("Course")], 
method=OneToN, M1=3, M2=2)
print(plot(tutoring.matched, rows=c(50), line.alpha=1, draw.segments=TRUE))
Traingle Plot

Figure 10.1: Traingle Plot

10.1 Examining Unmatched Students

The different methods for retaining matched triplets address the issue of overrepresentation of treatment units. In this example there four times as many control units as treatment units (the ratio is larger when considering the treatments separately). These methods fall on a spectrum where each treatment unit is used minimally (maximumTreat method) or all units are used (caliper matching). (Rosenbaum 2012) suggests testing hypothesis more than once and it is our general recommendation to utilize multiple methods. Functions to help present and compare the results from multiple methods are provided and discussed below.

The unmatched function will return the rows of students who were not matched. The summary function will provide information about how many students within each group were not matched. As shown below, the caliper matching will match the most students. In this particular example, in fact, the only substantial difference in the unmatched students is with the control group. All methods fail to match 37 treatment one students. This is due to the fact that there is not another student within the specified caliper that match exactly on the course.

summary(unmatched(tutoring.matched))
## 892 (78.1%) of 1142 total data points were not matched.
## Unmatched by treatment:
##     Control      Treat1      Treat2 
## 834 (90.8%)  39 (29.1%)  19 (21.1%)
summary(unmatched(tutoring.matched.caliper))
## 638 (55.9%) of 1142 total data points were not matched.
## Unmatched by treatment:
##     Control      Treat1      Treat2 
## 580 (63.2%)  39 (29.1%)  19 (21.1%)
summary(unmatched(tutoring.matched.2to1))
## 1000 (87.6%) of 1142 total data points were not matched.
## Unmatched by treatment:
##     Control      Treat1      Treat2 
## 870 (94.8%)  97 (72.4%)  33 (36.7%)
summary(unmatched(tutoring.matched.3to2))
## 928 (81.3%) of 1142 total data points were not matched.
## Unmatched by treatment:
##     Control      Treat1      Treat2 
## 831 (90.5%)    75 (56%)  22 (24.4%)

10.2 Checking Balance

The eventual strength of propensity score methods is dependent on how well balance is achieved. (J. E. Helmreich and Pruzek 2009) introduced graphical approaches to evaluating balance. We provide functions that extend that framework to matching of three groups. Figure 10.2 is a multiple covariate balance plot that plots the absolute effect size of each covariate before and after adjustment. In this example, the figure suggests that reasonable balance has been achieved across all covariates and across all three models since effect sizes are smaller than the unadjusted in most cases and relatively small.

print(multibalance.plot(tutoring.tpsa) + ggtitle("Covariate Balance Plot"))
Multiple Covariate Balance Plot of Absolute Standardized Effect Sizes Before and After Propensity Score Adjustment

Figure 10.2: Multiple Covariate Balance Plot of Absolute Standardized Effect Sizes Before and After Propensity Score Adjustment

Figure 10.3 is the results of the balance.plot function. This function will provide a bar chart for categorical covariates and box plots for quantitative covariates, individually or in a grid.

bplots <- balance.plot(tutoring.matched, tutoring[,all.vars(formu)], 
        legend.position="none", x.axis.labels=c("C","T1","T1"), x.axis.angle=0)
print(plot(bplots, cols=3, byrow=FALSE))
Covariate Balance Plots

Figure 10.3: Covariate Balance Plots

## NULL

10.3 Phase II: Estimating Effects of Tutoring on Course Grades

In phase two of propensity score analysis we wish to compare our outcome of interest, course grade in this example, across the matches. A custom merge function is provided to merge an outcome from the original data frame to the results of trimatch. This merge function will add three columns with the outcome for each of the three groups.

matched.out <- merge(tutoring.matched, tutoring$Grade)
names(matched.out)
##  [1] "Treat1"      "Treat2"      "Control"     "D.m3"        "D.m2"       
##  [6] "D.m1"        "Dtotal"      "Treat1.out"  "Treat2.out"  "Control.out"
head(matched.out)
##   Treat1 Treat2 Control        D.m3        D.m2         D.m1     Dtotal
## 1    368     39     331 0.007053754 0.001788577 1.039322e-02 0.01923555
## 2    158    279     365 0.003373585 0.009530680 1.071188e-02 0.02361614
## 3    899    209     100 0.001929173 0.013633300 9.183572e-03 0.02474604
## 4    692    596    1055 0.023785086 0.010292177 1.862058e-03 0.03593932
## 5    616    209     208 0.020203398 0.016562521 3.168865e-05 0.03679761
## 6     28    852     154 0.007501996 0.014214100 1.776640e-02 0.03948249
##   Treat1.out Treat2.out Control.out
## 1          4          4           0
## 2          4          4           4
## 3          4          3           4
## 4          4          3           4
## 5          4          3           0
## 6          4          4           2

Although the merge function is convenient for conducting your own analysis, the summary function will perform the most common analyses including Friedman Rank Sum test and repeated measures ANOVA. If either of those tests produce a p value less than the specified threshold (0.05 by default), then the summary function will also perform and return Wilcoxon signed rank test and three separate dependent sample t-tests see (Austin 2010) for discussion of dependent versus independent t-tests.

s1 <- summary(tutoring.matched, tutoring$Grade)
names(s1)
## [1] "PercentMatched"       "friedman.test"        "rmanova"             
## [4] "pairwise.wilcox.test" "t.tests"
s1$friedman.test
## 
##  Friedman rank sum test
## 
## data:  Outcome and Treatment and ID
## Friedman chi-squared = 27.904, df = 2, p-value = 8.726e-07
s1$t.tests
##               Treatments         t  df      p.value sig  mean.diff     ci.min
## 1  Treat1.out-Treat2.out -2.791302 117 6.132601e-03  ** -0.3220339 -0.5505191
## 2 Treat1.out-Control.out  3.817862 117 2.168534e-04 ***  0.7033898  0.3385190
## 3 Treat2.out-Control.out  6.922623 117 2.539540e-10 ***  1.0254237  0.7320670
##        ci.max
## 1 -0.09354868
## 2  1.06826071
## 3  1.31878047

The print method will accept multiple object returned by summary so to combine them into a single table output. Note that each parameter must be named and that name will be used to identify the row containing those results.

s2 <- summary(tutoring.matched.caliper, tutoring$Grade)
s3 <- summary(tutoring.matched.2to1, tutoring$Grade)
s4 <- summary(tutoring.matched.3to2, tutoring$Grade)

print("Max Treat"=s1, "Caliper"=s2, "2-to-1"=s3, "3-to-2"=s4)
##      Method Friedman.chi2   Friedman.p     rmANOVA.F    rmANOVA.p    
## 1 Max Treat      27.90354 8.726175e-07 ***  23.84506 3.758176e-10 ***
## 2   Caliper     112.99646 2.904890e-25 *** 108.21057 2.374092e-45 ***
## 3    2-to-1      18.95541 7.653924e-05 ***  15.54945 1.097932e-06 ***
## 4    3-to-2      32.27083 9.828281e-08 ***  25.03567 1.538341e-10 ***
boxdiff.plot(tutoring.matched, tutoring$Grade, 
             ordering=c("Treat2","Treat1","Control")) + 
    ggtitle("Maximum Treatment Matching")
boxdiff.plot(tutoring.matched.caliper, tutoring$Grade, 
             ordering=c("Treat2","Treat1","Control")) +
    ggtitle("Caliper Matching")
boxdiff.plot(tutoring.matched.2to1, tutoring$Grade, 
             ordering=c("Treat2","Treat1","Control")) +
    ggtitle("2-to-1-to-n Matching")
Boxplot of DifferencesBoxplot of DifferencesBoxplot of Differences

Figure 10.4: Boxplot of Differences

Another useful visualization for presenting the results is the Loess plot. In Figure 10.5 we plot the propensity scores on the x-axis and the outcome (grade in this example) on the y-axis. A Loess regression line is then overlaid.4 Since there are three propensity score scales, the plot.loess3 function will use the propensity scores from the model predicting treatment one from treatment two. Propensity scores for the control group are then imputed by taking the mean of the propensity scores of the two treatment units that control was matched to. It should be noted that if a control unit is matched to two different sets of treatment units, then that control unit will have two propensity scores. Which propensity score scale is utilized can be explicitly specified using the model parameter.

loess3.plot(tutoring.matched.caliper, tutoring$Grade, ylab="Grade", 
            points.alpha=.1, method="loess")
Loess Plot for Caliper Matching

Figure 10.5: Loess Plot for Caliper Matching