Category Archives: Research Methods / Methodological Features and Effect Sizes

As evidence-based reform becomes increasingly important in educational policy, it is essential to understand how research design may contribute to the effect sizes reported in experiments evaluating educational programs. The purpose of this article is to examine how methodological features such as publication type, sample size, and research design affect effect sizes in experiments. A total of 645 studies from 12 recent reviews of evaluations of reading, mathematics, and science programs were examined. The findings suggest that effect sizes are roughly twice as large for published articles, small-scale trials, and experimenter-made measures as for unpublished documents, large-scale studies, and independent measures, respectively. In addition, effect sizes are significantly higher in quasi-experiments than in randomized experiments. Explanations for the effects of methodological features on effect sizes are discussed, as are implications for evidence-based policy.

Cheung, A., & Slavin, R. E. (2016). How methodological features affect effect sizes in education. Educational Researcher, 45(5), 283-292.
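
For readers less familiar with how such moderator comparisons are made, here is a minimal sketch, in Python, of the basic idea: each study contributes a standardized mean difference (the usual effect size metric in these reviews), and mean effect sizes are then compared across categories of a methodological feature such as publication status. The effect sizes below are entirely hypothetical and are not data from the 645 studies reviewed.

# Minimal illustration only: hypothetical effect sizes, not data from the review.
from statistics import mean

def effect_size(mean_treatment, mean_control, pooled_sd):
    # Standardized mean difference: treatment mean minus control mean, divided by the pooled SD.
    return (mean_treatment - mean_control) / pooled_sd

# Hypothetical (effect size, publication status) pairs for a handful of studies.
studies = [
    (effect_size(105, 100, 12), "published"),
    (effect_size(108, 101, 15), "published"),
    (effect_size(102, 100, 14), "unpublished"),
    (effect_size(101, 100, 13), "unpublished"),
]

for status in ("published", "unpublished"):
    group = [es for es, s in studies if s == status]
    print(f"{status}: mean effect size = {mean(group):.2f} across {len(group)} studies")

In the reviews themselves, effect sizes are of course weighted and analyzed in more sophisticated ways; the sketch only shows the structure of the comparison.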

Rigorous evidence of program effectiveness has become increasingly important with the 2015 passage of the Every Student Succeeds Act (ESSA). One question that has not yet been fully explored is whether program evaluations carried out or commissioned by developers produce larger effect sizes than evaluations conducted by independent third parties. Using study data from the What Works Clearinghouse, we find evidence of a “developer effect,” where program evaluations carried out or commissioned by developers produced average effect sizes that were substantially larger than those identified in evaluations conducted by independent parties. We explore potential reasons for the existence of a “developer effect” and provide evidence that interventions evaluated by developers were not simply more effective than those evaluated by independent parties. We conclude by discussing plausible explanations for this phenomenon as well as providing suggestions for researchers to mitigate potential bias in evaluations moving forward.

Wolf, R., Morrison, J. M., Inns, A., Slavin, R. E., & Risman, K. (2020). Average effect sizes in developer-commissioned and independent evaluations. Journal of Research on Educational Effectiveness. DOI: 10.1080/19345747.2020.1726537

Large-scale randomized studies provide the best means of evaluating practical, replicable approaches to improving educational outcomes. This article discusses the advantages, problems, and pitfalls of these evaluations, focusing on alternative methods of randomization, recruitment, ensuring high-quality implementation, dealing with attrition, and data analysis. It also discusses ways to increase the chances that large randomized experiments will find positive effects, and how to interpret the effect sizes they produce.

Slavin, R. E., & Cheung, A. (2017). Lessons learned from large-scale randomized experiments. Journal of Education for Students Placed at Risk. DOI: 10.1080/10824669.2017.1360774

Program effectiveness reviews in education seek to provide educators with scientifically valid and useful summaries of evidence on achievement effects of various interventions. Different reviewers have different policies on measures of content taught in the experimental group but not the control group, called here treatment-inherent measures. These are contrasted with treatment-independent measures of content emphasized equally in experimental and control groups. The What Works Clearinghouse (WWC) averages effect sizes from such measures with those from treatment-independent measures, while the Best Evidence Encyclopedia (BEE) excludes treatment-inherent measures. This article contrasts effect sizes from treatment-inherent and treatment-independent measures in WWC reading and math reviews to explore the degree to which these measures produce different estimates. In all comparisons, treatment-inherent measures produce much larger positive effect sizes than treatment-independent measures. Based on these findings, it is suggested that program effectiveness reviews exclude treatment-inherent measures, or at least report them separately.

Slavin, R. E., & Madden, N. A. (2011). Measures inherent to treatments in program effectiveness reviews. Journal of Research on Educational Effectiveness, 4(4), 370-380.
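
As a purely illustrative sketch, with hypothetical numbers that are not drawn from the WWC or BEE reviews, the short calculation below shows how the two policies described above can yield quite different summary effect sizes for the same study: averaging treatment-inherent measures together with treatment-independent ones pulls the overall estimate upward, while excluding them leaves only the treatment-independent estimate.

# Illustration only: hypothetical effect sizes for one study's outcome measures.
treatment_inherent = [0.60, 0.55]      # content taught mainly in the experimental group
treatment_independent = [0.15, 0.20]   # content emphasized equally in both groups

def average(values):
    return sum(values) / len(values)

# WWC-style policy (as described above): average all measures together.
print("All measures averaged:", round(average(treatment_inherent + treatment_independent), 2))
# BEE-style policy (as described above): exclude treatment-inherent measures.
print("Treatment-independent only:", round(average(treatment_independent), 2))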

Research in fields other than education has found that studies with small sample sizes tend to have larger effect sizes than those with large samples. This article examines the relationship between sample size and effect size in education. It analyzes data from 185 studies of elementary and secondary mathematics programs that met the standards of the Best Evidence Encyclopedia. As predicted, there was a significant negative correlation between sample size and effect size. The differences in effect sizes between small and large experiments were much greater than those between randomized and matched experiments. Explanations for the effects of sample size on effect size are discussed.

Slavin, R. E., & Smith, D. (2009). The relationship between sample sizes and effect sizes in systematic reviews in education. Educational Evaluation and Policy Analysis, 31(4), 500-506.
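
To make the analysis concrete, here is a minimal sketch of the kind of correlation described above, using hypothetical study data rather than the 185 studies from the review: pair each study's total sample size with its effect size and compute a Pearson correlation, which in the review was significantly negative.

# Illustration only: hypothetical (sample size, effect size) pairs, not the reviewed studies.
from statistics import correlation  # Pearson correlation; requires Python 3.10+

sample_sizes = [60, 120, 250, 500, 1500, 3000]
effect_sizes = [0.45, 0.40, 0.30, 0.22, 0.12, 0.08]

r = correlation(sample_sizes, effect_sizes)
print(f"Pearson r between sample size and effect size: {r:.2f}")  # negative in this toy example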