Lack of systematic selection of interventions and lack of specificity of treatment effects. Due to a variety of circumstances detailed earlier, the
Follow Through programs were not systematically developed or selected according to any type of uniform criteria. Given more time, sponsors may have been able to better identify the types of treatment effects that an observer might expect to occur under controlled conditions. More importantly, program sponsors might also have been required to show those specific facets of their interventions (e.g., particular pedagogical techniques) which would produce the intended effects. Despite these flaws, the sponsors agreed to be subject to the same evaluation instruments. Unfortunately, the instruments shed little light on what made the ineffective programs so unsuccessful; the converse is also true of the effective ones. Since structured programs tended to show better effects than the unstructured ones, it would have been logical to identify commonalities among the effective structured programs. These shared characteristics might have informed the development of additional effective programs or improved the ineffective approaches. Starting in 1982, funding was in fact reduced for those programs that were identified as successful in
Follow Through, perhaps on the presumption that funding would be better diverted to assisting failed programs. Ultimately, programs that had less empirical validation were nonetheless recommended for dissemination along with the successful models.
Lack of random assignment.
Random assignment of subjects into
treatment and
control groups is the ideal method of attributing change in a sample to an intervention and not to some other effect (including the pre-existing capabilities of students, teachers, or school systems). However, for a variety of practical reasons, this procedure was not done in
Follow Through. Instead, sites were selected "opportunistically", based on their readiness to participate in the evaluation, and on their unique circumstances of need. As Stebbins
et al. point out, the treatment groups were often the neediest children. To randomly select some of the most disadvantaged children (many of whom participated in Head Start prior to
Follow Through) out of the evaluation would certainly have been negatively perceived by community members. Stebbins
et al. point out that there were "considerable variations in the range of children served"; yet despite the presence of "many of the problems inherent in field social research...evaluations of these planned variations provides us with an opportunity to examine the educational strategies under real life conditions as opposed to contrived and tightly controlled laboratory conditions".
Narrowness of instruments. Adams and Engelmann note that many critics have suggested that more instruments should have been used in the
Follow Through evaluation. Egbert agrees with Adams and Engelmann that the data collection efforts were extensive. Despite the agreement among model sponsors on a uniform set of instruments to evaluate the effectiveness of their models, many sponsors believed their programs achieved gains on more intrinsic, less measurable indicators of performance, such as increased self-worth or greater parental involvement. To the extent that these desired outcomes occurred, and benefited the lives of students in ways that might never be measurable through quantitative means, those aspects of many models were successful. Both the House
et al. critique and others (cited in Wisler) express concerns about the inadequacy of the instruments used to measure self-esteem in the
Follow Through evaluation (i.e., the Intellectual Achievement Responsibility Scale (IARS) and the Coopersmith Self-Esteem Inventory). But it was better, according to many researchers, to measure outcomes imperfectly than not to measure them at all. Thus, while "perfect" measures of desired outcomes might never exist, one should not let the perfect be the enemy of the good—in other words, one could call into question the efficacy of conducting any experiment at all on the basis that some bias or imperfection exists.
Was Follow Through a social or scientific program? After an initial period, new regulations mandated that 80 percent of
Follow Through funding was to be allocated to the provision of service while 20 percent was to be used for knowledge production. The regulations themselves suggest that
Follow Through was principally "a federally funded education program which contained a built-in research component". An inevitable conflict exists when one attempts to operationalize a federal program in education that possesses both service delivery and research and development objectives. Rivlin
et al. point out that "the byzantine complexity of the public policymaking process makes the conduct of social experiments extremely difficult". Given the reduction in funding, the decision to evaluate the effectiveness of the various interventions in an empirical experiment appears appropriate and straightforward. However, when such a change is not reflected in Congressional legislation or communicated clearly at the local level, issues of implementation and conflicts with deeply held values inevitably result. There is much evidence indicating confusion about the intent of the
Follow Through evaluation at the administrative level.
Issues of local control. The planned variation aspect of
Follow Through was thought to be beneficial—perhaps superior—to other forms of experimentation (e.g., selection of sites based on randomized assignment) because it would give local communities and schools an element of ownership integral to the successful implementation of the models. Despite the planned variation design, local communities in many sites were nevertheless deeply critical of the program. In some ways, criticism of
Follow Through had proceeded directly from Head Start. Ostensibly, the social service purpose and goals of the Head Start program were clearer than those of the
Follow Through evaluation. Nevertheless, community leaders felt that Head Start did not give enough decision-making responsibility to parents and community members. Local interests wanted to make curricular decisions, including changing facets of some program models. Evans cautioned that "educational communities and contexts vary", which can have a direct effect on the implementation of a model. More problematic, however, is Elmore and Hill's assertion that the
Follow Through models interfered with local teaching methods and practices. As Elmore writes, "for
Follow Through, the problem was how to implement program variations in a system where most day-to-day decisions about program content are made at the school or classroom level". Rhine
et al. suggest that it is difficult to get teachers to modify their behavior. And even if the objective of changing behavior is achieved, teachers feel little ownership of the model—a decidedly dubious investment. What inevitably seems to happen is that some teachers reject programs outright, while others "surrender to the program".
The "fact-value dichotomy".
Ernest R. House, co-author of the 1978 critique of the
Follow Through evaluation, penned an article about what he calls the "
fact-value dichotomy" in social experimentation and educational research: "the belief that facts refer to one thing and values refer to something totally different". House examines the writings of Donald Campbell, a researcher in the field of evaluation. House noted that, according to Campbell, facts cannot exist outside the framework of one's values because inevitably, an investigation that uncovers a certain fact is either consistent with the researcher's internal values or against them. What results is a difficult choice: the researcher must either reject the fact, or modify his or her values to accommodate the fact. Campbell also believed, according to House, that values—as opposed to facts—could be chosen rationally. House agrees with Campbell's assertion in part, but departs from Campbell in that he believes that facts and values cannot exist in isolation; rather, they "blend together in the conclusions of evaluation studies, and, indeed, blend together throughout evaluation studies". House suggests that the reader envision facts and values as existing on a continuum from "brute facts to bare values". Accordingly, rarely do "fact claims" or "value claims" fall entirely at one end of the spectrum or the other. House provides examples: "Diamonds are harder than steel" might fall at the left of the spectrum, while "Cabernet is better than Chardonnay" falls to the right. In conclusion, House proposes an entirely new method of empirical investigation called "deliberative democratic evaluation". In it, evaluators arrive at "unbiased claims" through "inclusion of all relevant stakeholder perspectives, values, and interests in the study; extensive dialogue between the evaluator and stakeholders...and extensive deliberation to reach valid conclusions in the study". House decries the use of entirely rational methods when applied to evaluations; indeed, he recommends a degree of subjectiveness, because evaluations like
Follow Through cannot exist outside deeply held values. Hill writes: "There is seldom anyone at the local level whose commitment to an externally-imposed curricular innovation, planning process, or financial management scheme springs spontaneously from deeply held personal values." House argues that all decision-making that stems from evaluations in education is the result of a compromise. Watkins argues that
Follow Through resulted in a clash over values based on different beliefs about how children learn, which can be boiled down to "natural growth" or "unfolding" theories versus theories of "changing behavior". Watkins asserts that most education experts today do not judge programs by their relative effectiveness with different student populations, but rather by their "congruence with prevailing philosophies of education".