A Warning on Measuring Learning Outcomes

January 26, 2007

Among the recommendations contained in the report that Secretary of Education Margaret Spellings’ Commission on the Future of Higher Education issued last September were these:

  • The results of student learning assessments, including value added measurements that indicate how students’ skills have improved over time, should be made available to students and reported in the aggregate publicly.
  • The collection of data from public institutions allowing meaningful interstate comparison of student learning should be encouraged and implemented in all states.

I appreciate the commission’s focus on student learning and its assessment. But my experience and my reading and conduct of research on these topics lead me to argue against the use of standardized tests of general intellectual skills to compare the effectiveness of colleges and universities.

Secretary Spellings currently is undertaking a variety of initiatives designed to implement the commission’s recommendations. In addition, several national organizations, including the Educational Testing Service and a partnership involving the National Association of State Universities and Land Grant Colleges and the American Association of State Colleges and Universities, are working to identify or develop “student learning assessments, including value added measurements” that will facilitate “meaningful interstate comparison.” 

I have devoted much of my career to helping faculty identify and develop ways to assess student learning and institutional effectiveness, and then to use assessment findings to improve students’ learning and educational experiences. I have conducted my own research on assessment, have studied that of many others, and have established a reputation as an advocate of appropriate (i.e., valid and reliable) assessment that can improve student learning. Thus I have more than a passing interest in these current developments.

For a decade beginning in the mid-1980s I coordinated the University of Tennessee at Knoxville’s response to Tennessee’s Performance Funding initiative, which required us to test thousands of freshmen and seniors and calculate gain, or "value added." Given the large numbers of students involved, we were able to try out several standardized tests of general intellectual skills (ACT’s COMP and CAAP; CBASE; and the Academic Profile, the ETS precursor to MAPP) as well as to test seniors who had taken the same exam as freshmen. In addition, my associate Gary Pike and I, along with other colleagues in various disciplines at Tennessee, undertook a program of research on the reliability and validity of the tests and on the reliability of value added calculations.

Our research confirmed findings and conclusions dating to the 1960s reached by such respected measurement scholars as Lee Cronbach, Frederic Lord, Robert Linn, and Robert Thorndike. Some generalizations based on these findings may be helpful to others as we confront once again the challenge to find valid measures of college students’ learning and score gain that permit institutional comparisons.

While standardized tests can be helpful in initiating faculty conversations about assessment, our research casts serious doubt on the validity of using standardized tests of general intellectual skills for assessing individual students, then aggregating their scores for the purpose of comparing institutions.

Standardized tests of general intellectual skills (writing, critical thinking, etc.):

  • test primarily entering ability (e.g., when the institution is the unit of analysis, the correlation between scores on these tests and entering ACT/SAT scores is quite high, ranging from .7 to .9); therefore, differences in test scores reflect individual differences among the students taking the test more accurately than they reflect differences in the quality of education offered at different institutions.
  • are not content-neutral and thus disadvantage students specializing in some disciplines.
  • contain questions and problems that do not match the learning experiences of all students at any given institution.
  • measure at best 30% of the knowledge and skills faculty want students to develop in the course of their general education experiences.
  • cannot be given to samples of volunteers if scores are to be generalized to all students and used in making important decisions such as the ranking of institutions on the basis of presumed quality.
  • cannot be required of some students at an institution and not of others—yet making the test a requirement is the only way to ensure participation by a sample over time.
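The first point above turns on aggregation: each student's score is averaged up to the institution, and the institutional means are then correlated with the institutions' mean entering ACT/SAT scores. The Python sketch below shows that calculation under stated assumptions: the three institutions and every score are hypothetical illustrations, not data from the Tennessee studies, and `institution_means` and `pearson_r` are helper names invented for this example.

```python
"""Hypothetical sketch: correlating institution-level mean test scores with
institution-level mean entering ACT scores. All numbers are illustrative."""
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def institution_means(records):
    """Group (institution, entering_act, test_score) records by institution
    and return {institution: (mean_entering_act, mean_test_score)}."""
    grouped = {}
    for inst, act, score in records:
        grouped.setdefault(inst, []).append((act, score))
    return {
        inst: (mean(a for a, _ in pairs), mean(s for _, s in pairs))
        for inst, pairs in grouped.items()
    }


# Hypothetical student records: (institution, entering ACT, senior test score)
records = [
    ("A", 20, 440), ("A", 22, 452),
    ("B", 24, 461), ("B", 26, 470),
    ("C", 28, 478), ("C", 30, 489),
]

# Aggregate first, then correlate the institutional means.
means = institution_means(records)
mean_act = [act for act, _ in means.values()]
mean_score = [score for _, score in means.values()]
r = pearson_r(mean_act, mean_score)
print(f"institution-level r = {r:.3f}, r squared = {r * r:.3f}")
```

Squaring the correlation gives the share of variance in institutional means that entering scores already account for: at r = .9, for example, that is .81, or about 81%, which is the sense in which such tests measure primarily entering ability.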

If standardized tests of general intellectual skills are required of all students,

  • and if an institution’s ranking is at stake, faculty may narrow the curriculum to focus on test content.
  • student motivation to perform conscientiously becomes a significant concern.
  • extrinsic incentives (pizza, stipends) do not ensure conscientious performance over time.
  • ultimately, a requirement to achieve a minimum score on the test, with consequences, is needed to ensure conscientious performance.  And if a senior achieves less than the minimum score, does that student fail to graduate despite meeting other requirements?

For nearly 50 years measurement scholars have warned against pursuing the blind alley of value added assessment. Our research has demonstrated yet again that the reliability of gain scores and residual scores -- the two chief methods of calculating value added -- is negligible (about 0.1).
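Why gain scores are so unreliable can be seen in the standard classical test theory expression for the reliability of a difference score: the true-score variance of (senior score minus freshman score) divided by its observed variance. The Python sketch below implements that expression along with both value added calculations named above, gain scores and residual scores (the senior score minus the senior score predicted from the freshman score by least-squares regression). The function names are illustrative, and the test reliabilities and correlations are hypothetical values chosen only to show the trend; they are not the figures from the Tennessee research.

```python
"""Hypothetical sketch: the two chief value-added calculations (gain scores
and residual scores) and the classical reliability of a gain score."""
from statistics import mean


def gain_scores(pre, post):
    """Gain (difference) score for each student: senior minus freshman."""
    return [y - x for x, y in zip(pre, post)]


def residual_scores(pre, post):
    """Residual score for each student: the actual senior score minus the
    senior score predicted from the freshman score by least squares."""
    mx, my = mean(pre), mean(post)
    slope = (sum((x - mx) * (y - my) for x, y in zip(pre, post))
             / sum((x - mx) ** 2 for x in pre))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


def gain_score_reliability(rel_pre, rel_post, r_pre_post,
                           sd_pre=1.0, sd_post=1.0):
    """Classical test theory reliability of D = post - pre, assuming the
    measurement errors on the two occasions are uncorrelated."""
    cov = r_pre_post * sd_pre * sd_post
    true_var = sd_pre ** 2 * rel_pre + sd_post ** 2 * rel_post - 2 * cov
    observed_var = sd_pre ** 2 + sd_post ** 2 - 2 * cov
    return true_var / observed_var


# Hypothetical: both tests have reliability .8; vary the pre/post correlation.
for r in (0.5, 0.7, 0.75):
    print(f"pre/post r = {r:.2f}: gain-score reliability = "
          f"{gain_score_reliability(0.8, 0.8, r):.3f}")
```

Even with two tests that are each reasonably reliable (.8 in this hypothetical), the gain score's reliability falls from .6 at a freshman/senior correlation of .5, to one third at .7, to .2 at .75. The more strongly freshman and senior scores are correlated, as they are expected to be, the larger the share of what remains in the difference that is measurement error.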
  
We conclude that standardized tests of generic intellectual skills do not provide valid evidence of institutional differences in the quality of education provided to students.  Moreover, we see no virtue in attempting to compare institutions, since by design they are pursuing diverse missions and thus attracting students with different interests, abilities, levels of motivation, and career aspirations. 

If it is imperative that those of us concerned about assessment in higher education identify standardized methods of assessing student learning that permit institutional comparisons, I propose two alternatives:

1. electronic portfolios that can illustrate growth over time in generic as well as discipline-based skills and are not distorted by a student having a bad day and performing poorly on a 3-hour snapshot of what s/he has learned in college.  Portfolios can be scored reliably using rubrics developed by groups of faculty. Then scores can be aggregated to provide the numbers decision-makers want to compare.

2. measures based in academic disciplines that show how students can use discipline-based knowledge, as well as generic skills, in their chosen fields and as informed citizens with specialized expertise.
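The first alternative rests on two steps: scoring portfolios reliably with a faculty-built rubric, and aggregating those scores into a number decision-makers can compare. The essay does not say how scoring reliability should be checked, so the Python sketch below assumes two faculty raters and uses unweighted Cohen's kappa, one common agreement statistic among several, before averaging scores into a single figure. The 1-4 rubric scale, the scores, and the function names are all hypothetical.

```python
"""Hypothetical sketch: checking agreement between two faculty raters who
score the same portfolios on a 1-4 rubric, then aggregating the scores."""
from collections import Counter
from statistics import mean


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters assigning categorical scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of portfolios")
    n = len(rater_a)
    # Proportion of portfolios on which the raters gave the identical score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's score distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:  # both raters used a single identical category
        return 1.0
    return (observed - expected) / (1 - expected)


def aggregate(rater_a, rater_b):
    """Average the two raters for each portfolio, then across portfolios."""
    return mean((a + b) / 2 for a, b in zip(rater_a, rater_b))


# Hypothetical rubric scores (1 = beginning ... 4 = exemplary)
rater_a = [3, 2, 4, 3, 1, 2, 3, 4]
rater_b = [3, 2, 3, 3, 1, 2, 4, 4]

kappa = cohens_kappa(rater_a, rater_b)
overall = aggregate(rater_a, rater_b)
print(f"kappa = {kappa:.2f}, mean portfolio score = {overall:.2f}")
```

Unweighted kappa treats a 3-versus-4 disagreement the same as a 1-versus-4 one; for ordinal rubrics a weighted kappa is often preferred. That refinement is omitted here for brevity.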

In short, a substantial and credible body of measurement research tells us that standardized tests of general intellectual skills cannot furnish meaningful information on the value added by a college education nor can they provide a sound basis for inter-institutional comparisons. In fact, the use of test scores to make comparisons can lead to a number of negative consequences, not the least of which is homogenization of educational experiences and institutions. The wide variety of opportunities for higher education has heretofore been one of the great strengths of higher education in the United States.

Trudy W. Banta is a professor of higher education and vice chancellor for planning and institutional improvement at Indiana University-Purdue University at Indianapolis.

Comments on A Warning on Measuring Learning Outcomes

  • CLA
  • Posted by james ragonnet, Professor of English (rhetoric) at Springfield College on October 25, 2008 at 11:15am EDT
  • Learned Colleague,
    I need your opinion.

    Representing Springfield College -- a new member of "The CLA Consortium," I just returned from Raleigh, NC where I attended a CLA conference-workshop.

    When I reviewed and scored a CLA retired or sample Performance Test -- a scenario dealing with crime prevention -- I was rather amused. (Had I gotten some sleep the night before and on my flight, I would have been shocked.)

    Admittedly, I examined and scored only part of the test -- "The Performance Task." However, I found it too demanding for most college students. I informed the CLA presenters -- who've done post-doctoral work but who've done little or no classroom teaching -- that the CLA exam will surely make a great assessment tool -- not for students -- but for new faculty hires!

    What are seasoned classroom professors saying, if anything, about whether the CLA Exam is too rigorous for college students? (I dearly hope that my institution never decides to give the CLA to the faculty as a pre-condition of their continued employment.)

    I'm not concerned about the other objections CLA critics have raised. Rather, I'm asking whether the CLA is properly geared to assess a student's cognitive and communicative skill sets realistically and fairly.

    My best...jim ragonnet

  • Testing
  • Posted by L. Smith on November 13, 2008 at 11:15pm EST
  • Testing is not the answer. All it does is give the "teacher" a basis for determining a grade. And, we all know that grading and grades are suspect. Rather, a truer measure of learning is when the learner (i.e., the "student," using traditional and antiquated terminology and stereotyping) wants to know more about a topic or issue. This expression of desire for more is an affirmation that the learner has mastered current concepts and material and now wants to move on. In this scenario neither test nor grade is necessary. What should be necessary is for the provider (i.e., the "teacher") to have the next level or dimension of concepts and materials readily available to present and apply once the learner expresses the desire to move on.
    What we need is a system that is designed to cater to this basal learning behavior and can be applied in real time. Take a look at the definitive treatment "Education in America -- What's to Be Done?" developed by Trigon-International. This commission report presents an end-to-end solution that is actionable and affordable.

  • And what of the employers?
  • Posted by L.L. Berry on January 26, 2007 at 8:16am EST
  • Nice, well-meaning words. Notes from the employment front-line --

    * Is the work in the portfolio truly the student's? Is she/he prepared to reproduce it on the spot?

    * As to "standards" and "rubrics" in disciplines -- thanks, we'll pass on group-think and autonomic responses.

    We're more concerned about basic skills (e.g., writing, reading, math) and common sense (table manners, politeness, work ethic, time management).

  • Death to Value-Added
  • Posted by Clifford Adelman, Senior Associate at Institute for Higher Ed Policy on January 26, 2007 at 8:50am EST
  • Right on, Trudy! Back in November, I did a back-page "Commentary" in EdWeek with the same critique of the Spellings Commission's mindless embrace of whatever crossed their heads as measures of college "outcomes," using the same "blind alley" metaphor---which came to us courtesy of Jonathan Warren, who created the grandmother of the current Collegiate Learning Assessment in the 1970s. Warren, who worked for the former ETS Berkeley office, built a very creative short-answer unrestricted response exam, Academic Competences in General Education, that was used in its various Beta versions by 140 AASCU institutions. As an Associate Dean at one of those at the time, I was responsible for bringing it into experimental design assessment of an innovative freshman program. As observed in that EdWeek article, we took minor pre/post improvements and inflated them to cosmic value-added significance. Warren himself saw the folly of this, and the ACGE exam itself never made it through the test audit gates due to predictable unreliability in the mid-ranges of scoring. None of this means that unrestricted response tests such as the ACGE or Collegiate Learning Assessment are bad tests. In fact, I love 'em---but not in corporate "value added" frameworks and evaluation processes, rather as criterion-referenced measures of where students ought to get at whatever moment in an undergraduate career one finds them appropriate.

  • Posted by Glenn Bogart on January 26, 2007 at 10:05am EST
  • It shouldn't surprise anyone, other than government bureaucrats, that standardized (IQ) testing doesn't work very well in measuring the value of a college education. Native intelligence is just that, and it doesn't change much. And to the extent that it does seem to increase, it's probably due to an improvement in basic reading and math skills brought about by practice or remedial studies, not by learning, say, political science. What's being measured in an IQ test is not knowledge, and knowledge is the main thing a college or university is selling. That, and a good time.

    Putting more programs on a computer makes it more useful, but it doesn't make the computer any "smarter."

    Now, if we really want to measure "value added" by a traditional college education, we need to be looking at job placement rates, like the private career schools have been doing for years. Such measurements and comparisons are not without their own problems, but if one assumes that there is a relationship between knowledge and employability, then at least we would be measuring something that education can impart, and not something that it can't.

  • Well said
  • Posted by Lou Reibling, Retired V/P at Schoolcraft on January 26, 2007 at 10:35am EST
  • Trudy, thanks for the insight. I knew your appreciation and academic endeavors on testing needed to be said. I do believe that your accent on electronic portfolios is the ideal way of recognizing lifelong education.

  • A balanced approach to assessment
  • Posted by Sean Keesler, Project Manager, The Living SchoolBook at Syracuse University on January 26, 2007 at 11:05am EST
  • At Syracuse University's School of Education we are using Sakai and the Open Source Portfolio as tools to develop a platform that will allow us to balance coursework and testing with student-centered portfolios as an approach to assessment and program improvement. I worry about too much of either approach at this stage, but a blend of assessment strategies would certainly seem best suited for capturing the breadth of student growth and the effectiveness of the institution.

  • Posted by Frank on January 26, 2007 at 12:00pm EST
  • Standardized exams work quite well in the sciences and engineering. While these exams can't predict creative abilities, I have yet to come across a grad student with a deficient background in requisite knowledge who did well in the formation of a plausible hypothesis or the design of sound experiments. Like many chemistry graduate programs, we subject all incoming graduate students to a set of American Chemical Society exams.

    http://www4.uwm.edu/chemexams/

    I also use these exams in my undergraduate courses. Assessment of student learning is impossible without these exams. We should require students in the sciences and engineering to take standardized exams throughout their undergraduate experience, as they will be required to take certification exams in their post-baccalaureate careers anyway.

  • Subject Testing
  • Posted by Jonathan Dresner on January 26, 2007 at 1:21pm EST
  • We've been under some pressure here to use ETS subject tests on our seniors. When I say "under pressure," I mean that -- against all expectations -- the administration is willing to pay for the testing without taking it out of our budget, and was pretty insistent even after we pointed out the serious disjunctions between our program and the ETS instrument.

  • Missing the Point
  • Posted by Robert W Tucker, President at InterEd, Inc. on January 26, 2007 at 1:45pm EST
  • Dr. Banta's comments are predicated on contemporary measurement science and, indirectly, learning sciences; too many among the professoriate teach and evaluate as if there had been no scientific progress in either discipline in the past 150 years. This stance is uninspiring since the professoriate, in general, has the capacity and moral obligation to support good science in all disciplines and to remain objective wherever it might be applied. Banta's claims regarding validity reflect sound generalizations from decades of well conceived and well executed scientific research. As such, they can be called into question based on their conceptual foundations, interpretations of empirical data, or with countermanding empirical findings. Saying, "I don't agree because a standardized test worked well for me in situation X" fails all of those tests (to say nothing of begging the question of "worked well"). If you have scientific evidence that Banta's generalizations are unsound, let's see it. We have been studying many of these same issues for more than 25 years and I would say that, if anything, Trudy is understating the case against standardized testing for these purposes. It's a lot of fun to develop assessments that embed context and authenticity; a side benefit is that doing so can lead to better teaching and learning.

  • Standardized exams work well in S&T
  • Posted by Frank on January 26, 2007 at 2:55pm EST
  • By "works well" in the sciences, or specifically chemistry, I mean that either you know thermodynamics, reaction kinetics, organic mechanisms, statistics, and how to interpret mass spectra, or you don't. If you are to be a competent chemist in industry, academia, or government labs at the BS, MS, or PhD level, you must have requisite knowledge. At some point any university's grads will be measured against others in the competition for jobs.

    Would you consult a physician who failed their board exam? How safe do you feel if you know the new 1/4 mile twin span bridge in your county was designed by a team of engineers who failed the PE exam repeatedly? How seriously do you take the legal advice of someone who failed their bar exam?

    Standardized exams are a good idea for SOME disciplines.

  • Tests in Science Fields
  • Posted by Stephanie on January 26, 2007 at 3:55pm EST
  • Just a note to several comments on the relevance of tests in the sciences - please reread the second suggestion. Discipline-specific tests of actual knowledge and skills of graduates are supported. You are NOT offering a counter to her argument; what you are saying is consistent with it.

  • Standardized Exams in S&T
  • Posted by Frank on January 26, 2007 at 4:55pm EST
  • Yes, my point was to make it perfectly clear that discipline specific exams in S&T (technology) are appropriate and really nothing else.

    We need to have nation-wide standardized exams in S&T applied ASAP as my personal experience is that the university GPA has become a meaningless value of late.

  • Standardized tests
  • Posted by Rob Rittenhouse, CS Faculty at McMurry University on January 26, 2007 at 6:56pm EST
  • Not all of S&T wants to do standardized tests. We've reviewed the Computer Science MFAT and don't believe it reflects our program or what we believe CS students should know.

    A portfolio approach might be better but evaluating portfolios is very time consuming.

  • Posted by James L. Secor, Ph.D. at Sun Yat-sen University on January 27, 2007 at 7:50am EST
  • As to "common sense"...there is no such thing. What we consider common sense is learned sense, definitely cultural, often classist.

    It's nice to know all of this about S&T, where I was situated before changing to the arts, but how to test in the humanities and arts? How to figure one's gains? How to figure, even more importantly, the role of the university versus the role of any particular professor? And how to assess the student who learned despite the teaching, despite the poor situation? Indeed, how do we assess the role played by direct experience (extra-collegiately)?

    Could it be that we are looking in the wrong place to come up with standards of measurement?

    Job placement and education do not correlate because one must always take into consideration social conditions, things which "we" cannot control...like, unemployment, need, availability, likes/dislikes of hiring bodies, politics.

    Thought: considering the success of the MMPI and its variants, could not this style/form be adapted to educational needs? Sometimes, as with alternative teaching techniques, it is a good idea to NOT focus on what is wanted (being taught) but on the characteristic that is common.

    If you don't know where you're going, if you don't know what you want, how can you test for it? "An accepted level of educational attainment" (or whatever phrase) is totally meaningless.

  • From the employers, Pt. II
  • Posted by L.L. Berry on January 27, 2007 at 8:15am EST
  • " .. scientific evidence that Banta’s generalizations are unsound, let’s see it .."

    How about a study on the number of resumes with spelling and grammer errors in them?

    A study on number of interviewees who fail to meet minimum performance standards for math, spelling, and grammer?

    A study on number of interviewees who fail to meet minimum standards for professional conduct (e.g., poor time management, ill-mannered, self-entitled).

    Singing Kumbaya is one thing. Competing with the European and Asian economies is quite another.

  • Employers II
  • Posted by Nicholas R. Santilli, Director, Planning and Assessment at John Carroll University on January 27, 2007 at 11:25am EST
  • I suggest you check your spelling prior to throwing stones.

    When did universities become responsible for teaching table manners, and being polite and respectful? These were values formed in my home. I expect my students to come to my institution with these skills fully formed.

  • Slightly OT, but the question came up
  • Posted by DocA, VP Academic Affairs on January 28, 2007 at 12:00am EST
  • I may comment on the underlying assessment issue later (Trudy and Frank are both right on), but for now, just one thought in reply from my perspective as VP for Academic Affairs at a small, liberal arts oriented institution. (A few survive.) Just when did we give up our responsibilities in helping to form our students' abilities to live examined lives, respectful of others (I think that includes manners and courtesy) and giving considered attention to their own performance and competence, which just might include spelling and being somewhere on time if you promise to do so? We don't have the whole responsibility, but I'm not so sure that we have the right to simply ignore our own role. These things were never taught only in the home; they always involved the full interaction with all learning experiences. (Oh, and re. the portfolio thing...you have a lot better chance of having some confidence that a student prepared an electronic portfolio these days than you do that they wrote any given paper. It's just that nobody knows quite what to do with a portfolio, much less umpity thousand of them. Sigh...)

  • Of course
  • Posted by L.L. Berry on January 29, 2007 at 8:00am EST
  • U.S. students don't need SATs, GREs, GMATs, MCATs, LSATs. The U.S. is this great super-power that dominates over other economies. And the U.S. is not a debtor nation to communist powers. And employers will never administer an on-the-spot exam of basic skills.

    And everyone wonders why those who think so highly of the non-standardized approach to evaluation -- since it is so superior -- why they don't start their own private consulting services? And make them and their clients millionaires?

    And when those students go to other countries, they don't need their manners corrected. And their grammer.

  • Underlying Conceptions of 'Quality'
  • Posted by Robert W Tucker, President at InterEd, Inc. on January 29, 2007 at 1:21pm EST
  • I am heartened by the spirit of the debate herein and would add for your consideration that all Conceptions of 'Quality' appeal (as an analytic fact, not a contingent outcome) to the idea of "suitability to purpose" (STP). What we mean by 'higher education' has evolved significantly over the past 100 years. A process that was once open only to the very smart and the very rich is now made available to more than 80% of the ability distribution. Given STP, it follows that we must understand the new and changing purposes of the new and changing stakeholders (largest among them the students) before it is even minimally logical to discuss constructs subsumed by conceptions of quality.

    That said, some of the discussion I see herein presupposes ideas of quality that are tightly (even rigidly) held by us academic types that are not shared or even appreciated by the largest stakeholder group. Is it not arrogant of us to impose our views on them when, in fact, we are the service providers for hire? Despite the fact that we live in a democracy in the 21st century, I still hear murmurings from those who would prefer to see the professoriate as a Mandarin class.

  • Prof. Banta's essay should be circulated widely
  • Posted by Mike Tamada at Occidental College on January 30, 2007 at 4:10am EST
  • I hope that this essay, or others like it, get circulated widely because Prof. Banta makes excellent points.

    To address some comments about portfolios:
    "Oh, and re. the portfolio thing...you have a lot better chance of having some confidence that a student prepared an electronic portfolio these days than you do that they wrote any given paper."

    Yes, that's one of the advantages of portfolios over tests such as the CLA. Every school that I know of that has administered the CLA has reported problems getting students to even take the test. The biased samples of CLA test-takers are a potentially more severe problem than verifying the authenticity of portfolios.

    "It’s just that nobody knows quite what to do with a portfolio, much less umpity thousand of them."

    Two answers here: for assessment purposes, you don't need or want to look at ALL of the portfolios. You simply select a random sample of them. Hence unlike the CLA and other tests, there's no problem with sample bias. As for what to do with the portfolios, that's an evolving process, but consortia of schools providing outside readers to each other, to read each others' portfolios using rubrics, is a possibility which comes readily to mind. Some work has already been done in this area, and a small handful of schools have all along had outside readers for their senior theses or honors theses. (Not that we'd want every senior thesis to be read by an outside reader, but it's an example of direct assessment of student work.)

  • Portfolios
  • Posted by Rob Rittenhouse, CS Faculty at McMurry University on January 30, 2007 at 12:45pm EST
  • Mike Tamada makes some good points.

    One quibble:
    "As for what to do with the portfolios, that’s an evolving process, but consortia of schools providing outside readers to each other, to read each others’ portfolios using rubrics, is a possibility which comes readily to mind."

    Who pays for this? Are faculty supposed to be investing significant amounts of time in an activity that does little or nothing for them professionally (actually this is one of the biggest gripes faculty have with assessment generally)? This in an environment of rising complaints about the costs of higher education? It's one thing to do this occasionally as part of a program review but another to be expected to do this on an annual basis. (I expect to be flamed by those who don't think faculty work hard -- mine do).

    That said, portfolios are useful for students. They give the student something to show to an employer who wants to know what they can do. This is familiar to those in the fine arts but it does apply to other fields as well (computer programming for one). Students should be motivated to do this (which they aren't with the CLA).

  • Posted by Mike Tamada at Occidental College on February 1, 2007 at 12:10pm EST
  • "Who pays for this? Are faculty supposed to be investing significant amounts of time in an activity that does little or nothing for them professionally"

    To be sure, few schools or profs will *want* to spend time assessing portfolios. But, if we can get portfolios to be used as a "direct measure" of student learning, rather than tests such as CLA, MAPP, etc., then schools and profs will be doing it all right.

    Because they will HAVE TO. The Feds have made it clear that the accreditors have to require direct measures of student learning (i.e., not indirect measures such as students' self-reported assessments of their education).

    Accreditation is typically an activity whose cost runs into at least six figures, when you count the time of administrators and faculty. So yes, this scenario of using portfolios is not cheap -- but neither is re-accreditation.

    My guess is that this assessment of portfolios wouldn't have to be done every single year, although that might not be a bad idea. Once the infrastructure is set up (admittedly, not a trivial task), then snagging say 10 faculty for a day of portfolio assessment could be done at a cost of several thousand dollars. For comparison, administering the CLA costs $6,300 or more.

  • Portfolio (and other) Assessment
  • Posted by Sione Aeschliman, Instructional Assessment Specialist at Central Oregon Community College on February 1, 2007 at 9:45pm EST
  • "Are faculty supposed to be investing significant amounts of time in an activity that does little or nothing for them professionally (actually this is one of the biggest gripes faculty have with assessment generally)?"

    I find it disheartening to hear faculty say that assessment doesn't do anything for them. Yet it's all too common a misconception. Student learning assessment is not something that is done after teaching and learning are over. It is not just for public accountability. Meaningful assessment improves student learning. It is a professional development activity: conducting learner-centered assessment helps faculty understand what their students do or don't understand, what they can or can't do, how they learn and what obstacles there are to learning. What could be more relevant to faculty?

  • Assessment and analysis is QA
  • Posted by Kelly Deal on March 9, 2007 at 2:45pm EST
  • Faculty are responsible for delivering high quality instruction and education. Students are responsible for utilizing what the faculty and institution provide them. Assessment of learning is difficult, to be sure; however, there are a number of valid approaches. Student outcomes are certainly not perfectly correlated with instructor effectiveness, but there is a correlation. In many studies it rises to r=0.7. This is sufficient to identify _patterns_ in teaching style, course content, delivery mode, student attributes, etc. that are more likely to result in student learning. In another article someone commented that student learning is unlike creating a bottle of wine or scotch and certainly opposite to manufacturing a widget. Learning is very subjective and elusive in many respects, but successful academic progress certainly holds substantive and identifiable characteristics.

    Cheers,

    Kelly