Jan. 26, 2007

A Warning on Measuring Learning Outcomes

Among the recommendations contained in the report that Secretary of Education Margaret Spellings’ Commission on the Future of Higher Education issued last September were these:

The results of student learning assessments, including value added measurements that indicate how students’ skills have improved over time, should be made available to students and reported in the aggregate publicly.
The collection of data from public institutions allowing meaningful interstate comparison of student learning should be encouraged and implemented in all states.

Secretary Spellings currently is undertaking a variety of initiatives designed to implement the commission’s recommendations. In addition, several national organizations, including the Educational Testing Service and a partnership involving the National Association of State Universities and Land Grant Colleges and the American Association of State Colleges and Universities, are working to identify or develop “student learning assessments, including value added measurements” that will facilitate “meaningful interstate comparison.”

I have devoted much of my career to helping faculty identify and develop ways to assess student learning and institutional effectiveness, then use assessment findings to improve students’ learning and educational experiences. I have conducted my own research on assessment, have studied that of many others, and have established a reputation as an advocate of appropriate (i.e., valid and reliable) assessment that can improve student learning. Thus I have more than a passing interest in these current developments.

For a decade beginning in the mid-1980s I coordinated the University of Tennessee at Knoxville’s response to Tennessee’s Performance Funding initiative, which required us to test thousands of freshmen and seniors and calculate gain, or “value added.” Given the large numbers of students involved, we were able to try out several standardized tests of general intellectual skills (ACT’s COMP and CAAP; CBASE; and the Academic Profile, the ETS precursor to MAPP) as well as to test seniors who had taken the same exam as freshmen. In addition, my associate Gary Pike and I, along with other colleagues in various disciplines at Tennessee, undertook a program of research on the reliability and validity of the tests and on the reliability of value added calculations.

Our research confirmed findings and conclusions dating to the 1960s reached by such respected measurement scholars as Lee Cronbach, Frederic Lord, Robert Linn, and Robert Thorndike. Some generalizations based on these findings may be helpful to others as we confront once again the challenge to find valid measures of college students’ learning and score gain that permit institutional comparisons.

While standardized tests can be helpful in initiating faculty conversations about assessment, our research casts serious doubt on the validity of using standardized tests of general intellectual skills for assessing individual students, then aggregating their scores for the purpose of comparing institutions.

Standardized tests of general intellectual skills (writing, critical thinking, etc.):

test primarily entering ability (e.g., when the institution is the unit of analysis, the correlation between scores on these tests and entering ACT/SAT scores is quite high, ranging from .7 to .9); therefore, differences in test scores reflect individual differences among students taking the test more accurately than they illustrate differences in the quality of education offered at different institutions.
are not content neutral, and thus disadvantage students specializing in some disciplines.
contain questions and problems that do not match the learning experiences of all students at any given institution.
measure at best 30% of the knowledge and skills faculty want students to develop in the course of their general education experiences.
cannot be given to samples of volunteers if scores are to be generalized to all students and used in making important decisions such as the ranking of institutions on the basis of presumed quality.
cannot be required of some students at an institution and not of others—yet making the test a requirement is the only way to ensure participation by a sample over time.

If standardized tests of general intellectual skills are required of all students,

and if an institution’s ranking is at stake, faculty may narrow the curriculum to focus on test content.
student motivation to perform conscientiously becomes a significant concern.
extrinsic incentives (pizza, stipends) do not ensure conscientious performance over time.
ultimately, a requirement to achieve a minimum score on the test, with consequences, is needed to ensure conscientious performance. And if a senior achieves less than the minimum score, does that student fail to graduate despite meeting other requirements?

For nearly 50 years measurement scholars have warned against pursuing the blind alley of value added assessment. Our research has demonstrated yet again that the reliability of gain scores and residual scores — the two chief methods of calculating value added — is negligible (i.e., 0.1).
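
One way to see why gain scores tend to be unreliable is the classical test theory formula for the reliability of a difference score. The sketch below is an illustration only, not the calculation used in the Tennessee research: it assumes freshman and senior scores have equal variances, and the function name and the input values (0.82 and 0.80) are hypothetical. When the two testings correlate almost as highly as each test is reliable, very little reliable variance is left in the difference.

```python
def difference_score_reliability(rel_pre, rel_post, corr_pre_post):
    """Reliability of a gain score (post minus pre) under classical test theory.

    Assumes pre-test and post-test scores have equal variances.
    rel_pre, rel_post: reliabilities of the two tests.
    corr_pre_post: correlation between pre-test and post-test scores.
    """
    if not -1.0 < corr_pre_post < 1.0:
        raise ValueError("corr_pre_post must lie strictly between -1 and 1")
    mean_reliability = (rel_pre + rel_post) / 2.0
    return (mean_reliability - corr_pre_post) / (1.0 - corr_pre_post)


# Hypothetical inputs: each test has reliability 0.82 and the
# freshman and senior scores correlate at 0.80.
gain_reliability = difference_score_reliability(0.82, 0.82, 0.80)
print(round(gain_reliability, 2))
```

With these illustrative inputs the gain score reliability comes out at roughly 0.1, even though each test on its own is reasonably reliable.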

We conclude that standardized tests of generic intellectual skills do not provide valid evidence of institutional differences in the quality of education provided to students. Moreover, we see no virtue in attempting to compare institutions, since by design they are pursuing diverse missions and thus attracting students with different interests, abilities, levels of motivation, and career aspirations.

If it is imperative that those of us concerned about assessment in higher education identify standardized methods of assessing student learning that permit institutional comparisons, I propose two alternatives:

1. electronic portfolios that can illustrate growth over time in generic as well as discipline-based skills and are not distorted by a student having a bad day and performing poorly on a 3-hour snapshot of what s/he has learned in college. Portfolios can be scored reliably using rubrics developed by groups of faculty. Then scores can be aggregated to provide the numbers decision-makers want to compare.

2. measures based in academic disciplines that show how students can use discipline-based knowledge, as well as generic skills, in their chosen fields and as informed citizens with specialized expertise.

In short, a substantial and credible body of measurement research tells us that standardized tests of general intellectual skills cannot furnish meaningful information on the value added by a college education nor can they provide a sound basis for inter-institutional comparisons. In fact, the use of test scores to make comparisons can lead to a number of negative consequences, not the least of which is homogenization of educational experiences and institutions. The wide variety of opportunities for higher education has heretofore been one of the great strengths of higher education in the United States.

Trudy W. Banta is a professor of higher education and vice chancellor for planning and institutional improvement at Indiana University-Purdue University at Indianapolis.

Comments

And what of the employers?

Nice, well-meaning words. Notes from the employment front-line —

* Is the work in the portfolio, truly the student’s? Is she/he prepared to reproduce it, on the spot?

* As to “standards” and “rubrics” in disciplines — thanks, we’ll pass on group-think and autonomic responses.

We’re more concerned about basic skills (e.g., writing, reading, math) and common sense (table manners, politeness, work ethic, time management).

L.L. Berry, at 8:16 am EST on January 26, 2007

Death to Value-Added

Right on, Trudy! Back in November, I did a back-page “Commentary” in EdWeek with the same critique of the Spellings Commission’s mindless embrace of whatever crossed their heads as measures of college “outcomes,” using the same “blind alley” metaphor, which came to us courtesy of Jonathan Warren, who created the grandmother of the current College Learning Assessment in the 1970s. Warren, who worked for the former ETS Berkeley office, built a very creative short-answer unrestricted response exam, Academic Competences in General Education, that was used in its various Beta versions by 140 AASCU institutions. As an Associate Dean at one of those at the time, I was responsible for bringing it into experimental design assessment of an innovative freshman program. As observed in that EdWeek article, we took minor pre/post improvements and inflated them to cosmic value-added significance. Warren himself saw the folly of this, and the ACGE exam itself never made it through the test audit gates due to predictable unreliability in the mid-ranges of scoring. None of this means that unrestricted response tests such as the ACGE or College Learning Assessment are bad tests. In fact, I love ‘em, but not in corporate “value added” frameworks and evaluation processes, rather as criterion-referenced measures of where students ought to get at whatever moment in an undergraduate career one finds them appropriate.

Clifford Adelman, Senior Associate at Institute for Higher Ed Policy, at 8:50 am EST on January 26, 2007

It shouldn’t surprise anyone, other than government bureaucrats, that standardized (IQ) testing doesn’t work very well in measuring the value of a college education. Native intelligence is just that, and it doesn’t change much. And to the extent that it does seem to increase, it’s probably due to an improvement in basic reading and math skills brought about by practice or remedial studies, not by learning, say, political science. What’s being measured in an IQ test is not knowledge, and knowledge is the main thing a college or university is selling. That, and a good time.

Putting more programs on a computer makes it more useful, but it doesn’t make the computer any “smarter.”

Now, if we really want to measure “value added” by a traditional college education, we need to be looking at job placement rates, like the private career schools have been doing for years. Such measurements and comparisons are not without their own problems, but if one assumes that there is a relationship between knowledge and employability, then at least we would be measuring something that education can impart, and not something that it can’t.

Glenn Bogart, at 10:05 am EST on January 26, 2007

Well said

Trudy, Thanks for the insight. I knew your appreciation and academic endeavors on testing needed to be said. I do believe that your accent on electronic portfolios is the ideal way for recognizing life long education.

Lou Reibling, Retired V/P at Schoolcraft, at 10:35 am EST on January 26, 2007

A balanced approach to assessment

At Syracuse University’s School of Education we are using Sakai and the Open Source portfolio as a tool to develop a platform that will allow a balance between coursework and testing, on one hand, and student-centered portfolios, on the other, to serve as a balanced approach to assessment and program improvement. I worry about too much of either approach at this stage, but a blend of assessment strategies certainly would seem to be best suited for capturing the breadth of student growth and effectiveness of the institution.

Sean Keesler, Project Manager, The LIving SchoolBook at Syracuse University, at 11:05 am EST on January 26, 2007

Standardized exams work quite well in the sciences and engineering. While these exams can’t predict creative abilities, I have yet to come across a grad student with a deficient background in requisite knowledge who did well in the formation of a plausible hypothesis or design of sound experiments. Like many chemistry graduate programs, we subject all incoming graduate students to a set of American Chemical Society exams.

http://www4.uwm.edu/chemexams/

I also use these exams in my undergraduate courses. Assessment of student learning is impossible without these exams. We should require students in the sciences and engineering to take standardized exams throughout their undergraduate experience, as they will be required to take certification exams in their post-baccalaureate careers anyway.

Frank, at 12:00 pm EST on January 26, 2007

Subject Testing

We’ve been under some pressure here to use ETS subject tests on our seniors. When I say “under pressure,” I mean that — against all expectations — the administration is willing to pay for the testing without taking it out of our budget, and was pretty insistent even after we pointed out the serious disjunctions between our program and the ETS instrument.

Jonathan Dresner, at 1:21 pm EST on January 26, 2007

Missing the Point

Dr. Banta’s comments are predicated on contemporary measurement science and, indirectly, learning sciences; too many among the professoriate teach and evaluate as if there had been no scientific progress in either discipline in the past 150 years. This stance is uninspiring since the professoriate, in general, has the capacity and moral obligation to support good science in all disciplines and to remain objective wherever it might be applied. Banta’s claims regarding validity reflect sound generalizations from decades of well-conceived and well-executed scientific research. As such, they can be called into question based on their conceptual foundations, their interpretations of empirical data, or countervailing empirical findings. Saying, “I don’t agree because a standardized test worked well for me in situation X” fails all of those tests (to say nothing of begging the question of “worked well”). If you have scientific evidence that Banta’s generalizations are unsound, let’s see it. We have been studying many of these same issues for more than 25 years and I would say that, if anything, Trudy is understating the case against standardized testing for these purposes. It’s a lot of fun to develop assessments that embed context and authenticity; a side benefit is that doing so can lead to better teaching and learning.

Robert W Tucker, President at InterEd, Inc., at 1:45 pm EST on January 26, 2007

Standardized exams work well in S&T

By “works well” in the sciences, or specifically chemistry, I mean that either you know thermodynamics, reaction kinetics, organic mechanisms, statistics, and how to interpret mass spectra or you don’t. If you are to be a competent chemist in industry, academia, or government labs at the BS, MS, or PhD level, you must have the requisite knowledge. At some point any university’s grads will be measured against others in the competition for jobs.

Would you consult a physician who failed their board exam? How safe do you feel if you know the new 1/4 mile twin span bridge in your county was designed by a team of engineers who failed the PE exam repeatedly? How seriously do you take the legal advice of someone who failed their bar exam?

Standardized exams are a good idea for SOME disciplines.

Frank, at 2:55 pm EST on January 26, 2007

Tests in Science Fields

Just a note to several comments on the relevance of tests in the sciences — please reread the second suggestion. Discipline-specific tests of actual knowledge and skills of graduates are supported. You are NOT offering a counter to her argument; what you are saying is consistent with it.

Stephanie, at 3:55 pm EST on January 26, 2007

Standardized Exams in S&T

Yes, my point was to make it perfectly clear that discipline specific exams in S&T (technology) are appropriate and really nothing else.

We need to have nation-wide standardized exams in S&T applied ASAP as my personal experience is that the university GPA has become a meaningless value of late.

Frank, at 4:55 pm EST on January 26, 2007

Standardized tests

Not all of S&T wants to do standardized tests. We’ve reviewed the Computer Science MFAT and don’t believe it reflects our program or what we believe CS students should know.

A portfolio approach might be better but evaluating portfolios is very time consuming.

Rob Rittenhouse, CS Faculty at McMurry University, at 6:56 pm EST on January 26, 2007

As to “common sense”... there is no such thing. What we consider common sense is learned sense, definitely cultural, often classist.

It’s nice to know all of this about S&T, where I was situated before changing to the arts, but how to test in the humanities and arts? How to figure one’s gains? How to figure, even more importantly, the role of the university versus the role of any particular professor? And how to assess the student who learned despite the teaching, despite the poor situation? Indeed, how do we assess the role played by direct experience (extra-collegiately)?

Could it be that we are looking in the wrong place to come up with standards of measurement?

Job placement and education do not correlate because one must always take into consideration social conditions, things which “we” cannot control...like, unemployment, need, availability, likes/dislikes of hiring bodies, politics.

Thought: considering the success of the MMPI and its variants, could not this style/form be adapted to educational needs? Sometimes, as with alternative teaching techniques, it is a good idea to NOT focus on what is wanted (being taught) but on the characteristic that is common.

If you don’t know where you’re going, if you don’t know what you want, how can you test for it? “An accepted level of educational attainment” (or whatever phrase) is totally meaningless.

James L. Secor, Ph.D. at Sun Yat-sen University, at 7:50 am EST on January 27, 2007

From the employers, Pt. II

“... scientific evidence that Banta’s generalizations are unsound, let’s see it ...”

How about a study on the number of resumes with spelling and grammer errors in them?

A study on number of interviewees who fail to meet minimum performance standards for math, spelling, and grammer?

A study on number of interviewees who fail to meet minimum standards for professional conduct (e.g., poor time management, ill-mannered, self-entitled).

Singing Kumbaya is one thing. Competing with the European and Asian economies is quite another.

L.L. Berry, at 8:15 am EST on January 27, 2007

Employers II

I suggest you check your spelling prior to throwing stones.

When did universities become responsible for teaching table manners, and being polite and respectful? These were values formed in my home. I expect my students to come to my institution with these skills fully formed.

Nicholas R. Santilli, Director, Planning and Assessment at John Carroll University, at 11:25 am EST on January 27, 2007

Slightly OT, but the question came up

I may comment on the underlying assessment issue later (Trudy and Frank are both right on), but for now, just one thought in reply from my perspective of VP for Academic Affairs at a small, liberal arts oriented institution. (A few survive.) Just when did we give up our responsibilities in helping to form our students’ abilities to live examined lives; respectful of others (I think that includes manners and courtesy) and giving considered attention to their own performance and competence, which just might include spelling and being somewhere on time if you promise to do so? We don’t have the whole responsibility, but I’m not so sure that we have the right to simply ignore our own role. These things were never taught only in the home; they always involved the full interaction with all learning experiences. (Oh, and re. the portfolio thing...you have a lot better chance of having some confidence that a student prepared an electronic portfolio these days than you do that they wrote any given paper. It’s just that nobody knows quite what to do with a portfolio, much less umpity thousand of them. Sigh...)

DocA, VP Academic Affairs, at 12:00 am EST on January 28, 2007

Of course

U.S. students don’t need SATs, GREs, GMATs, MCATs, LSATs. The U.S. is this great super-power that dominates over other economies. And the U.S. is not a debtor nation to communist powers. And employers will never administer an on-the-spot exam of basic skills.

And everyone wonders why those who think so highly of the non-standardized approach to evaluation — since it is so superior — why they don’t start their own private consulting services? And make them and their clients millionaires?

And when those students go to other countries, they don’t need their manners corrected. And their grammer.

L.L. Berry, at 8:00 am EST on January 29, 2007

Underlying Conceptions of ‘Quality’

I am heartened by the spirit of the debate herein and would add for your consideration that all Conceptions of ‘Quality’ appeal (as an analytic fact, not a contingent outcome) to the idea of “suitability to purpose” (STP). What we mean by ‘higher education’ has evolved significantly over the past 100 years. A process that was once open only to the very smart and the very rich is now made available to more than 80% of the ability distribution. Given STP, it follows that we must understand the new and changing purposes of the new and changing stakeholders (largest among them the students) before it is even minimally logical to discuss constructs subsumed by conceptions of quality.

That said, some of the discussion I see herein presupposes ideas of quality that are tightly (even rigidly) held by us academic types but are not shared or even appreciated by the largest stakeholder group. Is it not arrogant of us to impose our views on them when, in fact, we are the service providers for hire? Despite the fact that we live in a democracy in the 21st century, I still hear murmurings from those who would prefer to see the professoriate as a Mandarin class.

Robert W Tucker, President at InterEd, Inc., at 1:21 pm EST on January 29, 2007

I hope that this essay, or others like it, get circulated widely because Prof. Banta makes excellent points.

To address some comments about portfolios: “Oh, and re. the portfolio thing...you have a lot better chance of having some confidence that a student prepared an electronic portfolio these days than you do that they wrote any given paper.”

Yes, that’s one of the advantages of portfolios over tests such as the CLA. Every school that I know of that has administered the CLA has reported problems getting students to even take the test. The biased samples of CLA test-takers are potentially a more severe problem than verifying the authenticity of portfolios.

“It’s just that nobody knows quite what to do with a portfolio, much less umpity thousand of them.”

Two answers here: for assessment purposes, you don’t need or want to look at ALL of the portfolios. You simply select a random sample of them. Hence unlike the CLA and other tests, there’s no problem with sample bias. As for what to do with the portfolios, that’s an evolving process, but consortia of schools providing outside readers to each other, to read each others’ portfolios using rubrics, is a possibility which comes readily to mind. Some work has already been done in this area, and a small handful of schools have all along had outside readers for their senior theses or honors theses. (Not that we’d want every senior thesis to be read by an outside reader, but it’s an example of direct assessment of student work.)

Mike Tamada, Prof. Banta’s essay should be circulated widely at Occidental College, at 4:10 am EST on January 30, 2007

Portfolios

Mike Tamada makes some good points.

One quibble: “As for what to do with the portfolios, that’s an evolving process, but consortia of schools providing outside readers to each other, to read each others’ portfolios using rubrics, is a possibility which comes readily to mind.”

Who pays for this? Are faculty supposed to be investing significant amounts of time in an activity that does little or nothing for them professionally (actually this is one of the biggest gripes faculty have with assessment generally)? This in an environment of rising complaints about the costs of higher education? It’s one thing to do this occasionally as part of a program review but another to be expected to do this on an annual basis. (I expect to be flamed by those who don’t think faculty work hard — mine do).

That said, portfolios are useful for students. They give the student something to show to an employer who wants to know what they can do. This is familiar to those in the fine arts but it does apply to other fields as well (computer programming for one). Students should be motivated to do this (which they aren’t with the CLA).

Rob Rittenhouse, CS Faculty at McMurry University, at 12:45 pm EST on January 30, 2007

“Who pays for this? Are faculty supposed to be investing significant amounts of time in an activity that does little or nothing for them professionally”

To be sure, few schools or profs will *want* to spend time assessing portfolios. But, if we can get portfolios to be used as a “direct measure” of student learning, rather than tests such as CLA, MAPP, etc., then schools and profs will be doing it all right.

Because they will HAVE TO. The Feds have made it clear that the accreditors have to require direct measures of student learning (that is, not indirect measures such as students’ self-reported assessments of their education).

Accreditation is typically an activity whose cost runs into at least six figures, when you count the time of administrators and faculty. So yes, this scenario of using portfolios is not cheap — but neither is re-accreditation.

My guess is that this assessment of portfolios wouldn’t have to be done every single year, although that might not be a bad idea. Once the infrastructure is set up (admittedly, not a trivial task), then snagging say 10 faculty for a day of portfolio assessment could be done at a cost of several thousand dollars. For comparison, administering the CLA costs $6,300 or more.

Mike Tamada, Occidental College, at 12:10 pm EST on February 1, 2007

Portfolio (and other) Assessment

“Are faculty supposed to be investing significant amounts of time in an activity that does little or nothing for them professionally (actually this is one of the biggest gripes faculty have with assessment generally)?”

I find it disheartening to hear faculty say that assessment doesn’t do anything for them. Yet it’s all too common a misconception. Student learning assessment is not something that is done after teaching and learning are over. It is not just for public accountability. Meaningful assessment improves student learning. It is a professional development activity: conducting learner-centered assessment helps faculty understand what their students do or don’t understand, what they can or can’t do, how they learn and what obstacles there are to learning. What could be more relevant to faculty?

Sione Aeschliman, Instructional Assessment Specialist at Central Oregon Community College, at 9:45 pm EST on February 1, 2007

Assessment and analysis is QA

Faculty are responsible for delivering high quality instruction and education. Students are responsible for utilizing what the faculty and institution provide them. Assessment of learning is difficult, to be sure; however, there are a number of valid approaches. Student outcomes are certainly not perfectly correlated with instructor effectiveness, but there is a correlation. In many studies it rises to r=0.7. This is sufficient to identify _patterns_ in teaching style, course content, delivery mode, student attributes, etc., that are more likely to result in student learning. In another article someone commented that student learning is unlike creating a bottle of wine or scotch and certainly opposite to manufacturing a widget. Learning is very subjective and elusive in many respects, but successful academic progress certainly holds substantive and identifiable characteristics.

Cheers,

Kelly

Kelly Deal, at 2:45 pm EST on March 9, 2007
