SOME USEFUL DEFINITIONS
Accountability: a characteristic of an educational system whereby the schools, school districts, state government, or federal government are held responsible for the achievement of students. The term may also be applied to holding students responsible for a certain level of achievement for promotion or graduation.
- Before-the-fact accountability: a process whereby schools determine the content and skills to be taught and choose effective educational practices for instruction. Before-the-fact accountability is concerned with assuring the quality of educational inputs.
- After-the-fact accountability: a process whereby students are tested to determine their achievement. The results of testing are used to reward or punish school districts or others held responsible for the achievement of the students. The intent is that testing results will stimulate school improvement. After-the-fact accountability is concerned with measuring outputs.
Assessment or test: a tool used to determine the knowledge, reasoning abilities, skills, and/or feelings of students. Several terms are important in discussing assessments:
- Criterion-referenced assessment: one which rates how thoroughly a student has mastered a specific skill or area of knowledge. It does not readily allow comparisons between the achievement of one student and the achievement of others. A criterion-referenced test is typically more subjective than a norm-referenced test, relying on someone to observe and rate student activities or products. Performance assessments are criterion-referenced tests. [compare norm-referenced assessment]
- Essay examination: one in which a student must prepare extended written answers. The student may be asked to answer a question or to explain the solution to a problem. For example, an open-ended mathematics problem in which students are asked to show or explain their work could be classified as an essay test. The responses are read by one or more raters, who then judge the quality of the essay. An essay examination is a performance test. For a number of years, short essay tests (writing samples) have been added to more traditional achievement tests as one component used to assess writing skill.
- High-stakes assessment: one that has major consequences for an individual or institution. For example, an examination, the results of which determine whether a person will graduate from high school, be admitted to college, or obtain a professional license, is a high-stakes test for that person. When an examination is given to students and their aggregate scores are used to determine how well a school district is educating its students, this is a high-stakes assessment for the school district.
- Local item dependence: a situation in which the response to one question or task influences the response to another. Traditional multiple-choice test items are usually designed carefully to be independent of one another, so that success on one item is not influenced by success on another. Performance assessments may use "chained problems," in which answering one item successfully is necessary to answer the next one successfully.
- Norm-referenced assessment: one which provides scores obtained by giving the test to a sample of people (the norm group); when other students later take the test, their scores can be compared with those of the norm group. Norm-referenced tests compare the achievement of one student, or of the students of a school, school district, or state, with the norm scores. The ACT and SAT are norm-referenced examinations. [compare criterion-referenced assessment]
- Oral communication assessment: one in which a student is asked to respond orally to a question. On a day-to-day basis, this is the most widely used method of diagnosing the problems that individual students or whole classes are having in learning content or skills.
- Performance assessment: one in which (a) the respondent (examinee) carries out a specific activity under the watchful eye of an evaluator, who makes judgments about the quality of achievement demonstrated (think of the scores the judges hold up at the conclusion of an Olympic athlete's performance), or (b) the work completed by the student (a term paper or a painting, for example) is judged in order to evaluate the skills used to create the product. Other popular labels for performance assessments are authentic assessments, alternative assessments, response-on-demand assessments, exhibitions, demonstrations, student work samples, and performance portfolios. As presently popularized, most high-stakes performance assessments being produced at the state and national levels:
- Use open-ended tasks.
- Have a stated (claimed) focus on complex skills.
- Employ context-sensitive strategies (they may assume that test takers are familiar with the situation a question poses because of the environment in which they were raised). This is an important consideration in judging the fairness of such tests for at-risk or disadvantaged students.
- Use compound problems requiring several performances and significant student time. Tasks may be "chained," with performance on one task influencing whether the student can succeed on the next (see local item dependence).
- May contain tasks which require the student to document products of group efforts as well as individual performances. Attribution of work to an individual student becomes an obvious issue when group activities form part of the assessment.
- Require a significant amount of class time to complete (some assessment items may require several class periods).
- Are scored so as to capture not just the "right answer," but also the reasonableness of the procedures used to carry out or solve the problem.
- Portfolio: a collection of student work. This can be an informal collection of student work (test papers, artwork, etc.) that teachers use to show parents what their children are doing in school. It can also refer to a high-stakes, formal portfolio of performance assessments (and their associated scores) that a student must complete for promotion or graduation. When school district, state, and federal policy makers refer to portfolios, they often mean high-stakes portfolios.
- Reliability: the ability of a test to produce consistent scores for the same students at different times. If the items on a test must change each year to prevent cheating or teaching to the test, the different forms of the test should produce the same scores for the same students. Reliable tests have little random error in their measurements.
- Score reliability: the ability of the same completed test to receive the same score when it is scored on different occasions.
- Rater reliability: agreement between test graders or scorers. Scoring of performance assessments is more subjective than scoring of other assessments. Two or more raters (scorers) must score the assessment, and their scores must agree or be averaged. To the extent that raters give different scores to the same test, the scores contain "background noise": random error increases and reliability decreases. Raters of performance assessments may use rubrics (sets of scoring guidelines) to try to reduce variation in scoring between raters. Training raters to be consistent with one another, and keeping their scoring consistent, becomes difficult when thousands of tests must be scored.
- Standardized test or objective test: a test that contains well-defined questions of proven validity and that produces reliable scores. Such tests are commonly paper-and-pencil, selected-response examinations containing multiple-choice items, true-or-false items, or matching exercises. They may contain short fill-in-the-blank items. Such tests may also contain performance assessment items (e.g., a writing sample); however, these items take only a short time to complete and can be reliably scored. The time spent on performance assessment items is limited so that enough selected-response items can be administered during the allotted testing time to yield good score reliability and good generalizability of test results [see generalizability under the definition of validity].
- Validity: the extent to which a test actually measures the particular domain it is intended to measure.
- Content validity: evidence that a test truly measures a particular content domain. The content domain is the body of knowledge, skills, or abilities being examined by a test.
- Construct validity: evidence that a test is truly measuring a particular construct.
- Constructs: mental processes (for example, synthesizing information) that are not directly observable but that a test is theorized to measure. A product produced by a student may be used to reach a conclusion about the student's ability to carry out the mental process. For example, a written report may be used to determine how well a student synthesizes information from different sources.
- Face validity: the subjective judgment of a test preparer that a test measures a particular content domain. Face validity differs from content (true) validity because it does not require independent evidence that a test measures what it claims to measure. For example, a performance assessment in mathematics might seem to be a good assessment of desirable mathematics skills (it has face validity for assessing mathematics skills). However, the scores on the performance assessment might turn out to correlate more highly with scores assessing writing ability than with scores on traditional tests of mathematics knowledge and concepts. In this case, the performance assessment, which has good face validity, may have dubious validity for the content (mathematics) it was meant to assess.
- Generalizability: the idea that a student's score on the items selected for a test reflects what the student knows about a subject, or how well he or she performs the skills the test items are supposed to assess. All tests are sampling devices, and all have time limits for completion. Generalizability requires that enough test items be administered to truly assess a student's mastery of a particular domain. Generalizability is an important component of validity. For example, if a student has completed a three-month course in beginning algebra, one would like to know that the test items used in an assessment at the end of the course truly cover the spectrum of knowledge and skills the student should have learned during the entire course, not just the first or last week.
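A small calculation can make the rater-reliability and reliability entries above concrete. The Python sketch below is illustrative only. It rests on assumptions not found in the original: a hypothetical 1-to-4 rubric, invented scores for six papers, and two simple consistency measures chosen for clarity. Those measures are the share of papers on which two raters agree exactly, and the Pearson correlation between their scores. Actual testing programs may rely on other statistics, such as Cohen's kappa. The function names are also hypothetical.

```python
# Sketch only: the 1-4 rubric scale and all scores below are hypothetical.
from statistics import mean


def exact_agreement(rater_a, rater_b):
    """Fraction of papers on which two raters gave the identical rubric score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


def averaged_scores(rater_a, rater_b):
    """Score reported for each paper when the two raters' scores are averaged."""
    return [(a + b) / 2 for a, b in zip(rater_a, rater_b)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores.

    Used here as one simple index of consistency, e.g. between two raters
    or between two forms of a test taken by the same students.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when every score is the same")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical rubric scores from two raters grading the same six papers.
    rater_a = [4, 3, 2, 4, 1, 3]
    rater_b = [4, 2, 2, 3, 1, 3]
    print(f"exact agreement: {exact_agreement(rater_a, rater_b):.2f}")
    print("averaged scores:", averaged_scores(rater_a, rater_b))
    print(f"correlation: {pearson_r(rater_a, rater_b):.2f}")
```

With these invented scores, the two raters agree exactly on four of the six papers. Where they disagree, averaging yields half-point scores, which is one way the "background noise" described under rater reliability shows up in reported results. The same correlation function could be applied to the scores of the same students on two different forms of a test.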
Outcome: "the way something turns out; result; consequence" (Webster's New World College Dictionary). The term can have numerous meanings when applied to education:
- Utilitarian definition: Different groups of people may want different overall outcomes from our schools. For example, lawmakers want law-abiding citizens. Business CEOs want employable workers. Some may see K-12 education as preparation for technical training or college. Schooling may also be seen as a route to general enlightenment and maturation. Most agree on three distinct purposes for schooling: for work, for public affairs, and for private culture.
- Total Quality Management (TQM) or Continuous Quality Improvement (CQI) definition: This definition of outcome derives from practices that businesses and large public institutions have used to improve the quality of their products or services and/or the efficiency with which they produce them. The products and/or services are the outcomes of the efforts of the business or institution. The idea is that outcomes should have a minimum acceptable quality. The quality of the outcomes the business or institution is presently producing is compared against what is expected, and ways are continuously sought to improve practices in order to improve the outcomes. In education, a TQM/CQI approach would require establishing content and skill standards and setting guidelines for expected outcome performance, e.g., the level of reading skill expected of all students by the end of second grade. Measurement of the performance would be the outcome. If the TQM/CQI system is working properly, when performance falls below expectations for the outcome, the schools would develop specific plans for remedying the problem. For example, if too few students are reading adequately at the end of second grade, the school would search for a method proven to improve reading achievement, implement it, then determine the effect of this change on how well second graders are reading, and so on.
- Outcome-based education (OBE) definition: OBE refers to an educational movement popularized in the mid-to-late 1980s. The writings of educationist William Spady were influential in the evolution of this paradigm for education. Its defining characteristic is the "restructuring" of all aspects of school governance and instruction by applying two principles. First, the entire educational enterprise is to be driven by stated outcomes, which are abstract, often vague statements about behaviors (performances) that all students are supposed to satisfy. The management of each site (school) serving students is to be independent of others in a school district; the only accountability required of the sites is that they assure all students reach the outcomes. Students may take as long as necessary to reach the outcomes. Students who reach particular outcomes are said to pass through "gateways" and can then proceed further. Second, student alignment with outcomes is to be assessed by extensive use of performance assessments in the classroom and by high-stakes performance assessments that the student must complete for promotion or graduation. The emphasis on performance assessments amounts to a call for intensive use of individual and group projects as the principal mode of instruction; teachers tend to become "facilitators" rather than directors of student learning. The projects (performances) needed to satisfy the outcomes are cataloged in formal portfolios, which to a large extent come to replace traditional grades for student work in various subjects.
Examples of OBE outcomes include: "Every graduate of the XYZ school district will demonstrate he or she is (1) a person who accepts the challenge of living and shapes change, (2) a learner who creates knowledge and solves problems, (3) a family member or friend who collaborates with others for mutual well-being, (4) a steward who promotes and protects people and the environment, (5) a creative, complex and perceptive thinker, and (6) a person who makes informed decisions by examining options and anticipating the consequences of outcomes." [Examples (1) through (5) are taken directly from a Wisconsin school district document, while (6) is outcome #3 listed among the Wisconsin State Outcomes written by the Wisconsin DPI in 1994.] The "three Ps" of OBE are intensive use of performance assessments, projects, and portfolios.
Standard: "something established for use as a rule or basis of comparison in measuring or judging capacity, quantity, content, extent, value, quality, etc.; criterion set for usages and practices; level of excellence, attainment, etc. regarded as a measure of adequacy" (Webster's New World College Dictionary). Educationists couple the word with modifiers which indicate specific definitions:
- Content and skills standards: These are often referred to simply as "content standards" or "standards." They are explicit statements about what we want students to know and to be able to do.
- Performance standards or proficiency standards: These are statements about how well students must master particular knowledge and skills. In many ways, well-written content standards will inherently describe what student performance is acceptable. The term can have slightly different meanings depending on where the accountability for performance is applied within the educational system:
- For a student, a performance standard is an understanding of how well he or she must perform. Grades given for differing levels of achievement on tests and in courses are an example of student performance standards. A student also may need to achieve a minimum score on a high-stakes assessment in order to be promoted or to graduate. Or a student may be required to take a certain number of courses in core academic areas.
- For teachers, performance standards tend to become goals of instruction. Teachers may seek to have all students achieve a particular level with respect to well-specified knowledge and skills. Students can also be challenged to achieve at levels higher than the minimum required to receive a passing grade, to be promoted, etc. For example, a second-grade reading standard requiring students to apply knowledge of consonants and consonant blends might be viewed as so crucial that teachers must use instructional methods assuring that every child masters this skill. However, a teacher might be given more latitude in interpreting a standard requiring students to know sentence structure in reading. The teacher could have students learn about nouns and verbs rather than the many other intricacies of sentence structure. Also, certain benchmarks for reading skill might be established for second-grade students, but teachers could encourage students who readily reach a benchmark to read more difficult material.
- For schools and school districts, performance standards become a way to look at how consistently students enrolled in their classrooms reach particular levels of achievement. For example, there might be the expectation that 90% of students would be reading at a particular level by the end of third grade.
- For state school administrators and policy makers, performance standards tend to become a way to compare the overall achievement of students in different schools or school districts. Achievement could be compared among schools whose students have the same demographic backgrounds. The achievement of the state's students can be compared to the achievement of students in other states or nations. Policy makers may set expectations for their state's schools. For example, "no student shall be promoted from the third grade unless the student has passed an objective reading test."
- Opportunity to learn, delivery, or input standards: a term with many possible meanings, all centered on what resources must be made available to or by schools. Such standards may mandate certain programs that all schools must offer. They could also mandate particular instructional practices or specify "approved facts" that all students must be taught. Opportunity to learn standards were part of the original federal Goals 2000 legislation, and they were seen as potentially very intrusive and readily corruptible by the political agendas of special interest groups. During the 1995-96 session of the U.S. Congress, all references to opportunity to learn standards in the Goals 2000 statutes were deleted by amendments, and the National Education Standards and Improvement Council (NESIC), which had been given carte blanche to develop opportunity to learn or input standards, was defunded.
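A school-level performance standard, such as an expectation that 90% of students will be reading at a particular level by the end of third grade, reduces to a simple proportion check. The Python sketch below assumes a hypothetical cut score of 50 on an unnamed reading test and uses invented scores for ten students. The function names are also hypothetical. It is an illustration of the arithmetic, not a description of any actual state or district procedure.

```python
# Sketch only: the cut score, the 90% expectation, and the scores are hypothetical.


def share_meeting_standard(scores, cut_score):
    """Fraction of students whose score is at or above the cut score."""
    if not scores:
        raise ValueError("no scores supplied")
    return sum(1 for s in scores if s >= cut_score) / len(scores)


def school_meets_expectation(scores, cut_score, expected_share=0.90):
    """True if at least `expected_share` of students reach the cut score."""
    return share_meeting_standard(scores, cut_score) >= expected_share


if __name__ == "__main__":
    # Hypothetical end-of-third-grade reading scores for one small school.
    third_grade_reading = [52, 61, 48, 70, 66, 59, 73, 45, 68, 64]
    cut = 50  # hypothetical score defining "reading at the expected level"
    share = share_meeting_standard(third_grade_reading, cut)
    print(f"{share:.0%} of students reached the standard")
    print("meets 90% expectation:", school_meets_expectation(third_grade_reading, cut))
```

Here 8 of the 10 students reach the cut score, so this hypothetical school falls short of a 90% expectation. The same check could be run for a classroom, a district, or a state by changing which scores are supplied.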
P.R.E.S.S., P.O. Box 26913, Milwaukee, WI 53226
E-mail: presswis@execpc.com
http://www.execpc.com/~presswis/