The primary advantage of norm-referenced tests is that they can provide information on how an individual's performance on the test compares to others in the reference group. A serious limitation of norm-referenced tests is that the reference group may not represent the current population of interest. As noted by the Oregon Research Institute's International Personality Item Pool website, "One should be very wary of using canned 'norms' because it isn't obvious that one could ever find a population of which one's present sample is a representative subset. Most 'norms' are misleading, and therefore they should not be used. Far more defensible are local norms, which one develops oneself. For example, if one wants to give feedback to members of a class of students, one should relate the score of each individual to the means and standard deviations derived from the class itself. To maximize informativeness, one can provide the students with the frequency distribution for each scale, based on these local norms, and the individuals can then find (and circle) their own scores on these relevant distributions."

Norm-referencing does not ensure that a test is valid (i.e., that it measures the construct it is intended to measure). Another disadvantage of norm-referenced tests is that they cannot measure the progress of the population as a whole, only where individuals fall within the whole. To measure, for instance, the success of an educational reform program that seeks to raise the achievement of all students, one must instead measure against a fixed goal. With a norm-referenced test, grade level was traditionally set at the
level set by the middle 50 percent of scores. By contrast, the National Children's Reading Foundation believes that it is essential to assure that virtually all children read at or above grade level by third grade, a goal which cannot be achieved with a norm-referenced definition of grade level.

Norms do not automatically imply a standard. A norm-referenced test does not seek to enforce any expectation of what test takers should know or be able to do. It measures the test takers' current level by comparing them to their peers. A rank-based system produces only data that tell which students perform at an average level, which students do better, and which students do worse. It does not identify which test takers are able to correctly perform the tasks at a level that would be acceptable for employment or further education.

The ultimate objective of grading curves is to minimize or eliminate the influence of variation between different instructors of the same course, ensuring that the students in any given class are assessed relative to their peers. This also circumvents problems associated with using multiple versions of a particular examination, a method often employed where test administration dates vary between class sections. Regardless of any difference in the level of difficulty, real or perceived, the grading curve ensures a balanced distribution of academic results. However, curved grading can increase competitiveness between students and affect their perception of faculty fairness in a class. Students are generally most upset when the curve lowers their grade compared to what they would have received had no curve been used.
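
The effect of a curve on raw scores can be illustrated with a minimal sketch. It assumes one simple curving method, a linear rescaling of each score by the class's own mean and standard deviation; the text does not specify a method, and others exist. The function name `curve_scores`, the parameters `target_mean` and `target_sd`, and all score values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def curve_scores(raw_scores, target_mean=75.0, target_sd=10.0):
    """Rescale raw scores so the class has the target mean and spread.

    Hypothetical helper: a linear (z-score) curve, one of several
    possible curving methods.
    """
    class_mean = mean(raw_scores)
    class_sd = pstdev(raw_scores)  # population SD of this class only
    if class_sd == 0:
        # Every student scored the same, so relative standing is undefined;
        # assign the target mean to all of them.
        return [target_mean for _ in raw_scores]
    return [
        target_mean + target_sd * (score - class_mean) / class_sd
        for score in raw_scores
    ]


# Invented raw scores for two sections that sat versions of differing difficulty.
easier_section = [82, 88, 90, 94, 96]
harder_section = [42, 48, 50, 54, 56]

curved_easier = curve_scores(easier_section)
curved_harder = curve_scores(harder_section)

for raw, curved in zip(easier_section, curved_easier):
    print(f"easier version: raw {raw} -> curved {curved:.1f}")
for raw, curved in zip(harder_section, curved_harder):
    print(f"harder version: raw {raw} -> curved {curved:.1f}")
```

With these invented numbers, both sections receive the same curved scores because the students' standing relative to their classmates is the same in each. Every student in the harder section ends up above their raw score, while every student in the easier section ends up below it.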
To prevent a curve from lowering grades, teachers who intend to use one usually make the test hard enough that the average student is expected to earn a raw score below the score assigned to the average on the curve, thus ensuring that all students benefit from the curve. Curved grades therefore cannot be applied blindly and must be weighed carefully against alternatives such as criterion-referenced grading. Furthermore, habitual reliance on curved grading can mask poorly designed tests by adjusting their grades, whereas assessments should be designed to accurately reflect the learning objectives set by the instructor.

==See also==