Thursday, July 9, 2020

PSAT Concordances and National Merit

When College Board published the initial PSAT concordance in PSAT/NMSQT Understanding Scores 2015 in January 2016, it labeled the tables preliminary. It appears that it is on the verge of publishing the "final" version. Word was that it would arrive on June 3, but the College Board website has not been updated. A reader interested in the impact on National Merit cutoffs was kind enough to send along an advance copy last week. This analysis is based on that copy. Should the final version prove different, I will publish an update as soon as possible.

The new document is not a Rosetta stone for National Merit Semifinalist cutoffs for the class of 2017. In fact, it does a poorer job of comparing scores at the top percentiles than did the January concordance. Had the May version been released with PSAT scores, it would have set cutoff expectations far too high. [The concordances are not used by NMSC to determine cutoffs; they are only useful for making estimates.] If you are expecting to glean new information about NMSF cutoffs from the latest document, I'd urge you to resume your regularly scheduled work. If you want to dive into the ways of interpreting the new data and the ways in which the PSAT and SAT concordances conflict, please read on.

Understanding (May update) is notable for what did not change. All of the percentile information is the same as that released in January. Frankly, there was no expectation that College Board would update the percentiles. It has said that it intends to stick with the research study percentiles for both the PSAT and SAT. When October 2016 PSAT results are returned, College Board will include true October 2015 percentiles. Until that time, PSAT percentiles are flawed and should be treated as nothing more than gross indicators. The reported standard error of measurement has gone up modestly in the latest version of Understanding (from 42 points to 46 points on total score).

It quickly became obvious that the updated concordance tables are not the result of a separate, in-depth equipercentile concordance of old PSAT and new PSAT. Rather, they represent the new SAT concordances lopped off at the ends of the PSAT score ranges. This is true for test scores, section scores, and Selection Index. Below is the concordance of the 60 to 240 Selection Index scores (600 to 2400 CR+M+W SAT scores) to new PSAT and SAT scores. Note that the green and orange lines — PSAT (May update) and SAT — follow the exact same path until the PSAT hits a wall at its maximum score of 1520.

A note on comparing total scores and Selection Index
One difficulty in using National Merit data to judge the accuracy of the concordances is that there is no concordance of old Selection Index to new Selection Index. Instead, we must substitute the concordance of the old Selection Index to the new total score. Because the Selection Index weights the verbal component at twice the math component, the split matters. A 1440 made up of 720 EBRW and 720 Math has a new Selection Index of 216. A 1440 made up of 680 EBRW and 760 Math has a Selection Index of 212. The good news is that the gap between verbal and math scores has narrowed with the new test. On average, in fact, EBRW scores are higher on the PSAT than the Math scores. This is unprecedented. At the upper tail, though, we still see more high Math scores. For the convenience of presenting the information graphically, we will assume that EBRW and Math are split evenly.
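For readers who want to see the weighting concretely, here is a minimal sketch in Python of the Selection Index arithmetic described above. The function name and the assumption that section scores arrive as multiples of 10 are mine; the doubling of the verbal component is the rule the worked examples illustrate.

def new_selection_index(ebrw, math):
    # New PSAT/NMSQT Selection Index: the verbal (EBRW) side counts twice.
    # SI = 2 * (EBRW / 10) + (Math / 10); section scores assumed multiples of 10.
    return 2 * (ebrw // 10) + math // 10

# The two 1440 splits discussed above:
print(new_selection_index(720, 720))  # 216
print(new_selection_index(680, 760))  # 212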
A note on the challenge of test scales and edge cases
Measuring student performance across a wide score range is a challenging task. Test makers try to avoid overly long exams, but they must present enough questions at appropriate difficulty levels to accomplish their goals. Sometimes, a score range promises more than it can deliver. Even though the PSAT covered easier material than the SAT and was a shorter test, a perfect score on a section was still an 80 and a perfect test a 240. Correcting this scale overreach is one of the reasons it made sense to reconsider PSAT scoring and establish it on a 160–760 section range (48–228 Selection Index).

To some degree, scores are stretched or truncated on virtually every standardized test with fixed endpoints. A perfect raw score on a harder SAT form might actually equate to an 820 were the scale allowed to run free; instead, it is truncated at 800. Without the demand of a fixed endpoint, a perfect raw score on an easier form might be more accurately presented as a 780; it must be stretched to an 800. The artifacts of this kneading can be seen in the raw-to-scaled score tables of the old PSAT. There was often a precipitous drop-off in score with the first missed question or two or three. These drops are the tell-tale signs of a scale that has been stretched to the breaking point.

Raw Score   Math   Writing
39          –      80
38          80     78
37          76     73
36          73     72
35          71     70
34          69     68
33          68     65

The new PSAT concordance dead-ends just when things are getting really interesting on the SAT. A 1520 on the new SAT is concordant with a 2210 on the old. A 1600 compares to a 2390. Over that 80-point range, the concordant values increase by 180 points — the steepest rise across the entire scale. I find this rapid rise — likely due to the stretching that often occurs near the endpoints of a score range — about as unlikely as the hard cap the PSAT runs into at 1520 / 221.

The wild ride of the blue line at the low end of the range is an example of what can happen when data is thin or when scores are forced to fit specific endpoints. The concordance lurches upward because few students achieve the lowest scores on the new PSAT; the lack of a guessing penalty means that even modestly test-savvy students can score in the 500s with ease. The May version implies a much smoother roll-on. It's hard to know which version is right. Given the scoring rules of the new test, the quick-off-the-line nature of the January concordance seems far more likely. Without more technical information, it's not clear that the lower ranges should have even been reported. Inappropriately compared scores can be worse than no comparison at all. This is why you will not see an ACT to SAT concordance go below an 11 ACT Composite. College Board, in contrast, accounts for every point value along the new and old score ranges in its old-to-redesigned concordances.

"In evaluating the results of an equipercentile concordance, it is vital to note that the equipercentile function does not directly take into account the number of observations at individual score points. [Concordance] results might be unstable at score points where very few examinees score, even though we might have a very large sample size overall." Mary Pommerich, "Concordance: The Good, the Bad, and the Ugly," in Linking and Aligning Scores and Scales, Neil Dorans, Mary Pommerich, and Paul Holland (Eds.), 2007.
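To make the equipercentile idea concrete, here is a minimal Python sketch using simulated score distributions. None of this is College Board's actual procedure or data; the sampling, score ranges, and function are illustrative only. It shows the core move of equipercentile linking: map each score on one test to the score on the other test that sits at the same percentile rank, which is exactly where thin data near the endpoints can make the mapping wobble.

import numpy as np

def equipercentile_concordance(old_scores, new_scores, score_points):
    # For each old-test score, find the new-test score at the same percentile rank.
    # Where few examinees sit near a score point, the percentile estimate is noisy
    # and the concorded value can lurch around (the instability quoted above).
    concordance = {}
    for s in score_points:
        pr = (np.sum(old_scores < s) + 0.5 * np.sum(old_scores == s)) / len(old_scores)
        concordance[s] = float(np.percentile(new_scores, 100 * pr))
    return concordance

# Toy example with simulated samples; real studies use carefully matched groups.
rng = np.random.default_rng(0)
old = np.clip(rng.normal(150, 30, 5000).round(-1), 60, 240)      # old Selection Index, 60-240
new = np.clip(rng.normal(1010, 190, 5000).round(-1), 320, 1520)  # new PSAT total, 320-1520
for old_si, new_total in equipercentile_concordance(old, new, range(60, 241, 20)).items():
    print(old_si, "->", new_total)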
How well do the concordances match the actual data?
The SAT concordance has far more long-term import than the PSAT/NMSQT concordance, but the latter has the advantage of an entire cohort having already taken the exam. The SAT will not be in the same situation until spring 2017. College Board has released little actual PSAT data (most is from sample studies), but a number of critical points have leaked out. The high end of the score range has the greatest import for National Merit purposes. This also means that we can flip things around and use what we know about National Merit and the National Hispanic Recognition Program to assess the accuracy of the concordance versions.

Some of the problems faced are familiar. Given that any new PSAT or SAT concordance was derived from pilot studies of modest size and questionable provenance, it's not surprising that there would be questions about the alignment of low and high scores. The edge cases are particularly impacted by low sample sizes. When looking at the case of National Merit Semifinalists, it seems unlikely that College Board was able to sufficiently sample the population. NMSF cutoffs in the most competitive states are nothing if not edge cases. Even the heartiest of sample sizes starts to grow thin as scores approach 223, 224, or 225 on the new PSAT.

For Selection Index scores in the 180–200 range on the old PSAT, the May concordance performs similarly to or better than the January concordance. As scores increase, however, the two concordances diverge. Hitched to the SAT concordance, the May version does not behave well as it approaches the top of the PSAT range.

A critical point is the Commended level for National Merit because it reflects a very similar group of students year to year. For the October 2014 test (class of 2016), the cutoff was 202. For the October 2015 test (class of 2017), the cutoff is 209. The January concordance (blue) would have predicted a new cutoff of approximately 207. The May concordance would imply a cutoff of 213. Advantage January.

Still higher up the scale, partial data in Texas and California suggest National Merit Semifinalist cutoffs for the class of 2017 in the 218–220 and 219–221 ranges for those states (which had cutoffs of 220 and 223 for the class of 2016). The January concordance fits well with these estimates. The May final concordance would predict NMSF cutoffs of 226 and 228 for Texas and California. These figures are comically out of tune with actual results. If the newer concordance were to be believed, students with perfect scores would take all of the National Merit Semifinalist spots in New Jersey, Massachusetts, California, Virginia, Maryland, and the District of Columbia. Another seven states would have Semifinalists with 1500 total scores (approximately 225 Selection Index) or higher. These results are not correct. The new concordance overestimates — probably by several hundred percent — the number of students receiving perfect scores, for example.

How different are the January and May PSAT concordances?

New PSAT Total Score   Old PSAT SI (Jan Concordance)   Old PSAT SI (May Concordance)   Difference
1520                   239                             221                             18
1500                   233                             217                             16
1480                   226                             213                             13
1460                   220                             209                             11
1440                   214                             206                              8
1420                   209                             202                              7
1400                   206                             199                              7
1300                   191                             182                              9
1200                   172                             167                              5
1100                   154                             151                              3
1000                   138                             136                              2

The table above uses the old PSAT Selection Index as a measuring stick to show how the difference between the two concordances grows at the high end of the score range. Compass research shows that the earlier concordance is more trustworthy in this range.
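To see how that divergence plays out in practice, here is a minimal Python sketch that estimates an old Selection Index from a new PSAT total score under each concordance, interpolating between the published points in the table above. The function and the linear interpolation are my own illustration, not an NMSC or College Board method.

# Published points from the table above: new PSAT total -> estimated old Selection Index.
JAN = {1000: 138, 1100: 154, 1200: 172, 1300: 191, 1400: 206, 1420: 209,
       1440: 214, 1460: 220, 1480: 226, 1500: 233, 1520: 239}
MAY = {1000: 136, 1100: 151, 1200: 167, 1300: 182, 1400: 199, 1420: 202,
       1440: 206, 1460: 209, 1480: 213, 1500: 217, 1520: 221}

def estimate_old_si(new_total, table):
    # Linearly interpolate between published concordance points; clamp at the ends.
    points = sorted(table)
    if new_total <= points[0]:
        return table[points[0]]
    if new_total >= points[-1]:
        return table[points[-1]]
    for lo, hi in zip(points, points[1:]):
        if lo <= new_total <= hi:
            frac = (new_total - lo) / (hi - lo)
            return round(table[lo] + frac * (table[hi] - table[lo]))

for score in (1400, 1460, 1520):
    print(score, estimate_old_si(score, JAN), estimate_old_si(score, MAY))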
Nothing is Real
What confuses many interpreters is the assumption that the May concordance — being final — is based on actual PSAT results. This is far from the case. No College Board concordance for the new PSAT or SAT is based on operational data! There are a number of sound, practical reasons for this. For concordance results to be useful, they must be generalizable. Strong concordances such as the old SAT to ACT linkage are built on large numbers of students taking tests at similar times under similar, operational conditions. March SAT students do not fit this mold because they are not representative of the full cohort of the class of 2017. Those testers who also have recent old SAT scores would be an even less apt group. There would also be a strong bias given that the new SAT would, in all cases, have been taken after the old SAT. Similarly, no students took operational old and new PSATs in a way that would meet the appropriate conditions.

Instead, College Board had to create study samples that it hoped would model overall results. In some cases, this involved convincing schools to offer sample tests to their students outside of the actual test dates. In other cases, it involved paying college students to take tests. It's not hard to imagine the logistical and statistical challenges involved in these sorts of studies. Yet College Board seemed to do a good job with this in the January concordance. Where it has taken a wrong turn is in letting the SAT concordance drive the bus. The old PSAT is not the old SAT. The new PSAT is not the new SAT. The concordances should be based on the conditions specific to each exam.

In the big picture, a PSAT concordance has a limited audience. Although useful to school districts and states for longitudinal research, concordant PSAT scores will not be used by any college for admission purposes. National Merit Scholarship Corporation has no need of a concordance in determining cutoffs. By the time next year's scores come out, virtually no sophomore or junior will have old PSAT scores in need of comparison. Nor is it mission critical for College Board that PSAT scores be directly comparable to ACT scores. All of this points to why — once the SAT concordance was ready — the PSAT concordance would be swallowed up whole. It was as much a political decision as a psychometric one. Were College Board to leave the PSAT concordance independent of the SAT concordance, there would be ongoing confusion about which concordance was definitive. A polite way of describing the PSAT concordance is "aspirational." It conforms with the most desirable outcome, not necessarily the most accurate one.

"Since the redesigned PSAT/NMSQT is scaled to the redesigned SAT, some small adjustments to the PSAT/NMSQT concordance may be necessary once the scaling of the redesigned SAT is final in March 2016." College Board, Guide to Concordance, October 2015

The College Board quote above seems like a reasonable hedge about PSAT results. The problem is that it makes little logical sense. In order to distinguish the new, vertical scale established in May, let me abbreviate it as scale-VM. No PSAT/NMSQT has ever been given or scored under scale-VM. The October 2015 PSAT was created and graded prior to scale-VM. The January concordance was created prior to scale-VM. The old SAT was obviously created before scale-VM.
Every single data point was fixed before scale-VM was "finalized." One of the points of a concordance is that it avoids concerns over scales. College Board didn't have to worry about what grading standards or scaling were used by ACT during their joint concordance study in 2005 because student scores determined the results. The notion of using scale-VM as the basis of a PSAT concordance is a political fiction. It does not fit the actual data; it merely clarifies which concordance College Board has chosen as its future.

Why such a fuss over a conversion table?
My research started with the simple mission of seeing what information the new concordance might provide about National Merit scores. I was asked a question about the new tables, and I sought an answer. As I dug deeper, though, I grew uncomfortable with the way the new concordance was being presented.

There may not be a more successful concordance than that between the old SAT and the ACT. Its history goes back more than 50 years and has involved some of the top experts in equating and linking at College Board, ETS, and ACT. At each update, the concordance grew more sophisticated and the sample more representative. The 1997 concordance included over one hundred thousand students. For the 2005 concordance, three times as many students were considered. Cooperation — always a sticking point with concordances of different tests by different companies — improved over time (although rarely without some kicking and screaming). The technical underpinnings have been openly researched and discussed. The correlations — and the differences — between the tests were understood.

I mention all of this as a counterpoint to what we currently face with the new PSAT and SAT concordances. It is disconcerting to go back to the "bad old days" when concordances were prepared in secret and results simply presented as tables with inputs and outputs. We do not even know how well the old tests correlate with the new, let alone how the studies were performed, what participants were involved, and what level of error was accepted.

I do not see evidence that the May concordance provides students with better information than did the January concordance. For above-average scorers, the new concordance paints an overly pessimistic picture of how new scores compare with old. Most students are already on an SAT or ACT path (both for some), so the PSAT is in the rearview mirror. National Merit hopefuls are advised to ignore the May concordance. A repackaged SAT concordance is not a useful tool for analyzing 2015 PSAT/NMSQT scores.