So about that actionable data…
One of the frequently-offered reasons for the Big Standardized Tests is that they are supposed to provide information that will allow classroom teachers to “inform instruction,” to tweak our instruction to better prepare for the test, er, better educate our students. Let me show you what that really means in Pennsylvania.
Our BS Tests are called the Keystones (we’re the Keystone State– get it?). They are not a state requirement yet– the legislature has blinked a couple of times now and kicked that can down the road. Because these tests are norm-referenced, aka graded on a curve, using them as a graduation requirement is guaranteed to result in the denial of diplomas for some huge number of Pennsylvania students. However, many local districts, like my own, make them a local graduation requirement in anticipation of the day when the legislature has the nerve to pull the trigger (right now 2019 is the year it all happens). The big difference with a local requirement is that we can offer an alternative assessment; our students who never pass the Keystones must complete the Binder of Doom– a huge collection of exercises and assessment activities that allow them to demonstrate mastery. It’s no fun, but it beats not getting a diploma because you passed all your classes but failed one bad standardized test.
Why do local districts attach stakes to the Keystones? Because our school rating and our individual teacher ratings depend upon those test results.
So it is with a combination of curiosity and professional concern that I try to find real, actionable data in the Keystone results, to see if there are things I can do, compromises I can make, even insights I can glean from breaking that data down.
The short answer is no. Let me walk you through the long answer. (We’re just going to stick to the ELA results here).
The results come back to the schools from the state in the form of an enormous Excel document. It has as many lines as there are students who took the test, and the column designations go from A to FB. They come with a key to identify what each column includes; to create a document that you can easily read requires a lot of column hiding (the columns with the answer to “Did this student pass the test” are BP, BQ and BR).
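For a sense of scale, here is a minimal sketch of what “A to FB” works out to, and of what the column hiding amounts to if you do it in code instead of by hand. It assumes the standard spreadsheet lettering scheme (A through Z, then AA, AB and so on); the function names `column_index` and `keep_columns` are my own invention, not anything the state supplies.

```python
def column_index(label: str) -> int:
    """Convert a spreadsheet column label such as 'BP' to a 1-based index."""
    index = 0
    for ch in label.upper():
        # Column labels are a base-26 numbering with digits A=1 ... Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def keep_columns(row, labels):
    """Return only the cells of one row that sit under the given column labels."""
    return [row[column_index(label) - 1] for label in labels]


# The last column in the file, per the post, is FB.
TOTAL_COLUMNS = column_index("FB")

# The three columns that answer "Did this student pass the test".
PASS_COLUMNS = [column_index(label) for label in ("BP", "BQ", "BR")]

if __name__ == "__main__":
    print(TOTAL_COLUMNS)  # 158
    print(PASS_COLUMNS)   # [68, 69, 70]
```

So that is 158 columns per student, with the pass/fail answer sitting in columns 68 through 70.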
Many of the columns are administrivia– did this student use braille, did the student use paper or computer, that sort of thing. But buried in the columns are raw scores and administrative scores for each section of the test. There are two “modules” and each “module” includes two anchor standards segments. The Key gives an explanation of these:
I can also see raw scores broken down by multiple choice questions and “constructed” answers. The constructed answers can get a score of 1 through 10.
Annnnnnnnd that’s it.
You might think that a good next step would be to look at student results broken down by questions, with those questions tagged to the particular sub-standard they purport to measure. That’s not happening. In fact, not only are these assessment anchors not broken down, but if you go to the listing of Pennsylvania Core Standards (because we are one of those states that totally ditched, er, renamed Common Core), you will see that L.F.1, etc., only sort of correspond to specific Core Standards.
You might also think that being able to see exactly what questions the students got wrong would allow me to zero in on what I need to teach more carefully or directly, but of course, I am forbidden to so much as look at any questions from the test, and if I accidentally see one, I should scrub it from my memory. Protecting the proprietary materials of the test manufacturer is more important than giving me the chance to get detailed and potentially useful student data from the results.
You’ll also note that “reading for meaning” is assessed based on no more than six or seven questions (I don’t know for a fact that it’s one point per question, but the numbers seem about right based on student reports of test length– not that I’ve ever looked at a copy of the test myself, because that would be a Terrible Ethical Violation).
So that’s it. That’s my actionable data. I know that my students got a score by answering some questions covering one of four broad goals. I don’t know anything about those questions, and I don’t know anything about my students’ answers. I can compare how they do on fiction vs. non-fiction, and for what it’s worth, only a small percentage shows a significant gap between the two scores. I can see if students who do well in my class do poorly on the test, or vice-versa. I can compare the results to our test prep test results and see if our test prep test is telling us anything useful (spoiler alert– it is not).
But if you are imagining that I look at test results and glean insights like “Man, my students need more work on interpreting character development through use of symbolism or imagery” or “Wow, here’s a list of terms I need to cover more carefully” or “They’re just now seeing how form follows function in non-fiction writing structures”– well, that’s not happening.
In the world of actionable data, the Keystones, like most of the Big Standardized Tests, are just a big fat couch potato, following a design that suggests their primary purpose is to make money for the test manufacturing company. Go ahead and make your other arguments for why we need to subject students to this annual folly, but don’t use my teaching as one of your excuses, because the BS Test doesn’t help me one bit.