Earlier this week NCARB posted an update on the ARE 5.0 Community page dedicated to updating candidates on the number of administrations currently taken and scheduled for each ARE 5.0 division. The magic number we are looking for is 600 administrations, which is how many administrations NCARB needs to establish a cut score for each division. The cut score isn't anything new to the ARE. After each update to the ARE it is necessary to establish a new cut score in order to successfully evaluate a candidate's competency.
NCARB is making a valiant attempt to be transparent about the process and why it is necessary (which is quite a change from when it was done in 2004). They've written on their blog about it and offered $100 to any candidate who is one of the first 600 people to take the test in each division. They even offered to extend the $100 offer to anyone who simply took the new exams before the end of January, so you didn't have to fret about whether or not you would be in the first 600. (Answering questions about how many tests had been taken seemed to take up a lot of the NCARB moderators' time on the ARE 5.0 Community in November and December last year.)
With the update indicating that they've reached the magic 600-administration threshold for the PPD division, NCARB is moving forward with the process. They've announced that they plan to have the cut score set and pass/fail notices sent out to candidates by mid-February. That being said, and even with all the information NCARB has put out concerning the process, there still seems to be some confusion as to what the cut score actually is and how it is set. Since I'm one of the candidates left waiting for cut scores to be set on my last two divisions, I decided to dig into the process a little more and see what I could find out.
A cut score is just a simple way of determining what score a testing candidate must achieve to be considered minimally competent. If you score at or above the cut score, you are deemed to be competent in the subject matter and you pass the test. If you score lower than the cut score, you are not deemed to be competent in the subject matter and you fail the test.
NCARB has mentioned that they use the Modified Angoff method for establishing the cut score, but they drop it into casual conversation like we all understand what they mean by it. The essential question asked of each question on the exam is: what percentage of minimally competent test candidates would answer this question correctly? 0% would mean that a minimally competent candidate would always answer the question incorrectly, and 100% would mean that a minimally competent candidate would always answer it correctly. Neither 0% nor 100% would ever be an appropriate ranking for a test question, though. For example, statistically, a candidate would be able to guess on a multiple choice question with four possible answers and answer correctly 25% of the time. Conversely, minimal competency does not equal perfection, so requiring candidates to answer a question correctly all the time would also not be appropriate.
Don't like that you failed to pass the ARE ... you can basically blame William Angoff
The Modified Angoff method asks the subject matter experts to independently rate the difficulty of the test questions. Then the experts are allowed to compare their rankings to those of the other experts and reevaluate their rankings. For example, if half of the experts rank a question around 50% (meaning half of minimally competent candidates will answer it correctly) but the other half rank it around 80%, they could deliberate and decide if they need to adjust their rankings and bring them toward a consensus. One point I'm unclear on is whether the panel of experts must reach a consensus. If they are allowed to differ slightly in their rankings, the actual ranking for the question would then be based on an average of all the experts' rankings. For example, if five experts rank a question at 50%, 54%, 50%, 48%, and 53%, the average is 51%, indicating that 51% of minimally competent candidates would answer it correctly.
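The five-expert example above is just an arithmetic mean, which can be sketched in a few lines of Python. This is only an illustration of the averaging step as I understand it, not NCARB's actual procedure; the function name `question_rating` is my own.

```python
from statistics import mean

# Ratings from the five-expert example: each value is one expert's estimate
# of the percentage of minimally competent candidates who would answer
# the question correctly.
expert_ratings = [50, 54, 50, 48, 53]


def question_rating(ratings):
    """Average the experts' estimates for a single question."""
    return mean(ratings)


print(question_rating(expert_ratings))  # 51
```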
After the rankings are established among the experts, they are then allowed to review actual scores from tests that have been administered. This is where the necessary 600 administrations come in. The experts are then allowed to modify their rankings again based on this data. For example, if the rankings would indicate that 80% of candidates should answer a question correctly, but the actual administrations are showing that only 50% have done so ... the experts may wish to revise their ranking. However, I don't believe that this means they have to revise them in order to match the results of the actual administrations. The question might be one that is important enough for determining competency that the experts agree that a larger majority of competent candidates should be able to answer it correctly.
The cut score process is not the process of setting a curve
This opportunity for the experts to revise their rankings based on actual test data is what I believe NCARB is pointing to when they say that by being an early tester you have the opportunity to "influence the cut score." I also think that this statement from NCARB is why many candidates believe that the cut score is established the way a professor might base grades on a curve or class average. However, I hesitate to acknowledge that the early testers might have that much influence, and I'm quick to point out that the cut score is not based on a curve. As I understand the process, the experts are the ones who ultimately determine competency (albeit through a rigorous process), not some average of testing candidates' scores. I imagine that in reality the test results from the early administrations probably highlight poorly written test questions more than they exert an influence over the experts' opinions. If anyone can shed more light on this though, I'd love to hear more.
So after all the questions are ranked, the cut score is simply a calculation based upon those questions for each administration. For example, a simple 10-question test of questions ranked as follows would result in an average of those scores and the cut score of 65%:
So a minimally competent candidate would need to answer 65% or more of the questions correctly in order to pass and be deemed competent in the subject matter.
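To make the calculation concrete, here is a minimal Python sketch. The ten ratings are made-up values that I chose only because they average to 65%; they are not NCARB data. The names `cut_score` and `questions_needed` are likewise just for illustration.

```python
import math
from statistics import mean

# Hypothetical ratings for a 10-question test (illustrative values only,
# chosen so that they average to 65).
question_ratings = [60, 70, 55, 75, 65, 80, 50, 70, 60, 65]


def cut_score(ratings):
    """Cut score for one administration: the average of its question ratings."""
    return mean(ratings)


def questions_needed(ratings):
    """Smallest number of correct answers that meets or beats the cut score."""
    # correct / n * 100 >= sum(ratings) / n  is the same as  correct >= sum(ratings) / 100
    return math.ceil(sum(ratings) / 100)


print(cut_score(question_ratings))         # 65
print(questions_needed(question_ratings))  # 7
```

On a test this short, 65% works out to 6.5 questions, so a candidate would actually need 7 of the 10 correct.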
This also gets interesting when you have a pool of questions that is greater than the number needed for any particular exam administration. Take, for example, an exam with 100 questions: you might have a pool of 400 questions from which 100 are randomly selected. So one candidate's exam might be measurably harder than another's because it has more questions that are ranked as more difficult. However, with the cut score being based on the individual questions in the administration rather than a single cutoff score for the exam as a whole, the harder exam will have a lower cut score than the easier exam.
To illustrate, take another set of 10 questions to compare to the example I posted above. It has some easier questions but overall the questions are harder:
In this example the cut score is now 61%. Extrapolate this from a 10-question test to one with 100 questions, and answering 63 questions correctly will result in passing the harder test, but answering the same number of questions correctly on the easier test would result in failing it ... even though the score is the same, 63%.
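The same comparison can be sketched in Python. Both sets of ratings are made-up values that I picked only so that they average to 65% and 61%; they are not NCARB data, and the helper names are my own.

```python
from statistics import mean

# Hypothetical question ratings for two exam forms (illustrative values only).
easier_form = [60, 70, 55, 75, 65, 80, 50, 70, 60, 65]  # averages to 65
harder_form = [70, 75, 45, 55, 50, 80, 50, 60, 65, 60]  # averages to 61


def cut_score(ratings):
    """Cut score for one form: the average of its question ratings."""
    return mean(ratings)


def passes(percent_correct, ratings):
    """True if the candidate's percent correct meets the form's cut score."""
    return percent_correct >= cut_score(ratings)


# The same 63% score gives different outcomes on the two forms.
print(passes(63, harder_form))  # True
print(passes(63, easier_form))  # False
```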