Archinect

Architectural Ellipsis

... Intern Architect ...


    ARE 5.0 Cut Scores

    Everyday Intern
    Jan 19, '17 7:14 PM EST

    Earlier this week NCARB posted an update on the ARE 5.0 Community page that tracks the number of administrations taken and scheduled for each ARE 5.0 division. The magic number we are looking for is 600 administrations, which is how many NCARB needs before it can establish a cut score for a division. The cut score isn't anything new to the ARE. After each update to the ARE, a new cut score has to be established so that candidates' competency can be evaluated accurately.

    NCARB is making a valiant attempt to be transparent about the process and why it is necessary (quite a change from when it was done in 2004). They've written about it on their blog, offered $100 to any candidate who is among the first 600 people to take each division, and even extended the $100 offer to anyone who took the new exams before the end of January, so you didn't have to fret about whether or not you were in the first 600. (Answering questions about how many tests had been taken seemed to take up a lot of the NCARB moderators' time on the ARE 5.0 Community last November and December.)

    With the update indicating that they've reached the magic 600-administration threshold for the PPD division, NCARB is moving forward with the process: they've announced that they plan to have the cut score set and pass/fail notices sent out to candidates by mid-February. Even so, and despite all the information NCARB has put out about the process, there still seems to be some confusion about what the cut score actually is and how it is set. Since I'm one of the candidates waiting for cut scores to be set on my last two divisions, I decided to dig into the process a little more and see what I could find out.

    A cut score is simply the score a testing candidate must achieve to be considered minimally competent. If you score at or above the cut score, you are deemed competent in the subject matter and you pass the test. If you score below the cut score, you are not deemed competent in the subject matter and you fail the test.

    NCARB has mentioned that they use the Modified Angoff method to establish the cut score, but they drop it into casual conversation as if we all know what it means. The essential question the method asks about each question on the exam is: what percentage of minimally competent candidates would answer this question correctly? A rating of 0% would mean that a minimally competent candidate would always answer the question incorrectly, and 100% would mean that a minimally competent candidate would always answer it correctly. Neither extreme is an appropriate rating for a test question. Statistically, a candidate guessing on a multiple-choice question with four possible answers will answer correctly 25% of the time. Conversely, minimal competency does not equal perfection, so requiring candidates to answer a question correctly every time would not be appropriate either.
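The lower bound comes from blind guessing: on a single-answer question with n options, a guess is right 1/n of the time. Below is a minimal sketch of a hypothetical sanity check that flags a rating at or below that chance level, or at 100%. The function names are illustrative only and are not part of NCARB's process.

```python
def guessing_floor(num_options: int) -> float:
    """Percent chance of answering a single-answer question correctly by blind guessing."""
    return 100.0 / num_options


def rating_is_plausible(rating_pct: float, num_options: int) -> bool:
    """Hypothetical check: a rating should sit above chance and below perfection."""
    return guessing_floor(num_options) < rating_pct < 100.0


print(guessing_floor(4))            # 25.0
print(rating_is_plausible(20, 4))   # False: below what guessing alone achieves
print(rating_is_plausible(60, 4))   # True
print(rating_is_plausible(100, 4))  # False: minimal competency is not perfection
```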

    Don't like that you failed the ARE ... you can basically blame William Angoff

    The Modified Angoff method asks subject matter experts to independently rate the difficulty of the test questions. The experts then compare their rankings with those of the other experts and reevaluate. For example, if half of the experts rank a question around 50% (meaning half of minimally competent candidates would answer it correctly) but the other half rank it around 80%, they can deliberate and decide whether to adjust their rankings toward a consensus. One point I'm unclear on is whether the panel of experts must reach a consensus. If they are allowed to differ slightly, the question's rating would then be the average of all the experts' rankings. For example, if 5 experts rank a question 50%, 54%, 50%, 48%, and 53%, the average is 51%, indicating that 51% of minimally competent candidates would answer it correctly.
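If the panel may keep slightly different ratings, the question's value is just the mean of the experts' ratings. Here is a minimal sketch using the five example ratings above. It assumes every expert carries equal weight.

```python
from statistics import mean

# Percent of minimally competent candidates each expert expects to answer correctly
expert_ratings = [50, 54, 50, 48, 53]

# Assumption: a simple unweighted average across experts
item_rating = mean(expert_ratings)
print(item_rating)  # 51
```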

    After the rankings are established, the experts are allowed to review actual scores from tests that have been administered. This is where the 600 administrations come in. The experts can then modify their rankings again based on this data. For example, if the rankings indicate that 80% of candidates should answer a question correctly, but the actual administrations show that only 50% have done so, the experts may wish to revise their ranking. However, I don't believe they have to revise it to match the results of the actual administrations. The question might be important enough for determining competency that the experts agree a larger share of minimally competent candidates should be able to answer it correctly.
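One way to picture this review step is to compare each question's panel rating with the percent of early candidates who actually answered it correctly, and flag large gaps for discussion. This is a minimal sketch. The 20-point threshold and the data are hypothetical, the observed figure covers all early test takers rather than only minimally competent ones, and a flag only prompts a second look. It does not move the rating to match the results.

```python
def flag_for_review(panel_ratings, observed_pct_correct, threshold=20):
    """Return question numbers whose observed results differ from the panel rating
    by more than `threshold` percentage points (the threshold is an assumption)."""
    flagged = []
    for q, (rating, observed) in enumerate(zip(panel_ratings, observed_pct_correct), start=1):
        if abs(rating - observed) > threshold:
            flagged.append(q)
    return flagged


# Question 1 mirrors the example: rated 80%, but only 50% answered correctly
ratings = [80, 60, 70]
observed = [50, 58, 75]
print(flag_for_review(ratings, observed))  # [1]
```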

    The cut score process is not the process of setting a curve

    This opportunity for the experts to revise their rankings based on actual test data is what I believe NCARB is pointing to when they say that by being an early tester you have the opportunity to "influence the cut score." I also think this statement from NCARB is why many candidates believe the cut score is set the way a professor might grade on a curve or against the class average. However, I'm skeptical that early testers have that much influence, and I'm quick to point out that the cut score is not based on a curve. As I understand the process, the experts are the ones who ultimately determine competency (albeit through a rigorous process), not some average of candidates' scores. I imagine that, in reality, the results from the early administrations probably highlight poorly written test questions more than they influence the experts' opinions. If anyone can shed more light on this, I'd love to hear more.
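To make the distinction concrete, here is a toy comparison with made-up candidate scores. A curve-style cut moves whenever the group of test takers changes. In this sketch it is hypothetically defined as the median candidate's score. An Angoff-style cut depends only on the experts' question ratings, so it stays put.

```python
from statistics import mean, median

# Expert ratings (percent) for the ten example questions
question_ratings = [60, 70, 75, 60, 50, 85, 55, 75, 70, 50]


def angoff_cut(ratings):
    # Depends only on the experts' ratings of the questions
    return mean(ratings)


def curve_cut(candidate_scores):
    # Hypothetical curve: the passing line is wherever the median candidate lands
    return median(candidate_scores)


stronger_group = [72, 80, 68, 90, 77]
weaker_group = [52, 61, 48, 66, 58]

print(angoff_cut(question_ratings))  # 65, regardless of who tests
print(curve_cut(stronger_group))     # 77
print(curve_cut(weaker_group))       # 58
```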

    So after all the questions are ranked, the cut score for each administration is simply a calculation based on the questions in that administration. For example, a simple 10-question test with questions ranked as follows would have a cut score equal to the average of those rankings, 65%:

    • Question 1: 60%
    • Question 2: 70%
    • Question 3: 75%
    • Question 4: 60%
    • Question 5: 50%
    • Question 6: 85%
    • Question 7: 55%
    • Question 8: 75%
    • Question 9: 70%
    • Question 10: 50%

    So a minimally competent candidate would need to answer 65% or more of the questions correctly in order to pass and be deemed competent in the subject matter.
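The arithmetic above is just the mean of the question ratings, combined with the rule "pass at or above the cut score." This is a minimal sketch. It assumes every question counts equally, because the real exam's scoring details aren't described here.

```python
from statistics import mean

# Ratings for the ten example questions (percent of minimally competent candidates)
ratings = [60, 70, 75, 60, 50, 85, 55, 75, 70, 50]


def cut_score(question_ratings):
    # Assumption: unweighted average of the per-question ratings
    return mean(question_ratings)


def passes(num_correct, num_questions, cut):
    # Pass when the percent correct meets or exceeds the cut score
    return 100 * num_correct / num_questions >= cut


cut = cut_score(ratings)
print(cut)                 # 65
print(passes(7, 10, cut))  # True  (70%)
print(passes(6, 10, cut))  # False (60%)
```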

    This gets interesting when the pool of questions is larger than the number needed for any particular exam administration. Take, for example, an exam with 100 questions: you might have 400 questions from which 100 are randomly selected. One candidate's exam might then be measurably harder than another's because it includes more questions ranked as more difficult. However, because the cut score is based on the individual questions in the administration rather than a single cutoff score for the exam as a whole, the harder exam will have a lower cut score than the easier exam.
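Here is a minimal simulation of the pool idea, using entirely made-up ratings. It draws 100 questions at random from a 400-question pool and computes each form's cut score from the questions that form actually contains. The pool, the 40 to 90% rating range, and the seed are assumptions for illustration only.

```python
import random
from statistics import mean

rng = random.Random(2017)  # fixed seed so the run is repeatable

# Hypothetical pool: 400 questions, each with an expert rating between 40% and 90%
pool = [rng.randint(40, 90) for _ in range(400)]


def build_form(pool_ratings, size, rng):
    # Draw `size` distinct questions from the pool
    return rng.sample(pool_ratings, size)


form_a = build_form(pool, 100, rng)
form_b = build_form(pool, 100, rng)

# Each form gets its own cut score; a form heavy in harder (lower-rated)
# questions ends up with a lower cut score
print(round(mean(form_a), 1), round(mean(form_b), 1))
```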

    To illustrate, take another set of 10 questions to compare with the example I posted above. It has some easier questions, but overall the questions are harder:

    • Question 1: 50%
    • Question 2: 60%
    • Question 3: 75%
    • Question 4: 55%
    • Question 5: 50%
    • Question 6: 65%
    • Question 7: 65%
    • Question 8: 60%
    • Question 9: 80%
    • Question 10: 50%

    In this example the cut score is now 61%. Extrapolate from a 10-question test to one with 100 questions, and answering 63 questions correctly will pass the harder test, but answering the same number of questions correctly on the easier test would fail it ... even though the score is the same, 63%.
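The two example forms can be compared directly. This minimal sketch scales the 10-question examples up to 100 questions, as the paragraph does. It then checks the same raw score of 63 against each form's cut score. It assumes equally weighted questions and the "at or above the cut" pass rule.

```python
from statistics import mean

easier_form = [60, 70, 75, 60, 50, 85, 55, 75, 70, 50]
harder_form = [50, 60, 75, 55, 50, 65, 65, 60, 80, 50]


def passes(num_correct, num_questions, cut):
    # Pass when the percent correct meets or exceeds the cut score
    return 100 * num_correct / num_questions >= cut


results = {}
for name, ratings in [("easier", easier_form), ("harder", harder_form)]:
    cut = mean(ratings)
    results[name] = passes(63, 100, cut)
    print(f"{name}: cut {cut}%, 63/100 -> {'pass' if results[name] else 'fail'}")
# easier: cut 65%, 63/100 -> fail
# harder: cut 61%, 63/100 -> pass
```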



     
    • 19 Comments

    • It seems NCARB is still interested in greasing the wheels so they can get the cut score set for ARE 5.0 divisions as soon as possible. As announced yesterday (1/24/17), NCARB is adding an incentive by offering a 'seat credit' back to any early testers who take and fail any division. So you'll still get a $100 gift card for testing, and if you fail, you don't have to pay to take the test again.

      Early testing ends for PPD at the end of January, but they've extended a guarantee that these early testing incentives will be given to anyone who tests in any of the remaining divisions through the end of February. You'll get the incentives whether or not you fall into the first 600 administrations, as long as you take the division before the end of February.

      Jan 25, 17 7:42 pm
      Wilma Buttfit

      Thanks for the update! I took a test today. There was a spelling error in the first question, oh NCARB, I love you.

      Jan 25, 17 8:07 pm

      There was a big problem today with NCARB's server crashing, leaving people unable to finish, or even start, their exams. Apparently they've got it up and working again, so anyone testing tomorrow should be OK.

      More info on ARE 5.0 Community threads here, here, and here.

      This sucks for those people who took the day off work to test today and basically wasted it dealing with this issue. It looks like NCARB will let them retake the test at no additional fee once they've been contacted to reschedule, and they'll still get the early testing incentives ... but NCARB can't refund candidates a day's worth of work.

      Jan 30, 17 7:02 pm
      Wilma Buttfit

      I took one this morning so the servers are good. I have plenty to say about the quality of the test though, namely lack of quality. They need to proofread that crap. This'll probably get me in trouble, how dare I criticize them. :)

      Jan 31, 17 3:18 pm

      This'll probably get me in trouble, how dare I criticize them. :)

      tintt, I know the feeling, but then I remember I'm doing most of my criticizing from behind an anonymous screen name, and there are plenty of people doing more than I am from behind their own name tied into the NCARB system, and then I think I'll be ok. Have you been following any of the threads on the ARE 5.0 Community? Some of the posters seem like they are about ready to #occupyNCARB and one has made what seems to be an argument for boycotting ARE 5.0. Relax people.

      Generally speaking, I think NCARB is getting enough criticism for the issues with the new version, I don't feel like I have to add much. My hat is off to the NCARB staff that moderate the ARE 5.0 Community forum. Mistakes happen and technology isn't infallible, but from what I've seen NCARB has been pretty responsive in correcting issues. Hearing stories of what happened in the past with cut scores and other things, NCARB has made a lot of improvements over the years to be faster, more responsive, and more transparent. They aren't perfect so some criticism is probably still warranted, but I also think people are a little too impatient with them.

      Jan 31, 17 6:24 pm
      Wilma Buttfit

      I haven't been following the 5.0 community, no. I just read some of it now and I do empathize with those who had their systems crash mid-test, and if it happened to me I would expect compensation for having missed a day of work for nothing, and I mean more than just a free retake. Silver lining - it was a free practice test for them? Ha. I kinda wish the computer would have gone defunct halfway through so I could have just gone home and studied more. My problems with the test were more like, uh, hey NCRAP... temperature is not measured in inches and uh, when you use less expensive material substitutes on a building you don't add cost, pretty sure you subtract it. Those things were not *exactly* what I experienced, but it gives you an idea. I guess I am paying for the privilege to proofread their test?

      Jan 31, 17 7:12 pm
      Wilma Buttfit

      I did start laughing at a few questions. As usual, the ARE is not a competency exam but a "can u guess what we are asking? Cuz we don't really know either... but just answer it, help us out."

      And now people are going to be even more reluctant to take them... 

      Jan 31, 17 7:17 pm

      I also empathize with those who will have to retake it due to circumstances outside of their control. I almost had to reschedule my last test in 4.0 due to a problem at the testing center. I was getting pretty pissed off just waiting while others were getting seated on time for whatever tests they were taking. In the end I was able to take the test; I was delayed by over an hour, but at least the time I took off work wasn't a complete waste.

      I'm right there with you on paying for the privilege to proofread too. I take a little comfort in the fact that I'm getting $100 back for each test, so in a way they're reimbursing me for proofreading. My firm reimburses me for passing the exams, so in the end I'll make $200. Almost makes me want to take the other 5.0 exams I already have credit for just so I can make some more money.

      Jan 31, 17 7:47 pm
      Wilma Buttfit

      EI, did you witness any graphics that were illegible? As in fuzzy pixels? I had a few.

      Jan 31, 17 9:08 pm

      Yes, I had a few as well. I didn't think that it hindered my ability to understand and answer the question though. Which in NCARB world probably means they hit it out of the park.

      I am really disappointed in the legibility of the references for structural that NCARB includes in the exam. Again, I don't think it hurt me the times I needed to use the references, but here is an example of some beam equations directly from the NCARB Demonstration Exam ...

      NCARB says they are exactly the same as they were for 4.0. I don't have any experience with SS in 4.0 to compare it to, but even if they are the same, this is unacceptable. How hard can it be to provide this content in a way that doesn't result in a fuzzy, rasterized image?

      Feb 1, 17 11:53 am
      Wilma Buttfit

      Yeah, I agree it wasn't enough to cause a failure but it does cause fatigue. I don't recall fuzzy stuff in 4.0 questions. It just isn't professional. I guess I have too high of expectations for a group that is going to be the gatekeepers for a 'profession' but I don't feel like lowering my standards. NCARB seems to be run by a bunch of high schoolers. Sorry to say something so negative but I think it needs to be said over and over. The other thing is that using a standardized test for determining competency in architecture is probably not appropriate. So making it hard by making it really awkward seems wrong. Other tests are hard because the content is hard, not because you can't read the text or there are errors in the questions.

      Feb 1, 17 12:48 pm

      tintt, I'm curious to hear how you felt about the time allotted to the exams. I'm seeing a lot of people complaining that they didn't have enough time to finish, but I didn't think it was that bad. I didn't feel rushed to get through the test or review my answers. I ended both the 5.0 exams I took with around 30 minutes left on the clock.

      Feb 2, 17 4:18 pm
      Wilma Buttfit

      I finished PPD with plenty of time left, maybe 2.5 hours. For PDD, I had about an hour to spare. 

      Feb 2, 17 4:22 pm

      2.5 hours left ... impressive. I took my sweet time on PPD apparently. We were about the same on PDD.

      On PDD I had about an hour left after I was done answering and reviewing the questions I had marked, then I decided to go through all the questions one last time. Partially to get an idea of how much lag/loading time there was. Also to quickly skim and verify I had answered all the "mark multiple options" questions with the correct number of options. It took about 20 minutes to go through the entire test. I estimate I spent about 5 secs on average (maybe more) skimming each question. Coincidentally, that works out to be about a 5 sec average loading time between each question (maybe less). There were one or two questions that took a long time to load, but overall I don't think it hindered my ability to test. It was annoying, yes ... but didn't hinder me.

      Feb 3, 17 1:38 pm
      Wilma Buttfit

      PDD was a breeze, practically no calcs. Maybe I got an easy one. PDD was quite difficult. I had a lot of calcs, fuzzy graphics, and multiple-step problems to solve. I used all my scratch paper on that one too and had never used more than 2 sheets on any other test. I didn't review any questions except the ones I marked for review, which was 3-4 on PPD and 7-8 on PDD. I'm sure revisiting those questions didn't make a difference because if I didn't know it the first time, I still didn't know it. The case studies didn't bother me at all. I did experience delays in loading the references for them; it was annoying, a few seconds each time, but not unacceptable. Some people say it took up to 30 seconds to load, but I didn't experience that. The case studies made much more sense than the vignettes from 4.0, and I feel they are a great improvement in the test. I would take all of my q's in case study form if I could. I always ran the clock down on the vignettes just from trying to pan and zoom and use the other "tools".

      Feb 3, 17 3:12 pm
      Wilma Buttfit

      PPD* was the easy one (oops.)

      Feb 3, 17 4:00 pm

      Latest news is that the cut score for one form of the PPD exam has been set, and they are working on the other forms for this exam. But the mid-February date for score reports has been pushed back to Feb 28th (barring any complications). Apparently, for this first set of reports, NCARB wants to make sure they dot all the I's and cross all the T's before sending them out. 

      Keep in mind that NCARB's initial estimate was that the cut score process would only delay receiving scores 6-10 weeks. Feb 28th would be 17 weeks by my count.

      Interpreting all of this means that you can probably expect the first score reports to be released at the beginning of April. At least that's the way I'm approaching it; I'm setting my expectations extremely low. That way I can be pleasantly surprised when they are released earlier. 

      Feb 10, 17 12:07 pm
      Wilma Buttfit

      I don't care. I'm not itching to get my results back anyways. Funny to suggest that NCARB is into dotting i's and crossing t's. HA! We all know better. Keeping low expectations is an important life skill to develop, right?

      Feb 10, 17 4:15 pm

      Dotting and crossing was my interpretation ... perhaps it was too bold a suggestion. I think their words were more like checking and double-checking that fields are populated correctly, and making sure I don't get your score report by mistake.

      Feb 10, 17 5:39 pm


About this Blog

An ellipsis [...] is used to signal an omission, an unfinished thought, aposiopesis, or brief awkward silence. Architectural ellipses are those aspects of the profession we (perhaps intentionally) omit, gloss over, or let dwindle in silence. Generally applied, this blog should encompass many aspects of the profession. Yet, as an intern architect (now architect), I'll focus primarily on the architectural ellipses that occur in the internship process (and beyond).
