Revisiting the feedback from the last element, here's how the teachers scored the various student responses when given no direction on how to award partial credit:
Did you see that? DID YOU SEE THAT? So let's assume there were 9 more questions on this unit test. You could have a student in one class scoring a 38%. If they walked to a different building, they might score an 89%. That's not quite what I imagine when I hear the word "objective." This is the other big thing about the 'Standard' in standards-based grading... it means that there is a shared vision of what proficiency looks like. You can't teacher-proof this, folks. You have to talk about what the evidence looks like. When you reduce the number of categories on a scoring scale, you reduce those subjective factors and increase your chances of having some semblance of inter-rater reliability (of course, you could still be way off on the validity of the judgment, but one thing at a time).
Let's take the two problems below. Both have to do with buoyancy, density, water displacement, etc. The second one is in a different league. If I have an assessment made up of items that look like the one with the two boxes, and the teacher across the hall has items that look like the boat problem, the percentages don't mean the same thing. If one student scores 12 out of 20 on the second test and another scores 19 out of 20 on the first test, I still feel better about the student who scored 60%, because the evidence (i.e. the complexity and difficulty of the task) is way more robust.
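For the record, here's the arithmetic behind that comparison as a quick Python sketch. The function name `percent` and the variable names are mine, made up for illustration; the point totals come straight from the paragraph above.

```python
# Point totals are from the paragraph above; all names here are illustrative.
def percent(points_earned, points_possible):
    """Convert a raw score into a percentage."""
    return 100 * points_earned / points_possible

box_test = percent(19, 20)   # test built from items like the two-boxes problem
boat_test = percent(12, 20)  # test built from items like the boat problem

print(f"Two-boxes-style test: {box_test:.0f}%")
print(f"Boat-style test: {boat_test:.0f}%")
```

On paper, 95% beats 60%. The calculation has no idea how demanding the items were, and that is exactly why the two percentages can't be compared at face value.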