Thursday, March 5, 2015

Validity of Standardized Tests and Solutions for Change in Education

I wrote the following paper for my Evaluation of Student Learning class on the topic: Assessment of Validity in Standardized Tests.

Validity of Standardized Tests and Solutions for Change

Validity of Assessments
Authenticity arguments help make our assessment instruments both reliable and valid (Winke, 2011; Chapelle, 1999). Reliability means that assessment results are reproducible and repeatable (Davies, 2011; Chapelle, 1999). Validity, according to Chapelle (1999), is the overall quality and acceptance of an assessment. It includes concurrent validity, the extent to which a test measures the same skills and knowledge as other assessments, and predictive validity, the extent to which it predicts future performance or skill development.

Standardized testing may or may not be valid. Wiggins (1993) believes that the assumptions behind conventional test design are false because they rest on two premises: that knowledge can be broken down into discrete elements, and that a particular concept can be known in absolutely every context. Standardized tests do not assess whether all students everywhere have the same “knowledge,” because genuine intellectual performance is individualized (Wiggins, 1993). A test’s reliability, concurrent validity, and predictive validity can be measured quantitatively, but these statistics offer only a narrow perspective; teachers’ opinions are an important component in determining exam validity (Winke, 2011). For one high-stakes ESL test, teachers objected to the time devoted to testing, the test’s inappropriate length and difficulty, and the singling out and social labeling of English language learner (ELL) students (Winke, 2011). Many teachers said that “the test stressed and frustrated some students, made them feel inadequate, humiliated, or embarrassed, or led them to question their self-worth” (Winke, 2011). Valid tests must be developmentally appropriate, fair, feasible, and practical for students, as well as statistically reliable, concurrently valid, and predictively valid (Winke, 2011).

In contrast, Slomp et al. (2014) argue that content, concurrent, and predictive validity evidence have failed, and they call instead for construct and consequential validity evidence. Standardized tests are unlikely to assess complex constructs, such as writing ability, completely (Slomp et al., 2014). Studies found that “class time was diverted from regular instruction to focus on test preparation [for] low-level skill-and-drill work rather than higher-order literacy skills; teachers fell substantially behind in their course material; and teachers felt compelled to prepare students for the test even though they questioned its usefulness and validity” (Slomp et al., 2014, p. 294). Test standardization stunts the growth of innovative teachers while reinforcing veteran teachers’ strategies that lack diversity (Slomp et al., 2014). The tests constrained writing as a construct by ignoring the importance of differentiated assessment (collecting multiple pieces of evidence of student learning over time), limited pedagogical diversity by encouraging convergent thinking, and marginalized students and teachers by undermining diversity in the classroom (Slomp et al., 2014).

Not everyone agrees on what makes an assessment valid. Davies (2011) says that validity is the extent to which evidence from several sources aligns with the learning objective. While the articles above debate the validity of standardized tests, a more practical approach can be taken in classrooms:
"Evidence of learning needs to be diverse because it requires performance and self-assessment or reflection to demonstrate application and the ability to articulate understandings. This means that written work or test results can never be enough. Observing application of knowledge, listening to students articulate understandings, and engaging students in demonstrating acquisition of knowledge can be valid evidence." (Davies, 2011)
Davies (2011) argues that triangulating evidence increases both reliability and validity. Triangulation draws on three sources: observations, conversations, and collected products (Davies, 2011). Within triangulation, standardized tests make up only a small portion of summative assessment, under the category of collecting products, and therefore are not entirely valid on their own.

My Personal Connections
My grade 9 Social Studies teacher instructed our class heavily on Russian history, the implications of communism in the USSR, the industrial revolution, and the implications of capitalism for North American society. I remember studying dutifully; I could have been assessed as understanding all course content in impeccable detail. Yet when I wrote my PAT (Provincial Achievement Test), it focused on applying ideologies to made-up economic situations. Everyone was frustrated: our teacher thought she had prepared us as well as she could, while my classmates and I felt that we had not been assessed on what we had learned. The standardized test caused a lot of anxiety and confusion. I ended up feeling that school put immense pressure on answering as many questions correctly as possible, and I stopped caring about what I had learned. Was the time taken to learn content and facts wasted? While I value the history that I learned, I certainly felt as though Alberta Education did not.

Many of my teachers since then have built their instruction around exam content. Ironically, I did not learn subject matter as deeply in those courses as I did with that excellent teacher and her passionate, engaging, differentiated instruction. I have found that both high school students and teachers put far too much emphasis on test results. I made my grade 12 Social Studies diploma exam a huge priority, but afterward I intentionally forgot all of the course content. If I had been reassessed later, I doubt I would have received honors. I do not even remember what we covered. I do remember getting 83% in grade 9 and 95% in grade 12, but I believe that I learned more in the former and was not validly assessed in either. I do not feel that standardized testing promoted the longevity of my learning.

As a tutor, I have worked with students who experience test anxiety. In tutoring sessions, one girl seemed to be a 70th-percentile student, yet her quiz and exam grades were failing. I encouraged her to ask her teacher for an oral interview. As a student teacher, I watched my teacher associate give oral retests. Without the writing component and the formal setting, many students were able to explain their understanding verbally. Yet we expect students to perform well on standardized tests even when the format disadvantages them. Why do this?

My Conclusions on Test Validity
I do not think that standardized tests are valid forms of assessment. Concurrent validity is meaningless to me because I do not care whether a test measures the same knowledge as another test; that way of thinking justifies a cycle of poor tests. I like the idea of using predictive validity to gain insight into future skill development, but why would I want to label my students’ future success based on past exams?

I agree with Winke’s (2011) position that teachers’ professional opinions are important in judging assessment validity. Standardized tests encourage teachers to focus on getting students to achieve high marks instead of on student learning and fair assessment. As Slomp et al. (2014) suggested, I think teachers are pressured to focus on testable skills, which narrows their assessment and limits students’ choices in demonstrating their knowledge. Not every student has an equal opportunity to perform well on standardized tests, and this inequity can have negative psychological impacts on students.

Valid assessment must include more collections of evidence than just examination statistics. I identify with Davies’s (2011) triangulation of observational evidence (such as watching students apply the scientific method during an experiment), conversational evidence (class discussions, student-teacher interviews, peer feedback, written conversations, and group work records), and collected products (summative projects, exams, quizzes, and assignments, as well as formative assessment notebooks, journals, photos, student worksheets, graphs, and work-in-progress portfolios). To me, triangulating evidence is the most valid way to assess a student’s knowledge and understanding, which standardized test results alone cannot fully convey.

Implications For My Assessment Practice
I plan on teaching high school physics, which means that I unfortunately cannot prevent standardized diploma exams from influencing my students’ grades. To work around this barrier, I intend to incorporate PAT-like questions into my regular assignments. While I will still require students to show all of their work, I can structure practice problems in multiple-choice or numerical-response formats. I hope that familiarity with the standardized format will ease some of the anxiety around these exams. I can validly assess student coursework by triangulating evidence from multiple sources, such as homework checks (for observed participation), lab reports, assignments, projects, quizzes, unit tests, and presentations.

If I am pressured for higher student test achievement, I could compact my year plan to include a full week of review and test preparation at the end of the semester. I want to avoid this because I do not wish to transfer excess pressure onto my students. As well, this time could be better used to delve deeper into the key learning objectives. I will try to avoid teaching to the test and will instead focus my attention on my students’ learning processes. This approach could arguably help my students do better on exams anyway, since their learning experiences would have been richer. I feel that validity must help educators improve their assessment instruments, and that frequent assessment and feedback will enable higher achievement in my future students.


As always, you can find me at:

Twitter @agracemartin https://twitter.com/AGraceMartin
Facebook Author and Teacher https://www.facebook.com/agracemartin


References
Chapelle, C. A. (1999). Validity in language assessment. Annual Review of Applied Linguistics, 19, 254–272. doi:10.1017/S0267190599190135

Davies, A. (2011). Making classroom assessment work. Bloomington, IN: Solution Tree.

Slomp, D. H., Corrigan, J. A., & Sugimoto, T. (2014). A framework for using consequential validity evidence in evaluating large-scale writing assessments: A Canadian study. Research in the Teaching of English, 48(3), 276–302.

Wiggins, G. (1993). Assessment: Authenticity, context, and validity. Phi Delta Kappan, 75(3), 200–208.

Winke, P. (2011). Evaluating the validity of a high-stakes ESL test: Why teachers’ perceptions matter. TESOL Quarterly, 45(4), 628–660.

Visiting an English Language Learner Program

The following is a reflection assignment from my Social Context of Education class, in which I visited a middle school program for English Language Learners (ELL), also called students with English as a Second Language (ESL).

When I visited the middle school’s ELL program, the first thing that I noticed was the clearly amicable relationships that the teacher and educational assistant had with the students. The teacher knew what activities each student was interested in. I noticed that a student was using the Internet on the teacher’s computer, and the teacher asked, “When did you start playing that game?” To me, this demonstrated the teacher’s authentic awareness and personalized connection with the students.

These relationships seemed to make the students feel comfortable and free to joke around. The teacher had a relaxed manner and an easy sense of humour. When students talked out of turn, they were not berated, punished, or even singled out. The teacher would simply say, “Hmm, I wonder why I’m hearing voices but no hands.” Before class, he never told students to speak only English, and he let them talk with their friends in any language they chose. In class, however, he said, “Please try using English, even if you are talking across the room when you shouldn’t be.” I liked this style of classroom management because it encouraged practice of both the English language and North American school norms. If I were to apply this strategy in a classroom without ELL students, I would adapt it to say something positive, such as “Please use proper grammar” or “Be polite even when you are talking out of turn.”

While we were sitting at the back of the classroom, one of my fellow student teachers commented that a few of the girls were too young to have nose piercings. I sincerely hoped that none of the ELL students heard this comment, and I whispered to my colleague that piercing the left side of the nose is culturally and religiously significant for girls in South Asian countries. This reminded me again of how important it is to know your students’ backgrounds to avoid offensive or uncomfortable situations caused by misunderstanding.

The teacher started the school day with a daily writing page. He wrote a prompt on the board: “If you were principal of our school, what would you change? What would stay the same?” I noticed that the teacher first asked whether anyone knew what a principal was. It would not have occurred to me to ask this question, but the teacher knew his ELL students might not recognize the word at first glance. One student did not know the meaning of “change,” so the E.A. quickly said, “to make different.” I thought that giving a simple definition quickly must be an acquired skill, and that a teacher might need some practice to do it well. I was struck by how much I take communication in English for granted, especially with what I would consider “simple” words. I am not accustomed to defining common words, which could be a challenge for me if I have an ELL student in a future class. I am more than willing to practice this, however, because simple vocabulary can also benefit students with a wider range of reading levels.

I liked the daily jobs: one student read the calendar and another read the weather forecast for the day. This got two students a day speaking English in front of the class. I also thought it was a great idea to have another board with common language on it that the teacher changed each day. He wrote, “What a horrendous morning! Today Is Moonday February 2rd, 2015. Tomorrow iS Tusday and. Yesterbay was Suhday. Different animals had tales…” This looked like an effective daily exercise for correcting mistakes and the spelling of very common words.

As the class corrected the spelling together, one student said that it was the wrong “tale.” Immediately the teacher said, “Oh wow! You know what that means! We need to get out our book for what? That’s right, homophones!” I liked this strategy because it is clearly a fun way to sprinkle homophones into daily instruction. The students got excited about homophone time. They drew a monkey next to the word “tail.” Visuals like this seem to be the best way for ELL students to tell homophones apart. After drawing pictures, the students came up with simple sentences to practice their writing.

For a while I wondered why the teacher asked students to color their pictures. It was a very time-consuming task; surely the time could be better spent? Then I realized that it was the end of the period. Coloring provided a break without disrupting the flow of the class. Also, slower-working students were allowed to keep coloring while the class worked on its corrections on the board. I later asked the teacher, and he commented that all of his ELL students are immigrants who have not acquired skills that students normally learn in elementary school, like coloring and using scissors. He has to be very patient and give his students a lot of time to complete their work.

Then the students were split into groups. The four stations were reading, playing Apples to Apples, playing Boggle, and playing reading games on the iPods, and each group rotated through every station. Working with a small group was beneficial for me because I could answer questions and get a good sense of the students’ level of comprehension.

Before teatime, the teacher held interviews: he sat at the front of the classroom and asked students questions, then let the students interview him. This approach encouraged student conversations in English and also gave the daily job students time to serve tea. I appreciated that the teacher took the time to get to know his students and learned that they enjoy having tea while talking in a circle. This clearly made a strong, culturally significant connection with them. He told us that, for him, the biggest part of his job is identifying his students’ needs and modifying his instruction to meet those needs.

In my future science teaching, I can help ELL students by giving definitions in two ways: the science textbook definition, and a version in simpler, more commonly understood words. When I teach science courses, I can offer multiple ways to say the same thing, teaching each concept with different wording. On examinations, I can differentiate tests by wording the same question in different ways and giving ELL students more time to complete their work. This approach reflects universal design for learning (UDL), which can benefit students with learning disabilities as well as ELL students.

In the future, I can give ELL students more time to write down notes. One way to approach this is through a flipped classroom. Students can watch a lecture video at home at their own pace, pausing and rewinding until they understand the content. In class, I will then have more time to answer questions that arose from the video and to give students time to collaborate with their peers on practice problems or other projects.

I think that the most important implication for my future teaching is for me to know my students and do background research into their cultural values. Truly the first step in the teaching profession is to build relationships.

As always, you can find me at:
My website www.agracemartin.com
Twitter @agracemartin https://twitter.com/AGraceMartin
Facebook Author and Teacher https://www.facebook.com/agracemartin

Tumblr: agracemartin.tumblr.com