Rater Errors in Rating Process and the Needs to Be Identified Among Student Raters
Keywords: rater errors, performance-based assessment, MFRM

Abstract
In performance-based language assessment, rater behaviour is a major contributor to measurement error: rater errors pose a threat to any rating process. The emphasis on students' self-directed learning following the implementation of the CEFR in the Malaysian English language syllabus requires students to be able to assess their own progress, and the validity and reliability of the resulting scores should be beyond question. However, because self-assessment is a new practice, there is a lack of awareness of the need to identify rater behaviour among students. This paper therefore discusses the types of rater error that occur in a rating process and highlights the importance of identifying these errors among secondary school students in Malaysia. It also proposes a conceptual framework of rater-mediated assessment using the Many-Facet Rasch Model (MFRM) that can be used to understand rater errors. The implication of this conceptual paper is that teachers will gain insight into the factors that threaten students' ability to become good raters. In addition, the conceptual framework of rater-mediated assessment using MFRM will help teachers understand the relationship between the three facets (task, examinee, and rater) through the outputs produced by MFRM. Future research should examine the factors that contribute to students' rater errors, which undoubtedly affect their judgements in a rating process, based on the proposed conceptual framework of rater-mediated assessment using MFRM.
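The three-facet structure described above can be made concrete. A standard formulation of the MFRM for a rating with examinee, task, and rater facets (a sketch in conventional Rasch notation; the symbols here are illustrative, not taken from the paper itself) models the log-odds of an examinee receiving category $k$ rather than $k-1$ as:

```latex
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right)
  = \theta_n - \delta_i - \lambda_j - \tau_k
```

where $\theta_n$ is the ability of examinee $n$, $\delta_i$ the difficulty of task $i$, $\lambda_j$ the severity of rater $j$, and $\tau_k$ the threshold between categories $k-1$ and $k$. Because rater severity $\lambda_j$ enters the model as its own parameter, MFRM output can separate a harsh or lenient rater from a genuinely weak or strong examinee, which is what makes it suitable for detecting rater errors.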
License
You are free to:
- Share — copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.
- The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.