Item and test analysis by language groups for an eighth-grade biology test in Ethiopia: a comparison of IRT and CTT models

Date

2011-05

Publisher

Addis Ababa University

Abstract

This study analysed item- and test-level data from the Grade 8 Biology Test of the Ethiopian Third National Learning Assessment (ETNLA). A total of 10,795 students sat the biology test in 2007; of these, 9,552 were included in the study. The test was originally prepared in English and then translated into three other language versions (Afan Oromo, Somali and Tigrigna). The main purpose was to examine how the items functioned across language groups. A two-parameter logistic model (2PLM) based on Item Response Theory (IRT) was used to estimate latent traits, and the main statistics generated were IRT ability scores and IRT item parameter estimates (difficulty level and discrimination index). Item Characteristic Curves (ICCs) and item-person dual plots were generated for all 40 items by language group. Based on the IRT ability scores, the language groups were compared using one-way ANOVA and recursive partitioning analysis. Item and test statistics were also computed under the Classical Test Theory (CTT) model, and the results were compared with those from IRT. The ICCs differed from the expected ogive shape and varied across language groups. The Test Information Function (TIF) also varied across language groups, indicating that neither the test as a whole nor the individual items worked the same way for the subgroups. A recursive partitioning analysis based on the IRT ability scores showed that 20% of the variation in achievement scores (R² = 0.20, F(3, 9518), p < .001) was accounted for by differences in language of instruction. The variance explained under the CTT procedure was 13.4% (R² = 0.134, F(3, 9548), p < .001). The numbers of problem items (items that were too difficult and/or had very low discrimination power) by language group, based on CTT, were: Somali (19), Afan Oromo (12), English (10) and Tigrigna (8). The highest test score (20) was for Tigrigna, followed by Afan Oromo (18); the English language group students scored lowest (15).
The performance of Somali language group students was about equal to that of the English group. The findings show that a number of items did not work the same way across the four language groups, which makes them suspects for language Differential Item Functioning (DIF). Based on these findings, it is recommended that detailed item and test analysis following the IRT model be employed across subgroups on both the pilot and the operational tests in the future. This will help to further explore DIF in future administrations of the test, in order to determine whether these patterns represent real differences in achievement levels or a systematic bias that inappropriately affects the scores of particular student groups.

Keywords

Biology, Item Analysis, IRT, CTT, Language DIF