Item and test analysis by language groups for an eighth-grade biology test in Ethiopia: a comparison of IRT and CTT models
Date
2011-05
Publisher
Addis Ababa University
Abstract
This study carried out an analysis of item- and test-level data from the Grade 8
Biology Test of the Ethiopian Third National Learning Assessment (ETNLA). A total
of 10,795 students sat for the biology test in 2007; of these, 9,552 were used for the
study. The test was originally prepared in English and was then translated into three
language versions (Afan Oromo, Somali and Tigrigna). The main purpose was to examine
how the items functioned across language groups. A two-parameter logistic model
(2PLM) based on Item Response Theory (IRT) was used to investigate latent traits, and the
main statistics generated were IRT ability scores and IRT parameter estimates
(difficulty level and discrimination index). Item Characteristic Curves (ICC) and Item-
Person Dual Plots were generated for all 40 items by language group. Based on the
IRT ability scores, language groups were compared using one-way ANOVA and
recursive partitioning analysis. Item and test statistics were also computed following
the Classical Test Theory (CTT) model, and the results were compared with those of IRT. The
Item Characteristic Curves (ICC) differed from the expected ogive shape and varied
across language groups. The Test Information Function (TIF) also varied across
language groups, indicating that the test as a whole, and particular items, did not work
the same way across the subgroups. A recursive partitioning analysis based on IRT ability
scores showed that differences in language of instruction accounted for 20% of the
variation in achievement scores (R² = 0.20, F(3, 9518), p < .001). The variance
explained using the CTT procedure was 13.4% (R² = 0.134, F(3, 9548), p < .001).
The numbers of problem items (items that were too difficult and/or had very low
discrimination power) by language group, based on CTT, were: Somali (19), Afan
Oromo (12), English (10) and Tigrigna (8). The highest test score (20) was for
Tigrigna, followed by Afan Oromo (18). Students in the English language group scored
the lowest (15). The performance of Somali language group students was about equal
to that of the English group. The findings show that a number of items did not work
the same way across the four language groups, which makes them suspects for
language Differential Item Functioning (DIF). Based on the findings, it is
recommended that, in the future, detailed item and test analysis following the IRT
model be employed across subgroups on the pilot as well as on the operational
tests. This will help to further explore DIF in future administrations of the test in
order to determine whether these patterns represent real differences in achievement
levels or a systematic bias that inappropriately affects the scores of
particular student groups.
Keywords
Biology, Item Analysis, IRT, CTT, Language DIF