Title: | Multilevel Mantel-Haenszel Statistics for Differential Item Functioning Detection |
---|---|
Description: | Clustered or multilevel data structures are common in the assessment of differential item functioning (DIF), particularly in the context of large-scale assessment programs. This package allows users to implement extensions of the Mantel-Haenszel DIF detection procedures in the presence of multilevel data based on the work of Begg (1999) <doi:10.1111/j.0006-341X.1999.00302.x>, Begg & Paykin (2001) <doi:10.1080/00949650108812115>, and French & Finch (2013) <doi:10.1177/0013164412472341>. |
Authors: | Shenghai Dai [aut, cre], Brian F. French [aut], W. Holmes Finch [aut], Andrew Iverson [aut] |
Maintainer: | Shenghai Dai <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1 |
Built: | 2025-01-21 03:40:12 UTC |
Source: | https://github.com/cran/DIFplus |
This function creates contingency tables by stratum for each item. Both dichotomous and polytomous item responses are allowed. It also handles missing responses and returns a cleaned data set with no missing data.
ContigencyTables (Response.data, Response.code=c(0,1), Group, group.names=NULL, Stratum=NULL, Cluster=NULL, missing.code="NA", missing.impute="LW", print.information=TRUE)
Response.data |
A scored item response matrix, supplied as a matrix or data frame. It should not include any other variables (group, stratum, cluster, etc.). |
Response.code |
A numerical vector of all possible item responses. By default, Response.code=c(0,1). |
Group |
The variable of group membership (e.g., gender). Its length should be equal to the sample size of the item response matrix. |
group.names |
Names for each defined group (e.g., c('Male','Female')). This argument is optional. By default, group.names=NULL. If not provided, group names of "Group.1, Group.2, etc." will be automatically generated. |
Stratum |
The matching variable. By default, Stratum=NULL. If not provided, the observed total score will be used. |
Cluster |
The cluster variable. Its length should be equal to the sample size of the item response matrix. By default, Cluster=NULL. This variable is not used to generate contingency tables, but it is included in the returned data set for DIF analysis. |
missing.code |
Indication of how missing values were defined in the data. By default, missing.code="NA". |
missing.impute |
The approach used to handle missing item responses. By default, missing.impute="LW", indicating that list-wise
deletion will be used. Other options include "PM" (person mean or row mean imputation), "IM" (item mean or column mean imputation),
"TW" (two-way imputation), "LR" (logistic regression imputation), and "EM" (EM imputation). See the package "TestDataImputation"
(https://cran.r-project.org/package=TestDataImputation) for more details. |
print.information |
Indicator of whether function running information is printed on screen. By default, print.information=TRUE. |
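As a rough illustration of two of the missing-data options listed above, the following base R sketch applies list-wise deletion and person-mean imputation to a small made-up response matrix. It is not the package's or TestDataImputation's implementation; the toy data, the object names, and the rounding of imputed values to a valid 0/1 score are all assumptions of this example.

```r
# Toy scored responses (hypothetical data): 5 persons x 4 binary items, NA = missing
resp <- data.frame(
  I1 = c(1, 0, NA, 1, 1),
  I2 = c(1, 1, 0, NA, 1),
  I3 = c(0, 1, 0, 1, 1),
  I4 = c(1, 1, 0, 1, 0)
)
group <- c(1, 1, 2, 2, 2)

# List-wise deletion: keep only persons with complete responses and
# subset the group vector the same way so the lengths stay aligned
keep <- complete.cases(resp)
resp_lw <- resp[keep, ]
group_lw <- group[keep]

# Person-mean imputation: replace each missing response with that person's
# mean over the observed items (rounding to 0/1 is an assumption of this sketch)
resp_pm <- as.matrix(resp)
person_mean <- rowMeans(resp_pm, na.rm = TRUE)
miss <- which(is.na(resp_pm), arr.ind = TRUE)
resp_pm[miss] <- round(person_mean[miss[, "row"]])
```

List-wise deletion shrinks the sample, so Group, Cluster, and Stratum must be subset with the same index, which is presumably why the function returns a cleaned data set containing all of these variables together.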
This function creates contingency tables.
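To make the idea concrete, here is a minimal base R sketch of group-by-response tables built separately within each stratum, with the observed total score as the matching variable (the documented behaviour when Stratum=NULL). The simulated data and all object names are hypothetical, and the package's internal code may differ.

```r
set.seed(1)
n <- 200
# Hypothetical scored responses: 200 persons x 4 binary items
items <- matrix(rbinom(n * 4, 1, 0.6), nrow = n,
                dimnames = list(NULL, paste0("I", 1:4)))
group <- rep(c("Reference", "Focal"), each = n / 2)

# Matching variable: observed total score
stratum <- rowSums(items)

# Three-way table for item 1: group x response x stratum
item1 <- factor(items[, "I1"], levels = c(0, 1))
tabs <- table(Group = group, Response = item1, Stratum = stratum)

# Split into one 2 x 2 table per stratum
tabs_by_stratum <- lapply(dimnames(tabs)$Stratum, function(s) tabs[, , s])
names(tabs_by_stratum) <- dimnames(tabs)$Stratum
```

Fixing the factor levels of the response keeps every table 2 x 2 even when a response category is unobserved in a stratum.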
A list of strata statistics, contingency tables, etc.
Strata.stats |
Summary statistics for each item: n.valid.strata, n.valid.category, and also sample sizes for each stratum across items. |
c.table.list.all |
A list that contains all contingency tables across items and strata. |
c.table.list.valid |
A list that contains only valid contingency tables across items and strata. Strata that have missing item response categories or zero marginal means are removed. |
data.out |
A cleaned data set with variables "Group", "Group.factor","Cluster", "Stratum", and all item responses (with missing data handled). |
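The distinction between all tables and valid tables can be sketched as follows. This is a base R illustration under an assumed validity rule (every row and column total must be positive); the package's exact criterion is not spelled out here and may differ, and the tables are made up.

```r
# Hypothetical 2 x 2 tables (rows = group, columns = response 0/1) for three strata
tabs <- list(
  "2" = matrix(c(10, 8, 12, 9), nrow = 2),
  "3" = matrix(c(0, 0, 7, 5), nrow = 2),  # nobody scored 0 in this stratum
  "4" = matrix(c(6, 0, 4, 0), nrow = 2)   # second group is empty in this stratum
)

# Assumed rule: a table is valid when every row total and column total is positive
is_valid <- function(tab) all(rowSums(tab) > 0) && all(colSums(tab) > 0)

valid <- vapply(tabs, is_valid, logical(1))
tabs_valid <- tabs[valid]
names(tabs_valid)
```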
# Specify the item response matrix
data(data.adult)
Response.data <- data.adult[, 2:13]

# Run the function with specifications
c.table.out <- ContigencyTables(Response.data, Response.code = c(0, 1),
                                Group = data.adult$Group, group.names = NULL,
                                Stratum = NULL, Cluster = NULL,
                                missing.code = "NA", missing.impute = "LW",
                                print.information = TRUE)

# Obtain results
c.tables.all <- c.table.out$c.table.list.all
c.tables.valid <- c.table.out$c.table.list.valid
c.table.out$Strata.stats
data.use <- c.table.out$data.out
This data example contains binary (0/1) responses of 684 participants to 12 items. Participants were classified into 34 clusters and 2 groups.
data("data.adult")
A data frame with 684 observations on the following 14 variables.
Cluster
The cluster variable
I1
Item 1
I2
Item 2
I3
Item 3
I4
Item 4
I5
Item 5
I6
Item 6
I7
Item 7
I8
Item 8
I9
Item 9
I10
Item 10
I11
Item 11
I12
Item 12
Group
Binary group membership variable
A data set with 14 variables: (1) binary (0/1) responses of 684 participants to 12 items; (2) a cluster indicator variable; and (3) a group indicator variable.
data(data.adult) ## maybe str(data.adult) ; plot(data.adult) ...
This data example contains binary (0/1) responses of 684 participants to 12 items. Participants were classified into 10 clusters, 2 groups, and 3 strata.
data("data.adult.revised")
A data frame with 684 observations on the following 15 variables.
Cluster
The cluster variable
I1
Item 1
I2
Item 2
I3
Item 3
I4
Item 4
I5
Item 5
I6
Item 6
I7
Item 7
I8
Item 8
I9
Item 9
I10
Item 10
I11
Item 11
I12
Item 12
Group
Binary group membership variable
Stratum
A prespecified matching variable with three levels
A data set with 15 variables: (1) binary (0/1) responses of 684 participants to 12 items; (2) a cluster indicator variable with ten levels; (3) a group indicator variable with two levels; and (4) a stratum variable with three levels.
data(data.adult.revised) ## maybe str(data.adult.revised) ; plot(data.adult.revised) ...
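Because this data set carries its own matching variable, it can be used to try the Stratum argument instead of the default total score. The following is a hedged sketch that uses only the arguments documented for ML.DIF; selecting the item columns by the names I1 to I12 and guarding the call with requireNamespace() are choices of this example, and the output has not been verified here.

```r
res <- NULL
if (requireNamespace("DIFplus", quietly = TRUE)) {
  data("data.adult.revised", package = "DIFplus")

  # Select the 12 item columns by name rather than by position
  item.cols <- paste0("I", 1:12)

  res <- DIFplus::ML.DIF(
    Response.data = data.adult.revised[, item.cols],
    Response.code = c(0, 1),
    Cluster = data.adult.revised$Cluster,
    Group = data.adult.revised$Group,
    Stratum = data.adult.revised$Stratum  # prespecified three-level matching variable
  )
  res$MH.values
}
```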
This data example contains ordinal (1/2/3/4) responses of 300 participants to 5 items. Participants were classified into 6 clusters and 2 groups.
data("data.ordinal")
A data frame with 300 observations on the following 7 variables.
Group
Group membership
Cluster
Cluster membership
I1
Item 1
I2
Item 2
I3
Item 3
I4
Item 4
I5
Item 5
A data set with 7 variables: (1) ordinal (1/2/3/4) responses of 300 participants to 5 items; (2) a cluster indicator variable with six levels; and (3) a group indicator variable with two levels.
data(data.ordinal) ## maybe str(data.ordinal) ; plot(data.ordinal) ...
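For polytomous items such as these, Response.code should list every possible response category. A minimal sketch, assuming the item columns are named I1 to I5 as documented above and that the four categories are coded 1 to 4; the call uses only arguments documented for ContigencyTables, and the result has not been verified here.

```r
ord.out <- NULL
if (requireNamespace("DIFplus", quietly = TRUE)) {
  data("data.ordinal", package = "DIFplus")

  ord.out <- DIFplus::ContigencyTables(
    Response.data = data.ordinal[, paste0("I", 1:5)],
    Response.code = c(1, 2, 3, 4),  # all possible ordinal categories
    Group = data.ordinal$Group,
    Cluster = data.ordinal$Cluster,
    print.information = FALSE
  )
  ord.out$Strata.stats
}
```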
This main function computes both unadjusted and adjusted MH statistics in the presence of clustered data based on Begg (1999) <doi:10.1111/j.0006-341X.1999.00302.x>, Begg & Paykin (2001) <doi:10.1080/00949650108812115>, and French & Finch (2013) <doi:10.1177/0013164412472341>.
ML.DIF (Response.data, Response.code=c(0,1),Cluster, Group, group.names=NULL, Stratum=NULL, correct.factor=0.85, missing.code="NA", missing.impute="LW", anchor.items=NULL, purification=FALSE, max.iter=10, alpha = .05)
Response.data |
A scored item response matrix, supplied as a matrix or data frame. It should not include any other variables (group, stratum, cluster, etc.). |
Response.code |
A numerical vector of all possible item responses. By default, Response.code=c(0,1). |
Cluster |
The cluster variable. Its length should be equal to the sample size of the item response matrix. |
Group |
The variable of group membership (e.g., gender). Its length should be equal to the sample size of the item response matrix. |
group.names |
Names for each defined group (e.g., c('Male','Female')). This argument is optional. By default, group.names=NULL. If not provided, group names of "Group.1, Group.2, etc." will be automatically generated. |
Stratum |
The matching variable. By default, Stratum=NULL. If not provided, the observed total score will be used. |
correct.factor |
The correction factor applied to the adjustment f used in the adjusted MH statistic. The default value is .85. The adjusted MH statistic was found to exhibit low statistical power for DIF detection in some conditions. One solution is to reduce the magnitude of f by multiplying it by a correction factor (e.g., .85, .90, .95). The value of .85 is suggested by French & Finch (2013) <doi:10.1177/0013164412472341>. |
missing.code |
Indication of how missing values were defined in the data. By default, missing.code="NA". |
missing.impute |
The approach used to handle missing item responses. By default, missing.impute="LW", indicating that list-wise
deletion will be used. Other options include "PM" (person mean or row mean imputation), "IM" (item mean or column mean imputation),
"TW" (two-way imputation), "LR" (logistic regression imputation), and "EM" (EM imputation). See the package "TestDataImputation"
(https://cran.r-project.org/package=TestDataImputation) for more details. |
anchor.items |
A scored item responses matrix of selected anchor items. This matrix should be a subset of the response data matrix specified above. By default, anchor.items=NULL. |
purification |
A logical (TRUE or FALSE) argument indicating whether purification will be used. By default, purification=FALSE. |
max.iter |
The maximum number of iterations for purification. The default value is 10. |
alpha |
The alpha value used to decide on the DIF items. The default value is .05. |
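The effect of correct.factor can be illustrated with simple arithmetic. The sketch below assumes that the adjusted statistic is the unadjusted MH chi-square divided by f, so that shrinking f enlarges the adjusted statistic; this reading of the description above, and every number used, is an assumption of the example rather than output from the package.

```r
# Hypothetical inputs, chosen only to illustrate the arithmetic
mh.naive <- 6.2         # unadjusted MH chi-square for one item
f <- 1.8                # clustering adjustment f
correct.factor <- 0.85  # documented default

# Assumption of this sketch: adjusted statistic = unadjusted statistic / f
mh.adj <- mh.naive / f
mh.adj.corrected <- mh.naive / (f * correct.factor)

# p-values from a chi-square distribution with 1 degree of freedom
p.adj <- pchisq(mh.adj, df = 1, lower.tail = FALSE)
p.adj.corrected <- pchisq(mh.adj.corrected, df = 1, lower.tail = FALSE)
c(plain = p.adj, corrected = p.adj.corrected)
```

Under this assumption, a smaller correction factor yields a larger adjusted statistic and a smaller p-value, which is how it would recover some of the power lost to the adjustment.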
This main function computes both unadjusted and adjusted Mantel-Haenszel statistics in the presence of multilevel data.
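For orientation, the unadjusted Mantel-Haenszel test for a single dichotomous item can be reproduced with mantelhaen.test() from base R's stats package. The sketch below uses simulated data with DIF built into item 1 and matches on the observed total score; it ignores clustering entirely and is not the ML.DIF implementation. All object names and simulation settings are hypothetical.

```r
set.seed(42)
n <- 400
group <- factor(rep(c("Reference", "Focal"), each = n / 2),
                levels = c("Reference", "Focal"))
ability <- rnorm(n)

# Five hypothetical binary items; item 1 is made harder for the focal group
items <- sapply(1:5, function(j) {
  shift <- if (j == 1) ifelse(group == "Focal", -1, 0) else 0
  rbinom(n, 1, plogis(ability + shift))
})
colnames(items) <- paste0("I", 1:5)

# Match on the observed total score; drop strata with fewer than two persons,
# which mantelhaen.test() does not accept
total <- rowSums(items)
keep <- total %in% names(which(table(total) > 1))
studied <- factor(items[keep, "I1"], levels = c(0, 1))

mh <- mantelhaen.test(x = group[keep], y = studied, z = factor(total[keep]),
                      correct = TRUE)
mh$statistic  # unadjusted MH chi-square (1 df)
mh$p.value
```

An item would be flagged for DIF when its p-value falls below the chosen alpha (.05 by default in ML.DIF).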
A list of MH statistics, contingency tables, etc.
MH.values |
Summary of estimated MH statistics and corresponding p-values. Specifically, |
Stratum.statistics |
Summary statistics for each item: n.valid.strata, n.valid.category, and also sample sizes for each stratum across items. |
c.table.list.all |
A list that contains all contingency tables across items and strata. |
c.table.list.valid |
A list that contains only valid contingency tables across items and strata. Strata that have missing item response categories or zero marginal means are removed. |
data.out |
A cleaned data set with variables "Group", "Group.factor","Cluster", "Stratum", and all item responses (with missing data handled). |
Begg, M. D. (1999). "Analyzing k (2 × 2) Tables Under Cluster Sampling." Biometrics, 55(1), 302-307. doi:10.1111/j.0006-341X.1999.00302.x.
Begg, M. D. & Paykin, A. B. (2001). "Performance of and software for a modified Mantel-Haenszel statistic for correlated data." Journal of Statistical Computation and Simulation, 70(2), 175-195. doi:10.1080/00949650108812115.
French, B. F. & Finch, W. H. (2013). "Extensions of Mantel-Haenszel for Multilevel DIF Detection." Educational and Psychological Measurement, 73(4), 648-671. doi:10.1177/0013164412472341.
Holland, P. W. & Thayer, D. T. (1988). "Differential item performance and the Mantel-Haenszel procedure." In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Lawrence Erlbaum Associates, Inc.
# Specify the item response matrix
data(data.adult)
Response.data <- data.adult[, 2:13]

# Run the function with specifications
ML.DIF.out <- ML.DIF(Response.data, Response.code = c(0, 1),
                     Cluster = data.adult$Cluster, Group = data.adult$Group,
                     group.names = c('Reference', 'Focal'), Stratum = NULL,
                     correct.factor = 0.85, missing.code = "NA",
                     missing.impute = "LW", anchor.items = NULL,
                     purification = FALSE, max.iter = 10, alpha = .05)

# Obtain results
ML.DIF.out$MH.values
ML.DIF.out$Stratum.statistics