Differential Item Functioning in Change (DIF-C) with ordinal items

This ShinyItemAnalysis module provides an interactive display of Differential Item Functioning in Change (DIF-C) analysis for ordinal items. We use Micro and Macro measurements of attitudes towards the expulsion of the Sudeten Germans after WWII, described in more detail in Kolek, Šisler, Martinková, and Brom (2021). DIF-C analysis was first described for binary items in Martinková, Hladká, and Potužníková (2020), who demonstrated that this more detailed item-level analysis can detect between-group differences in pre-post gains even when no difference is observable in total-score gains. DIF analysis is implemented with generalized logistic regression models in the difNLR package (Hladká & Martinková, 2020). The module is part of the ShinyItemAnalysis package (Martinková & Drabinová, 2018).

Summary of total scores

DIF analysis may come to a different conclusion than a test of group differences in total scores. Two groups may have the same distribution of total scores, yet some items may function differently for the two groups. Conversely, one of the groups may have a significantly lower total score even though no item exhibits DIF (Martinková et al., 2017). This section examines the between-group differences in total scores only. Further sections are devoted to DIF and DIF-C analysis.

The two groups on Pretest

We first examine the two groups (experimental and control) on Pretest. No between-group differences were expected on Pretest.

Group differences between testing sessions

We now examine the change in attitudes towards the expulsion in the two groups, and the between-group differences in this change. We expected the experimental group to be more affected by the game, both in the short term (Pretest - Posttest) and in the long term (Pretest - Delayed Posttest), and we expected the change to persist (no difference was expected between Posttest and Delayed Posttest). Note that in their study, Kolek et al. (2021) complement the t tests displayed below with more complex mixed-effect regression models that take respondent characteristics into account.

Difference in testing sessions for groups
Intergroup difference in time

Selected R code

# load libraries
library(ShinyItemAnalysis)
library(difNLR)
library(ggplot2)
library(moments)

# explore the variables of the dataset (from ShinyItemAnalysis)
names(AttitudesExpulsion)

# convert group variable to a numeric 0/1 indicator, assigning 1 to the experimental group
group <- as.numeric(AttitudesExpulsion[, "Group"] == "E")

# total score calculation with respect to group
score <- AttitudesExpulsion$PreMacro # or PreMicro, PostMacro/Micro, DelMacro/Micro
score0 <- score[group == 0] # control group
score1 <- score[group == 1] # experimental group

# summary of total score
tab <- rbind(
  c(
    length(score0), min(score0), max(score0), mean(score0), median(score0),
    sd(score0), skewness(score0), kurtosis(score0)
  ),
  c(
    length(score1), min(score1), max(score1), mean(score1), median(score1),
    sd(score1), skewness(score1), kurtosis(score1)
  )
)

colnames(tab) <- c("N", "Min", "Max", "Mean", "Median", "SD", "Skewness", "Kurtosis")
tab

# create a dataframe for plotting
df <- data.frame(score, group = as.factor(group))

# histogram of total scores with respect to group
ggplot(data = df, aes(x = score, fill = group, col = group)) +
  geom_histogram(binwidth = 1, position = "dodge2", alpha = 0.75) +
  xlab("Total score") +
  ylab("Number of respondents") +
  scale_fill_manual(
    values = c("dodgerblue2", "goldenrod2"), labels = c("Control", "Experimental")
  ) +
  scale_colour_manual(
    values = c("dodgerblue2", "goldenrod2"), labels = c("Control", "Experimental")
  ) +
  theme_app() +
  theme(legend.position = "left")

# t-test to compare total scores
t.test(score0, score1)
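The comparisons between testing sessions described above can be sketched in the same way. The example below is only an illustration under stated assumptions: it uses a small simulated data frame (`sim`, hypothetical values, not the real data) with columns named like `PreMacro` and `PostMacro`, a paired t-test for the within-group change, and a Welch t-test on gain scores for the between-group difference in change. It is not necessarily the exact procedure used in the module.

```r
# simulated stand-in for AttitudesExpulsion (hypothetical values, not real data)
set.seed(42)
n <- 60
sim <- data.frame(
  Group = rep(c("C", "E"), each = n / 2),
  PreMacro = round(rnorm(n, mean = 20, sd = 4))
)
# assumed effect: the experimental group shifts by about 2 points on Posttest
shift <- ifelse(sim$Group == "E", 2, 0)
sim$PostMacro <- sim$PreMacro + round(rnorm(n, mean = shift, sd = 2))

# numeric 0/1 indicator, 1 = experimental group
group <- as.numeric(sim$Group == "E")

# change score between the two testing sessions
gain <- sim$PostMacro - sim$PreMacro

# within-group change: paired t-test, separately for each group
paired_c <- t.test(sim$PostMacro[group == 0], sim$PreMacro[group == 0], paired = TRUE)
paired_e <- t.test(sim$PostMacro[group == 1], sim$PreMacro[group == 1], paired = TRUE)

# between-group difference in change: Welch two-sample t-test on gain scores
between <- t.test(gain[group == 0], gain[group == 1])

paired_c
paired_e
between
```

With the real data, `sim` would be replaced by `AttitudesExpulsion` and the column pair by whichever two sessions are being compared.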

In Pretest, Kolek et al. (2021) assumed that the items would function similarly for the experimental and the control group. As expected, no DIF was detected in Pretest.

Model specification

In their study, Kolek et al. (2021) used the group-specific cumulative logit model to detect DIF on Pretest, and DIF-C on Posttest and Delayed Posttest. They tested the null hypothesis of no DIF/DIF-C against the alternative of any type of DIF/DIF-C (uniform or nonuniform). They used the Pretest total score as the matching criterion and the Benjamini-Hochberg correction for multiple comparisons. Item purification was not applied. Here we offer the DIF/DIF-C analysis with the same settings as in Kolek et al. (2021). You can also change the type of DIF to be tested, the matching criterion, and the parametrization (either IRT or the classical intercept/slope). You can also select a different correction method for multiple comparisons and/or apply item purification.
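To illustrate what the Benjamini-Hochberg correction does to item-level p-values, here is a minimal sketch using base R's `p.adjust()`. The p-values and item names are hypothetical, chosen only for illustration. The sketch assumes that the adjustment is applied to the whole set of item-level tests at once.

```r
# hypothetical unadjusted p-values from item-level likelihood ratio tests
p_raw <- c(item1 = 0.004, item2 = 0.040, item3 = 0.012, item4 = 0.250, item5 = 0.700)

# Benjamini-Hochberg adjustment across all items
p_bh <- p.adjust(p_raw, method = "BH")

# items flagged at the 5% level before and after the adjustment
flagged_raw <- names(p_raw)[p_raw < 0.05]
flagged <- names(p_bh)[p_bh < 0.05]

round(p_bh, 4)
flagged_raw
flagged
```

In this made-up example, one item that falls below 0.05 before the adjustment is no longer flagged afterwards, which is the intended effect of controlling the false discovery rate.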

Equation

The probability that respondent \(p\) with the pretest score (matching criterion) \(X_p\) and the group membership variable \(G_p\) obtained at least \(k\) points in item \(i\) is given by the following equation:
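Written out for the classical intercept/slope parametrization, the group-specific cumulative logit model can be sketched as follows. The notation is an assumption made here for illustration: \(Y_{pi}\) denotes the response of respondent \(p\) to item \(i\), \(\beta_{i0k}\) is a category-specific intercept, and \(\beta_{i1}\), \(\beta_{i2}\), and \(\beta_{i3}\) are the effects of the matching criterion, the group, and their interaction, shared across categories.

```latex
\[
P(Y_{pi} \geq k \mid X_p, G_p) =
  \frac{\exp\left(\beta_{i0k} + \beta_{i1} X_p + \beta_{i2} G_p + \beta_{i3} X_p G_p\right)}
       {1 + \exp\left(\beta_{i0k} + \beta_{i1} X_p + \beta_{i2} G_p + \beta_{i3} X_p G_p\right)}
\]
```

In this form, a nonzero \(\beta_{i2}\) corresponds to uniform DIF and a nonzero \(\beta_{i3}\) to nonuniform DIF.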

The probability that respondent \(p\) with the pretest score (matching criterion) \(X_p\) and group membership \(G_p\) obtained exactly \(k\) points in item \(i\) is then given as the difference between the probabilities of obtaining at least \(k\) and \(k + 1\) points:
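In symbols, with \(Y_{pi}\) assumed here to denote the response of respondent \(p\) to item \(i\), the difference reads:

```latex
\[
P(Y_{pi} = k \mid X_p, G_p) =
  P(Y_{pi} \geq k \mid X_p, G_p) - P(Y_{pi} \geq k + 1 \mid X_p, G_p)
\]
```

The usual boundary conventions apply: the probability of obtaining at least the lowest score is 1, and the probability of exceeding the highest score is 0.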

Results

Test statistic and model parameter estimates

This summary table contains the \(\chi^2\)-statistics of the likelihood ratio test, the corresponding \(p\)-values adjusted by the selected correction method, and significance codes. The table also provides the estimated parameters of the best-fitting model for each item.

Purification process

Plot with estimated DIF curves

Points represent the proportion of respondents obtaining a given score with respect to the matching criterion. Their size is determined by the number of respondents in a given group who achieved a given level of the matching criterion and selected the given option.

Table of parameters

This table summarizes estimated item parameters together with the standard errors.
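To see how parameters of this kind translate into cumulative and category probabilities, consider the following minimal sketch. All parameter values and names (`b0`, `b1`, `b2`, `b3`) are hypothetical and chosen only for illustration; they are not estimates from the data. The sketch assumes the classical intercept/slope parametrization of the cumulative logit model with category-specific intercepts.

```r
# hypothetical parameters for one item with categories 0, 1, 2, 3
b0 <- c(-1, -2.5, -4) # category-specific intercepts for k = 1, 2, 3
b1 <- 0.2             # effect of the matching criterion
b2 <- 0.8             # group effect (uniform DIF)
b3 <- 0               # interaction effect (nonuniform DIF), none here

# cumulative probabilities P(Y >= k | x, g) for k = 1, 2, 3
cumulative_prob <- function(x, g) {
  plogis(b0 + b1 * x + b2 * g + b3 * x * g)
}

# category probabilities P(Y = k | x, g) for k = 0, 1, 2, 3
category_prob <- function(x, g) {
  # boundary conventions: P(Y >= 0) = 1 and P(Y >= 4) = 0
  cum <- c(1, cumulative_prob(x, g), 0)
  # P(Y = k) = P(Y >= k) - P(Y >= k + 1)
  -diff(cum)
}

# two respondents with the same matching score but different group membership
x <- 15
p_control <- category_prob(x, g = 0)
p_experimental <- category_prob(x, g = 1)

round(rbind(control = p_control, experimental = p_experimental), 3)
```

Because the group effect is positive in this made-up example, the respondent from the experimental group has a higher probability of the top category than the respondent from the control group with the same matching score.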

In Posttest, Kolek et al. (2021) assumed that some, but not necessarily all, of the items would function differently for respondents in the experimental and the control group with the same pretest score. The DIF-C analysis revealed that Item 4 in the Macro measurement and Item 10 in the Micro measurement functioned differently.

Selected R code

# load libraries
library(ShinyItemAnalysis)
library(difNLR)
library(ggplot2)

# prepare data
data <- AttitudesExpulsion[, c(paste0("PostMacro_0", 1:7))]
group <- as.numeric(AttitudesExpulsion[, "Group"] == "E")
score <- AttitudesExpulsion$PreMacro # DIF matching score

# DIF-C with cumulative logit regression model
(fit <- difORD(
  Data = data, group = group, focal.name = 1, model = "cumulative",
  type = "both", match = score, p.adjust.method = "BH", purify = FALSE,
  parametrization = "classic"
))

# plot cumulative probabilities for item PostMacro_04
plot(fit, item = "PostMacro_04", plot.type = "cumulative")

# plot category probabilities for item PostMacro_04
plot(fit, item = "PostMacro_04", plot.type = "category")

# estimate coefficients for all items with SE
coef(fit, SE = TRUE)

In Delayed Posttest, Kolek et al. (2021) assumed that some, but not necessarily all, of the items would function differently for respondents in the experimental and the control group with the same pretest score. The DIF-C analysis revealed that Items 6 and 10 in the Micro measurement functioned differently.

References

  • Kolek, L., Šisler, V., Martinková, P., & Brom, C. (2021). Can video games change attitudes towards history? Results from a laboratory experiment measuring short- and long-term effects. Journal of Computer Assisted Learning, 37(5), 1348-1369. doi:10.1111/jcal.12575
  • Martinková, P., Hladká, A., & Potužníková, E. (2020). Is academic tracking related to gains in learning competence? Using propensity score matching and differential item change functioning analysis for better understanding of tracking implications. Learning and Instruction, 66, 101286. doi:10.1016/j.learninstruc.2019.101286
  • Hladká, A., & Martinková, P. (2020). difNLR: Generalized logistic regression models for DIF and DDF detection. The R Journal, 12(1), 300-323. doi:10.32614/RJ-2020-014
  • Martinková, P., & Drabinová, A. (2018). ShinyItemAnalysis for teaching psychometrics and to enforce routine analysis of educational tests. The R Journal, 10(2), 503-515. doi:10.32614/RJ-2018-074
  • Martinková, P., Drabinová, A., Liaw, Y. L., Sanders, E. A., McFarland, J. L., & Price, R. M. (2017). Checking equity: Why differential item functioning analysis should be a routine part of developing conceptual assessments. CBE-Life Sciences Education, 16(2), rm2. doi:10.1187/cbe.16-10-0307

Acknowledgements

ShinyItemAnalysis Modules are developed by the Computational Psychometrics Group supported by the Czech Science Foundation under Grant Number 21-03658S.