For demonstration purposes, the 20-item dataset GMAT from the difNLR R package is used. On this page, you may select one of five datasets offered by the difNLR and ShinyItemAnalysis packages, or you may upload your own dataset (see below). To return to the demonstration dataset, refresh this page in your browser (F5).
Here you can upload your own dataset. Select all necessary files and use the Upload data button at the bottom of this page.
Main data file should contain responses of individual respondents (rows) to given items (columns). Data need to be either binary, nominal (e.g. in ABCD format), or ordinal (e.g. in Likert scale). Header may contain item names, no row names should be included. In all data sets header should be either included or excluded. Columns of dataset are by default renamed to Item and number of particular column. If you want to keep your own names, check box Keep item names below. Missing values in scored dataset are by default evaluated as 0. If you want to keep them as missing, check box Keep missing values below.
For nominal data, it is necessary to upload key of correct answers.
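As a minimal sketch of how a key turns nominal responses into binary scores, each column can be compared with its key entry. The small `responses` data frame and the `key` vector below are invented for illustration, and the scoring routine used by the application itself may differ. Missing answers are scored as 0, mirroring the default described above.

```r
# hypothetical nominal responses of 3 respondents to 3 items
responses <- data.frame(
  Item1 = c("A", "B", "A"),
  Item2 = c("C", "C", NA),
  Item3 = c("D", "A", "D"),
  stringsAsFactors = FALSE
)
# hypothetical key of correct answers, one entry per item
key <- c("A", "C", "D")

# compare each column with its key entry; TRUE/FALSE becomes 1/0
scored <- sapply(seq_along(key), function(i) as.integer(responses[[i]] == key[i]))
colnames(scored) <- names(responses)

# missing answers are evaluated as 0 (the default described above)
scored[is.na(scored)] <- 0L
scored
```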
For ordinal data, you are advised to include vector containing cut-score which is used for binarization of uploaded data, i.e., values greater or equal to provided cut-score are set to 1, otherwise to 0. You can either upload dataset of item-specific values, or you can provide one value for whole dataset.
Note: In case that cut-score is not provided, vector of maximal values is used.
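The binarization rule can be sketched in base R as follows. The `ordinal` matrix and the `cut_score` vector are hypothetical, and the observed maxima are used here as a stand-in for the vector of maximal values, so this is an illustration of the rule rather than the application's own code.

```r
# hypothetical ordinal responses (e.g. Likert scale 1-5), 4 respondents x 3 items
ordinal <- matrix(c(1, 3, 5,
                    2, 4, 4,
                    5, 2, 3,
                    3, 3, 1), nrow = 4, byrow = TRUE,
                  dimnames = list(NULL, paste0("Item", 1:3)))

# hypothetical item-specific cut-scores
cut_score <- c(3, 4, 3)

# values greater than or equal to the cut-score become 1, all others 0
binary <- 1 * sweep(ordinal, 2, cut_score, FUN = ">=")
binary

# without a cut-score, the maximal values are used instead
# (assumption: observed maxima stand in for the maximal values)
cut_default <- apply(ordinal, 2, max)
binary_default <- 1 * sweep(ordinal, 2, cut_default, FUN = ">=")
binary_default
```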
For ordinal data, it is optional to upload minimal and maximal values of answers. You can either upload datasets of item-specific values, or you can provide one value for whole dataset.
Note: If no minimal or maximal values are provided, these values are set automatically based on observed values.
Group is binary vector, where 0 represents reference group and 1 represents focal group. Its length needs to be the same as number of individual respondents in the main dataset. If the group is not provided then it won't be possible to run DIF and DDF detection procedures in DIF/Fairness section. Missing values are not supported for group membership vector and such cases/rows of the data should be removed.
Criterion variable is either discrete or continuous vector (e.g. future study success or future GPA in case of admission tests) which should be predicted by the measurement. Its length needs to be the same as number of individual respondents in the main dataset. If the criterion variable is not provided then it won't be possible to run validity analysis in Predictive validity section on Validity page.
Here you can explore uploaded dataset. Rendering of tables can take some time.
Total score, also known as raw score or sum score, is a total number of correct answers.
Table below summarizes basic characteristics of total scores including minimum and maximum, mean, median, standard deviation, skewness and kurtosis. The kurtosis here is estimated by sample kurtosis \(\frac{m_4}{s^4}\), where \(m_4\) is the fourth central moment and \(s^2\) is sample variance. The skewness is estimated by sample skewness \(\frac{m_3}{s^3}\), where \(m_3\) is the third central moment. The kurtosis for normally distributed scores is near the value of 3 and the skewness is near the value of 0.
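The two moment ratios can be computed by hand as a check. The sketch below uses an invented score vector and assumes that \(s^2\) is taken as the second central moment (the divide-by-n variance); with the usual n - 1 denominator the values would differ slightly.

```r
# hypothetical total scores
score <- c(1, 2, 3, 4, 5)

# k-th central moment: mean of the k-th power of deviations from the mean
central_moment <- function(x, k) mean((x - mean(x))^k)

# assumption: s^2 is the second central moment (divide-by-n variance)
s <- sqrt(central_moment(score, 2))

skew <- central_moment(score, 3) / s^3  # m_3 / s^3, zero for a symmetric sample
kurt <- central_moment(score, 4) / s^4  # m_4 / s^4, near 3 for normal data
c(skewness = skew, kurtosis = kurt)
```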
For selected cut-score, blue part of histogram shows respondents with total score above the cut-score, grey column shows respondents with total score equal to the cut-score and red part of histogram shows respondents below the cut-score.
library(difNLR)
library(ShinyItemAnalysis) # provides theme_app()
library(ggplot2)
library(moments)
# loading data
data(GMAT)
data <- GMAT[, 1:20]
# total score calculation
score <- apply(data, 1, sum)
# summary of total score
c(min(score), max(score), mean(score), median(score), sd(score), skewness(score), kurtosis(score))
# colors by cut-score
cut <- median(score) # cut-score
color <- c(rep("red", cut - min(score)), "gray", rep("blue", max(score) - cut))
df <- data.frame(score)
# histogram
ggplot(df, aes(score)) +
geom_histogram(binwidth = 1, fill = color, col = "black") +
xlab("Total score") +
ylab("Number of respondents") +
theme_app()
Total score, also known as raw score, is the total number of correct answers. It can be used to compare an individual score to a norm group, e.g. if the mean is 12, then an individual score can be compared to see if it is below or above this average.
Percentile indicates the value below which a percentage of observations falls, e.g. an individual score at the 80th percentile means that the individual score is the same or higher than the scores of 80% of all respondents.
Success rate is the percentage of success, e.g. if the maximum points of test is equal to 20 and individual score is 12 then success rate is 12/20 = 0.6, i.e. 60%.
Z-score, also called standardized score, is a linear transformation of total score with a mean of 0 and variance of 1. If X is total score, M its mean and SD its standard deviation then Z-score = (X - M) / SD.
T-score is transformed Z-score with a mean of 50 and standard deviation of 10. If Z is Z-score then T-score = (Z * 10) + 50.
library(difNLR)
# loading data
data(GMAT)
data <- GMAT[, 1:20]
# scores calculations
score <- apply(data, 1, sum) # Total score
tosc <- sort(unique(score)) # Levels of total score
perc <- cumsum(prop.table(table(score))) # Percentiles
sura <- 100 * (tosc / ncol(data)) # Success rate (relative to maximum possible score)
zsco <- sort(unique(scale(score))) # Z-score
tsco <- 50 + 10 * zsco # T-score
We are typically interested in unobserved true score \(T\), but have available only the observed score \(X\) which is contaminated by some measurement error \(e\), such that \(X = T + e\) and error term is uncorrelated with the true score.
Reliability is defined as squared correlation of the true and observed score
$$\text{rel}(X) = \text{cor}(T, X)^2$$
Equivalently, reliability can be re-expressed as the ratio of the true score variance to total observed variance
$$\text{rel}(X) = \frac{\sigma^2_T}{\sigma^2_X}$$
For test with \(I\) items total score is calculated as \(X = X_1 + ... + X_I\). Let \(\text{rel}(X)\) be the reliability of the test. For a test consisting of \(I^*\) items (equally precise, measuring the same construct), that is for test which is \(m = \frac{I^*}{I}\) times longer/shorter, the reliability would be
$$\text{rel}(X^*) = \frac{m\cdot \text{rel}(X)}{1 + (m - 1)\cdot\text{rel}(X)}.$$
Spearman-Brown formula can be used to determine reliability of a test with similar items but of different number of items. It can also be used to determine necessary number of items to achieve desired reliability.
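The two equivalent definitions of reliability given above can be checked on simulated data. The sketch below is only an illustration under assumed values: true scores and errors are both drawn from a standard normal distribution, so the theoretical reliability is 0.5.

```r
set.seed(1)
n <- 100000
true_score <- rnorm(n, mean = 0, sd = 1)   # T, variance 1
error <- rnorm(n, mean = 0, sd = 1)        # e, independent of T, variance 1
observed <- true_score + error             # X = T + e

# reliability as squared correlation of the true and observed score
rel_cor <- cor(true_score, observed)^2
# reliability as ratio of true score variance to observed variance
rel_var <- var(true_score) / var(observed)

# both estimates should be close to the theoretical value of 0.5
c(rel_cor = rel_cor, rel_var = rel_var)
```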
In calculations below reliability of original data is by default set to value of Cronbach's \(\alpha\) of the dataset currently in use. Number of items in original data is by default set to number of items of dataset currently in use.
Here you can calculate estimate of reliability of a test consisting of different number of items (equally precise, measuring the same construct).
Here you can calculate necessary number of items (equally precise, measuring the same construct) to gain required level of reliability.
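Both calculations follow directly from the Spearman-Brown formula and can be sketched in base R without additional packages. The helper names `sb_reliability` and `sb_length_ratio` and the input values are hypothetical; the second helper is obtained by solving the formula for \(m\).

```r
# predicted reliability of a test that is m times longer (or shorter)
sb_reliability <- function(rel, m) {
  m * rel / (1 + (m - 1) * rel)
}

# ratio m of test lengths needed to reach a target reliability,
# obtained by solving the Spearman-Brown formula for m
sb_length_ratio <- function(rel, rel_target) {
  rel_target * (1 - rel) / (rel * (1 - rel_target))
}

rel_original <- 0.6    # hypothetical reliability of the original test
items_original <- 20   # hypothetical number of items

# reliability after doubling the test length: 1.2 / 1.6 = 0.75
rel_doubled <- sb_reliability(rel_original, m = 2)

# length ratio needed to reach reliability 0.75, and implied number of items
m_needed <- sb_length_ratio(rel_original, rel_target = 0.75)
c(rel_doubled = rel_doubled, m_needed = m_needed,
  items_needed = m_needed * items_original)
```

In practice the number of required items would be rounded up to the next integer.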
library(psychometric)
library(ShinyItemAnalysis)
# loading data
data(HCI)
data <- HCI[, 1:20]
# reliability of original data
rel.original <- psychometric::alpha(data)
# number of items in original data
items.original <- ncol(data)
# number of items in new data
items.new <- 30
# ratio of tests lengths
m <- items.new/items.original
# determining reliability
psychometric::SBrel(Nlength = m, rxx = rel.original)
# desired reliability
rel.new <- 0.8
# determining test length
(m.new <- psychometric::SBlength(rxxp = rel.new, rxx = rel.original))
# number of required items
m.new*items.original
Split-half method uses correlation between two subscores for estimation of reliability. The underlying assumption is that the two halves of the test (or even all items on the test) are equally precise and measure the same underlying construct. Spearman-Brown formula is then used to correct the estimate for the number of items.
For test with \(I\) items total score is calculated as \(X = X_1 + ... + X_I\). Let \(X^*_1\) and \(X^*_2\) be total scores calculated from items only in the first and second subsets. Then estimate of reliability is given by Spearman-Brown formula (Spearman, 1910; Brown, 1910) with \(m = 2\).
$$\text{rel}(X) = \frac{m\cdot \text{cor}(X^*_1, X^*_2)}{1 + (m - 1)\cdot\text{cor}(X^*_1, X^*_2)} = \frac{2\cdot \text{cor}(X^*_1, X^*_2)}{1 + \text{cor}(X^*_1, X^*_2)}$$
Below you can choose from different split-half approaches. First-last method uses correlation between the first half of items and the second half of items. Even-odd includes even items into the first subset and odd items into the second one. Random method performs random split of items, thus the resulting estimate may be different for each call. Revelle's \(\beta\) is actually the worst split-half (Revelle, 1979). Estimate is here calculated as the lowest split-half reliability of by default 10,000 random splits. Finally, Average considers by default 10,000 split halves and averages the resulting estimates. Number of split halves can be changed below. In case of odd number of items, first subset contains one more item than second one.
Estimate of reliability for First-last, Even-odd, Random and Revelle's \(\beta\) is calculated using Spearman-Brown formula. Confidence interval is based on confidence interval of correlation using delta method. Estimate of reliability for Average method is mean value of sampled reliabilities and confidence interval is confidence interval of this mean.
Histogram is based on selected number of split halves estimates (10,000 by default). The current estimate is highlighted by red colour.
library(psych)
library(ShinyItemAnalysis)
# loading data
data(HCI)
# First-last splitting
df1 <- HCI[, 1:10]
df2 <- HCI[, 11:20]
# total score calculation
ts1 <- apply(df1, 1, sum)
ts2 <- apply(df2, 1, sum)
# correlation
cor.x <- cor(ts1, ts2)
# apply Spearman-Brown formula to estimate reliability
(rel.x <- 2*cor.x/(1 + cor.x))
# Even-odd splitting
df1 <- HCI[, seq(1, 20, 2)]
df2 <- HCI[, seq(2, 20, 2)]
# total score calculation
ts1 <- apply(df1, 1, sum)
ts2 <- apply(df2, 1, sum)
# correlation
cor.x <- cor(ts1, ts2)
# apply Spearman-Brown formula to estimate reliability
(rel.x <- 2*cor.x/(1 + cor.x))
# Random splitting
samp <- sample(1:20, 10)
df1 <- HCI[, samp]
df2 <- HCI[, setdiff(1:20, samp)]
# total score calculation
ts1 <- apply(df1, 1, sum)
ts2 <- apply(df2, 1, sum)
# correlation
cor.x <- cor(ts1, ts2)
# apply Spearman-Brown formula to estimate reliability
(rel.x <- 2*cor.x/(1 + cor.x))
# Minimum of 10,000 split-halves (Revelle's beta)
split <- psych::splitHalf(HCI[, 1:20], raw = TRUE)
items1 <- which(split$minAB[, 'A'] == 1)
items2 <- which(split$minAB[, 'B'] == 1)
df1 <- HCI[, items1]
df2 <- HCI[, items2]
# total score calculation
ts1 <- apply(df1, 1, sum)
ts2 <- apply(df2, 1, sum)
# correlation
cor.x <- cor(ts1, ts2)
# apply Spearman-Brown formula to estimate reliability
(rel.x <- 2*cor.x/(1 + cor.x))
# calculation of CI
z.r <- 0.5*log((1 + cor.x)/(1 - cor.x))
n <- length(ts1)
z.low <- z.r - 1.96 * sqrt(1/(n - 3))
z.upp <- z.r + 1.96 * sqrt(1/(n - 3))
cor.low <- (exp(2*z.low) - 1)/(exp(2*z.low) + 1)
cor.upp <- (exp(2*z.upp) - 1)/(exp(2*z.upp) + 1)
rel.x <- 2*cor.x/(1 + cor.x)
rel.low <- 2*cor.low/(1 + cor.low)
rel.upp <- 2*cor.upp/(1 + cor.upp)
# Average 10,000 split-halves
split <- psych::splitHalf(HCI[, 1:20], raw = TRUE)
(rel.x <- mean(split$raw))
# Average all split-halves
split <- psych::splitHalf(HCI[, 1:20], raw = TRUE, brute = TRUE)
(rel.x <- mean(split$raw))
# calculation of CI
n <- length(split$raw)
rel.low <- rel.x - 1.96 * sd(split$raw)/sqrt(n)
rel.upp <- rel.x + 1.96 * sd(split$raw)/sqrt(n)
Cronbach's \(\alpha\) is an estimate of internal consistency of a psychometric test. It is a function of the number of items in a test, the average covariance between item-pairs, and the variance of the total score (Cronbach, 1951).
For test with \(I\) items where \(X = X_1 + ... + X_I\) is a total score, \(\sigma^2_X\) its variance and \(\sigma^2_{X_i}\) variances of items, Cronbach's \(\alpha\) is given by following equation
$$\alpha = \frac{I}{I-1}\left(1 - \frac{\sum_{i = 1}^I \sigma^2_{X_i}}{\sigma^2_X}\right)$$
Confidence interval is based on F distribution as proposed by Feldt et al. (1987).
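The equation for Cronbach's \(\alpha\) translates directly into a few lines of base R. The helper `cronbach_alpha` and the small datasets are hypothetical and serve only to illustrate the formula; the confidence interval is not covered by this sketch.

```r
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  n_items <- ncol(items)               # I, number of items
  item_vars <- apply(items, 2, var)    # variances of the individual items
  total_var <- var(rowSums(items))     # variance of the total score X
  n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)
}

# hypothetical data: two identical items are perfectly consistent
x <- c(1, 2, 3, 4)
alpha_identical <- cronbach_alpha(cbind(x, x))

# hypothetical data: two uncorrelated items share no common variance
alpha_uncorrelated <- cronbach_alpha(cbind(c(1, 2, 1, 2), c(1, 1, 2, 2)))

c(identical = alpha_identical, uncorrelated = alpha_uncorrelated)
```

For the identical items the item variances sum to half of the total score variance, which gives \(\alpha = 1\); for the uncorrelated items the two quantities are equal, which gives \(\alpha = 0\).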
library(psychometric)
library(ShinyItemAnalysis)
# loading data
data(HCI)
data <- HCI[, 1:20]
# Cronbach's alpha with confidence interval
a <- psychometric::alpha(data)
psychometric::alpha.CI(a, N = nrow(data), k = ncol(data), level = 0.95)
Correlation heat map displays selected type of correlations between items. The size and shade of circles indicate how much the items are correlated (larger and darker circles mean larger correlations). The color of circles indicates in which way the items are correlated - blue color means positive correlation and red color means negative correlation. Correlation heat map can be reordered using hierarchical clustering method selected below. With number of clusters larger than 1, the rectangles representing clusters are drawn. The values of correlation heatmap may be displayed and also downloaded.
Pearson correlation coefficient describes linear correlation between two random variables X and Y. It is given by formula
$$\rho = \frac{cov(X,Y)}{\sqrt{var(X)}\sqrt{var(Y)}}.$$
Sample Pearson correlation coefficient may be calculated as
$$ r = \frac{\sum_{i = 1}^{n}(x_{i} - \bar{x})(y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n}(x_{i} - \bar{x})^2}\sqrt{\sum_{i = 1}^{n}(y_{i} - \bar{y})^2}}$$
Pearson correlation coefficient has a value between -1 and +1. Sample correlations of -1 and +1 correspond to all data points lying exactly on a line (decreasing in case of negative linear correlation -1 and increasing for +1). If the coefficient is equal to 0, there is no linear correlation between the variables.
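The sample formula can be verified against the built-in `cor()` function. The data below are invented for illustration.

```r
# hypothetical paired observations
x <- c(2, 4, 5, 7, 9)
y <- c(1, 3, 4, 8, 9)

# deviations from the means
dx <- x - mean(x)
dy <- y - mean(y)

# sample Pearson correlation computed term by term from the formula
r_manual <- sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))

# the same quantity from base R
r_builtin <- cor(x, y, method = "pearson")

all.equal(r_manual, r_builtin)
```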
Polychoric/tetrachoric correlation between two ordinal/binary variables is calculated from their contingency table, under the assumption that the ordinal variables dissect continuous latent variables that are bivariate normal.
Spearman's rank correlation coefficient describes strength and direction of monotonic relationship between random variables X and Y, i.e. dependence between the rankings of two variables. It is given by formula
$$\rho = \frac{cov(rg_{X},rg_{Y})}{\sqrt{var(rg_{X})}\sqrt{var(rg_{Y})}},$$
where \(rg_{X}\) and \(rg_{Y}\) are the random variables X and Y transformed into ranks, i.e. Spearman correlation coefficient is the Pearson correlation coefficient between the ranked variables.
Sample Spearman correlation is calculated by converting X and Y to ranks (average ranks are used in case of ties) and by applying Pearson correlation formula. If both X and Y have n unique ranks, i.e. there are no ties, then sample correlation coefficient is given by formula
$$ r = 1 - \frac{6\sum_{i = 1}^{n}d_i^{2}}{n(n^2-1)}$$
where \(d_i = rg_{X_i} - rg_{Y_i}\) is the difference between the two ranks of the i-th observation and n is size of X and Y. Spearman rank correlation coefficient has value between -1 and 1, where 1 means perfect increasing relationship between variables and -1 means perfect decreasing relationship between the two variables. In case of no repeated values, Spearman correlation of +1 or -1 means all data points lying exactly on some monotone curve. If coefficient is equal to 0, it means there is no tendency for Y to either increase or decrease with X increasing.
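For data without ties, the rank-difference formula \(r = 1 - 6\sum d_i^2 / (n(n^2 - 1))\) gives the same value as the Pearson correlation of the ranks. The following sketch checks this on a small invented example.

```r
# hypothetical data without ties
x <- c(10, 20, 30, 40, 50)
y <- c(2, 1, 4, 3, 5)

n <- length(x)
d <- rank(x) - rank(y)                  # differences between ranks

# rank-difference formula, valid only when there are no ties
r_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

# Pearson correlation applied to ranks, and the built-in Spearman option
r_ranks <- cor(rank(x), rank(y))
r_builtin <- cor(x, y, method = "spearman")

c(r_formula, r_ranks, r_builtin)        # all three equal 0.8
```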
Clustering methods. Ward's method aims at finding compact clusters based on minimizing the within-cluster sum of squares. Ward's n. 2 method uses squared dissimilarities. Single method connects clusters with the nearest neighbours, i.e. the distance between two clusters is calculated as the minimum of distances of observations in one cluster and observations in the other cluster. Complete linkage with farthest neighbours on the other hand uses maximum of distances. Average linkage method (UPGMA) uses the average of all individual distances between observations in one cluster and observations in the other cluster, so every observation carries the same weight. McQuitty method (WPGMA) instead averages the distances of the two most recently merged sub-clusters with equal weight, regardless of their sizes. Median linkage (WPGMC) is a variant of the centroid method in which the centre of a merged cluster is the midpoint of the centres of the two merged clusters, regardless of their sizes. Centroid method uses distance between centroids of clusters.
A scree plot displays the eigenvalues associated with a component or a factor in descending order versus the number of the component or factor. Location of a bend (an elbow) suggests a suitable number of factors.
library(corrplot)
library(ggdendro)
library(difNLR)
library(psych)
# loading data
data(GMAT)
data <- GMAT[, 1:20]
# calculation of correlation
### Pearson
corPearson <- cor(data, method = "pearson")
### Spearman
corSpearman <- cor(data, method = "spearman")
### Polychoric
corP <- polychoric(data)
corP$rho
# correlation heat map
corrplot(corP$rho) # correlation plot
corrplot(corP$rho, order = "hclust", hclust.method = "ward.D", addrect = 3) # correlation plot with 3 clusters using Ward method
# dendrogram
hc <- hclust(as.dist(1 - corP$rho), method = "ward.D") # hierarchical clustering
ggdendrogram(hc) # dendrogram
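library(difNLR)
library(ggplot2)
library(psych)
library(ShinyItemAnalysis) # provides theme_app()
# loading data
data(GMAT)
data <- GMAT[, 1:20]
# polychoric correlation matrix
corP <- polychoric(data)
# scree plot
ev <- eigen(corP$rho)$values # eigenvalues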
df <- data.frame(comp = 1:length(ev), ev)
ggplot(df, aes(x = comp, y = ev)) +
geom_point() +
geom_line() +
ylab("Eigen value") +
xlab("Component number") +
theme_app()
This section requires criterion variable (e.g. future study success or future GPA in case of admission tests) which should correlate with the measurement. Criterion variable can be uploaded in Data section.
Total scores are plotted according to criterion variable. Boxplot or scatterplot is displayed depending on the type of criterion variable - whether it is discrete or continuous. Scatterplot is provided with red linear regression line.
Test for association between total score and criterion variable is based on Spearman's \(\rho\). This rank-based measure has been recommended if bivariate normal distribution is not guaranteed. The null hypothesis is that correlation is 0.
library(ShinyItemAnalysis)
library(ggplot2)
library(difNLR)
# loading data
data(GMAT)
data01 <- GMAT[, 1:20]
# total score calculation
score <- apply(data01, 1, sum)
# criterion variable
criterion <- GMAT[, "criterion"]
# number of respondents in each criterion level
size <- as.factor(criterion)
levels(size) <- table(as.factor(criterion))
size <- as.numeric(paste(size))
df <- data.frame(score, criterion, size)
# descriptive plots
### boxplot, for discrete criterion
ggplot(df, aes(y = score, x = as.factor(criterion), fill = as.factor(criterion))) +
geom_boxplot() +
geom_jitter(shape = 16, position = position_jitter(0.2)) +
scale_fill_brewer(palette = "Blues") +
xlab("Criterion group") +
ylab("Total score") +
coord_flip() +
theme_app()
### scatterplot, for continuous criterion
ggplot(df, aes(x = score, y = criterion)) +
geom_point() +
ylab("Criterion variable") +
xlab("Total score") +
geom_smooth(method = lm,
se = FALSE,
color = "red") +
theme_app()
# correlation
cor.test(criterion, score, method = "spearman", exact = FALSE)
This section requires criterion variable (e.g. future study success or future GPA in case of admission tests) which should correlate with the measurement. Criterion variable can be uploaded in Data section. Here you can explore how the criterion correlates with individual items.
In distractor analysis based on criterion variable, we are interested in how test takers select the correct answer and the distractors (wrong answers) with respect to groups based on the criterion variable.
With option Combinations all item selection patterns are plotted (e.g. AB, ACD, BC). With option Distractors answers are split into distractors (e.g. A, B, C, D).
Test for association between item score and criterion variable is based on Spearman's \(\rho\). This rank-based measure has been recommended if bivariate normal distribution is not guaranteed. The null hypothesis is that correlation is 0.
library(ShinyItemAnalysis)
library(difNLR)
# loading data
data("GMAT", "GMATtest", "GMATkey")
data <- GMATtest[, 1:20]
data01 <- GMAT[, 1:20]
key <- GMATkey
criterion <- GMAT[, "criterion"]
# distractor plot for item 1 and 3 groups
plotDistractorAnalysis(data, key, num.groups = 3, item = 1, matching = criterion)
# correlation for item 1
cor.test(criterion, data01[, 1], method = "spearman", exact = FALSE)
Traditional item analysis uses proportions of correct answers or correlations to estimate item properties.
Displayed is difficulty (red) and discrimination (blue) for all items. Items are ordered by difficulty.
Difficulty of items is estimated as percent of respondents who answered correctly to that item.
Discrimination is by default described by difference of percent correct in upper and lower third of respondents (Upper-Lower Index, ULI). By rule of thumb it should not be lower than 0.2 (borderline in the plot), except for very easy or very difficult items. Discrimination can be customized (see also Martinkova, Stepanek, et al. (2017)) by changing number of groups and by changing which groups should be compared.
Cronbach's alpha is an estimate of the reliability of a psychometric test. It is a function of the number of items in a test, the average covariance between item-pairs, and the variance of the total score (Cronbach, 1951).
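Difficulty and the Upper-Lower Index described above can be sketched in base R. The data matrix is invented, and the rank-based split into groups is an assumption made for this illustration; package functions may form the groups differently, for example from quantiles of the total score.

```r
# hypothetical binary data: 6 respondents (rows) x 5 items (columns)
data01 <- matrix(c(0, 0, 0, 0, 0,
                   0, 1, 0, 0, 0,
                   0, 1, 1, 0, 0,
                   1, 1, 1, 0, 0,
                   1, 1, 1, 1, 0,
                   1, 1, 1, 1, 1), nrow = 6, byrow = TRUE,
                 dimnames = list(NULL, paste0("Item", 1:5)))

# difficulty: proportion of respondents answering the item correctly
difficulty <- colMeans(data01)

# split respondents into k groups of equal size by rank of total score
# (assumption: rank-based grouping, ties broken by order of appearance)
k <- 3
score <- rowSums(data01)
group <- ceiling(rank(score, ties.method = "first") * k / length(score))

# ULI: proportion correct in the upper group minus that in the lower group
uli <- colMeans(data01[group == k, , drop = FALSE]) -
  colMeans(data01[group == 1, , drop = FALSE])

round(cbind(difficulty, uli), 2)
```

Changing `k`, or the two groups being compared, gives the customized discrimination indices mentioned above.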
library(difNLR)
library(psych)
library(psychometric)
library(ShinyItemAnalysis)
# loading data
data(GMAT)
data <- GMAT[, 1:20]
# difficulty and discrimination plot
DDplot(data, discrim = 'ULI', k = 3, l = 1, u = 3)
# Cronbach alpha
psych::alpha(data)
# traditional item analysis table
tab <- round(data.frame(item.exam(data, discr = TRUE)[, c(4, 1, 5, 2, 3)],
psych::alpha(data)$alpha.drop[, 1],
gDiscrim(data, k = 3, l = 1, u = 3)), 2)
colnames(tab) <- c("Difficulty", "SD", "Discrimination ULI", "Discrimination RIT", "Discrimination RIR", "Alpha Drop", "Customized Discrimination")
tab
In distractor analysis, we are interested in how test takers select the correct answer and how the distractors (wrong answers) were able to function effectively by drawing the test takers away from the correct answer.
With option Combinations all item selection patterns are plotted (e.g. AB, ACD, BC). With option Distractors answers are split into distractors (e.g. A, B, C, D).
library(difNLR)
library(reshape2) # provides dcast()
library(ShinyItemAnalysis)
# loading data
data(GMATtest)
data <- GMATtest[, 1:20]
data(GMATkey)
key <- GMATkey
# combinations - plot for item 1 and 3 groups
plotDistractorAnalysis(data, key, num.groups = 3, item = 1, multiple.answers = TRUE)
# distractors - plot for item 1 and 3 groups
plotDistractorAnalysis(data, key, num.groups = 3, item = 1, multiple.answers = FALSE)
# table with counts and margins - item 1 and 3 groups
DA <- DistractorAnalysis(data, key, num.groups = 3)[[1]]
dcast(as.data.frame(DA), response ~ score.level, sum, margins = TRUE, value.var = "Freq")
# table with proportions - item 1 and 3 groups
DistractorAnalysis(data, key, num.groups = 3, p.table = TRUE)[[1]]
Various regression models may be fitted to describe item properties in more detail. Logistic regression can model dependency of probability of correct answer on total score by S-shaped logistic curve. Parameter b0 describes horizontal position of the fitted curve, parameter b1 describes its slope.
Points represent proportion of correct answer with respect to total score. Their size is determined by count of respondents who achieved given level of total score.
library(difNLR)
library(ggplot2)
library(ShinyItemAnalysis) # provides theme_app()
# loading data
data(GMAT)
data <- GMAT[, 1:20]
score <- apply(data, 1, sum) # total score
# logistic model for item 1
fit <- glm(data[, 1] ~ score, family = binomial)
# coefficients
coef(fit)
# function for plot
fun <- function(x, b0, b1){exp(b0 + b1 * x) / (1 + exp(b0 + b1 * x))}
# empirical probabilities calculation
df <- data.frame(x = sort(unique(score)),
y = tapply(data[, 1], score, mean),
size = as.numeric(table(score)))
# plot of estimated curve
ggplot(df, aes(x = x, y = y)) +
geom_point(aes(size = size),
color = "darkblue",
fill = "darkblue",
shape = 21, alpha = 0.5) +
stat_function(fun = fun, geom = "line",
args = list(b0 = coef(fit)[1],
b1 = coef(fit)[2]),
size = 1,
color = "darkblue") +
xlab("Total score") +
ylab("Probability of correct answer") +
ylim(0, 1) +
ggtitle("Item 1") +
theme_app()
Various regression models may be fitted to describe item properties in more detail. Logistic regression can model dependency of probability of correct answer on standardized total score (Z-score) by S-shaped logistic curve. Parameter b0 describes horizontal position of the fitted curve (difficulty), parameter b1 describes its slope at inflection point (discrimination).
Points represent proportion of correct answer with respect to standardized total score. Their size is determined by count of respondents who achieved given level of standardized total score.
library(difNLR)
library(ggplot2)
library(ShinyItemAnalysis) # provides theme_app()
# loading data
data(GMAT)
data <- GMAT[, 1:20]
zscore <- scale(apply(data, 1, sum)) # standardized total score
# logistic model for item 1
fit <- glm(data[, 1] ~ zscore, family = binomial)
# coefficients
coef(fit)
# function for plot
fun <- function(x, b0, b1){exp(b0 + b1 * x) / (1 + exp(b0 + b1 * x))}
# empirical probabilities calculation
df <- data.frame(x = sort(unique(zscore)),
y = tapply(data[, 1], zscore, mean),
size = as.numeric(table(zscore)))
# plot of estimated curve
ggplot(df, aes(x = x, y = y)) +
geom_point(aes(size = size),
color = "darkblue",
fill = "darkblue",
shape = 21, alpha = 0.5) +
stat_function(fun = fun, geom = "line",
args = list(b0 = coef(fit)[1],
b1 = coef(fit)[2]),
size = 1,
color = "darkblue") +
xlab("Standardized total score") +
ylab("Probability of correct answer") +
ylim(0, 1) +
ggtitle("Item 1") +
theme_app()
Various regression models may be fitted to describe item properties in more detail. Logistic regression can model dependency of probability of correct answer on standardized total score (Z-score) by s-shaped logistic curve. Note change in parametrization - the IRT parametrization used here corresponds to the parametrization used in IRT models. Parameter b describes horizontal position of the fitted curve (difficulty), parameter a describes its slope at inflection point (discrimination).
Points represent proportion of correct answer with respect to standardized total score. Their size is determined by count of respondents who achieved given level of standardized total score.
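The change in parametrization does not change the fitted curve. With \(a = b_1\) and \(b = -b_0 / b_1\), the expression \(a(z - b)\) equals \(b_0 + b_1 z\). A minimal sketch with invented coefficient values:

```r
# hypothetical intercept-slope coefficients of a fitted logistic model
b0 <- 0.8
b1 <- 1.5

# IRT parametrization: discrimination a and difficulty b
a <- b1
b <- -b0 / b1

z <- seq(-3, 3, by = 0.5)        # grid of standardized total scores

# plogis(x) equals exp(x) / (1 + exp(x))
p_slope <- plogis(b0 + b1 * z)   # intercept-slope parametrization
p_irt <- plogis(a * (z - b))     # IRT parametrization

all.equal(p_slope, p_irt)        # both describe the same curve
p_at_b <- plogis(a * (b - b))    # probability at z = b is 0.5
p_at_b
```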
library(difNLR)
library(ggplot2)
library(ShinyItemAnalysis) # provides theme_app()
# loading data
data(GMAT)
data <- GMAT[, 1:20]
zscore <- scale(apply(data, 1, sum)) # standardized total score
# logistic model for item 1
fit <- glm(data[, 1] ~ zscore, family = binomial)
# coefficients
coef <- c(a = coef(fit)[2], b = - coef(fit)[1] / coef(fit)[2])
coef
# function for plot
fun <- function(x, a, b){exp(a * (x - b)) / (1 + exp(a * (x - b)))}
# empirical probabilities calculation
df <- data.frame(x = sort(unique(zscore)),
y = tapply(data[, 1], zscore, mean),
size = as.numeric(table(zscore)))
# plot of estimated curve
ggplot(df, aes(x = x, y = y)) +
geom_point(aes(size = size),
color = "darkblue",
fill = "darkblue",
shape = 21, alpha = 0.5) +
stat_function(fun = fun, geom = "line",
args = list(a = coef[1],
b = coef[2]),
size = 1,
color = "darkblue") +
xlab("Standardized total score") +
ylab("Probability of correct answer") +
ylim(0, 1) +
ggtitle("Item 1") +
theme_app()
Various regression models may be fitted to describe item properties in more detail. Nonlinear regression can model dependency of probability of correct answer on standardized total score (Z-score) by s-shaped logistic curve. The IRT parametrization used here corresponds to the parametrization used in IRT models. Parameter b describes horizontal position of the fitted curve (difficulty), parameter a describes its slope at inflection point (discrimination). This model allows for nonzero lower left asymptote c (pseudo-guessing parameter).
Points represent proportion of correct answer with respect to standardized total score. Their size is determined by count of respondents who achieved given level of standardized total score.
library(difNLR)
library(ggplot2)
library(ShinyItemAnalysis) # provides theme_app()
# loading data
data(GMAT)
data <- GMAT[, 1:20]
zscore <- scale(apply(data, 1, sum)) # standardized total score
# NLR 3P model for item 1
fun <- function(x, a, b, c){c + (1 - c) * exp(a * (x - b)) / (1 + exp(a * (x - b)))}
fit <- nls(data[, 1] ~ fun(zscore, a, b, c),
algorithm = "port",
start = startNLR(data, GMAT[, "group"], model = "3PLcg", parameterization = "classic")[[1]][1:3],
lower = c(-Inf, -Inf, 0),
upper = c(Inf, Inf, 1))
# coefficients
coef(fit)
# empirical probabilities calculation
df <- data.frame(x = sort(unique(zscore)),
y = tapply(data[, 1], zscore, mean),
size = as.numeric(table(zscore)))
# plot of estimated curve
ggplot(df, aes(x = x, y = y)) +
geom_point(aes(size = size),
color = "darkblue",
fill = "darkblue",
shape = 21, alpha = 0.5) +
stat_function(fun = fun, geom = "line",
args = list(a = coef(fit)[1],
b = coef(fit)[2],
c = coef(fit)[3]),
size = 1,
color = "darkblue") +
xlab("Standardized total score") +
ylab("Probability of correct answer") +
ylim(0, 1) +
ggtitle("Item 1") +
theme_app()
Various regression models may be fitted to describe item properties in more detail. Nonlinear four parameter regression can model dependency of probability of correct answer on standardized total score (Z-score) by s-shaped logistic curve. The IRT parametrization used here corresponds to the parametrization used in IRT models. Parameter b describes horizontal position of the fitted curve (difficulty), parameter a describes its slope at inflection point (discrimination), pseudo-guessing parameter c describes lower asymptote and inattention parameter d describes upper asymptote.
Points represent proportion of correct answer with respect to standardized total score. Their size is determined by count of respondents who achieved given level of standardized total score.
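The role of the two asymptotes can be seen by evaluating the four-parameter curve far to the left, far to the right, and at the difficulty \(b\). The parameter values below are invented for illustration.

```r
# four-parameter curve; plogis(x) equals exp(x) / (1 + exp(x))
p4 <- function(x, a, b, c, d) c + (d - c) * plogis(a * (x - b))

# hypothetical parameters: a = 1.2, b = 0.5, c = 0.2, d = 0.95
p_low <- p4(-50, a = 1.2, b = 0.5, c = 0.2, d = 0.95)  # approaches c
p_high <- p4(50, a = 1.2, b = 0.5, c = 0.2, d = 0.95)  # approaches d
p_mid <- p4(0.5, a = 1.2, b = 0.5, c = 0.2, d = 0.95)  # (c + d) / 2 at x = b

c(lower = p_low, upper = p_high, at_b = p_mid)
```

Setting `d = 1` gives the three-parameter curve, and additionally setting `c = 0` gives the two-parameter logistic curve.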
library(difNLR)
library(ggplot2)
library(ShinyItemAnalysis) # provides theme_app()
# loading data
data(GMAT)
data <- GMAT[, 1:20]
zscore <- scale(apply(data, 1, sum)) # standardized total score
# NLR 4P model for item 1
fun <- function(x, a, b, c, d){c + (d - c) * exp(a * (x - b)) / (1 + exp(a * (x - b)))}
fit <- nls(data[, 1] ~ fun(zscore, a, b, c, d),
algorithm = "port",
start = startNLR(data, GMAT[, "group"], model = "4PLcgdg", parameterization = "classic")[[1]][1:4],
lower = c(-Inf, -Inf, 0, 0),
upper = c(Inf, Inf, 1, 1))
# coefficients
coef(fit)
# empirical probabilities calculation
df <- data.frame(x = sort(unique(zscore)),
y = tapply(data[, 1], zscore, mean),
size = as.numeric(table(zscore)))
# plot of estimated curve
ggplot(df, aes(x = x, y = y)) +
geom_point(aes(size = size),
color = "darkblue",
fill = "darkblue",
shape = 21, alpha = 0.5) +
stat_function(fun = fun, geom = "line",
args = list(a = coef(fit)[1],
b = coef(fit)[2],
c = coef(fit)[3],
d = coef(fit)[4]),
size = 1,
color = "darkblue") +
xlab("Standardized total score") +
ylab("Probability of correct answer") +
ylim(0, 1) +
ggtitle("Item 1") +
theme_app()
Here you can compare classic 2PL logistic regression model to non-linear model item by item using some information criteria:
Another approach to nested models can be likelihood ratio chi-squared test. Significance level is set to 0.05. As tests are performed item by item, it is possible to use multiple comparison correction method.
Rows BEST indicate which model has the lowest value of criterion, or is the largest significant model by likelihood ratio test.
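Because one test is performed per item, the correction is applied to the whole vector of item p-values at once. A minimal sketch with invented p-values, using the Benjamini-Hochberg method as one possible choice:

```r
# hypothetical p-values of likelihood ratio tests for four items
p_raw <- c(Item1 = 0.01, Item2 = 0.02, Item3 = 0.03, Item4 = 0.20)

# Benjamini-Hochberg adjustment across all items
p_bh <- p.adjust(p_raw, method = "BH")
p_bh

# items for which the larger model is preferred at the 0.05 level
selected <- names(p_bh)[p_bh < 0.05]
selected
```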
library(difNLR)
# loading data
data(GMAT)
Data <- GMAT[, 1:20]
zscore <- scale(apply(Data, 1, sum)) # standardized total score
# function for fitting models
fun <- function(x, a, b, c, d){c + (d - c) * exp(a * (x - b)) / (1 + exp(a * (x - b)))}
# starting values for item 1
start <- startNLR(Data, GMAT[, "group"], model = "4PLcgdg", parameterization = "classic")[[1]][1:4]
# 2PL model for item 1
fit2PL <- nls(Data[, 1] ~ fun(zscore, a, b, c = 0, d = 1),
algorithm = "port",
start = start[1:2])
# NLR 3P model for item 1
fit3PL <- nls(Data[, 1] ~ fun(zscore, a, b, c, d = 1),
algorithm = "port",
start = start[1:3],
lower = c(-Inf, -Inf, 0),
upper = c(Inf, Inf, 1))
# NLR 4P model for item 1
fit4PL <- nls(Data[, 1] ~ fun(zscore, a, b, c, d),
algorithm = "port",
start = start,
lower = c(-Inf, -Inf, 0, 0),
upper = c(Inf, Inf, 1, 1))
# comparison
### AIC
AIC(fit2PL); AIC(fit3PL); AIC(fit4PL)
### BIC
BIC(fit2PL); BIC(fit3PL); BIC(fit4PL)
### LR test, using Benjamini-Hochberg correction
### (the correction has an effect only when p-values of several items are adjusted together)
###### 2PL vs NLR 3P
LRstat <- -2 * (as.numeric(logLik(fit2PL)) - as.numeric(logLik(fit3PL)))
LRdf <- 1
LRpval <- 1 - pchisq(LRstat, LRdf)
LRpval <- p.adjust(LRpval, method = "BH")
###### NLR 3P vs NLR 4P
LRstat <- -2 * (as.numeric(logLik(fit3PL)) - as.numeric(logLik(fit4PL)))
LRdf <- 1
LRpval <- 1 - pchisq(LRstat, LRdf)
LRpval <- p.adjust(LRpval, method = "BH")
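With a single item the Benjamini-Hochberg adjustment leaves the p-value unchanged; the correction matters once the test is repeated for every item. A minimal sketch, assuming hypothetical likelihood ratio statistics for five items (the values are made up for illustration and are not GMAT results):

```r
# hypothetical LR statistics for five items (illustrative values only)
LRstat <- c(item1 = 0.5, item2 = 4.2, item3 = 7.9, item4 = 1.1, item5 = 12.3)
LRdf <- 1 # one additional parameter in the larger model
# unadjusted p-values from the chi-squared distribution
LRpval <- 1 - pchisq(LRstat, LRdf)
# Benjamini-Hochberg adjustment over all items at once
LRpval_adj <- p.adjust(LRpval, method = "BH")
# comparison of unadjusted and adjusted p-values
round(cbind(unadjusted = LRpval, adjusted = LRpval_adj), 4)
# positions of items for which the larger model is preferred at the 0.05 level
which(LRpval_adj < 0.05)
```

In this made-up example one item that is significant before the adjustment is no longer significant after it, which is the situation the correction is meant to guard against.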
Various regression models may be fitted to describe item properties in more detail. Multinomial regression allows for simultaneous modelling of the probabilities of choosing the individual distractors as functions of the standardized total score (Z-score).
Points represent the proportion of respondents selecting a given option with respect to the standardized total score. Their size is determined by the count of respondents who achieved the given level of standardized total score and who selected the given option.
library(difNLR)
library(nnet)
# loading data
data(GMAT, GMATtest, GMATkey)
zscore <- scale(apply(GMAT[, 1:20] , 1, sum)) # standardized total score
data <- GMATtest[, 1:20]
key <- GMATkey
# multinomial model for item 1
fit <- multinom(relevel(data[, 1], ref = paste(key[1])) ~ zscore)
# coefficients
coef(fit)
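The coefficients of a multinomial model are intercepts and slopes on the log-odds scale, each relative to the reference (correct) option. A minimal sketch of how they translate into option probabilities at a given Z-score, assuming hypothetical coefficients for three distractors B, C and D with reference option A (the numbers are illustrative, not estimates from GMATtest, and the helper option_prob is introduced here only for illustration):

```r
# hypothetical coefficients: rows = distractors, columns = intercept and slope
cf <- rbind(B = c(-1.0, -0.8),
            C = c(-1.5, -1.2),
            D = c(-2.0, -0.5))
colnames(cf) <- c("(Intercept)", "zscore")
# option probabilities at given standardized total scores
option_prob <- function(z, cf) {
  # linear predictors of distractors; the reference option A is fixed at 0
  eta <- cbind(A = 0, t(cf[, 1] + cf[, 2] %o% z))
  # baseline-category logit model: exponentiate and normalize within rows
  exp(eta) / rowSums(exp(eta))
}
z <- c(-2, 0, 2)
round(option_prob(z, cf), 3)
```

Because all hypothetical slopes are negative, the probability of the reference option increases with the Z-score while the distractors become less attractive.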
Item Response Theory (IRT) models are mixed-effect regression models in which respondent ability \(\theta\) is assumed to be latent and is estimated together with item parameters.
In the Rasch model (Rasch, 1960), all items are assumed to have the same slope at the inflection point, i.e., the same discrimination parameter \(a\), which is fixed to 1. Items may differ in the location of their inflection point, i.e., in the difficulty parameter \(b\). Model parameters are estimated using the marginal maximum likelihood (MML) method. Ability \(\theta\) is assumed to follow a normal distribution with freely estimated variance.
Estimates of parameters are complemented by SX2 item fit statistics (Orlando and Thissen, 2000). SX2 statistics are computed only when no missing data are present.
This table shows the response score of only six respondents. If you want to see scores for all respondents, click on Download abilities button.
Wright map (Wilson, 2005; Wright & Stone, 1979), also called item-person map, is a graphical tool to display person ability estimates and item parameters. The person side (left) represents histogram of estimated abilities of respondents. The item side (right) displays estimates of difficulty parameters of individual items.
library(difNLR)
library(mirt)
library(ShinyItemAnalysis)
# loading data
data(GMAT)
data <- GMAT[, 1:20]
# fitting Rasch model
fit <- mirt(data, model = 1, itemtype = 'Rasch', SE = T)
# Item Characteristic Curves
plot(fit, type = 'trace', facet_items = F)
# Item Information Curves
plot(fit, type = 'infotrace', facet_items = F)
# Test Information Function
plot(fit, type = 'infoSE')
# Coefficients
coef(fit, simplify = TRUE)
coef(fit, IRTpars = TRUE, simplify = TRUE)
# Item fit statistics
itemfit(fit)
# Factor scores vs Standardized total scores
fs <- as.vector(fscores(fit))
sts <- as.vector(scale(apply(data, 1, sum)))
plot(fs ~ sts)
# Wright Map (item difficulties b in IRT parameterization, b = -d/a)
b <- coef(fit, IRTpars = TRUE, simplify = TRUE)$items[, 'b']
ggWrightMap(fs, b)
Estimates of parameters are complemented by SX2 item fit statistics (Orlando & Thissen, 2000). SX2 is computed only when no missing data are present. If your data contain missing values, consider using an imputed dataset.
Item Response Theory (IRT) models are mixed-effect regression models in which respondent ability \(\theta\) is assumed to be latent and is estimated together with item parameters.
In the 1PL IRT model, all items are assumed to have the same slope at the inflection point, i.e., the same discrimination \(a\). Its value corresponds to the standard deviation of ability estimates in the Rasch model. Items can differ in the location of their inflection point, i.e., in item difficulty parameters \(b\). Model parameters are estimated using the marginal maximum likelihood (MML) method. Ability \(\theta\) is assumed to follow the standard normal distribution.
Estimates of parameters are complemented by SX2 item fit statistics (Orlando and Thissen, 2000). SX2 statistics are computed only when no missing data are present.
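The correspondence between the 1PL and Rasch parameterizations can be checked numerically. A minimal sketch, assuming a hypothetical common discrimination of 1.3 and a hypothetical difficulty of 0.4: rescaling ability to a * theta (so that its standard deviation equals a) and difficulty to a * b gives a Rasch curve with identical probabilities.

```r
# logistic item characteristic curve with discrimination a and difficulty b
icc <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))
a <- 1.3 # hypothetical common discrimination in the 1PL model
b <- 0.4 # hypothetical item difficulty in the 1PL model
theta <- seq(-4, 4, 0.5) # ability on the standard normal scale
# 1PL model: discrimination a, ability with standard deviation 1
p_1pl <- icc(theta, a = a, b = b)
# Rasch model: discrimination 1, ability and difficulty rescaled by a
p_rasch <- icc(a * theta, a = 1, b = a * b)
# both parameterizations give the same probabilities
all.equal(p_1pl, p_rasch)
```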
This table shows the response score of only six respondents. If you want to see scores for all respondents, click on Download abilities button.
Wright map (Wilson, 2005; Wright & Stone, 1979), also called item-person map, is a graphical tool to display person ability estimates and item parameters. The person side (left) represents histogram of estimated abilities of respondents. The item side (right) displays estimates of difficulty parameters of individual items.
library(difNLR)
library(mirt)
library(ShinyItemAnalysis)
# loading data
data(GMAT)
data <- GMAT[, 1:20]
# fitting 1PL model
fit <- mirt(data, model = 1, itemtype = '2PL', constrain = list((1:ncol(data)) + seq(0, (ncol(data) - 1)*3, 3)), SE = T)
# Item Characteristic Curves
plot(fit, type = 'trace', facet_items = F)
# Item Information Curves
plot(fit, type = 'infotrace', facet_items = F)
# Test Information Function
plot(fit, type = 'infoSE')
# Coefficients
coef(fit, simplify = TRUE)
coef(fit, IRTpars = TRUE, simplify = TRUE)
# Item fit statistics
itemfit(fit)
# Factor scores vs Standardized total scores
fs <- as.vector(fscores(fit))
sts <- as.vector(scale(apply(data, 1, sum)))
plot(fs ~ sts)
# Wright Map (item difficulties b in IRT parameterization, b = -d/a)
b <- coef(fit, IRTpars = TRUE, simplify = TRUE)$items[, 'b']
ggWrightMap(fs, b)
# You can also use ltm library for IRT models
library(ltm)
# fitting 1PL model
fit <- rasch(data)
# for Rasch model use
# fit <- rasch(data, constraint = cbind(ncol(data) + 1, 1))
# Item Characteristic Curves
plot(fit)
# Item Information Curves
plot(fit, type = 'IIC')
# Test Information Function
plot(fit, items = 0, type = 'IIC')
# Coefficients
coef(fit)
# Factor scores vs Standardized total scores
df1 <- ltm::factor.scores(fit, return.MIvalues = T)$score.dat
FS <- as.vector(df1[, 'z1'])
df2 <- df1
df2$Obs <- df2$Exp <- df2$z1 <- df2$se.z1 <- NULL
STS <- as.vector(scale(apply(df2, 1, sum)))
df <- data.frame(FS, STS)
plot(FS ~ STS, data = df, xlab = 'Standardized total score', ylab = 'Factor score')
Item Response Theory (IRT) models are mixed-effect regression models in which respondent ability \(\theta\) is assumed to be latent and is estimated together with item parameters.
The 2PL IRT model allows for different slopes at the inflection point, i.e., different discrimination parameters \(a\). Items can also differ in the location of their inflection point, i.e., in item difficulty parameters \(b\). Model parameters are estimated using the marginal maximum likelihood (MML) method. Ability \(\theta\) is assumed to follow the standard normal distribution.
Estimates of parameters are complemented by SX2 item fit statistics (Orlando and Thissen, 2000). SX2 statistics are computed only when no missing data are present.
This table shows the response score of only six respondents. If you want to see scores for all respondents, click on Download abilities button.
library(difNLR)
library(mirt)
data(GMAT)
data <- GMAT[, 1:20]
# Model
fit <- mirt(data, model = 1, itemtype = "2PL", SE = T)
# Item Characteristic Curves
plot(fit, type = "trace", facet_items = F)
# Item Information Curves
plot(fit, type = "infotrace", facet_items = F)
# Test Information Function
plot(fit, type = "infoSE")
# Coefficients
coef(fit, simplify = TRUE)
coef(fit, IRTpars = TRUE, simplify = TRUE)
# Item fit statistics
itemfit(fit)
# Factor scores vs Standardized total scores
fs <- as.vector(fscores(fit))
sts <- as.vector(scale(apply(data, 1, sum)))
plot(fs ~ sts)
# You can also use ltm library for IRT models
library(difNLR)
library(ltm)
data(GMAT)
data <- GMAT[, 1:20]
# Model
fit <- ltm(data ~ z1, IRT.param = TRUE)
# Item Characteristic Curves
plot(fit)
# Item Information Curves
plot(fit, type = "IIC")
# Test Information Function
plot(fit, items = 0, type = "IIC")
# Coefficients
coef(fit)
# Factor scores vs Standardized total scores
df1 <- ltm::factor.scores(fit, return.MIvalues = T)$score.dat
FS <- as.vector(df1[, "z1"])
df2 <- df1
df2$Obs <- df2$Exp <- df2$z1 <- df2$se.z1 <- NULL
STS <- as.vector(scale(apply(df2, 1, sum)))
df <- data.frame(FS, STS)
plot(FS ~ STS, data = df,
xlab = "Standardized total score",
ylab = "Factor score")
Item Response Theory (IRT) models are mixed-effect regression models in which respondent ability \(\theta\) is assumed to be latent and is estimated together with item parameters.
The 3PL IRT model allows for different item discriminations \(a\), different item difficulties \(b\), and also for a nonzero left asymptote, the pseudo-guessing parameter \(c\). Model parameters are estimated using the marginal maximum likelihood (MML) method. Ability \(\theta\) is assumed to follow the standard normal distribution.
Estimates of parameters are complemented by SX2 item fit statistics (Orlando and Thissen, 2000). SX2 statistics are computed only when no missing data are present.
This table shows the response score of only six respondents. If you want to see scores for all respondents, click on Download abilities button.
library(difNLR)
library(mirt)
data(GMAT)
data <- GMAT[, 1:20]
# Model
fit <- mirt(data, model = 1, itemtype = "3PL", SE = T)
# Item Characteristic Curves
plot(fit, type = "trace", facet_items = F)
# Item Information Curves
plot(fit, type = "infotrace", facet_items = F)
# Test Information Function
plot(fit, type = "infoSE")
# Coefficients
coef(fit, simplify = TRUE)
coef(fit, IRTpars = TRUE, simplify = TRUE)
# Item fit statistics
itemfit(fit)
# Factor scores vs Standardized total scores
fs <- as.vector(fscores(fit))
sts <- as.vector(scale(apply(data, 1, sum)))
plot(fs ~ sts)
# You can also use ltm library for IRT models
library(difNLR)
library(ltm)
data(GMAT)
data <- GMAT[, 1:20]
# Model
fit <- tpm(data, IRT.param = TRUE)
# Item Characteristic Curves
plot(fit)
# Item Information Curves
plot(fit, type = "IIC")
# Test Information Function
plot(fit, items = 0, type = "IIC")
# Coefficients
coef(fit)
# Factor scores vs Standardized total scores
df1 <- ltm::factor.scores(fit, return.MIvalues = T)$score.dat
FS <- as.vector(df1[, "z1"])
df2 <- df1
df2$Obs <- df2$Exp <- df2$z1 <- df2$se.z1 <- NULL
STS <- as.vector(scale(apply(df2, 1, sum)))
df <- data.frame(FS, STS)
plot(FS ~ STS, data = df,
xlab = "Standardized total score",
ylab = "Factor score")
Item Response Theory (IRT) models are mixed-effect regression models in which respondent ability \(\theta\) is assumed to be latent and is estimated together with item parameters.
The 4PL IRT model allows for different item discriminations \(a\), different item difficulties \(b\), a nonzero left asymptote, the pseudo-guessing parameter \(c\), and also for an upper asymptote lower than one, i.e., the inattention parameter \(d\). Model parameters are estimated using the marginal maximum likelihood (MML) method. Ability \(\theta\) is assumed to follow the standard normal distribution.
Estimates of parameters are complemented by SX2 item fit statistics (Orlando and Thissen, 2000). SX2 statistics are computed only when no missing data are present.
This table shows the response score of only six respondents. If you want to see scores for all respondents, click on Download abilities button.
library(difNLR)
library(mirt)
data(GMAT)
data <- GMAT[, 1:20]
# Model
fit <- mirt(data, model = 1, itemtype = "4PL", SE = T)
# Item Characteristic Curves
plot(fit, type = "trace", facet_items = F)
# Item Information Curves
plot(fit, type = "infotrace", facet_items = F)
# Test Information Function
plot(fit, type = "infoSE")
# Coefficients
coef(fit, simplify = TRUE)
coef(fit, IRTpars = TRUE, simplify = TRUE)
# Item fit statistics
itemfit(fit)
# Factor scores vs Standardized total scores
fs <- as.vector(fscores(fit))
sts <- as.vector(scale(apply(data, 1, sum)))
plot(fs ~ sts)
Item Response Theory (IRT) models are mixed-effect regression models in which respondent ability \(\theta\) is assumed to be latent and is estimated together with item parameters. Model parameters are estimated using the marginal maximum likelihood (MML) method. In the 1PL, 2PL, 3PL and 4PL IRT models, ability \(\theta\) is assumed to follow the standard normal distribution.
IRT models can be compared by several information criteria.
Another approach to comparing IRT models is the likelihood ratio chi-squared test. The significance level is set to 0.05.
Row BEST indicates which model has the lowest value of the given criterion, or which is the largest significant model according to the likelihood ratio test.
library(difNLR)
library(mirt)
# loading data
data(GMAT)
data <- GMAT[, 1:20]
# 1PL IRT model
s <- paste("F = 1-", ncol(data), "\n",
"CONSTRAIN = (1-", ncol(data), ", a1)")
model <- mirt.model(s)
fit1PL <- mirt(data, model = model, itemtype = "2PL")
# 2PL IRT model
fit2PL <- mirt(data, model = 1, itemtype = "2PL")
# 3PL IRT model
fit3PL <- mirt(data, model = 1, itemtype = "3PL")
# 4PL IRT model
fit4PL <- mirt(data, model = 1, itemtype = "4PL")
# comparison
anova(fit1PL, fit2PL)
anova(fit2PL, fit3PL)
anova(fit3PL, fit4PL)
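The quantities behind such a comparison are simple functions of the maximized log-likelihood. A minimal sketch, assuming hypothetical log-likelihood values and a hypothetical sample size (none of the numbers come from the GMAT fits); the parameter counts assume 20 items and a standard normal ability distribution:

```r
# hypothetical maximized log-likelihoods (illustrative values only)
logL <- c("1PL" = -23500, "2PL" = -23450, "3PL" = -23440, "4PL" = -23435)
# number of estimated parameters for 20 items
# (1PL: 20 difficulties + 1 common discrimination; then 2, 3 and 4 per item)
npar <- c("1PL" = 21, "2PL" = 40, "3PL" = 60, "4PL" = 80)
n <- 2000 # hypothetical number of respondents
# information criteria: lower values indicate a better trade-off
aic <- -2 * logL + 2 * npar
bic <- -2 * logL + log(n) * npar
# likelihood ratio tests of each model against the next larger one
LRstat <- 2 * diff(logL)
LRdf <- diff(npar)
LRpval <- 1 - pchisq(LRstat, LRdf)
# summary table; the first model has no smaller model to be tested against
data.frame(model = names(logL), AIC = unname(aic), BIC = unname(bic),
           LRstat = c(NA, unname(LRstat)), df = c(NA, unname(LRdf)),
           pvalue = c(NA, unname(LRpval)))
# models with the lowest AIC and BIC
names(which.min(aic))
names(which.min(bic))
```

With these made-up values the criteria disagree, because BIC penalizes additional parameters more heavily than AIC; this is one reason several criteria are reported side by side.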
The nominal response model (NRM) was introduced by Bock (1972) as a way to model responses to items with two or more nominal categories. This model is suitable for multiple-choice items with no particular ordering of distractors. It is also a generalization of some models for ordinal data, e.g., the generalized partial credit model (GPCM) or its restricted versions, the partial credit model (PCM) and the rating scale model (RSM).
library(difNLR)
library(mirt)
library(ShinyItemAnalysis)
# loading data
data("dataMedicalgraded")
data <- dataMedicalgraded[, 1:100]
# model
fit <- mirt(data, model = 1, itemtype = "nominal")
# item characteristic curves
plot(fit, type = "trace", facet_items = F)
# item information curves
plot(fit, type = "infotrace", facet_items = F)
# test information function
plot(fit, type = "infoSE")
# coefficients
coef(fit, simplify = TRUE)
coef(fit, IRTpars = TRUE, simplify = TRUE)
# factor scores vs standardized total scores
fs <- as.vector(fscores(fit))
sts <- as.vector(scale(apply(data, 1, sum)))
plot(fs ~ sts)
Dichotomous models are used for modelling items producing a simple binary response (i.e., true/false). The most complex unidimensional dichotomous IRT model described here is the 4PL IRT model. The Rasch model (Rasch, 1960) assumes discrimination fixed to \(a = 1\), guessing fixed to \(c = 0\) and inattention fixed to \(d = 1\). Similarly, other restricted models (1PL, 2PL and 3PL models) can be obtained by fixing appropriate parameters in the 4PL model.
In this section, you can explore behavior of two item characteristic curves \(\mathrm{P}\left(\theta\right)\) and their item information functions \(\mathrm{I}\left(\theta\right)\) in 4PL IRT model.
Select parameters \(a\) (discrimination), \(b\) (difficulty), \(c\) (guessing) and \(d\) (inattention). By constraining \(a = 1\), \(c = 0\), \(d = 1\) you get Rasch model. With option \(c = 0\) and \(d = 1\) you get 2PL model and with option \(d = 1\) 3PL model.
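For reference, the 4PL item characteristic curve and the corresponding item information function can be written as follows. This is a standard formulation stated with the parameter names used above; the derivative notation \(\mathrm{P}'(\theta)\) is introduced here.

```latex
\begin{aligned}
\mathrm{P}(\theta) &= c + (d - c)\,\frac{e^{a(\theta - b)}}{1 + e^{a(\theta - b)}}, \\
\mathrm{I}(\theta) &= \frac{\left[\mathrm{P}'(\theta)\right]^2}{\mathrm{P}(\theta)\left[1 - \mathrm{P}(\theta)\right]}
= \frac{a^2 \left[\mathrm{P}(\theta) - c\right]^2 \left[d - \mathrm{P}(\theta)\right]^2}{(d - c)^2\, \mathrm{P}(\theta)\left[1 - \mathrm{P}(\theta)\right]}.
\end{aligned}
```

With \(c = 0\) and \(d = 1\) the information function reduces to the 2PL form \(a^2\, \mathrm{P}(\theta)\left[1 - \mathrm{P}(\theta)\right]\).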
When different curve parameters describe properties of the same item but for different groups of respondents, this phenomenon is called Differential Item Functioning (DIF). See further section for more information.
Select also the value of latent ability \(\theta\) to see the interpretation of the item characteristic curves.
Consider the following 2PL items with parameters
Item 1: \(a = 2.5, b = -0.5\)
Item 2: \(a = 1.5, b = 0\)
For these items fill the following exercises with an accuracy of up to 0.05. Then click on Submit answers button. If you need a hint, click on blue button with question mark.
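Answers to exercises of this kind can be checked numerically. A minimal sketch for the two 2PL items above, evaluating both curves at a few ability levels and locating the ability at which the curves cross (the helper name icc_2pl and the chosen ability levels are introduced here for illustration):

```r
# 2PL item characteristic curve
icc_2pl <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))
# item parameters from the exercise
a1 <- 2.5; b1 <- -0.5
a2 <- 1.5; b2 <- 0
# probabilities of correct answer at selected ability levels
theta <- c(-2, -1, 0, 1, 2)
round(data.frame(theta,
                 item1 = icc_2pl(theta, a1, b1),
                 item2 = icc_2pl(theta, a2, b2)), 2)
# ability level at which both items are equally likely to be answered correctly
cross <- uniroot(function(t) icc_2pl(t, a1, b1) - icc_2pl(t, a2, b2),
                 interval = c(-4, 4), tol = 1e-8)$root
cross
```

The crossing point can also be obtained by hand by equating the two linear predictors, \(2.5(\theta + 0.5) = 1.5\,\theta\), which gives \(\theta = -1.25\).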
Consider now 2 items with following parameters
Item 1: \(a = 1.5, b = 0, c = 0, d = 1\)
Item 2: \(a = 1.5, b = 0, c = 0.2, d = 1\)
For these items fill the following exercises with an accuracy of up to 0.05. Then click on Submit answers button.
Consider now 2 items with following parameters
Item 1: \(a = 1.5, b = 0, c = 0, d = 0.9\)
Item 2: \(a = 1.5, b = 0, c = 0, d = 1\)
For these items fill the following exercises with an accuracy of up to 0.05. Then click on Submit answers button.
library(ggplot2)
library(data.table)
# parameters
a1 <- 1; b1 <- 0; c1 <- 0; d1 <- 1
a2 <- 2; b2 <- 0.5; c2 <- 0; d2 <- 1
# latent ability
theta <- seq(-4, 4, 0.01)
# latent ability level
theta0 <- 0
# function for IRT characteristic curve
icc_irt <- function(theta, a, b, c, d){ return(c + (d - c)/(1 + exp(-a*(theta - b)))) }
# calculation of characteristic curves
df <- data.frame(theta,
"icc1" = icc_irt(theta, a1, b1, c1, d1),
"icc2" = icc_irt(theta, a2, b2, c2, d2))
df <- melt(as.data.table(df), id.vars = "theta") # melt() from data.table expects a data.table
# plot for characteristic curves
ggplot(df, aes(x = theta, y = value, color = variable)) +
geom_line() +
geom_segment(aes(y = icc_irt(theta0, a = a1, b = b1, c = c1, d = d1),
yend = icc_irt(theta0, a = a1, b = b1, c = c1, d = d1),
x = -4, xend = theta0),
color = "gray", linetype = "dashed") +
geom_segment(aes(y = icc_irt(theta0, a = a2, b = b2, c = c2, d = d2),
yend = icc_irt(theta0, a = a2, b = b2, c = c2, d = d2),
x = -4, xend = theta0),
color = "gray", linetype = "dashed") +
geom_segment(aes(y = 0,
yend = max(icc_irt(theta0, a = a1, b = b1, c = c1, d = d1),
icc_irt(theta0, a = a2, b = b2, c = c2, d = d2)),
x = theta0, xend = theta0),
color = "gray", linetype = "dashed") +
xlim(-4, 4) +
xlab("Ability") +
ylab("Probability of correct answer") +
theme_bw() +
ylim(0, 1) +
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
ggtitle("Item characteristic curve")
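# function for IRT information function
# (for c = 0 and d = 1 it reduces to a^2 * p * (1 - p))
iic_irt <- function(theta, a, b, c, d){
p <- c + (d - c)/(1 + exp(-a*(theta - b))) # item characteristic curve
return(a^2*(p - c)^2*(d - p)^2/((d - c)^2*p*(1 - p)))
}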
# calculation of information curves
df <- data.frame(theta,
"iic1" = iic_irt(theta, a1, b1, c1, d1),
"iic2" = iic_irt(theta, a2, b2, c2, d2))
df <- melt(as.data.table(df), id.vars = "theta") # melt() from data.table expects a data.table
# plot for information curves
ggplot(df, aes(x = theta, y = value, color = variable)) +
geom_line() +
xlim(-4, 4) +
xlab("Ability") +
ylab("Information") +
theme_bw() +
ylim(0, 4) +
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
ggtitle("Item information curve")
Polytomous models are used when partial score is possible, or when items are graded on Likert scale (e.g. from Totally disagree to Totally agree); some polytomous models can also be used when analyzing multiple-choice items. In this section you can explore item response functions of some polytomous models.
Two main classes of polytomous IRT models are considered:
Difference models are defined by setting mathematical form to cumulative probabilities, while category probabilities are calculated as their difference. These models are also sometimes called cumulative logit models as they set linear form to cumulative logits.
As an example, the Graded Response Model (GRM; Samejima, 1970) uses the 2PL IRT model to describe cumulative probabilities (probabilities of obtaining a score of at least 1, at least 2, at least 3, etc.). Category probabilities are then described as differences of two subsequent cumulative probabilities.
For divide-by-total models response category probabilities are defined as the ratio between category-related functions and their sum.
In Generalized Partial Credit Model (GPCM; Muraki, 1992), probability of the successful transition from one category score to the next category score is modelled by 2PL IRT model, while Partial Credit Model (PCM; Masters, 1982) uses 1PL IRT model to describe this probability. Even more restricted version, the Rating Scale Model (RSM; Andrich, 1978) assumes exactly the same K response categories for each item and threshold parameters which can be split into a response-threshold parameter and an item-specific location parameter. These models are also sometimes called adjacent-category logit models as they set linear form to adjacent logits.
To model distractor properties in multiple-choice items, the Nominal Response Model (NRM; Bock, 1972) can be used. NRM is an IRT analogue of the multinomial regression model. This model is also a generalization of the GPCM/PCM/RSM ordinal models. NRM is also sometimes called the baseline-category logit model, as it sets a linear form to the log odds of selecting a given category versus selecting a baseline category. The baseline can be chosen arbitrarily, although usually the correct answer or the first answer is chosen.
Graded response model (GRM; Samejima, 1970) uses the 2PL IRT model to describe cumulative probabilities (probabilities of obtaining a score of at least 1, at least 2, at least 3, etc.). Category probabilities are then described as differences of two subsequent cumulative probabilities.
It belongs to the class of difference models, which are defined by setting a mathematical form to cumulative probabilities, while category probabilities are calculated as their difference. These models are also sometimes called cumulative logit models, as they set a linear form to cumulative logits.
Select the number of responses, the difficulties for cumulative probabilities \(b\) and the common discrimination parameter \(a\). Cumulative probability \(P(Y \geq 0)\) is always equal to 1 and is not displayed; the corresponding category probability \(P(Y = 0)\) is displayed in black.
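In symbols, for an item with categories \(0, \dots, K\), common discrimination \(a\) and difficulties \(b_1, \dots, b_K\), the description above corresponds to the following standard GRM formulation (the index notation is introduced here):

```latex
\begin{aligned}
\mathrm{P}(Y \geq k \mid \theta) &= \frac{e^{a(\theta - b_k)}}{1 + e^{a(\theta - b_k)}}, \quad k = 1, \dots, K, \qquad \mathrm{P}(Y \geq 0 \mid \theta) = 1, \\
\mathrm{P}(Y = k \mid \theta) &= \mathrm{P}(Y \geq k \mid \theta) - \mathrm{P}(Y \geq k + 1 \mid \theta), \quad k = 0, \dots, K - 1, \\
\mathrm{P}(Y = K \mid \theta) &= \mathrm{P}(Y \geq K \mid \theta).
\end{aligned}
```

For the category probabilities to be non-negative, the difficulties need to be ordered, \(b_1 \leq b_2 \leq \dots \leq b_K\).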
library(ggplot2)
library(data.table)
# setting parameters
a <- 1
b <- c(-1.5, -1, -0.5, 0)
theta <- seq(-4, 4, 0.01)
# calculating cumulative probabilities
ccirt <- function(theta, a, b){ return(1/(1 + exp(-a*(theta - b)))) }
df1 <- data.frame(sapply(1:length(b), function(i) ccirt(theta, a, b[i])), theta)
df1 <- melt(as.data.table(df1), id.vars = "theta") # melt() from data.table expects a data.table
# plotting cumulative probabilities
ggplot(data = df1, aes(x = theta, y = value, col = variable)) +
geom_line() +
xlab("Ability") +
ylab("Cumulative probability") +
xlim(-4, 4) +
ylim(0, 1) +
theme_bw() +
theme(text = element_text(size = 14),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
ggtitle("Cumulative probabilities") +
scale_color_manual("", values = c("red", "yellow", "green", "blue"), labels = paste0("P(Y >= ", 1:4, ")"))
# calculating category probabilities
df2 <- data.frame(1, sapply(1:length(b), function(i) ccirt(theta, a, b[i])))
df2 <- data.frame(sapply(1:length(b), function(i) df2[, i] - df2[, i+1]), df2[, ncol(df2)], theta)
df2 <- melt(as.data.table(df2), id.vars = "theta") # melt() from data.table expects a data.table
# plotting category probabilities
ggplot(data = df2, aes(x = theta, y = value, col = variable)) +
geom_line() +
xlab("Ability") +
ylab("Category probability") +
xlim(-4, 4) +
ylim(0, 1) +
theme_bw() +
theme(text = element_text(size = 14),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
ggtitle("Category probabilities") +
scale_color_manual("", values = c("black", "red", "yellow", "green", "blue"), labels = paste0("P(Y = ", 0:4, ")"))
# calculating expected item score
df3 <- data.frame(1, sapply(1:length(b), function(i) ccirt(theta, a, b[i])))
df3 <- data.frame(sapply(1:length(b), function(i) df3[, i] - df3[, i+1]), df3[, ncol(df3)])
df3 <- data.frame(exp = as.matrix(df3) %*% 0:4, theta)
# plotting expected item score
ggplot(data = df3, aes(x = theta, y = exp)) +
geom_line() +
xlab("Ability") +
ylab("Expected item score") +
xlim(-4, 4) +
ylim(0, 4) +
theme_bw() +
theme(text = element_text(size = 14),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
ggtitle("Expected item score")
In Generalized Partial Credit Model (GPCM; Muraki, 1992), probability of the successful transition from one category score to the next category score is modelled by 2PL IRT model. The response category probabilities are then ratios between category-related functions (cumulative sums of exponentials) and their sum.
Two simpler models can be derived from GPCM by restricting some parameters: the Partial Credit Model (PCM; Masters, 1982) uses the 1PL IRT model to describe this probability, thus the discrimination parameter is fixed to \(\alpha = 1\). An even more restricted version, the Rating Scale Model (RSM; Andrich, 1978), assumes exactly the same K response categories for each item and threshold parameters which can be split into a response-threshold parameter \(\lambda_t\) and an item-specific location parameter \(\delta_i\). These models are also sometimes called adjacent-category logit models, as they set a linear form to adjacent logits.
Select number of responses and their threshold parameters \(\delta\) and common discrimination parameter \(\alpha\). With \(\alpha = 1\) you get PCM. Numerator of \(\pi_0 = P(Y = 0)\) is set to 1 and \(\pi_0\) is displayed with black color.
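Written out for an item with categories \(0, \dots, K\), the ratio described above takes the following standard form (the summation indices \(t\) and \(r\) are introduced here):

```latex
\pi_k = \mathrm{P}(Y = k \mid \theta) =
\frac{\exp\left(\sum_{t = 1}^{k} \alpha\,(\theta - \delta_t)\right)}
     {\sum_{r = 0}^{K} \exp\left(\sum_{t = 1}^{r} \alpha\,(\theta - \delta_t)\right)},
\quad k = 0, \dots, K.
```

The empty sum for \(k = 0\) equals zero, so the numerator of \(\pi_0\) is \(\exp(0) = 1\).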
library(ggplot2)
library(data.table)
# setting parameters
a <- 1
d <- c(-1.5, -1, -0.5, 0)
theta <- seq(-4, 4, 0.01)
# calculating category probabilities
ccgpcm <- function(theta, a, d){ a*(theta - d) }
df <- sapply(1:length(d), function(i) ccgpcm(theta, a, d[i]))
pk <- sapply(1:ncol(df), function(k) apply(as.data.frame(df[, 1:k]), 1, sum))
pk <- cbind(0, pk)
pk <- exp(pk)
denom <- apply(pk, 1, sum)
df <- apply(pk, 2, function(x) x/denom)
df1 <- melt(as.data.table(data.frame(df, theta)), id.vars = "theta") # melt() from data.table expects a data.table
# plotting category probabilities
ggplot(data = df1, aes(x = theta, y = value, col = variable)) +
geom_line() +
xlab("Ability") +
ylab("Category probability") +
xlim(-4, 4) +
ylim(0, 1) +
theme_bw() +
theme(text = element_text(size = 14),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
ggtitle("Category probabilities") +
scale_color_manual("", values = c("black", "red", "yellow", "green", "blue"), labels = paste0("P(Y = ", 0:4, ")"))
# calculating expected item score
df2 <- data.frame(exp = as.matrix(df) %*% 0:4, theta)
# plotting expected item score
ggplot(data = df2, aes(x = theta, y = exp)) +
geom_line() +
xlab("Ability") +
ylab("Expected item score") +
xlim(-4, 4) +
ylim(0, 4) +
theme_bw() +
theme(text = element_text(size = 14),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
ggtitle("Expected item score")
In the Nominal Response Model (NRM; Bock, 1972), the probability of selecting a given category over the baseline category is modelled by the 2PL IRT model. This model is also sometimes called the baseline-category logit model, as it sets a linear form to the log odds of selecting a given category versus selecting a baseline category. The baseline can be chosen arbitrarily, although usually the correct answer or the first answer is chosen. NRM generalizes GPCM by allowing item-specific and category-specific intercept and slope parameters.
Select number of distractors and their threshold parameters \(\delta\) and discrimination parameters \(\alpha\). Parameters of \(\pi_0 = P(Y = 0)\) are set to zeros and \(\pi_0\) is displayed with black color.
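The statement that NRM generalizes GPCM can be verified numerically. A minimal sketch, assuming illustrative GPCM parameters and two helper functions introduced here (gpcm_prob and nrm_prob are hypothetical names): choosing NRM slopes proportional to the category score and intercepts equal to the negative cumulative sums of scaled thresholds reproduces the GPCM category probabilities.

```r
theta <- seq(-4, 4, 0.5)
# illustrative GPCM parameters: common discrimination and thresholds
alpha <- 1
delta <- c(-1.5, -1, -0.5, 0)
# GPCM category probabilities: cumulative sums of alpha * (theta - delta_t)
gpcm_prob <- function(theta, alpha, delta) {
  steps <- outer(theta, delta, function(th, d) alpha * (th - d))
  eta <- cbind(0, t(apply(steps, 1, cumsum))) # category 0 has predictor 0
  exp(eta) / rowSums(exp(eta))
}
# NRM category probabilities: category-specific slopes and intercepts
nrm_prob <- function(theta, slope, intercept) {
  eta <- cbind(0, outer(theta, slope) + rep(intercept, each = length(theta)))
  exp(eta) / rowSums(exp(eta))
}
# NRM parameters that reproduce the GPCM above
slope <- alpha * seq_along(delta)
intercept <- -alpha * cumsum(delta)
all.equal(gpcm_prob(theta, alpha, delta), nrm_prob(theta, slope, intercept))
```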
library(ggplot2)
library(data.table)
# setting parameters
a <- c(2.5, 2, 1, 1.5)
d <- c(-1.5, -1, -0.5, 0)
theta <- seq(-4, 4, 0.01)
# calculating category probabilities
ccnrm <- function(theta, a, d){ exp(d + a*theta) }
df <- sapply(1:length(d), function(i) ccnrm(theta, a[i], d[i]))
df <- data.frame(1, df)
denom <- apply(df, 1, sum)
df <- apply(df, 2, function(x) x/denom)
df1 <- melt(as.data.table(data.frame(df, theta)), id.vars = "theta") # melt() from data.table expects a data.table
# plotting category probabilities
ggplot(data = df1, aes(x = theta, y = value, col = variable)) +
geom_line() +
xlab("Ability") +
ylab("Category probability") +
xlim(-4, 4) +
ylim(0, 1) +
theme_bw() +
theme(text = element_text(size = 14),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
ggtitle("Category probabilities") +
scale_color_manual("", values = c("black", "red", "yellow", "green", "blue"), labels = paste0("P(Y = ", 0:4, ")"))
# calculating expected item score
df2 <- data.frame(exp = as.matrix(df) %*% 0:4, theta)
# plotting expected item score
ggplot(data = df2, aes(x = theta, y = exp)) +
geom_line() +
xlab("Ability") +
ylab("Expected item score") +
xlim(-4, 4) +
ylim(0, 4) +
theme_bw() +
theme(text = element_text(size = 14),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
ggtitle("Expected item score")
Differential item functioning (DIF) occurs when people from different social groups (commonly defined by gender or ethnicity) with the same underlying true ability have a different probability of answering the item correctly. If an item functions differently for two groups, it is potentially unfair. In general, two types of DIF can be recognized: if the item has different difficulty for the two groups but the same discrimination, uniform DIF is present (left figure). If the item has different discrimination and possibly also different difficulty for the two groups, non-uniform DIF is present (right figure).
DIF is not about total scores! Two groups may have the same distribution of total scores, yet some items may function differently for the two groups. Conversely, one of the groups may have a significantly lower total score, yet there may be no DIF item (Martinkova et al., 2017).
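The distinction between uniform and non-uniform DIF can be illustrated with a minimal sketch. It assumes 2PL item characteristic curves and uses hypothetical parameter values chosen only for illustration; the helper icc() is not part of any package.

```r
# 2PL item characteristic curve: probability of a correct answer
# for discrimination a and difficulty b
icc <- function(theta, a, b) plogis(a * (theta - b))

theta <- seq(-4, 4, by = 0.1)

# hypothetical parameters (illustration only, not estimated from data)
p_ref <- icc(theta, a = 1, b = 0)           # reference group
p_uniform <- icc(theta, a = 1, b = 0.5)     # focal group: harder item, same discrimination
p_nonuniform <- icc(theta, a = 2, b = 0.5)  # focal group: different discrimination

# uniform DIF: the reference group is favoured at every ability level
all(p_ref > p_uniform)

# non-uniform DIF: the curves cross, so the favoured group depends on ability
any(p_ref > p_nonuniform) && any(p_ref < p_nonuniform)
```

With equal discrimination the two curves never cross, whereas with different discrimination the sign of the difference changes along the ability scale.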
For the selected cut-score, the blue part of the histogram shows respondents with a total score above the cut-score, the grey column shows respondents with a total score equal to the cut-score, and the red part of the histogram shows respondents below the cut-score.
The test for a difference in total scores between the reference and focal group is based on Welch's two-sample t-test.
Explanation:
Diff. (CI) - difference in means of total scores with 95% confidence interval,
t-value - test statistic,
df - degrees of freedom,
p-value - if it is lower than 0.05, the difference in total scores is significant.
library(difNLR)
library(ggplot2)
library(moments)
library(ShinyItemAnalysis) # provides theme_app() used in the plots below
# loading data
data(GMAT)
data <- GMAT[, 1:20]
group <- GMAT[, "group"]
# total score calculation wrt group
score0 <- apply(data, 1, sum)[group == 0]
score1 <- apply(data, 1, sum)[group == 1]
# summary of total score
rbind(c(length(score0), min(score0), max(score0), mean(score0), median(score0), sd(score0), skewness(score0), kurtosis(score0)),
c(length(score1), min(score1), max(score1), mean(score1), median(score1), sd(score1), skewness(score1), kurtosis(score1)))
# colors by cut-score wrt group
cut <- 12 # cut-score
color0 <- c(rep("red", cut - min(score0)), "gray", rep("blue", max(score0) - cut))
color1 <- c(rep("red", cut - min(score1)), "gray", rep("blue", max(score1) - cut))
# histogram for reference group
ggplot(data = data.frame(score0), aes(score0)) +
geom_histogram(binwidth = 1, fill = color0, col = "black") +
xlab("Total score") +
ylab("Number of respondents") +
ggtitle("Reference group") +
theme_app()
# histogram for focal group
ggplot(data = data.frame(score1), aes(score1)) +
geom_histogram(binwidth = 1, fill = color1, col = "black") +
xlab("Total score") +
ylab("Number of respondents") +
ggtitle("Focal group") +
theme_app()
# t-test to compare total scores
t.test(score0, score1)
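The quantities listed in the explanation of the Welch test can be read directly from the object returned by t.test(). The following is a minimal sketch on simulated scores; the simulated data and the variable names are assumptions made only to keep the example self-contained, and it does not use the GMAT data.

```r
# simulated total scores for illustration only (not the GMAT data)
set.seed(1)
score_ref <- rbinom(200, size = 20, prob = 0.60)
score_foc <- rbinom(200, size = 20, prob = 0.55)

# the Welch test is the default in t.test() (var.equal = FALSE)
tt <- t.test(score_ref, score_foc)

res <- c(
  diff = unname(tt$estimate[1] - tt$estimate[2]),  # Diff.
  ci_lower = tt$conf.int[1],                       # lower bound of 95% CI
  ci_upper = tt$conf.int[2],                       # upper bound of 95% CI
  t_value = unname(tt$statistic),                  # t-value
  df = unname(tt$parameter),                       # df
  p_value = tt$p.value                             # p-value
)
round(res, 3)
```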
Delta plot (Angoff & Ford, 1973) compares the proportions of correct answers per item in the two groups. It displays a non-linear transformation of these proportions using quantiles of the standard normal distribution (so-called delta scores) for each item for the two groups in a scatterplot called a diagonal plot or delta plot (see Figure). An item is under suspicion of DIF if its delta point departs considerably from the diagonal. The detection threshold is either fixed to the value 1.5 or based on a bivariate normal approximation (Magis & Facon, 2012).
library(deltaPlotR)
library(difNLR)
data(GMAT)
data <- GMAT[, 1:20]
group <- GMAT[, "group"]
# Delta scores with fixed threshold
deltascores <- deltaPlot(data.frame(data, group), group = "group",
focal.name = 1, thr = 1.5)
deltascores
# Delta plot
diagPlot(deltascores, thr.draw = T)
# Delta scores with normal threshold
deltascores <- deltaPlot(data.frame(data, group), group = "group",
focal.name = 1, thr = "norm", purify = F)
deltascores
# Delta plot
diagPlot(deltascores, thr.draw = T)
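To make the delta-score transformation concrete, here is a minimal base-R sketch of what the delta plot is built from. It assumes the usual delta scale (4 times the standard normal quantile of the proportion incorrect, plus 13) and the major-axis distance; the proportions are hypothetical and the variable names are illustrative, so this is not the internal code of deltaPlotR.

```r
# hypothetical proportions of correct answers for 10 items (illustration only)
p_ref <- seq(0.30, 0.80, length.out = 10)
p_foc <- p_ref - 0.05
p_foc[5] <- p_ref[5] - 0.35  # item 5 is made much harder for the focal group

# delta scores: 4 * (normal quantile of the proportion incorrect) + 13
delta_ref <- 4 * qnorm(1 - p_ref) + 13
delta_foc <- 4 * qnorm(1 - p_foc) + 13

# major axis of the cloud of delta points
s2_ref <- var(delta_ref)
s2_foc <- var(delta_foc)
s_rf <- cov(delta_ref, delta_foc)
slope <- (s2_foc - s2_ref + sqrt((s2_foc - s2_ref)^2 + 4 * s_rf^2)) / (2 * s_rf)
intercept <- mean(delta_foc) - slope * mean(delta_ref)

# signed perpendicular distance of each delta point from the major axis
d_perp <- (slope * delta_ref - delta_foc + intercept) / sqrt(slope^2 + 1)
round(d_perp, 2)

# items flagged with the fixed threshold of 1.5
which(abs(d_perp) > 1.5)
```

In this constructed example the item that was deliberately shifted lies farthest from the major axis.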
The Mantel-Haenszel test is a DIF detection method based on contingency tables that are calculated for each level of the total score (Mantel & Haenszel, 1959).
Here you can select a correction method for multiple comparisons or item purification.
library(difNLR)
library(difR)
data(GMAT)
data <- GMAT[, 1:20]
group <- GMAT[, "group"]
# Mantel-Haenszel test
fit <- difMH(Data = data, group = group, focal.name = 1,
p.adjust.method = "none", purify = F)
fit
library(difNLR)
library(difR)
library(reshape2) # provides dcast() with the margins argument used below
data(GMAT)
data <- GMAT[, 1:20]
group <- GMAT[, "group"]
# Contingency table for item 1 and score 12
df <- data.frame(data[, 1], group)
colnames(df) <- c("Answer", "Group")
df$Answer <- relevel(factor(df$Answer, labels = c("Incorrect", "Correct")), "Correct")
df$Group <- factor(df$Group, labels = c("Reference Group", "Focal Group"))
score <- apply(data, 1, sum)
df <- df[score == 12, ]
tab <- dcast(data.frame(xtabs(~ Group + Answer, data = df)),
Group ~ Answer,
value.var = "Freq",
margins = TRUE,
fun.aggregate = sum)
tab
# Mantel-Haenszel estimate of OR
fit <- difMH(Data = data, group = group, focal.name = 1,
p.adjust.method = "none", purify = F)
fit$alphaMH
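The Mantel-Haenszel estimate of the common odds ratio can also be computed by hand from the score-level contingency tables. The sketch below uses a hypothetical 2 x 2 x 3 array of counts (not the GMAT data) and compares the hand computation with mantelhaen.test() from base R. The orientation of the table (reference group and correct answer first) is an assumption of this example.

```r
# hypothetical 2 x 2 x K array of counts for one item (illustration only):
# rows = group, columns = answer, third dimension = total-score level
tab <- array(c(20, 10, 15, 25,
               30, 12, 10, 20,
               25,  8,  5, 12),
             dim = c(2, 2, 3),
             dimnames = list(Group = c("Reference", "Focal"),
                             Answer = c("Correct", "Incorrect"),
                             Score = c("low", "medium", "high")))

n_k <- apply(tab, 3, sum)  # number of respondents at each score level

# Mantel-Haenszel estimate of the common odds ratio
alpha_mh <- sum(tab[1, 1, ] * tab[2, 2, ] / n_k) /
  sum(tab[1, 2, ] * tab[2, 1, ] / n_k)
alpha_mh

# the same estimate together with the test, using base R
mh <- mantelhaen.test(tab)
mh$estimate
mh$p.value
```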
Logistic regression allows for detection of uniform and non-uniform DIF (Swaminathan & Rogers, 1990) by adding a group-specific intercept b2 (uniform DIF) and a group-specific interaction b3 (non-uniform DIF) into the model and by testing for their significance.
Here you can choose what type of DIF to test. You can also select a correction method for multiple comparisons or item purification.
library(difNLR)
library(difR)
data(GMAT)
data <- GMAT[, 1:20]
group <- GMAT[, "group"]
# Logistic regression DIF detection method
fit <- difLogistic(Data = data, group = group, focal.name = 1,
type = "both",
p.adjust.method = "none",
purify = F)
fit
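The nested-model idea behind this method can be sketched with glm() for a single item. The data are simulated with arbitrary coefficients, and a standard normal variable stands in for the matching score; both are assumptions made only for illustration, and this is not the internal code of difLogistic().

```r
set.seed(42)
n <- 2000
group <- rep(0:1, each = n / 2)  # 0 = reference, 1 = focal
score <- rnorm(n)                # stand-in for the matching (total) score

# simulated item with uniform DIF only (b2 = -1, b3 = 0); values are arbitrary
y <- rbinom(n, size = 1, prob = plogis(0.5 + 1.2 * score - 1 * group))

m0 <- glm(y ~ score, family = binomial)                        # no DIF
m1 <- glm(y ~ score + group, family = binomial)                # adds b2
m2 <- glm(y ~ score + group + score:group, family = binomial)  # adds b2 and b3

# likelihood ratio tests of the group-specific terms
lrt_both <- anova(m0, m2, test = "Chisq")    # b2 and b3 jointly
lrt_udif <- anova(m0, m1, test = "Chisq")    # b2 only (uniform DIF)
lrt_nudif <- anova(m1, m2, test = "Chisq")   # b3 only (non-uniform DIF)
lrt_both
coef(m2)
```

The three comparisons mirror the choice of which group-specific terms are tested.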
Points represent the proportion of correct answers at each level of the total score. Their size is determined by the number of respondents in the given group who achieved that level of the total score.
NOTE: Plots and tables are based on DIF logistic procedure without any correction method.
library(difNLR)
library(difR)
data(GMAT)
data <- GMAT[, 1:20]
group <- GMAT[, "group"]
# Logistic regression DIF detection method
fit <- difLogistic(Data = data, group = group, focal.name = 1,
type = "both",
p.adjust.method = "none", purify = F)
fit
# Plot of characteristic curve for item 1
plotDIFLogistic(data, group,
type = "both",
item = 1,
IRT = F,
p.adjust.method = "none",
purify = F)
# Coefficients
fit$logitPar
Generalized logistic regression models can be seen as proxies of IRT models for DIF detection, using the standardized total score as an estimate of knowledge. They can allow for a nonzero lower asymptote (pseudoguessing \(c\); Drabinova & Martinkova, 2017) or an upper asymptote lower than one (inattention \(d\)). Similarly to logistic regression, its extensions also detect uniform and non-uniform DIF by letting the difficulty parameter \(b\) (uniform) and the discrimination parameter \(a\) (non-uniform) differ between groups and by testing for a significant difference in their values. Moreover, these extensions allow for testing differences in the pseudoguessing and inattention parameters.
With model you can specify which parameters should be kept the same for both groups and which should differ. The notation is similar to that of IRT models. In 3PL and 4PL models, the abbreviations cg and dg mean that parameter c or d, respectively, is the same for both groups. With type you can choose the parameters in which a difference between groups should be tested.
Displayed equation is based on selected model
Here you can choose what model to use and what type of DIF to test. You can also select correction method for multiple comparison or item purification.
library(difNLR)
# loading data
data(GMAT)
Data <- GMAT[, 1:20]
group <- GMAT[, "group"]
# generalized logistic regression DIF method
# using 3PL model with the same guessing parameter for both groups
fit <- difNLR(Data = Data, group = group, focal.name = 1, model = "3PLcg", type = "both", p.adjust.method = "none")
fit
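The shape of the regression curve with a lower asymptote \(c\) and an upper asymptote \(d\) can be sketched as follows. The helper gen_logistic() and all parameter values are hypothetical and serve only as an illustration; the function is not part of the difNLR package.

```r
# generalized logistic curve with lower asymptote c and upper asymptote d
# (hypothetical helper for illustration, not a function of the difNLR package)
gen_logistic <- function(x, a, b, c = 0, d = 1) {
  c + (d - c) / (1 + exp(-a * (x - b)))
}

x <- seq(-3, 3, by = 0.5)  # standardized total score

# situation resembling model "3PLcg": a and b differ between groups, c is shared
p_ref <- gen_logistic(x, a = 1.2, b = 0.0, c = 0.2)
p_foc <- gen_logistic(x, a = 0.8, b = 0.4, c = 0.2)
round(cbind(x, p_ref, p_foc), 3)

# the curve approaches c for very low scores
gen_logistic(-Inf, a = 1.2, b = 0, c = 0.2)
# at x = b the curve lies halfway between the two asymptotes
gen_logistic(0, a = 1.2, b = 0, c = 0.2, d = 0.9)
```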
Points represent the proportion of correct answers at each level of the standardized total score. Their size is determined by the number of respondents in the given group who achieved that level of the standardized total score.
library(difNLR)
# loading data
data(GMAT)
Data <- GMAT[, 1:20]
group <- GMAT[, "group"]
# generalized logistic regression DIF method
# using 3PL model with the same guessing parameter for both groups
fit <- difNLR(Data = Data, group = group, focal.name = 1, model = "3PLcg", type = "both", p.adjust.method = "none")
# plot of characteristic curve of item 1
plot(fit, item = 1)
# table of estimated coefficients
fit$nlrPAR
The Lord test (Lord, 1980) is based on an IRT model (1PL, 2PL, or 3PL with the same guessing). It uses the difference between item parameters for the two groups to detect DIF. In statistical terms, the Lord statistic is equal to the Wald statistic.
Here you can choose the model used to test DIF. You can also select a correction method for multiple comparisons or item purification.
library(difNLR)
library(difR)
data(GMAT)
data <- GMAT[, 1:20]
group <- GMAT[, "group"]
# 1PL IRT MODEL
fit1PL <- difLord(Data = data, group = group, focal.name = 1,
model = "1PL",
p.adjust.method = "none", purify = F)
fit1PL
# 2PL IRT MODEL
fit2PL <- difLord(Data = data, group = group, focal.name = 1,
model = "2PL",
p.adjust.method = "none", purify = F)
fit2PL
# 3PL IRT MODEL with the same guessing for groups
guess <- itemParEst(data, model = "3PL")[, 3]
fit3PL <- difLord(Data = data, group = group, focal.name = 1,
model = "3PL", c = guess,
p.adjust.method = "none", purify = F)
fit3PL
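For a single item parameter, the Wald form of the Lord statistic reduces to a squared difference divided by the sum of squared standard errors. The sketch below covers only this 1PL case, with hypothetical estimates that are assumed to be already on a common metric for both groups; it is not the internal computation of difLord().

```r
# hypothetical 1PL difficulty estimates and standard errors for one item,
# assumed to be already placed on a common metric (illustration only)
b_ref <- 0.2
se_ref <- 0.15
b_foc <- 0.8
se_foc <- 0.20

# Wald-type statistic for a single parameter, chi-square with 1 degree of freedom
lord_stat <- (b_ref - b_foc)^2 / (se_ref^2 + se_foc^2)
p_value <- pchisq(lord_stat, df = 1, lower.tail = FALSE)
c(statistic = lord_stat, p.value = p_value)
```

With more than one item parameter (2PL, 3PL), the scalar difference becomes a vector and the sum of variances becomes a sum of covariance matrices.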
NOTE: Plots and tables are based on larger DIF IRT model.
library(difNLR)
library(difR)
data(GMAT)
data <- GMAT[, 1:20]
group <- GMAT[, "group"]
# 1PL IRT MODEL
fit1PL <- difLord(Data = data, group = group, focal.name = 1,
model = "1PL",
p.adjust.method = "none", purify = F)
fit1PL
# Coefficients for all items
tab_coef1PL <- fit1PL$itemParInit
# Plot of characteristic curve of item 1
plotDIFirt(parameters = tab_coef1PL, item = 1, test = "Lord")
# 2PL IRT MODEL
fit2PL <- difLord(Data = data, group = group, focal.name = 1,
model = "2PL",
p.adjust.method = "none", purify = F)
fit2PL
# Coefficients for all items
tab_coef2PL <- fit2PL$itemParInit
# Plot of characteristic curve of item 1
plotDIFirt(parameters = tab_coef2PL, item = 1, test = "Lord")
# 3PL IRT MODEL with the same guessing for groups
guess <- itemParEst(data, model = "3PL")[, 3]
fit3PL <- difLord(Data = data, group = group, focal.name = 1,
model = "3PL", c = guess,
p.adjust.method = "none", purify = F)
fit3PL
# Coefficients for all items
tab_coef3PL <- fit3PL$itemParInit
# Plot of characteristic curve of item 1
plotDIFirt(parameters = tab_coef3PL, item = 1, test = "Lord")
The Raju test (Raju, 1988, 1990) is based on an IRT model (1PL, 2PL, or 3PL with the same guessing). It uses the area between the item characteristic curves for the two groups to detect DIF.
Here you can choose the model used to test DIF. You can also select a correction method for multiple comparisons or item purification.
library(difNLR)
library(difR)
data(GMAT)
data <- GMAT[, 1:20]
group <- GMAT[, "group"]
# 1PL IRT MODEL
fit1PL <- difRaju(Data = data, group = group, focal.name = 1,
model = "1PL",
p.adjust.method = "none", purify = F)
fit1PL
# 2PL IRT MODEL
fit2PL <- difRaju(Data = data, group = group, focal.name = 1,
model = "2PL",
p.adjust.method = "none", purify = F)
fit2PL
# 3PL IRT MODEL with the same guessing for groups
guess <- itemParEst(data, model = "3PL")[, 3]
fit3PL <- difRaju(Data = data, group = group, focal.name = 1,
model = "3PL", c = guess,
p.adjust.method = "none", purify = F)
fit3PL
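The area between two item characteristic curves can be approximated numerically. The sketch below assumes the 1PL case with hypothetical difficulties on a common metric, where the unsigned area equals the absolute difference in difficulties; the helper icc() and the integration limits are choices made for illustration, not part of difR.

```r
# item characteristic curve (logistic form)
icc <- function(theta, a, b) plogis(a * (theta - b))

# hypothetical 1PL difficulties on a common metric (illustration only)
b_ref <- 0
b_foc <- 0.5

# unsigned area between the curves, integrated numerically over a wide interval
area <- integrate(function(t) abs(icc(t, 1, b_ref) - icc(t, 1, b_foc)),
                  lower = -10, upper = 10)$value

c(numerical = area, closed_form = abs(b_foc - b_ref))
```

The numerical value differs from the closed form only by the small part of the area outside the integration interval.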
NOTE: Plots and tables are based on larger DIF IRT model.
library(difNLR)
library(difR)
data(GMAT)
data <- GMAT[, 1:20]
group <- GMAT[, "group"]
# 1PL IRT MODEL
fit1PL <- difRaju(Data = data, group = group, focal.name = 1,
model = "1PL",
p.adjust.method = "none", purify = F)
fit1PL
# Coefficients for all items
tab_coef1PL <- fit1PL$itemParInit
# Plot of characteristic curve of item 1
plotDIFirt(parameters = tab_coef1PL, item = 1, test = "Raju")
# 2PL IRT MODEL
fit2PL <- difRaju(Data = data, group = group, focal.name = 1,
model = "2PL",
p.adjust.method = "none", purify = F)
fit2PL
# Coefficients for all items
tab_coef2PL <- fit2PL$itemParInit
# Plot of characteristic curve of item 1
plotDIFirt(parameters = tab_coef2PL, item = 1, test = "Raju")
# 3PL IRT MODEL with the same guessing for groups
guess <- itemParEst(data, model = "3PL")[, 3]
fit3PL <- difRaju(Data = data, group = group, focal.name = 1,
model = "3PL", c = guess,
p.adjust.method = "none", purify = F)
fit3PL
# Coefficients for all items
tab_coef3PL <- fit3PL$itemParInit
# Plot of characteristic curve of item 1
plotDIFirt(parameters = tab_coef3PL, item = 1, test = "Raju")
The SIBTEST method (Shealy and Stout, 1993) allows for detection of uniform DIF without requiring an item response model approach. Its modified version, the Crossing-SIBTEST (Chalmers, 2018; Li and Stout, 1996), focuses on detection of non-uniform DIF.
Here you can choose type of DIF to be tested. With uniform DIF, SIBTEST is applied, while with non-uniform DIF, the Crossing-SIBTEST method is used instead. You can also select correction method for multiple comparison or item purification.
library(difNLR)
library(difR)
# loading data
data(GMAT)
data <- GMAT[, 1:20]
group <- GMAT[, "group"]
# SIBTEST (uniform DIF)
fit <- difSIBTEST(Data = data, group = group, focal.name = 1, type = "udif", p.adjust.method = "none", purify = F)
fit
# Crossing-SIBTEST (non-uniform DIF)
fit <- difSIBTEST(Data = data, group = group, focal.name = 1, type = "nudif", p.adjust.method = "none", purify = F)
fit
Differential Distractor Functioning (DDF) occurs when people from different groups but with the same knowledge have a different probability of selecting at least one distractor choice. DDF is examined here by a multinomial log-linear regression model with Z-score and group membership as covariates.
For K possible test choices, the probability of the correct answer for person i with standardized total score \(Z_i\) and group membership \(G_i\) on item j is given by the following equation:
$$\mathrm{P}(Y_{ij} = K|Z_i, G_i, b_{jl0}, b_{jl1}, b_{jl2}, b_{jl3}, l = 1, \dots, K-1) = \frac{1}{1 + \sum_l e^{\left( b_{jl0} + b_{jl1} Z_i + b_{jl2} G_i + b_{jl3} Z_i:G_i\right)}}$$
The probability of choosing distractor k is then given by:
$$\mathrm{P}(Y_{ij} = k|Z_i, G_i, b_{jl0}, b_{jl1}, b_{jl2}, b_{jl3}, l = 1, \dots, K-1) = \frac{e^{\left( b_{jk0} + b_{jk1} Z_i + b_{jk2} G_i + b_{jk3} Z_i:G_i\right)}} {1 + \sum_l e^{\left( b_{jl0} + b_{jl1} Z_i + b_{jl2} G_i + b_{jl3} Z_i:G_i\right)}}$$
Here you can choose what type of DIF to test. You can also select a correction method for multiple comparisons or item purification.
library(difNLR)
data(GMATtest, GMATkey)
Data <- GMATtest[, 1:20]
group <- GMATtest[, "group"]
key <- GMATkey
# DDF with difNLR package
fit <- ddfMLR(Data, group, focal.name = 1, key, type = "both",
p.adjust.method = "none")
fit
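The two equations of the multinomial model can be evaluated directly for given coefficients. In the sketch below, ddf_probs() is a hypothetical helper and the coefficient values are arbitrary; the interaction term Z:G is computed as the product of Z and G. It only illustrates the formulas and is not the estimation code of ddfMLR().

```r
# probabilities of all K options for one respondent on one item;
# B is a (K - 1) x 4 matrix with one row per distractor and
# columns b0 (intercept), b1 (Z), b2 (G), b3 (Z:G)
# (hypothetical helper for illustration, not part of the difNLR package)
ddf_probs <- function(Z, G, B) {
  eta <- drop(B %*% c(1, Z, G, Z * G))  # linear predictor of each distractor
  denom <- 1 + sum(exp(eta))
  c(correct = 1 / denom, exp(eta) / denom)
}

# arbitrary coefficients for an item with K = 3 options (distractors A and B)
B <- rbind(A = c(-0.5, -1.0, 0.8, 0.0),
           B = c(-1.0, -0.6, 0.0, 0.0))

ddf_probs(Z = 0, G = 0, B = B)  # reference group, average score
ddf_probs(Z = 0, G = 1, B = B)  # focal group, average score
```

In this example only distractor A has a nonzero group coefficient, so the groups differ in how likely they are to select it even at the same standardized score.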
Points represent the proportion of respondents selecting a given answer at each level of the standardized total score. Their size is determined by the number of respondents in the given group who achieved that level of the standardized total score and selected the given option.
library(difNLR)
data(GMATtest, GMATkey)
Data <- GMATtest[, 1:20]
group <- GMATtest[, "group"]
key <- GMATkey
# DDF with difNLR package
fit <- ddfMLR(Data, group, focal.name = 1, key, type = "both",
p.adjust.method = "none")
# Estimated coefficients of item 1
fit$mlrPAR[[1]]
ShinyItemAnalysis offers an option to download a report in HTML or PDF format. PDF report creation requires the latest version of MiKTeX (or another TeX distribution). If you don't have the latest installation, please use the HTML report.
There is an option to use customized settings. When the Customize settings box is checked, local settings will be offered and used for each selected section of the report. Otherwise, the settings will be taken from the corresponding sections of the application. You may also include your name in the report, as well as the name of the analyzed dataset.
Reports by default contain summary of total scores, table of standard scores, item analysis, distractor plots for each item and multinomial regression plots for each item. Other analyses can be selected below.
Validity
Difficulty/discrimination plot
Distractors plots
DIF method selection
Delta plot settings
Logistic regression settings
Multinomial regression settings
Recommendation: Report generation can be faster and more reliable when you first check sections of intended contents. For example, if you wish to include a 3PL IRT model, you can first visit IRT models section and 3PL subsection.
Welcome to ShinyItemAnalysis!
ShinyItemAnalysis is an interactive online application, built on R and shiny, for psychometric analysis of educational and other psychological tests and their items. You can simply start using the application by choosing a toy dataset (or uploading your own) in section Data and running analyses including:
All graphical outputs and selected tables can be downloaded via download button. Moreover, you can automatically generate HTML or PDF report in Reports section. All offered analyses are complemented by selected R code which is ready to be copy-pasted into your R console, hence a similar analysis can be run and modified in R.
Application can be downloaded as an R package from CRAN. It is also available online at Czech Academy of Sciences and shinyapps.io.
Visit our web page about ShinyItemAnalysis to learn more!
If you discover a problem with this application please contact the project maintainer at martinkova(at)cs.cas.cz or use GitHub. We also encourage you to provide your feedback using Google form.
This program is free software and you can redistribute it and/or modify it under the terms of the GNU GPL 3 as published by the Free Software Foundation. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.
To cite ShinyItemAnalysis in publications, please use:
In Czech written papers you can also use
Project was supported by Czech Science Foundation grant GJ15-15856Y 'Estimation of psychometric measures as part of admission test development' and by Charles University grant PRIMUS/17/HUM/11 'Center for Educational Measurement and Psychometrics (CEMP)'.
corrplot
Wei, T. & Simko, V. (2017).
R package `corrplot`: Visualization of a Correlation Matrix.
R package version 0.84.
See online.
cowplot
Claus O. Wilke (2018).
cowplot: Streamlined Plot Theme and Plot Annotations for "ggplot2".
R package version 0.9.3.
See online.
CTT
Willse, J. T. (2018).
CTT: Classical Test Theory Functions.
R package version 2.3.2.
See online.
data.table
Dowle, M. & Srinivasan, A. (2018).
data.table: Extension of `data.frame`.
R package version 1.11.4.
See online.
deltaPlotR
Magis, D. & Facon, B. (2014).
deltaPlotR: An R Package for Differential Item Functioning Analysis with Angoff`s Delta Plot.
Journal of Statistical Software, Code Snippets, 59(1), 1--19.
See online.
difNLR
Drabinova, A., Martinkova, P. & Zvara, K. (2018).
difNLR: DIF and DDF Detection by Non-Linear Regression Models.
R package version 1.2.2.
See online.
difR
Magis, D., Beland, S., Tuerlinckx, F. & De Boeck, P. (2010).
A general framework and an R package for the detection of dichotomous differential item functioning.
Behavior Research Methods, 42, 847--862.
DT
Xie, Y. (2018).
DT: A Wrapper of the JavaScript Library `DataTables`.
R package version 0.4.
See online.
ggdendro
Andrie de Vries & Brian D. Ripley (2018).
ggdendro: Create Dendrograms and Tree Diagrams Using "ggplot2".
R package version 0.1-20.
See online.
ggplot2
Wickham, H. (2016).
ggplot2: Elegant Graphics for Data Analysis.
See online.
gridExtra
Auguie, B. (2017).
gridExtra: Miscellaneous Functions for `Grid` Graphics.
R package version 2.3.
See online.
knitr
Xie, Y. (2018).
knitr: A General-Purpose Package for Dynamic Report Generation in R.
R package version 1.20.
See online.
lattice
Sarkar, D. (2008).
Lattice: Multivariate Data Visualization with R.
See online.
latticeExtra
Sarkar, D. & Andrews, F. (2016).
latticeExtra: Extra Graphical Utilities Based on Lattice.
R package version 0.6-28.
See online.
ltm
Rizopoulos, D. (2006).
ltm: An R package for Latent Variable Modelling and Item Response Theory Analyses.
Journal of Statistical Software, 17(5), 1--25.
See online.
MASS
Venables, W. N. & Ripley, B. D. (2002).
Modern Applied Statistics with S.
See online.
mirt
Chalmers, R. P. (2012).
mirt: A Multidimensional Item Response Theory Package for the R Environment.
Journal of Statistical Software, 48(6), 1--29.
moments
Komsta, L. & Novomestky, F. (2015).
moments: Moments, cumulants, skewness, kurtosis and related tests.
R package version 0.14.
See online.
msm
Jackson, C. H. (2011).
Multi-State Models for Panel Data: The msm Package for R.
Journal of Statistical Software, 38(8), 1--29.
See online.
multilevel
Bliese, P. (2016).
multilevel: Multilevel Functions.
R package version 2.6.
See online.
nlme
Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D. & R Core Team (2018).
nlme: Linear and Nonlinear Mixed Effects Models.
R package version 3.1-137.
See online.
nnet
Venables, W. N. & Ripley, B. D. (2002).
Modern Applied Statistics with S.
See online.
plotly
Sievert, C., Parmer, C., Hocking, T., Chamberlain, S., Ram, K., Corvellec, M. & Despouy, P. (2017).
plotly: Create Interactive Web Graphics via `plotly.js`.
R package version 4.7.1.
See online.
polycor
Fox, J. (2016).
polycor: Polychoric and Polyserial Correlations.
R package version 0.7-9.
See online.
psych
Revelle, W. (2018).
psych: Procedures for Psychological, Psychometric, and Personality Research.
R package version 1.8.4.
See online.
psychometric
Fletcher, T. D. (2010).
psychometric: Applied Psychometric Theory.
R package version 2.2.
See online.
RColorBrewer
Neuwirth, E. (2014).
RColorBrewer: ColorBrewer Palettes.
R package version 1.1-2.
See online.
reshape2
Wickham, H. (2007).
Reshaping Data with the reshape Package.
Journal of Statistical Software, 21(12), 1--20.
See online.
rmarkdown
Allaire, J., Xie, Y., McPherson, J., Luraschi, J., Ushey, K., Atkins, A., Wickham, H., Cheng, J. & Chang, W. (2018).
rmarkdown: Dynamic Documents for R.
R package version 1.10.
See online.
shiny
Chang, W., Cheng, J., Allaire, J., Xie, Y. & McPherson, J. (2018).
shiny: Web Application Framework for R.
R package version 1.1.0.
See online.
shinyBS
Bailey, E. (2015).
shinyBS: Twitter Bootstrap Components for Shiny.
R package version 0.61.
See online.
ShinyItemAnalysis
Martinkova, P., & Drabinova, A. (2018).
ShinyItemAnalysis for teaching psychometrics and to enforce routine analysis of educational tests.
The R Journal, 10(2), 503-515.
See online.
shinyjs
Attali, D. (2018).
shinyjs: Easily Improve the User Experience of Your Shiny Apps in Seconds.
R package version 1.0.
See online.
stringr
Wickham, H. (2018).
stringr: Simple, Consistent Wrappers for Common String Operations.
R package version 1.3.1.
See online.
xtable
Dahl, D. B. (2016).
xtable: Export Tables to LaTeX or HTML.
R package version 1.8-2.
See online.
Set the number of cycles for IRT 1PL, 2PL, 3PL and 4PL models.
Here you can change the settings for downloading figures.