Hey everyone!

I have been working on updating my Advanced Statistics course to be completely online due to course changes and demands. We now teach psychology undergraduates and graduate students, nursing anesthesia doctoral students, and athletic training master's students. What a mix! The course is a choose-your-own-adventure design: students can choose R or Excel/JASP as their analysis program, and we have designed tracks of analyses to complement each student's educational goals.

Last year, we did a blended flipped classroom, wherein students watched lectures on their own, and then came to help desk hours to ask questions and check in. This semester, we are trying a completely online design – still with many help hours for the students who are struggling. The good news is that the materials are all updated (minus some small JASP and MOTE update notes I need to make) and free for anyone to use!

In prep for an invited talk on mediation and moderation I am giving in April, you will see some videos coming soon on how to analyze each design in SPSS, JASP/Jamovi, Excel, and R. I know I have several of these already, but I am designing this set specifically for my workshop. Keep an eye on the YouTube page for those to come your way.

Keep trucking in the new year!



I have so much stuff backlogged to blog about – especially that we are working on fully integrating with OSF and putting up preprints of the cool work we are doing! But this blog post is reserved for HOW EXCITED I AM to announce that MOTE is ready to import into R. Run this code in your R console:

install.packages("devtools") ##only needed if you do not have it yet
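With devtools in place, the package itself installs from GitHub. A sketch of that step, assuming the repository path is doomlab/MOTE – double-check the GitHub page if the path has changed:

```r
## repository path assumed to be doomlab/MOTE -- verify on the GitHub page
devtools::install_github("doomlab/MOTE")
library(MOTE)
```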


Remember that the "" characters sometimes do not copy correctly into R. Go nuts! Ask questions! Give feedback! One thing I did not talk about in the video is a limitation of V in chi-square. Due to the distribution of chi-square, V confidence intervals are only useful for smaller r × c combinations (like 2×2 or 3×3). After you hit about four rows/columns, the distribution flattens out, and the calculated confidence interval is no longer around the V value. For example, a χ2 of 14 with a sample size of 100, with four rows and columns, gives you:

v.chi.sq(x2 = 14, n = 100, r = 4, c = 4, a = .05)
[1] 0.6480741
[1] 0.1732051
[1] 0.3241347
[1] 100
[1] 9
[1] 14
[1] 0.1223252
Warning message:
The size of the effect combined with the degrees of freedom is too small to determine a lower confidence limit for the 'alpha.lower' (or the (1/2)(1-'conf.level') symmetric) value specified (set to zero).

As you can see, this is a limitation of confidence intervals on chi-square. Also, I found more typos :|.
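For reference, here is a minimal sketch of the point-estimate side of this calculation, using the textbook formula for Cramér's V. This is not MOTE's internal code, and it does not attempt the confidence interval, which relies on the noncentral chi-square distribution (the source of the warning above):

```r
# Textbook Cramer's V: V = sqrt(chi2 / (n * (min(r, c) - 1)))
# Point estimate only; no confidence interval is computed here.
cramers_v <- function(x2, n, r, c) {
  sqrt(x2 / (n * (min(r, c) - 1)))
}

# The example values from the post: chi-square of 14, n = 100, 4 x 4 table
cramers_v(x2 = 14, n = 100, r = 4, c = 4)
```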

Go check out github:


Go check out the video on how to install and the history of MOTE:

A new paper is now in press at Psychonomic Bulletin & Review, entitled "A Meta-Analysis of the Survival Processing Advantage in Memory". This paper explores several different meta-analytic techniques and bias-correcting tools on the topic of survival processing. The abstract is posted below, and you can check out the unformatted manuscript online at https://osf.io/6sd8e/.

“The survival processing advantage occurs when processing words for their survival value improves later performance on a memory test. Due to the interest in this topic, we conducted a meta-analysis to review literature regarding the survival processing advantage to estimate a bias-corrected effect size. Traditional meta-analytic methods were used, as well as the Test of Excessive Success, p-curve, p-uniform, trim and fill, PET-PEESE, and selection models to re-evaluate effect sizes while controlling for forms of small-study effects. Average effect sizes for survival processing ranged between ηp2 = .06 and .09 for between-subjects experiments, and between .15 and .18 for within-subjects experiments after correcting for potential bias and selective reporting. Overall, researchers can expect to find medium to large survival processing effects, with selective reporting and bias correcting techniques typically estimating lower effects than traditional meta-analytic techniques.”

We recently submitted a paper to the Journal of Psychological Inquiry that focuses on the utilization of undergraduate learning assistants (ULAs) in Introductory Psychology classes at Missouri State University. Prior research has identified many problems for students associated with large class sizes; unfortunately, these large classes limit opportunities for interaction among students and faculty. Missouri State University has implemented a program that utilizes ULAs to help increase interactions between course staff and students, and the course has reaped additional benefits that we discuss. In this manuscript, we review the ways in which large courses hinder student success and discuss different ways to implement undergraduate assistants. Additionally, we examine data reported by prior studies examining the effectiveness of ULAs. The finalized manuscript will soon be uploaded to the Open Science Framework at the following link:




Hey all!

I wanted to write a post about the permutation test video I uploaded to YouTube. I have linked the video and put up the materials on the Advanced Statistics page.

I mainly wanted to cover the advantages of permutation tests:

  • You are not relying on some magical population. I hope I expressed this idea well in the video. The more I do research, the more I realize that populations are things of magic that just don't exist – especially because, short of a lot of money, how are we supposed to randomly sample from that population anyway?
  • Those pesky assumptions! I am a big proponent of checking your assumptions – which is why all my videos have information about data screening in them. However, I am also guilty of being like "oh well, shrug, there goes some power, because what else am I supposed to do?". Or even better … what do you do when all the reviewers only know ANOVA, and you want to use something special? It's a messy system we have going here.
  • They have a certain elegance to them … I test my data, and it turns out to be X statistic number. If I randomize that data, how many times do I get X or greater? How simple is that idea?
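That last idea really is only a few lines of base R. Here is a minimal sketch of a two-group permutation test on a mean difference; the numbers are invented purely for illustration:

```r
# Hypothetical two-group data, purely for illustration
group1 <- c(5.1, 4.8, 6.2, 5.5, 5.9)
group2 <- c(4.2, 4.6, 3.9, 5.0, 4.4)

observed <- mean(group1) - mean(group2)  # the "X statistic" for our data

# Shuffle the group labels many times and record the shuffled difference
set.seed(42)
pooled <- c(group1, group2)
n1 <- length(group1)
perm_diffs <- replicate(10000, {
  shuffled <- sample(pooled)
  mean(shuffled[1:n1]) - mean(shuffled[-(1:n1)])
})

# How often is the shuffled statistic as extreme as the observed one?
p_value <- mean(abs(perm_diffs) >= abs(observed))
```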

The hidden side of permutation tests is that they still rely on some form of probability, and potentially, the same faulty logic that we use now for null hypothesis significance testing. Additionally, I can see someone running the test to fish – if something is close, you could rerun the permutation until it comes out your way.

I do know that I said something a bit wrong at the end of the video around 30 minutes in … you can't really calculate a single F for the permutation test, because there are lots of F values (that's the point). I would suggest reporting the p value and, potentially, calculating F for the original test by dividing the MS for the variable by the MS residual, while making it very clear the p value is from a permutation test. I also highly recommend adding eta or eta squared for effect size using the SS Variable and SS Residual information provided in the table. If you compare the aovp() output to a regular ANOVA, you will find the SS and MS are approximately the same, but p changes based on the randomization results.
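A minimal sketch of that eta-squared calculation – using base R's aov() rather than aovp(), since the SS are approximately the same, and with invented data:

```r
# Invented data: one factor with three levels, four observations each
df <- data.frame(
  y = c(4, 5, 6, 5, 6, 7, 8, 7, 9, 8, 10, 9),
  g = factor(rep(c("a", "b", "c"), each = 4))
)

fit <- aov(y ~ g, data = df)
ss <- summary(fit)[[1]][["Sum Sq"]]  # SS for the effect, then SS residual

# eta squared = SS_effect / (SS_effect + SS_residual)
eta_sq <- ss[1] / sum(ss)
```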


Pase et al. (2017) recently published an article in the journal Stroke entitled “Sugar- and Artificially Sweetened Beverages and the Risks of Incident Stroke and Dementia”. This publication was quickly picked up by the mainstream media, and it was off to the races from there. News article titles included:

Daily dose of diet soda tied to triple risk of deadly stroke

Diet sodas may be tied to stroke, dementia risk

Diet sodas tied to dementia and stroke

Diet Sodas May raise risk of dementia and stroke, study finds

Diet soda can increase risk of dementia and stroke, study finds

With headlines like these at the fingertips of millions of people, it is important to retain a healthy dose of scientific skepticism, especially with suggestions like this one:

“if you’re partial to a can of Pepsi Max at lunch, or enjoy a splash of Coke Zero with your favorite rum – you might want to put that drink back on ice”

The driving point from this article was that drinking artificially sweetened drinks was associated with a 3x increase in the risk of incident stroke and dementia. Now, to be fair, while some news outlets may have overstated some of the results from Pase et al., others actually included fair points regarding the article, including that this connection was only found for artificially sweetened beverages, and that no link was found for other sugary beverages (i.e., sugar-sweetened sodas, fruit juice). Some news articles rightly pointed out the classic "correlation does not equal causation" remark. The lead author of the paper even pointed out in an interview that "In our study, 3% of the people had a new stroke and 5% developed dementia, so we're still talking about a small number of people developing either stroke or dementia". With 2,888 participants analyzed for incident stroke, and 1,484 observed for new-onset dementia, that translates into roughly 87 people who suffered a stroke and roughly 74 people who developed new-onset dementia.

Pase et al. (2017) did adjust for age, sex, caloric intake, education, diet, exercise, and smoking, among other things. Interestingly, however, they did not control for multiple comparisons, which is the bigger point I would like to raise in this post. Whenever a researcher runs multiple tests using the same dataset (for instance, when a treatment has 3 or more levels), runs extra analyses on a subset of a dataset, or runs extra tests on variables that weren't previously considered, the risk of Type I error increases. A.K.A. a "false positive," Type I error occurs when you find an effect when no effect actually exists in the population. Running multiple tests yields more chances that an effect will be found, thus increasing the risk of running into Type I error. An easy solution is to adjust the alpha criterion (usually .05) by the number of tests being run (i.e., the Bonferroni correction, a very popular Type I error correction because it is easy to calculate manually). For instance, if you are running 10 t-tests on the same dataset, you could adjust alpha to .005 (.05/10). So, controlling for multiple comparisons, an effect would be deemed significant if the p-value fell below .005, not the typical .05.
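In R, this adjustment can be applied either to alpha directly or, equivalently, to the p-values themselves via p.adjust(). The p-values below are made up for illustration:

```r
# Ten hypothetical p-values from ten t-tests on the same dataset
p_vals <- c(0.001, 0.004, 0.012, 0.030, 0.049,
            0.051, 0.080, 0.200, 0.450, 0.700)

# Option 1: shrink alpha by the number of tests
alpha_adj <- 0.05 / length(p_vals)  # .05/10 = .005
sum(p_vals < alpha_adj)             # how many tests survive

# Option 2 (equivalent): inflate the p-values and keep alpha at .05
p_bonf <- p.adjust(p_vals, method = "bonferroni")
sum(p_bonf < 0.05)                  # same count of surviving tests
```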

How much does this actually matter? Would adjusting for multiple comparisons yield any meaningful changes in regards to statistical interpretation? To investigate this, I simulated 100 datasets, each with a sample size of 100, assuming a medium effect size. Data were generated as Likert-type data ranging from 1 to 7. One factor with five levels was considered (with means of 2.0, 2.3, 2.6, 2.9, 3.2). Post-hoc t-tests were then analyzed for all pairwise comparisons, both with no p-value adjustment and with a Bonferroni correction. The number of significant p-values was then logged in both cases for each dataset. The R code used for this simulation can be found at the end of this post.

Simulation results revealed, probably not to anyone's surprise, that yes, there is a difference in the number of significant p-values found, depending on whether one controls for multiple comparisons. Out of 10 possible comparisons, on average, post-hoc t-tests revealed more significant p-values (M = 5.60, SD = 1.33) when you don't control for multiple comparisons than when you do (M = 3.64, SD = 1.38), t(99) = 17.25, p < .001, davg = 1.45, 95% CI [1.16, 1.73].

Turning back to Pase et al. (2017), what effect would this have had on their conclusions, considering they did not control for multiple comparisons? Below is a snapshot of beverage intake and the risk of stroke from Pase et al.

Apologies if the table is a bit blurry (it might be better to look up the article directly), but the top two-thirds of this table shows that neither total sugary beverages nor sugar-sweetened soft drinks had any significant effect on the risk of stroke, as the p-values are above what we assume is their alpha criterion of .05. The bottom third of the table shows artificially sweetened soft drinks. Looking at just the results from Model 3, which controlled for the most potentially confounding factors, we see that 6 out of the 8 p-values are significant. However, by using a simple adjustment of .05/8 tests, our new alpha criterion is .00625. Using this criterion, none of the p-values reported would fall below the significance criterion.

Considering dementia, total sugary beverages and sugar-sweetened soft drinks both remained non-significant. However, using Model 3 (the authors' most conservative model), none of the p-values were significant for artificially sweetened drink intake, even before controlling for multiple comparisons. If we assume that these regression models include control for other variables (i.e., lessened df for including many predictors), we could reduce the number of comparisons down to 4 (recent intake/cumulative intake by stroke type), and then the corrected alpha would be .05/4 = .0125. Given that the precision of the table is only two decimals, it is difficult to tell if the .01 values would still be considered significant. By not employing this simple adjustment, the authors increased their risk of Type I error, and as a result, the conclusions of this paper drastically change, from finding significant effects to finding none. By making sure we control for multiple comparisons, we can avoid problems, such as finding false positives, and make for better, more robust science.

Given the large sample size and the large number of models employed here (24 regressions!), we must be careful in our interpretation – especially given that we are predicting very small categories, which is always a difficult science. The use of an effect size in the table is encouraging, especially with confidence intervals. These confidence intervals tell an even more cautionary story – many of them include, or come very close to, a 1:1 ratio (i.e., the risk is essentially the same whether or not you drink the beverage) and are quite wide, indicating we don't quite have a good picture of the relationship between these drinks and stroke just yet.

R Syntax

#set up
library(data.table) ##data storage and melt()
library(mvtnorm)    ##rmvnorm() for simulating the data
library(MOTE)       ##effect size calculation
library(ggplot2)    ##graphing
library(Hmisc)      ##mean_cl_normal for the error bars

Means = c(2.0, 2.3, 2.6, 2.9, 3.2)
N = 100
round = 0
sig_data = data.table(no = 1:N,
                      yes = 1:N)

for (i in 1:N){ #start loop

  #create data: five uncorrelated levels, each with variance 3
  sigma = matrix(c(3, 0, 0, 0, 0,
                   0, 3, 0, 0, 0,
                   0, 0, 3, 0, 0,
                   0, 0, 0, 3, 0,
                   0, 0, 0, 0, 3), nrow = 5, ncol = 5)
  dataset = as.data.table(rmvnorm(N, Means, sigma))
  dataset = round(dataset, digits = 0) #convert to Likert-type integers
  dataset[ dataset < 1 ] = 1 #clamp to the 1-7 scale
  dataset[ dataset > 7 ] = 7
  dataset$partno = as.factor(1:nrow(dataset))
  longdataset = melt(dataset,
                     id = "partno")

  #pairwise comparisons
  round = round + 1
  noadjust = pairwise.t.test(longdataset$value,
                             longdataset$variable,
                             paired = T,
                             p.adjust.method = "none")
  x = na.omit(as.vector(noadjust$p.value)) #all 10 pairwise p-values
  sig_data$no[round] = sum(x < .05)
  yesadjust = pairwise.t.test(longdataset$value,
                              longdataset$variable,
                              paired = T,
                              p.adjust.method = "bonferroni")
  y = na.omit(as.vector(yesadjust$p.value))
  sig_data$yes[round] = sum(y < .05)

} # end loop

## Differences in number of significant p-values?
t.test(sig_data$no, sig_data$yes, paired = TRUE)

##effect size from the MOTE package
m1 = mean(sig_data$no)
sd1 = sd(sig_data$no)
n = length(sig_data$no)
m2 = mean(sig_data$yes)
sd2 = sd(sig_data$yes)
d.dept.avg(m1 = m1, m2 = m2, sd1 = sd1, sd2 = sd2,
           n = n, a = .05, k = 2)

##graph the results
sig_data$partno = 1:nrow(sig_data)
noout = melt(sig_data, id = "partno")
theme = theme(panel.grid.major = element_blank(),
              panel.grid.minor = element_blank(),
              panel.background = element_blank(),
              axis.line.x = element_line(colour = "black"),
              axis.line.y = element_line(colour = "black"),
              legend.key = element_rect(fill = "white"),
              text = element_text(size = 15))
ggplot(noout, aes(variable, value)) +
  stat_summary(fun.y = mean,
               geom = "point", size = 2, fill = "gray", color = "gray") +
  stat_summary(fun.data = mean_cl_normal,
               geom = "errorbar", position = position_dodge(width = 0.90),
               width = 0.2) +
  theme +
  xlab("Controlling for Multiple Comparisons") +
  ylab("Number of Significant p-values") +
  scale_x_discrete(labels = c("None", "Bonferroni"))


All blogs have to start somewhere, so I wanted to give a quick introduction. I am an Associate Professor of Quantitative Psychology at Missouri State University. I teach a lot of stuff, mostly related to statistics: baby stats (undergraduate basics), advanced stats (undergraduate/graduate mix of multivariate methods), graduate stats (graduate basics), and structural equation modeling. I run the Statistics and Research Design certificate program at MSU, along with working closely with our Experimental Psychology track in the master's program.

My research focuses on computational linguistics and applied statistics, which you can read a whole lot more about on my website. I would describe my language work as being interested in the types and ways we use psycholinguistic variables, and how these variables relate to judgments and memory. Statistically speaking, I usually help others by exploring how they might analyze their data, but more recently I am interested in the way we do business in statistics (i.e., understanding the way our statistics work and function under different scenarios) and how to teach statistics.

Here on the blog, we will be posting all sorts of information including links to new help videos, discussions about statistics in the real world, promoting our research papers, and any random thoughts that might cross the brain. My goal for this information is to not only promote what we are doing as scientists, but also to be able to teach anyone interested in how we did our work and promote the open science framework.

I also have purple hair, much to the amusement of my students and small children.