StatsTools
Heyo! Frederik, the author of papaja, asked that we update him with papers written with his package. I was like, oh man, like the whole lab?! So, I decided I could make it easy by keeping a table here. Obviously, this table is only current as of this moment, as I hope many of the papers under review will be accepted, and I have several others that we will start writing soon. I only listed papers for which you can find the actual .Rmd at the links provided. GitHub is linked to each of these OSF pages as well.

OSF Title | OSF Link | Pre-Print Link | Status
Methods to Detect Low Quality Data and Its Implication for Psychological Research | https://osf.io/x6t8a/ | https://osf.io/cv2bn/ | published (doi: 10.3758/s13428-018-1035-6)
Does the Delivery Matter? Examining Randomization at the Item Level | https://osf.io/gvx7s/ | https://osf.io/p93df/ | accepted pending small revisions
Beyond p-values: Utilizing Multiple Estimates to Evaluate Evidence | https://osf.io/u9hf4/ | https://osf.io/9hp7y/ | revise and resubmit
Perceived Grading and Student Evaluation of Instruction | https://osf.io/jdpfs/ | https://osf.io/7x4uf/ | revise and resubmit
Investigating the Interaction between Associative, Semantic, and Thematic Database Norms for Memory Judgments and Retrieval | https://osf.io/y8h7v/ | https://osf.io/fcesn/ | under review
Bulletproof Bias? Considering the Type of Data in Common Proportion of Variance Effect Sizes | https://osf.io/urd8q/ | https://osf.io/cs4vy/ | under review
The LAB: Linguistic Annotated Bibliography | https://osf.io/9bcws/ | https://osf.io/h3bwx/ | under review
English Semantic Feature Production Norms: An Extended Database of 4,436 Concepts | https://osf.io/cjyzw/ | https://osf.io/gxbf4/ | under review
A Meta-Analysis of Expressive Writing on Positive Psychology Variables and Traumatic Stress | https://osf.io/4mjqt/ | https://osf.io/u98cw/ | under review
The N400’s 3 As: Association, Automaticity, Attenuation (and Some Semantics Too) | https://osf.io/h5sd6/ | https://osf.io/6w2se/ | under review
Focus on the Target: The Role of Attentional Focus in Decisions about War | https://osf.io/r8qp2/ | https://osf.io/9fgu8 | under review
An Extension of the QWERTY Effect: Not Just the Right Hand, Expertise and Typability Predict Valence Ratings of Words | https://osf.io/zs2qj/ | https://osf.io/k7dx5/ | under review
Modeling Memory: Exploring the Relationship Between Word Overlap and Single Word Norms when Predicting Relatedness Judgments and Retrieval | https://osf.io/j7qtc/ | https://osf.io/qekad/ | writing
Outrageous Observations: The Redheaded Stepchild of Data Analysis | https://osf.io/52mqw/ | - | writing
Moral Foundations of U.S. Political News Organizations | https://osf.io/5kpj7/ | - | writing
A Validation of the Moral Foundations Questionnaire and Dictionary | https://osf.io/kt9yf/ | - | writing

You can also check out our YouTube channel for the couple of videos I’ve made on papaja and markdown.

Hi everyone! I don’t really feel like working too hard today, so I decided to write a blog post about how my student Will and I used rvest to mine articles from several different news sources for a project. All the scripts and the current goings-on of this project can be found on our OSF page – this project is also connected to the GitHub folder with the files.

First, we picked four web sources to scrape – The New York Times, NPR, Fox News, and Breitbart – because of their known political associations, and specifically, we focused on their political sections. To get started, you need the rvest library. After you load the library, you can set the url that you want to pull articles from.

library(rvest)
#Specifying the url for the desired website to be scraped
url <- 'https://www.nytimes.com/section/politics'

Now, this url is just where we expect to find a list of links to each individual article written by the Times. Many rvest tutorials focus on pulling information from only one page – in this blog, I am showing you how to use loops to pull a bunch of separate pages/posts. This tutorial would also work well for pulling from blog-type pages.

Next, we read in the main webpage:

#Reading the HTML code from the website - headlines
webpage <- read_html(url)
headline_data <- html_nodes(webpage,'.story-link a, .story-body a')

> headline_data
{xml_nodeset (48)}
 [1] <a href="https://www.n ...
 [2] <a href="https://www.n ...
 [3] <a href="https://www.n ...

Specifically, read_html pulled in the entire webpage, and the html_nodes function helped us find the parts we wanted. Here, we used the Selector Gadget extension to find the right selectors. If you know a bit of CSS, you can view the page source on your target page and find the class/id properties you are searching for. For the non-web people: essentially, this tool allows you to find the specific parts of a website you want to extract. In our case, we were looking for the story headlines and their individual page links – the a href tags, which are how links are coded on the web.

From there, we extracted the attributes of the story links, which created a big list of the headlines and other attributes about them. I really only wanted the links to the individual stories though – not all the information about them. html_attrs created mini-lists of all the attributes for each part of the page we had scraped.

attr_data <- html_attrs(headline_data) 

> attr_data
[[1]]
                                                                                     href 
"https://www.nytimes.com/2018/05/07/us/politics/don-blankenship-trump-west-virginia.html" 
                                                                                data-rref 
                                                                                       ""

To get only the links, we tried this:

urlslist <- unlist(attr_data)
urlslist <- urlslist[grep("http", urlslist)]
urlslist <- unique(urlslist)
urlslist

> urlslist
 [1] "https://www.nytimes.com/2018/05/07/us/politics/don-blankenship-trump-west-virginia.html"                               
 [2] "https://www.nytimes.com/2018/05/06/us/politics/giuliani-says-trump-would-not-have-to-comply-with-mueller-subpoena.html"

unlist flattened the list of lists into one giant vector of attribute data. Then I used the grep function to find the urls: grep("http", urlslist) returns the index of each item with http in it. I wanted the actual urls, not just the item numbers, so I stuck that inside urlslist[...]. The unique function was necessary, as links often repeated, and we really only needed them once.
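As an aside, rvest also has html_attr() (singular), which pulls one named attribute from each node, so you can skip some of the unlist/grep dance if you only ever need the links – a sketch, not what we actually ran:

##pull just the href attribute from every headline node
story_links <- html_attr(headline_data, "href")
##keep the absolute links, once each
story_links <- unique(story_links[grep("http", story_links)])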

A warning: websites don’t always use absolute links. Sometimes they use references to folders or relative links. We found this with two of our sites, and solved that problem in a couple of ways. The solution will depend on how exactly the website references their other pages.

urlslist3 <- urlslist3[grep("http|.html", urlslist3)]
##fix the ones without the leading foxnews.com
urlslist3F <- paste("http://www.foxnews.com", urlslist3[grep("^http", urlslist3, invert = T)], sep = "")
urlslist3N <- urlslist3[grep("^http", urlslist3)]
urlslist3 <- c(urlslist3N, urlslist3F)
urlslist3 <- unique(urlslist3)

On Fox, we could find the urls in our attributes with http OR (that’s the pipe |) .html. On Breitbart, we had to use the folder name by doing urlslist4 <- urlslist4[grep("http|/big-government", urlslist4)]. Then we created the absolute link by sticking the homepage on the front, when necessary, with the paste function. The urlslist3N line found all the ones with http at the front (that’s the ^) that we didn’t have to fix. Then we combined the fixed and non-fixed ones and kept only the unique set. The Breitbart version of the same fix is sketched below.
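For completeness, here is roughly what that looks like end-to-end for Breitbart (a sketch mirroring the Fox code above; the grep is from our script, but the pasted-on domain is my assumption about how their relative links resolve):

urlslist4 <- urlslist4[grep("http|/big-government", urlslist4)]
##fix the ones without the leading breitbart.com (assumed domain)
urlslist4F <- paste("http://www.breitbart.com", urlslist4[grep("^http", urlslist4, invert = T)], sep = "")
urlslist4N <- urlslist4[grep("^http", urlslist4)]
urlslist4 <- unique(c(urlslist4N, urlslist4F))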

From there, we started a blank data frame for storing the final data. Then the real magic occurs.

##start a data frame
NYtimesDF <- matrix(NA, nrow = length(urlslist), ncol = 3)
colnames(NYtimesDF) <- c("Source", "Url", "Text")
NYtimesDF <- as.data.frame(NYtimesDF)

##for loops
for (i in 1:length(urlslist)){
  
  ##read in the URL
  webpage <- read_html(urlslist[i])
  
  ##pull the specific nodes
  headline_data <- html_nodes(webpage,'.story-content') 
  
  ##pull the text
  text_data <- html_text(headline_data)
  
  ##save the data
  NYtimesDF$Source[i] <- "NY Times"
  NYtimesDF$Url[i] <- urlslist[i]
  NYtimesDF$Text[i] <- paste(text_data, collapse = "")
} ##end for loop

For a good loop tutorial, see here. What this code does is loop over the url list you created at the start. For each separate post page it:

1) pulls in the entire page by reading that one url,

2) pulls out just the story (again, we figured out with Selector Gadget how to get just the story text instead of the headlines this time),

3) uses html_text to get the text out of the nodes we selected,

4) saves the data for further use. Notice we used paste with the collapse argument to make sure it did not return a list but rather one giant cell of text.

We ran this for DAYS (about twice a day for a month). Websites often use things like “see more” or “older articles” to collapse the site – or, in the case of Fox (I think), when you scroll past the current information, more is automatically added (like Facebook). This process saves loading time for the user. We couldn’t really force that action to happen from this script, so we simply ran it multiple days to get newer data. The use of unique really allowed us to make sure we weren’t getting duplicate data – and if I had to write this again, I would make sure we also pulled in the old data and filtered out duplicates at the beginning rather than the end (but either way works; a sketch is below). If you check out our whole script, you can see some other things we did to make this work more efficiently, such as adding all the sub-pages that Fox uses to post politics articles, as they don’t all make it to the homepage (or it goes by so fast we weren’t getting them even at twice a day).
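Here’s roughly what I mean by filtering at the beginning (a sketch, assuming earlier runs were written out to a hypothetical NYtimesDF.csv with the same columns as the data frame above):

##read in the previous run and drop urls we already scraped
oldDF <- read.csv("NYtimesDF.csv", stringsAsFactors = FALSE)
urlslist <- setdiff(urlslist, oldDF$Url)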

At the moment, we are still analyzing the data, but the analysis script in our github folder can give you a preview of the next blog post to come about working with text data. Enjoy!

One more announcement! We just had a new publication accepted:

“Textisms”: The Comfort of the Recipient: This paper was an *undergraduate* honors thesis that Flora-Jean and I finally got accepted! She did a great job making sure this paper was completed and published.

You can check out the materials here: https://osf.io/8kt52/

You can view the pre-print: https://osf.io/ptf7c/

We should have the real print up soon! Just waiting on the journal now.

Abstract: The purpose of this study was to determine whether certain textisms (texting cues) were perceived as more comfortable than others, both in the context of conversation and with regard to the general perception of the textism. Participants were assigned to one of two conversations and were asked to rate how comfortable they would feel after each statement in a conversation. Next, they were all asked to rank the general comfort ratings of each textism. We predicted that participants would feel more comfortable with the usage of emoticons (a smiley face) and initialisms (JK), whereas they would feel less comfortable with typographical symbols (…) and capital letters (WHAT), in general as well as in the context of a conversation. Results indicated that, globally and in the context of a conversation, participants perceive initialisms and emoticons as more comfortable and typographical symbols and capital letters as less comfortable.

Hi everyone!

I have been super swamped with a bunch of due dates that all hit in April. For a small brag (and because I like making lists):

  • 9 revise and resubmits (four we’ve sent back, two have been accepted!)
  • 4 conference posters and one invited talk
  • 1 submitted grant (fingers crossed!)
  • 2 invited papers
  • 2 theses that I’m chairing, 2 that I’m on the committee for
  • Data camp!

It’s been nuts, so I haven’t left the house much or done much of anything else. Anywho, I thought I would share my mediation and moderation talk information for the conference.

You can get it here: https://osf.io/t3syq/

The information includes SPSS and R guides for mediation/moderation, including bootstrapped confidence intervals for the indirect effect. These CI values give you more to talk about than saying “fully” or “partially” mediated based on some magic “p” value change (don’t do this). I will record my talk and put it online on our YouTube page as well.
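For the R-curious, the flavor of a bootstrapped indirect effect in lavaan looks roughly like this (a sketch only – dat, X, M, and Y are hypothetical stand-ins; the full guides are at the OSF link above):

library(lavaan)

med_model <- '
  M ~ a*X          ##a path
  Y ~ b*M + cp*X   ##b path and direct effect
  indirect := a*b  ##the indirect effect
'
fit <- sem(med_model, data = dat, se = "bootstrap", bootstrap = 1000)
parameterEstimates(fit, boot.ci.type = "perc") ##bootstrapped CIs

Enjoy!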

Heyo! I am doing my best to procrastinate here on a blustery Tuesday afternoon. So, I decided to share some code I’ve put together that solves problems in R that I used to solve in perl. HTML or C++ was probably my first real language, but I love the heck out of perl. It’s never done me wrong (unlike you, PHP).

Anyways! The context of this project is that we are developing a dictionary of words to complement the work done by Jonathan Haidt and Jesse Graham – learn more. I had a student who was interested in Moral Foundations Theory and its relationship to language, and we had tested some of the dictionary and found it frustratingly obtuse. Meaning, a lot of the words in it are great, but not things that people – like college freshmen, or even me – were likely to say. She’s moved on to working with the founder of the LIWC – and even worked on the newest version of it :small brag:.

Now I have a second student who’s helping finish up some work on the dictionary, to see if what we were doing is worthwhile (spoiler alert: I don’t know). However, I thought I might share some of the code we were using and its context, for people who are also trying to get into this kind of text mining/cleaning/editing in R. You can find all the materials for this project, including the code in the context of our messy paper, on GitHub.

Here’s a view of what the data looks like (this isn’t even the messiest part, and part 2 of our study uses full written paragraphs):

> head(noout1$Q27)
[1] "doctors, babysitting"                    
[2] "criminals, doctors, shootings, medicine "
[3] "Health"                                  
[4] "physical healthiness, mental healthiness"
[5] "hurt, effect, love, protect"             
[6] "hurt, depression, pain"

So, a couple of things we have to deal with:

  • Mixed case
  • Punctuation
  • Stemming (affixes)

Now, don’t hate on me folks, but I love a good loop. I could probably do this with the apply family (a sketch of that follows the loop), but I didn’t:

> ##stem the data; library(corpus) was loaded earlier
> for (i in 1:nrow(noout1)) {
+   noout1$Q27[i] = paste(unlist(
+     text_tokens(noout1$Q27[i], stemmer = "en")), collapse = " ")
+ }
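For the record, the apply-family version I mentioned would be roughly this (a sketch under the same assumptions; I still prefer the loop):

##same idea with sapply instead of a loop
noout1$Q27 = sapply(noout1$Q27, function(x)
  paste(unlist(text_tokens(x, stemmer = "en")), collapse = " "))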

Unpacking what the loop does:

  • Loops over each participant’s answers in Q27. I did this because text_tokens returns a list of lists, which I personally find troublesome to deal with, and I wanted to retain each person’s answers in one cell.
  • Uses text_tokens to “tokenize” or de-affix the data. stemmer = "en" is an argument to stem the words in English.
  • Unlists the list returned by text_tokens.
  • Pastes the updated data back into one cell. Be sure to use collapse here and not sep, as we want one item returned, and sep would just stick spaces between items if there were more than one:

##one example
> paste(unlist(
+     text_tokens(noout1$Q27[4], stemmer = "en")), collapse = " ")
[1] "physic healthi , mental healthi" ##one string
> paste(unlist(
+     text_tokens(noout1$Q27[4], stemmer = "en")), sep = " ")
[1] "physic"  "healthi" ","       "mental"  "healthi" ##five strings

Let’s look at the data now:

> head(noout1$Q27)
[1] "doctor , babysit"                 
[2] "crimin , doctor , shoot , medicin"
[3] "health"                           
[4] "physic healthi , mental healthi"  
[5] "hurt , effect , love , protect"   
[6] "hurt , depress , pain"

You can see that the words have been stemmed and are now in lower case. We haven’t removed punctuation yet. There are lots of ways to do that, but since one of the next steps does it for me, I won’t cover them all (one quick option is sketched below).
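If you did want to strip the punctuation yourself, base R’s gsub with the punctuation character class is one of those many ways (a sketch; I didn’t run this, since tm handles it in a moment):

##strip anything in the punctuation class, e.g., commas and periods
noout1$Q27 = gsub("[[:punct:]]", "", noout1$Q27)

The next step requires the tm library, although I bet the corpus library does similar things; I’m just more familiar with tm. We will create a corpus out of the vector of participant answers I have: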

> ##create a corpus
> harm_corpus = Corpus(VectorSource(noout1$Q27))
> harm_TDM = as.matrix(TermDocumentMatrix(harm_corpus,
+                               control = list(removePunctuation = TRUE,
+                                              stopwords = TRUE)))

The Corpus step simply creates a big list of all the “documents” (here, each participant is treated as a separate document, which is what I want) from a Vector, rather than opening separate documents in a file. The TermDocumentMatrix function creates a giant matrix wherein:

  • Terms (words) are rows
  • Documents (participants) are columns
  • Each row, column combination stores the number of times a term appeared in each document.

These can get real big, real fast, fyi. The nice thing about the TermDocumentMatrix function is that it handled the punctuation for me via removePunctuation = TRUE and also dealt with the stop words. Stop words are things like the, an, a, of that are traditionally removed from these types of analyses, which focus on content words over helper words (if you want to customize that list, see the sketch after the matrix below).

> harm_TDM[1:6, 1:6]
         Docs
Terms     1 2 3 4 5 6
  babysit 1 0 0 0 0 0
  doctor  1 1 0 0 0 0
  crimin  0 1 0 0 0 0
  medicin 0 1 0 0 0 0
  shoot   0 1 0 0 0 0
  health  0 0 1 0 0 0
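About that stop word list: the control argument will also take your own character vector, so you can extend tm’s built-in list (a sketch; “english” is tm’s default list, and the extra words are hypothetical):

##same call, but with a custom stop word list
harm_TDM = as.matrix(TermDocumentMatrix(harm_corpus,
                              control = list(removePunctuation = TRUE,
                                             stopwords = c(stopwords("english"), "thing", "stuff"))))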

Great, now what can I do with that? Everything! Here’s what we did: found the most frequent words by creating a data.frame that was a frequency table (thanks StackOverflow!):

> ##view the most frequent words
> harm_freq = data.frame(Word = rownames(harm_TDM),
+                        Freq = rowSums(harm_TDM),
+                        row.names = NULL)
> harm_freq$Word = as.character(harm_freq$Word)
> harm_freq$percent = harm_freq$Freq/nrow(noout1) *100
> head(harm_freq)
     Word Freq    percent
1 babysit    1  0.2298851
2  doctor   52 11.9540230
3  crimin    6  1.3793103
4 medicin    5  1.1494253
5   shoot    1  0.2298851
6  health   16  3.6781609

Doctor is in the top 5; other big words included hurt, love, pain, and hospit(al). In this prompt, participants were free associating with the harm/care foundation. Now the tricky part was to combine this data back with my other data frame that included participant information, including their Moral Foundations Questionnaire scores:

> harm_words = harm_freq$Word[harm_freq$percent >=1]
> head(harm_words)
[1] "doctor"  "crimin"  "medicin" "health"  "mental"  "physic"

First, I created a list of harm words that were mentioned at least 1% of the time. Next, I used the transpose function t() to flip the dataset from words as rows to words as columns, to maintain “tidy-ish” data (i.e., each participant is their own row). Then I subset the dataset down to only my top words:

> harm_TDM = as.data.frame(t(harm_TDM))
> harm_TDM = harm_TDM[ , harm_words]
> harm_TDM[1:6, 1:6]
  doctor crimin medicin health mental physic
1      1      0       0      0      0      0
2      1      1       1      0      0      0
3      0      0       0      1      0      0
4      0      0       0      0      1      1
5      0      0       0      0      0      0
6      0      0       0      0      0      0

Now, we can cbind our harm dataset with the other relevant columns for harm.

> harm_final = cbind(noout1[ , c("ResponseId", "Q15_1", "Q23", "harmMFQ")],
+                    harm_TDM)
> harm_final[1:6, 1:6]
         ResponseId Q15_1         Q23 harmMFQ doctor crimin
1 R_2BkYH8gEtZMEQnG     8    Democrat      18      1      0
2 R_qCTluTnJCgGFqXT     6    Democrat      18      1      1
3 R_11hglRVpaSclG0K     5  Republican      13      0      0
4 R_3kMsBrEjwDtu5iJ     6 Independent      16      0      0
5 R_swkbG8889YEOxoZ     3  Republican      14      0      0
6 R_s682tzsz2YIkwJX    10    Democrat      17      0      0

So, now you too can create participant term-document matrices! In later posts, I’ll show you how we are going to use this information to create an updated dictionary and examine whether that dictionary relates to the Moral Foundations Questionnaire. This task will involve some correlations (a tiny preview below), but also a multi-trait multi-method analysis using lavaan, so stay tuned if you are interested in structural equation modeling.
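As a teaser, with harm_final from above, the simplest version of those correlations is just each word column against the MFQ harm score (a sketch, not the final analysis):

##correlate each top word's counts with the harm MFQ score
cor(harm_final[ , harm_words], harm_final$harmMFQ)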

Just wanted to do a quick post to say that the Nature Human Behaviour response paper, Justify Your Alpha, is now online at NHB’s website: Springer – it is free to view but not download. You can download the PDF version on OSF.

We’ve submitted a couple of new papers as well – I updated those on my research publications page. I also have a couple more to get done – I’m hoping to feature some of the cool coding work I’ve done this week after taking a breather from a seriously packed week. I’ve reached my revise and resubmit limit … five total: 1 accepted, 1 under review again, 3 editing. With two invited papers due in April and a big conference, I might implode!

My coauthor John Scofield and I just had a publication accepted at Behavior Research Methods – you can check out the publication preprint at OSF.

We threw together a website for the paper that summarizes everything we found, as well as puts all the materials together in one place – check it out.

We created a really nice R function to help you detect low quality data, which you can find on GitHub, and I even made a video that explains all the parts of the function on YouTube.

If you aren’t an R person, you can use our Shiny App, download the code, and watch the YouTube video that explains everything to you.

Enjoy!

Heyo! I wanted to write a post about some of the quirky things I’ve found with writing manuscripts in R Markdown, as well as provide a solution to a problem that someone else might be having.

Update: The csl file I describe below is a specially formatted one, which was shared with me. You can download it from GitHub to try the suggestions below.

Update 2: Turns out, potentially, the suggestions from the manual are not working correctly, as Frederik has checked it out and opened an issue on github. I’ll write a new post when there are updates!

First, let me tell you how much I love Frederik Aust’s papaja package for R. I had been trying to integrate open science and transparency in our lab, which was helped by the switch to R to track what we were doing in our data analyses. I heard about papaja through a former student, and I jumped in head first. I know it’s helped us think a LOT about reproducibility and replication, as we want people to be able to track what we did and avoid p-hacking in our papers. Having a workflow that is integrated throughout the manuscript really forces you to think about how you are presenting your data, and knowing that others can view it especially forces you to be clear about what you did. We’ve fully embraced working transparently through Open Science Framework integration, much of our work is on GitHub, and we are writing manuscripts with papaja to make it more obvious what is what.

Before doing that, I had started learning markdown, and although I’ve been using it for a bit now, I still feel like a noob. Mix LaTeX in there, and even more so. Thankfully, I have some very awesome twitter friends who help me when I get stuck trying to do something … like trying to stick a % symbol in a column name for a table. Whew. One thing I wish were a little bit different is citations. Currently, papaja uses pandoc-citeproc to create the in-text referencing when knitting to PDF or Word.

The problem with this is that any time you have the same author last names (like Erin Buchanan and Tom Buchanan), you automatically get E. Buchanan and T. Buchanan in the in-text referencing. That is APA style, but reviewers and the like do not like it. Real APA != used APA. The other issue stems from the fact that you will get the first initials even if the other matching author name is in second or third place. Therefore, if I cite myself and cite Tom, but he only appears as a second author, I will still get E. Buchanan in the in-text citation. That’s probably also a correct interpretation of APA, but it ain’t worth fighting reviewers over. Additionally, the absolute name matching often forces us to fix bibtex files a lot over things like Buchanan, E. versus Buchanan, E.M. versus Buchanan, Erin, etc. The many permutations of one person’s name that come from doi citations can be tedious to fix.
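To make that last point concrete, a hypothetical pair of bibtex entries like these is enough to trigger the initials, because the processor treats the two name strings as different people:

@article{buchanan2015a,
  author = {Buchanan, Erin M.},
  ...
}

@article{buchanan2013a,
  author = {Buchanan, E.},
  ...
}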

Therefore! I checked out the papaja manual – which is stellar – to see if there was some other way to do it. I also googled this, but really got stuck with the translation of latex to markdown. The manual suggests you can do this:

---
output:
  papaja::apa6_pdf:
    citation_package: biblatex
---

to pass the citations through a different processor. Great! I will try that. Here’s what I got:

Latexmk: This is Latexmk, John Collins, 19 Jan. 2017, version: 4.52c.
Latexmk: applying rule 'biber QWERTY'...
Rule 'biber QWERTY': File changes, etc:
   Non-existent destination files:
      'QWERTY.bbl'
------------
Run number 1 of rule 'biber QWERTY'
------------
------------
Running 'biber  "QWERTY"'
------------
INFO - This is Biber 2.7
INFO - Logfile is 'QWERTY.blg'
ERROR - QWERTY.bcf is malformed, last biblatex run probably failed. Deleted QWERTY.bbl
INFO - ERRORS: 1
Latexmk: biber found malformed bcf file for 'QWERTY'.
  I'll ignore error, and delete any bbl file.
Rule 'pdflatex': File changes, etc:
   Non-existent destination files:
      'QWERTY.pdf'
------------
Run number 1 of rule 'pdflatex'
------------
Biber error: [427] Utils.pm:180> ERROR - QWERTY.bcf is malformed, last biblatex run probably failed. Deleted QWERTY.bbl
Latexmk: applying rule 'pdflatex'...
------------
Running 'pdflatex  -halt-on-error -interaction=batchmode -recorder  "QWERTY.tex"'
------------
This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017) (preloaded format=pdflatex)
restricted \write18 enabled.
entering extended mode
Latexmk: Non-existent bbl file 'QWERTY.bbl'
No file QWERTY.bbl.
=== TeX engine is 'pdfTeX'
Biber error: [427] Utils.pm:180> ERROR - QWERTY.bcf is malformed, last biblatex run probably failed. Deleted QWERTY.bbl
Latexmk: Errors, so I did not complete making targets
Collected error summary (may duplicate other messages):
  pdflatex: Command for 'pdflatex' gave return code 1
      Refer to 'QWERTY.log' for details
Latexmk: Use the -f option to force complete processing,
unless error was exceeding maximum runs of latex/pdflatex.
! LaTeX Error: Command \c@author already defined.
               Or name \end... illegal, see p.192 of the manual.
 
Error: Failed to compile QWERTY.tex. See QWERTY.log for more info.
Execution halted

Balls. I searched this error for a while and found: 1) update LaTeX: check; 2) figure out why your bibtex file was messed up: check … I tried with only one reference and it still crashed; and 3) other stuff I don’t remember. When I tried a separate markdown file, thinking the one that I had open was the problem, I got the actual citation codes rather than the text:

Researchers discovered that online data collection can be 
advantageous over laboratory and paper data collection, as it 
is often cheaper and more efficient (Ilieva2001;Schuldt1994;Reips2012)

I thought maybe it was my computer, so one of my coauthors tried it. Same as the first error. Maybe it’s a Mac thing? Another coauthor with a Mac got the second error. I’m sad to say that I don’t have an answer for either of these problems – from the looks of it, I’m following the suggested guidelines, but both problems pop up. I would love to hear if you know why.

Enter Julia! Julia helped find a workaround for the issue. In the head of your markdown file (note I used some … to shorten some of what papaja does for you automatically):

...
bibliography      : ["q_bib.bib"]
...
output            : papaja::apa6_pdf
replace_ampersands: yes
csl               : apa6.csl
---

And then be sure to put the apa6.csl file in the same folder as your markdown. Now you can confuse people with all your Buchanans, Logans, Cohens, and Fritzes. Or, in our case, we can make Reviewer #2 happy and annoy the copy editor.

Note: I had to update papaja to get this solution to work, as the replace_ampersands option did not work the first time.

For a recent publication comparing null hypothesis testing p-values to Bayes Factors and Observation Oriented Modeling, we created a Shiny app to graph all of our complex plots. I’m particularly pleased with the plotly 3D graph, as I usually think that 3D graphs are impossible to read. This plot shows what we found in our study (albeit I would recommend viewing the 2D plots more):

  • Bayes Factors and p-values follow a power function, as we expected.
  • Bayes Factors and OOM values follow an interesting pattern, wherein as sample size increases, BF expands outwards, while PCC values tend to constrict.
  • p-values will always decrease to floor, and PCC values still tend to constrict toward the simulated effect size range.

Another component of this app I wanted to show off was the interactive response controls, wherein the input options (on the left) change based on a user-selected input option. Therefore, options that are normally input-only are both input and output in the traditional Shiny setup.

You can see that by having the selection (first part) and the changing selection (second part) in the fluid page:

selectInput("Nselect", "Select N Scaling:",
                  c("N" = "N",
                    "Log N" = "log")),
                    
htmlOutput("slider_selector")

Which is connected to the server function below:

  ####change the slider####
  output$slider_selector = renderUI({ 
    
    if (input$Nselect == "N") { minN = 10; maxN = 1000; stepN = 10}
    if (input$Nselect == "log") { minN = round(log(10),1) 
                                  maxN = round(log(1000),1)
                                  stepN = .1}
    
    sliderInput("xaxisrange", "X-Axis Range:",
                min = minN, max = maxN,
                value = c(minN,maxN),
                sep = "",
                round = -1,
                step = stepN)
  })

These two pieces feed information back and forth depending on the user input to show either X on a real scale or X on a log scale. Code is included below, and when our server isn’t being cranky, the app is here. The code is pretty long due to the sheer number of graphs, so it’s edited down to just the shiny parts – when you see ####GRAPH#### that’s some kicking ggplot2 graphs you can view in our github repo. Check out the project OSF page here. You can download the entire app from our github repo (also other shiny apps!).

library(shiny)
library(ggplot2)
library(reshape)
library(plotly)

####remove data loading and reshaping####

####user interface####
ui <- fluidPage (
  
  titlePanel("Valentine et al. Interactive Graphics"),
  
  sidebarLayout(
    
    ##sidebarpanel
    sidebarPanel(
      
      br(),
      
      ##put input boxes here
      tags$em("All Graphs:"),
      selectInput("sizeselect", "Select Effect Size:",
                  c("Negligible" = "None",
                    "Small" = "Small",
                    "Medium" = "Medium",
                    "Large" = "Large")),
      
      tags$em("Percent Graphs:"),
      selectInput("Nselect", "Select N Scaling:",
                  c("N" = "N",
                    "Log N" = "log")),
      
      htmlOutput("slider_selector"),
      
      tags$em("Comparison Graphs:"),
      
      selectInput("graphselect", "Select Graph:",
                  c("PCC - p" = "pccp",
                    "PCC - BF" = "pccbf",
                    "BF - p" = "bfp")),
      
      sliderInput("bfrange", "Log BF Range:",
                  min = -5, max = 600,
                  value = c(-5,600),
                  sep = "",
                  step = 10),
      
      sliderInput("prange", "p Range:",
                  min = 0, max = 1,
                  value = c(0,1),
                  step = .01),
      
      sliderInput("pccrange", "PCC Range:",
                  min = 0, max = 1,
                  value = c(0,1),
                  step = .01)
      
    ), #close sidebar panel
    
    mainPanel(
      
      tabsetPanel(
        tabPanel("Significant", plotOutput("sigpic"),
                 br(),
                 helpText("Complete dataset avaliable at: https://osf.io/u9hf4/")),
        tabPanel("Non-Significant", plotOutput("nonpic"),
                 br(),
                 helpText("Complete dataset avaliable at: https://osf.io/u9hf4/")),
        tabPanel("Omnibus Agreement", plotOutput("omniagree"),
                 br(),
                 helpText("Complete dataset avaliable at: https://osf.io/u9hf4/")),
        tabPanel("Posthoc Agreement", plotOutput("postagree"),
                 br(),
                 helpText("Complete dataset avaliable at: https://osf.io/u9hf4/")),
        tabPanel("Criterion Comparison", plotOutput("compare"), 
                 br(),
                 helpText("Complete dataset avaliable at: https://osf.io/u9hf4/",br(), 
                          "BF values have been log transformed to show the entire range of the data.")),
        tabPanel("3D Comparison", plotlyOutput("compare3d"), 
                 br(),
                 helpText("Complete dataset avaliable at: https://osf.io/u9hf4/",br(), 
                          "BF values have been log transformed to show the entire range of the data."))
      )
      
    ) #close main panel 
    
  ) #close sidebar layout

) #close fluid page

####server functions####
server <- function(input, output) {
   
  ####change the slider####
  output$slider_selector = renderUI({ 
    
    if (input$Nselect == "N") { minN = 10; maxN = 1000; stepN = 10}
    if (input$Nselect == "log") { minN = round(log(10),1) 
                                  maxN = round(log(1000),1)
                                  stepN = .1}
    
    sliderInput("xaxisrange", "X-Axis Range:",
                min = minN, max = maxN,
                value = c(minN,maxN),
                sep = "",
                round = -1,
                step = stepN)
  })
  
   ####SIGNIFICANT EFFECTS####
   output$sigpic <- renderPlot({

     graphdata = subset(long_graph, Significance=="Sig" & Effect == input$sizeselect)
     
     ##log N
     if (input$Nselect == "log") { graphdata$N = log(graphdata$N) 
                                    xlabel = "Log N" } else { xlabel = "N"}
     
     ####GRAPH####
   })
   
   ####NONSIGNIFICANT EFFECTS####
   output$nonpic <- renderPlot({
     
     nsgraphdata = subset(long_graph, Significance=="Non" & Effect == input$sizeselect)
     
     ##log N
     if (input$Nselect == "log") { nsgraphdata$N = log(nsgraphdata$N)  
                                   xlabel = "Log N" } else { xlabel = "N"}
     
     ####GRAPH####
   })
   
   ####OMNIBUS AGREEMENT####
   output$omniagree <- renderPlot({
     
     ##log n to get a better graph
     if (input$Nselect == "log") { agreelong$N = log(agreelong$N)
                                   xlabel = "Log N" } else { xlabel = "N"}
     
     ####GRAPH####
   })
   
   ####POST HOC AGREEMENT####
   output$postagree <- renderPlot({
     
     ##log n to get a better graph
     if (input$Nselect == "log") { agreelong$N = log(agreelong$N)
     xlabel = "Log N" } else { xlabel = "N"}
     
     ####GRAPH####
   })
   
   ####COMPARISON GRAPHS####
   output$compare <- renderPlot({
     
     if (input$graphselect == "pccp"){
       
       ####GRAPH####
       
     } else if (input$graphselect == "pccbf"){
       
       ####GRAPH####
       
     } else if (input$graphselect == "bfp"){
       
       ####GRAPH####
       
     }
     
   })
   
   ####3D COMPARISON GRAPHS####
   output$compare3d <- renderPlotly({
     
     ####GRAPH SET UP####
     
     overall = plot_ly(overallgraph3d, 
                       x = ~overallBF,
                       y = ~oompcc,
                       z = ~omniP,
                       color = ~N,
                       symbol = ~star,
                       symbols=c("circle","cross"),
                       mode="markers") %>%
       add_markers() %>%
       layout(scene = list(xaxis = list(title = 'Bayes Factors'),
                           yaxis = list(title = 'OOM PCC'),
                           zaxis = list(title = 'p-Value')),
              annotations = list(
                x = 1.13,
                y = 1.05,
                text = colorlabel,
                xref = 'paper',
                yref = 'paper',
                showarrow = FALSE
              ))
     
     overall
     
   })
   
} #close server functions

# Run the application 
shinyApp(ui = ui, server = server)