
Data

Data can be downloaded from the Data tab in the admin portal. This page shows a table with the following columns, including buttons to download each type of data:

  • Condition Name is the unique name of the condition that you specified on the Condition Settings page.
  • Responses shows the number of responses for that condition so far.
  • Download CSV downloads a subset of the data in .csv format.
  • Download JSON downloads more comprehensive data in .json format.
  • Download Media allows you to download a zip file of user-submitted media (e.g., uploaded profile photos, attachments to posts created by participants).

CSV data

The .csv data contains a subset of what we believe is the most important participant data, in an easy-to-parse, easy-to-analyze format that most researchers will be familiar with. The columns are as follows:

  • uniqueResponseID is a randomly-generated ID for every unique response recorded by the tool.
  • responseCode is the six-digit completion code generated by the info page, which you can use to compensate participants on MTurk, Prolific, etc. (if you turned that option on).
  • conditionId is the unique ID for the condition that participants completed.
  • accessCode is the Access Code (or, likely, unique URL extension) that participants used to access the website.
  • participantId is the participant ID that participants entered.
  • consent has a value of TRUE if participants consented to participate on an information page, and FALSE if participants did not.
  • startedAt is the time in UTC that participants started the study.
  • finishedAt is the time in UTC that participants reached the end of the study.
  • conditionName is the name of the condition that participants completed.
    • Note: Different conditions always have unique conditionId values, but if you give two conditions exactly the same name (not advised), they will share the same conditionName value.
  • language is the language selected for the condition that participants completed.
  • Multiple-choice and open-text questions have columns that appear in your data only if your condition includes them. Multiple-choice questions that allow multiple responses create a separate column for each response option. Each column name consists of the following three pieces of information, in order, separated by !~*! strings:
    • The type of question: MCQ for multiple-choice questions or OPENTEXT for open-text questions
    • The unique question ID
    • The response: the response option text for multiple-choice questions, or what participants wrote for open-text questions
  • FACEBOOK or TWITTER columns appear in your data if your condition includes them. Each column name consists of the following four pieces of information, in order, separated by !~*! strings:
    • The type of platform being simulated (i.e., FACEBOOK or TWITTER)
    • The unique social media page ID
    • The name you gave to that social media page when you created it
    • The order in which the stimuli were presented (e.g., DESC for descending)
  • LIKE, LOVE, HAHA, WOW, SAD, ANGRY, and LINKCLICK columns list the IDs of all posts that participants had the respective interaction with, separated by |$| strings.
  • LINK, VIDEO, PHOTO, TEXT, SHARE, RETWEET, QUOTETWEET, and REPLYTO columns contain the contents of the posts of each respective type that participants made. Participants currently cannot make LINK posts, so this column will remain empty. Posts are separated by |$| strings, and each post contains the following information in order, separated by !~*! strings:
    • The unique post ID
    • The text that they added to that post
    • The unique media ID that is generated for any media that they uploaded
    • The post ID of this post's parent. Shared and quote-tweeted posts have the parent post's ID here (for a shared post, this is the post that was shared). If there is no parent post, the value is -9999.
  • REGISTRATION contains what participants input into the fields on the registration page. Fields are separated by |$| strings. Each field contains the following information in order, separated by !~*! strings:
    • The unique field ID, which is generated per-field.
    • The display name you specified for that field
    • The custom field reference name (e.g., the type of data, such as HANDLE for Twitter or PROFILE PHOTO)
    • The value that participants entered into the field. For images, this value is -9999, and the associated image in the media download is named after the unique field ID.
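Several of the columns above pack multiple values into a single cell or column name. The following base-R sketch shows one way to unpack them. It assumes the separators are exactly the literal strings |$| (between entries) and !~*! (between fields) and that fields appear in the order listed above; the function names split_entries, split_fields, and parse_cell, the field labels, and the sample values are hypothetical and only for illustration.

```r
# Hypothetical helpers for unpacking the delimited CSV values described above.
# Assumes "|$|" separates entries and "!~*!" separates fields within an entry.

# Split one cell into its entries; missing or empty cells give character(0).
split_entries <- function(cell) {
  cell <- as.character(cell)
  if (is.na(cell) || cell == "") return(character(0))
  strsplit(cell, "|$|", fixed = TRUE)[[1]]
}

# Split one entry (or one column name) into its fields.
split_fields <- function(x) {
  strsplit(x, "!~*!", fixed = TRUE)[[1]]
}

# Turn one cell into a data frame with one row per entry.
# field_names gives the documented field order for that column type.
parse_cell <- function(cell, field_names) {
  rows <- lapply(split_entries(cell), function(entry) {
    fields <- split_fields(entry)
    # strsplit drops a trailing empty field, so pad (or trim) to the expected length
    length(fields) <- length(field_names)
    fields
  })
  if (length(rows) == 0) {
    out <- as.data.frame(matrix(character(0), nrow = 0, ncol = length(field_names)),
                         stringsAsFactors = FALSE)
  } else {
    out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  }
  names(out) <- field_names
  out
}

# Example with a made-up REGISTRATION value
reg <- "f1!~*!Username!~*!HANDLE!~*!jdoe|$|f2!~*!Photo!~*!PROFILE PHOTO!~*!-9999"
parse_cell(reg, c("fieldId", "displayName", "fieldType", "value"))

# Example with a made-up multiple-choice column name
split_fields("MCQ!~*!q1!~*!Yes")
```

Applied to a whole column, something like lapply(dataset$REGISTRATION, parse_cell, field_names = c("fieldId", "displayName", "fieldType", "value")) would give one small data frame per participant. If you read the CSV with base R's read.csv, pass check.names = FALSE so that column names containing !~*! are not rewritten.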
Participant-generated Post IDs

If your data contains extra-long post IDs that you did not assign to any post stimuli, these belong to posts that participants made themselves. Every time a participant posts, the post is automatically assigned a long, randomly generated ID. For example, a post ID like 18b97229-d945-46f1-aeec-947ba9f1b165 in the LIKE column means that the participant made a new post and then liked it.
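If you want to separate participant-generated IDs from your stimulus IDs programmatically, the base-R sketch below is one option. It assumes that participant post IDs follow the same 8-4-4-4-12 hexadecimal pattern as the example above and that none of the IDs you assigned to stimuli do; the names uuid_pattern, is_participant_post, and stimulus_ids are hypothetical.

```r
# Hypothetical helpers. Assumption: participant-generated post IDs look like
# 18b97229-d945-46f1-aeec-947ba9f1b165 (8-4-4-4-12 hex digits), and the
# stimulus IDs you assigned do not.
uuid_pattern <- "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"

# TRUE for IDs that match the participant-generated pattern.
is_participant_post <- function(post_ids) {
  grepl(uuid_pattern, post_ids)
}

# Split a |$|-separated interaction cell (e.g., from the LIKE column) and keep
# only the IDs that do NOT look participant-generated, i.e., stimulus posts.
stimulus_ids <- function(cell) {
  cell <- as.character(cell)
  if (is.na(cell) || cell == "") return(character(0))
  ids <- strsplit(cell, "|$|", fixed = TRUE)[[1]]
  ids[!is_participant_post(ids)]
}

stimulus_ids("101|$|18b97229-d945-46f1-aeec-947ba9f1b165|$|103")
```

Check the pattern against a few IDs from your own data before relying on it, since the exact ID format is assumed here rather than documented.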

Analyses

Once you have your study data, one of the most basic types of analyses you will want to do is to count the number of times a given subset of the post stimuli was interacted with. Below is some sample R code for doing this.

# Install the required package (only needed once; tidyverse includes dplyr, readr, and stringr)

install.packages("tidyverse")

# Load required packages

library(tidyverse)

# Read your downloaded CSV file (replace "your_data.csv" with your own file name)

dataset <- read_csv("your_data.csv")

# Create a vector (i.e., a list) of the post IDs whose interactions you want to count. The example below is a variable called "misinfo" that lists the post IDs of all misinformation posts (101, 102, 103, and 104).

misinfo <- c("101", "102", "103", "104")

# Add a new variable to the dataset that counts how many unique posts from the vector were interacted with for a given interaction type. In the example below, a new variable called "misinfoCount" counts the unique post IDs from "misinfo" that were liked, using the LIKE column (replace or repeat this with any other interaction column, such as LOVE or LINKCLICK). Each cell is split on the literal |$| separator so that only whole post IDs match (e.g., "101" does not match "1011").

dataset <- dataset %>%
  mutate(misinfoCount = sapply(str_split(as.character(LIKE), fixed("|$|")),
                               function(ids) length(intersect(ids, misinfo))))
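To repeat this count for several interaction types at once, a base-R sketch like the following could be used. It assumes the interaction columns (LIKE, LOVE, HAHA, WOW, SAD, ANGRY, LINKCLICK) hold only |$|-separated post IDs, as described in the CSV section; it does not apply to post-content columns such as SHARE, whose entries contain several !~*!-separated fields. The function names count_targets and add_target_counts, the new column names, and the toy data are hypothetical.

```r
# Hypothetical helpers: count distinct target posts per participant for
# several interaction columns at once, using base R only.

# For each cell of a |$|-separated interaction column, count how many distinct
# post IDs from `targets` it contains. Whole IDs are compared, so "101" does
# not match "1011"; missing or empty cells count as 0.
count_targets <- function(cells, targets) {
  cells <- as.character(cells)
  vapply(cells, function(cell) {
    if (is.na(cell) || cell == "") return(0L)
    ids <- strsplit(cell, "|$|", fixed = TRUE)[[1]]
    length(intersect(ids, targets))
  }, integer(1), USE.NAMES = FALSE)
}

# Add one column per interaction type, named <prefix>_<column>
# (e.g., misinfo_LIKE). Columns absent from the data are skipped.
add_target_counts <- function(dataset, targets, prefix, columns) {
  for (col in columns) {
    if (col %in% names(dataset)) {
      dataset[[paste0(prefix, "_", col)]] <- count_targets(dataset[[col]], targets)
    }
  }
  dataset
}

misinfo <- c("101", "102", "103", "104")
interactions <- c("LIKE", "LOVE", "HAHA", "WOW", "SAD", "ANGRY", "LINKCLICK")

# Toy data standing in for your CSV
example <- data.frame(LIKE = c("101|$|102|$|1011", NA),
                      WOW = c("104", "101|$|101"),
                      stringsAsFactors = FALSE)
add_target_counts(example, misinfo, "misinfo", interactions)
```

Skipping missing columns matters because interaction columns only appear when your condition produces them.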

JSON data

The .json data contains all of the raw data collected from participants, in a format that isn't as easily parsed.

Under construction

This section is under construction. Please check back later. In the future, this section will contain detailed explanations of all the data types collected and template code for parsing and analyzing it.

Media data

The media data contains all of the photos and videos participants may have uploaded (e.g., profile photos submitted during registration, attachments to posts) in a .zip file. Profile photos will be named according to the unique field IDs that they are associated with, and photos and videos attached to posts that participants make will be named according to the unique media IDs that are generated for them.
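To connect the media files back to your CSV rows, one approach is to match file names against the unique field IDs (from REGISTRATION) or media IDs (from the post columns) in your data. The base-R sketch below assumes each file in the unzipped download is named with the ID, possibly followed by a file extension such as .jpg or .mp4; check this against your own download. The function name match_media, the folder name media, the zip file name, and the sample IDs are hypothetical.

```r
# Hypothetical sketch: find the media file for each unique field ID or media ID.
# Assumes file names are "<id>" or "<id>.<extension>".
match_media <- function(file_paths, ids) {
  # Drop folders and extensions, e.g. "media/abc.jpg" -> "abc"
  stems <- tools::file_path_sans_ext(basename(file_paths))
  data.frame(
    id = ids,
    file = file_paths[match(ids, stems)],  # NA when no file matches the ID
    stringsAsFactors = FALSE
  )
}

# Typical use (file and folder names are placeholders):
# unzip("media.zip", exdir = "media")
# files <- list.files("media", full.names = TRUE, recursive = TRUE)
# match_media(files, ids)

files <- c("media/f2.jpg", "media/18b97229-d945-46f1-aeec-947ba9f1b165.mp4")
match_media(files, c("f2", "18b97229-d945-46f1-aeec-947ba9f1b165", "f9"))
```

IDs without a matching file come back as NA, which makes missing uploads easy to spot.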