Associate Professor, Departamento de Ciencias Sociales, PONTIFICIA UNIVERSIDAD CATOLICA DEL PERU (jmagallanes@pucp.edu.pe).
Visiting Associate Professor, Evans School of Public Policy and Governance / Senior Data Science Fellow, eScience Institute, UNIVERSITY OF WASHINGTON (magajm@uw.edu).
We will focus on producing a simple paper (the simplest ever). These are the steps to follow:
Create a GitHub account. Go to GitHub and sign up to create an account. Your username should include your name or surname. You should also download and install the GitHub Desktop app.
Create a GitHub repository. Sign in to GitHub and create a repository (or repo) there. Complete the information shown in the figure below:
CREATING A REPO
CLONING A REPO
After confirming the operation, the desktop client will ask you where you want to save the local copy of the cloud repo, as shown here:
LOCAL FOLDER FOR A REPO
Synchronizing. The GitHub Desktop client detects any changes made to your repo on GitHub. Go to the client and check that the changes from Colab are synced by pressing FETCH ORIGIN.
Get the link to the data file. The data can now be accessed from anywhere (as long as you have an internet connection). Go to your repo in the cloud and click on the file name. This takes you to the file contents. Depending on the file type, you may or may not be able to see the values. This is an RDS file, so you will NOT see the contents. Now get the link to the data by right-clicking on the Download or Raw option (whichever is available) and copying the link address.
Create the following document in R. Go to RStudio and create an R script. The code will be:
# collecting
fileLink="linkToGithub repo"
MyFile=url(fileLink)
dataidx=readRDS(MyFile)
# Describing a categorical variable:
tableONI=table(dataidx$ONIpolitical)
tableONI
# Using a plot for the categorical:
barplot(tableONI)
# Describing the numerical variables
summary(dataidx[,c(3,4)])
# Using a plot for the numerical:
boxplot(dataidx[,c(3,4)])
## Describing bivariate relationships
# * Numerical and categorical:
boxplot(dataidx$FHF~dataidx$Region)
# Boxplots were introduced by Tukey (Tukey, John W. (1977). Exploratory Data Analysis. Addison-Wesley.)
# * Numerical and Numerical
plot(dataidx$FHF~dataidx$RWB)
# The scatter plot is thought to be invented by John Frederick W. Herschel according to this link: https://qz.com/1235712/the-origins-of-the-scatter-plot-data-visualizations-greatest-invention/
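If you want to try the script before your link is ready, the following is a minimal sketch that runs the same steps on simulated data. Everything about the data frame `toy` is an assumption made for illustration: the values are random, the `Country` column and the category labels are hypothetical, and placing `FHF` and `RWB` in columns 3 and 4 is only a guess at the layout of the real file. The column names `ONIpolitical`, `FHF`, `RWB` and `Region` are the ones used in the script above.

```r
# Simulated stand-in for the real data: every value below is made up.
set.seed(123)
n <- 30
toy <- data.frame(
  Country      = paste0("Country", seq_len(n)),              # hypothetical column
  Region       = factor(sample(c("Africa", "Americas", "Asia", "Europe"),
                               n, replace = TRUE)),          # hypothetical labels
  FHF          = round(runif(n, 0, 100), 1),                 # assumed 0-100 scale
  RWB          = round(runif(n, 0, 100), 1),                 # assumed 0-100 scale
  ONIpolitical = factor(sample(c("low", "medium", "high"), n, replace = TRUE),
                        levels = c("low", "medium", "high")) # hypothetical labels
)

# Round trip through a local RDS file, standing in for the GitHub link
tmpFile <- tempfile(fileext = ".rds")
saveRDS(toy, tmpFile)
dataidx <- readRDS(tmpFile)

# Same steps as the script, selecting columns by name instead of position
tableONI <- table(dataidx$ONIpolitical)
print(tableONI)
print(summary(dataidx[, c("FHF", "RWB")]))

barplot(tableONI)                       # categorical variable
boxplot(dataidx[, c("FHF", "RWB")])     # numerical variables
boxplot(FHF ~ Region, data = dataidx)   # numerical by categorical
plot(FHF ~ RWB, data = dataidx)         # numerical by numerical
```

Selecting columns by name rather than by position keeps the script working if the column order changes. When you switch back to the real link and `readRDS()` fails with an "unknown input format" error, the R documentation for `readRDS` suggests wrapping the connection, as in `readRDS(gzcon(url(fileLink)))`.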
Transform the R script. I have a series of templates. Let’s use each one. To do that, go to our repo, fork it to your GitHub account, and then clone the forked repo onto your machine.