Chapter 2 Data Import
2.1 Entering data
You can’t use the R language to analyze data until you put your data in R.
Importing data into R is fairly simple. For Stata and Systat, use the foreign package. For SPSS and SAS, I recommend the Hmisc package for its ease of use and functionality. See the Quick-R section on packages for information on obtaining and installing these packages. Examples of importing data are provided below.
2.2 From Text
The read.csv function accepts parameters that may need to be set for the file to be read correctly (see ?read.csv for more info). Parameters are entered after the file name and separated by commas. read.csv belongs to a family of related functions, each suited to a different text format:
read.csv for comma-separated values with period as decimal separator.
read.csv2 for semicolon-separated values with comma as decimal separator.
read.delim for tab-delimited files with period as decimal separator.
read.delim2 for tab-delimited files with comma as decimal separator.
read.fwf for fixed-width data, with a predetermined width for each column.
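As a rough illustration of how these readers differ, the sketch below writes three tiny files to a temporary location and reads each one back. The file contents, the object names (a, b, d) and the column name height_cm are made up for the example.

```r
# Write small sample files to temporary paths (hypothetical data).
csv_path  <- tempfile(fileext = ".csv")
csv2_path <- tempfile(fileext = ".csv")
fwf_path  <- tempfile(fileext = ".txt")

writeLines(c("age,height", "31,1.82", "45,1.67"), csv_path)   # comma + period
writeLines(c("age;height", "31;1,82", "45;1,67"), csv2_path)  # semicolon + comma
writeLines(c("31182", "45167"), fwf_path)                     # fixed width, no separator

# Comma separator, period as decimal mark.
a <- read.csv(csv_path)

# Semicolon separator, comma as decimal mark.
b <- read.csv2(csv2_path)

# Fixed width: 2 characters for age, 3 characters for height in cm.
d <- read.fwf(fwf_path, widths = c(2, 3), col.names = c("age", "height_cm"))

# a and b should describe the same data despite the different file formats.
all.equal(a, b)
d
```

Choosing the wrong variant usually does not raise an error. It tends to produce a single wide column or numbers read as text, so it is worth inspecting the result with str() after every import.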
setwd("~/Desktop/") # set working directory
setwd("C:/Users/mateu/Desktop/data") # alternatively, a full Windows path
person <- read.csv(file = "data.csv", header = FALSE, col.names = c("age", "height"), sep = ";")
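The call above depends on a data.csv file in the working directory. A self-contained version, assuming a headerless, semicolon-separated file with two made-up rows, might look like this:

```r
# Create a small semicolon-separated file without a header row (made-up values).
path <- tempfile(fileext = ".csv")
writeLines(c("31;182", "45;167"), path)

# header = FALSE: the first line is data, not column names.
# col.names supplies the names; sep overrides the default comma.
person <- read.csv(file = path, header = FALSE,
                   col.names = c("age", "height"), sep = ";")

# Check column names and types after importing.
str(person)
```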
(To practice importing a csv file, try this exercise.)
2.3 From Excel
One of the best ways to read an Excel file is to export it to a comma-delimited file and import it using the method above. Alternatively, you can use the xlsx package to access Excel files. The first row should contain variable/column names.
Read in the first worksheet from the workbook myexcel.xlsx. First row contains variable names.
library(xlsx)
mydata <- read.xlsx("c:/myexcel.xlsx", 1)
Read in the worksheet named mysheet:
mydata <- read.xlsx("c:/myexcel.xlsx", sheetName = "mysheet")
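To try both forms without an existing workbook, the sketch below writes a small data frame to a temporary .xlsx file and reads it back by index and by name. It assumes the xlsx package (which depends on Java through rJava) is installed, and skips the example otherwise. The data frame and the has_xlsx flag are made up for illustration.

```r
# Only run the example if the xlsx package can be loaded.
has_xlsx <- requireNamespace("xlsx", quietly = TRUE)

if (has_xlsx) {
  path <- tempfile(fileext = ".xlsx")
  df <- data.frame(age = c(31, 45), height = c(182, 167))

  # Write one worksheet called "mysheet", without a row-name column.
  xlsx::write.xlsx(df, path, sheetName = "mysheet", row.names = FALSE)

  # Read the same worksheet by position and by name.
  by_index <- xlsx::read.xlsx(path, sheetIndex = 1)
  by_name  <- xlsx::read.xlsx(path, sheetName = "mysheet")

  print(all.equal(by_index, by_name))
} else {
  message("Package 'xlsx' is not available; skipping the Excel example.")
}
```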
(To practice, try this exercise on importing an Excel worksheet into R.)
2.4 From SPSS
In SPSS, save the dataset in transport format:
get file='c:\mydata.sav'.
export outfile='c:\mydata.por'.
In R:
library(Hmisc)
mydata <- spss.get("c:/mydata.por", use.value.labels=TRUE)
The last option, use.value.labels=TRUE, converts SPSS value labels to R factors.
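To see what converting value labels to factors means in practice, the following sketch builds the equivalent factor by hand in base R. The coding (1 = male, 2 = female) and the variable names are hypothetical and do not come from any particular SPSS file.

```r
# Hypothetical coded variable as it might be stored in an SPSS file:
# 1 = "male", 2 = "female".
sex_codes <- c(1, 2, 2, 1)

# A factor keeps the labels in place of the underlying numeric codes.
sex <- factor(sex_codes, levels = c(1, 2), labels = c("male", "female"))

levels(sex)
table(sex)
```

With labels converted this way, summaries and plots show the category names, not the raw codes.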
(To practice importing SPSS data with the foreign package, try this exercise.)
2.5 From SAS
Save the SAS dataset in transport format:
libname out xport 'c:/mydata.xpt';
data out.mydata;
set sasuser.mydata;
run;
In R:
library(Hmisc)
mydata <- sasxport.get("c:/mydata.xpt")
2.6 From Stata
Input Stata file:
library(foreign)
mydata <- read.dta("c:/mydata.dta")
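If no Stata file is at hand, a round trip through write.dta from the same package gives something to read. This is only a sketch with made-up data. Note that read.dta handles Stata binary formats from versions 5 to 12, so files saved by newer Stata releases may need a different reader, such as the haven package.

```r
library(foreign)

# Made-up data frame written out in Stata format.
df <- data.frame(age = c(31, 45), height = c(1.82, 1.67))
path <- tempfile(fileext = ".dta")
write.dta(df, path)

# Read it back in the same way as a real Stata file.
mydata <- read.dta(path)
str(mydata)
```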
(To practice importing Stata data with the foreign package, try this exercise.)
2.8 Data from R packages
Good clean data packages, ideal for practicing:
gapminder
babynames
TidyTuesday (not a package, but a repository of datasets): https://github.com/rfordatascience/tidytuesday
Untidy data, for practicing tidying:
- https://github.com/jennybc/lotr
- Most raw data from Gapminder: https://www.gapminder.org/data/
- https://www.jvcasillas.com/untidydata/
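Data that ships inside a package needs no import step at all. A minimal sketch, assuming the gapminder package has already been installed from CRAN (the has_gapminder flag and the gm name are made up for the example):

```r
# Only run the example if the gapminder package is installed.
has_gapminder <- requireNamespace("gapminder", quietly = TRUE)

if (has_gapminder) {
  # The package exports a data frame that is also called gapminder.
  gm <- gapminder::gapminder
  str(gm)
  head(gm)
} else {
  message("Package 'gapminder' is not installed; try install.packages(\"gapminder\").")
}
```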