
Reading Data from Files

In practical data analysis, data are often stored in a data file. R can read many types of data files, including free-format text files, comma-separated value (CSV) files, Excel files, SPSS files, SAS files, and Stata files.

Read Data from a Free Format Text File

The most common way to get data into R is to save the data in free format in a text file and then use the read.table() function to read the file. For example, let's read the data in a file called gpa.txt, which is available on the website. The content of the data file is shown below.

## GPA data
## 999 represents missing data
 id gender college   gpa weight
  1      f     yes   3.6    110
  2      m     yes   3.5    170
  3      m      no  99.0    165
  4      m      no 999.0    190
  5      f      no 999.0     95
  6      m     yes   3.7    200
  7      m     yes   3.6    150
  8      f     yes   3.8    100
  9      f     yes   3.0    130
 10      f      no 999.0    120

Note that the first two lines of the data file start with "#" and are notes or comments about the data. The third line contains the variable names. After that, there are 10 lines of data.

The function read.table() can load data from a local computer or from a remote location on the Internet. Since the data file here is online, we first show how to read the data directly from its URL using the code below.

> gpadata <- read.table(file='https://advstats.psychstat.org/data/gpa.txt', header=TRUE, na.strings="999", comment.char = "#")
> gpadata
   id gender college  gpa weight
1   1      f     yes  3.6    110
2   2      m     yes  3.5    170
3   3      m      no 99.0    165
4   4      m      no   NA    190
5   5      f      no   NA     95
6   6      m     yes  3.7    200
7   7      m     yes  3.6    150
8   8      f     yes  3.8    100
9   9      f     yes  3.0    130
10 10      f      no   NA    120
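
The same call also works for a file stored on the local computer. The following is a minimal sketch rather than part of the original example: it assumes a local copy of the data and, so that it runs without an Internet connection, first writes the first four rows to a temporary file (the names local_gpa and gpalocal are hypothetical). The missing value is written exactly as 999, on the assumption that na.strings is compared with each field as text.

```r
# Write a few lines of the GPA data to a temporary file that stands in
# for a local copy of gpa.txt (for illustration only).
local_gpa <- tempfile(fileext = ".txt")
writeLines(c(
  "## GPA data",
  "## 999 represents missing data",
  " id gender college   gpa weight",
  "  1      f     yes   3.6    110",
  "  2      m     yes   3.5    170",
  "  3      m      no  99.0    165",
  "  4      m      no   999    190"
), local_gpa)

# header = TRUE        : the first non-comment line holds the variable names
# na.strings = "999"   : fields equal to 999 are read as missing (NA)
# comment.char = "#"   : anything after a "#" on a line is ignored
gpalocal <- read.table(file = local_gpa, header = TRUE,
                       na.strings = "999", comment.char = "#")
gpalocal
```

Replacing local_gpa with the path to a real file, such as a downloaded copy of gpa.txt, should read that file in the same way.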

Access Data

Data that are read into R are generally saved as a data frame. Some useful operations on a data frame are shown below.

> gpadata <- read.table('https://advstats.psychstat.org/data/gpa.txt', header=TRUE, na.strings="999")
> head(gpadata)
  id gender college  gpa weight
1  1      f     yes  3.6    110
2  2      m     yes  3.5    170
3  3      m      no 99.0    165
4  4      m      no   NA    190
5  5      f      no   NA     95
6  6      m     yes  3.7    200
> tail(gpadata)
   id gender college gpa weight
5   5      f      no  NA     95
6   6      m     yes 3.7    200
7   7      m     yes 3.6    150
8   8      f     yes 3.8    100
9   9      f     yes 3.0    130
10 10      f      no  NA    120
> names(gpadata)
[1] "id"      "gender"  "college" "gpa"     "weight"
> dim(gpadata)
[1] 10  5
> gpadata$weight
 [1] 110 170 165 190  95 200 150 100 130 120
> gpadata[, 2]
 [1] f m m m f m m f f f
Levels: f m
> gpadata[, 'gender']
 [1] f m m m f m m f f f
Levels: f m
> attach(gpadata)
> gender
 [1] f m m m f m m f f f
Levels: f m
> detach(gpadata)
> gender ## this would produce an error
Error: object 'gender' not found
Execution halted
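
Two points about the session above may be easier to see in a short sketch. First, since R 4.0.0, read.table() keeps character columns as character vectors by default, so the "Levels: f m" line appears only when stringsAsFactors=TRUE is requested. Second, with() can evaluate an expression inside a data frame without attach() and detach(). The example below is a minimal sketch, not the original author's code: it assumes a small inline copy of the first four rows, passed through the text argument with the missing value written as 999, and the names gpatext and gpasmall are hypothetical.

```r
# Inline copy of the first four rows of the GPA data (illustration only).
gpatext <- "id gender college gpa weight
1 f yes 3.6 110
2 m yes 3.5 170
3 m no 99.0 165
4 m no 999 190"

# stringsAsFactors = TRUE converts character columns to factors, which
# reproduces the 'Levels: f m' display on R 4.0.0 and later.
gpasmall <- read.table(text = gpatext, header = TRUE,
                       na.strings = "999", stringsAsFactors = TRUE)

levels(gpasmall$gender)               # levels of the gender factor
gpasmall[gpasmall$gender == "m", ]    # rows for male students only

# with() looks up gpa inside the data frame, so attach()/detach() is not
# needed; na.rm = TRUE drops the missing value before averaging.
with(gpasmall, mean(gpa, na.rm = TRUE))
```

Because with() does not add the data frame to the search path, the variable gender is never left visible (or missing) outside the call, which avoids the error shown at the end of the session above.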