How to Read in a Data File in RStudio

[This article was first published on R on Stats and R, and kindly contributed to R-bloggers].



  • Introduction
  • Transform an Excel file to a CSV file
  • R working directory
    • Get working directory
    • Set working directory
      • User-friendly method
      • Via the console
      • Via the text editor
  • Import your dataset
    • User-friendly way
    • Via the text editor
  • Import SPSS (.sav) files

Introduction

As we have seen in this article on how to install R and RStudio, R is useful for many kinds of computational tasks and statistical analyses. However, it would not be so powerful and useful without the possibility to import datasets into R. As you will most likely use R with your own data, being able to import it into R is crucial for any user.

In this article I present two different ways to import an Excel file: (i) via the text editor and (ii) in a more "user-friendly" way. I also discuss the main advantages and disadvantages of both methods. Note that:

  • How to import a dataset often depends on the format of the file (Excel, CSV, text, SPSS, Stata, etc.). I focus here only on Excel files as it is the most common type of file for a dataset
  • There are several other ways to import an Excel file (probably even some I am not aware of), but I present the two simplest yet most robust ways to import such files
  • No matter what type of file and how you import it, there is one gold standard regarding how datasets are structured: columns correspond to variables, rows correspond to observations (in the broad sense of the term) and each value must have its own cell:

Structure of a dataset. Source: R for Data Science by Hadley Wickham & Garrett Grolemund

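To make this structure concrete, here is a minimal sketch of a tiny dataset in that shape, built directly in R with data.frame(). The player identifiers and values are made up for illustration:

```r
# Made-up example data: each column is a variable, each row is an observation,
# and each value sits in its own cell
dat <- data.frame(
  player  = c("P1", "P2", "P3"),  # hypothetical player identifiers
  age     = c(25, 31, 28),
  ranking = c(12, 4, 37)
)

dat       # 3 rows (observations) by 3 columns (variables)
dim(dat)  # number of rows, then number of columns
```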

Transform an Excel file to a CSV file

Before dealing with the importation, the first thing to do is to change the format of your Excel file to CSV. CSV format is the standard when working with datasets and programming languages, as it is a more robust format than Excel. If your file is already in CSV format (with the extension .csv), you can skip this section. If the file is not in CSV format (for example, the extension is .xlsx), you can easily transform it to CSV by following these steps:

  1. Open your Excel file
  2. Click on File > Save as
  3. Choose the format .csv
  4. Click on Save
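If you would rather check the extension from R, the base function tools::file_ext() returns it without the dot. Here is a minimal sketch in which data.csv (the file name used later in this article) and data.xlsx are placeholders:

```r
# The tools package ships with R, so nothing needs to be installed
tools::file_ext("data.csv")   # "csv": ready to be imported
tools::file_ext("data.xlsx")  # "xlsx": still needs to be saved as CSV first
```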

Check that your file ends with the extension .csv. If that is the case, your file is now ready to be imported. But first, let me introduce an important concept when importing datasets into RStudio: the working directory.

R working directory

Although programming languages may be very powerful, they often need our help, and importing a dataset is no exception. Indeed, before importing your data, you must tell RStudio where your file is located (and thus let RStudio know in which folder to look for your dataset). But before this, let me introduce the working directory. The working directory is the location (on your computer) where RStudio is currently working. In fact, RStudio does not work across your entire computer; it works within one folder of your computer. For the working directory, we will need two functions:

  1. getwd() (wd stands for working directory)
  2. setwd()
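The snippet below is a sketch of how the two functions behave. It prints the working directory, temporarily moves it to R's temporary folder, and then restores it. tempdir() is used only because that folder exists on every machine. The sketch relies on setwd() invisibly returning the previous working directory:

```r
# Print the folder R is currently working in
getwd()

# setwd() changes the working directory and invisibly returns the previous one,
# so the old location can be stored and restored afterwards
old_wd <- setwd(tempdir())  # tempdir() is only an example folder here
getwd()                     # now R's temporary folder
setwd(old_wd)               # back to the original working directory
```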

Get working directory

In most cases, when you open RStudio, the working directory (i.e., where it is currently working) is different from where your dataset is located. To find out which working directory RStudio is currently using, run getwd(). On macOS, this function will most likely return a location such as "/Users/yourname/", while on Windows it will most likely return "c:/Documents/". Do not worry if your working directory is different. What matters is setting the working directory correctly (i.e., to where your file is located), not where it is now.

Set working directory

As mentioned earlier, your dataset is most likely located in a different folder than your working directory. Without any action from you, RStudio will never be able to import your file, because it is not looking in the right folder. You will see the following error in the console: cannot open file 'data.csv': No such file or directory. To specify the correct location of your file (that is, to tell RStudio in which folder it should look for your dataset), you have three options:

  1. the user-friendly method
  2. via the console
  3. via the text editor (see below why it is my preferred option)
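Before picking one of these options, you can ask R directly whether it can see your file. Here is a minimal sketch using the base functions file.exists() and list.files(), with data.csv as the example file name:

```r
# Does R see the file from the current working directory?
file.exists("data.csv")          # FALSE means read.csv("data.csv") would fail

# Which files does R see in the working directory?
list.files()

# Only the CSV files (the pattern is a regular expression)
list.files(pattern = "\\.csv$")
```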

User-friendly method

To set the correct folder, i.e., to set the working directory to the folder where your file is located, follow these steps:

  1. In the lower right pane of RStudio, click on the tab "Files"
  2. Click on "Home" next to the house icon
  3. Go to the folder where your dataset is located
  4. Click on "More"
  5. Click on "Set As Working Directory"

Set working directory in RStudio (user-friendly method)


Alternatively, you can also set the working directory by clicking on Session > Set Working Directory > Choose Directory…

Set working directory in RStudio (user-friendly method)


As you can see in the console, either of the two methods actually executes the code setwd() with the path to the folder you specified. So by clicking on the buttons, you are in fact asking RStudio to write a line of code for you. This method has two advantages: you do not need to remember the code, and you will not make a mistake in the path to your folder. The disadvantage is that if you leave RStudio and open it again later, you will have to specify the working directory again, because RStudio did not save the actions you made via the buttons.

Via the console

You can specify the working directory by running setwd("path/to/folder") directly in the console, with path/to/folder being the path to the folder containing your dataset (note the quotes around the path). However, you will need to run the command again each time you reopen RStudio.
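On Windows, watch out for backslashes. Inside an R string, a single backslash starts an escape sequence, so a path such as "C:\Users\yourname" is rejected with an error. Here is a sketch with a hypothetical Windows path, showing two forms that R accepts:

```r
# Hypothetical path written in two equivalent ways:
# forward slashes, or doubled backslashes
p1 <- "C:/Users/yourname/Documents/project"
p2 <- "C:\\Users\\yourname\\Documents\\project"

# Replacing each backslash with a forward slash gives the same path
identical(gsub("\\", "/", p2, fixed = TRUE), p1)  # TRUE

# setwd(p1)  # uncomment and replace with the path to your own folder
```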

Via the text editor

This method is actually a combination of the two above:

  1. Set the working directory by following the exact same steps as for the user-friendly method (via the buttons)
  2. Copy the code executed in the console and paste it in the text editor (i.e., your script)

I recommend this method for several reasons. First, you do not need to remember the setwd() function. Second, you will not make typos in the path to your folder (a path that can sometimes be quite long if you have folders within folders). Third, when you save your script (which I assume you do, otherwise you would lose all your work), you also save the actions you just made via the buttons. So when you reopen your script in the future, no matter what the current directory is, executing your script (which now includes the line of code that sets the working directory) also sets the working directory you selected for this project.

Import your dataset

Now that you have transformed your Excel file into a CSV file and specified the folder containing your data by setting the working directory, you are ready to actually import your dataset. Remember that there are two methods to import a file:

  1. in a user-friendly way
  2. via the text editor (see also below why it is my preferred option)

Whichever method you choose, it is good practice to first open your file in TextEdit (on Mac) or Notepad (on Windows) to see the raw data. If you open the file in Excel, you will see the data already formatted and will therefore miss some important information needed for the import. Below is an example of raw data:

Example of raw data


There are a few things we need to look for in order to properly import our dataset:

  • Are the variable names present?
  • How are the values separated? Comma, semicolon, whitespace, tab?
  • Is the decimal separator a point or a comma?
  • How are missing values specified? Empty cells, NA, null, 0, other?
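You can also look at the raw text of the file without leaving R. readLines() prints each line exactly as it is stored, without any formatting. In this minimal sketch, a small made-up file is first written to R's temporary folder so that the code runs anywhere:

```r
# Made-up example file written to a temporary folder;
# with your own data, use path <- "data.csv" and skip writeLines()
path <- file.path(tempdir(), "data.csv")
writeLines(c("name,age,height", "Alice,25,1.68", "Bob,,1.80"), path)

# Print the first 3 lines of the file as raw text
readLines(path, n = 3)
```

In this output, the first line holds the variable names, values are separated by commas, the decimal separator is a point, and Bob's missing age is an empty cell.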

User-friendly way

As shown below, just click on the file > Import Dataset…

Import dataset in RStudio


A window which looks like this will open:

Import window in RStudio


From this window, you can preview your data and, more importantly, check whether your data seem to have been imported correctly. If they have, you can click on "Import". If not, you can change the import options at the bottom of the window (below the data preview) based on what you learned when looking at the raw data. Below are the import options you will most likely use:

  • Name: set the name of your dataset (default is the name of the file). Avoid special characters and long names, as you will have to type the name of your dataset several times. I personally rename my datasets with a generic name such as "dat"; others use "df" (for data frame), "data", or even "my_data". You could use more explicit names such as "tennis_data" if, for example, you are working with data on tennis matches. However, specific names have a drawback: if you want to reuse the code you wrote for the tennis data on other datasets, you will need to replace every occurrence of "tennis_data" with the name of your new dataset
  • Skip: specify the number of top rows you want to skip (default is 0). Most of the time, 0 is fine. However, if your file contains some blank rows at the top (or information you want to disregard), set the number of rows to skip
  • First Row as Names: specify whether the variable names are present or not (default is that variable names are present)
  • Delimiter: the character which separates the values. From our raw data above, you can see that the delimiter is a comma (","). Change it to semicolon if your values are separated by ";"
  • NA: how missing values are specified (default is empty cells). From our raw data above, you can see that missing values are simply empty cells, so leave NA at its default or change it to "empty". Change this option if missing values in your raw data are coded as "NA" or "0". Tip: do not code missing values as "0" yourself, otherwise you will not be able to distinguish true zero values from missing values
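The Delimiter option has a direct counterpart in code. Files that use semicolons between values and commas as the decimal separator are common when Excel runs with some European regional settings. Such files can be read with read.csv2(), whose defaults are sep = ";" and dec = ",". Here is a sketch with made-up values:

```r
# Made-up raw data with ";" as delimiter and "," as decimal separator
raw <- "name;height\nAlice;1,68\nBob;1,80"

# read.csv2() defaults to sep = ";" and dec = ","
dat <- read.csv2(text = raw)
dat$height  # 1.68 1.80, read as numbers

# The same result with read.csv(), setting both options explicitly
dat2 <- read.csv(text = raw, sep = ";", dec = ",")
identical(dat, dat2)  # TRUE
```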

After adjusting the import options to your data, click on "Import". You should now see your dataset in a new window, and from there you can start analyzing your data.

This user-friendly method has the advantage that you do not need to remember the code (see the next section for the entire code). However, its main drawback is that your import options will not be saved for future use, so you will need to import your dataset manually each time you open RStudio.

Via the text editor

As with setting the working directory, I also recommend using the text editor instead of the user-friendly method, for the simple reason that the text editor lets you save your import options and the user-friendly method does not. Saving your import options in your script (thanks to a line of code) allows you to quickly import your dataset in exactly the same way, without repeating all the necessary steps every time. The command to import a CSV file is read.csv() (or read.csv2(), which is equivalent but has different default import options). Here is an example with the same file as in the user-friendly method:

dat <- read.csv(
  file = "data.csv",
  header = TRUE,
  sep = ",",
  dec = "."
)

  • dat <-: name of the dataset in RStudio. This means that after the import, I will need to refer to the dataset by calling dat
  • file =: name of the file in the working directory. Do not forget the "" around the name and the extension .csv at the end. The name must also match the file exactly. It is space sensitive within the "" ("data .csv" will throw an error), and it can be case sensitive (on Linux, for instance, "Data.csv" will give an error). In our case the file is named "data.csv", so file = "data.csv"
  • header =: are variable names present? The default is TRUE; change it to FALSE if this is not the case in your dataset (TRUE and FALSE are always in capital letters; true will not work!)
  • sep =: separator. Equivalent to the delimiter in the user-friendly method. Do not forget the "". In our dataset the values are separated by a comma, so sep = ","
  • dec =: decimal separator. Do not forget the "". In our dataset, the decimal separator for numeric values is a point, so dec = "."
  • I do not specify how missing values are coded because empty cells are handled by default: read.csv() reads empty cells in numeric columns, as well as the string "NA", as missing values (NA)
  • Last but not least, do not forget that the arguments are separated by commas
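To try the same call without your own file, here is a minimal sketch that first writes a small made-up CSV to R's temporary folder. With your own data, set the working directory as described above and keep file = "data.csv":

```r
# Made-up example file so that the sketch runs anywhere
path <- file.path(tempdir(), "data.csv")
writeLines(c("name,age,height", "Alice,25,1.68", "Bob,,1.80"), path)

dat <- read.csv(
  file = path,
  header = TRUE,  # the first line holds the variable names
  sep = ",",      # values are separated by commas
  dec = "."       # the decimal separator is a point
)

dat
str(dat)  # age and height are numeric; Bob's empty age is read as NA
```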

Other arguments exist; run ?read.csv to see all of them.
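One of these arguments, na.strings, is the code counterpart of the NA option in the user-friendly method. It lists the strings that should be read as missing values (by default "NA"). Here is a sketch with made-up data in which -99 is a hypothetical code for missing values:

```r
# Made-up raw data: one empty cell, one "NA" and one -99 used as a missing-value code
raw <- "id,score\n1,12\n2,\n3,NA\n4,-99"

dat <- read.csv(text = raw, na.strings = c("NA", "-99"))
dat$score              # 12 NA NA NA
sum(is.na(dat$score))  # 3
```

The empty cell is read as NA even though it is not listed in na.strings, because blank fields in numeric columns are treated as missing.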

After the import, you can check whether your data have been correctly imported by running View(dat), where dat is the name you chose for your data. A window, similar to the one in the user-friendly method, will display your data. Alternatively, you can run head(dat) to see the first 6 rows and check that they correspond to your Excel file. If something is not right, edit the import options and check again. If your dataset has been correctly imported, you can now start analyzing your data. See other articles on R if you want to learn how.
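Here is a minimal sketch of these checks. It uses mtcars, a dataset shipped with R, as a stand-in for your imported data. str() is an addition not mentioned above; it is a compact way to confirm that each variable was read with the expected type:

```r
# mtcars ships with R and stands in for your imported dataset here
dat <- mtcars

# View(dat)  # opens the spreadsheet-like viewer (interactive sessions only)
head(dat)    # first 6 rows
str(dat)     # number of rows and columns, and the type of each variable
```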

The advantage of importing your dataset directly via code in the text editor is that your import options are saved for future use, so you do not have to import it manually every time you open your script. You will, however, need to remember the function read.csv() (but not its arguments, since you can always check them in the help documentation).

Import SPSS (.sav) files

Only Excel files are covered in detail here. However, SPSS files (.sav) can also be read in R with the following command:

library(foreign)
dat <- read.spss(
  file = "filename.sav",
  use.value.labels = TRUE,
  to.data.frame = TRUE
)

The read.spss() function outputs a data frame that retrieves all the characteristics of the .sav file, including the names given to the different levels of the categorical variables and the characteristics of the variables. If you need more information about this command, see the help documentation (library(foreign) then ?read.spss).

Thanks for reading. I hope this article helped you to import an Excel file in RStudio. If your dataset is correctly imported, learn how to manipulate it. As always, if you find a mistake/bug or if you have any questions, do not hesitate to let me know in the comment section below, raise an issue on GitHub or contact me. Get updates every time a new article is published by subscribing to this blog.


Source: https://www.r-bloggers.com/2019/12/how-to-import-an-excel-file-in-rstudio/
