Import Data Again

This lesson is called Import Data Again, part of the Getting Started With R course.

Code shown in video

library(tidyverse)

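coffee_ratings <- read_csv(
    "coffee_ratings.csv",
    # Strings to treat as missing; note that this list applies to every column
    na = c("No rating", "Unknown", "8", "1", "3", "4", "23", "47"),
    # Override the guessed types: total_cup_points as double, harvest_year as integer
    col_types = cols(total_cup_points = "d", harvest_year = "i")
)

# Print the tibble to check the result
coffee_ratings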
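After an import like this, it can help to check how read_csv() actually interpreted each column. The sketch below is minimal and uses made-up inline data instead of coffee_ratings.csv. It assumes readr 2.0 or later, which reads literal text wrapped in I(). spec() shows the column types that were applied, and problems() lists values that could not be converted to the requested type; readr stores those values as NA.

```r
library(readr)

# Hypothetical inline data; "2015/2016" cannot be parsed as an integer
csv_text <- "total_cup_points,harvest_year\n84.5,2014\n82,2015/2016\n"

demo <- read_csv(
  I(csv_text),
  col_types = cols(total_cup_points = "d", harvest_year = "i")
)

# Show the column specification that was actually used
spec(demo)

# List the values that failed to parse (they are stored as NA)
problems(demo)
```

readr also warns at read time when parsing issues occur. problems() tells you which row and column caused each one.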

Your Turn

Adjust your read_csv() code so that you import the data again
Use the na argument to tell read_csv() what should be treated as missing in the altitude_mean_meters column
Use the col_types argument to make sure that altitude_mean_meters gets imported as numeric data
Examine your data again to confirm that the changes were made successfully
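Here is one possible shape for the exercise. It is a minimal sketch on made-up inline data, not the real coffee_ratings.csv, and it assumes readr 2.0 or later for I(). The placeholders "-999" and "Unknown" are assumptions for illustration only. Inspect the real altitude_mean_meters column first, for example by sorting it or plotting a histogram, to see which values actually stand for missing data. Remember that na applies to every column, not just the one you are targeting.

```r
library(readr)

# Hypothetical stand-in for coffee_ratings.csv; "-999" and "Unknown"
# are assumed placeholders, not values taken from the real file
csv_text <- "species,altitude_mean_meters
Arabica,1500
Arabica,-999
Robusta,Unknown
Arabica,1850.5
"

coffee_demo <- read_csv(
  I(csv_text),
  # Every string listed here becomes NA, in every column
  na = c("", "NA", "Unknown", "-999"),
  # Parse the altitude column as double (numeric); other columns are guessed
  col_types = cols(altitude_mean_meters = "d")
)

# Confirm the type and the number of missing values
str(coffee_demo$altitude_mean_meters)
summary(coffee_demo$altitude_mean_meters)
```

summary() reports the number of NA values. This makes it a quick way to confirm that the placeholders were removed and that the minimum and maximum now look plausible.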

Below is the code I created, which you can use as a starting point.

library(tidyverse)

coffee_ratings <- read_csv(
    "coffee_ratings.csv",
    na = c("No rating", "Unknown", "8", "1", "3", "4", "23", "47"),
    col_types = cols(total_cup_points = "d", harvest_year = "i")
)

coffee_ratings

Have any questions? Put them below and we will help you out!


Caitlin McLemore • March 3, 2026

What is the difference between the data types of number, double, and integer?

Gracielle Higino Coach • March 6, 2026

Hi Caitlin! Both double and integer are types of numbers. Integers (you may also see the notations Int, Int16, Int32 or Int64) are whole numbers without decimals, such as 123. They can represent variables such as years, age and, to be very niche here, sample plots in space. They are often treated as discrete, and sometimes even categorical, variables. Doubles are numbers that can have decimal digits (like 123.5). They are often used for continuous variables and can represent variables such as height, length, and weight.
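A small sketch may help, assuming that "number" refers to readr's column types (the values you can pass to col_types). In base R, plain numeric literals are doubles and the L suffix makes an integer. In readr, "d" parses doubles and "i" parses integers. "n" (col_number(), or parse_number() for a single vector) is a more forgiving parser: it drops non-numeric characters such as "$", "%" and grouping commas, and always returns a double.

```r
library(readr)

# In base R, plain numbers are doubles; the L suffix makes an integer
typeof(123)     # "double"
typeof(123L)    # "integer"

# readr's "d" and "i" column types correspond to these parsers
parse_double(c("123", "123.5"))
parse_integer(c("123", "2020"))

# "n" is forgiving: it strips symbols such as "$", "%" and the
# grouping comma, and the result is always a double
parse_number(c("$1,234.50", "85%"))
typeof(parse_number("2020"))  # "double"
```

parse_integer() would not accept "123.5". It would return NA and record a parsing problem, which is why decimal data should use "d" or "n".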

Eda Akpek • March 13, 2026

I am trying to do the practice exercise for altitude_mean_meters. The code shows that it's executing, but I don't see any changes in the histogram.

Gracielle Higino Coach • March 14, 2026

Hi Eda! Would you like to share the exact code you used? The change in the histogram is really subtle, but if you define which values should be read as NAs, you should see differences when you open the little "toggle" for more info. You're right that changing the column type doesn't change much in this case, but I suggest that you play a bit with the arguments to see what happens in each case. What happens if you don't designate the NA values for that column? What happens if you do? What if you change the column type?

Eda Akpek • March 16, 2026

I figured it out! I set the altitude_mean_meters as an integer, and then, because it changed -999.00 to -999, I was able to use -999 in my code to set as "NA."

Gracielle Higino Coach • March 16, 2026

Awesome! 🎉
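One detail is worth knowing when experimenting with na. readr compares each na string against the text exactly as it is written in the file, before converting the column to a number. This minimal sketch uses hypothetical inline data and a double ("d") column, and assumes readr 2.0 or later for I(). It shows that a value stored as "-999.00" is only turned into NA when that exact spelling is listed.

```r
library(readr)

# Hypothetical inline data: the placeholder is written as "-999.00" in the file
csv_text <- "altitude_mean_meters\n-999.00\n1500\n"

# "-999" does not match the raw text "-999.00", so the value stays numeric
not_matched <- read_csv(I(csv_text), na = "-999",
                        col_types = cols(altitude_mean_meters = "d"))

# Listing the exact spelling used in the file turns it into NA
matched <- read_csv(I(csv_text), na = "-999.00",
                    col_types = cols(altitude_mean_meters = "d"))

not_matched$altitude_mean_meters
matched$altitude_mean_meters
```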

Al-Afroza Sultana • March 26, 2026

Hi! Since this is a small dataset, we could check the data manually by sorting it in ascending or descending order to spot any abnormal values. But how do we find abnormal values in a much larger dataset using R? Thanks.

Gracielle Higino Coach • March 26, 2026

There are a few methods to do that! Here's an informative tutorial to inspect outliers with basic min and max functions and with plots: https://statsandr.com/blog/outliers-detection-in-r/

Plotting is one of the most effective ways to detect outliers and discrepancies - very useful in my area, where we deal with millions of biodiversity data points!

Another tool worth a look is OpenRefine (https://openrefine.org/), which can be integrated with your R scripts and is great for detecting typos.
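Here is a minimal sketch of the min/max approach with made-up altitude values. summary() and range() expose impossible minimums and maximums at a glance. The 1.5 * IQR rule flags values far outside the middle 50% of the data; it is one common convention, not the only one. Neither step requires scrolling through rows, so both scale to large datasets.

```r
# Hypothetical altitude values (in meters) with two suspicious entries
altitude <- c(1200, 1350, 1500, 1600, 1750, 1850, -999, 190000)

# Quick look: min and max expose impossible values immediately
summary(altitude)
range(altitude)

# 1.5 * IQR rule: flag values far below Q1 or far above Q3
q <- quantile(altitude, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]
is_outlier <- altitude < q[[1]] - 1.5 * iqr | altitude > q[[2]] + 1.5 * iqr
altitude[is_outlier]
```

A flagged value is not automatically wrong. It is a candidate to check, for example against the source data or with a plot.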

Lalitha Vaishnavi Subramanyan • May 1, 2026

Putting values like 1, 4, and 23 from harvest_year into na worked here. But in a dataset where those can be legitimate values in another column, this approach would silently turn valid data into NAs. Is there another approach that avoids this pitfall?

Gracielle Higino Coach • May 7, 2026

Great question! Yes: in this case you could skip the na argument when reading the data and deal with missing values column by column afterwards, for example with mutate() combined with case_when() or na_if(). See some examples here: https://dplyr.tidyverse.org/reference/na_if.html#ref-examples
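Here is a minimal sketch of that idea with made-up inline data, assuming readr 2.0 or later (for I()), dplyr, and R 4.1 or later (for the |> pipe). The column number_of_bags and the -999 placeholder are hypothetical. The file is read without any extra na codes. Each placeholder is then replaced only in the column where it is invalid, so the legitimate 1 in number_of_bags survives.

```r
library(readr)
library(dplyr)

# Hypothetical inline data: 1 is a bad code in harvest_year but a
# legitimate value in number_of_bags; -999 is an assumed placeholder
csv_text <- "harvest_year,number_of_bags,altitude_mean_meters
2014,1,1500
1,300,-999
2016,4,1850
"

# Read with only the default na codes ("" and "NA")
coffee_demo <- read_csv(
  I(csv_text),
  col_types = cols(harvest_year = "i", number_of_bags = "i",
                   altitude_mean_meters = "d")
)

coffee_clean <- coffee_demo |>
  mutate(
    # na_if(): replace one specific value, only in this column
    harvest_year = na_if(harvest_year, 1L),
    # case_when(): replace every value that meets a condition
    altitude_mean_meters = case_when(
      altitude_mean_meters < 0 ~ NA_real_,
      TRUE ~ altitude_mean_meters
    )
  )

coffee_clean
```

na_if() suits a single known placeholder. case_when() is handier when the rule is a condition, such as "any negative altitude".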