R in 3 Months (Fall 2025)

arrange()

Code shown in video
# Load Packages -----------------------------------------------------------

library(tidyverse)

# Import Data -------------------------------------------------------------

penguins <- read_csv("penguins.csv")

# arrange() ---------------------------------------------------------------

# With arrange(), we can reorder rows in a data frame based on the values 
# of one or more variables. 
# R arranges in ascending order by default.

penguins |> 
  arrange(bill_length_mm)

# We can also arrange in descending order using desc().

penguins |>  
  arrange(desc(bill_length_mm))

# We often use arrange() at the end of pipelines to display things in order.

penguins |> 
  group_by(island, year) |> 
  summarize(mean_bill_length = mean(bill_length_mm, na.rm = TRUE)) |> 
  arrange(desc(mean_bill_length))
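The pipeline above sorts by a single column. As a minimal sketch of two related options, the block below uses a small made-up tibble (the island, year, and mean_bill_length values are hypothetical, not read from penguins.csv). Listing several variables in arrange() breaks ties in the first one using the next. Because arrange() ignores grouping by default, .by_group = TRUE is how you sort within each group first.

```r
library(dplyr)

# Hypothetical summary table (made-up numbers, for illustration only)
bill_summary <- tibble(
  island = c("Dream", "Biscoe", "Dream", "Biscoe"),
  year = c(2007, 2007, 2008, 2008),
  mean_bill_length = c(44.2, 45.1, 44.2, 43.9)
)

# Two sort keys: descending by mean_bill_length, ties broken by ascending year
by_length_then_year <- bill_summary |>
  arrange(desc(mean_bill_length), year)
by_length_then_year

# Grouped data: sort by island (the grouping variable) first,
# then by descending mean_bill_length within each island
within_island <- bill_summary |>
  group_by(island) |>
  arrange(desc(mean_bill_length), .by_group = TRUE)
within_island
```

Without .by_group = TRUE, the grouped version would be sorted by mean_bill_length across all islands, just like the ungrouped one.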

Your Turn

# Load Packages -----------------------------------------------------------

# Load the tidyverse package

library(tidyverse)

# Import Data -------------------------------------------------------------

# Download data from https://rfor.us/penguins
# Copy the data into the RStudio project
# Create a new R script file and add code to import your data

penguins <- read_csv("penguins.csv")

# arrange() ---------------------------------------------------------------

# Use arrange() to display the penguins data frame in order by body mass

# YOUR CODE HERE

# Now display the penguins data in descending order by body mass

# YOUR CODE HERE

# Create a pipeline that does the following:
# 1. Filters to only keep penguins on Biscoe Island
# 2. Drops any rows with NA values for the body_mass_g or sex variables
# 3. Calculates the average body mass by sex
# 4. Displays the result in descending order by average body mass

# YOUR CODE HERE

Learn More

To learn more about the arrange() function, check out Chapter 3 of R for Data Science.

Have any questions? Put them below and we will help you out!

Brian Slattery • September 20, 2023

I tried using arrange() on variables with string data (island, sex, etc.), and it looks like it's sorting in alphabetical order. Is that an accepted usage, or is there a different function that's normally used for sorting rows with strings?

Also, is there a corresponding function to desc() that makes explicit that it's sorting in ascending order? I couldn't find one by googling. I'm imagining from a readability standpoint it might be nice to make that clear if there are ascending and descending arranges all mixed together. Or, is that just something that I would write a comment to make clear if needed?

Gracielle Higino Coach • September 20, 2023

Hi Brian! Great questions!

Yes, using arrange() to sort data alphabetically is very common and recommended. [= This should also help you find typos and hidden characters! sort() in base R works the same way, and its decreasing argument (FALSE by default) lets you state explicitly whether you are sorting in ascending or descending order. If you really want to make it explicit how you are arranging your data, one trick is to always use desc() and put a minus sign in front of it when you are sorting in ascending order:

penguins |> 
  arrange(-desc(island))

I hope this helps! Ping me on Discord if you want to chat more! =D
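As far as I know, dplyr has no asc() counterpart to desc(); a bare column name in arrange() already means ascending order. The sketch below uses a made-up set of island names (hypothetical data) to check that the -desc() trick gives the same order as a plain arrange(), and spells out base R's sort() with its decreasing argument.

```r
library(dplyr)

# Made-up island values (hypothetical data, not read from penguins.csv)
islands <- tibble(island = c("Torgersen", "Biscoe", "Dream", "Biscoe"))

# Plain column name: ascending (A to Z) is the default
plain <- islands |> arrange(island)

# The -desc() trick: negating a descending sort key gives an ascending one
negated <- islands |> arrange(-desc(island))

# Both should list Biscoe, Biscoe, Dream, Torgersen
identical(plain$island, negated$island)

# Base R's sort() lets you spell out the direction with `decreasing`
sort(c("Torgersen", "Biscoe", "Dream"), decreasing = FALSE)
sort(c("Torgersen", "Biscoe", "Dream"), decreasing = TRUE)
```

One caveat, as I understand dplyr 1.1.0 and later: arrange() sorts text in the "C" locale by default, whereas desc() ranks text with xtfrm() using your system locale, so for mixed-case strings the trick could order rows differently from a plain arrange(). With consistently capitalized names like these, the two agree.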

Jessica France • September 20, 2023

Hi. I answered the last question using this code:

penguins |> 
  filter(island == "Biscoe", !is.na(body_mass_g) | is.na(sex)) |> 
  group_by(sex) |> 
  summarise(mean_weight = mean(body_mass_g)) |> 
  arrange(desc(mean_weight))

And I got this output:

# A tibble: 3 × 2
  sex    mean_weight

I would like to verify whether I did anything wrong. I don't know if I am supposed to see the 'NA' output as well.

Jessica France • September 20, 2023

I don't know whether the output I copied here is displaying. After submitting the comment, I do not see it. Kindly let me know if it can be seen on your end. Thanks.

Gracielle Higino Coach • September 21, 2023

Hi Jessica! Don't worry, we can see the formatting on the back end!

I understand your line of thought, but what you are coding translates to something like "take penguins and keep only the rows where the column island is equal to 'Biscoe' AND (the column body_mass_g is not NA OR the column sex is NA)". Because the two tests are joined with | (OR), and the second one is is.na(sex) rather than !is.na(sex), any row where sex is NA passes that condition. So in the end you get NAs in the sex column because you told R it could include them.

Instead, use the drop_na() function (for example, drop_na(body_mass_g, sex)) after you have filtered by island (for clarity), and then proceed with the grouping and summary.

Feel free to follow up on Discord if it's not clear! [=
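To see the difference described above, the sketch below runs both filters on a tiny made-up tibble (the rows are hypothetical, not the real penguins data). With !is.na(body_mass_g) | is.na(sex), the Biscoe row whose sex is missing passes the filter. Separate !is.na() conditions, which filter() combines with AND, or drop_na() after filtering by island, remove it.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows: complete male and female, one with missing sex,
# one with missing body mass, and one row from another island
birds <- tibble(
  island = c("Biscoe", "Biscoe", "Biscoe", "Biscoe", "Dream"),
  sex = c("male", "female", NA, "female", "male"),
  body_mass_g = c(5000, 4300, 4100, NA, 3800)
)

# Original condition: keeps rows where body mass is present OR sex is missing
with_or <- birds |>
  filter(island == "Biscoe", !is.na(body_mass_g) | is.na(sex))
with_or$sex    # includes NA

# Intended condition: commas combine with AND, so both values must be present
with_and <- birds |>
  filter(island == "Biscoe", !is.na(body_mass_g), !is.na(sex))
with_and$sex   # only "male" and "female"

# Same rows using drop_na() after filtering by island
with_drop_na <- birds |>
  filter(island == "Biscoe") |>
  drop_na(body_mass_g, sex)

# Finish the pipeline on the cleaned rows
result <- with_drop_na |>
  group_by(sex) |>
  summarize(mean_weight = mean(body_mass_g)) |>
  arrange(desc(mean_weight))
result
```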

Maria Dougherty • April 26, 2024

Hi Grace! How do I join the Discord? Thank you!

Libby Heeren Coach • April 26, 2024

Hello, Maria! The Discord server you see mentioned here is for members of the R in 3 Months program! Sorry for any confusion!

gene trevino • May 22, 2025

When I run my code and the code in the solutions,

penguins |> 
  filter(island == "Biscoe") |> 
  drop_na(body_mass_g, sex) |> 
  group_by(sex) |> 
  summarize(avg_body_mass = mean(body_mass_g)) |> 
  arrange(desc(avg_body_mass))

I get the following:

Warning message:
There were 3 warnings in summarize().
The first warning was:
ℹ In argument: avg_body_mass = mean(body_mass_g).
ℹ In group 1: sex = "NA".
Caused by warning in mean.default():
! argument is not numeric or logical: returning NA
ℹ Run dplyr::last_dplyr_warnings() to see the 2 remaining warnings.

Gracielle Higino Coach • May 23, 2025

Hi Gene! I can't reproduce the warning with the code you provided, but it seems like R is misunderstanding some of your variables. Maybe you have objects with the same names in your environment. Do you get a 2 × 2 tibble like this?

# A tibble: 2 × 2
  sex    avg_body_mass
  <chr>          <dbl>
1 male           5105.
2 female         4319.

If so, you can ignore the warning. If not, try restarting your R session without saving the history or the .RData workspace.

Let us know if the message persists!
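One possible cause, offered as an assumption rather than a confirmed diagnosis: the warning says mean() received an argument that is "not numeric or logical", and the group label is the quoted string "NA". Both are consistent with body_mass_g and sex having been imported as text, with missing values stored as the literal string "NA" instead of real NA values. drop_na() only removes real NAs, so such rows would survive it. The sketch below uses a made-up tibble to show how to check column types and convert the text "NA" with dplyr's na_if().

```r
library(dplyr)
library(tidyr)

# Hypothetical import problem: both columns came in as text,
# and missing values are the string "NA" rather than real NA
imported <- tibble(
  sex = c("male", "female", "NA"),
  body_mass_g = c("5000", "4300", "NA")
)

# Check column types: body_mass_g is "character", so mean() cannot average it
sapply(imported, class)

# drop_na() keeps the "NA" row, because the string "NA" is not a missing value
nrow(drop_na(imported, body_mass_g, sex))

# Turn the text "NA" into real missing values and make body_mass_g numeric
cleaned <- imported |>
  mutate(
    sex = na_if(sex, "NA"),
    body_mass_g = as.numeric(na_if(body_mass_g, "NA"))
  )

# Now drop_na() removes the incomplete row and mean() gets numbers
fixed <- cleaned |>
  drop_na(body_mass_g, sex) |>
  group_by(sex) |>
  summarize(avg_body_mass = mean(body_mass_g)) |>
  arrange(desc(avg_body_mass))
fixed
```

If the file was read with readr's read_csv(), its na argument (default c("", "NA")) controls which strings become missing values, so it is worth checking how the data were imported before converting columns by hand.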

Kaela Scott • September 30, 2025

If I wanted to arrange and remove NAs when I arranged, can I add that in as an argument or do I need to do that first?

penguins |> arrange(body_mass_g, rm.na = TRUE)

OR

penguins |> drop_na(body_mass_g) |> arrange(body_mass_g)

Gracielle Higino Coach • October 2, 2025

Hi Kaela! That's a very interesting question!

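The na.rm argument (note the spelling: na.rm, not rm.na) isn't an argument of arrange(). It belongs to summary functions such as mean(), where it tells R to ignore NAs when doing the calculation; it doesn't remove them from the dataset. arrange() itself only reorders rows and keeps the NAs at the end. That's why you end up with the same number of rows as the original data if you run arrange(body_mass_g, na.rm = TRUE).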

On the other hand, drop_na() removes the rows with NAs from the dataset, and that's why you end up with fewer rows when running drop_na(body_mass_g) |> arrange(body_mass_g).

I hope that helps! Let me know what other questions you have!
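A small sketch of this difference, using made-up body masses (hypothetical values): arrange() keeps every row and places missing values last in both ascending and descending order, drop_na() removes them, and na.rm = TRUE is an argument of summary functions such as mean(), where it skips NAs in the calculation.

```r
library(dplyr)
library(tidyr)

# Hypothetical body masses with one missing value
masses <- tibble(body_mass_g = c(4200, NA, 3700, 5100))

# arrange() only reorders: all 4 rows remain, and NA goes last
ascending <- masses |> arrange(body_mass_g)
descending <- masses |> arrange(desc(body_mass_g))
ascending$body_mass_g    # 3700 4200 5100 NA
descending$body_mass_g   # 5100 4200 3700 NA

# drop_na() removes the incomplete row before sorting: 3 rows remain
complete <- masses |> drop_na(body_mass_g) |> arrange(body_mass_g)
nrow(complete)           # 3

# na.rm = TRUE belongs to summary functions like mean()
mean(masses$body_mass_g)                 # NA
mean(masses$body_mass_g, na.rm = TRUE)   # 4333.333
```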

Jessica Purser • October 17, 2025

Hi! It would be helpful if you also put the results of the functions in the solution. I'm left wondering if my code, which ran, gave me the correct answers. For the last question, I used the code:

penguins |> 
  filter(island == "Biscoe") |> 
  drop_na(body_mass_g, sex) |> 
  summarize(mass_sex = mean(body_mass_g), .by = c(sex)) |> 
  arrange(desc(mass_sex))

In the console, my answers were male 5105 and female 4319, but I'm still like, ah, I don't know if I did it right.

Gracielle Higino Coach • October 19, 2025

Hi Jessica! I totally get it, and it does take time sometimes for us to trust R! Overall, if the process is exactly the same, it should yield the same results. This means that if the code in the solutions is the same as the code you are using, both will produce the same numbers. The code is what's important here: getting the same numbers is not a guarantee that you got the process right.

But I do understand the importance of "testing the machine". One thing I did when I first started learning R, 15 years ago, was to calculate a series of stats in Excel (and some by hand) and then compare the results with what I got in R. Excel often gives slightly different results because of how it deals with decimals and rounding, but overall the results were the same. [=
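In the same spirit of "testing the machine", one way to check a pipeline is to compare it with a second route to the same answer. The sketch below, on a made-up tibble (hypothetical values), computes the mean body mass by sex once with group_by() and once with summarize(.by = ...) as in Jessica's code (the .by argument needs dplyr 1.1.0 or later), checks that the two agree with all.equal(), and recomputes one mean by hand.

```r
library(dplyr)

# Hypothetical complete rows (already filtered to one island, no NAs)
biscoe <- tibble(
  sex = c("male", "female", "male", "female"),
  body_mass_g = c(5200, 4400, 5000, 4200)
)

# Route 1: group_by() + summarize()
with_group_by <- biscoe |>
  group_by(sex) |>
  summarize(mass_sex = mean(body_mass_g)) |>
  arrange(desc(mass_sex))

# Route 2: summarize() with the .by argument
with_by <- biscoe |>
  summarize(mass_sex = mean(body_mass_g), .by = sex) |>
  arrange(desc(mass_sex))

# TRUE if both routes give the same values in the same order
all.equal(as.data.frame(with_group_by), as.data.frame(with_by))

# "By hand": the male mean is the sum of male masses divided by their count
male_masses <- biscoe$body_mass_g[biscoe$sex == "male"]
sum(male_masses) / length(male_masses)   # 5100
```

summarize(.by = ...) returns groups in order of first appearance, while group_by() sorts them, so the final arrange() is what makes the two row orders match here.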
