
Your Turn

  1. Perform a chi-square to examine how grade_class relates to live_on_campus. What is the p-value? Is there a relationship?

  2. If there is a significant difference, examine standardized residuals and the observed/expected frequencies to determine what grade class is more or less likely to live on campus. Interpret the results.
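The mechanics of both steps can be sketched on made-up data. This is only an illustration: the `students` data frame and its values below are hypothetical stand-ins (the course data set is not reproduced here), so the numbers it produces say nothing about the real answer.

```r
library(dplyr)
library(janitor)

# Hypothetical stand-in data; only the column names mirror the exercise
set.seed(1)
students <- data.frame(
  grade_class = sample(c("Freshman", "Sophomore", "Junior", "Senior"),
                       200, replace = TRUE),
  live_on_campus = sample(c("yes", "no"), 200, replace = TRUE)
)

# Cross-tabulate the two variables, then run the chi-square test on the tabyl
chi_result <- students %>%
  tabyl(grade_class, live_on_campus) %>%
  janitor::chisq.test()

chi_result$p.value   # p-value of the test
chi_result$stdres    # standardized residuals
chi_result$observed  # observed frequencies
chi_result$expected  # expected frequencies under independence
```

Cells where the observed count differs most from the expected count are the ones to read closely when interpreting the residuals.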

Learn More

Read more about the chisq.test() function in the janitor package.

Have any questions? Put them below and we will help you out!


Zach Tilton


March 10, 2022

Hi Dana, thanks for the great videos here. I have been using what I learned in this video a lot lately. However, I recently hit a roadblock when running a chi-square test on two variables in a data set I am working with. Both variables are factors, but one has a level or value I don't want to include in the analysis because it doesn't make sense and would bias my test. To explain, for this test I am looking at the relationship between evaluation report type and generic evaluation use, where report type has the levels "written", "oral", "both", or "none" (which is what I am attempting to filter out) and where generic use has the dichotomous levels of "yes" or "no".

When I run the following code, the "none" level still shows up in the tabyl output, despite the fact I seem to have successfully filtered that level from the original dataframe. This prevents the test from working, though it does show the residuals.

report_type_yes % filter(report_type != "none")

report_type_x_generic_use % tabyl(report_type, generic_use, show_na = FALSE) %>% janitor::chisq.test()





Here are the outputs:

statistic  p.value  parameter  method
NaN        NaN      3          Pearson's Chi-squared test

(stdres)
report_type         no        yes
none               NaN        NaN
written       4.264175  -4.264175
oral          1.489306  -1.489306
both         -4.905978   4.905978

(observed)
report_type   no  yes
none           0    0
written       53  120
oral          10   23
both          33  235

(expected)
report_type          no        yes
none           0.000000    0.00000
written       35.037975  137.96203
oral           6.683544   26.31646
both          54.278481  213.72152

Any thoughts on what might be happening? I recognize this might be more of a data manipulation question, but related to this statistic, nonetheless. Let me know if anything isn't clear or you need more information. Many thanks in advance for responding to a long question.

Zach Tilton


March 10, 2022

Those initial pipe operators don't seem to have translated, but they are correct like the third one in this sample code chunk.

Dana Wanzer


March 17, 2022

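Hey Zach! I know we spoke briefly offline about this, but I want to comment here for other students who might have similar questions. filter() does not drop factor levels; it only removes the matching rows, so the "none" level is still attached to the variable with zero observations. That empty level is what ends up in the tabyl output and produces the NaN results.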
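A small sketch of that behavior, using a hypothetical six-row data frame `df` rather than Zach's actual data:

```r
library(dplyr)

# Hypothetical data: a factor with the same four levels as in the question
df <- data.frame(
  report_type = factor(
    c("written", "oral", "both", "none", "written", "both"),
    levels = c("none", "written", "oral", "both")
  )
)

filtered <- df %>% filter(report_type != "none")

# The "none" rows are gone, but the level itself is still there
levels(filtered$report_type)
table(filtered$report_type)
```

The table still lists "none", now with a count of zero, which is the same thing the tabyl output above is showing.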

One option would have been to use the droplevels() function to remove the "none" option.

Another option, which I think you used, is the fct_drop() function from the forcats package, which drops unused levels from a single categorical variable.
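Here is a rough sketch of both options. The `evals` data frame is an assumption: it is rebuilt from the observed counts posted above, with the empty "none" level deliberately left on the factor to mimic the state of the data after filter().

```r
library(dplyr)
library(janitor)
library(forcats)

# Hypothetical data rebuilt from the observed counts in the question
counts <- data.frame(
  report_type = c("written", "written", "oral", "oral", "both", "both"),
  generic_use = c("no", "yes", "no", "yes", "no", "yes"),
  n = c(53, 120, 10, 23, 33, 235)
)
evals <- counts[rep(seq_len(nrow(counts)), counts$n),
                c("report_type", "generic_use")]

# Keep "none" as an empty level, as it would be after filter()
evals$report_type <- factor(evals$report_type,
                            levels = c("none", "written", "oral", "both"))
evals$generic_use <- factor(evals$generic_use, levels = c("no", "yes"))

# Option 1: droplevels() removes unused levels from every factor in the data
result_base <- evals %>%
  droplevels() %>%
  tabyl(report_type, generic_use) %>%
  janitor::chisq.test()

# Option 2: fct_drop() removes unused levels from one chosen factor
result_forcats <- evals %>%
  mutate(report_type = fct_drop(report_type)) %>%
  tabyl(report_type, generic_use) %>%
  janitor::chisq.test()

result_base$p.value
result_forcats$p.value
```

Both versions should give the same result, since each hands tabyl() a factor with only the three levels that actually occur. droplevels() is convenient when several factors need cleaning at once; fct_drop() is more targeted.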