Skip to content
R for the Rest of Us: A Statistics-Free Introduction comes out June 25th. Or you can read the online version today. Check it out →
R for the Rest of Us Logo

This lesson is locked

Get access to all lessons in this course.

If the video is not playing correctly, you can watch it in a new window

Transcript

Click on the transcript to go to that point in the video. Please note that transcripts are auto generated and may contain minor inaccuracies.

Your Turn

  1. Perform a chi-square to examine how grade_class relates to live_on_campus. What is the p-value? Is there a relationship?

  2. If there is a significant difference, examine standardized residuals and the observed/expected frequencies to determine what grade class is more or less likely to live on campus. Interpret the results.

Learn More

Read more about the chisq.test() function in the janitor package.

Have any questions? Put them below and we will help you out!

You need to be signed-in to comment on this post. Login.

Zach Tilton

Zach Tilton

March 10, 2022

Hi Dana, thanks for the great videos here. I have been using what I learned in this video a lot lately. However, I recently hit a roadblock when running a chi-square test on two variables in a data set I am working with. Both variables are factors, but one has a level or value I don't want to include in the analysis because it doesn't make sense and would bias my test. To explain, for this test I am looking at the relationship between evaluation report type and generic evaluation use, where report type has the levels "written", "oral", "both", or "none" (which is what I am attempting to filter out) and where generic use has the dichotomous levels of "yes" or "no".

When I run the following code, the "none" level still shows up in the tabyl output, despite the fact I seem to have successfully filtered that level from the original dataframe. This prevents the test from working, though it does show the residuals.

report_type_yes % filter(report_type != "none")

report_type_x_generic_use % tabyl(report_type, generic_use, show_na = FALSE) %>% janitor::chisq.test()

tidy(report_type_x_generic_use)

report_type_x_generic_use$stdres

report_type_x_generic_use$observed

report_type_x_generic_use$expected

Here are the outputs:

statistic p.value parameter method

NaN NaN 3 Pearson's Chi-squared test

(stdres) report_type no yes none NaN NaN written 4.264175 -4.264175 oral 1.489306 -1.489306 both -4.905978 4.905978 (observed) report_type no yes none 0 0 written 53 120 oral 10 23 both 33 235 (expected) report_type no yes none 0.000000 0.00000 written 35.037975 137.96203 oral 6.683544 26.31646 both 54.278481 213.72152

Any thoughts on what might be happening? I recognize this might be more of a data manipulation question, but related to this statistic, nonetheless. Let me know if anything isn't clear or you need more information. Many thanks in advance for responding to a long question.