Inferential Statistics with R
Chi-Square
This lesson is called Chi-Square, part of the Inferential Statistics with R course.
Your Turn
Perform a chi-square to examine how grade_class relates to live_on_campus. What is the p-value? Is there a relationship? If there is a significant difference, examine standardized residuals and the observed/expected frequencies to determine what grade class is more or less likely to live on campus. Interpret the results.
Zach Tilton
March 9, 2022
Hi Dana, thanks for the great videos here. I have been using what I learned in this video a lot lately. However, I recently hit a roadblock when running a chi-square test on two variables in a data set I am working with. Both variables are factors, but one has a level or value I don't want to include in the analysis because it doesn't make sense and would bias my test. To explain, for this test I am looking at the relationship between evaluation report type and generic evaluation use, where report type has the levels "written", "oral", "both", or "none" (which is what I am attempting to filter out) and where generic use has the dichotomous levels of "yes" or "no".
When I run the following code, the "none" level still shows up in the tabyl output, despite the fact I seem to have successfully filtered that level from the original dataframe. This prevents the test from working, though it does show the residuals.
report_type_yes % filter(report_type != "none")
report_type_x_generic_use % tabyl(report_type, generic_use, show_na = FALSE) %>% janitor::chisq.test()
tidy(report_type_x_generic_use)
report_type_x_generic_use$stdres
report_type_x_generic_use$observed
report_type_x_generic_use$expected
Here are the outputs:
statistic  p.value  parameter  method
NaN        NaN      3          Pearson's Chi-squared test

(stdres)
report_type  no         yes
none         NaN        NaN
written      4.264175   -4.264175
oral         1.489306   -1.489306
both         -4.905978  4.905978

(observed)
report_type  no  yes
none         0   0
written      53  120
oral         10  23
both         33  235

(expected)
report_type  no         yes
none         0.000000   0.00000
written      35.037975  137.96203
oral         6.683544   26.31646
both         54.278481  213.72152
Any thoughts on what might be happening? I recognize this might be more of a data manipulation question, but related to this statistic, nonetheless. Let me know if anything isn't clear or you need more information. Many thanks in advance for responding to a long question.
Zach Tilton
March 9, 2022
Those initial pipe operators don't seem to have translated, but they are correct like the third one in this sample code chunk.
Dana Wanzer
March 17, 2022
Hey Zach! I know we spoke briefly offline about this, but I want to comment here for other students who might also have similar questions. filter() does not drop factor levels; it only removes the matching rows, so the "none" level stays in the factor with zero observations.
One option would have been to use the droplevels() function to remove the "none" option.
Another option, which I think you used, is the fct_drop() function from the forcats package, which drops unused levels from a categorical variable.
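For anyone who wants to see this in action, here is a minimal sketch in base R. It is not Zach's actual code: the data frame `evals` is hypothetical, rebuilt from the observed counts posted above, and the five "none" rows are made up purely so there is something to filter out. It also uses base `table()` and `stats::chisq.test()` rather than the janitor versions.

```r
# Hypothetical data rebuilt from the observed counts in the post above.
# The 5 "none" rows are invented for illustration only.
evals <- data.frame(
  report_type = factor(
    rep(c("written", "oral", "both", "none"), times = c(173, 33, 268, 5)),
    levels = c("none", "written", "oral", "both")
  ),
  generic_use = factor(c(
    rep(c("no", "yes"), times = c(53, 120)),  # written
    rep(c("no", "yes"), times = c(10, 23)),   # oral
    rep(c("no", "yes"), times = c(33, 235)),  # both
    rep(c("no", "yes"), times = c(2, 3))      # none (made up)
  ))
)

# Removing the rows leaves the "none" level behind with zero counts
filtered <- evals[evals$report_type != "none", ]
levels(filtered$report_type)

# The empty row has expected counts of 0, so the statistic is NaN
bad <- suppressWarnings(
  chisq.test(table(filtered$report_type, filtered$generic_use))
)
bad$p.value

# Option 1: droplevels() removes the unused level
cleaned <- filtered
cleaned$report_type <- droplevels(cleaned$report_type)

good <- chisq.test(table(cleaned$report_type, cleaned$generic_use))
good$p.value
good$stdres

# Option 2: forcats::fct_drop() does the same job, if forcats is installed
if (requireNamespace("forcats", quietly = TRUE)) {
  levels(forcats::fct_drop(filtered$report_type))
}
```

Once the empty level is gone, the table is 3 x 2 and the test has 2 degrees of freedom instead of 3.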