R Assignment #3
House Democrats - First Trump Impeachment
Load the data
Run the code below to load the data.
library(tidyverse)  # assumed setup: provides %>% and the dplyr verbs; tidyverse 2.0.0+ also attaches lubridate for year() and month()
impeach <- readRDS("impeach.rds")
It will create a dataframe called impeach, which contains a row for every House Democrat and whether or not the member publicly called for impeachment in the case of the first Trump impeachment.
Additionally, a series of election results and demographics related to each member’s district are included.
Questions
Write your code using grouping in the chunks below to help answer the following questions.
- How many members in the dataset favor impeachment vs. not (using the for_impeachment column)?
# METHOD 1:
impeach %>%
  filter(for_impeachment == "YES") %>%
  summarise(count = n())
# A tibble: 1 × 1
  count
  <int>
1   209
impeach %>%
  filter(for_impeachment == "NO") %>%
  summarise(count = n())
# A tibble: 1 × 1
  count
  <int>
1    26
# OR METHOD 2:
impeach %>%
  group_by(for_impeachment) %>%
  summarise(count = n())
# A tibble: 2 × 2
  for_impeachment count
  <chr>           <int>
1 NO                 26
2 YES               209
# 26 members were not in favor of impeachment, and 209 members were in favor of impeachment.
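The grouped count in Method 2 can also be written with dplyr's count(), which is shorthand for group_by() followed by summarise(n()). A minimal sketch on a toy tibble (the rows in `toy` are hypothetical stand-ins, not the real impeach data):

```r
library(dplyr)

# Hypothetical stand-in for `impeach`; only the column name matches the real data
toy <- tibble(for_impeachment = c("YES", "YES", "NO", "YES"))

# count() groups by the column and tallies rows in one step;
# name = "count" keeps the same column name used in the answers above
toy_counts <- toy %>%
  count(for_impeachment, name = "count")

print(toy_counts)
```

On the real data, the same call would be impeach %>% count(for_impeachment, name = "count").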
- Similar to #1 above, use grouping to break down the Democratic House members by TWO measures: those who are for or against impeachment (as you did above), and then districts above/below national GDP.
You’ll want to ensure you do the grouping on both columns together, e.g. group_by(column1, column2)
impeach %>%
  group_by(for_impeachment, gdp_above_national) %>%
  summarise(count = n())
`summarise()` has grouped output by 'for_impeachment'. You can override using
the `.groups` argument.
# A tibble: 4 × 3
# Groups:   for_impeachment [2]
  for_impeachment gdp_above_national count
  <chr>           <chr>              <int>
1 NO              ABOVE                  7
2 NO              BELOW                 19
3 YES             ABOVE                126
4 YES             BELOW                 83
# Of those NOT in favor of impeachment, 19 represented districts below the national GDP and 7 above. Of those IN favor
# of impeachment, 126 represented districts above the national GDP and 83 below.
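The message about grouped output appears because summarise() drops only the last grouping level by default, so the result is still grouped by for_impeachment. A minimal sketch, using a hypothetical toy tibble rather than the real data, of passing .groups = "drop" to get an ungrouped result without the message:

```r
library(dplyr)

# Hypothetical stand-in for `impeach`; only the column names match the real data
toy <- tibble(
  for_impeachment    = c("YES", "YES", "NO", "NO"),
  gdp_above_national = c("ABOVE", "BELOW", "BELOW", "BELOW")
)

# .groups = "drop" returns a plain (ungrouped) tibble
toy_counts <- toy %>%
  group_by(for_impeachment, gdp_above_national) %>%
  summarise(count = n(), .groups = "drop")

print(toy_counts)
```

The counts are the same either way; the argument only controls whether later steps in a pipeline still see the grouping.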
- Now do the same as #2, but this time instead of GDP, group by whether the district is above or below the national average for the percentage of college graduates. The column that captures this information is pct_bachelors_compared_to_national.
impeach %>%
  group_by(for_impeachment, pct_bachelors_compared_to_national) %>%
  summarise(count = n())
`summarise()` has grouped output by 'for_impeachment'. You can override using
the `.groups` argument.
# A tibble: 4 × 3
# Groups:   for_impeachment [2]
  for_impeachment pct_bachelors_compared_to_national count
  <chr>           <chr>                              <int>
1 NO              ABOVE                                  7
2 NO              BELOW                                 19
3 YES             ABOVE                                128
4 YES             BELOW                                 81
# Of those AGAINST impeachment, 7 were above the national average for the percentage of college graduates and 19
# were below. Whereas of those FOR impeachment, 128 were above the national average for percent of college graduates
# and 81 were below the national average.
- Let’s look at the college graduation comparison in a slightly different way.
Instead of counting how many districts are above/below the national average, this time summarize by the MEAN percentage of college grads (located in the column pct_bachelors) for districts that are Yes for impeachment vs. No.
In other words, you want to end up with the calculated mean for what that percentage is for the Yes districts and the No districts.
impeach %>%
  group_by(for_impeachment) %>%
  summarize(avg = mean(pct_bachelors))
# A tibble: 2 × 2
  for_impeachment   avg
  <chr>           <dbl>
1 NO               27.7
2 YES              33.7
# The mean for those against impeachment is 27.7%, while the mean for those in favor of impeachment is 33.7%.
- Do the same as #4, but this time show the MEAN percentage of the vote that Donald Trump received for districts that are Yes for impeachment vs. No.
The relevant column for that is trump_percent.
impeach %>%
  group_by(for_impeachment) %>%
  summarize(avg = mean(trump_percent))
# A tibble: 2 × 2
  for_impeachment   avg
  <chr>           <dbl>
1 NO               43.8
2 YES              32.0
# For those against impeachment, the mean percentage of the vote that Trump received is 43.8%. For those in favor
# of impeachment, the mean percentage is 32%.
- Filter to only the members who are a Yes for impeachment. Then, of those “Yes” members, how many won their 2018 election by less than 5 percentage points (margin_flag_2018) vs. more?
impeach %>%
  filter(for_impeachment == "YES") %>%
  group_by(margin_flag_2018) %>%
  summarize(count = n())
# A tibble: 2 × 2
  margin_flag_2018   count
  <chr>              <int>
1 5_points_or_less      17
2 more_than_5_points   192
# For those in favor of impeachment, 17 won by 5 percentage points or less (the 5_points_or_less flag), and 192 won
# by more than 5 percentage points.
- Come up with another breakdown of your choosing for how you’d like to examine this dataset. Say what you’ll look at, and then put the code below to find the answer.
# Of House Democrats against impeachment, how many were from the South? (For this information, I will be
# referencing states considered the South by the U.S. federal government.)
impeach %>%
  filter(for_impeachment == "NO") -> against_impeachment
against_impeachment %>%
  group_by(state) %>%
  summarize(count = n())
# A tibble: 20 × 2
   state count
   <chr> <int>
 1 AL        1
 2 AZ        1
 3 FL        2
 4 GA        1
 5 HI        1
 6 IL        1
 7 KS        1
 8 ME        1
 9 MN        1
10 NJ        1
11 NM        1
12 NV        2
13 NY        2
14 OK        1
15 OR        1
16 PA        1
17 SC        1
18 TX        4
19 UT        1
20 WI        1
# Of the 26 against impeachment, 10 were from Southern states (AL 1, FL 2, GA 1, OK 1, SC 1, TX 4).
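The Southern tally above was added up by hand from the per-state table. A minimal sketch of doing the same tally in code, assuming "the South" means the 16 states of the U.S. Census Bureau's South region (this definition, the `south` vector, and the toy rows are assumptions for illustration, not part of the original answer):

```r
library(dplyr)

# Assumed definition: the 16 states of the Census Bureau South region
south <- c("AL", "AR", "DE", "FL", "GA", "KY", "LA", "MD",
           "MS", "NC", "OK", "SC", "TN", "TX", "VA", "WV")

# Hypothetical stand-in for `impeach`; only the column names match the real data
toy <- tibble(
  for_impeachment = c("NO", "NO", "NO", "YES"),
  state           = c("TX", "NY", "AL", "GA")
)

# Keep "NO" members whose state is in the Southern list, then count them
southern_no <- toy %>%
  filter(for_impeachment == "NO", state %in% south) %>%
  summarise(count = n())

print(southern_no)
```

Swapping `toy` for `impeach` would give the Southern count directly, which avoids arithmetic slips when the state table is long.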
- You’ll see a column in the table called date_announced. For members who came out as a Yes for impeachment, this is the date they announced their support for it.
Use the mutate() function to create two new columns: one that extracts the year from date_announced, and a second that extracts the month.
impeach %>%
  mutate(year_anc = year(date_announced), month_anc = month(date_announced)) -> new_columns
- Using the new columns you created in the previous question, use grouping to count up how many House Democrats during each month announced their support of impeachment.
new_columns %>%
  filter(for_impeachment == "YES") %>%
  group_by(month_anc) %>%
  summarize(count = n())
# A tibble: 9 × 2
  month_anc count
      <dbl> <int>
1         1     3
2         4     7
3         5    39
4         6    27
5         7    33
6         8    18
7         9    76
8        11     2
9        12     4
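Grouping by month_anc alone pools the same calendar month from different years, if the announcement dates span more than one year. A minimal sketch of counting by year and month together, using a toy tibble with hypothetical dates (not the real data) and assuming date_announced is a Date column:

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for `impeach`; dates are invented for illustration
toy <- tibble(
  for_impeachment = c("YES", "YES", "YES", "NO"),
  date_announced  = as.Date(c("2017-05-10", "2019-05-21", "2019-09-24", NA))
)

# Extract year and month, then count each year-month combination separately
by_year_month <- toy %>%
  filter(for_impeachment == "YES") %>%
  mutate(year_anc = year(date_announced), month_anc = month(date_announced)) %>%
  count(year_anc, month_anc, name = "count")

print(by_year_month)
```

In the toy data the two May announcements land in separate rows because they fall in different years; grouping by month alone would have merged them.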