R Assignment #3

Author

Rachel Garton

Published

May 09, 2023

House Democrats - First Trump Impeachment

Load the data

Run the code below to load the data.

It will create a dataframe called impeach, which contains a row for every House Democrat and whether or not the member publicly called for impeachment in the case of the first Trump impeachment.

Additionally, a series of election results and demographics are included related to each member’s district.

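library(tidyverse)   # provides %>%, filter(), group_by(), summarise(), mutate()
library(lubridate)   # provides year() and month(), used further down

impeach <- readRDS("impeach.rds")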

Questions

Write your code using grouping in the chunks below to help answer the following questions.

How many members in the dataset favor impeachment vs. not (using the for_impeachment column)?

# METHOD 1:

impeach %>% 
  filter(for_impeachment == "YES") %>% 
  summarise(count = n())

# A tibble: 1 × 1
  count
  <int>
1   209

impeach %>% 
  filter(for_impeachment == "NO") %>% 
  summarise(count = n())

# A tibble: 1 × 1
  count
  <int>
1    26

# OR METHOD 2:

impeach %>% 
  group_by(for_impeachment) %>% 
  summarise(count = n())

# A tibble: 2 × 2
  for_impeachment count
  <chr>           <int>
1 NO                 26
2 YES               209

# There were 26 people that were not in favor of impeachment, and 209 people who were in favor of impeachment.
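As a possible shortcut, dplyr's count() does the grouping and tallying in one call. The sketch below uses a small made-up data frame called toy (hypothetical values, not the real impeach data) so it runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for the impeach data frame (not the real data)
toy <- tibble(
  member = c("A", "B", "C", "D", "E"),
  for_impeachment = c("YES", "YES", "NO", "YES", "NO")
)

# count() groups by the column and tallies the rows in one step;
# name = "count" renames the default result column n
toy_counts <- toy %>%
  count(for_impeachment, name = "count")

toy_counts
```

On the real data, impeach %>% count(for_impeachment) should give the same tallies as Method 2, with the count column named n unless renamed.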

Similar to #1 above, use grouping to break down the Democratic House members by TWO measures: those who are for or against impeachment (as you did above), and then districts above/below national GDP.

You’ll want to ensure you do the grouping on both columns together, e.g. group_by(column1, column2)

impeach %>% 
  group_by(for_impeachment, gdp_above_national) %>% 
  summarise(count = n())

`summarise()` has grouped output by 'for_impeachment'. You can override using
the `.groups` argument.

# A tibble: 4 × 3
# Groups:   for_impeachment [2]
  for_impeachment gdp_above_national count
  <chr>           <chr>              <int>
1 NO              ABOVE                  7
2 NO              BELOW                 19
3 YES             ABOVE                126
4 YES             BELOW                 83

# Of those NOT in favor of impeachment, 19 represented districts below the national GDP and 7 above. Of those IN favor
# of impeachment, 126 represented districts above the national GDP and 83 below.
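The message printed before the table only notes that the result is still grouped by for_impeachment. One way to silence it, sketched here on a made-up data frame (toy, with hypothetical values), is to pass .groups = "drop" so summarise() returns an ungrouped table.

```r
library(dplyr)

# Hypothetical stand-in for the impeach data frame (not the real data)
toy <- tibble(
  for_impeachment    = c("YES", "YES", "NO", "YES", "NO", "NO"),
  gdp_above_national = c("ABOVE", "BELOW", "BELOW", "ABOVE", "BELOW", "ABOVE")
)

# .groups = "drop" removes all grouping from the result,
# so no message is printed and later steps start ungrouped
toy_counts <- toy %>%
  group_by(for_impeachment, gdp_above_national) %>%
  summarise(count = n(), .groups = "drop")

toy_counts
```

The counts themselves are unchanged; only the grouping attached to the result differs.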

Now do the same as #2, but this time instead of GDP, group by whether the district is above or below the national average for the percentage of college graduates. The column that captures this information is pct_bachelors_compared_to_national.

impeach %>% 
  group_by(for_impeachment, pct_bachelors_compared_to_national) %>% 
  summarise(count = n())

`summarise()` has grouped output by 'for_impeachment'. You can override using
the `.groups` argument.

# A tibble: 4 × 3
# Groups:   for_impeachment [2]
  for_impeachment pct_bachelors_compared_to_national count
  <chr>           <chr>                              <int>
1 NO              ABOVE                                  7
2 NO              BELOW                                 19
3 YES             ABOVE                                128
4 YES             BELOW                                 81

# Of those AGAINST impeachment, 7 were above the national average for the percentage of college graduates and 19
# were below. Whereas of those FOR impeachment, 128 were above the national average for percent of college graduates
# and 81 were below the national average.
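Because the two groups differ so much in size (26 vs. 209), shares within each group may be easier to compare than raw counts. The sketch below re-enters the four counts from the output above by hand (the names bachelors_counts and share are illustrative) and computes each row's share of its for_impeachment group.

```r
library(dplyr)

# Counts copied from the output above
bachelors_counts <- tibble(
  for_impeachment = c("NO", "NO", "YES", "YES"),
  pct_bachelors_compared_to_national = c("ABOVE", "BELOW", "ABOVE", "BELOW"),
  count = c(7, 19, 128, 81)
)

# Within each for_impeachment group, divide each count by the group total
bachelors_shares <- bachelors_counts %>%
  group_by(for_impeachment) %>%
  mutate(share = count / sum(count)) %>%
  ungroup()

bachelors_shares
```

The share column sums to 1 within each group, so the ABOVE rows can be compared directly between the YES and NO members.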

Let’s look at the college graduation comparison in a slightly different way.

Instead of counting how many districts are above/below the national average, this time summarize by the MEAN percentage of college grads (located in the column pct_bachelors) for districts that are Yes for impeachment vs. No.

In other words, you want to end up with the calculated mean for what that percentage is for the Yes districts and the No districts.

impeach %>% 
  group_by(for_impeachment) %>% 
  summarize(avg = mean(pct_bachelors))

# A tibble: 2 × 2
  for_impeachment   avg
  <chr>           <dbl>
1 NO               27.7
2 YES              33.7

# Districts of members against impeachment average 27.7% college graduates, while districts of members in favor
# of impeachment average 33.7%.

Do the same as #4, but this time show the MEAN percentage of the vote that Donald Trump received for districts that are Yes for impeachment vs. No.
The relevant column for that is trump_percent.

impeach %>% 
  group_by(for_impeachment) %>% 
  summarize(avg = mean(trump_percent))

# A tibble: 2 × 2
  for_impeachment   avg
  <chr>           <dbl>
1 NO               43.8
2 YES              32.0

# For those against impeachment, the mean percentage of the vote that Trump received is 43.8%. For those in favor
# of impeachment, the mean percentage is 32%.

Filter to keep only the members who are a Yes for impeachment. Then, of those “Yes” members, how many won their 2018 election by less than 5 percentage points (margin_flag_2018) vs. more?

impeach %>% 
  filter(for_impeachment == "YES") %>% 
  group_by(margin_flag_2018) %>% 
  summarize(count = n())

# A tibble: 2 × 2
  margin_flag_2018   count
  <chr>              <int>
1 5_points_or_less      17
2 more_than_5_points   192

# For those in favor of impeachment, 17 won by 5 percentage points or less, and 192 won by more than
# 5 percentage points.

Come up with another breakdown of your choosing for how you’d like to examine this dataset. Say what you’ll look at, and then put the code below to find the answer.

# Of House Democrats against impeachment, how many were from the South? (For this question, I am
# using the states considered the South by the U.S. federal government.)

against_impeachment <- impeach %>% 
  filter(for_impeachment == "NO")

against_impeachment %>% 
  group_by(state) %>% 
  summarize(count = n())

# A tibble: 20 × 2
   state count
   <chr> <int>
 1 AL        1
 2 AZ        1
 3 FL        2
 4 GA        1
 5 HI        1
 6 IL        1
 7 KS        1
 8 ME        1
 9 MN        1
10 NJ        1
11 NM        1
12 NV        2
13 NY        2
14 OK        1
15 OR        1
16 PA        1
17 SC        1
18 TX        4
19 UT        1
20 WI        1

# Of the 26 against impeachment, 10 were from Southern states.
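The Southern total can also be computed rather than tallied by hand. This sketch re-enters the state counts from the output above and assumes "the South" means the U.S. Census Bureau's South region; the vector south_states and the name southern_total are illustrative, and a different regional definition would need a different vector.

```r
library(dplyr)

# State counts copied from the output above (members against impeachment)
against_by_state <- tibble(
  state = c("AL", "AZ", "FL", "GA", "HI", "IL", "KS", "ME", "MN", "NJ",
            "NM", "NV", "NY", "OK", "OR", "PA", "SC", "TX", "UT", "WI"),
  count = c(1, 1, 2, 1, 1, 1, 1, 1, 1, 1,
            1, 2, 2, 1, 1, 1, 1, 4, 1, 1)
)

# Assumption: the South = U.S. Census Bureau South region (16 states plus DC)
south_states <- c("DE", "MD", "DC", "VA", "WV", "NC", "SC", "GA", "FL",
                  "KY", "TN", "MS", "AL", "OK", "TX", "AR", "LA")

# Keep only Southern states and add up their counts
southern_total <- against_by_state %>%
  filter(state %in% south_states) %>%
  summarise(total = sum(count)) %>%
  pull(total)

southern_total
```

On the full data frame, the same %in% filter could be applied directly to against_impeachment before counting rows.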

You’ll see a column in the table called date_announced. For members who came out as a Yes for impeachment, this is the date they announced their support for it.

Use the mutate() function to create two new columns: one that extracts the year from date_announced, and a second that extracts the month.

new_columns <- impeach %>% 
  mutate(year_anc = year(date_announced), month_anc = month(date_announced))

Using the new columns you created in the previous question, use grouping to count up how many House Democrats during each month announced their support of impeachment.

new_columns %>% 
  filter(for_impeachment == "YES") %>% 
  group_by(month_anc) %>% 
  summarize(count = n())

# A tibble: 9 × 2
  month_anc count
      <dbl> <int>
1         1     3
2         4     7
3         5    39
4         6    27
5         7    33
6         8    18
7         9    76
8        11     2
9        12     4
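If date_announced spans more than one calendar year, grouping by month alone pools the same month across years. A minimal sketch of counting by year and month together follows; the data frame toy and its dates are made up for illustration and are not taken from the real data.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for the impeach data frame (dates are invented)
toy <- tibble(
  for_impeachment = c("YES", "YES", "YES", "YES", "NO"),
  date_announced  = as.Date(c("2017-05-17", "2019-05-21", "2019-05-28",
                              "2019-09-24", NA))
)

# Extract year and month, then count announcements per year-month pair
toy_by_month <- toy %>%
  filter(for_impeachment == "YES") %>%
  mutate(year_anc  = year(date_announced),
         month_anc = month(date_announced)) %>%
  count(year_anc, month_anc, name = "count")

toy_by_month
```

Here the two May announcements in one year are kept separate from the May announcement in the other, which a month-only grouping would merge.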