Introduction

The vaccination campaign in the United States has slowed from its earlier peak. In mid-April more than 3 million people were receiving the injection every day; that number is now down to approximately 1.2 million, less than half of the peak. If the vaccination effort fails to regain traction, the United States will remain susceptible to a variety of threats, including summer outbreaks, coronavirus resurgences, and the emergence of new and deadly variants of the virus. Attitudes toward vaccination have therefore become an important variable to study. As a first step in figuring out what will make a difference, it is critical to understand why a segment of the population, even a minority, is not getting vaccinated. This small research exercise asks whether opinions in favor of or against vaccination are more prevalent on one side of the American political spectrum (Democrats or Republicans). We hope to confirm or reject a relationship between positions on vaccination and political preferences.

Research Question

Can political preferences (Democrats or Republicans) predict vaccination opinions in the US?

Why should there be a relationship between vaccination inclination and political party preference?

The latest Kaiser Family Foundation study found a relationship between racial and political factors and COVID-19 vaccination preferences. The study concluded that, among the third of Americans who have not received the COVID-19 vaccine, those who say they will definitely not get vaccinated are “overwhelmingly white and Republican”. This is one of the earliest studies to examine a correlation between party and vaccination preferences; building on those remarks, this study looks for statistical evidence that confirms or rejects that claim. Furthermore, if political preferences do not show a significant correlation, which other variables do?

Variables to be analysed

Dependent variable: Opinion regarding vaccination (V202383x)
Independent variable: Presidential vote preferences (V202105x)
Independent variable: Education (V201510)
Independent variable: Age (V201507x)

Conducting research in R

Basic package installation:

install.packages("ggplot2",repos = "http://cran.us.r-project.org")
library("ggplot2")
d=read.csv("anes_2020_data.csv")

Because the dependent variable V202383x (“Opinion regarding vaccination”) is ordinal, we evaluate it as a Vaccine Acceptance scale. Respondents who believe that “Vaccination health benefits much greater than risks” have the highest Vaccine Acceptance, whereas respondents who believe that “Vaccination risks were much greater than health benefits” have the lowest. In the original coding the most favorable answer carries the lowest number, so we reverse the variable to express the results more intuitively, with 1 the lowest acceptance and 7 the highest:

# Copy the raw item and treat negative codes as missing
d$vaccine = d$V202383x
d$vaccine[d$vaccine < 0]=NA

# Reverse the 7-point scale so that 1 = lowest and 7 = highest acceptance
# (same mapping as 1->7, 2->6, ..., 7->1; NA stays NA)
d$vaccine2 = 8 - d$vaccine
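
The recode above reverses a 7-point item so that higher values mean higher acceptance. A minimal sketch on a made-up vector (the values in `raw` are hypothetical, not ANES data) shows the mapping and how missing values pass through:

```r
# Hypothetical raw answers: 1..7 are valid codes, negative values mark missing data
raw <- c(1, 4, 7, -2, 2)

# Treat negative codes as missing
raw[raw < 0] <- NA

# Reverse the 7-point scale: 1 -> 7, 2 -> 6, ..., 7 -> 1 (NA stays NA)
acceptance <- 8 - raw
print(acceptance)
```

Running the same check on a few rows of the real data is a cheap way to confirm that the reversed variable points in the intended direction.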

Next we prepare the independent variables. The first step is to get rid of irrelevant data, such as values below 0. Moreover, because the original coding of the variable separates Democratic and Republican responses into actual vote and vote intention, it is clearer to relabel the variable. After relabeling, each respondent is simply classified as Democratic, Republican, or Other.

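# Recode the vote variable into party labels; negative and unlisted codes stay NA.
# Building a separate character column avoids comparing numbers against a vector
# that has already been turned into text by the first label assignment.
code = as.numeric(as.character(d$V202105))
d$vote = NA_character_
d$vote[code %in% c(10, 30)] = "Democratic"
d$vote[code %in% c(11, 31)] = "Republican"
d$vote[code %in% c(12, 32)] = "Other"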

Now we fit our first linear regression model, with vaccine acceptance as the outcome and vote preference as the predictor.

fit1 = lm(vaccine2 ~vote, data=d)
summary(fit1)
## 
## Call:
## lm(formula = vaccine2 ~ vote, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1821 -0.5215  0.8179  0.8179  1.4785 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     6.18213    0.02659 232.506  < 2e-16 ***
## voteOther      -0.39514    0.12450  -3.174  0.00151 ** 
## voteRepublican -0.66067    0.04020 -16.435  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.581 on 6452 degrees of freedom
##   (1825 observations deleted due to missingness)
## Multiple R-squared:  0.04029,    Adjusted R-squared:  0.03999 
## F-statistic: 135.4 on 2 and 6452 DF,  p-value: < 2.2e-16

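From the output, we can observe that the residual distribution is not symmetrical: the median is 0.82 and the maximum is 1.48, while the minimum is -5.18. Most residuals are small and positive, but for a minority of respondents with very low acceptance the predicted value lies far above the observed one.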

coef(fit1)
##    (Intercept)      voteOther voteRepublican 
##      6.1821267     -0.3951444     -0.6606722

Looking at the coefficients, we see that the intercept is 6.1821267. Because vote is a categorical predictor and “Democratic” is its reference category, the intercept is the mean vaccine approval of Democratic voters, not of the average respondent in our data set.

The second-row estimate, “voteOther”, shows that vaccine approval in this group is 0.39514 points lower than among Democratic voters. In the same way, Republican voters score on average 0.66 points lower than Democratic voters.
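
With a single categorical predictor, `lm()` takes the first level (here “Democratic”) as the baseline, so the intercept and slopes can be rebuilt from group means. The sketch below does this from the counts in the `table()` output shown later in this report; the object names `counts`, `group_means` and `gaps` are illustrative only.

```r
# Counts of each acceptance score (rows 1..7) by vote group,
# copied from the table(d$vaccine2, d$vote) output in this report
counts <- cbind(
  Democratic = c(49, 106, 56, 337, 160, 513, 2315),
  Other      = c(7, 2, 3, 31, 7, 34, 85),
  Republican = c(126, 131, 74, 480, 167, 585, 1187)
)
scores <- 1:7

# Mean acceptance per group = sum(score * count) / sum(count);
# scores is recycled down each column, so row i is weighted by score i
group_means <- colSums(counts * scores) / colSums(counts)
print(round(group_means, 4))

# Gaps relative to the Democratic baseline correspond to the two slopes
gaps <- group_means[c("Other", "Republican")] - group_means["Democratic"]
print(round(gaps, 4))
```

The Democratic mean matches the reported intercept (6.1821), and the two gaps match the `voteOther` and `voteRepublican` estimates (-0.3951 and -0.6607).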

Now we plot these results in a bar chart to see whether vaccine acceptance varies across political parties.

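test = data.frame(value = tapply(d$vaccine2, d$vote,mean, na.rm=T))
test$names = rownames(test)
# tapply() returns the groups in alphabetical order (Democratic, Other, Republican),
# so match the display labels by name rather than by position
labels = c(Democratic = "1. Democratic candidate", Republican = "2. Republican candidate", Other = "3. Other")
test$names2 = labels[test$names]
# Plot the mean itself so the axis reads on the original 1-7 scale
ggplot(test,aes(x = names2,y=value)) + 
  geom_bar(stat = "identity") + 
  labs(x="",y="Vaccine acceptance",title="Figure 1. Presidential vote preference and vaccine opinion")  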

The bar chart shows that, on average, Democrats have a higher vaccine acceptance, whereas Republicans have a lower one.

It is also important to note that the difference in average vaccine approval between the two groups is modest, which makes sense considering that we chose to plot the mean. To see whether the distribution of answers differs more sharply, more in-depth graphs are necessary.

First of all, we conduct a chi-squared test to check whether the two variables are associated.

table(d$vaccine2,d$vote)
##    
##     Democratic Other Republican
##   1         49     7        126
##   2        106     2        131
##   3         56     3         74
##   4        337    31        480
##   5        160     7        167
##   6        513    34        585
##   7       2315    85       1187
TAB = table(d$vaccine2,d$vote)
chisq.test(TAB, simulate.p.value = TRUE)
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  TAB
## X-squared = 347.84, df = NA, p-value = 0.0004998

The simulated p-value (0.0004998) is the smallest value that 2000 replicates can produce, so we reject the hypothesis that vaccine acceptance and vote preference are independent: the two variables are associated. The test does not, however, measure how strong the association is, and in a sample of this size even a modest association is statistically significant. The small R-squared of the regression above (about 0.04) suggests that the association is weak. This is plausible: the US has only two major parties and a large population, so a wide diversity of opinion is to be expected within each party.
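
A chi-squared test reports whether an association exists, not how strong it is. One common effect-size measure is Cramér's V, which is not part of the original analysis and is added here only as an illustration. The sketch recomputes the statistic by hand from the counts printed above, so it needs neither the survey file nor the simulation.

```r
# Counts copied from the table(d$vaccine2, d$vote) output above
counts <- cbind(
  Democratic = c(49, 106, 56, 337, 160, 513, 2315),
  Other      = c(7, 2, 3, 31, 7, 34, 85),
  Republican = c(126, 131, 74, 480, 167, 585, 1187)
)
n <- sum(counts)

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(counts), colSums(counts)) / n

# Pearson chi-squared statistic
x_squared <- sum((counts - expected)^2 / expected)

# Cramer's V: sqrt(X^2 / (n * (min(rows, cols) - 1))), ranging from 0 to 1
cramers_v <- sqrt(x_squared / (n * (min(dim(counts)) - 1)))

print(round(x_squared, 2))
print(round(cramers_v, 3))
```

The statistic agrees with the reported X-squared of 347.84, and V comes out at roughly 0.16, a value usually read as a weak association.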

Now we look more closely at the cross-tabulation above. We want to see the share of each vaccine acceptance level within the parties:

# Bars show the number of respondents at each acceptance level (1-7) per vote group.
# legend.text alone draws the legend; "legend" is not a barplot() argument.
barplot(TAB, beside=T,
        legend.text = TRUE, 
        args.legend = list(x = "topleft"),
        xlab = "Political Preference", ylab = "Number of respondents",
        main = "Figure 2. Vaccine Acceptance and Party Preference")

Our results show that Democrats have a larger share of their voters in favor of vaccine benefits: about 65% of Democratic respondents chose the highest acceptance level, against about 43% of Republican respondents. The share of voters who do not believe in vaccine benefits is also greater among Republicans than among Democrats; however, in both cases, the most common opinion is “greatly in favor of vaccination benefits”. We can conclude that Republicans include a greater share of vaccination skeptics than Democrats.
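
Because the vote groups differ in size, shares within each group are easier to compare than raw counts. The following sketch converts the counts from the cross-tabulation above into column proportions with `prop.table()`; the names `shares`, `top_share` and `low_share` are illustrative.

```r
# Counts copied from the table(d$vaccine2, d$vote) output above
counts <- cbind(
  Democratic = c(49, 106, 56, 337, 160, 513, 2315),
  Other      = c(7, 2, 3, 31, 7, 34, 85),
  Republican = c(126, 131, 74, 480, 167, 585, 1187)
)
rownames(counts) <- 1:7

# Share of each acceptance score within each vote group (columns sum to 1)
shares <- prop.table(counts, margin = 2)
print(round(100 * shares, 1))

# Share giving the highest score (7) and share on the skeptical side (scores 1 to 3)
top_share <- shares["7", ]
low_share <- colSums(shares[c("1", "2", "3"), ])
print(round(100 * rbind(top_share, low_share), 1))
```

Passing `shares` instead of the raw table to `barplot()` would plot these proportions directly.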

Multivariate Analysis

Now we turn to multivariate analysis to see whether other variables, such as age and education level, are also related to vaccine acceptance.

d$age = d$V201507x
d$age[d$age<18]=NA

d$education=d$V201510
d$education[d$education < 0]=NA
fit3 = lm(vaccine2 ~ vote + age + education, data=d)
summary(fit3)
## 
## Call:
## lm(formula = vaccine2 ~ vote + age + education, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.0593 -0.6419  0.5810  1.0601  2.0101 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     5.438025   0.066436  81.854  < 2e-16 ***
## voteOther      -0.249544   0.126440  -1.974   0.0485 *  
## voteRepublican -0.702304   0.040554 -17.318  < 2e-16 ***
## age             0.013645   0.001173  11.637  < 2e-16 ***
## education       0.008591   0.001952   4.402 1.09e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.557 on 6216 degrees of freedom
##   (2059 observations deleted due to missingness)
## Multiple R-squared:  0.06468,    Adjusted R-squared:  0.06408 
## F-statistic: 107.5 on 4 and 6216 DF,  p-value: < 2.2e-16
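
To make the coefficients above concrete, they can be combined into a predicted acceptance score for a given profile. The sketch below hard-codes the printed estimates; the helper `predict_acceptance()` and the example respondent (age 40, education code 6) are hypothetical, and education is treated as a plain number exactly as in the fitted model.

```r
# Estimates copied from the summary(fit3) output above
coefs <- c(intercept = 5.438025, voteOther = -0.249544,
           voteRepublican = -0.702304, age = 0.013645, education = 0.008591)

# Hypothetical helper: linear prediction for one respondent profile
predict_acceptance <- function(vote, age, education) {
  vote_shift <- switch(vote,
    Democratic = 0,                          # reference category
    Other      = coefs[["voteOther"]],
    Republican = coefs[["voteRepublican"]],
    stop("unknown vote group: ", vote)
  )
  coefs[["intercept"]] + vote_shift +
    coefs[["age"]] * age + coefs[["education"]] * education
}

dem <- predict_acceptance("Democratic", age = 40, education = 6)
rep_score <- predict_acceptance("Republican", age = 40, education = 6)
print(round(c(Democratic = dem, Republican = rep_score), 3))
```

For two otherwise identical respondents, the predicted gap equals the `voteRepublican` coefficient, which is what “holding age and education constant” means in this model.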

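When age, education level, and party preference are entered together, the model explains more of the variation in vaccine acceptance: the adjusted R-squared is 0.06408, meaning that about 6% of the variance is explained (this is a measure of fit, not of accuracy). Holding the other variables constant, Republican voters score about 0.70 points lower than Democratic voters, and each additional year of age is associated with an increase of about 0.014 points. Now, I would like to see whether education on its own is more strongly related to vaccine approval.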

fit2 = lm(vaccine2 ~ education, data=d)
summary(fit2)
## 
## Call:
## lm(formula = vaccine2 ~ education, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.6291 -0.8268  1.1552  1.1732  1.2183 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5.772669   0.021954 262.940  < 2e-16 ***
## education   0.009015   0.001900   4.744 2.13e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.635 on 7308 degrees of freedom
##   (970 observations deleted due to missingness)
## Multiple R-squared:  0.00307,    Adjusted R-squared:  0.002934 
## F-statistic: 22.51 on 1 and 7308 DF,  p-value: 2.133e-06
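
The residual summary above also hints at how education is coded. Since a residual is the observed score minus the fitted score, the extreme residuals can be turned back into the education codes that produced them. The sketch assumes that the most negative residual belongs to a respondent who answered 1 and the largest residual to a respondent who answered 7; this is an assumption, not something the output states.

```r
# Values copied from the summary(fit2) output above
intercept    <- 5.772669
slope        <- 0.009015
min_residual <- -5.6291
max_residual <- 1.2183

# residual = observed - fitted, so fitted = observed - residual
fitted_high <- 1 - min_residual   # assumes the observed score was 1
fitted_low  <- 7 - max_residual   # assumes the observed score was 7

# Invert fitted = intercept + slope * education to recover the education code
code_high <- (fitted_high - intercept) / slope
code_low  <- (fitted_low - intercept) / slope
print(round(c(lowest_code = code_low, highest_code = code_high), 1))
```

Under these assumptions the implied codes come out near 1 and 95, which would mean that one category (plausibly “Other”) carries a much larger numeric code than the ordered levels and pulls on the slope. If so, treating education as a factor, or setting that category to NA before fitting, would be a safer specification; this has not been checked against the codebook here.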

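The adjusted R-squared of the education-only model (0.002934) is in fact much lower than that of the party preference model (0.03999). Note that education enters the model as a numeric code, so the “Other” category is treated as a point on the same scale as the ordered levels, and the coefficient should be read with that caveat. Still, it is worth plotting our findings to see whether a higher education level makes respondents more pro-vaccination.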

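test = data.frame(value = tapply(d$vaccine2, d$education,mean, na.rm=T))
test$names = rownames(test)
edu_labels = c("Less","High School","Some College", "Occupational","Academic","Bachelor's","Masters", "Phd", "Other")
# Store the labels as a factor with explicit levels so ggplot keeps the
# education order instead of sorting the bars alphabetically
test$names2 = factor(edu_labels, levels = edu_labels)

# Plot the mean itself so the axis reads on the original 1-7 scale
ggplot(test,aes(x = names2,y=value)) + 
  geom_bar(stat = "identity") + 
  labs(x="",y="Vaccine acceptance",title="Figure 3. Highest education level and vaccine acceptance")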

We find that respondents with a professional degree, such as a PhD, show the greatest vaccine acceptance. More generally, vaccine acceptance follows the educational level: it rises as education increases and falls as education decreases. For instance, respondents with less than a high school diploma also have the lowest vaccine acceptance.

boxplot(vaccine2 ~ education, data = d,
        xlab = "Highest Educational Degree", ylab = "Vaccine Acceptance", main = "Figure 4. Vaccine Acceptance and Level of Education",
        names=c("Less","HS","SColl", "Occ","Acad","Bach","MS", "Phd", "Other"))

Box plots show the overall pattern of responses for a group. Here the boxes for respondents holding a bachelor's or master's degree are relatively short, and for respondents with a PhD there are no outliers. This suggests that respondents with a higher educational level agree with each other more closely.

The boxes are comparatively tall for the Academic degree category and below, which suggests that these respondents hold quite different opinions about vaccination benefits.

Conclusion and Findings

On average, Democrats have a higher vaccine acceptance, whereas Republicans have a lower one. Looked at in detail, Democrats have a larger share of their voters in favor of vaccine benefits, whereas the share for Republicans is notably smaller. The share of voters who do not believe in vaccine benefits is greater among Republicans than among Democrats; however, in both cases, the most common opinion is “greatly in favor of vaccination benefits”. We can conclude that Republicans include a greater share of vaccination skeptics than Democrats.

Turning to education, respondents are more pro-vaccination as the educational level increases and show lower vaccination approval as it decreases.

Finally, party preference explains more of the variation in vaccination approval than education does (adjusted R-squared of 0.03999 versus 0.002934). The multivariate model (education + age + party preference) explains more still (0.06408), although that remains a small share of the variance.

Responding to our research question, “Can political preferences (Democrats or Republicans) predict vaccination opinions in the US?”: political preference can predict vaccination opinions, but only to a very limited extent and with a very high margin of error. It would be more beneficial to conduct multivariate analysis that brings in more variables.