midwest %>%
  ggplot(aes(x = percollege,
             y = percbelowpoverty,
             color = state,
             size = poptotal,
             alpha = percpovertyknown)) +
  geom_point() +
  facet_wrap(vars(state))
Module 6: Data Visualization as a Tool for Analysis
We introduce the ggplot2 package and learn how to explore and analyze data through more complex visualizations.
Data Visualization as a Tool for Analysis
Download a copy of Module 6 slides
Download data for Module 6 lab and tutorial
Lab 6
In this lab, you will work with midwest.dta.
General Guidelines:
You will encounter a few functions we did not cover in the lecture video. This will give you some practice using a new function for the first time. You can try the following steps:
- Start by typing ?new_function in your Console to open up the help page
- Read the help page of this new_function. The description might be too technical for now. That’s OK. Pay attention to the Usage and Arguments, especially the argument x, or x and y (when two arguments are required)
- At the bottom of the help page, there are a few examples. Run the first few lines to see how it works
- Apply it in your lab questions
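As a rough sketch of these steps, here is how they might look for geom_jitter(), one of the functions used later in this lab. The choice of geom_jitter() is only for illustration, and the code assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Step 1: open the help page (same as typing ?geom_jitter in the Console)
help("geom_jitter", package = "ggplot2")

# Step 2: print the function's arguments without leaving the Console
args(geom_jitter)

# Step 3: run the examples from the bottom of the help page
example("geom_jitter", package = "ggplot2", ask = FALSE)
```

Once the examples run, try changing one argument at a time (for instance width) to see what it does before using the function in a lab question.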
It is highly likely that you will encounter error messages while doing this lab. Here are a few steps that might help you get through them.
- First, locate which line is causing the error
- Check whether you have a typo in the code. Sometimes another person can spot a typo faster than you.
- If the code has no typos, try googling the error message
- Scroll through the top few links to see if any of them help
- Try working on the next few questions while waiting for answers from the TAs
Questions
Recall that ggplot works by mapping data to aesthetics and then telling ggplot how to visualize those aesthetics with geoms, as in the code at the top of this page.
- Which is more highly correlated with poverty at the county level, college completion rates or high school completion rates? Is it consistent across states? Change one line of code in the above graph.
geoms
For the following, write code to reproduce each plot using midwest.
- Notice here inmetro is numeric, but I want it to behave like a discrete variable, so I use x = as.character(inmetro). Use labs(title = "Asian population by metro status") to create the title.
- Use geom_boxplot() instead of geom_point() for “Asian population by metro status”
- Use geom_jitter() instead of geom_point() for “Asian population by metro status”
- Use geom_jitter() and geom_boxplot() at the same time for “Asian population by metro status”. Does order matter?
- Histograms are used to visualize distributions. What happens when you change the bins argument? What happens if you leave the bins argument off?
midwest %>%
  ggplot(aes(x = perchsd)) +
  geom_histogram(bins = 100) +
  labs(title = "distribution of county-level hs completion rate")
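One way to explore the bins question beyond looking at the picture is to inspect what ggplot computed for the histogram layer. The sketch below is only a suggestion: it assumes that the midwest data frame bundled with ggplot2 can stand in for midwest.dta, and the bin counts 10 and 100 are arbitrary choices.

```r
library(ggplot2)

# Assumption: ggplot2's built-in `midwest` data stands in for midwest.dta
p_few  <- ggplot(midwest, aes(x = perchsd)) + geom_histogram(bins = 10)
p_many <- ggplot(midwest, aes(x = perchsd)) + geom_histogram(bins = 100)

# layer_data() returns the values ggplot computed for a layer:
# one row per bin, with the number of counties in that bin in `count`
bins_few  <- layer_data(p_few)
bins_many <- layer_data(p_many)

nrow(bins_few)        # number of bars drawn in the coarse histogram
nrow(bins_many)       # number of bars drawn in the fine histogram
sum(bins_few$count)   # every county falls into exactly one bin
```

Printing p_few and p_many side by side shows the trade-off: fewer bins give a smoother but coarser shape, more bins give a more detailed but noisier one.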
- Remake “distribution of county-level hs completion rate” with geom_density() instead of geom_histogram().
- Add a vertical line at the median perchsd using geom_vline. You can calculate the median directly in the ggplot code.
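To illustrate what calculating a statistic directly in the ggplot code can look like, here is a minimal sketch on a different variable (percollege) so that the question above is still yours to solve. It assumes the midwest data frame bundled with ggplot2; the bin count, line color, and title are arbitrary choices.

```r
library(ggplot2)

# Assumption: ggplot2's built-in `midwest` data stands in for midwest.dta
p <- ggplot(midwest, aes(x = percollege)) +
  geom_histogram(bins = 50) +
  # median() is evaluated on the plot's data inside aes();
  # this draws one identical line per row, so the lines overlap exactly
  geom_vline(aes(xintercept = median(percollege)), color = "red") +
  labs(title = "distribution of county-level college completion rate")

p
```

An alternative is to pass a single number outside aes(), for example geom_vline(xintercept = median(midwest$percollege)), which draws the line once.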
Aesthetics
For the following, write code to reproduce each plot using midwest.
- Use x, y, color and size
- Use x, y, color and size
- When making bar graphs, color only changes the outline of the bar. Change the aesthetic name to fill to get the desired result.
midwest %>%
  count(state) %>%
  ggplot(aes(x = state,
             y = n,
             color = state)) +
  geom_col()
- There’s a geom called geom_bar that takes a dataset and calculates the count. Read the following code and compare it to the geom_col code above. Describe how geom_bar() is different from geom_col().
midwest %>%
  ggplot(aes(x = state,
             color = state)) +
  geom_bar()
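If you want to check your description, one option is to compare the bar heights each layer ends up with. The sketch below is a suggestion rather than the required answer; it assumes the midwest data frame bundled with ggplot2 and uses layer_data() to pull out the computed values.

```r
library(ggplot2)
library(dplyr)

# Assumption: ggplot2's built-in `midwest` data stands in for midwest.dta
counts <- midwest %>% count(state)

# geom_col() is given the counts we computed ourselves
p_col <- ggplot(counts, aes(x = state, y = n)) + geom_col()

# geom_bar() is given the raw county-level rows and no y aesthetic
p_bar <- ggplot(midwest, aes(x = state)) + geom_bar()

# Bar heights used by each layer, one value per state
layer_data(p_col)$y
layer_data(p_bar)$y
```

Comparing the two vectors, and what each plot needed as input to get there, should help you put the difference into words.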
Well done! You’ve learned how to work with R to create some awesome looking visuals!
Want to improve this tutorial? Report any suggestions/bugs/improvements on here! We’re interested in learning from you how we can make this tutorial better.