Jacob Jameson
Summer 2021
Objectives
Key points
ggplot2 is an R implementation of Leland Wilkinson's Grammar of Graphics for data visualization
To use ggplot functions, first load the tidyverse, which attaches ggplot2:
library(tidyverse)
In ggplot, the basic structure of the code for a plot can be summarized as:
ggplot(data = [dataset],
       mapping = aes(x = [x-variable],
                     y = [y-variable],
                     ...)) +
  geom_xxx() +
  other options
# Read the data (assumes insurance.csv is in the working directory)
insurance <- read.csv("insurance.csv")
ggplot(data = insurance,
       mapping = aes(x = bmi,
                     y = charges)) +
  geom_point()
How would you describe this relationship? What other variables would help us understand data points that don't follow the overall trend?
To display more variables, map them to visual properties of the geom, such as color and size, inside aes(). These visual properties are called aesthetics.
ggplot(data = insurance,
       mapping = aes(x = bmi,
                     y = charges,
                     color = smoker,
                     size = age)) +
  geom_point() +
  labs(title = "BMI vs. Charges", x = "BMI", y = "Charges")
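Variables placed inside aes() are mapped, so each point's color depends on its value in the data. A constant placed outside aes() is set, so it applies to every point. The sketch below contrasts the two on a small hypothetical data frame `toy` (made-up values that only reuse the column names of insurance.csv), assuming ggplot2 is installed; the color "steelblue" is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical toy data: made-up values, same column names as insurance.csv
toy <- data.frame(
  bmi     = c(22.1, 27.5, 31.0, 35.2, 24.8, 29.9),
  charges = c(3200, 4100, 38000, 9800, 21000, 5200),
  smoker  = c("no", "no", "yes", "no", "yes", "no")
)

# Mapping: color is inside aes(), so it varies with the smoker column
mapped <- ggplot(toy, aes(x = bmi, y = charges, color = smoker)) +
  geom_point()

# Setting: color is outside aes(), so every point gets the same fixed color
fixed <- ggplot(toy, aes(x = bmi, y = charges)) +
  geom_point(color = "steelblue")

print(mapped)
print(fixed)
```

Writing color = "steelblue" inside aes() instead would treat the string as a one-level variable, producing a legend entry and a default palette color rather than steelblue.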
Questions:
Univariate data analysis
Bivariate data analysis
Multivariate data analysis
Numerical variables
Categorical variables
Univariate plots: numerical variable (histogram, density plot) and categorical variable (bar chart)
ggplot(data = insurance, mapping = aes(x = bmi)) +
geom_histogram(binwidth = 1)
ggplot(data = insurance, mapping = aes(x = bmi)) +
geom_density()
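A histogram and a density plot both describe the distribution of one numerical variable, but geom_histogram() draws counts by default while geom_density() draws a density. One way to overlay them, sketched below, is to map the bar heights to after_stat(density) (available in ggplot2 3.3.0 and later). The BMI values here are simulated with rnorm() as a hypothetical stand-in for insurance$bmi, and the fill and line colors are arbitrary.

```r
library(ggplot2)

# Hypothetical BMI values standing in for insurance$bmi (simulated, not real data)
set.seed(1)
toy <- data.frame(bmi = rnorm(200, mean = 30, sd = 6))

# after_stat(density) rescales the bars so their total area is 1,
# putting the histogram on the same scale as geom_density()
p <- ggplot(toy, aes(x = bmi)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1,
                 fill = "grey80", color = "white") +
  geom_density(color = "firebrick")

print(p)
```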
ggplot(data = insurance, mapping = aes(x = region)) +
geom_bar()
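geom_bar() counts the rows in each category itself (its default stat is "count"), so only x needs to be mapped. When the counts are already computed, geom_col() draws them directly. The sketch below shows that both routes give the same bar heights, using a hypothetical `toy` data frame of made-up region labels and dplyr's count(), both packages being part of the tidyverse.

```r
library(ggplot2)
library(dplyr)

# Hypothetical region labels standing in for insurance$region
toy <- data.frame(
  region = c("northeast", "northwest", "southeast",
             "southeast", "southwest", "southeast")
)

# geom_bar() tallies the rows per region on its own
p_bar <- ggplot(toy, aes(x = region)) +
  geom_bar()

# Equivalent: count first (column n), then draw the precomputed heights
region_counts <- count(toy, region)
p_col <- ggplot(region_counts, aes(x = region, y = n)) +
  geom_col()

print(p_bar)
print(p_col)
```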
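Bivariate plots: num. vs num. (scatter plot, scatter plot with a smoothed trend), num. vs cat. (box plot), cat. & cat. (stacked bar chart)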
ggplot(data = insurance,
       mapping = aes(x = bmi,
                     y = charges)) +
  geom_point(size = 3)
# se = FALSE hides the confidence band around the smoothed trend
ggplot(data = insurance,
       mapping = aes(x = bmi,
                     y = charges)) +
  geom_point(size = 3) +
  geom_smooth(se = FALSE)
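With no method argument, geom_smooth() picks a flexible smoother automatically (loess for fewer than 1,000 observations, a GAM otherwise) and reports its choice in a message. To draw a straight least-squares line instead, pass method = "lm". The sketch below uses hypothetical simulated data (not insurance.csv) and checks that the drawn line matches predictions from lm() fitted to the same points.

```r
library(ggplot2)

# Hypothetical toy data: simulated, roughly linear relationship
set.seed(42)
toy <- data.frame(bmi = runif(100, 18, 45))
toy$charges <- 250 * toy$bmi + rnorm(100, sd = 2000)

# method = "lm" fits a straight least-squares line instead of the default smoother;
# se = FALSE drops the shaded confidence band
p <- ggplot(toy, aes(x = bmi, y = charges)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(p)

# The same fit computed directly; its coefficients define the drawn line
coef(lm(charges ~ bmi, data = toy))
```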
ggplot(data = insurance, mapping = aes(y = bmi, x = region)) +
geom_boxplot()
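Each box summarizes the numerical variable within one category: the middle line is the median, the box edges are the 25th and 75th percentiles, and the whiskers extend to the most extreme points within 1.5 times the interquartile range. The sketch below pulls those statistics out of a built plot with ggplot_build() and compares them with quantile(), using hypothetical simulated BMI values for two regions.

```r
library(ggplot2)

# Hypothetical toy data: simulated BMI for two regions (not real data)
set.seed(7)
toy <- data.frame(
  region = rep(c("northeast", "southeast"), each = 50),
  bmi    = c(rnorm(50, 29, 5), rnorm(50, 33, 6))
)

p <- ggplot(toy, aes(x = region, y = bmi)) +
  geom_boxplot()
print(p)

# Box statistics computed by ggplot: lower hinge, median, upper hinge per region
box_stats <- ggplot_build(p)$data[[1]][, c("lower", "middle", "upper")]
print(box_stats)

# The same numbers via quantile(), R's default type-7 quantiles
tapply(toy$bmi, toy$region, quantile, probs = c(0.25, 0.5, 0.75))
```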
ggplot(data = insurance, mapping = aes(x = region, fill = smoker)) +
geom_bar()
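By default, geom_bar() stacks the fill groups on top of each other. The position argument changes this: "dodge" places the groups side by side, and "fill" rescales each stack to height 1 so that bars show proportions within each region. The sketch below uses a hypothetical `toy` data frame with made-up region and smoker values.

```r
library(ggplot2)

# Hypothetical toy data: region and smoker status for a handful of people
toy <- data.frame(
  region = c("northeast", "northeast", "southeast",
             "southeast", "southeast", "southwest"),
  smoker = c("no", "yes", "no", "yes", "yes", "no")
)

# position = "dodge": one bar per smoker status, side by side within each region
p_dodge <- ggplot(toy, aes(x = region, fill = smoker)) +
  geom_bar(position = "dodge")

# position = "fill": stacked bars rescaled to height 1 (proportions per region)
p_fill <- ggplot(toy, aes(x = region, fill = smoker)) +
  geom_bar(position = "fill") +
  labs(y = "proportion")

print(p_dodge)
print(p_fill)
```

The "fill" version makes it easier to compare the share of smokers across regions when the regions have different numbers of people.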