Jacob Jameson
Fall 2021
# example of a function
circle_area <- function(r) {
  pi * r ^ 2
}
“You should consider writing a function whenever you’ve copied and pasted a block of code more than twice (i.e. you now have three copies of the same code)” - Hadley Wickham, R for Data Science
data %>%
  mutate(a = (a - min(a)) / (max(a) - min(a)),
         b = (b - min(b)) / (max(b) - min(b)),
         c = (c - min(c)) / (max(c) - min(c)),
         d = (d - min(d)) / (max(d) - min(d)))
rescale_01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
data %>%
  mutate(a = rescale_01(a),
         b = rescale_01(b),
         c = rescale_01(c),
         d = rescale_01(d))
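As a quick sanity check, the helper can be tried on a small vector where the answer is easy to verify by hand. This is only a minimal sketch: the vector test_values is a made-up example, not part of the original data.

```r
# rescale_01 as defined above
rescale_01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# hypothetical input chosen so the result is easy to check by hand
test_values <- c(0, 5, 10)
rescaled <- rescale_01(test_values)
print(rescaled)  # the smallest value maps to 0, the largest to 1, the midpoint to 0.5
```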
The anatomy of a function is as follows:
function_name <- function(arguments) {
  do_this(arguments)
}
A function consists of arguments and a body.
We can assign the function to a name like any other object in R.
In the example below:
arguments: x
body: (x - min(x)) / (max(x) - min(x))
name: rescale_01
rescale_01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
Note that we don’t need to explicitly call return(): R returns the value of the last expression evaluated in the body.
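The sketch below compares the implicit version with a hypothetical variant, rescale_01_explicit, that calls return(). The variant's name and the test vector are made up for illustration; both versions should give the same result.

```r
# implicit return: the value of the last evaluated expression is returned
rescale_01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# hypothetical variant with an explicit return()
rescale_01_explicit <- function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

# compare the two versions on the same made-up input
same_result <- identical(rescale_01(c(2, 4, 6)), rescale_01_explicit(c(2, 4, 6)))
print(same_result)  # TRUE
```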
You start writing code to say Hello to all of your friends.
print("Hello Kashif!")
[1] "Hello Kashif!"
print("Hello Zach!")
[1] "Hello Zach!"
print("Hello Deniz!")
[1] "Hello Deniz!"
# and so on...
Start with the body.
Ask: What part of the code is changing?
Rewrite the code to accommodate the parameterization
# print("Hello Kashif!") becomes ...
name <- "Kashif"
print(paste0("Hello ", name, "!"))
[1] "Hello Kashif!"
Check several potential inputs to avoid future headaches
# name <- "Kashif"
# print(paste0("Hello ", name, "!"))
function(name) {
  print(paste0("Hello ", name, "!"))
}
Try to use names that actively tell the user what the code does.
We recommend the pattern verb_thing().
Good: calc_size() or compare_prices().
Bad: prices(), calc(), or fun1().
# name <- "Kashif"
# print(paste0("Hello ", name, "!"))
say_hello_to <- function(name) {
  print(paste0("Hello ", name, "!"))
}
Test out different inputs!
say_hello_to('Kashif')
[1] "Hello Kashif!"
say_hello_to('Zach')
[1] "Hello Zach!"
say_hello_to('Deniz')
[1] "Hello Deniz!"
# Cool, this function is vectorized!
say_hello_to(c("Jason", "Devina", "Andrew"))
[1] "Hello Jason!" "Hello Devina!" "Hello Andrew!"
Question: does name exist in my R environment after I run this function? Why or why not?
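One way to investigate the question is to ask R directly whether an object called name lives in the global environment after the call. This is a minimal sketch that assumes a fresh session in which you have not assigned name yourself (the earlier step-by-step code did assign it, which would change the answer).

```r
say_hello_to <- function(name) {
  print(paste0("Hello ", name, "!"))
}

say_hello_to("Kashif")

# look only in the global environment, not in packages or enclosing environments
name_in_global <- exists("name", envir = globalenv(), inherits = FALSE)
print(name_in_global)
```

The argument name exists only inside the function while it runs, so the call by itself does not create a global object.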
Like other R objects, functions have types.
Primitive functions such as `+` and sum are of type “builtin”:
typeof(`+`)
[1] "builtin"
typeof(sum)
[1] "builtin"
User-defined functions, functions loaded with packages, and many base R functions are of type “closure”:
typeof(say_hello_to)
[1] "closure"
typeof(mean)
[1] "closure"
This is background knowledge that might help you understand an error.
For example, you thought you assigned a number to the name “c” and want to calculate a ratio.
ratio <- 1 / c
“Error in 1/c : non-numeric argument to binary operator”
as.integer(c)
“Error in as.integer(c) : cannot coerce type 'builtin' to vector of type 'integer'”
Seeing “builtin” or “closure” in an error message like this tells you that you’re working with a function!
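When an error like this shows up, a quick check of the object's type can confirm the diagnosis. The sketch below assumes c was never reassigned in your session; the name denominator and its value are made up for illustration.

```r
# `c` was never reassigned, so it still refers to base R's combine function
print(typeof(c))       # "builtin"
print(is.function(c))  # TRUE

# hypothetical fix: store the number under a name that does not clash with a function
denominator <- 4
ratio <- 1 / denominator
print(ratio)           # 0.25
```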
Your stats prof asks you to simulate the central limit theorem by calculating the mean of samples from the standard normal distribution with increasing sample sizes.
mean(rnorm(1))
[1] -1.289279
mean(rnorm(3))
[1] 0.7522052
mean(rnorm(30))
[1] 0.2555396
The number is changing, so it becomes the argument.
calc_sample_mean <- function(sample_size) {
  mean(rnorm(sample_size))
}
The number is the sample size, so I call it sample_size. n would also be appropriate.
The body code is otherwise identical to the code you already wrote.
For added clarity you can unnest your code and assign the intermediate results to meaningful names.
calc_sample_mean <- function(sample_size) {
  random_sample <- rnorm(sample_size)
  sample_mean <- mean(random_sample)
  return(sample_mean)
}
return() explicitly tells R what the function will return.
If the function fits on one line, you can write it without the curly brackets:
calc_sample_mean <- function(n) mean(rnorm(n))
Some settings call for anonymous functions, where the function has no name.
function(n) mean(rnorm(n))
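A common setting for anonymous functions is passing them straight to another function. The sketch below uses base R's sapply() as one such case; the vector sample_sizes is a made-up example.

```r
# hypothetical vector of sample sizes
sample_sizes <- c(1, 3, 30)

# the anonymous function is defined inline and called once per sample size
sample_means <- sapply(sample_sizes, function(n) mean(rnorm(n)))
print(length(sample_means))  # 3, one mean per sample size
```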
Try to foresee the kind of input you expect to use.
calc_sample_mean(1)
[1] -0.8116221
calc_sample_mean(1000)
[1] 0.02135588
We see below that this function is not vectorized. We might hope to get 3 sample means out but only get 1.
# read ?rnorm to understand how rnorm
# interprets vector input.
calc_sample_mean(c(1, 3, 30))
[1] 0.165544
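The reason is documented in ?rnorm: when the first argument has length greater than 1, its length is taken to be the number of draws required. A minimal check, using the same input as above:

```r
# rnorm() sees a vector of length 3, so it returns 3 draws (not 1 + 3 + 30)
drawn <- rnorm(c(1, 3, 30))
print(length(drawn))  # 3
```

So calc_sample_mean(c(1, 3, 30)) returns the mean of three random draws, not three separate sample means.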
If we don’t want to change our function but still want to use it with vectors, we have a couple of options. Here we use rowwise() from dplyr.
library(tidyverse)  # loads dplyr and tibble, which provide tibble(), %>%, rowwise() and mutate()
# creating a tibble of sample sizes to test our function
sample_tibble <- tibble(sample_sizes = c(1, 3, 10, 30))
# rowwise() groups the data by row, so calc_sample_mean is called once per row
sample_tibble %>%
  rowwise() %>%
  mutate(sample_means = calc_sample_mean(sample_sizes))
# A tibble: 4 x 2
# Rowwise:
  sample_sizes sample_means
         <dbl>        <dbl>
1            1      -0.712
2            3      -0.727
3           10      -0.182
4           30       0.0264
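Another option, sketched here with base R only, is to call the function once per element with vapply(). This is an alternative to the rowwise() approach, not something the original example uses; the vector sample_sizes is made up to mirror the tibble column.

```r
calc_sample_mean <- function(sample_size) {
  mean(rnorm(sample_size))
}

sample_sizes <- c(1, 3, 10, 30)

# vapply() calls calc_sample_mean once per element and
# checks that every call returns a single number
sample_means <- vapply(sample_sizes, calc_sample_mean, FUN.VALUE = numeric(1))
print(length(sample_means))  # 4, one mean per sample size
```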
If we want to be able to adjust the details of how our function runs, we can add arguments.
calc_sample_mean <- function(sample_size, our_mean, our_sd) {
  # draw the sample using the supplied mean and standard deviation
  sample <- rnorm(sample_size,
                  mean = our_mean,
                  sd = our_sd)
  mean(sample)
}
We usually set default values for “detail” arguments.
calc_sample_mean <- function(sample_size,
                             our_mean = 0,
                             our_sd = 1) {
  sample <- rnorm(sample_size,
                  mean = our_mean,
                  sd = our_sd)
  mean(sample)
}
# uses the defaults
calc_sample_mean(sample_size = 10)
[1] -0.4997093
# we can change one or two defaults.
# You can refer by name, or use position
calc_sample_mean(10, our_sd = 2)
[1] -0.3240156
calc_sample_mean(10, our_mean = 6)
[1] 5.634323
calc_sample_mean(10, 6, 2)
[1] 6.032516
This won’t work though:
calc_sample_mean(our_mean = 5)
“Error in rnorm(sample_size, mean = our_mean, sd = our_sd) : argument "sample_size" is missing, with no default”
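The sketch below shows two related points: named arguments can be supplied in any order, and an argument without a default must always be supplied. Wrapping the failing call in tryCatch() is an illustrative choice here so that the script keeps running; it is not part of the original example.

```r
calc_sample_mean <- function(sample_size, our_mean = 0, our_sd = 1) {
  sample <- rnorm(sample_size, mean = our_mean, sd = our_sd)
  mean(sample)
}

# named arguments can be given in any order
named_result <- calc_sample_mean(our_sd = 2, sample_size = 10)

# sample_size has no default, so leaving it out raises an error;
# tryCatch() captures the message instead of stopping the script
error_message <- tryCatch(
  calc_sample_mean(our_mean = 5),
  error = function(e) conditionMessage(e)
)
print(error_message)
```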
For more, see the Functions chapter in R for Data Science.