# Section 1 - Intro to API 222 and R

Notes build on previous TFs (Ibou Dieye, Laura Morris, Amy Wickett & Emily Mower)

The following code is meant as a first introduction to R. It is therefore helpful to run it one line at a time and see what happens. To run one line of code in RStudio, you can highlight the code you want to run and hit “Run” at the top of the script.

On a Mac, you can highlight the code you want to run and hit Command + Enter. On a PC, you can highlight the code you want to run and hit Ctrl + Enter. If you ever forget how a function works, you can type `?` followed immediately (i.e. with no space) by the function name to get the help file.

# Part 1: R Fundamentals

Values can be assigned names and used in subsequent operations. Instead of the traditional “=” sign, the convention in R is “`<-`”. The “`=`” sign also works, but I will use “`<-`” to be consistent with convention.

The general form for calling R functions is:

`FunctionName(arg.1 = value.1, arg.2 = value.2, ..., arg.n = value.n)`
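As a minimal sketch of both ideas, the block below assigns two values with `<-`, reuses them in a calculation, and then calls `seq()` (covered in Part 2) with named arguments. The names `x`, `y`, `hypotenuse`, `evens`, and `evens.reordered` are illustrative choices, not part of the course notes.

```r
# Assign values to names with the conventional arrow
x <- 3
y <- 4

# Use the named values in a later calculation
hypotenuse <- sqrt(x^2 + y^2)
print(hypotenuse)

# Call a function with named arguments: FunctionName(arg = value, ...)
evens <- seq(from = 2, to = 10, by = 2)
print(evens)

# Because the arguments are named, their order does not matter
evens.reordered <- seq(by = 2, to = 10, from = 2)
print(evens.reordered)
```

Naming the arguments makes a call easier to read and protects against passing values in the wrong order.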

# Part 2: Vectors

First, we will learn how to make different types of vectors. Our first vector will contain integers 1 through 4:

```
vec.1 <- c(1, 2, 3, 4)
print(vec.1)
```

`[1] 1 2 3 4`

In R, we use square brackets for indexing. So, for example, if we want to print the 1st element of our vector:

`print(vec.1[1])`

`[1] 1`

If we want to print the 4th element of our vector:

`print(vec.1[4])`

`[1] 4`

If we want to print the 1st and the 4th elements in one go, we make a vector with the desired indices and place that index vector within square brackets:

`print(vec.1[c(1,4)])`

`[1] 1 4`

An alternative way to create `vec.1` would be using the `seq()` command, which allows us to generate a vector according to a sequence:

`print(seq(4))`

`[1] 1 2 3 4`

`print(seq(-4))`

`[1] 1 0 -1 -2 -3 -4`

```
vec.2 <- seq(4)
print(vec.2)
```

`[1] 1 2 3 4`

`print(seq(from = 100, to = 120, by = 5))`

`[1] 100 105 110 115 120`

```
# Help file for seq function
?seq
# Breaking the sequence command into multiple lines
print(seq(from = 100,
          to = 120,
          by = 5))
```

`[1] 100 105 110 115 120`

`print(1:10)`

` [1] 1 2 3 4 5 6 7 8 9 10`

`print(10:1)`

` [1] 10 9 8 7 6 5 4 3 2 1`

Vectors don’t have to be numeric. They could also be character/string vectors:

```
word.vec <- c("Hello", "Time To", "Learn", "R", "!")
print(word.vec)
```

`[1] "Hello" "Time To" "Learn" "R" "!" `

```
# Using the which() function
print(which(word.vec == "Learn"))
```

`[1] 3`

`print(which(word.vec == "Hi!"))`

`integer(0)`
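The `integer(0)` result means that `which()` found no matching position. The following is a small sketch of how one might test for that case; it recreates `word.vec` so that it runs on its own, and the names `pos` and `missing.pos` are illustrative.

```r
# Recreate the character vector from above
word.vec <- c("Hello", "Time To", "Learn", "R", "!")

# which() returns the positions where the condition is TRUE
pos <- which(word.vec == "Learn")
print(pos)

# No match gives a zero-length integer vector, so check its length
missing.pos <- which(word.vec == "Hi!")
print(length(missing.pos) == 0)

# %in% answers "is this value present?" directly with TRUE or FALSE
print("Learn" %in% word.vec)
print("Hi!" %in% word.vec)
```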

```
# Finding the length of a vector
print(length(vec.1))
```

`[1] 4`

`print(length(seq(10)))`

`[1] 10`

`print(length(seq(from = 100, to = 120, by = 2)))`

`[1] 11`

```
# Calculating statistics about vectors
print(mean(vec.1))
```

`[1] 2.5`

`print(median(vec.1))`

`[1] 2.5`

`print(min(vec.1))`

`[1] 1`

`print(max(vec.1))`

`[1] 4`

```
# Variance and standard deviation
print(var(vec.1))
```

`[1] 1.666667`

`print(sd(vec.1))`

`[1] 1.290994`
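Note that `var()` and `sd()` compute the sample versions of these statistics, dividing by n - 1 rather than n. That is why the variance above is 1.666667 and not 1.25. A short sketch that reproduces the calculation by hand (the names `n`, `deviations`, `manual.var`, and `manual.sd` are illustrative):

```r
vec.1 <- c(1, 2, 3, 4)

# Number of observations and deviations from the mean
n <- length(vec.1)
deviations <- vec.1 - mean(vec.1)

# Sample variance: sum of squared deviations divided by (n - 1)
manual.var <- sum(deviations^2) / (n - 1)
print(manual.var)
print(var(vec.1))

# The standard deviation is the square root of the variance
manual.sd <- sqrt(manual.var)
print(manual.sd)
print(sd(vec.1))
```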

```
# Comparing vectors
vec.4 <- c(1, 4, 9, 16)
print(vec.4)
```

`[1] 1 4 9 16`

`print(vec.1 == vec.4)`

`[1] TRUE FALSE FALSE FALSE`

```
# Checking whether two vectors are equal (up to a small numerical tolerance)
print(all.equal(vec.1, vec.2))
```

`[1] TRUE`

`print(all.equal(vec.1, vec.4))`

`[1] "Mean relative difference: 2.222222"`
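As the last output shows, `all.equal()` returns a description of the difference rather than `FALSE` when the inputs differ. The sketch below contrasts it with the stricter `identical()` and shows the common `isTRUE()` wrapper for getting a plain `TRUE` or `FALSE`. The vectors are recreated so that the block runs on its own.

```r
vec.1 <- c(1, 2, 3, 4)   # created with c(), stored as double
vec.2 <- seq(4)          # created with seq(), stored as integer
vec.4 <- c(1, 4, 9, 16)

# all.equal() compares values within a small numerical tolerance
print(all.equal(vec.1, vec.2))

# identical() is stricter: the storage types differ here, so it is FALSE
print(identical(vec.1, vec.2))

# Wrap all.equal() in isTRUE() to get a single TRUE or FALSE
print(isTRUE(all.equal(vec.1, vec.4)))
```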

# Part 3: Logical statements

“if statements” can be very useful. They work as follows:

```
if (the logical statement in these parentheses is TRUE) {
  do this
} else {
  do that
}
```

Let’s try it.

```
## Example 1:
if (2 + 2 == 5) {
  print("Yikes")
} else {
  print("Good job!")
}
```

`[1] "Good job!"`

```
## Example 2:
if (vec.1[2] == 2) {
  print("Hello")
} else {
  print("Goodbye")
}
```

`[1] "Hello"`
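Two details are worth keeping in mind when writing these statements. First, `else` should sit on the same line as the closing `}` of the `if` block, otherwise R treats the `if` as finished when the code is run line by line. Second, `if()` expects a single `TRUE` or `FALSE`, so a vector comparison needs to be collapsed with `all()` or `any()` first. A small sketch, with the vectors recreated and `result` as an illustrative name:

```r
vec.1 <- c(1, 2, 3, 4)
vec.4 <- c(1, 4, 9, 16)

# vec.1 == vec.4 gives four TRUE/FALSE values, but if() needs one,
# so collapse the comparison with all() or any()
if (all(vec.1 == vec.4)) {
  result <- "Every element matches"
} else if (any(vec.1 == vec.4)) {
  result <- "Some elements match"
} else {
  result <- "No elements match"
}
print(result)
```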

# Part 5: Matrices

To make a matrix, use the `matrix()` command. The first argument is the data you want to put in matrix form. Then, you specify the number of rows and columns. By default, it fills the matrix down the columns, but you can tell it to fill by row with `byrow = TRUE`.

```
mtx.1 <- matrix(vec.1, nrow = 2, ncol = 2)
print(mtx.1)
```

```
     [,1] [,2]
[1,]    1    3
[2,]    2    4
```

```
mtx.2 <- matrix(vec.1, nrow = 2, ncol = 2, byrow = TRUE)
print(mtx.2)
```

```
     [,1] [,2]
[1,]    1    2
[2,]    3    4
```

Note that `mtx.2` is the transpose of `mtx.1`. If you want to transpose a matrix, you can use `t()`:

`print(mtx.1)`

```
     [,1] [,2]
[1,]    1    3
[2,]    2    4
```

`print(t(mtx.1))`

```
     [,1] [,2]
[1,]    1    2
[2,]    3    4
```

As with vectors, you can check if two matrices are equal:

`print(mtx.1 == mtx.2)`

```
      [,1]  [,2]
[1,]  TRUE FALSE
[2,] FALSE  TRUE
```

`print(all.equal(mtx.1, mtx.2))`

`[1] "Mean relative difference: 0.4"`

`print(mtx.1 == t(mtx.2))`

```
     [,1] [,2]
[1,] TRUE TRUE
[2,] TRUE TRUE
```

`print(all.equal(mtx.1, t(mtx.2)))`

`[1] TRUE`

Matrices are indexed by `[row,column]`

`print(mtx.1[1,2])`

`[1] 3`

`print(mtx.2[1,2])`

`[1] 2`

Let’s make a bigger matrix

```
mtx.3 <- matrix(c(vec.1, vec.4), nrow = 4, ncol = 2)
print(mtx.3)
```

```
     [,1] [,2]
[1,]    1    1
[2,]    2    4
[3,]    3    9
[4,]    4   16
```
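With a larger matrix it is often useful to check its dimensions and to pull out whole rows or columns. The sketch below recreates `mtx.3` so that it runs on its own. It only uses `dim()` and the `[row,column]` indexing described above, leaving one index blank to select everything along that dimension.

```r
vec.1 <- c(1, 2, 3, 4)
vec.4 <- c(1, 4, 9, 16)
mtx.3 <- matrix(c(vec.1, vec.4), nrow = 4, ncol = 2)

# dim() returns the number of rows, then the number of columns
print(dim(mtx.3))

# Leave the column index blank to get an entire row
print(mtx.3[3, ])

# Leave the row index blank to get an entire column
print(mtx.3[, 2])

# Index vectors work too: rows 1 and 4 of column 2
print(mtx.3[c(1, 4), 2])
```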

# Part 6: Random numbers

You can also generate random numbers in R. However, one concern with analyses done using random numbers is that you might not be able to reproduce them. One way to avoid this is to set the “seed”. Here’s a reference for random seeds: https://en.wikipedia.org/wiki/Random.seed

`set.seed(222)`

We can generate random numbers from all kinds of distributions. For now, we will generate a random normal variable. If I don’t specify a mean or standard deviation, `rnorm()` will assume mean = 0 and standard deviation = 1.

```
norm.var1 <- rnorm(1)
print(norm.var1)
```

`[1] 1.487757`

Alternatively, we can specify the mean and standard deviation

```
norm.var2 <- rnorm(1, mean = 100, sd = 10)
print(norm.var2)
```

`[1] 99.98108`

The first element inside the parentheses is how many random variables I want to draw. For example, I could draw 10

```
norm.vec <- rnorm(10, mean = 5, sd = 1)
print(norm.vec)
```

```
[1] 6.381021 4.619786 5.184136 4.753104 3.784439 6.561405 5.427310 3.798976
[9] 6.052458 3.694936
```

You can also draw from other distributions, like the uniform distribution

```
uni.var1 <- runif(1)
print(uni.var1)
```

`[1] 0.2442779`
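To see what setting the seed buys us, here is a minimal sketch that draws from the same seed twice. The names `first.draw`, `second.draw`, `third.draw`, and `uni.vec` are illustrative. The last lines also show that `runif()` accepts `min` and `max` arguments if you want a range other than 0 to 1.

```r
set.seed(222)
first.draw <- rnorm(3)

# Drawing again without resetting the seed continues the random stream
second.draw <- rnorm(3)

# Resetting the same seed restarts the stream from the beginning
set.seed(222)
third.draw <- rnorm(3)

# Same seed, same numbers
print(identical(first.draw, third.draw))

# Without a reset, the next draws are different numbers
print(identical(first.draw, second.draw))

# Uniform draws between 10 and 20 instead of the default 0 and 1
uni.vec <- runif(5, min = 10, max = 20)
print(uni.vec)
```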

# Part 7: Data frames

There are lots of great datasets available as part of R packages. Table 1.1 on page 14 of Introduction to Statistical Learning with Applications in R lays out 15 data sets available from R packages. You can install a package in R using the `install.packages()` function; this only needs to be done once. Once a package is installed, use the `library()` function to attach it so that it can be used. You need to do this in every new R session in which you want to use the package, with `library(package.name)`:

`library(ISLR)`

Sometimes we will use outside datasets, not contained in R. In order to read data from a file, you have to know what kind of file it is. The table below lists functions that can import data from common plain-text formats.

Data Type | Function |
---|---|
comma separated | `read.csv()` |
tab separated | `read.delim()` |
other delimited formats | `read.table()` |
fixed width | `read.fwf()` |
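As a sketch of how these import functions are used, the block below writes a tiny made-up data frame to a temporary CSV file and reads it back. The data, the file, and the names `example.df`, `csv.path`, `csv.data`, and `table.data` are hypothetical and exist only for this illustration. In practice you would pass the path of your own file.

```r
# Build a tiny example data frame (made-up values, for illustration only)
example.df <- data.frame(name = c("A", "B", "C"),
                         score = c(90, 85, 77))

# Write it to a temporary CSV file so there is something to read back
csv.path <- tempfile(fileext = ".csv")
write.csv(example.df, csv.path, row.names = FALSE)

# read.csv() expects comma-separated values with a header row by default
csv.data <- read.csv(csv.path)
print(csv.data)

# read.table() reads the same file once the header and delimiter are given
table.data <- read.table(csv.path, header = TRUE, sep = ",")
print(all.equal(csv.data, table.data))

# Remove the temporary file
unlink(csv.path)
```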

```
college.data <- College
?College
```

Let’s learn about our data. To get the names of the columns in the dataframe, we can use the function `colnames()`:

`colnames(college.data)`

```
[1] "Private" "Apps" "Accept" "Enroll" "Top10perc"
[6] "Top25perc" "F.Undergrad" "P.Undergrad" "Outstate" "Room.Board"
[11] "Books" "Personal" "PhD" "Terminal" "S.F.Ratio"
[16] "perc.alumni" "Expend" "Grad.Rate"
```

To find out how many rows and columns are in the dataset, use `dim()`. Recall that this gives us Rows followed by Columns:

`dim(college.data)`

`[1] 777 18`

To find out what type of data is in each column, we can use `typeof()`

`typeof(college.data[,1])`

`[1] "integer"`

`typeof(college.data[,2])`

`[1] "double"`

You can also look in the “Environment” tab: press the blue arrow next to `college.data` and it will drop down, showing the column names with their types and first few values. For `college.data`, all columns except the first are numeric. The first column is a factor column, which means it’s categorical (`typeof()` reports `"integer"` for it because factors are stored as integer codes). To get a better sense of the data, let’s look at it:

`View(college.data)`
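Because `typeof()` reports how values are stored, it can be a little misleading for factor columns. The sketch below uses a small made-up data frame (`toy.df`, with hypothetical columns `private` and `apps`) rather than `college.data`, so that it runs without the ISLR package, and compares `typeof()` with `class()`.

```r
# A small made-up data frame with one categorical and one numeric column
toy.df <- data.frame(private = factor(c("Yes", "No", "Yes")),
                     apps = c(100, 250, 175))

# typeof() reports the storage type; factors are stored as integer codes
print(typeof(toy.df$private))

# class() reports how R treats the column, which is usually more informative
print(class(toy.df$private))
print(class(toy.df$apps))

# levels() lists the categories of a factor
print(levels(toy.df$private))

# sapply() applies class() to every column at once
print(sapply(toy.df, class))
```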

To grab a column from a dataframe in R, you have 3 popular options:

```
df$column.name
df[,column.number]
df[,"column.name"]
```

This will be useful so we can separate our outcome column from the feature columns. Let’s try! So that we aren’t overwhelmed by output, we will also use the function `head()`, which prints only the first few entries:

`head(college.data$PhD)`

`[1] 70 29 53 92 76 67`

`head(college.data[,13])`

`[1] 70 29 53 92 76 67`

`head(college.data[,"PhD"])`

`[1] 70 29 53 92 76 67`
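Separating an outcome column from the feature columns could look like the following minimal sketch. It uses the built-in `mtcars` data frame rather than `college.data` so that it runs without the ISLR package. Treating `mpg` as the outcome is an assumption made purely for illustration, and the names `outcome` and `features` are illustrative as well.

```r
# mtcars ships with R; here mpg plays the role of the outcome column
outcome <- mtcars$mpg

# Keep every column except the outcome as features
features <- mtcars[, colnames(mtcars) != "mpg"]

# Inspect the results
print(head(outcome))
print(dim(features))
print(colnames(features))
```

Selecting by column name rather than by column number keeps the code correct even if the column order changes.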

# Time to Practice!

## Exercise 1: Basic Operations and Functions

Create a new vector: Create a vector `my.vector` containing any five numbers. Print the vector.

Basic calculations: Find the sum, product, and average of the numbers in `my.vector`.

## Sample Solution

```
# Create a vector with five numbers
my.vector <- c(1, 3, 5, 7, 9)
print(my.vector)
# Perform basic calculations
sum(my.vector)
prod(my.vector)
mean(my.vector)
```

Use a built-in function: Use the `length()` function to find the length of `my.vector`.

## Sample Solution

```
# Use a built-in function
length(my.vector)
```

## Exercise 2: Vector Manipulation

Create and modify a vector: Create a numeric vector `numbers` from 1 to 20. Then, extract and print the first 5 elements.

## Sample Solution

```
# Create and modify a vector
numbers <- 1:20
print(numbers[1:5])
```

Logical indexing: From `numbers`, create a new vector `even.numbers` that contains only the even numbers. Print `even.numbers`.

## Sample Solution

```
# Logical indexing for even numbers
even.numbers <- numbers[numbers %% 2 == 0]
print(even.numbers)
```

Vector arithmetic: Create a new vector that is the square of each element in `numbers`.

## Sample Solution

```
squared.numbers <- numbers^2
print(squared.numbers)
```

## Exercise 3: Matrices

Create a matrix: Convert `numbers` into a \(4 \times 5\) matrix `matrix.1`. Print `matrix.1`.

## Sample Solution

```
matrix.1 <- matrix(numbers, nrow = 4, ncol = 5)
print(matrix.1)
```

Matrix transposition: Print the transpose of `matrix.1`.

## Sample Solution

```
# Matrix transposition
print(t(matrix.1))
```

Matrix indexing: Extract and print the element in the 2nd row and 3rd column of `matrix.1`.

## Sample Solution

```
# Matrix indexing
print(matrix.1[2, 3])
```

## Exercise 4: Logical Statements

Simple if-else: Write an if-else statement that prints “Big” if the average of `numbers` is greater than 10, and “Small” otherwise.

## Sample Solution

```
# Simple if-else
if (mean(numbers) > 10) {
  print("Big")
} else {
  print("Small")
}
```

Nested if-else: Modify the above to include a check if the average is exactly 10, printing “Exactly 10”.

## Sample Solution

```
# Nested if-else
if (mean(numbers) == 10) {
  print("Exactly 10")
} else if (mean(numbers) > 10) {
  print("Big")
} else {
  print("Small")
}
```

## Exercise 5: Random Numbers

Generate random numbers: Generate a vector of 5 random numbers drawn from a normal distribution with mean 0 and standard deviation 1. Print the vector.

## Sample Solution

```
# Generate random numbers
random.numbers <- rnorm(5)
print(random.numbers)
```

Reproducibility: Set a seed of your choice and generate the same vector of random numbers as above.

## Sample Solution

```
# Generate random numbers
set.seed(222) # Set seed for reproducibility
random.numbers <- rnorm(5)
print(random.numbers)
```

## Exercise 6: Data Frames

Explore `college.data`: Print the first 6 rows of `college.data`.

## Sample Solution

```
# Explore `college.data`
head(college.data)
```

Column operations: Calculate the mean of the PhD column in `college.data`.

## Sample Solution

```
# Column operations
mean(college.data$PhD)
```

Subsetting: Create a new data frame `small.college` that only includes colleges with fewer than 5000 students (use `college.data$Enroll` for enrollment numbers).

## Sample Solution

```
# Subsetting data frame
small.college <- college.data[college.data$Enroll < 5000, ]
print(small.college)
```