Lecture 1: Exercise Problems and Solutions

1 Exercise: Vector
The following code randomly samples 30 numbers from a uniform distribution between 0 and 1, and stores the result in x.
# Run this code to work on the exercise problems.
set.seed(3746)
x <- runif(n = 30, min = 0, max = 1)
x # see what's inside x
Questions
- Extract the 10th and the 15th elements of x.
- Extract elements larger than \(0.5\).
- Replace the 10th and the 15th elements of x with 0.
- If an element of x is larger than \(0.9\), replace it with \(1\).
- Count the elements larger than \(0.6\).
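Several of these questions rely on logical indexing. The idea can be sketched on a small made-up vector (the name v and its values are hypothetical and are not the exercise data):

```r
# Hypothetical toy vector, chosen only for illustration
v <- c(0.2, 0.7, 0.95, 0.4, 0.61)

# A comparison returns one TRUE/FALSE value per element
is_large <- v > 0.5
is_large

# A logical vector inside [ ] keeps the positions that are TRUE
v[is_large]

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
n_large <- sum(is_large)
n_large
```

The same logical vector can also appear on the left-hand side of an assignment, which is how conditional replacement works.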
# === Part 1 === #
x[c(10, 15)]
# === Part 2 === #
x[x > 0.5]
# === Part 3 === #
x[c(10, 15)] <- 0
# === Part 4 === #
x[x > 0.9] <- 1
# === Part 5 === #
sum(x > 0.6)

2 Exercise: Matrix
Use the following matrix:
set.seed(3746)
num <- runif(n = 30, min = 0, max = 1)
mat <- matrix(data = num, nrow = 6)
colnames(mat) <- c("A", "B", "C", "D", "E")
rownames(mat) <- c("a", "b", "c", "d", "e", "f")
mat # see what's inside mat
- Extract the element in the 2nd row and 3rd column.
- Extract the 2nd row.
- Subset the rows where column “A” is larger than 0.5. (Use logical indexing).
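One detail of logical row indexing is worth knowing: when only one row matches, R simplifies the result to a vector unless drop = FALSE is given. A minimal sketch on a made-up matrix (m and its values are hypothetical, not the exercise data):

```r
# Hypothetical 3 x 2 matrix; matrix() fills column by column
m <- matrix(c(0.1, 0.8, 0.3,
              0.5, 0.6, 0.7),
            nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("A", "B")))

# Logical vector with one entry per row
keep <- m[, "A"] > 0.5

# Only row r2 matches, so the result is simplified to a named vector
one_row <- m[keep, ]
is.matrix(one_row)

# drop = FALSE keeps the result as a 1 x 2 matrix
one_row_mat <- m[keep, , drop = FALSE]
is.matrix(one_row_mat)
```

Whether this matters depends on what is done with the result afterwards; functions that expect a matrix, such as nrow(), behave differently on a plain vector.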
# === Part 1 === #
mat[2, 3]
# === Part 2 === #
mat[2, ]
# === Part 3 === #
mat[mat[, "A"] > 0.5, ]

3 Exercise: Data Frame
We will use the built-in dataset mtcars for this exercise. Run the following code to load the data.
# --- Load data --- #
data(mtcars)
?mtcars # to see the description of the mtcars dataset
# --- Take a look at the data --- #
# head() function shows the first several rows of the data
head(mtcars)
- Extract the rows corresponding to the cars with the row numbers 1, 5, and 10 using numeric indexing.
- Add a new column to the mtcars data frame called power_to_weight_ratio, which is calculated as the ratio of horsepower (hp) to weight (wt).
- Create a new data frame called efficient_cars that contains cars with mpg greater than 20 and power-to-weight ratio less than 5.
- (Optional) Sort the efficient_cars data frame by the power_to_weight_ratio column in ascending order and display the result. [Hints: (1) Use the order() function to sort the data frame. (2) Use order(efficient_cars$power_to_weight_ratio) as an index vector.]
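The hint about using order() as an index vector can be illustrated with a small made-up data frame (df, car, and ratio are hypothetical names used only for this sketch):

```r
# Hypothetical toy data frame, not mtcars
df <- data.frame(car = c("p", "q", "r"),
                 ratio = c(30, 10, 20),
                 stringsAsFactors = FALSE)

# order() returns row positions, from the smallest ratio to the largest
idx <- order(df$ratio)
idx

# Using those positions as a row index rearranges the data frame
sorted_df <- df[idx, ]
sorted_df
```

Note that order() does not sort anything by itself; it only reports the order in which the rows should be picked.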
# === Part 1 === #
mtcars[c(1, 5, 10), ]
# === Part 2 === #
mtcars$power_to_weight_ratio <- mtcars$hp / mtcars$wt
# === Part 3 === #
efficient_cars <- mtcars[mtcars$mpg > 20 & mtcars$power_to_weight_ratio < 5, ]
# === Part 4 === #
efficient_cars[order(efficient_cars$power_to_weight_ratio), ]

4 Exercise: Vector, Comprehensive
- Create a sequence of numbers from 20 to 50 and name it x. Let’s change the numbers that are multiples of 3 to 0.
- sample() is commonly used in Monte Carlo simulation in econometrics. Run the following code to create r. What does it do? Use ?sample to find out what the function does.
set.seed(12345) #don't worry about this
r <- sample(1:100, size=20, replace = TRUE)
- Find the mean and SD of vector r without using mean() and sd().
- Figure out which position contains the maximum value of vector r. (Use the which() function. Run ?which to find out what the function does.)
- Extract the values of r that are larger than 50.
- Extract the values of r that are larger than 40 and smaller than 60.
- Extract the values of r that are smaller than 20 or larger than 70.
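A hand-computed mean and SD can be checked against the built-in functions. The sketch below does this on a made-up vector (v is hypothetical, not r) and also shows how which() behaves when the maximum is tied:

```r
# Hypothetical toy vector with a tied maximum
v <- c(4, 9, 2, 9, 5)
n <- length(v)

# Mean and sample SD (denominator n - 1) computed by hand
mean_v <- sum(v) / n
sd_v <- sqrt(sum((v - mean_v)^2) / (n - 1))

# Compare with the built-in functions
all.equal(mean_v, mean(v))
all.equal(sd_v, sd(v))

# which() on a comparison returns every matching position
max_positions <- which(v == max(v))
max_positions

# which.max() returns only the first position of the maximum
first_max <- which.max(v)
first_max
```

sd() uses n - 1 in the denominator, so a hand computation that divides by n will not match it.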
# === Part 1 === #
x <- 20:50
# The `:` operator is the most basic way to create a sequence of numbers, but it always uses a step of 1.
# The seq() function is more flexible. For example, you can create a sequence of numbers incremented by 0.5.
# x <- seq(from = 20, to = 50, by = 0.5)
x[x %% 3 == 0] <- 0
# === Part 2 === #
# In this code, the sample() function draws a random sample of 20 numbers (size = 20) from the integers 1 to 100 (x = 1:100), with replacement (replace = TRUE).
# === Part 3 === #
# mean
mean_r <- sum(r) / length(r)
# SD
sd_r <- sqrt(sum((r - mean_r)^2) / (length(r) - 1))
# === Part 4 === #
max_index <- which(r == max(r))
# === Part 5 === #
r_50 <- r[r > 50]
# === Part 6 === #
r_40_60 <- r[r > 40 & r < 60]
# === Part 7 === #
r_20_70 <- r[r < 20 | r > 70]

5 Exercise: Data Frame, Comprehensive
- Load the file nscg17small.dta. You can find the data in the Data folder.
  - This data is a subset of the National Survey of College Graduates (NSCG) 2017, which collects data on the educational and occupational characteristics of college graduates in the United States.
- Each row corresponds to a unique respondent. Let’s create a new column called “ID”. There are various ways to create an ID column. Here, let’s create an ID column that starts from 1 and increments by 1 for each row.
- To take a quick look at the summary statistics of a specific column, the summary() function is useful. Use summary() to create a table of the descriptive statistics for salary. You’ll provide the salary column to summary() as a vector.
- Create a new variable in your data that represents the z-score of the hours worked (use the hrswk variable). \[Z = (x - \mu)/\sigma\] where \(Z = \text{standard score}\), \(x = \text{observed value}\), \(\mu = \text{mean of sample}\), and \(\sigma = \text{standard deviation of the sample}\).
- Calculate the share of observations in your data sample with above average hours worked.
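The z-score formula can be tried out on a few made-up numbers before applying it to the survey data (h is a hypothetical vector of weekly hours, not the hrswk column):

```r
# Hypothetical hours worked for four respondents
h <- c(30, 40, 40, 50)

# Z = (x - mu) / sigma, applied to every element at once
z <- (h - mean(h)) / sd(h)
z

# By construction, z-scores have mean 0 and SD 1 (up to rounding error)
mean(z)
sd(z)
```

Because the sample mean maps to a z-score of 0, an observation is above average exactly when its z-score is positive.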
# === Part 1 === #
library(rio)
nscg17 <- import("Data/nscg17small.dta")
# === Part 2 === #
nscg17$ID <- 1:nrow(nscg17)
# === Part 3 === #
summary(nscg17$salary)
# === Part 4 === #
nscg17$z_hrswk <- (nscg17$hrswk - mean(nscg17$hrswk)) / sd(nscg17$hrswk)
# or using with() function, you can write the code more concisely
# nscg17$z_hrswk2 <- with(nscg17, (hrswk - mean(hrswk)) / sd(hrswk))
# Note: For Parts 2 and 4, you can use the within() function to create new columns more concisely.
# nscg17 <-
#   within(
#     nscg17, {
#       ID <- 1:nrow(nscg17)
#       z_hrswk <- (hrswk - mean(hrswk)) / sd(hrswk)
#     }
#   )
# === Part 5 === #
# create a logical vector that indicates whether the hours worked is above average
above_avg_hrswk <- with(nscg17, z_hrswk > mean(z_hrswk)) # you can get the same result by using `hrswk`.
# subset the data
nscg17_above_avg_hrswk <- nscg17[above_avg_hrswk, ]
# calculate the share of observations with above average hours worked
share_above_avg_hrswk <- nrow(nscg17_above_avg_hrswk) / nrow(nscg17)
share_above_avg_hrswk
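As an alternative sketch, the share can be computed directly as the mean of a logical vector, without subsetting the data frame first. The example uses a made-up vector (hrs is hypothetical) that includes one missing value; whether hrswk actually contains missing values is not stated here, so the na.rm = TRUE arguments are an assumption:

```r
# Hypothetical hours worked, with one missing value
hrs <- c(35, 40, 45, 60, NA)

# TRUE where hours exceed the average; the NA entry stays NA
above <- hrs > mean(hrs, na.rm = TRUE)
above

# The mean of a logical vector is the share of TRUE values
share <- mean(above, na.rm = TRUE)
share
```

Without na.rm = TRUE, a single NA would make both mean() calls return NA. Dropping the missing entries means the share is taken over non-missing observations only, which may or may not be the intended denominator.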