Translating Stata to R
Published:
I’m learning to work with data.tables in R. This reference provides a translation between data.table and Stata commands and should be helpful for any moving between the two languages.
Stata | R | Description |
---|---|---|
clear all | rm(list = ls()) | Clear memory |
global dir = “[path]” | dir <- “[path]” | |
cd “$dir” | setwd(dir) | |
ssc install charlist | pacman::p_load(data.table) | Install/load a package |
import delimited using “$dir/[FILE].csv”, clear varnames(1) | dt <- fread(paste(dir, “[FILE].csv”, sep = “”)) | |
rename (old1 old2) (new1 new2) | setnames(dt, c(“old1”, “old2”), c(“new1”, “new2”)) | Rename fields |
gen var1 = 5 | dt[, var1 := 5] | |
drop var1 var2 | dt[, c(“var1”, “var2”) := NULL] | |
keep var1 var2 | dt <- dt[, c(“var1”, “var2”)] | |
keep if var1 != 0 | dt <- dt[var1 != 0] | |
keep if inlist(var1, “str1”, “str2”) | dt <- dt[var1 %in% c(“str1”, “str2”)] | |
replace var1 = 2 if var2 == “Y” | dt[var2 == “Y”, var1 := 2] | |
replace var1 = subinstr(var1, “/”, “”, .) | dt[grep(“/”, var1), var1 := gsub(“/”, “”, var1)] | Remove all “/” from a string. |
sort var1 var2 | setorder(dt, var1, var2) | |
collapse (sum) var1, by(var2) | dt[, .(sum(var1)), by = var2] | Roll up data |
collapse (sum) var1 var2, by(var3) | dt[, lapply(.SD, sum), by = var3, .SDcols = c(“var1”, “var2”) | |
bysort var2: egen tot_var1 = total(var1) | dt[, tot_var1 := sum(var1), by = var2] | Calculate totals by group |
replace var1 = var1[_n-1] if missing(var1) | dt[, var1 := nafill(var1, type = “locf”)] | Fill down missing values |
Here are two ways to apply the same function to several columns:
# Replace negative values with NA
NAifNeg <- function(x) {
return(ifelse(x >= 0, x, NA))
}
cols <- c("V1", "V2", "V3")
for (col in cols) set(dt, j = col, value = NAifNeg(dt[[col]])
# Deflate values based on an index
dt[, (cols) := lapply(.SD, "/", CPI / 100), .SDcols = cols]