Translating Stata to R

Published:

I’m learning to work with data.table in R. This reference translates between Stata and data.table commands and should be helpful for anyone moving between the two languages.

| Stata | R | Description |
| --- | --- | --- |
| `clear all` | `rm(list = ls())` | Clear memory |
| `global dir = "[path]"` | `dir <- "[path]"` | |
| `cd "$dir"` | `setwd(dir)` | |
| `ssc install charlist` | `pacman::p_load(data.table)` | Install/load a package |
| `import delimited using "$dir/[FILE].csv", clear varnames(1)` | `dt <- fread(file.path(dir, "[FILE].csv"))` | |
| `rename (old1 old2) (new1 new2)` | `setnames(dt, c("old1", "old2"), c("new1", "new2"))` | Rename fields |
| `gen var1 = 5` | `dt[, var1 := 5]` | |
| `drop var1 var2` | `dt[, c("var1", "var2") := NULL]` | |
| `keep var1 var2` | `dt <- dt[, c("var1", "var2")]` | |
| `keep if var1 != 0` | `dt <- dt[var1 != 0]` | |
| `keep if inlist(var1, "str1", "str2")` | `dt <- dt[var1 %in% c("str1", "str2")]` | |
| `replace var1 = 2 if var2 == "Y"` | `dt[var2 == "Y", var1 := 2]` | |
| `replace var1 = subinstr(var1, "/", "", .)` | `dt[grep("/", var1), var1 := gsub("/", "", var1)]` | Remove all "/" from a string |
| `sort var1 var2` | `setorder(dt, var1, var2)` | |
| `collapse (sum) var1, by(var2)` | `dt <- dt[, .(var1 = sum(var1)), by = var2]` | Roll up data |
| `collapse (sum) var1 var2, by(var3)` | `dt <- dt[, lapply(.SD, sum), by = var3, .SDcols = c("var1", "var2")]` | |
| `bysort var2: egen tot_var1 = total(var1)` | `dt[, tot_var1 := sum(var1), by = var2]` | Calculate totals by group |
| `replace var1 = var1[_n-1] if missing(var1)` | `dt[, var1 := nafill(var1, type = "locf")]` | Fill down missing values |
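
The grouping and fill-down rows can be tried on a small example. The sketch below uses made-up toy values and a hypothetical `collapsed` object, so treat it as an illustration rather than a drop-in recipe. One difference to watch for: Stata's `collapse (sum)` and `egen total()` skip missing values, while R's `sum()` returns `NA` unless `na.rm = TRUE` is passed, so the sketch passes it explicitly.

```r
library(data.table)

# Toy data (hypothetical values, for illustration only)
dt <- data.table(
  var1 = c(1, NA, 3, 4, NA, 6),
  var2 = c("a", "a", "b", "b", "c", "c")
)

# collapse (sum) var1, by(var2): one row per group.
# na.rm = TRUE mirrors Stata, which skips missing values when summing.
collapsed <- dt[, .(var1 = sum(var1, na.rm = TRUE)), by = var2]

# bysort var2: egen tot_var1 = total(var1): group total kept on every row
dt[, tot_var1 := sum(var1, na.rm = TRUE), by = var2]

# replace var1 = var1[_n-1] if missing(var1): carry the last value forward.
# Like the Stata line, this ignores groups, so a value can be carried
# across a group boundary. Add by = var2 to fill within groups only.
dt[, var1 := nafill(var1, type = "locf")]

print(collapsed)
print(dt)
```

Storing the roll-up in a separate object keeps the row-level table available for the later steps. Overwriting `dt` instead would match Stata more closely, since `collapse` replaces the data in memory.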

Here are two ways to apply the same function to several columns:

# Replace negative values with NA
NAifNeg <- function(x) {
	return(ifelse(x >= 0, x, NA))
}

cols <- c("V1", "V2", "V3")
for (col in cols) set(dt, j = col, value = NAifNeg(dt[[col]]))

# Deflate values based on an index
dt[, (cols) := lapply(.SD, "/", CPI / 100), .SDcols = cols]
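
To see both approaches run end to end, here is a self-contained sketch on toy data. The values are invented, and it assumes `CPI` is a column of `dt` (the snippet above does not say where `CPI` comes from, so that is a guess).

```r
library(data.table)

# Toy data (hypothetical values); CPI is assumed to be a column of dt
dt <- data.table(
  V1  = c(10, -5, 30),
  V2  = c(-1, 20, 40),
  V3  = c(5, 15, -25),
  CPI = c(100, 110, 125)
)
cols <- c("V1", "V2", "V3")

# Replace negative values with NA
NAifNeg <- function(x) ifelse(x >= 0, x, NA)

# Approach 1: loop over the columns and update each by reference with set()
for (col in cols) set(dt, j = col, value = NAifNeg(dt[[col]]))

# Approach 2: lapply over .SD, restricted to the chosen columns by .SDcols.
# Each column is divided row by row by CPI / 100.
dt[, (cols) := lapply(.SD, "/", CPI / 100), .SDcols = cols]

print(dt)
```

Both forms modify `dt` in place. The `set()` loop tends to be the lighter option when looping over many columns, while the `.SD` form reads more like the rest of the data.table syntax and combines naturally with `by`.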