I’m learning to work with data.tables in R. This reference provides a translation between data.table and Stata commands and should be helpful for any moving between the two languages.

Stata R Description
clear all rm(list = ls()) Clear memory
global dir = “[path]” dir <- “[path]”  
cd “$dir” setwd(dir)  
ssc install charlist pacman::p_load(data.table) Install/load a package
import delimited using “$dir/[FILE].csv”, clear varnames(1) dt <- fread(paste(dir, “[FILE].csv”, sep = “”))  
rename (old1 old2) (new1 new2) setnames(dt, c(“old1”, “old2”), c(“new1”, “new2”)) Rename fields
gen var1 = 5 dt[, var1 := 5]  
drop var1 var2 dt[, c(“var1”, “var2”) := NULL]  
keep var1 var2 dt <- dt[, c(“var1”, “var2”)]  
keep if var1 != 0 dt <- dt[var1 != 0]  
keep if inlist(var1, “str1”, “str2”) dt <- dt[var1 %in% c(“str1”, “str2”)]  
replace var1 = 2 if var2 == “Y” dt[var2 == “Y”, var1 := 2]  
replace var1 = subinstr(var1, “/”, “”, .) dt[grep(“/”, var1), var1 := gsub(“/”, “”, var1)] Remove all “/” from a string.
sort var1 var2 setorder(dt, var1, var2)  
collapse (sum) var1, by(var2) dt[, .(sum(var1)), by = var2] Roll up data
collapse (sum) var1 var2, by(var3) dt[, lapply(.SD, sum), by = var3, .SDcols = c(“var1”, “var2”)  
bysort var2: egen tot_var1 = total(var1) dt[, tot_var1 := sum(var1), by = var2] Calculate totals by group
replace var1 = var1[_n-1] if missing(var1) dt[, var1 := nafill(var1, type = “locf”)] Fill down missing values

Here are two ways to apply the same function to several columns:

# Replace negative values with NA
NAifNeg <- function(x) {
	return(ifelse(x >= 0, x, NA))
}

cols <- c("V1", "V2", "V3")
for (col in cols) set(dt, j = col, value = NAifNeg(dt[[col]])

# Deflate values based on an index
dt[, (cols) := lapply(.SD, "/", CPI / 100), .SDcols = cols]