2016 Atlantic Causal Inference Conference Competition:
Is Your SATT Where It's At?
Data Release (May 22, 2017)
We are now releasing the full data used in the competition: covariates, treatment assignment, and potential outcomes. Here is the link (after clicking on the link you will have the option to download the file).
Correction (May 4, 2017)
Due to an error in the way simulations were generated and stored for the do-it-yourself submissions, the do-it-yourself results previously posted below were incorrect. The corrected results show a clear do-it-yourself winner, DR w/GBM+MDIA by J.R. Lockwood. The post below has been modified to reflect these changes.
Latest News (Jun 1, 2016)
The poster shown at the conference has been made available online, but we've decided to keep the contest open for a while longer. The new deadline for both do-it-yourself and black box submissions is June 30, 2016. Email your submissions to vjd4@nyu.edu. Current results are shown on this page, and the original contest specification is reproduced at the bottom.
Current Results
The winner in the do-it-yourself portion of the competition is DR w/GBM+MDIA by J.R. Lockwood (Educational Testing Service). The winners in the black box part are BART, calCause, and SL + TMLE, with an honorable mention for the H2O Ensemble approach. BART was submitted independently by Douglas Galagate (University of Maryland) and Nicole Bohme Carnegie (University of Wisconsin-Milwaukee), calCause jointly by Chen Yanover and Omer Weissbrod (IBM Research, Haifa), SL + TMLE jointly by Susan Gruber (Harvard University) and Mark van der Laan (University of California, Berkeley), and H2O Ensemble by Hyunseung Kang (Stanford University).
Do-It-Yourself
The following plots show the results for the do-it-yourself portion, aggregated over all 20 data sets. Black box estimates are also included, to the right of the dashed line. PEHE stands for "precision in estimation of heterogeneous effects" and is reported only for methods that provide estimates of individual treatment effects. It is defined as the root-mean-squared error of the estimated individual treatment effects for a single simulation, divided by the standard deviation of the response.
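As a rough illustration of the PEHE definition, the Python sketch below computes it for one simulation. It assumes the true individual effects are available (as they are in the released potential outcomes) and uses the sample standard deviation (n - 1 denominator) of the response; the organizers' exact conventions, such as which units are included, are not stated here, so treat this as an approximation.

```python
import math
import statistics


def pehe(tau_hat, tau_true, y):
    """PEHE sketch: RMSE of individual effect estimates, scaled by sd(y).

    tau_hat: estimated individual treatment effects
    tau_true: true individual effects, Y(1) - Y(0), for the same units
    y: observed responses used for the scaling (sample sd assumed)
    """
    if len(tau_hat) != len(tau_true):
        raise ValueError("tau_hat and tau_true must have the same length")
    mse = statistics.fmean([(a - b) ** 2 for a, b in zip(tau_hat, tau_true)])
    return math.sqrt(mse) / statistics.stdev(y)
```

Dividing by the standard deviation of the response makes PEHE comparable across simulation settings whose outcomes are on different scales.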
Black Box
The black box portion contains averages across all 77 simulation settings, with 100 replications per setting. In the bias plot, the line for each method shows the interquartile range.
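A minimal sketch of the bias summary in Python, assuming the plotted interquartile range is taken over per-replication bias (estimate minus true SATT). It uses the standard library's `statistics.quantiles` with the "inclusive" method, which matches R's default quantile type; the organizers' exact quantile rule is not stated.

```python
import statistics


def bias_summary(estimates, truths):
    """Mean bias and interquartile range of bias across replications.

    estimates: one SATT estimate per replication
    truths: the true SATT for each replication
    Returns (mean_bias, (q1, q3)).
    """
    bias = [e - t for e, t in zip(estimates, truths)]
    q1, _median, q3 = statistics.quantiles(bias, n=4, method="inclusive")
    return statistics.fmean(bias), (q1, q3)
```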
Contest Motivation
Causal inference researchers are constantly striving to create robust estimation procedures that will reliably estimate treatment effects across a wide variety of circumstances. This has led to a wide variety of methods that all purport to achieve this goal. However, typical papers in this field compare just two or three methods at a time. Moreover, these papers are typically written by researchers who, however well-meaning, are interested in showcasing their own method. Thus it is unclear whether such comparisons are entirely fair to the competing methods considered. We would like to facilitate a broader comparison of methods in a setting in which each method is implemented by someone who wants to show that method in its best light.
Therefore we are announcing a Causal Inference challenge, “Is your SATT where it’s at?” to better understand which approaches to causal inference perform well in particular observational study settings (described below) with point-in-time treatments. The goal is for individual researchers or research teams to obtain the best estimate of the treatment effect (specifically the effect of the treatment on the treated) for each dataset across a range of datasets.
In addition, there is increasing interest in developing methods for causal inference that are highly automated to decrease the burden on applied researchers, and yet produce accurate, precise, and reproducible estimates. Therefore there will be a portion of the contest devoted to such automated methods (see Option 2 below).
Structure of the Challenge:

77 datasets have been created by the organizers (only a subset of 20 will be used for the do-it-yourself competition option below). The 58 available covariates in these datasets are drawn from a real study (to be revealed after the contest is over) and will be the same across the datasets. The binary treatment assignment and continuous outcome will be simulated for each dataset. The datasets will vary along the following dimensions:

Level of nonlinearity (including discontinuities) of the assignment mechanism and response surface

Level of treatment effect heterogeneity

Ratio of treated to control observations

Lack of overlap between treatment group and control group (there will always be a reasonable amount of common support for the treatment group, but there may exist controls in neighborhoods of the covariate space where no treated observations exist)

Dimensionality of confounder space

Magnitude of the treatment effect


All of the datasets will have the following features

Observations will be independent of each other and identically distributed (conditional on covariates)

Ignorability (selection on observables, all confounders measured, no hidden bias…) will hold

Not all covariates will be true confounders

How to compete:
There are two ways to compete.
Option 1: Do it yourself! A subset of 20 datasets from the full set of 77 will be used for this competition.

Download the data here.

For each dataset estimate the ATT/TOT* and send us:

Treatment effect estimate

Confidence interval

Computational time (type of computer and wall time, if on a cluster how much time per node)

Optional: Individual-level treatment effects (if available; we realize that not all methods are conducive to computing this measure)

A written description of the method used, including any relevant references


Send your submissions to vjd4@nyu.edu.
Option 2: We’ll do it for you! All 77 simulation settings are used in this competition.

Create an executable or script. It can be in any of the following languages: R, Stata, Matlab, Python, C++ (we can try to accommodate other requests).

Your executable should take two inputs: the name of the data file and the name of an output file. The data file will be in csv format and match the following specification:

Column 1 is a binary treatment variable

Column 2 is a continuous response variable

Columns 3 and above are covariates; factors are coded with letters A/B/C/…, binary variables are 0/1, and other columns are real numbers


Your output should consist of a csv file containing just the estimate of the treatment effect on the treated, a lower bound for a 95% confidence interval, and the corresponding upper bound

(Optional) If your method is capable of providing individual effect estimates, those can be submitted as well. In this case, your executable should take three inputs: the input file, an output file for the estimated population effect, and an output file for the individual effects. The individual effects should also be written to a csv file, one estimate per row, in the same format as above

An example in R including test data and output is available here.

Send us your script at vjd4@nyu.edu.
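The Option 2 input/output contract can be sketched as a small Python script. This is a hypothetical skeleton, not the organizers' R example: it assumes the data file has a header row, that the output is a single csv row (estimate, lower bound, upper bound) with no header, and it uses a deliberately naive difference in means with a normal-approximation 95% interval as a placeholder. That estimator ignores confounding and is only there to show the file handling; the optional individual-effects output is not written.

```python
import csv
import math
import statistics
import sys

Z_975 = 1.959963984540054  # standard normal 0.975 quantile for a 95% interval


def read_data(path):
    """Split a contest CSV into treatment z, response y and raw covariate rows.

    Assumes the file has a header row (an assumption; adjust if it does not).
    """
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    body = rows[1:]  # skip the assumed header row
    z = [int(float(r[0])) for r in body]  # column 1: binary treatment
    y = [float(r[1]) for r in body]  # column 2: continuous response
    x = [r[2:] for r in body]  # columns 3+: covariates, still as strings
    return z, y, x


def _is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


def encode_covariates(x):
    """Parse numeric columns as floats and one-hot encode letter-coded factors."""
    n_cols = len(x[0]) if x else 0
    columns = []
    for j in range(n_cols):
        raw = [row[j] for row in x]
        if all(_is_number(v) for v in raw):
            columns.append([float(v) for v in raw])
        else:
            # one indicator column per observed level (no reference level dropped)
            for level in sorted(set(raw)):
                columns.append([1.0 if v == level else 0.0 for v in raw])
    return [list(r) for r in zip(*columns)]  # back to one list per observation


def naive_att(z, y):
    """Placeholder: difference in means with a normal-approximation 95% CI.

    This ignores confounding entirely; a real submission would adjust for x.
    """
    treated = [yi for zi, yi in zip(z, y) if zi == 1]
    control = [yi for zi, yi in zip(z, y) if zi == 0]
    est = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return est, est - Z_975 * se, est + Z_975 * se


def write_estimate(path, est, lower, upper):
    """Write a single row: estimate, lower 95% bound, upper 95% bound."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow([est, lower, upper])


def main(argv):
    if len(argv) != 3:
        sys.exit(f"usage: {argv[0]} DATA_FILE OUTPUT_FILE")
    z, y, x_raw = read_data(argv[1])
    _x = encode_covariates(x_raw)  # unused by the placeholder estimator
    write_estimate(argv[2], *naive_att(z, y))


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python att_script.py data.csv output.csv
    main(sys.argv)
```

Keeping parsing, estimation and output in separate functions means only `naive_att` needs replacing with an actual confounding-adjusted ATT method.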
How To Win
We recognize that, however hard we have tried to create a wide range of settings, it is impossible to know how representative they are of the “real world.” Therefore it seems unfair to have just one winner.

We will evaluate based on several criteria: RMSE, bias, confidence interval coverage, confidence interval length, computational time, and ability to capture treatment effect heterogeneity. These could vary across the types of scenarios reflected in the datasets.
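The interval-based criteria above can be sketched in Python as follows. This is an illustration under the assumption that each metric is computed across replications of one simulation setting, with the true SATT known for each replication; it is not the organizers' scoring code, and computational time and heterogeneity (PEHE) are left out.

```python
import math
import statistics


def evaluate(estimates, lowers, uppers, truths):
    """Bias, RMSE, 95% CI coverage and mean CI length across replications.

    estimates, lowers, uppers: a method's SATT estimate and interval bounds
    truths: the true SATT for each replication
    """
    n = len(truths)
    errors = [e - t for e, t in zip(estimates, truths)]
    covered = sum(lo <= t <= hi for lo, hi, t in zip(lowers, uppers, truths))
    return {
        "bias": statistics.fmean(errors),
        "rmse": math.sqrt(statistics.fmean([err ** 2 for err in errors])),
        "coverage": covered / n,  # share of intervals containing the truth
        "ci_length": statistics.fmean([hi - lo for lo, hi in zip(lowers, uppers)]),
    }
```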

Prizes will be awarded across different categories.
Challenge results were revealed on May 26th at the 7PM event.
Questions
If there are aspects of the competition that are still unclear, please feel free to contact us at vjd4@nyu.edu.
*Specifically, if we let Z denote binary treatment assignment and Y(0), Y(1) denote the continuous potential outcomes with respect to that treatment, the estimand of interest is $E[Y(1) - Y(0) \mid Z = 1]$, where the expectation is taken over the sample. Technically, then, this is the Sample Average effect of the Treatment on the Treated (SATT).
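Because the released data include both potential outcomes, the true SATT for a dataset can be computed directly as the average of Y(1) - Y(0) over the units with Z = 1. A minimal Python sketch, with hypothetical argument names:

```python
def satt(z, y0, y1):
    """Sample average treatment effect on the treated.

    z: binary treatment indicators (1 = treated)
    y0, y1: potential outcomes Y(0) and Y(1) for the same units
    """
    diffs = [b - a for zi, a, b in zip(z, y0, y1) if zi == 1]
    if not diffs:
        raise ValueError("no treated units")
    return sum(diffs) / len(diffs)
```

For example, with z = [1, 0, 1], y0 = [1, 5, 2] and y1 = [4, 5, 6], only the first and third units count, giving (3 + 4) / 2 = 3.5.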