
Supplementary Information for:

Massively parallel functional dissection of mammalian enhancers in vivo

Rupali P Patwardhan1, Joseph B Hiatt1, Daniela M Witten2, Mee J Kim3, Robin P Smith3, Dalit May4, Choli Lee1, Jennifer M Andrie1, Su-In Lee1,5, Gregory M Cooper6, Nadav Ahituv3*, Len A Pennacchio4,7*, Jay Shendure1*

1 Department of Genome Sciences, University of Washington, Seattle, WA
2 Department of Biostatistics, University of Washington, Seattle, WA
3 Department of Bioengineering and Therapeutic Sciences, and Institute for Human Genetics, University of California San Francisco, San Francisco, CA
4 Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA
5 Department of Computer Science, University of Washington, Seattle, WA
6 HudsonAlpha Institute for Biotechnology, Huntsville, AL
7 DOE Joint Genome Institute, Walnut Creek, CA

Supplementary Note 1
Supplementary Tables 1-4
Supplementary Figures 1-10
Supplementary Methods

Supplementary Note 1. Multiple linear regression on entire haplotypes

While linear models constructed on a position-by-position basis best represent the effect sizes of individual mutations, they may not perform optimally as predictors of the transcriptional activity of entire haplotypes, each of which contains many such mutations. To assess how well models constructed from our data predict overall haplotype activity, we built two multiple linear regression models for each enhancer. The first model comprised n binary variables (where n is the length of the enhancer) indicating whether each position in an enhancer haplotype was wild-type; the second comprised 3n binary variables indicating whether each position carried a particular mutant nucleotide (Supplementary Table 2). While all the models were significant as measured by comparing the mean squared error calculated from the actual data with that calculated from data in which the outcome vector was permuted (p
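The two feature encodings and the permutation comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that haplotypes are equal-length strings over A, C, G, T aligned to the wild-type sequence, that each model is fit by ordinary least squares with an intercept (via numpy), and that significance is assessed with in-sample mean squared error and a one-sided permutation p-value. The text does not specify these details. All function names and the toy `simulate` data are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def encode_wildtype(haplotypes, reference):
    """Model 1: n indicators, 1.0 where the haplotype carries the wild-type base."""
    return np.array(
        [[1.0 if h[i] == ref else 0.0 for i, ref in enumerate(reference)] for h in haplotypes]
    )


def encode_mutant(haplotypes, reference):
    """Model 2: 3n indicators, one per (position, non-wild-type base) pair."""
    rows = []
    for h in haplotypes:
        row = []
        for i, ref in enumerate(reference):
            for alt in BASES:
                if alt != ref:  # three alternative bases per position
                    row.append(1.0 if h[i] == alt else 0.0)
        rows.append(row)
    return np.array(rows)


def fit_mse(X, y):
    """Ordinary least squares with an intercept; returns the in-sample mean squared error."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # tolerates rank-deficient designs
    return float(np.mean((y - X1 @ beta) ** 2))


def permutation_test(X, y, n_perm=199, seed=0):
    """Compare the observed MSE with MSEs obtained after permuting the outcome vector."""
    rng = np.random.default_rng(seed)
    observed = fit_mse(X, y)
    null = np.array([fit_mse(X, rng.permutation(y)) for _ in range(n_perm)])
    # One-sided: how often does a permuted outcome fit at least as well as the real one?
    p_value = (1 + np.sum(null <= observed)) / (n_perm + 1)
    return observed, float(p_value)


def simulate(reference, n_haplotypes=200, rate=0.1, seed=1):
    """Hypothetical toy data: random substitutions; only position 3 affects activity."""
    rng = np.random.default_rng(seed)
    haplotypes, y = [], []
    for _ in range(n_haplotypes):
        h = list(reference)
        for i, ref in enumerate(reference):
            if rng.random() < rate:
                h[i] = str(rng.choice([b for b in BASES if b != ref]))
        haplotypes.append("".join(h))
        y.append(-2.0 * (h[3] != reference[3]) + rng.normal(0.0, 0.1))
    return haplotypes, np.array(y)


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    haplotypes, y = simulate(reference)
    for name, encode in [("wild-type", encode_wildtype), ("mutant", encode_mutant)]:
        mse, p = permutation_test(encode(haplotypes, reference), y)
        print(f"{name} model: MSE={mse:.4f}, permutation p={p:.3f}")
```

With n_perm permutations, the smallest attainable p-value under this add-one convention is 1/(n_perm + 1), so the number of permutations bounds the significance that can be reported. Because in-sample MSE rewards extra parameters, the 3n-variable model will always fit at least as well as the n-variable model on the data it was trained on. A held-out or cross-validated MSE would be needed to compare predictive performance between the two encodings.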