Plasmid design and construction
A summary of plasmid constructs is in Supplementary Table 1 and plasmid sequences are in Supplementary Data 2. Unless otherwise specified, cloning was performed by Gibson Assembly of PCR-amplified or commercially synthesized gene fragments (from Integrated DNA Technologies or Twist Bioscience) using NEBuilder HiFi DNA Assembly Master Mix (NEB, E2621), and final plasmids were sequence-verified by Sanger sequencing of the open reading frame and/or by a commercial whole-plasmid sequencing service provided by Primordium.
Protein expression constructs
To summarize, denAsCas12a-KRAB, multiAsCas12a-KRAB, multiAsCas12a and enAsCas12a-KRAB open reading frames were embedded in the same fusion protein architecture consisting of an N-terminal 6xMyc-NLS29 and C-terminal XTEN80-KRAB-P2A-BFP103. The denAsCas12a open reading frame was PCR amplified from pCAG-denAsCas12a(E174R/S542R/K548R/D908A)-NLS(nuc)-3xHA-VPR (RTW776) (Addgene, plasmid 107943 (ref. 30)). AsCas12a variants described were generated by using the denAsCas12a open reading frame as starting template and introducing the specific mutations encoded in overhangs on PCR primers that serve as junctions of Gibson assembly reactions. opAsCas12a (ref. 29) is available as Addgene plasmid 149723, pRG232. 6xMyc-NLS was PCR amplified from pRG232. KRAB domain sequence from KOX1 was previously reported42. The lentiviral backbone for expressing Cas12a fusion protein constructs expresses the transgene from an SFFV promoter adjacent to UCOE and is a gift from Marco Jost and Jonathan Weissman, derived from a plasmid available as Addgene 188765. XTEN80 linker sequence was taken from a previous study51 and was originally from Schellenberger et al.111. For constructs used in piggyBac transposition, the open reading frame was cloned into a piggyBac vector backbone (Addgene, 133568) and expressed from a CAG promoter. Super PiggyBac Transposase (PB210PA-1) was purchased from System Biosciences.
dAsCas12a-3xKRAB open reading frame sequence is from a construct originally referred to as SiT-ddCas12a-[Repr]27. We generated SiT-ddCas12a-[Repr] by introducing the DNase-inactivating E993A by PCR-based mutagenesis using SiT-Cas12a-[Repr] (Addgene, 133568) as template. Using Gibson Assembly of PCR products, we inserted the resulting ddCas12a-[Repr] open reading frame in-frame with P2A-BFP in a piggyBac vector (Addgene, 133568) to enable direct comparison with other fusion protein constructs cloned in the same vector backbone (crRNAs are encoded on separate plasmids as described below).
Fusion protein constructs described in Supplementary Fig. 8b–f were assembled by subcloning the protein-coding sequences of AsCas12a and KRAB into a lentiviral expression vector using the In-Fusion HD Cloning system (TBUSA). AsCas12a mutants were cloned by mutagenesis PCR on the complete wild-type AsCas12a vector to generate the final lentiviral expression vector.
crRNA expression constructs
All individually cloned crRNA constructs and their expression vector backbone are listed in Supplementary Table 1. Unless otherwise specified, individual single and 3-plex crRNA constructs were cloned into the human U6 promoter-driven expression vector pRG212 (Addgene, 149722 (ref. 29)), which contains wildtype (WT) direct repeats (DR). Library 1, Library 2, and some 3-plex and all 4-plex, 5-plex and 6-plex As. crRNA constructs were cloned into pCH67, which is derived from pRG212 by replacing the 3’ DR with the variant DR8 (ref. 28). For constructs cloned into pCH67, the specific As. DR variants were assigned to each position of the array as follows, in 5′ to 3′ order:
3-plex: WT DR, DR1, DR3, DR8
4-plex: WT DR, DR1, DR10, DR3, DR8
5-plex: WT DR, DR1, DR16, DR10, DR3, DR8
6-plex: WT DR, DR1, DR16, DR18, DR10, DR3, DR8
8-plex: WT DR, DR1, DR16, DR_NS1, DR17, DR18, DR10, DR3, DR8
10-plex: WT DR, DR1, DR16, DR_NS1, DR4, DR_NS2, DR17, DR18, DR10, DR3, DR8
DR sequences are as follows: WT DR = AATTTCTACTCTTGTAGAT, DR1 = AATTTCTACTGTCGTAGAT, DR16 = AATTCCTACTATTGTAGGT, DR_NS1 = AATTCCTCCTCTTGGAGGT, DR4 = AATTTCTACTATTGTAGAT, DR_NS2 = AATTCCTCCTATAGGAGGT, DR17 = AATTTCTCCTATAGGAGAT, DR18 = AATTCCTACTCTAGTAGGT, DR10 = AATTCCTACTCTCGTAGGT, DR3 = AATTTCTACTCTAGTAGAT, DR8 = AATTTCTCCTCTAGGAGAT. Sequences for DR variants were previously reported28, except for DR_NS1 and DR_NS2, which were newly designed based on combining previously reported variants28. The rationale for selecting specific DR variants was to minimize homology across variants and maintain high crRNA activity based on prior analysis28.
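As an illustration of the position-specific DR assignment, the following minimal Python sketch concatenates DRs and spacers in 5′ to 3′ order. It assumes that an n-plex array alternates DR and spacer, starting and ending with a DR, as implied by the n + 1 DRs listed for each array size. The function name `assemble_array` and the placeholder spacers in the usage note are hypothetical; this is not the bears01 R package.

```python
# DR sequences copied from the text above.
DR = {
    "WT": "AATTTCTACTCTTGTAGAT",
    "DR1": "AATTTCTACTGTCGTAGAT",
    "DR16": "AATTCCTACTATTGTAGGT",
    "DR_NS1": "AATTCCTCCTCTTGGAGGT",
    "DR4": "AATTTCTACTATTGTAGAT",
    "DR_NS2": "AATTCCTCCTATAGGAGGT",
    "DR17": "AATTTCTCCTATAGGAGAT",
    "DR18": "AATTCCTACTCTAGTAGGT",
    "DR10": "AATTCCTACTCTCGTAGGT",
    "DR3": "AATTTCTACTCTAGTAGAT",
    "DR8": "AATTTCTCCTCTAGGAGAT",
}

# 5'-to-3' DR order for each array size, as listed above.
DR_ORDER = {
    3: ["WT", "DR1", "DR3", "DR8"],
    4: ["WT", "DR1", "DR10", "DR3", "DR8"],
    5: ["WT", "DR1", "DR16", "DR10", "DR3", "DR8"],
    6: ["WT", "DR1", "DR16", "DR18", "DR10", "DR3", "DR8"],
    8: ["WT", "DR1", "DR16", "DR_NS1", "DR17", "DR18", "DR10", "DR3", "DR8"],
    10: ["WT", "DR1", "DR16", "DR_NS1", "DR4", "DR_NS2", "DR17", "DR18",
         "DR10", "DR3", "DR8"],
}


def assemble_array(spacers):
    """Return DR-spacer-DR-...-spacer-DR for a supported array size.

    Assumption: each spacer is preceded by the DR assigned to its position,
    and the array ends with the final DR in the list (DR8).
    """
    order = DR_ORDER[len(spacers)]  # KeyError for unsupported sizes
    parts = []
    for dr_name, spacer in zip(order, spacers):
        parts.append(DR[dr_name])
        parts.append(spacer.upper())
    parts.append(DR[order[-1]])  # closing 3' DR
    return "".join(parts)
```

For example, `assemble_array(["A" * 23, "C" * 23, "G" * 23])` (placeholder spacers, not real sequences) returns a string that begins with the WT DR and ends with DR8.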
1-plex, 3-plex, 8-plex and 10-plex crRNA constructs were cloned by annealing sets of complementary oligos with compatible overhangs in spacer regions, phosphorylating them with T4 polynucleotide kinase (NEB M0201S) and ligating them with T4 DNA ligase (NEB M0202) into the BsmBI site of vector backbones. 4-plex, 5-plex and 6-plex crRNA arrays were ordered as double-stranded gene fragments and cloned into the BsmBI site of vector backbones by Gibson Assembly using the NEBuilder HiFi DNA Assembly Master Mix (NEB, E2621). Functions for designing oligos or gene blocks for cloning crRNA arrays are available as an R package at https://github.com/chris-hsiung/bears01.
Design of individual crRNAs
All spacer and PAM sequences are provided in Supplementary Table 1. For cloning individual crRNA constructs targeting TSS’s, CRISPick (https://portals.broadinstitute.org/gppx/crispick/public) was used in the enAsCas12a ‘CRISPRi’ mode (by providing gene name) or ‘CRISPRko’ mode (by providing sequence for TSS-proximal regions) to design spacers targeting canonical (TTTV) or non-canonical PAMs generally located within −50-bp to +300-bp region around the targeted TSS whenever possible, but some sites farther from the annotated TSS can show successful CRISPRi activity and were used. We manually selected spacers from the CRISPick output by prioritizing the highest on-target efficacy scores while avoiding spacers with high off-target predictions. The same non-targeting spacer was used throughout the individual well-based experiments and was randomly generated and checked for absence of alignment to the human genome by BLAT112.
The hg19 genomic coordinates for MYC enhancers are e1 chr8:128910869-128911521, e2 chr8:128972341-128973219 and e3 chr8:129057272-129057795. DNA sequences from those regions were downloaded from the UCSC Genome Browser and submitted to CRISPick. The top three spacers targeting each enhancer were picked based on CRISPick on-target efficacy score, having no Tier I or Tier II Bin I predicted off-target sites, and considering proximity to peaks of ENCODE110 DNase hypersensitivity signal (UCSC Genome Browser113 accession # wgEncodeEH000484, wgEncodeUwDnaseK562RawRep1.bigWig) and H3K27Ac ChIP-seq signal (UCSC Genome Browser accession # wgEncodeEH000043, wgEncodeBroadHistoneK562H3k27acStdSig.bigWig). These DNase hypersensitivity and H3K27Ac ChIP-seq tracks were similarly used to nominate candidate enhancer regions at the CD55 locus, whose genomic sequences are provided in Supplementary Table 1.
Cell culture, lentiviral production, lentiviral transduction and cell line engineering
C4-2B cells114 were gifted by F. Feng, originally gifted by L. Chung. All cell lines were cultured at 37°C with 5% CO2 in tissue culture incubators. K562 and C4-2B cells were maintained in RPMI-1640 (Gibco, 22400121) containing 25 mM HEPES, 2 mM L-glutamine and supplemented with 10% FBS (VWR), 100 U ml−1 streptomycin, and 100 mg ml−1 penicillin. For pooled screens using K562 cells cultured in flasks in a shaking incubator, the culture medium was supplemented with 0.1% Pluronic F-127 (Thermo Fisher, P6866). HEK 293T cells were cultured in media consisting of DMEM, high glucose (Gibco 11965084, containing 4.5 g l−1 glucose and 4 mM L-glutamine) supplemented with 10% FBS (VWR), 100 U ml−1 streptomycin and 100 mg ml−1 penicillin. Adherent cells were routinely passaged and harvested by incubation with 0.25% trypsin-EDTA (Thermo Fisher, 25200056) at 37°C for 5–10 min, followed by neutralization with media containing 10% FBS.
Unless otherwise specified below, lentiviral particles were produced by transfecting standard packaging vectors (pMD2.G and pCMV-dR8.91) into HEK293T using TransIT-LT1 Transfection Reagent (Mirus, MIR2306). At <24 h after transfection, culture medium was exchanged with fresh medium supplemented with ViralBoost (Alstem Bio, VB100) at 1:500 dilution. Viral supernatants were harvested ~48–72 h after transfection, filtered through a 0.45-µm PVDF syringe filter and either stored at 4°C for use within <2 weeks or stored at −80°C until use. Lentiviral infections included polybrene (8 µg/ml). MOI was estimated from the fraction of transduced cells (based on fluorescence marker positivity) by the following equation115,116: MOI = −ln(1 − fraction of cells transduced).
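The MOI equation corresponds to a Poisson model of infection, in which the fraction of untransduced cells equals e^(−MOI). A minimal Python sketch of the calculation and its inverse is given below; the function names are hypothetical and are not taken from the authors' code.

```python
import math


def estimate_moi(fraction_transduced):
    """MOI = -ln(1 - fraction of cells transduced)."""
    if not 0.0 <= fraction_transduced < 1.0:
        raise ValueError("fraction_transduced must be in [0, 1)")
    return -math.log(1.0 - fraction_transduced)


def expected_fraction_transduced(moi):
    """Inverse relation: fraction of cells with at least one integration."""
    return 1.0 - math.exp(-moi)
```

Under this model, `expected_fraction_transduced(0.15)` gives the marker-positive fraction expected before selection at the MOI used in the pooled screens.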
For experiments described in Supplementary Fig. 8a–f, lentivirus was produced by transfecting HEK293T cells with lentiviral vector, VSVG and psPAX2 helper plasmids using polyethylenimine. Medium was changed ~6–8 h post transfection. Viral supernatant was collected every 12 h five times and passed through 0.45-µm PVDF filters. Lentivirus was added to target cell lines with 8 µg ml−1 polybrene and centrifuged at 650 ×g for 25 min at room temperature. Medium was replaced 15 h after infection. An antibiotic (1 µg ml−1 puromycin) was added 48 h after infection.
For piggyBac transposition of fusion protein constructs, cells were electroporated with ~210 ng of AsCas12a fusion protein plasmid and ~84 ng of Super PiggyBac Transposase Expression Vector (PB210PA-1, System Biosciences) using the SF Cell Line 4D-Nucleofector X Kit (V4XC-2032, Lonza Bioscience) and the 4D-Nucleofector X Unit as per manufacturer’s instructions (FF-120 program for K562 cells; EN-120 program for C4-2B cells).
Antibody staining and flow cytometry
The following antibodies were used for flow cytometry at 1:100 dilution: CD55-APC (BioLegend, 311312), CD55-PE (BioLegend, 311308), CD81-PE (BioLegend, 349506), CD81-AlexaFluor700 (BioLegend, 349518), B2M-APC (BioLegend, 316311), KIT-PE (BioLegend, 313204), KIT-BrilliantViolet785 (BioLegend, 313238) and FOLH1-APC (BioLegend, 342508). Cells were stained with antibodies diluted in FACS Buffer (PBS with 1% BSA) and washed with FACS Buffer, followed by data acquisition on the Attune NxT instrument in 96-well plate format unless otherwise specified. For CRISPRi experiments, all data points shown in figures are events first gated for single cells based on FSC/SSC, then gated on GFP-positivity as a marker for cells successfully transduced with crRNA construct, as exemplified in Supplementary Fig. 1. For CRISPRi experiments in C4-2B cells, propensity score matching on BFP signal was performed using the MatchIt v4.5.3 R package.
For cell fitness competition assays, the percentage of cells expressing the GFP marker encoded on the crRNA expression vector was quantified by flow cytometry. log2 fold-change of percentage of GFP-positive cells was calculated relative to day 2 (for experiments targeting the Rpa3 locus in Supplementary Fig. 8) or day 6 (for experiments targeting the MYC locus in Fig. 6b). For experiments targeting the Rpa3 locus, flow cytometry was performed on the Guava Easycyte 10 HT instrument.
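The competition-assay readout can be sketched in Python as follows. The function name and the input layout (a mapping from day to percentage of GFP-positive cells) are assumptions made for illustration only.

```python
import math


def log2_fc_gfp(percent_gfp_by_day, reference_day):
    """log2 fold-change of %GFP-positive cells relative to a reference day."""
    ref = percent_gfp_by_day[reference_day]
    return {
        day: math.log2(pct / ref)
        for day, pct in percent_gfp_by_day.items()
    }
```

A value of −1 on a given day means that the GFP-positive fraction has halved relative to the reference day (day 2 or day 6 in the experiments described above).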
Pooled crRNA library design
For all crRNAs in Library 1 and Library 2, we excluded from the analysis spacers meeting either of the following off-target prediction criteria using CRISPick run in the CRISPRi setting: 1) off-target match = ‘MAX’ for any tier or bin, or 2) # Off-Target Tier I Match Bin I Matches > 1. The only crRNAs for which this filter was not applied are the non-targeting negative control spacers, which do not have an associated CRISPick output. All crRNA sequences were also filtered to exclude BsmBI sites used for cloning and three or more consecutive T’s, which mimic the RNA Pol III termination signal.
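The sequence-level filter can be sketched as below. This minimal Python example uses the BsmBI recognition sequence CGTCTC and its reverse complement, and assumes the check is applied to each spacer on its own; whether junctions with flanking DRs were also scanned is not stated here. The function name is hypothetical.

```python
# BsmBI recognition site and its reverse complement.
BSMBI_SITES = ("CGTCTC", "GAGACG")


def passes_sequence_filter(spacer):
    """Return True if the spacer has no BsmBI site and no run of >=3 T's."""
    seq = spacer.upper()
    if any(site in seq for site in BSMBI_SITES):
        return False  # would be cut during BsmBI-based cloning
    if "TTT" in seq:
        return False  # resembles a Pol III termination signal
    return True
```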
Library 1 (single crRNAs)
To design crRNA spacers targeting gene TSS’s for Library 1, we used the −50-bp to +300-bp regions of TSS annotations that are derived from cap analysis of gene expression data and can include multiple TSSs per gene67. We targeted the TSSs of 559 common essential genes from DepMap with the strongest cell fitness defects in K562 cells based on a prior dCas9-KRAB CRISPRi screen67. We used CRISPick with enAsCas12a settings to target all possible PAMs (TTTV and 44 non-canonical PAMs) in these TSS-proximal regions. Except for the criteria mentioned in the previous paragraph, no other exclusion criteria were applied. For the TSS-level analyses shown in Fig. 4d,e, each gene was assigned to a single TSS targeted by the crRNA with the strongest fitness score for that gene.
Negative controls in Library 1 fall into two categories: 1) 524 intergenic negative controls, and 2) 445 non-targeting negative controls that do not map to the human genome. Target sites for intergenic negative controls were picked by removing all regions in the hg19 genome that are within 10 kb of annotated Ensembl genes (retrieved from biomaRt from https://grch37.ensembl.org) or within 3 kb of any ENCODE DNase hypersensitive site (wgEncodeRegDnaseClusteredV3.bed from http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeRegDnaseClustered/). The remaining regions were divided into 1-kb fragments. 90 such 1-kb fragments were sampled from each chromosome. Fragments containing ≥20 consecutive Ns were removed. The remaining sequences were submitted to CRISPick run under CRISPRi settings. The CRISPick output was further filtered for spacers that meet these criteria: 1) off-target prediction criteria described in the beginning of this section, 2) on-target Efficacy Score ≥0.5 (the rationale is to maximize representation by likely active crRNAs to bias for revealing any potential cell fitness effects from nonspecific genotoxicity due to residual DNA cutting by multiCas12a-KRAB), 3) mapping uniquely to the hg19 genome by Bowtie117 using ‘-m 1’ and otherwise default parameters, and 4) passing a final filter that removes those whose uniquely mapped site falls within 10 kb of annotated Ensembl genes or any ENCODE DNase hypersensitive site.
Non-targeting negative control spacers were generated by 1) combining non-targeting negative controls in the Humagne C and D libraries (Addgene accession numbers 172650 and 172651), and 2) taking 20-nt non-targeting spacers from the dCas9-KRAB CRISPRi_v2 genome-wide library67, removing the G in the 1st position and appending random 4-mers to the 3’ end. This set of spacers was then filtered for those that do not map to the hg19 genome using Bowtie with default settings.
Library 2 (6-plex crRNAs)
Sublibrary A (42,600 constructs designed): Test position spacers were encoded at each position of the 6-plex array, with remaining positions referred to as context positions and filled with negative control spacers. Each test position encodes one of 506 intergenic negative control spacers and 914 essential TSS-targeting spacers. The essential TSS-targeting spacers were selected from among all spacers targeting PAMs within −50-bp to +300-bp TSS-proximal regions of 50 common essential genes with the strongest K562 cell fitness defect in a prior dCas9-KRAB CRISPRi screen67 and must have ≥0.7 CRISPick on-target efficacy score. Negative control context spacers consist of five 6-plex combinations; three of these combinations consist entirely of non-targeting negative controls, and two of the combinations consist entirely of intergenic negative controls.
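The stated Sublibrary A size is consistent with a full cross of test position, test spacer and context combination. The short Python sketch below enumerates that cross; the full-cross structure is an assumption inferred from the numbers given, and the variable names are hypothetical.

```python
from itertools import product

n_positions = 6               # positions in the 6-plex array
n_test_spacers = 506 + 914    # intergenic controls + essential TSS-targeting
n_contexts = 5                # negative control context combinations

# One design per (test position, test spacer, context combination).
designs = list(product(range(n_positions),
                       range(n_test_spacers),
                       range(n_contexts)))
n_constructs = len(designs)   # 6 * 1,420 * 5 = 42,600
```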
Sublibrary B (6,370 constructs designed): crRNA combinations targeting cis-regulatory elements at the MYC locus were assembled from a subset of combinations possible from 15 starting spacers (3 targeting MYC TSS, 3 targeting each of 3 enhancers, and 3 intergenic negative control spacers). The three enhancer elements are described in the subsection ‘Design of individual crRNAs.’ These 15 starting spacers were grouped into five 3-plex combinations, each 3-plex combination either exclusively targeting one of the four cis-regulatory elements or consisting entirely of intergenic negative controls. Each 3-plex was then encoded in positions 1–3 of 6-plex arrays, and positions 4–6 were filled with all possible 3-plex combinations chosen from the starting 15 spacers. All 6-plex combinations were also encoded in the reverse order in the array.
All-negative control constructs (2,000 constructs designed): 1,500 6-plex combinations were randomly sampled from the intergenic negative control spacers described for Library 1. 500 6-plex combinations were randomly sampled from non-targeting negative control spacers described for Library 1.
Intergenic negative controls and non-targeting negative controls are defined the same as in Library 1.
As Library 2 was designed and cloned prior to the completion of the Library 1 screen, the majority of Library 2 contains constructs encoding for spacers in the test position that in hindsight do not produce strong phenotypes as single crRNAs in the Library 1 screen.
Both Library 1 and Library 2 were constructed from pooled oligonucleotide libraries that also contain crRNA constructs designed for exploratory analysis in a separate unpublished study. Sequencing reads from those non-contributory constructs are present in the raw fastq files, do not affect interpretation of Library 1 and Library 2 screen cell fitness scores, and are excluded from analysis in the present study.
crRNA library construction
All PCRs were performed with NEBNext Ultra II Q5 Master Mix (NEB M0544). For Library 1, ~140 fmol pooled oligo libraries from Twist were subjected to 10 cycles of PCR amplification using primers specific to adaptor sequences flanking the oligos and containing BsmBI sites. The PCR amplicons were cloned into a crRNA expression backbone (pCH67) by Golden Gate Assembly with ~1:1 insert:backbone ratio using ~500 fmol, followed by bacterial transformation to arrive at an estimated 778× coverage in the final plasmid Library 1. For Library 2, 915 fmol of pooled oligo libraries from Twist was subjected to 18 cycles of PCR amplification and agarose gel purification of the correctly sized band before proceeding to Golden Gate Assembly. The estimated coverage of plasmid Library 2 from bacterial colony forming units is ~60×. Additional details are described in Supplementary Information.
Illumina sequencing library preparation
Primer sequences are provided in Supplementary Table 2. Sequences of the expected PCR amplicons for Illumina sequencing are in Supplementary Data 2. crRNA inserts were amplified from genomic DNA isolated from screens using 16 cycles of first round PCR using pooled 0–8-nt staggered forward and reverse primers, treated with ExoSAP-IT (Thermo Fisher, 78201.1.ML), followed by 7 cycles of round 2 PCR to introduce Illumina unique dual indices and adaptors. Sequencing primer binding sites, unique dual indices, P5 and P7 adaptor sequences are from Illumina Adaptor Sequences Document #1000000002694 v16. PCR amplicons were subject to size selection by magnetic beads (SPRIselect, Beckman, B23318) prior to sequencing on an Illumina NovaSeq6000 using SP100 kit (PE100) for Library 1 or SP500 kit (PE250) for Library 2. Sequencing of plasmid libraries was performed similarly, except that 7 cycles of amplification were used for each of Round 1 and Round 2 PCR. The size distribution of the final library was measured on an Agilent TapeStation system. We noted that even after magnetic bead selection of Round 2 PCR-amplified Library 2 plasmid library (colonies from which were Sanger sequencing verified) and genomic DNA from screens, smaller sized fragments from PCR amplification during Illumina sequencing library preparation persisted. Thus, the majority of unmapped reads likely reflect undesired PCR by-products, though lentiviral recombination could contribute at an uncertain but relatively low frequency as well.
Cell fitness screens
Library 1 screen: K562 cells engineered by piggyBac transposition to constitutively express denAsCas12a-KRAB or multiAsCas12a-KRAB were transduced with lentivirally packaged Library 1 constructs at MOI ~0.15. Transduced cells were then selected using 1 µg/ml puromycin for 2 days, followed by washout of puromycin. On Day 6 after transduction, the initial (T0) time point was harvested, and the culture was split into 2 replicates that were cultured separately thereafter. 10 days later (T10), the final time point was harvested (8.6 total doublings for multiAsCas12a-KRAB cells, 9.15 total doublings for denAsCas12a-KRAB cells). A cell coverage of >500× was maintained throughout the screen. Library 2 screen: K562 cells engineered by piggyBac transposition to constitutively express multiAsCas12a-KRAB were transduced with lentivirally packaged Library 2 constructs at MOI ~0.15. The screen was carried out as described for the Library 1 screen, except that it was carried out for 14 days (T14) or 13.5 total doublings and maintained at a cell coverage of >2,000× throughout. Genomic DNA was isolated using the NucleoSpin Blood XL Maxi kit (Macherey-Nagel, 740950.50).
Screen data processing and analysis
A summary of library contents is in Supplementary Fig. 18.
For Library 1, reads were mapped to crRNA constructs using sgcount (https://noamteyssier.github.io/sgcount/), requiring perfect match to the reference sequence. For Library 2, reads were mapped using an algorithm (detailed in Supplementary Information) requiring perfect match to the reference sequence, implemented as the ‘casmap constructs’ command in a package written in Rust, available at https://github.com/noamteyssier/casmap.
Starting from read counts, the remainder of analyses were performed using custom scripts in R. Constructs that contained fewer than 1 read per million (RPM) aligned to the reference library in either replicate at T0 were removed from analysis. For the constructs that met this read coverage threshold, a pseudocount of 1 was added for each construct and the RPM was recalculated and used to obtain a fitness score118 that can be interpreted as the fractional defect in cell fitness per cell population doubling:
$${\gamma }=\log_2\left(\frac{{\mathrm{RPM}}_{\mathrm{final}}/{\mathrm{negctrlmedian}}\,{\mathrm{RPM}}_{\mathrm{final}}}{{\mathrm{RPM}}_{\mathrm{initial}}/{\mathrm{negctrlmedian}}\,{\mathrm{RPM}}_{\mathrm{initial}}}\right)\Big/{\mathrm{totaldoublings}},$$
where RPM is the read count per million reads mapped to reference (initial = at T0, final = at end of screen), negctrlmedian is the median of RPM of intergenic negative control constructs, totaldoublings is the total cell population doublings in the screen. For Library 1, data from a single T0 sample was used to calculate the fitness score for both replicates due to an unexpected global loss of sequencing read counts for one of two originally intended T0 replicate samples. For each screen replicate in Library 2, data from two separate sequencing library preps from the same Round 1 PCR material subjected to separate Round 2 PCRs and sequenced on separate runs were pooled together for analysis.
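The fitness score calculation can be sketched in Python as follows (the authors' analysis used custom R scripts, which are not reproduced here). This sketch handles a single T0/final pair rather than two replicates, and recomputes RPM over the retained constructs only; both are simplifying assumptions, and all function and argument names are hypothetical.

```python
import math
from statistics import median


def rpm(counts):
    """Convert raw counts to reads per million."""
    total = sum(counts.values())
    return {k: v * 1e6 / total for k, v in counts.items()}


def fitness_scores(counts_t0, counts_final, negctrl_ids,
                   total_doublings, min_rpm=1.0):
    """Per-construct fitness score (gamma) per population doubling."""
    # 1) Drop constructs below the RPM threshold at T0.
    rpm_t0_raw = rpm(counts_t0)
    kept = [c for c in counts_t0 if rpm_t0_raw[c] >= min_rpm]
    # 2) Add a pseudocount of 1 and recompute RPM.
    t0 = rpm({c: counts_t0[c] + 1 for c in kept})
    fin = rpm({c: counts_final.get(c, 0) + 1 for c in kept})
    # 3) Median RPM of intergenic negative controls at each time point.
    neg = [c for c in kept if c in negctrl_ids]
    med_t0 = median(t0[c] for c in neg)
    med_fin = median(fin[c] for c in neg)
    # 4) log2 ratio of normalized RPMs, divided by total doublings.
    return {
        c: math.log2((fin[c] / med_fin) / (t0[c] / med_t0)) / total_doublings
        for c in kept
    }
```

Because each RPM is divided by the negative control median from the same sample, the library-size factor cancels, so the score depends only on counts relative to the negative controls.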
Indel analysis by Illumina short-read sequencing
K562 cell lines engineered with the corresponding Cas12a protein constructs were transduced with crRNAs and sorted for transduced cells based on GFP-positivity. 200,000 cells were collected 14 or 15 days after crRNA transduction and genomic DNA was isolated using NucleoSpin Blood (Macherey-Nagel, 740951.50). For analysis of CD55 and CD81 loci, PCRs for loci of interest were run using Amplicon-EZ (Genewiz) partial Illumina adapters and amplicons were processed using NucleoSpin Gel and PCR Clean-up Kit (Macherey-Nagel, 740609.250). Paired-end (2 × 250 bp) sequencing was completed at GENEWIZ (Azenta Life Sciences). Raw fastq files were obtained from GENEWIZ and aligned to reference sequences using CRISPResso2 (ref. 119). Quantification diagrams were generated in R. For analysis at the KIT locus, cells were lysed using QuickExtract DNA Solution (Lucigen) and amplicons were generated using 15 cycles of PCR to introduce Illumina sequencing primer binding sites and 0-8 staggered bases to ensure library diversity. After reaction clean-up using ExoSAP-IT kit (Thermo Fisher, 78201), an additional 15 cycles of PCR was used to introduce unique dual indices and Illumina P5 and P7 adaptors. Libraries were pooled and purified by SPRIselect magnetic beads before paired-end sequencing using an Illumina MiSeq at the Arc Institute Multi-Omics Technology Center. Sequencing primer binding sites, unique dual indices (from Illumina TruSeq kits), P5 and P7 adaptor sequences are from Illumina Adaptor Sequences Document #1000000002694 v16. Bioinformatic analysis of indel frequencies and simulation of indel impacts on gene expression, accounting for DNA copy number of the target region in the K562 genome65, are detailed in Supplementary Information. Primer sequences are in Supplementary Table 2.
Nanopore long-read sequencing analysis of deletion frequencies
Genomic DNA was harvested from 20 million cells using the Qiagen Genomic Tips Kit (10243). As detailed in Supplementary Information, we used a custom protocol adapted from the Nanopore Cas9 Sequencing Kit user’s manual (SQK-CS9109, though this kit was not actually used) to enrich for genomic DNA surrounding crRNA target sites for Nanopore sequencing using Kit 14 chemistry. Cas9 guide spacer sequences are in Supplementary Table 1.
fastq files generated by MinKNOW version 23.07.15 (Oxford Nanopore Technologies) were aligned to the ~20-kb regions (defined by the outermost Cas9 sgRNA protospacer sites flanking each targeted locus) surrounding each crRNA target site in MinKNOW to generate bam files. Bam files for each sample were merged using samtools merge (samtools v1.6 (ref. 120)). Merged bam files were filtered for alignments that overlap the start and end coordinates of the protospacer region of the Cas12a crRNA using bamtools filter -region (bamtools v2.5.1 (ref. 121)). Filtered bam files were loaded into the Integrative Genomics Viewer 2.17.0 (ref. 122) for visualization of individual read alignments. pysamstats --fasta --type variation (pysamstats v1.1.2) was used to extract per base total read coverage and deletion counts. The fraction of aligned reads harboring a deletion at each base was plotted using custom scripts in R.
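The final step, converting per-base coverage and deletion counts into a deletion fraction, can be sketched in Python as below. The sketch assumes a tab-separated table with `pos`, `reads_all` and `deletions` columns, which are the column names we understand pysamstats to emit for `--type variation`; these should be checked against the actual output. The function name is hypothetical and the authors' plotting was done in R.

```python
import csv
import io


def deletion_fraction_per_base(tsv_text):
    """Return [(position, deletions / total reads)] from a per-base table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fractions = []
    for row in reader:
        depth = int(row["reads_all"])
        deletions = int(row["deletions"])
        # Positions with no coverage have an undefined fraction.
        frac = deletions / depth if depth > 0 else float("nan")
        fractions.append((int(row["pos"]), frac))
    return fractions
```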
3’ RNA-seq
Approximately 200,000 to 1 million cells were harvested, resuspended in 300 µl RNA Lysis Buffer (Zymo, R1060), and stored at −70°C until further processing for RNA isolation using the Quick-RNA Miniprep Kit (Zymo, R1055). 3′ RNA-seq was batch processed together with samples unrelated to this study using a QuantSeq-Pool Sample-Barcoded 3′ mRNA-Seq Library Prep Kit for Illumina (Lexogen cat#139) in accordance with the manufacturer’s instructions. 10 ng of each purified input RNA was used for first-strand cDNA synthesis with an oligo(dT) primer containing a sample barcode and a unique molecular identifier. Subsequently, barcoded samples were pooled and used for second strand synthesis and library amplification. Amplified libraries were sequenced on an Illumina HiSeq4000 with 100-bp paired-end reads. The QuantSeq-Pool data was demultiplexed and preprocessed using an implementation of the pipeline originally provided by Lexogen (https://github.com/Lexogen-Tools/quantseqpool_analysis). The final outputs of this step are gene-level counts for all samples (including samples from multiple projects multiplexed together). Downstream analyses were performed using DESeq2 (ref. 123) for differential expression analysis, crisprVerse124 for off-target analysis, and custom R scripts for plotting as detailed in Supplementary Information.
RT-qPCR
For the CRISPRi experiments targeting the HBG1/HBG2 TSSs or HS2 enhancer, K562 cells engineered (by lentiviral transduction at MOI ~ 5) for constitutive expression of multiAsCas12a-KRAB were transduced with crRNAs and sorted, followed by resuspension of ~200,000 to 1 million cells in 300 µl RNA Lysis Buffer from the Quick-RNA Miniprep Kit (Zymo, R1055) and storage at −70°C. RNA isolation was performed following the kit’s protocols, including on-column DNase I digestion. 500 ng RNA was used as input for cDNA synthesis primed by random hexamers using the RevertAid RT Reverse Transcription Kit (Thermo Fisher, K1691), as per manufacturer’s instructions. cDNA was diluted 1:4 with water and 2 µl was used as template for qPCR with 250 nM primers and the SsoFast EvaGreen Supermix (BioRad, 1725200) on an Applied Biosystems ViiA 7 Real Time PCR System. Data were analyzed using the ΔΔCt method, normalized to GAPDH, with the no-crRNA sample as reference. qPCR primer sequences are in Supplementary Table 2.
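The ΔΔCt calculation can be sketched in Python as below, with GAPDH as the normalizer gene and the no-crRNA sample as the reference condition. The sketch assumes a doubling of product per cycle (100% amplification efficiency); the function and argument names are hypothetical.

```python
def relative_expression(ct_target_sample, ct_gapdh_sample,
                        ct_target_reference, ct_gapdh_reference):
    """Relative expression by the delta-delta-Ct method (2 ** -ddCt)."""
    # Delta Ct: target gene normalized to GAPDH within each sample.
    dct_sample = ct_target_sample - ct_gapdh_sample
    dct_reference = ct_target_reference - ct_gapdh_reference
    # Delta-delta Ct: sample relative to the no-crRNA reference.
    ddct = dct_sample - dct_reference
    return 2.0 ** (-ddct)
```

With these conventions, a target Ct that rises by two cycles relative to the reference, at unchanged GAPDH Ct, corresponds to a relative expression of 0.25.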
Transient transfection experiments
For co-transfection experiments, the day before transfection, 100,000 HEK293T cells were seeded into wells of a 24-well plate. The following day, we transiently transfected 0.6 µg of each protein construct and 0.3 µg gRNA construct per well (in duplicate) using Mirus TransIT-LT1 (MIR 2304) transfection reagent according to manufacturer’s instructions. Mixtures were incubated at room temperature for 30 min and then added in dropwise fashion into each well. 24 h after transfection, cells were replenished with fresh media. 48 h after transfection, BFP- and GFP-positive cells (indicative of successful delivery of protein and crRNA constructs) were sorted on a BD FACSAria Fusion and carried forward for subsequent flow cytometry experiments.
Western blotting
Approximately 400,000 cells per sample were washed with 1 ml cold PBS and resuspended in 400 µl Pierce RIPA Buffer supplemented with Halt Protease and Phosphatase inhibitor cocktail (Thermo Fisher, 1861281) on ice. Samples were rotated for 15 min at 4°C, followed by centrifugation at 20,000 ×g for 15 min to pellet cell debris. The supernatant was collected and mixed with 4x Bolt LDS Sample Buffer (Thermo Fisher, B0007) supplemented with 50 mM DTT, followed by heating for 10 min at 70°C. Samples were electrophoresed on Bolt 4%–12% Bis-Tris Plus Gels (Thermo Fisher), and transferred using the BioRad TurboTransfer system onto Trans-Blot Turbo Mini 0.2 µm Nitrocellulose Transfer Packs (1704158). Membranes were blocked with 6% BSA in TBST (Tris-buffered saline, 0.1% Tween 20) at room temperature for ~1 h, followed by incubation at 4°C overnight with anti-HA-tag rabbit antibody (Cell Signaling Technology, 3724 S) at 1:1,000 dilution and anti-GAPDH rabbit antibody (Cell Signaling Technology, 2118) at 1:3,000 dilution in 6% BSA in TBST. Membranes were washed with TBST at room temperature three times for 5 min each, followed by incubation with IRDye secondary antibody for 1 h at room temperature, then washed three times with TBST for 5 min each and two times with PBS. Blots were imaged using Odyssey CLx (LI-COR).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.