Sequence-based proteoform identification
To demonstrate the power of this new approach, we reanalyzed pulldowns of exogenously expressed wild-type HRAS (HRASWT) and HRASG12D constructs in three cell lines (PXD019469)17, because these baits represent highly expressed and stable proteoforms characterized by protein sequence alterations. The aim of the original study was to identify cancer-specific protein-protein interactions, particularly those resulting from mutations in 31 frequently mutated genes, as potential therapeutic targets. Focusing on the HRASWT and HRASG12D cell lines, we found that these samples were highly enriched for the HRAS bait protein (Figure 1B, Data 1)18, which was detected by numerous PSMs, achieving 98% coverage of the bait (Figure 2A, Data 2). Many of these peptides were captured only in modified forms (Figure 2B, Data 3) and would therefore have been missed by a traditional closed search, reducing the total protein coverage. To confirm this, we searched the same data with an MSFragger closed search, which identified only 2964 PSMs mapping to HRAS across all samples (Data 4), whereas the open search identified 4909 PSMs (Data 5). This highlights the ability of the open search to reveal more information about the peptide forms of the bait protein. Furthermore, the bait PSMs correctly clustered samples by both cell line and HRAS mutation status, indicating that the HRAS peptide forms captured by MSFragger differed between samples (Figure 1C, Data 6). In total, we identified 4041 proteins from 18846 peptides, of which 2887 were found in modified forms. Specifically, we identified 26 peptides mapping to HRAS that were present in multiple peptide forms, as they were detected at a range of delta masses (Data 5). For example, the LVVVGAGGVGK peptide was detected in its unmodified, carbamylated (+43 Da), and G12D-mutated (+58 Da) forms (Figure 1C, Data 5).
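The gap between closed and open search comes down to how precursor delta masses are interpreted. The following Python sketch is a simplified illustration, not the MSFragger implementation: it labels PSMs by delta mass, treats 13C isotope-selection errors as unmodified (as in Figure 2B), and matches the carbamylation and G->D shifts discussed for LVVVGAGGVGK. The tolerance, the isotope-error range, the function names, and the assumption that a closed search retains only the "unmodified" PSMs are hypothetical choices made for illustration.

```python
# Monoisotopic masses (Da) used to interpret precursor delta masses.
C13_SPACING = 1.003355               # 13C minus 12C mass difference
CARBAMYLATION = 43.005814            # +CONH
G_TO_D = 115.026943 - 57.021464      # Asp residue minus Gly residue (~58.0055)

KNOWN_SHIFTS = {"carbamylation": CARBAMYLATION, "G->D substitution": G_TO_D}


def classify_delta_mass(delta, tol=0.02, max_isotope_error=2):
    """Label an open-search PSM from its precursor delta mass (Da)."""
    # Multiples of the 13C spacing are isotope-selection errors, i.e. unmodified.
    for k in range(-max_isotope_error, max_isotope_error + 1):
        if abs(delta - k * C13_SPACING) <= tol:
            return "unmodified"
    # Otherwise, try to match a known mass shift within the tolerance.
    for name, shift in KNOWN_SHIFTS.items():
        if abs(delta - shift) <= tol:
            return name
    return "unassigned modification"


def closed_vs_open(deltas, tol=0.02):
    """PSMs kept by a closed search (unmodified only) vs. all open-search PSMs."""
    closed = sum(classify_delta_mass(d, tol) == "unmodified" for d in deltas)
    return closed, len(deltas)
```

With hypothetical delta masses of 0.0003, 1.0041, 43.0061, 58.0049, and 15.9949 Da, `closed_vs_open` returns `(2, 5)`: only the unmodified and isotope-error PSMs would survive a closed search, while the carbamylated and G->D forms are recovered only by the open search.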
Although the LVVVGAGGVGK +58 Da (G12D) peptide was not reported in the original study, its detection here validates the experimental design and results of that study17. This demonstrates that combining IP-MS data with open searches can provide additional information about the bait protein that would otherwise be missed. To systematically quantify these changes in the bait protein, we applied SAINTexpress8 to each cell line separately. The unmodified LVVVGAGGVGK peptide and its +58.00 Da (G->D substitution) form were differentially present between the HRASWT and HRASG12D cell lines (Figure 1D-F, Data 7-9). Furthermore, MSFragger localized the +58 Da modification to G12, indicated by a lowercase residue within the peptide sequence, providing further evidence for, and interpretability of, this modification (Figure 1D-F, Data 5). Finally, the location of differentially present peptides may indicate isoform switching. As expected, mapping the peptides that differed between HRASWT and HRASG12D in CAL-33 showed little difference downstream of G12, with the differences strongly localized to the N-terminus of the protein (Figure 2C, Data 10). To further support the differentially present peptide forms identified in the open search, we analyzed the data with a search that included peptide N-terminal carbamylation and G->D substitution as variable modifications (both informed by MSFragger delta-mass localization). Indeed, of the 557 LVVVGAGGVGK PSMs, 163 were unmodified, 288 carried the mutation, and the remaining 106 were carbamylated (Data 11). These results increase confidence in the open search results and highlight the untapped potential of IP-MS data that remains inaccessible without open searches. Such positive controls highlight the power and reproducibility of our approach for detecting sequence-based peptide forms that are differentially present in IP-MS data.
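Because MSFragger marks the localized residue in lowercase, converting that mark into a protein coordinate only requires the peptide's start position in the protein. A minimal Python sketch, assuming the peptide matches the protein sequence exactly and only once; the function name is hypothetical, and only the first 16 residues of human HRAS (UniProt P01112) are included.

```python
# Residues 1-16 of human HRAS (UniProt P01112); truncated for illustration.
HRAS_NTERM = "MTEYKLVVVGAGGVGK"


def localized_positions(localized_peptide, protein_seq):
    """Map lowercase (localized) residues in a peptide to 1-based protein positions.

    Assumes the peptide occurs once in protein_seq (first match is used).
    """
    plain = localized_peptide.upper()
    start = protein_seq.find(plain)  # 0-based start of the peptide
    if start < 0:
        raise ValueError(f"{plain} not found in protein sequence")
    return [(aa.upper(), start + i + 1)
            for i, aa in enumerate(localized_peptide) if aa.islower()]
```

For example, `localized_positions("LVVVGAgGVGK", HRAS_NTERM)` returns `[("G", 12)]`, placing the +58 Da shift on G12, as expected for the G12D substitution.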
Figure 2
Descriptors for the HRAS IP-MS case study. (A) Protein coverage percentage for all preys and baits, calculated as the percentage of amino acids detected per gene according to the longest isoform across all HRAS IP-MS samples and all peptide forms. The dotted line highlights the bait (HRAS) protein coverage. (B) Relative positions of all detected HRAS bait peptides and whether they were detected unmodified, modified, or at multiple delta masses. In this representation, peptides with isotopic error delta masses were considered unmodified. (C) Relative positions of HRAS bait peptides compared between HRASWT and HRASG12D in CAL-33, with statistical significance and fold change encoded by size and color, respectively.
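The coverage in panel A can be reproduced, in outline, by pooling every peptide sequence observed for a gene (across samples and peptide forms) and marking the residues it spans on the longest isoform. The Python sketch below assumes exact substring matching and uses hypothetical function names and toy sequences; it is not the authors' implementation.

```python
def percent_coverage(protein_seq, peptides):
    """Percentage of residues in protein_seq spanned by at least one peptide."""
    covered = [False] * len(protein_seq)
    # Pool peptide forms: uppercase removes localization marks, so the
    # unmodified and modified forms of one sequence count once.
    for pep in {p.upper() for p in peptides}:
        start = protein_seq.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein_seq.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein_seq)


def gene_coverage(isoform_seqs, peptides):
    """Coverage of a gene, measured on its longest isoform."""
    return percent_coverage(max(isoform_seqs, key=len), peptides)
```

With two hypothetical isoforms `["MTEYKLVVVGAGGVGK", "MTEYK"]` and the peptide forms `["LVVVGAGGVGK", "LVVVGAgGVGK"]`, `gene_coverage` returns `68.75`: 11 of the 16 residues of the longer isoform are covered.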
Identification of distinct proteoforms in response to perturbations
The recent discovery and application of KRASG12C-specific inhibitors are particularly promising. However, resistance to AMG-510 (sotorasib)13 has already been reported19 and represents an emerging area of cancer research. Nolan et al.20 characterized the differential interactome of KRASG12C mutant proteins to understand whether the mechanism of action of the AMG-510 inhibitor is mediated by changes in the KRAS interactome. To complement this study with information on differential KRAS proteoforms upon inhibitor treatment, we investigated the differential peptide forms of exogenously expressed KRASWT and KRASG12C upon AMG-510 treatment (PXD043536)20. We checked the quality of the IP-MS data by ensuring that KRAS was highly enriched in the samples (Figure 3A, Data 12) and that high protein coverage was achieved (Figure 4A-B, Data 13-14). We identified 32 peptides mapping to KRAS isoforms P01116-1 and P01116-2 that existed in multiple peptide form states, suggesting high proteoform complexity in the analyzed samples (Data 15). Interestingly, the addition of AMG-510 appeared to decrease KRAS PSMs, which the original study attributed to the inhibitor interfering with tryptic digestion of KRAS (Figure 3A, Data 15). WT KRAS was not significantly affected by treatment, consistent with the specificity of AMG-510 for KRASG12C (Data 16). KRASG12C bait PSMs correctly separated treated from control samples, indicating that AMG-510 treatment induced a change in the KRASG12C proteoform state (Figure 4C, Data 17). Following AMG-510 treatment, many modified KRASG12C peptides were differentially present (Figure 3B, Data 18), confirming an alteration in the KRASG12C proteoform state. In particular, the C118 position (Figure 3C), whose oxidation is known to promote GDP release and contribute to carcinogenesis, appeared to be affected by AMG-510 treatment15,21.
Notably, peptides containing modified C118 were differentially present in the samples, carrying cysteic acid (+47.98 Da) and sulfinic acid (+31.99 Da) in the presence of AMG-510. MSFragger reported these as delta masses of -9 and -25 Da, respectively, because carbamidomethylation (+57.02 Da) was specified as a fixed cysteine modification; this derivative arises when samples are treated with iodoacetamide22 but is not formed on oxidized cysteines23.
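The -9 and -25 Da delta masses follow from subtracting the fixed carbamidomethyl mass from the true oxidation shifts. This Python arithmetic check uses standard monoisotopic masses (O = 15.994915 Da, carbamidomethyl = 57.021464 Da); the names are illustrative only.

```python
OXYGEN = 15.994915           # monoisotopic mass of O (Da)
CARBAMIDOMETHYL = 57.021464  # fixed Cys modification from iodoacetamide (Da)

CYS_OXIDATIONS = {
    "sulfinic acid": 2 * OXYGEN,  # Cys-SO2H, +31.99 Da
    "cysteic acid": 3 * OXYGEN,   # Cys-SO3H, +47.98 Da
}


def reported_delta(true_shift, assumed_fixed=CARBAMIDOMETHYL):
    """Delta mass seen by the search when an assumed fixed modification is absent."""
    return true_shift - assumed_fixed


for name, shift in CYS_OXIDATIONS.items():
    print(f"{name}: +{shift:.2f} Da -> reported {reported_delta(shift):+.2f} Da")
```

This prints `sulfinic acid: +31.99 Da -> reported -25.03 Da` and `cysteic acid: +47.98 Da -> reported -9.04 Da`, matching the delta masses observed for the oxidized C118 peptides.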
Figure 3
Detection of perturbation-induced and endogenous peptide forms by iPTM. (A) Pre-normalized PSM counts of prey (density plot) and bait (dot) proteins for KRASWT and KRASG12C mutant cell lines in AMG-510-treated (blue) and untreated (salmon) samples for each replicate IP-MS experiment. (B) Bait peptide forms differentially present between AMG-510-treated and untreated KRASG12C samples according to SAINTexpress (BFDR < 0.05). Only bait peptide forms were included in the SAINTexpress analysis. (C) Structural model showing the position of C118 relative to the AMG-510 and GDP binding sites in the KRASG12C protein. (D) Pre-normalized PSM counts of prey (density plot) and bait (dot) proteins in endogenous BRD4 pulldowns across the IgG control (top) and cell lines for each replicate IP-MS experiment. (E) Clustering of normalized PSM counts of highly abundant peptide forms belonging to the bait BRD4 protein, excluding IgG control samples. Unsupervised clustering was performed using default hclust parameters, and peptide forms not detected in a given sample were assigned a zero value after log transformation. (F) BRD4 bait peptide forms differentially present between K-562 and MOLM-13 samples according to SAINTexpress (BFDR < 0.05). Only bait peptide forms were included in the SAINTexpress analysis.
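Panels B and F apply a BFDR < 0.05 threshold to SAINTexpress scores computed on bait peptide forms only. A minimal Python sketch of that filtering step, assuming a tab-separated SAINTexpress result table with `Prey`, `BFDR`, and `FoldChange` columns in which each peptide form was supplied as its own "prey" identifier; the identifier scheme, function name, and rows are hypothetical.

```python
import csv
import io


def significant_forms(saint_tsv, bfdr_cutoff=0.05):
    """Return (peptide form, fold change) pairs with BFDR below the cutoff,
    sorted by decreasing fold change."""
    rows = csv.DictReader(io.StringIO(saint_tsv), delimiter="\t")
    hits = [(r["Prey"], float(r["FoldChange"]))
            for r in rows if float(r["BFDR"]) < bfdr_cutoff]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical rows; 'GENE|form' identifiers stand in for peptide forms.
example = (
    "Bait\tPrey\tBFDR\tFoldChange\n"
    "AMG510\tKRAS|form_1\t0.01\t8.5\n"
    "AMG510\tKRAS|form_2\t0.30\t1.2\n"
    "AMG510\tKRAS|form_3\t0.02\t5.0\n"
)
```

Here `significant_forms(example)` keeps `KRAS|form_1` and `KRAS|form_3` and drops `KRAS|form_2`, whose BFDR of 0.30 exceeds the threshold.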
Figure 4
Descriptors for the KRAS and BRD4 IP-MS case studies. (A) Percentage protein coverage of all preys and baits, calculated as the percentage of amino acids detected per gene according to the longest isoform across all KRAS IP-MS samples and all peptide forms. The dotted line highlights the bait (KRAS) protein coverage. (B) Relative positions of all detected KRAS bait peptides and whether they were detected unmodified, modified, or at multiple delta masses. Peptides with isotopic error delta masses were considered unmodified in this representation. (C) Clustering of normalized PSM counts of highly abundant peptide forms belonging to the bait KRASG12C protein. Unsupervised clustering was performed with default hclust parameters, and peptide forms not detected in a given sample were assigned a zero value after log transformation. (D) Percentage protein coverage of all preys and baits, calculated as the percentage of amino acids detected per gene according to the longest isoform across all BRD4 IP-MS samples and all peptide forms. The dotted line highlights the bait (BRD4) protein coverage. (E) Relative positions of BRD4 bait peptides compared between K-562 and MOLM-13, with statistical significance and fold change encoded by size and color, respectively. (F) BRD4 bait peptide forms differentially present between K-562 and MEG-01 samples according to SAINTexpress (BFDR < 0.05). Only bait peptide forms were included in the SAINTexpress analysis.
These data highlight the effect of drug treatment on proteoform abundance and the ability of our methodology to capture these changes.
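The clustering in Figures 3E and 4C uses default hclust parameters, which in R means complete linkage on the Euclidean distances produced by `dist`. The following stdlib-only Python sketch mirrors that setup under stated assumptions: it log2-transforms normalized PSM counts (the log base is an assumption), assigns zero to peptide forms not detected in a sample, and cuts the tree into k clusters. Function names and sample data are hypothetical.

```python
import math
from itertools import combinations


def log_matrix(counts):
    """counts: {sample: {peptide_form: normalized PSM count}}.
    Returns {sample: vector}; undetected forms are set to 0 after log2."""
    forms = sorted({f for c in counts.values() for f in c})
    return {
        sample: [math.log2(c[f]) if c.get(f, 0) > 0 else 0.0 for f in forms]
        for sample, c in counts.items()
    }


def complete_linkage(vectors, k):
    """Agglomerative clustering (Euclidean, complete linkage) cut at k clusters."""
    clusters = [[name] for name in vectors]

    def linkage(a, b):
        # Complete linkage: largest distance between members of two clusters.
        return max(math.dist(vectors[x], vectors[y]) for x in a for y in b)

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return sorted(sorted(c) for c in clusters)


counts = {
    "ctrl_1": {"A": 8, "B": 2},
    "ctrl_2": {"A": 8, "B": 4},
    "amg_1": {"B": 16},
    "amg_2": {"A": 1, "B": 16},
}
print(complete_linkage(log_matrix(counts), k=2))
```

On these toy counts the sketch prints `[['amg_1', 'amg_2'], ['ctrl_1', 'ctrl_2']]`, separating treated from control samples. Note that a count of 1 and an undetected form both map to 0 after log2, a side effect of the zero-filling convention worth keeping in mind when interpreting such heatmaps.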
Identification of endogenous proteoforms
Exogenously expressed proteins increase yields in IP-MS experiments but may also increase false positives by altering the physiological state of the bait protein24,25. Finally, to validate our approach in an endogenous setting, we investigated previously published endogenous BRD4 pulldowns from four leukemia cell lines (PXD012715)26. The goal of the original study was to characterize novel BRD4 interactors; there, we found that methylenetetrahydrofolate dehydrogenase 1 (MTHFD1), a central enzyme in one-carbon metabolism, interacts with BRD4 on chromatin to regulate nucleotide availability and cancer-associated transcriptional control. In our open-search reanalysis, we observed different levels of BRD4 enrichment across the four cell lines (Figure 3D, Data 19). This detail was not described in our previous analysis and suggests that baseline BRD4 expression differs between cell lines26. In our workflow, we mitigated this effect by normalizing the bait PSMs within each sample to allow a fair comparison. Despite low protein coverage (35%, Figure 4D, Data 20), which may reflect the use of an endogenous rather than an overexpressed bait, the BRD4 bait PSMs correctly clustered 15 of 16 samples, indicating distinct proteoform states across cell lines (Figure 3E, Data 21). Across all samples, we identified 1442 proteins from 6070 peptides, of which 1081 were present in modified peptide forms. Specifically, we identified 31 BRD4 peptides present in multiple modification states, whose modified forms would have been missed by a closed search (Data 22). Focusing on the K-562 and MOLM-13 cell lines, we identified differentially present phosphorylated peptides (Figure 3F, Data 23). This highlights the ability of our approach to detect PTMs in endogenously expressed proteins.
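One plausible way to normalize bait PSMs per sample, assumed here rather than taken from the workflow, is to divide each bait peptide-form count by the total bait PSMs in the same sample, so that cell lines with different baseline BRD4 expression become comparable. A Python sketch under that assumption; the function name and counts are hypothetical.

```python
def normalize_bait_psms(counts):
    """counts: {sample: {peptide_form: bait PSM count}}.
    Returns each form's share of the sample's total bait PSMs
    (an assumed normalization scheme, for illustration)."""
    normalized = {}
    for sample, forms in counts.items():
        total = sum(forms.values())
        # Samples without bait PSMs get an empty profile instead of dividing by 0.
        normalized[sample] = {f: n / total for f, n in forms.items()} if total else {}
    return normalized


counts = {
    "high_expr": {"form_1": 30, "form_2": 10},
    "low_expr": {"form_1": 3, "form_2": 1},
}
print(normalize_bait_psms(counts))
```

Although the two hypothetical samples differ tenfold in bait PSMs, both normalize to `{'form_1': 0.75, 'form_2': 0.25}`, so only differences in the relative composition of peptide forms remain.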
Phosphorylation at S1126, detected in an unbiased manner by the MSFragger open search, has been characterized previously27, which increases our confidence in its identification. Interestingly, most differentially present BRD4 peptides appeared to map to the C-terminus of the protein, with very few detected at the N-terminus (Figure 4E, Data 24). Finally, although differentially present unmodified peptides could also be detected (Figure 4F, Data 25), these peptides are difficult to interpret in the absence of modified counterparts, as it is unclear whether they represent isoform switching, changes in cleavage sites, or differential PTM occupancy of corresponding peptide forms that were not captured.
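Statements about N- versus C-terminal enrichment rely on placing each peptide at a relative position along the protein. A Python sketch, assuming that relative position is the peptide midpoint divided by protein length (a hypothetical convention) and using made-up peptide coordinates and protein length:

```python
def relative_position(start, length, protein_length):
    """Relative midpoint of a peptide (0 = N-terminus, 1 = C-terminus);
    start is the 1-based position of the peptide's first residue."""
    midpoint = start + (length - 1) / 2
    return midpoint / protein_length


def terminal_bias(peptides, protein_length):
    """Count peptides, given as (start, length), in each half of the protein."""
    counts = {"N-terminal half": 0, "C-terminal half": 0}
    for start, length in peptides:
        pos = relative_position(start, length, protein_length)
        counts["C-terminal half" if pos > 0.5 else "N-terminal half"] += 1
    return counts


# Hypothetical differential peptides on a hypothetical 1400-residue protein.
print(terminal_bias([(12, 9), (700, 15), (1100, 12), (1250, 8)], 1400))
```

This prints `{'N-terminal half': 1, 'C-terminal half': 3}`; in practice, finer bins or the per-peptide positions themselves (as in Figure 4E) convey the distribution better than a two-way split.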
In conclusion, by leveraging the enrichment inherent to IP-MS data, our methodology provides insight into modified peptide forms, bridging protein interactomes and proteoform states.