De novo protein sequence analysis

De novo protein sequence analysis software

The input amino acid strings were the sequences of the aggregated paths generated by Twister from the CAH2 and alemtuzumab Fab data sets, as described in Section 4.3 and Section 4.4; further details can be found in. The tolerance for comparing mass offsets was set to $\varepsilon_{\mathrm{abs}} = 10$ ppm.
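As a minimal sketch of how such a tolerance might be applied, the function below checks whether two mass offsets agree to within 10 ppm. The function name `offsets_match` and the use of the larger magnitude of the two values as the reference are assumptions for illustration; the text does not specify how the relative deviation is computed. At about 1000 Da, 10 ppm corresponds to 0.01 Da.

```python
def offsets_match(offset_1: float, offset_2: float, tol_ppm: float = 10.0) -> bool:
    """Return True if two mass offsets (in Da) agree within tol_ppm.

    Hypothetical helper. Assumption: the relative deviation is measured
    against the larger magnitude of the two offsets.
    """
    reference = max(abs(offset_1), abs(offset_2))
    if reference == 0.0:
        return True  # both offsets are exactly zero, so they trivially agree
    deviation_ppm = abs(offset_1 - offset_2) / reference * 1e6
    return deviation_ppm <= tol_ppm
```

For example, offsets of 1000.000 Da and 1000.009 Da (about 9 ppm apart) match, whereas 1000.000 Da and 1000.011 Da (about 11 ppm apart) do not.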

De novo sequencing of peptides and proteins from tandem mass spectrometry (MS/MS) data is an important and challenging problem that has attracted the attention of specialists in the field for a few decades. Most of the effort has been invested in retrieving target peptide sequences from bottom-up MS/MS data, leading to several handy software tools such as PEAKS, PepNovo, pNovo, Lutefisk, Sherenga, Vonode, Novor, the ALPS system, and the special-purpose program UVnovo, as well as a few alternative strategies that benefit from multiple enzyme digests or from pairs or triples of spectra acquired using different fragmentation techniques. Despite those achievements, database search is commonly considered a substantially more reliable approach to protein identification and remains the method of choice whenever a database is available; the most widely used tools to this end are Sequest and Mascot in the bottom-up case, and ProSightPC/ProSight PTM and MS-Align+ in the top-down case. However, the de novo strategy represents the only option for sequencing the complementarity determining regions (CDRs) of antibodies, proteins from organisms with unknown genomes, and novel splice variants.

Top-down mass spectrometry opened new horizons in the analysis of intact proteins, particularly antibodies, but the number of algorithms developed for processing this kind of data remains very limited. Until last year, the only method for de novo sequencing of proteins solely from top-down MS/MS data was the one by Horn et al., which capitalized on the complementarity of collisionally activated dissociation (CAD) and electron capture dissociation (ECD) but never became publicly available as a software program. The next algorithm to make some use of top-down MS/MS spectra was TBNovo, which exploited them as a scaffold for assembling overlapping peptides reconstructed from bottom-up data. Very recently, the Twister approach, which retrieves long and highly accurate sequence fragments of the target protein(s) from a set of top-down MS/MS spectra, has been presented and implemented in a software tool freely available on the web.

In this work, we apply the concept of tag convolution, introduced in for the case of bottom-up MS/MS data, to develop a method for combining sequence fragments of the proteins in the sample into even longer, possibly gapped, amino acid sequences matching those of the proteins being analyzed. Its performance is illustrated on top-down data sets for carbonic anhydrase 2 (CAH2) and the Fab region of alemtuzumab; the sequence fragments passed to it as input comprise the amino acid sequences of the aggregated paths generated by Twister from the respective data set. The corresponding extended version of the Twister software tool can be downloaded from.