Veit Schwämmle, computational proteomics

Head of Group

Assoc. Prof. Veit Schwämmle

Research Interests

The Computational Proteomics Group develops and applies computational solutions for improved data analysis in large-scale omics experiments, with a focus on proteins and their post-translational modifications (PTMs). The aim is to better understand functional protein states in order to determine, confirm and predict their contribution to cell behavior and disease. Our main research interests include:

Developing software and workflows for mass spectrometry data
Studying biology through post-translational modifications
Creating statistical tools for quantifying and interpreting omics data
Implementing deep learning models for mass spectrometry data processing
Modelling and interpreting signalling in molecular pathways

For the full publication record, see Google Scholar.

Software & Resources

Proteomics informatics

All our software is open source and available on our repositories. We mostly develop:

R and Python scripts and libraries for functional analysis of omics data
New algorithms for computational mass spectrometry
Deep learning models for MS data
Portable and scalable workflows for HPC and cloud
Statistical methods for proteoform characterizations

Data resources

The ELIXIR Tools and Data Service Registry, bio.tools, launched in January 2015, hosts details about tens of thousands of software tools in the life sciences.

Collaboration with the PRIDE repository to enhance metadata annotations using SDRF

Interactive and user-friendly software

We develop web applications for interactive data analysis (see e.g. Web Applications). We have a strong focus on using smart and extensive visualization for deep exploration of omics data. Most of these applications are also available as stand-alone versions via docker and conda, and several as libraries for direct integration into data analysis workflows.

Quantitative analysis
- General literature
  For an overview of methods used in proteomics, see our PTMomics review as well as our editorial about reproducibility. We also published a paper about transparent reporting of the different steps in the data analysis workflow (ref). We published tutorials for CrossTalkMapper, VSClust + complex analysis and general PTM analysis, and have further training material on GitHub.
- Tool suites for quantitative Omics
  Check out our tool suite facilitating data processing and further analysis with PolySTest, VSClust et al. (ref).
  We also collaborated on the development of a web application for LC-MS metabolomics data.
- Statistical tests: PolySTest
  New statistical test for data with low replicate numbers and many missing values, combined with well-performing statistical tests for high-confidence detection of differentially regulated features (ref).
  The old LimmaRP approach, where we showed the power of combining limma and rank products (ref), is still accessible.
- Biopharma LC-MS applications
  Recent work demonstrates and compares LC-MS-based host-cell-protein quantitation methods in the USP 1132.1 context, including qualification and benchmarking across workflows (ref).
- Data clustering
  VSClust considers feature variance, leading to more accurate clustering results (ref. and tutorial). See also how to estimate appropriate parameter values for fuzzy c-means clustering (ref).
- Protein Complexes
  We provide tools to investigate the behavior of protein complexes in your proteomics data set (ref) and to look for co-regulatory patterns within hundreds of human cell types (ref).
- PTM crosstalk
  Large-scale estimation of crosstalk between nearby residues (ref. and ref.). Interplay scores estimate the quantitative crosstalk between PTMs on a protein (ref. and ref.). Crosstalk patterns can be visualized with CrossTalkMapper.
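
To illustrate the idea behind an interplay score, one common formulation compares the observed co-occurrence frequency of two PTMs with the frequency expected if both marks were placed independently. The following is a minimal sketch under that assumption; the function name and the example frequencies are hypothetical and are not taken from the group's software.

```python
import math


def interplay_score(freq_a: float, freq_b: float, freq_ab: float) -> float:
    """Return log2(observed co-occurrence / co-occurrence expected under independence).

    freq_a, freq_b: relative abundances of PTM A and PTM B (hypothetical inputs).
    freq_ab: relative abundance of molecules carrying both marks.
    """
    for value in (freq_a, freq_b, freq_ab):
        if not 0 < value <= 1:
            raise ValueError("frequencies must lie in (0, 1]")
    return math.log2(freq_ab / (freq_a * freq_b))


# Hypothetical example: the two marks co-occur twice as often as expected.
print(interplay_score(freq_a=0.5, freq_b=0.25, freq_ab=0.25))
```

In this formulation a positive score suggests that the two marks tend to co-occur, a negative score suggests that they tend to exclude each other, and a score of zero is what independence would give.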
Chromatin Biology
- Internal fragment ions
  Internal fragment ions are considered crucial for the identification of proteoforms in top-down mass spectrometry but are ubiquitous and noisy. We show how they can be used to validate proteoforms (ref).
- Middle-down mass spectrometry
  We develop and apply a workflow to quantify PTMs on histone tails (refs). ProteoformQuant is under active development for improved quantification of proteoforms in middle-down MS data.
- CrosstalkDB
  With quantitative data from middle-down and top-down mass spectrometry, the web server collects and analyzes the input files, followed by statistical assessment of the crosstalk between measured PTMs (refs).
- Computational models
  Taking simple rules for writing, propagating and deleting histone PTMs on chromatin, we were able to reproduce global patterns measured by ChIP-seq experiments. The implementation of crosstalk rules results in a rich spatial and temporal behavior (ref).
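
The kind of rule-based model described above can be sketched as a one-dimensional lattice of nucleosomes carrying a single mark. This is only an illustrative toy under stated assumptions, not the published model: the function name, the rates, and the restriction to one mark (without crosstalk between marks) are all hypothetical.

```python
import random


def simulate_marks(n_sites=100, steps=200, p_write=0.01,
                   p_spread=0.2, p_delete=0.05, seed=0):
    """Simulate one histone mark on a linear chain of nucleosomes.

    Each step updates all sites synchronously from the previous state:
      - a marked site loses its mark with probability p_delete;
      - an unmarked site gains the mark if any marked neighbour
        propagates to it (p_spread per marked neighbour) or if it is
        written de novo (p_write).
    All rates are hypothetical illustration values.
    """
    rng = random.Random(seed)
    marks = [False] * n_sites
    for _ in range(steps):
        new = [False] * n_sites
        for i in range(n_sites):
            if marks[i]:
                new[i] = rng.random() >= p_delete
            else:
                marked_neighbours = [j for j in (i - 1, i + 1)
                                     if 0 <= j < n_sites and marks[j]]
                spread = any(rng.random() < p_spread
                             for _ in marked_neighbours)
                new[i] = spread or rng.random() < p_write
        marks = new
    return marks


final_state = simulate_marks()
# Print the chain as a string: '#' for marked, '.' for unmarked nucleosomes.
print("".join("#" if m else "." for m in final_state))
```

Varying the three rates changes whether marks stay isolated or form extended domains. Adding a second mark whose rates depend on the first would be one way to introduce crosstalk rules.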
Deep learning in proteomics
- Peptide representations for deep learning
  In collaboration with the Röttger group, we developed an environment to retrieve millions of mass spectra (MS1 and MS2) from the public repository PRIDE, to categorize them in a database, and to create data representations that can be directly used for machine learning purposes: MS2AI (refs).
- Variability in MS
  Blind application of deep learning methods to MS leads to high bias and inaccurate predictions. We investigated the impact of data variability on the prediction bias (ref).
- Registry
  See also the AIMe registry to report AI-based biomedical results in a standardized and reproducible manner (ref) and our community paper about machine learning in proteomics (ref).
Workflows and standards
- Standardization and community efforts
  See the proposed notation of proteoforms: ProForma (ref). Accurate metadata annotation is crucial for ensuring reproducibility and data repurposing. Together with the EuBIC community, we developed a data standard for proteomics metadata (refs.). As part of ELIXIR DK, we annotate proteomics tools for bio.tools (refs.) and are part of the proteomics community (see also white paper, ref). We also recently contributed to a community perspective on preventing proteomics data tombs via shared standards, incentives, and stewardship (ref).
- WOMBAT-P
  We implemented four scalable and portable workflows for label-free data analysis as part of an ELIXIR implementation study. They allow systematic comparison of the performance of different data analysis workflows (ref).
- Antibodies
  Antibodies are notoriously difficult to identify due to their large sequence variety. Taking advantage of the Observed Antibody Space, we developed a workflow to identify antibodies in mass spectrometry data (ref).
- ProtProtocols
  As part of a project within the EuBIC initiative, we developed a framework for fully reproducible, documented and user-friendly pipelines for specific cases of proteomics data analysis. Within this framework, we created IsoProt, a full data analysis pipeline for iTRAQ/TMT data (ref). Check it out here: IsoProt at GitHub – download it via our docker-launcher
Complex Systems
- Biological evolution
  See my former studies of aging, sympatric speciation and competitive cellular automata (refs).
- Simulations
  Almost anything can be simulated on the computer including sand dunes, opinion dynamics and linguistics (refs).
- Statistical Mechanics
  See my work on generalized entropies and Fokker-Planck equations (refs).

Quantitative Proteomics

An overview of the main methods for analysis of data from peptide mass spectrometry and other -omics data.

Online course material

Proteomics sandbox

Courses at SDU

My group runs and co-teaches multiple courses at the Department of Biochemistry and Molecular Biology and beyond:

BMB830 – Biostatistics in R I (Master’s)
BMB831 – Biostatistics in R II (Master’s)
BMB211 – Workshops in Applied Bioinformatics (PhD)
BMB547 – Molecular Data Science (Bachelor, co-teacher)
BMB511 – Bioinformatics I (Bachelor, co-teacher)
SDC Biostatistics & Experimental Design (former course) – Sino-Danish Centre

Student Projects

We offer projects for Bachelor and Master students. To get an idea, take a look at our research topics. Interested? Contact us.

Past student projects
First year bachelor projects
- Functional analysis of a fish oil diet
- How strong are our muscles?
Bachelor projects
- Investigation of PTM crosstalk in mice to resolve age- and tissue-dependent patterns
- Large-scale investigation of protein variance in cancer tissues
- Enhanced and animated visualization of temporal changes on the histone PTM landscape
- Optimization of the data analysis pipeline to characterize combinatorial PTMs of histones
- A proteomics analysis of protein abundance variations in cancer
- Optimization of multi-threading capabilities in data clustering approaches
Master projects
- Determination of cellular age as a method to assess aging effects in multicellular organisms
- Bioinformatics in proteomics – supervised data analysis focused on protein complexes
- Intrinsically Disordered Protein Domains and Post-Translational Modifications – A Computational Biology Study
- Web-based application for visualization of proteins and their PTMs based on their quantification
- A fully reproducible and user-friendly workflow for the analysis of PTM-omics data
- Implementation and optimization of a fully automatized pipeline for the analysis of middle-down MS data
- Tandem-Mass spectrometry prediction based on LC-MS chromatograms using deep neural networks
- Implementation of fully reproducible and scalable data analysis workflows in bioinformatics using Nextflow
- Computational proteomics analysis of histones and their post-translational modifications
- Development of a statistical workflow to determine the relative post-translational modification changes

Web Applications

All applications are hosted on this server and accessible directly from your browser. If an app is temporarily unavailable due to high load, please try again later or run it locally. Local options include Docker containers (image names follow the pattern veitveit/<app_name>), access via the UCloud / SDU Cloud platform, and R packages (available for PolySTest and VSClust).
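
As a hedged illustration of the local Docker option, the snippet below assembles a `docker run` command from the image naming pattern given above. The app name `polystest`, the lowercasing of the name, and port 3838 (the usual Shiny default) are assumptions made for this sketch; check the respective repository for the actual image name and port.

```python
import shlex


def docker_run_command(app_name: str, host_port: int = 3838,
                       container_port: int = 3838) -> list[str]:
    """Build a docker run command for an image named veitveit/<app_name>.

    The port numbers are assumptions for illustration only.
    """
    # Docker repository names must be lowercase.
    image = f"veitveit/{app_name.lower()}"
    return ["docker", "run", "--rm",
            "-p", f"{host_port}:{container_port}", image]


# Hypothetical app name; print the command instead of executing it.
print(shlex.join(docker_run_command("polystest")))
```

If the assumed port is correct, the app would then be reachable in the browser at localhost on the chosen host port.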

Automated pre-processing workflow with handshakes to other tools for quantitative omics datasets.

Launch GitHub

Variance-sensitive clustering of quantitative data with integrated statistical testing and pathway analysis.

Launch GitHub

Combined statistical testing for multi-omics data with few replicates and missing values.

Launch GitHub

Quality control and quantification of protein complexes in proteomics datasets.

Launch GitHub

Explore co-regulation of protein complexes and groups in human cells using ProteomicsDB data.

Launch Bitbucket

Interactive protein inference, summarization and visualization using parsimonious inference and factor-analysis-based summarization.

Launch GitHub