Head of group:
Assoc. Prof. Veit Schwämmle
The Computational Proteomics
Group develops and applies computational solutions
for improved data analysis in large-scale omics
experiments with focus on proteins and their
post-translational modifications (PTMs). The aim is
to better understand the functional protein states
in order to determine, confirm and predict their
contribution to cell behavior and disease.
Our main research interests include
Developing software and workflows for mass spectrometry
data
Studying biology through post-translational modifications
Creating statistical tools for quantifying and interpreting
omics data
Implementing deep learning models for mass spectrometry data
processing
Modelling and interpreting signalling in molecular pathways
For the full publication record, see
Google Scholar.
All software is open source and available through the following repositories:
Bitbucket, GitHub
(1/2),
and GitHub (2/2)
R and Python scripts for functional analysis of omics data
Algorithms for middle-down and top-down mass spectrometry
Deep learning for enhanced feature detection in MS data
Portable pipelines and automated workflow composition for HPC and cloud
Statistical methods for proteoform and protein complex characterizations
The Elixir Tools and
Data Service Registry, bio.tools, launched in
January 2015, hosts details about tens of thousands
of software tools in the life sciences.
Our new MS2AI
tool allows creating and categorizing millions
of mass spectra.
Visit CrosstalkDB hosting quantitative data for crosstalk between histone proteins measured by middle-down mass spectrometry.
Collaboration with the PRIDE repository to enhance metadata annotations using SDRF
We develop web applications for interactive data analysis (see e.g.
Shiny Apps)
We have a strong focus on using smart and extensive visualization for deep exploration of omics data.
Most of these applications are also available as stand-alone
versions via docker and conda, and as
command-line tools for direct integration into data analysis workflows.
General literature
For an overview
of methods used in PTMomics see our review
as well as our editorial about
reproducibility.
We published tutorials for CrossTalkMapper,
VSClust
+ complex analysis and general
PTM analysis, and have further training material on
.
Tool suites for quantitative Omics
Check out our new tool suite in development:
OmicsQ facilitating data processing and further analysis with PolySTest,
VSClust et al.
We also collaborated with the development of a Shiny app for LC-MS metabolomics data:
MetaboLink.
Statistical tests: PolySTest
New statistical test for data with low replicate numbers and many
missing values, combined with
well-performing statistical tests for high
confidence detection of differentially
regulated features (ref.)
PolySTest.
The old LimmaRP approach, where we showed the power of combining limma and rank products (ref.),
is still accessible here:
LimmaRP
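To illustrate the rank product half of that combination, here is a minimal Python sketch. It is not the LimmaRP implementation. It assumes complete data (no missing values), no ties, and one list of log-ratios per replicate; the function name `rank_products` and the example values are hypothetical. Each feature is ranked within every replicate (rank 1 = largest log-ratio), and the rank product is the geometric mean of those ranks.

```python
import math


def rank_products(log_ratios):
    """Geometric mean of per-replicate ranks for each feature.

    log_ratios: list of replicates, each a list of per-feature log-ratios.
    Rank 1 is assigned to the largest log-ratio within a replicate.
    """
    n_features = len(log_ratios[0])
    log_rank_sums = [0.0] * n_features
    for replicate in log_ratios:
        # Order feature indices from most to least up-regulated.
        order = sorted(range(n_features), key=lambda i: replicate[i], reverse=True)
        for rank, feature in enumerate(order, start=1):
            log_rank_sums[feature] += math.log(rank)
    n_replicates = len(log_ratios)
    # exp(mean(log(rank))) is the geometric mean of the ranks.
    return [math.exp(total / n_replicates) for total in log_rank_sums]


# Hypothetical example: 3 replicates, 4 features.
replicates = [
    [2.1, 0.3, -1.0, 0.9],
    [1.8, -0.2, -0.7, 1.1],
    [2.5, 0.1, -1.2, 0.4],
]
rp = rank_products(replicates)
```

Small values flag features that are consistently among the most up-regulated across replicates. Assessing significance would additionally require a null distribution (for example from permutations), which this sketch omits.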
Data clustering
VSClust considers feature variance
leading to more accurate clustering results
(ref.
and tutorial).
VSClust.
See also how to estimate appropriate parameter
values for fuzzy c-means clustering (ref).
(old app:
FuzzyClust).
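As background for the parameter question, the following Python sketch shows the textbook fuzzy c-means membership formula and how the fuzzifier `m` changes it. It is an illustration under stated assumptions, not the estimation method from the reference and not the VSClust code: cluster centers are taken as given, distances are Euclidean, and the function name `memberships` and the example coordinates are hypothetical.

```python
import math


def memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (fuzzifier m > 1)."""
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    dists = [math.dist(point, center) for center in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
crisp = memberships(point, centers, m=2.0)  # lower fuzzifier: sharper assignment
soft = memberships(point, centers, m=4.0)   # higher fuzzifier: memberships move toward equal
```

Memberships always sum to one; a larger `m` pulls them toward equal values, which is why the choice of this parameter matters for noisy data.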
Protein Complexes
ComplexBrowser to investigate the
behavior of protein complexes in your
proteomics data set (ref
).
CoExpresso to look for co-regulatory
patterns within hundreds of human cell types
(ref).
PTM crosstalk
Large-scale estimation of crosstalk
between nearby residues (ref.
and ref.).
Interplay scores estimate the quantitative
crosstalk between PTMs on a protein (ref.
and ref.).
Crosstalk patterns can be
visualized by CrosstalkMapper.
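The idea behind such a score can be sketched in a few lines of Python. The formulation used here is an assumption, not taken from this page: the observed relative abundance of proteoforms carrying both PTMs is compared with the value expected if the two PTMs occurred independently, on a log2 scale. The function name `interplay_score` and the example frequencies are hypothetical; see the references for the exact definition.

```python
import math


def interplay_score(freq_x, freq_y, freq_xy):
    """log2 ratio of observed to expected co-occurrence of two PTMs.

    freq_x, freq_y: relative abundance of all proteoforms carrying PTM X (or Y).
    freq_xy: relative abundance of proteoforms carrying both X and Y.
    Assumed formulation: log2(freq_xy / (freq_x * freq_y)).
    """
    for value in (freq_x, freq_y, freq_xy):
        if not 0.0 < value <= 1.0:
            raise ValueError("frequencies must lie in (0, 1]")
    return math.log2(freq_xy / (freq_x * freq_y))


# Hypothetical frequencies for illustration only.
score_pos = interplay_score(0.2, 0.5, 0.2)   # co-occur more often than expected
score_neg = interplay_score(0.4, 0.5, 0.05)  # co-occur less often than expected
```

Under this assumption, a positive score indicates that the two marks co-occur more often than independence would predict, a negative score indicates mutual exclusion, and zero indicates no detectable interplay.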
Internal fragment ions
Internal fragment ions are considered crucial for the
identification of proteoforms in top-down mass spectrometry but are ubiquitous and noisy. We show
how they can be used to
validate proteoforms (ref).
Middle-down mass spectrometry
We
develop and apply a workflow to quantify PTMs
on histone tails
(refs).
Coming soon: ProteoformQuant for improved quantification of proteoforms in middle-down MS
data.
CrosstalkDB
With quantitative data
from middle-down and top-down mass
spectrometry, the web server collects and
analyzes the input files, followed by
statistical assessment of the crosstalk between
measured PTMs
(refs).
Access CrosstalkDB
Computational models
Taking simple
rules for writing, propagating and deleting
histone PTMs on chromatin, we were able to
reproduce global patterns measured by ChIP-seq
experiments. The implementation of crosstalk
rules results in a rich spatial and temporal
behavior (ref).
Peptide representations for deep
learning
In collaboration with the
Röttger group, we developed an environment
to retrieve millions of mass spectra (MS1 and
MS2) from the public repository PRIDE, to
categorize them in a database, and to create
data representations that can be directly used
for machine learning purposes: MS2AI
(refs).
Blind application of deep learning methods to MS leads to high bias and inaccurate predictions. We investigated the impact of data variability on the prediction bias in (ref).
See also the AIMe registry to report AI-based biomedical results in a standardized and reproducible manner (ref) and our community paper about machine learning in proteomics (ref).
Standardization and community efforts
See the proposed notation of proteoforms:
ProForma (ref).
Accurate metadata annotation is crucial for ensuring reproducibility and data repurposing.
Together with the EuBIC community, we developed
a data standard for proteomics metadata: (refs.).
As part of ELIXIR
DK, we annotate proteomics tools for
bio.tools
(refs.) and are part of the proteomics
community (see also white paper, ref).
WOMBAT-P
We implemented four
scalable and portable workflows for the
analysis of label-free data as part of an
ELIXIR implementation study. They allow
systematic comparison of the performance of
different data analysis workflows (ref).
Antibodies
Antibodies are notoriously difficult to identify due to their large sequence variety. Taking
advantage of the
Observed Antibody Space, we developed a
workflow to
identify antibodies in mass spectrometry data (ref).
ProtProtocols
As part of a project
within the EuBIC
initiative, we developed a framework
for fully reproducible, documented and
user-friendly pipelines for specific cases of
proteomics data analysis. Within this
framework, we created IsoProt, a full
data analysis pipeline for iTRAQ/TMT data
(ref).
Check it out here: IsoProt
at GitHub, or download it via our
docker-launcher.
Biological evolution
See my former
studies of aging, sympatric speciation and
competitive cellular automata (refs).
Simulations
Almost anything can be
simulated on the computer including sand dunes,
opinion dynamics and linguistics (refs).
Statistical Mechanics
See my work on
generalized entropies and Fokker-Planck
equations (refs).
Old course on quantitative data analysis of proteomics data
This course presents an overview of the main methods for analysis of data from peptide mass spectrometry and other -omics data.
Click on the picture for course material.
Courses
My group runs two Master's courses in Biostatistics in R (BMB830 and BMB831) at the Department of Biochemistry and Molecular Biology and a PhD course, Workshops in Applied Bioinformatics (BMB209). We co-teach the bachelor courses Molecular Data Science (BMB547) and Bioinformatics I (BMB511), and we taught Biostatistics and Experimental Design as part of the Master's programme Life Science Engineering and Informatics of the Sino-Danish University.
Click on the picture for more information about Biostatistics in R.
Biostatistics in R
Lecture I: Introduction
Corresponding
R script
Lecture II: Basics
Corresponding R
script
Lecture III: Arrays, matrices and data frames
Corresponding R
script
Lecture IV: Data manipulation
Corresponding R
script
Lecture V: Visual methods
Corresponding
R script
Lecture VI: Basic statistics
Corresponding
R script
Lecture VII: Data modeling
Corresponding R
script
Lecture VIII: Statistical tests
Corresponding R
script
Lecture IX: Multi-variate analysis
Corresponding R
script
Lecture X: Interactivity in R
Corresponding R
script
Exercises
Course description:
Lecture I: Data analysis of omics data: general aspects
Lecture II: Proteomics data
Lecture III: Transcriptomics data
Lecture IV: Epigenomics data
Lecture V: Metabolomics data
Lecture VI: Data interpretation
Projects
We offer projects for Bachelor and Master students. To get an idea, please take a look at our research. If you are interested, please contact me.
Click on the picture for a list of old projects.
Former or current student projects
First year bachelor projects
Functional analysis of a fish oil diet
How strong are our muscles?
Bachelor projects
Investigation of PTM crosstalk in mice to resolve age- and tissue-dependent patterns.
Large-scale investigation of protein variance in cancer tissues
Enhanced and animated visualization of temporal changes on the histone PTM landscape
Optimization of the data analysis pipeline to characterize combinatorial post-translational modifications (PTMs) of histones
A proteomics analysis of protein abundance variations in cancer
Optimization of multi-threading capabilities in data clustering approaches
Master projects (individual study projects and full theses)
Determination of cellular age as a method to assess aging effects in multicellular organisms
Bioinformatics in proteomics - supervised data analysis focused on protein complexes.
Intrinsically Disordered Protein Domains and Post-Translational Modifications - A Computational Biology Study
Web-based application for visualization of proteins and their post-translational modifications (PTMs) based on their quantification.
A fully reproducible and user-friendly workflow for the analysis of PTM-omics data
Implementation and optimization of a fully automatized pipeline for the analysis of middle-down MS data
Tandem-Mass spectrometry prediction based on Liquid Chromatography-Mass Spectrometry chromatogram using deep neural networks
Implementation of fully reproducible and scalable data analysis workflows in bioinformatics using Nextflow
Computational proteomics analysis of histones and their post-translational modifications
Development of a statistical workflow to determine the relative post-translational modification changes.
All apps are accessible as web services on this server. This means that they might be temporarily inaccessible due to high usage. In this case, please try again later or run the app(s) locally. For local use, we provide docker containers (usually "veitveit/app_name_in_lower_case"), access through the SDU Cloud (works only if you are affiliated with a Danish institution), and conda packages (only PolySTest and VSClust).
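The container naming convention mentioned above can be expressed as a small Python helper. This is a sketch of the stated pattern only: the text says "usually", so whether an image actually exists under the derived name should be checked before use. The helper name `docker_image` is hypothetical, and the command is assembled but not executed.

```python
def docker_image(app_name, namespace="veitveit"):
    """Derive the usual image name: <namespace>/<app name in lower case>."""
    return f"{namespace}/{app_name.strip().lower()}"


image = docker_image("PolySTest")
# Argument list that could be passed to subprocess.run to fetch the image.
pull_command = ["docker", "pull", image]
```

Passing `pull_command` to `subprocess.run` would fetch the image on a machine with Docker installed; the ports and options needed to start each app are not stated here and are left out.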
Investigate the quantitative behavior of protein complexes and arbitrary protein groups in human cells based on the data from ProteomicsDB. Find the source code here.
Elixir
(European Infrastructure for Biological
Information)
The Danish node
of the ELIXIR
ESFRI consortium implemented the bio.tools registry of
software in the life sciences (ref). The registry is now mainly maintained at SDU and adapts rich
annotations by basing software descriptions on the
EDAM
ontology. We are involved in multiple projects within the ELIXIR Tools Platform
aiming to improve and extend the FAIRness of software
tools, and the ELIXIR Proteomics
Community to ensure more standardization and benchmarking.
ELIXIR Denmark also organizes the Annual Danish Bioinformatics Conferences.
EuBIC
(European Bioinformatics Initiative)
Initiative of bioinformaticians in Europe to
improve support and coordination of training and
software development in proteomics informatics.
Conference: We are organizing conferences,
hackathons and workshops in Computational
Proteomics.
EuPA
(European Proteomics Association)
EuPA heads the national proteomics societies and organizes the EuPA conferences as well as multiple events like Summer and Winter Schools.