Computational Proteomics

Veit Schwämmle

Associate professor
Protein Research Group
Department of Biochemistry and Molecular Biology
University of Southern Denmark

Research

The Computational Proteomics Group develops and applies computational solutions for improved data analysis in large-scale omics experiments, with a focus on proteins and their post-translational modifications (PTMs). The aim is to better understand functional protein states in order to determine, confirm and predict their contribution to cell behavior and disease.
Our main research interests are:

Software development for data from protein mass spectrometry experiments

Chromatin biology: regulatory control by histone modifications

Tools for quantification and interpretation of omics data

Simulations of molecular pathways
For the full publication record, see Google Scholar.

Software development

Proteomics data analysis

- R scripts, Shiny apps and jupyter notebooks for functional analysis of proteomics and PTMomics data

- New software solutions for middle-down and top-down mass spectrometry

- Portable software pipelines and automatic workflow composition

- Co-expression of proteins and protein complexes

- Visualization of PTM behavior and their crosstalk

Databases

Visit CrosstalkDB, which hosts quantitative data on crosstalk between histone modifications measured by middle-down mass spectrometry.
The Elixir Tools and Data Service Registry, bio.tools, launched in January 2015, hosts details about thousands of databases and bioinformatics tools.

Shiny apps

The Shiny framework facilitates embedding R scripts into simple GUIs. We develop web applications for interactive data analysis (see Shiny Apps).
Try our new tool for variance-sensitive clustering of large data sets: Launch VSClust
Also check out our tools for characterizing protein complexes, either in your own data (Launch ComplexBrowser) or across human cells in general (Launch CoExpresso).

Quantitative analysis

  • Review and Tutorial For an overview of methods used in PTMomics see (refs).

  • Characterization of PTM crosstalk Large-scale estimation of crosstalk between nearby residues (refs). Interplay scores provide quantitative information about negative and positive crosstalk between PTMs on a peptide (refs).

  • Statistical tests The complexity of high-throughput mass spectrometry experiments often leads to low replicate numbers and many missing values. We implemented a combination of statistical tests for high-confidence detection of differentially regulated features (ref); a new and better approach is under development. Launch LimmaRP

  • Data clustering We first provided a simple way to estimate appropriate parameter values for fuzzy c-means clustering (ref; old app: Launch FuzzyClust). The new tool VSClust additionally considers feature variance, leading to more accurate clustering results (ref). Launch VSClust

  • Protein Complexes We developed tools to quantify protein complexes and to quantitatively characterize the behavior of their subunits. Launch ComplexBrowser to investigate the behavior of protein complexes in your proteomics data set (manuscript in preparation). Launch CoExpresso to look for co-expression patterns within up to 150 human cell types (preprint).
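
The interplay scores mentioned above can be illustrated with a small sketch. It assumes that the score is the log2 ratio between the observed co-occurrence frequency of two PTMs on a peptide and the frequency expected if both marks were independent; the exact definition used in the cited references may differ. The function name `interplay_score` and its arguments are hypothetical and only serve this illustration.

```python
import math


def interplay_score(f_ab, f_a, f_b):
    """Log2 ratio of observed to expected co-occurrence of two PTMs.

    ASSUMPTION: this log-ratio form is an illustrative definition, not
    necessarily the one used in the referenced publications.

    f_ab: relative abundance of peptides carrying both marks
    f_a, f_b: relative abundances of peptides carrying each mark
    """
    for value in (f_ab, f_a, f_b):
        if not 0 < value <= 1:
            raise ValueError("frequencies must lie in the interval (0, 1]")
    # Under independence the expected co-occurrence is f_a * f_b.
    return math.log2(f_ab / (f_a * f_b))


if __name__ == "__main__":
    # Marks co-occur exactly as often as expected by chance: score of 0.
    print(interplay_score(0.25, 0.5, 0.5))
    # Marks co-occur more often than expected: positive score.
    print(interplay_score(0.40, 0.5, 0.5))
    # Marks rarely co-occur: negative score.
    print(interplay_score(0.10, 0.5, 0.5))
```

Under this assumption, a positive value indicates that two marks co-occur more often than expected by chance (positive crosstalk), while a negative value indicates that they tend to exclude each other (negative crosstalk).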

Chromatin Biology
  • Middle-down mass spectrometry We developed a workflow to quantify PTMs on histone tails (refs).

  • CrosstalkDB The web server collects and analyzes input files with quantitative data from middle-down and top-down mass spectrometry, and then statistically assesses the crosstalk between the measured PTMs (refs).
    Access CrosstalkDB

  • Computational models Taking simple rules for writing, propagating and deleting histone PTMs on chromatin, we were able to reproduce global patterns measured by ChIP-seq experiments. The implementation of crosstalk rules results in a rich spatial and temporal behavior (ref).

  • Current activities Development of an automated software pipeline for the analysis of more middle-down MS data, and of new smart visualizations that reveal important patterns in complex and heavily interconnected data.
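
The rule-based models described under Computational models can be pictured with a toy simulation. The sketch below is not the published model: it assumes a one-dimensional array of nucleosomes carrying a single binary mark, and the function name, the probabilities `p_write`, `p_spread` and `p_delete`, and their default values are chosen only for illustration.

```python
import random


def simulate_marks(n_sites=50, steps=2000, p_write=0.01, p_spread=0.3,
                   p_delete=0.05, seed=0):
    """Toy 1D chromatin model with one binary mark per site (nucleosome).

    ASSUMPTION: rules and parameters are illustrative, not taken from
    the referenced work.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    marks = [0] * n_sites
    for _ in range(steps):
        i = rng.randrange(n_sites)  # pick one site per update step
        if marks[i]:
            # Deleting: a marked site loses its mark with probability p_delete.
            if rng.random() < p_delete:
                marks[i] = 0
        else:
            neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n_sites]
            has_marked_neighbour = any(marks[j] for j in neighbours)
            # Propagating: a marked neighbour recruits the mark to site i.
            if has_marked_neighbour and rng.random() < p_spread:
                marks[i] = 1
            # Writing: de novo deposition, independent of the neighbours.
            elif rng.random() < p_write:
                marks[i] = 1
    return marks


if __name__ == "__main__":
    final = simulate_marks()
    # Print the final pattern: '#' marks a modified site, '.' an unmodified one.
    print("".join("#" if m else "." for m in final))
```

Even in this reduced form, the spreading rule tends to produce contiguous marked domains rather than isolated sites. Adding a second mark whose writing or deletion depends on the first would be one way to introduce crosstalk rules.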

Workflows and standards

  • Standardization and community efforts See the proposed notation for proteoforms: ProForma (ref). As part of ELIXIR DK, we annotate proteomics tools for bio.tools and contribute to the proteomics use case (see also white paper, ref).

  • ProtProtocols As part of a project within the EuBIC initiative, we developed a framework for fully reproducible, documented and user-friendly pipelines for specific cases of proteomics data analysis. Within this framework, we just finished IsoProt, a full data analysis pipeline for iTRAQ/TMT data (preprint). Check it out here: IsoProt at GitHub, or download it via our docker-launcher.

Complex Systems
  • Biological evolution See my former studies of aging, sympatric speciation and competitive cellular automata (refs).

  • Simulations Almost anything can be simulated on a computer, including sand dunes, opinion dynamics and linguistics (refs).

  • Statistical Mechanics See my work on generalized entropies and Fokker-Planck equations (refs).


Old course on data analysis of proteomics data

This course presents an overview of the main methods for analyzing data from peptide mass spectrometry and other omics experiments.


Click on the picture for course material.

Data analysis of proteomics data


Biostatistics in R

Two courses (BMB830 and BMB831) at the Department of Biochemistry and Molecular Biology.


Click on the picture for more information.

Biostatistics in R

Biostatistics in R I

Course description: SDU web page

Lecture I: Introduction
Corresponding R script
Lecture II: Basics
Corresponding R script
Lecture III: Arrays, matrices and data frames
Corresponding R script
Lecture IV: Data manipulation
Corresponding R script
Lecture V: Visual methods
Corresponding R script
Lecture VI: Basic statistics
Corresponding R script
Lecture VII: Data modeling
Corresponding R script
Lecture VIII: Statistical tests
Corresponding R script
Lecture IX: Multi-variate analysis
Corresponding R script
Lecture X: Interactivity in R
Corresponding R script
Exercises

Biostatistics in R II

Course description: SDU web page

Lecture I: Data analysis of omics data: general aspects
Lecture II: Proteomics data
Lecture III: Transcriptomics data
Lecture IV: Epigenomics data
Lecture V: Metabolomics data
Lecture VI: Data interpretation

1st Year Bachelor Project

Functional analysis of a fish oil diet.


Click on the picture for course material.

VSClust: Variance-sensitive clustering

Improved clustering of any quantitative data, statistical testing and pathway analysis. Find the source code here.

Detection of differentially regulated features

Combined statistical testing for data with few replicates and missing values. Find the source code here.

QC and quantification of protein complexes

Carry out quality control of quantification in your dataset and investigate the behavior of protein complexes. Source code will be available soon.

Co-regulation of protein groups in human cells

Investigate the quantitative behavior of protein complexes and arbitrary protein groups in human cells based on data from ProteomicsDB. Find the source code here.


Elixir
(European Infrastructure for Biological Information)

The Danish node of the Elixir consortium implements and maintains the registry of software tools in life science (ref). The registry provides rich annotations by basing software descriptions on the EDAM ontology. I am involved in several projects that aim to improve and extend the curation of software tools, the proteomics use case, the automatic synthesis of workflows from the registry content, and the maintenance of the EDAM ontology.

EuBIC
(European Bioinformatics Initiative)

An initiative of bioinformaticians in Europe to improve the support and coordination of training and software development in proteomics informatics.
Conference: We organize conferences, hackathons and workshops in Computational Proteomics.
