Research

The Computational Proteomics Group develops and applies computational solutions for improved data analysis in large-scale omics experiments with focus on proteins and their post-translational modifications (PTMs). The aim is to better understand the functional protein states in order to determine, confirm and predict their contribution to cell behavior and disease.
Our main research interests include


Developing software and workflows for mass spectrometry data


Studying biology through post-translational modifications


Creating statistical tools for quantifying and interpreting omics data


Implementing deep learning models for mass spectrometry data processing


Modelling and interpreting signalling in molecular pathways



For the full publication record, see Google Scholar.

Software development
Proteomics informatics

All software is open source and available through the following repositories:
Bitbucket, GitHub (1/2), and GitHub (2/2)

  • R and Python scripts for functional analysis of omics data

  • Algorithms for middle-down and top-down mass spectrometry

  • Deep learning for enhanced feature detection in MS data

  • Portable pipelines and automated workflow composition for HPC and cloud

  • Statistical methods for proteoform and protein complex characterizations

Data resources

The Elixir Tools and Data Service Registry, bio.tools, launched in January 2015, hosts details about tens of thousands of software tools in the life sciences.
Our new MS2AI tool allows creating and categorizing millions of mass spectra.

Visit CrosstalkDB, which hosts quantitative data on crosstalk between histone PTMs measured by middle-down mass spectrometry.


Collaboration with the PRIDE repository to enhance metadata annotations using SDRF

Interactive and user-friendly software

We develop web applications for interactive data analysis (see e.g. Shiny Apps).
We have a strong focus on using smart and extensive visualization for deep exploration of omics data.
Most of these applications are also available as stand-alone versions via docker and conda, and as command-line tools for direct integration into data analysis workflows.

Quantitative analysis
  • General literature
    For an overview of methods used in PTMomics, see our review as well as our editorial about reproducibility. We published tutorials for CrossTalkMapper, VSClust + complex analysis and general PTM analysis, and provide further training material.

  • Tool suites for quantitative Omics
    Check out our new tool suite in development: OmicsQ, which facilitates data processing and further analysis with PolySTest, VSClust and other tools.
    We also collaborated on the development of a Shiny app for LC-MS metabolomics data: MetaboLink.

  • Statistical tests: PolySTest
    A new statistical test for data with low replicate numbers and many missing values, combined with well-performing statistical tests for high-confidence detection of differentially regulated features (ref.): PolySTest.
    The old LimmaRP approach, where we showed the power of combining limma and rank products (ref.), is still accessible here: LimmaRP

  • Data clustering
    VSClust considers feature variance, leading to more accurate clustering results (ref. and tutorial): VSClust. See also how to estimate appropriate parameter values for fuzzy c-means clustering (ref; old app: FuzzyClust).

  • Protein Complexes
    Use ComplexBrowser to investigate the behavior of protein complexes in your proteomics data set (ref) and CoExpresso to look for co-regulatory patterns within hundreds of human cell types (ref).

  • PTM crosstalk
    Large-scale estimation of crosstalk between nearby residues (ref. and ref.). Interplay scores estimate the quantitative crosstalk between PTMs on a protein (ref. and ref.). Crosstalk patterns can be visualized by CrosstalkMapper.
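
The idea of an interplay score can be illustrated with a minimal sketch. It assumes the score is the log2 ratio between the observed relative abundance of two co-occurring PTMs and the abundance expected if both marks were placed independently. The formula, the function name and the example numbers are assumptions made for illustration; they are not taken from this page or from the CrosstalkMapper code.

```python
import math


def interplay_score(f_ab: float, f_a: float, f_b: float) -> float:
    """Assumed interplay score: log2(observed co-occurrence / expected co-occurrence).

    f_ab: relative abundance of proteoforms carrying both PTMs
    f_a, f_b: relative abundances of proteoforms carrying each PTM
    """
    if min(f_ab, f_a, f_b) <= 0:
        raise ValueError("relative abundances must be positive")
    # Under independence the expected co-occurrence is f_a * f_b
    return math.log2(f_ab / (f_a * f_b))


# Hypothetical relative abundances, chosen only to show the sign of the score
positive = interplay_score(f_ab=0.25, f_a=0.5, f_b=0.25)       # co-occur more than expected
neutral = interplay_score(f_ab=0.125, f_a=0.5, f_b=0.25)       # exactly as expected
negative = interplay_score(f_ab=0.0625, f_a=0.5, f_b=0.25)     # co-occur less than expected
print(positive, neutral, negative)
```

Under this assumption, a positive score suggests that two marks tend to appear together, and a negative score suggests that they tend to exclude each other.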

Chromatin Biology
  • Internal fragment ions
    Internal fragment ions are considered crucial for the identification of proteoforms in top-down mass spectrometry but are ubiquitous and noisy. We show how they can be used to validate proteoforms (ref).

  • Middle-down mass spectrometry
    We develop and apply a workflow to quantify PTMs on histone tails (refs). Coming soon: ProteoformQuant for improved quantification of proteoforms in middle-down MS data.

  • CrosstalkDB
    With quantitative data from middle-down and top-down mass spectrometry, the web server collects and analyzes the input files, followed by statistical assessment of the crosstalk between measured PTMs (refs).
    Access CrosstalkDB

  • Computational models
    Taking simple rules for writing, propagating and deleting histone PTMs on chromatin, we were able to reproduce global patterns measured by ChIP-seq experiments. The implementation of crosstalk rules results in a rich spatial and temporal behavior (ref).
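
A toy version of such rule-based modelling can be sketched as follows. It is not the published model: it tracks a single hypothetical mark on a one-dimensional chain of nucleosomes, and the three rules (random writing, spreading from a marked neighbour, random erasing) and all rates are assumptions chosen for illustration. Crosstalk between different marks is left out.

```python
import random


def step(marks, p_write, p_spread, p_erase, rng):
    """One synchronous update of a 1D chain of nucleosomes (1 = marked, 0 = unmarked)."""
    n = len(marks)
    new = list(marks)
    for i in range(n):
        if marks[i] == 1:
            # Deleting: a marked site loses its mark with probability p_erase
            if rng.random() < p_erase:
                new[i] = 0
        else:
            # Propagating: an unmarked site next to a marked one is modified more readily
            near_mark = (i > 0 and marks[i - 1] == 1) or (i < n - 1 and marks[i + 1] == 1)
            # Writing: otherwise the mark is set de novo with a low probability
            p = p_spread if near_mark else p_write
            if rng.random() < p:
                new[i] = 1
    return new


def simulate(n_sites=50, n_steps=200, p_write=0.01, p_spread=0.3, p_erase=0.1, seed=0):
    """Run the toy model from an unmarked chain and return the final state."""
    rng = random.Random(seed)  # seeded for reproducible runs
    marks = [0] * n_sites
    for _ in range(n_steps):
        marks = step(marks, p_write, p_spread, p_erase, rng)
    return marks


final = simulate()
print("".join("#" if m else "." for m in final))
```

Even with these simple assumed rules, marked sites tend to form contiguous domains that grow and shrink over time, which is the kind of spatial and temporal behavior the paragraph above refers to.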

Deep learning in proteomics
  • Peptide representations for deep learning
    In collaboration with the Röttger group, we developed an environment to retrieve millions of mass spectra (MS1 and MS2) from the public repository PRIDE, to categorize them in a database, and to create data representations that can be directly used for machine learning purposes: MS2AI (refs).

  • Variability in MS

    Blind application of deep learning methods to MS data leads to high bias and inaccurate predictions. We investigated the impact of data variability on the prediction bias (ref).

  • Registry

    See also the AIMe registry to report AI-based biomedical results in a standardized and reproducible manner (ref) and our community paper about machine learning in proteomics (ref).

Workflows and standards

  • Standardization and community efforts
    See the proposed notation of proteoforms: ProForma (ref). Accurate metadata annotation is crucial for ensuring reproducibility and data repurposing. Together with the EuBIC community, we developed a data standard for proteomics metadata (refs.). As part of ELIXIR DK, we annotate proteomics tools for bio.tools (refs.) and are part of the proteomics community (see also white paper, ref).

  • WOMBAT-P
    We implemented four scalable and portable workflows for the analysis of label-free data as part of an ELIXIR implementation study. They allow a systematic comparison of the performance of different data analysis workflows (ref).

  • Antibodies
    Antibodies are notoriously difficult to identify due to their large sequence variety. Taking advantage of the Observed Antibody Space, we developed a workflow to identify antibodies in mass spectrometry data (ref).

  • ProtProtocols
    As part of a project within the EuBIC initiative, we developed a framework for fully reproducible, documented and user-friendly pipelines for specific cases of proteomics data analysis. Within this framework, we created IsoProt, a full data analysis pipeline for iTRAQ/TMT data (ref). Check it out here: IsoProt at GitHub, or download it via our docker-launcher.

Complex Systems
  • Biological evolution
    See my former studies of aging, sympatric speciation and competitive cellular automata (refs).

  • Simulations
    Almost anything can be simulated on the computer including sand dunes, opinion dynamics and linguistics (refs).

  • Statistical Mechanics
    See my work on generalized entropies and Fokker-Planck equations (refs).


Old course on quantitative data analysis of proteomics data

This course presents an overview of the main methods for the analysis of data from peptide mass spectrometry and other -omics data.


Click on the picture for course material.

Data analysis of proteomics data


Courses

My group runs two Master's courses on Biostatistics in R (BMB830 and BMB831) at the Department of Biochemistry and Molecular Biology and a PhD course, Workshops in Applied Bioinformatics (BMB209). We also co-teach the bachelor courses Molecular Data Science (BMB547) and Bioinformatics I (BMB511), and taught Biostatistics and Experimental Design as part of the Master's programme Life Science Engineering and Informatics of the Sino-Danish University.


Click on the picture for more information about Biostatistics in R.

Biostatistics in R

Biostatistics in R I

Lecture I: Introduction
Corresponding R script
Lecture II: Basics
Corresponding R script
Lecture III: Arrays, matrices and data frames
Corresponding R script
Lecture IV: Data manipulation
Corresponding R script
Lecture V: Visual methods
Corresponding R script
Lecture VI: Basic statistics
Corresponding R script
Lecture VII: Data modeling
Corresponding R script
Lecture VIII: Statistical tests
Corresponding R script
Lecture IX: Multi-variate analysis
Corresponding R script
Lecture X: Interactivity in R
Corresponding R script
Exercises

Biostatistics in R II

Course description
Lecture I: Data analysis of omics data: general aspects
Lecture II: Proteomics data
Lecture III: Transcriptomics data
Lecture IV: Epigenomics data
Lecture V: Metabolomics data
Lecture VI: Data interpretation

Projects

We offer projects for Bachelor and Master students. To get an idea, please take a look at our research. If you are interested, please contact me.


Click on the picture for a list of old projects.

Former or current student projects

First year bachelor projects

Functional analysis of a fish oil diet

How strong are our muscles?

Bachelor projects

Investigation of PTM crosstalk in mice to resolve age- and tissue-dependent patterns

Large-scale investigation of protein variance in cancer tissues

Enhanced and animated visualization of temporal changes on the histone PTM landscape

Optimization of the data analysis pipeline to characterize combinatorial post-translational modifications (PTMs) of histones

A proteomics analysis of protein abundance variations in cancer

Optimization of multi-threading capabilities in data clustering approaches

Master projects (individual study projects and full theses)

Determination of cellular age as a method to assess aging effects in multicellular organisms

Bioinformatics in proteomics - supervised data analysis focused on protein complexes.

Intrinsically Disordered Protein Domains and Post-Translational Modifications - A Computational Biology Study

Web-based application for visualization of proteins and their post-translational modifications (PTMs) based on their quantification

A fully reproducible and user-friendly workflow for the analysis of PTM-omics data

Implementation and optimization of a fully automatized pipeline for the analysis of middle-down MS data

Tandem-Mass spectrometry prediction based on Liquid Chromatography-Mass Spectrometry chromatogram using deep neural networks

Implementation of fully reproducible and scalable data analysis workflows in bioinformatics using Nextflow

Computational proteomics analysis of histones and their post-translational modifications

Development of a statistical workflow to determine relative changes in post-translational modifications

Availability

All apps are accessible as web services on this server. This means that they might be temporarily inaccessible under heavy usage. In that case, please try again later or run the app(s) locally. For local use, we provide docker containers (usually "veitveit/app_name_in_lower_case"), access through the SDU Cloud (only works if you are affiliated with a Danish institution), and conda packages (only PolySTest and VSClust).
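
The container naming convention mentioned above can be turned into a small helper. This is a sketch that only applies the stated pattern; since the convention holds "usually", the resulting name should be checked against the actual repository before use. The function names are hypothetical.

```python
def docker_image(app_name: str, namespace: str = "veitveit") -> str:
    """Build the image name following the pattern namespace/app_name_in_lower_case."""
    return f"{namespace}/{app_name.lower()}"


def pull_command(app_name: str) -> list[str]:
    """Return the argument list for fetching the image with `docker pull`."""
    return ["docker", "pull", docker_image(app_name)]


# Example for two of the apps named on this page
print(" ".join(pull_command("VSClust")))
print(" ".join(pull_command("PolySTest")))
```

The argument list can be passed to `subprocess.run` or pasted into a shell. Ports and further run options differ per app and are not covered here.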

VSClust: Variance-sensitive clustering

Improved clustering of any quantitative data, statistical testing and pathway analysis. Find the source code here.
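
To give an impression of what variance-sensitive clustering means, the sketch below computes standard fuzzy c-means memberships for one feature and shows the effect of the fuzzifier m. It assumes, for illustration only, that a noisier feature is given a larger fuzzifier; how VSClust actually derives this value is not described on this page, and the code is not taken from VSClust.

```python
import math


def memberships(point, centroids, m):
    """Standard fuzzy c-means memberships of one feature to all centroids (m > 1)."""
    dists = [math.dist(point, c) for c in centroids]
    # A feature lying exactly on a centroid belongs fully to that centroid
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


# Hypothetical two-cluster example with one feature profile
centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

crisp = memberships(point, centroids, m=2.0)  # assumed low-variance feature
soft = memberships(point, centroids, m=5.0)   # assumed high-variance feature
print(crisp, soft)
```

With the larger fuzzifier, the memberships move closer to each other, so a noisy feature contributes less decisively to any single cluster.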

PolySTest: Detection of differentially regulated features

Combined statistical testing for data with few replicates and missing values. Find the source code here.

QC and quantification of protein complexes

Carry out quality control of quantification in your dataset and investigate the behavior of protein complexes. Find the source code here.

Co-regulation of protein groups in human cells

Investigate the quantitative behavior of protein complexes and arbitrary protein groups in human cells based on the data from ProteomicsDB. Find the source code here.

Interactive tool for protein inference, summarization, and visualization

Run different ways of (parsimonious) protein inference and optimized summarization based on factor analysis. The results can be extensively assessed both in numbers and visually. Find the source code here.


Elixir
(European Infrastructure for Biological Information)

The Danish node of the ELIXIR ESFRI consortium implemented the bio.tools registry of software in the life sciences (ref). The registry is now mainly maintained at SDU and provides rich annotations by basing software descriptions on the EDAM ontology. We are involved in multiple projects within the ELIXIR Tools Platform that aim to improve and extend the FAIRness of software tools, and in the ELIXIR Proteomics Community to ensure more standardization and benchmarking.
ELIXIR Denmark also organizes the Annual Danish Bioinformatics Conferences.


EuBIC
(European Bioinformatics Initiative)

Initiative of bioinformaticians in Europe to improve support and coordination of training and software development in proteomics informatics.
Conference: We are organizing conferences, hackathons and workshops in Computational Proteomics.

EuPA
(European Proteomics Association)

EuPA heads the national proteomics societies and organizes the EuPA conferences as well as multiple events like Summer and Winter Schools.
