Paper

Review

Understanding sequencing data as compositions: an outlook and review

Quinn, Thomas P.; Erb, Ionas; Richardson, Mark F.; Crowley, Tamsyn M.

Mathematics, Computer Science

BIOINFORMATICS

2018

Volume 34, pages 2870-2878

Abstract

Motivation: Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models.

Results: The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study.

Contact: contacttomquinn@gmail.com

Supplementary information: Supplementary data are available at Bioinformatics online.
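The point that counts are only interpretable relative to one another can be illustrated with a small sketch. The abstract does not name a specific transformation; the example below assumes the centered log-ratio (clr), a standard CoDA transform, and uses made-up counts for three hypothetical genes. The function names `closure` and `clr` are illustrative, not taken from any particular package.

```python
import math

def closure(counts):
    """Rescale a vector of counts so that its parts sum to 1 (proportions)."""
    total = sum(counts)
    return [c / total for c in counts]

def clr(counts):
    """Centered log-ratio: log of each part relative to the geometric mean.

    Assumes every count is strictly positive; zeros need separate handling.
    """
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

# Hypothetical counts for three genes in one sample, and the same sample
# sequenced ten times deeper (an arbitrary change in library size).
shallow = [10, 20, 70]
deep = [100, 200, 700]

# The raw counts differ, but the relative information is identical.
print(closure(shallow), closure(deep))
print(clr(shallow), clr(deep))
```

Under these assumptions, both the proportions and the clr values agree between the two library sizes, which is the sense in which the total sum carries no information about the sample.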

Access level

Green published, Gold other

Among papers in Mathematics

19th. Effects of demand control on the complex dynamics of electric power system blackouts (Influratio 70)

20th. Very Fast Tree: speeding up the estimation of phylogenies for large alignments through parallelization and vectorization strategies (Influratio 65)

21st. Understanding sequencing data as compositions: an outlook and review (Influratio 59)

22nd. An information-theoretic approach to study spatial dependencies in small datasets (Influratio 59)

23rd. Analyzing the potential impact of BREXIT on the European research collaboration network (Influratio 59)

Among papers in Computer Science

82nd. Design of optimal nonlinear network controllers for Alzheimer's disease (Influratio 59)

83rd. Towards the use of similarity distances to music genre classification: A comparative study (Influratio 59)

84th. Understanding sequencing data as compositions: an outlook and review (Influratio 59)

85th. An ensemble-based method for the selection of instances in the multi-target regression problem (Influratio 58)

86th. Fragments of peer review: A quantitative analysis of the literature (1969-2015) (Influratio 58)

Project funded by the National Research Plan, State Research Agency (Agencia Estatal de Investigación), Ministry of Science and Innovation. PID2019-109127RB-I00
