Brief guide to the analysis, interpretation and presentation of microbiota data
  1. Stefan Zalewski (1),
  2. Christopher J Stewart (2),
  3. Nicholas D Embleton (1),
  4. Janet Elizabeth Berrington (1)

  (1) Newcastle Neonatal Service, Royal Victoria Infirmary, Newcastle upon Tyne, UK
  (2) Department of Molecular Virology and Microbiology, Alkek Center for Metagenomics and Microbiome Research, Baylor College of Medicine, Houston, Texas, USA

Correspondence to Dr Stefan Zalewski, Newcastle Neonatal Service, Royal Victoria Infirmary, Newcastle upon Tyne NE1 4LP, UK; Stefan.Zalewski{at}


Introduction to terminology and technologies

There has been an increase in research using new ‘omic’ technologies1 (those allowing the study of a large biological data set) designed to define and describe the microorganisms we carry, and their impact on health and disease. Omic technologies generate enormous quantities of complex data, so comparing results between studies, and within a single study, is a major challenge. This article provides an overview of terminology and data generation, and uses one of our study data sets to demonstrate different ways of presenting those data.

Bacteria are important in a range of physiological processes in humans2: nutrient assimilation, vitamin production, modification of the nervous system (the gut-brain axis) and development of the immune system.3 Pathological changes in gut microbial communities (dysbiosis) have been associated with a wide range of diseases including skin and psychiatric disorders,4 5 as well as diseases with high mortality in preterm infants such as infection and necrotising enterocolitis (NEC). NEC, for example, has been associated with reduced microbial diversity and an increase in specific classes of bacteria, such as Gammaproteobacteria.6–8

Identified by culture, bacteria were traditionally classified by physical characteristics into taxonomic ranks—phylum, class, order, family, genus and species. Sequencing-based technologies rely on similarity in DNA sequences to determine organisms’ phylogenetic relatedness to other species.

DNA sequencing assesses microbial communities either by identifying all genomes within a community (metagenomics) or by using specific marker genes such as the 16S rRNA gene.9 Sequences of the bacterial 16S rRNA gene can be grouped by percentage similarity to each other, typically ‘binning’ (collecting together) all sequences with more than 97% similarity as a single operational taxonomic unit (OTU), which is then cross-referenced with databases to identify the bacterial genus (or a higher taxonomic level if the genus is unavailable). The relatively short read length of 16S rRNA gene …
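The binning step described above can be illustrated with a minimal Python sketch. It is not the pipeline used by the authors, and it makes several simplifying assumptions: sequences are already aligned to the same length, similarity is taken as the fraction of matching positions, and each sequence is compared only with the first ('seed') sequence of each existing OTU. The function names identity and bin_otus are hypothetical, chosen for illustration only.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences.

    Assumption: both sequences are non-empty and of equal length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def bin_otus(sequences, threshold=0.97):
    """Greedy binning of sequences into OTUs.

    Each sequence joins the first OTU whose seed it matches with more than
    `threshold` identity; otherwise it starts a new OTU as its seed.
    """
    otus = []  # each OTU is a list of sequences; element 0 is the seed
    for seq in sequences:
        for otu in otus:
            if identity(seq, otu[0]) > threshold:
                otu.append(seq)
                break
        else:
            # no existing OTU was similar enough, so start a new one
            otus.append([seq])
    return otus


# Toy data (invented for illustration, not real 16S reads)
seed = "ACGT" * 25          # 100-base sequence
near = "T" + seed[1:]       # differs from seed at one position
far = "T" * 10 + seed[10:]  # differs from seed at several positions

otus = bin_otus([seed, near, far])
print(len(otus), [len(otu) for otu in otus])
```

In this toy example the sequence differing at a single position is binned with the seed, while the more divergent sequence forms its own OTU. Because the binning is greedy, the result can depend on the order in which sequences are processed; dedicated clustering tools handle alignment, ordering and seed selection more carefully.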



  • Contributors JEB, NDE, SZ: conception and plan for work. SZ: drafting the article. CJS: provision of data and images for illustration of principles. CJS, NDE, JEB: critical appraisal and review of article. SZ, CJS, NDE, JEB: final approval of the article to be published. SZ, CJS, JEB: response to editorial and reviewers' comments, and appropriate changes.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.