Guidelines
These guidelines form part of the activities developed in the COST Action CA17111 Short-Term Scientific Mission (STSM) entitled “Methods for integrating a standardized cistrome database in the latest Pinot grapevine genome browser” (Luis Orduña, under the guidance of Dr. Anne-Françoise Adam-Blondon and Nicolas Francillonne).
Quick Guidelines
Under Construction…
Complete Guidelines for First-time Users
Goals
In the post-genomics era, the amount of data generated by omics technologies and stored in public databases far exceeds what humans can analyse unaided, making computational resources imperative for storing and analysing this information. Indeed, the amount of data within the grapevine community has increased continuously since the genome of this species was sequenced and released more than ten years ago. The aim of the COST Action INTEGRAPE is to facilitate access to and re-use of these data by making them FAIR (Findable, Accessible, Interoperable, Reusable). For genomes and their annotations, two sets of actions are being developed: facilitating the deposition of well-documented sequence data sets in the EMBL-ENA archives, and providing access to community JBrowse instances for the visualisation of the results. The aim of the STSM was to work on these two aspects for a DAP-Seq data set. It also allowed us to update the current version of the JBrowse of the grapevine reference genome.
JBrowse organizes the information of a displayed dataset into tracks. For instance, in an RNA-Seq experiment where gene expression in condition A is compared against gene expression in a control, one track could show gene expression in condition A and a different track gene expression in the control. Each track therefore has to be properly annotated in order to be self-explanatory and understandable to JBrowse users.
This tutorial proposes guidelines for standardizing the metadata associated with genome annotation tracks (genes, feature alignments, etc.). The proposed formats are designed for direct use in the JBrowse genome browser. The tutorial is based on an example implemented at the Plant bioinformatics facility of URGI for the JBrowse of the grapevine reference genome sequence. To that end, it first describes a series of general recommendations to follow regardless of the type of dataset to be displayed. Second, it gives examples of properly annotated metadata for two omic techniques (RNA-Seq and DNA Affinity Purification Sequencing [DAP-Seq]).
General recommendations
How to organize the information
JBrowse is a genomic and transcriptomic data visualisation tool that lets users display different kinds of tracks. This document considers three types of tracks:
- Alignment tracks: They contain alignment results; each alignment output (BAM, CRAM, BigWig file, etc.) is shown in its own alignment track.
- Reference genome tracks: These tracks show the sequence of a reference genome assembly.
- Annotation tracks: These tracks show the annotations of the assembly they refer to.
JBrowse allows tracks to be organized into categories, which helps users locate their tracks of interest. It is therefore recommended to organize the tracks by type, in order to build a user-friendly JBrowse with easy access to the tracks the user is interested in (see instructions below).
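As an illustration, the category naming convention proposed in this tutorial (detailed under “About the metadata”) can be written as a small helper. This is only a sketch: the function name `track_category` and the track type keywords are hypothetical, chosen for this example, and are not part of JBrowse.

```python
def track_category(track_type, technique=None, experiment=None):
    """Return a category string following the convention proposed in this tutorial.

    track_type is one of "annotation", "reference" or "alignment"
    (hypothetical keywords used only in this sketch).
    """
    if track_type == "annotation":
        return "annotations"
    if track_type == "reference":
        return "reference sequences"
    if track_type == "alignment":
        # Alignment tracks: omic technique first, then the experiment name.
        if not technique or not experiment:
            raise ValueError("alignment tracks need a technique and an experiment name")
        return f"{technique} data / {experiment}"
    raise ValueError(f"unknown track type: {track_type!r}")


print(track_category("alignment", "DAP-Seq", "MYB14 experiment"))
```

Generating the category from the track type keeps the naming consistent across tracks that are added at different times or by different people.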
About the metadata
JBrowse manages metadata in a very simple way. To add metadata, the first step is to create a .csv file containing one line per track. The .csv file serves as a template from which JBrowse builds a clear structure for displaying each track’s metadata. By default, JBrowse does not attach any metadata to a track. However, we feel that some information should be mandatory, and we propose below a standard for these metadata:
- Label: Label of the track in the trackList.json file.
- Track: Name of the track that will be displayed. This track name has to be the same as the label given to the track in the trackList.json file of the JBrowse. Try to be as explicit as possible for someone outside your project while keeping it simple.
- Category: Name of the category to which the track has been assigned. As explained before, the category of the track should allow the user to easily select the information to be displayed. Therefore, all the annotation and reference genome tracks will be in the annotations and reference sequences categories, respectively. For alignment tracks, however, the classification is more complex, as many techniques include an alignment step in their protocol. We recommend classifying alignment tracks by the omic technique they come from, followed by the name of the experiment. For instance, the category for a DAP-Seq experiment for the MYB14 TF will be “DAP-Seq data / MYB14 experiment”.
- Description: Explanation of the information that is displayed in the track. This description has to be brief and concise, in order to allow the JBrowse users to understand the track data without overwhelming them with too much information. This field will vary depending on the omic technique of origin of the data displayed in the track (see examples below).
- Source: Link to the database where the data set supporting the track is available.
- Species: Name of the species against which the track reads have been aligned.
- Genome-build: Name of the assembly against which the track reads have been aligned.
- Principal investigator: Name of the researcher responsible for the study in which the dataset displayed on the track was generated.
- PubMed References: Link to the paper in which the data of the track was published. It does not necessarily have to be a PubMed link.
This structure allows users to upload to the JBrowse datasets coming from different omics techniques, such as RNA-Seq or DAP-Seq, as long as the information provided in the Description column is clear enough to describe what is being displayed (see examples below).
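The metadata file described above could be produced with a short script such as the following. It is a minimal sketch, not the procedure used at URGI: the function name `write_track_metadata` is hypothetical, the column names simply follow the list above, and the example row reuses the reference genome track values from this tutorial (including the “Doi” placeholder). The exact column names JBrowse expects may differ and should be checked against its documentation.

```python
import csv
import io

# Column names taken from the list of proposed mandatory fields.
FIELDS = [
    "Label", "Track", "Category", "Description", "Source",
    "Species", "Genome-build", "Principal investigator", "PubMed References",
]


def write_track_metadata(rows, handle):
    """Write one CSV line per track, refusing rows with empty mandatory fields."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        missing = [f for f in FIELDS if not str(row.get(f, "")).strip()]
        if missing:
            raise ValueError(
                f"track {row.get('Label', '?')!r} is missing: {', '.join(missing)}"
            )
        writer.writerow(row)


reference_row = {
    "Label": "DNA",
    "Track": "DNA",
    "Category": "Reference sequence",
    "Description": "Track with the reference genome",
    "Source": "https://urgi.versailles.inra.fr/Species/Vitis",
    "Species": "Vitis vinifera",
    "Genome-build": "Vitis vinifera 12X.2",
    "Principal investigator": "Anne-Françoise Adam-Blondon",
    "PubMed References": "Doi",  # placeholder, as in the tutorial's example
}

buffer = io.StringIO()  # stands in for a real file opened with newline=""
write_track_metadata([reference_row], buffer)
print(buffer.getvalue())
```

Rejecting incomplete rows at writing time is one way to enforce the idea that these fields are mandatory. A placeholder such as “–” can be used where a value is genuinely not available yet.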
About the track display
As mentioned previously, JBrowse displays the result of an alignment in what is called an alignment track. Alignment tracks can look different depending on the track input. For instance, if the input is a CRAM file, the track is called a CRAM track, and it will not have the same appearance or information as a track whose input is a coverage file (BigWig file), which is called a histogram coverage track.
Regardless of the omic technique shown, the most easily understandable way of displaying alignments is through histogram coverage tracks. BAM and especially CRAM tracks can contain more information and offer more ways of showing it. However, these track types also require more skill with the JBrowse software on the user’s part to display the information in an informative way, so the cost and benefit have to be weighed.
For coverage histogram tracks, it is highly advisable to normalize the coverage across the whole genome using a normalization method such as RPKM, which takes the library sequencing depth into account, so that gene expression can be compared between different tracks.
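To make the difference between the two track types concrete, the sketch below builds the kind of entries that could go in the "tracks" list of trackList.json. It assumes a JBrowse 1 configuration, where (to the best of our understanding) BigWig coverage is served by the `JBrowse/Store/SeqFeature/BigWig` store with a `Wiggle/XYPlot` track, and CRAM by the `JBrowse/Store/SeqFeature/CRAM` store with an `Alignments2` track. The helper names and the file names `MYB13.bw` and `MYB13.cram` are hypothetical.

```python
import json


def coverage_track(label, category, bigwig_url):
    """Histogram coverage track backed by a BigWig file (assumed JBrowse 1 keys)."""
    return {
        "label": label,
        "key": label,  # displayed name, kept identical to the label
        "category": category,
        "type": "JBrowse/View/Track/Wiggle/XYPlot",
        "storeClass": "JBrowse/Store/SeqFeature/BigWig",
        "urlTemplate": bigwig_url,
    }


def cram_track(label, category, cram_url):
    """Read-level alignment track backed by a CRAM file (assumed JBrowse 1 keys)."""
    return {
        "label": label,
        "key": label,
        "category": category,
        "type": "JBrowse/View/Track/Alignments2",
        "storeClass": "JBrowse/Store/SeqFeature/CRAM",
        "urlTemplate": cram_url,
    }


category = "DAP-seq data / MYB13-14-15"
tracks = [
    coverage_track("MYB13 histogram", category, "MYB13.bw"),  # hypothetical file
    cram_track("MYB13 CRAM", category, "MYB13.cram"),         # hypothetical file
]
print(json.dumps({"tracks": tracks}, indent=2, ensure_ascii=False))
```

Offering both entries for the same experiment lets casual users start from the histogram while more experienced users open the CRAM track when they need read-level detail.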
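The effect of RPKM normalization on binned coverage can be shown with a few lines of arithmetic. This is a sketch of the usual per-bin definition (reads in the bin, divided by the bin length in kilobases and by the number of mapped reads in millions). The function name and the read counts are made up for illustration. In practice the normalized BigWig file would be produced by a dedicated coverage tool.

```python
def rpkm_coverage(bin_counts, bin_size_bp, total_mapped_reads):
    """Scale raw read counts per bin to RPKM.

    RPKM = count / (bin length in kb * mapped reads in millions)
         = count * 1e9 / (bin length in bp * mapped reads)
    """
    if bin_size_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("bin size and total mapped reads must be positive")
    scale = 1e9 / (total_mapped_reads * bin_size_bp)
    return [count * scale for count in bin_counts]


# Two hypothetical libraries covering the same three 50 bp bins.
# Library B was sequenced twice as deeply, so its raw counts are doubled.
library_a = rpkm_coverage([0, 50, 200], bin_size_bp=50, total_mapped_reads=10_000_000)
library_b = rpkm_coverage([0, 100, 400], bin_size_bp=50, total_mapped_reads=20_000_000)
print(library_a)
print(library_b)
```

Both libraries give the same normalized values even though the raw counts differ by a factor of two, which is what makes tracks from libraries of different depth comparable.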
Metadata examples
| Label | Track | Category | Description | Source | Species | Genome-build | Principal investigator | PubMed Reference |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DNA | DNA | Reference sequence | Track with the reference genome | https://urgi.versailles.inra.fr/Species/Vitis | Vitis vinifera | Vitis vinifera 12X.2 | Anne-Françoise Adam-Blondon | Doi |
The most important point is making sure that both the category and the description show clearly what version of the reference genome assembly is displayed. As shown in the example, the Genome-build field must show the name of the assembly.
| Label | Track | Category | Description | Source | Species | Genome-build | Principal investigator | PubMed Reference |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VCost_v27 | VCost_v25 | annotations | Track with the Vcost v27 annotations | https://urgi.versailles.inra.fr/Species/Vitis | Vitis vinifera | Vitis vinifera 12X.2 | Anne-Françoise Adam-Blondon | Doi |
As with the reference genome tracks, both the category and the description of the track must show clearly and unequivocally that the track contains genome annotations and which version of the annotation is displayed. Additionally, the category must indicate what kind of annotations are shown (gene annotations, transposable element annotations, etc.). Note that in annotation tracks the Genome-build field indicates the genome assembly to which the annotations belong.
| Label | Track | Category | Description | Source | Species | Genome-build | Principal investigator | PubMed Reference |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MYB13 histogram | MYB13 histogram | DAP-seq data / MYB13-14-15 | gDNA extracted from cv. ‘Pinot Noir’ challenged against MYB13 TF (Vitvi05g01732) amplified from cv. ‘Pinot Noir’ | – | Vitis vinifera | Vitis vinifera 12X.2 | Tomas Matus | – |
| MYB13 control histogram | MYB13 control histogram | DAP-seq data / MYB13-14-15 | gDNA extracted from cv. ‘Pinot Noir’ challenged against pHALO affinity tag | – | Vitis vinifera | Vitis vinifera 12X.2 | Tomas Matus | – |
| MYB13 CRAM | MYB13 CRAM | DAP-seq data / MYB13-14-15 | gDNA extracted from cv. ‘Pinot Noir’ challenged against MYB13 TF (Vitvi05g01732) amplified from cv. ‘Pinot Noir’ | – | Vitis vinifera | Vitis vinifera 12X.2 | Tomas Matus | – |
| MYB13 control CRAM | MYB13 control CRAM | DAP-seq data / MYB13-14-15 | gDNA extracted from cv. ‘Pinot Noir’ challenged against pHALO affinity tag | – | Vitis vinifera | Vitis vinifera 12X.2 | Tomas Matus | – |
For all alignment tracks, the Species and Genome-build fields should give, respectively, the name of the species and the name of the assembly against which the track reads have been aligned.
As explained in the general description of the metadata, in alignment tracks the category describes both the type of data shown in the track and the name of the experiment (e.g. “DAP-Seq data / MYB14 experiment”).
Separately, the Description field is used to describe the DAP-Seq experiment. To describe it properly, the field should be structured as follows: “gDNA extracted from cv. ‘Pinot Noir’ challenged against MYB14 TF (Vitvi05g01732) amplified from cv. ‘Pinot Noir’”.
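The Description structure shown above lends itself to a template, so that every DAP-Seq track in an experiment is described in the same way. The following sketch uses hypothetical function and parameter names. The wording of the generated strings is copied from the examples in this tutorial.

```python
def dap_seq_description(gdna_cultivar, tf_name, gene_id, tf_cultivar):
    """Description for a DAP-Seq track where gDNA was challenged against a TF."""
    return (
        f"gDNA extracted from cv. ‘{gdna_cultivar}’ challenged against "
        f"{tf_name} TF ({gene_id}) amplified from cv. ‘{tf_cultivar}’"
    )


def dap_seq_control_description(gdna_cultivar, tag="pHALO affinity tag"):
    """Description for the matching control track (affinity tag only)."""
    return f"gDNA extracted from cv. ‘{gdna_cultivar}’ challenged against {tag}"


print(dap_seq_description("Pinot Noir", "MYB14", "Vitvi05g01732", "Pinot Noir"))
print(dap_seq_control_description("Pinot Noir"))
```

Keeping the cultivar of the gDNA and the cultivar the TF was amplified from as separate parameters makes it explicit when the two differ.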
RNA-Seq examples: under construction