GO PaD: Gene Ontology Partition Database

Database Tutorial

The Gene Ontology Specificity Quantifier Database supports both simple and more complex (compound) queries. It is implemented as a PHP front end to a MySQL database.

Gene Ontology Database Advance Search Option (Figure 1) shows an example of a more complex query. In this example, the search targets membrane-related genes and is restricted to more general terms (information content below 4 bits). The search is further narrowed to the cellular component branch of the Gene Ontology.
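The compound query described above combines three conditions: a keyword match, an upper bound on information content, and a restriction to one branch. The following is a minimal Python sketch of that filtering logic only. It does not reflect the actual PHP/MySQL implementation; the GoTerm record, the compound_search function, the placeholder identifiers, and the bit values are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoTerm:
    """Hypothetical record; the field names are assumptions, not the real schema."""
    term_id: str
    name: str
    branch: str
    bits: float  # information content in bits (illustrative values)


# Made-up example records with placeholder identifiers (not real GO IDs).
TERMS = [
    GoTerm("EX:001", "membrane", "cellular_component", 2.5),
    GoTerm("EX:002", "integral to membrane", "cellular_component", 3.1),
    GoTerm("EX:003", "mitochondrial inner membrane", "cellular_component", 7.8),
    GoTerm("EX:004", "membrane fusion", "biological_process", 3.6),
    GoTerm("EX:005", "nucleus", "cellular_component", 2.9),
]


def compound_search(terms, keyword, max_bits, branch):
    """Return terms whose name contains keyword, with bits < max_bits, in branch."""
    keyword = keyword.lower()
    return [
        t for t in terms
        if keyword in t.name.lower() and t.bits < max_bits and t.branch == branch
    ]


# Membrane-related, general (< 4 bits), cellular component branch only.
hits = compound_search(TERMS, "membrane", 4.0, "cellular_component")
for t in hits:
    print(t.term_id, t.name, t.bits)
```

Each condition narrows the result independently, so dropping one of them (for example the branch restriction) reduces the compound query to a simpler one.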

As shown in Gene Ontology Graphical Structure Option (Figure 2), information can be projected onto a scale from general terms (few bits) to highly specific terms (many bits). Using this information, gene sets can be partitioned into unbiased sets, with information distributed more evenly across each set than when the Gene Ontology’s graphical structure is used as a proxy for specificity.
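To make the bits scale concrete, here is a small Python sketch. It assumes that a term's information is the Shannon information -log2(p), where p is the fraction of annotated genes carrying the term; this tutorial does not state the exact metric the database uses, so treat the formula, the function names, the fixed-width binning, and the counts as illustrative assumptions.

```python
import math


def information_bits(term_count, total_count):
    """Shannon information -log2(p) for a term annotating term_count of total_count genes.

    Assumption: this is a common definition, not necessarily the database's metric.
    """
    if not 0 < term_count <= total_count:
        raise ValueError("term_count must be in the range 1..total_count")
    return -math.log2(term_count / total_count)


def partition_by_bits(bits_by_term, width):
    """Group terms into bins of equal width on the bits scale (hypothetical scheme)."""
    bins = {}
    for term, bits in bits_by_term.items():
        bins.setdefault(int(bits // width), []).append(term)
    return {index: sorted(members) for index, members in sorted(bins.items())}


# Made-up annotation counts out of 1024 genes.
TOTAL_GENES = 1024
counts = {"term_a": 512, "term_b": 256, "term_c": 16, "term_d": 8, "term_e": 1}

bits = {term: information_bits(n, TOTAL_GENES) for term, n in counts.items()}
partitions = partition_by_bits(bits, 4.0)

for index, members in partitions.items():
    print(f"{index * 4.0:.0f}-{(index + 1) * 4.0:.0f} bits: {members}")
```

A term that annotates half of all genes carries 1 bit, while a term that annotates a single gene out of 1024 carries 10 bits, which is why frequently used terms land at the general end of the scale.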

Gene Ontology Database Detail Option (Figure 3) shows the results, with information on the GO terms and the information-theoretic metrics used. From here, the user can see which partitions a particular GO term belongs to, as well as the information-theoretic metrics used to determine this. The user can then print the results using Gene Ontology Database Print Option (Figure 4) or export them using Gene Ontology Database Export Option (Figure 5) to a variety of formats, including XML, Excel, CSV, and Word. Gene Ontology Database Export Option to Excel (Figure 6) shows an example of an export to Excel.

Figure 7 shows an example of how the GO can be partitioned into a set of nodes with similar information (8 nodes in this case); the numbers represent GO identifiers for the unselected nodes.

Figure 8 and Figure 9 show an example of the improvement gained by selecting uniform partitions. These histograms compare GO level nodes with GO partitions: the partition-based information has a tighter distribution than the GO level node information derived from the graphical structure.

Another example uses partitions for analysis. It uses the “HOX_LIST_JP” set from the Gene Set Enrichment Analysis (GSEA) dataset, which contains HOX proteins involved in hematopoiesis. Figure 10 shows a 6-node GO term partition applied to this set. Several findings are apparent: “regulation of metabolism” and “transcription” are highly enriched, with p-values of 1.38 x 10^-38 and 7.91 x 10^-38, respectively. However, several smaller clusters of proteins also exist within this set, corresponding to annotation with the “response to stimulus,” “organismal physiological process,” and “biopolymer metabolism” nodes. Thus, Figure 10 demonstrates what appear to be several functional sub-classifications of HOX genes.
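The text does not say which statistical test produced the enrichment p-values quoted above. As an illustration only, the sketch below uses a one-sided hypergeometric test, a common choice for GO term enrichment; the function name and the example counts are hypothetical and are not taken from the HOX_LIST_JP analysis.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: total number of genes considered
    annotated:  genes in the population annotated with the GO node
    sample:     size of the gene set being tested
    hits:       genes in the gene set annotated with the GO node
    Assumption: this test is a stand-in, not the method used for Figure 10.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


# Made-up counts: 10 genes overall, 5 annotated, and a 5-gene set that is fully annotated.
p = enrichment_p_value(population=10, annotated=5, sample=5, hits=5)
print(p)
```

Under this assumption, a small p-value means the node annotates more of the gene set than random sampling from the population would be expected to produce.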