Pairwise Statistical Significance - Center for Ultra-scale Computing and Information Security

Evaluation of pairwise statistical significance


Error per Query (EPQ) versus Coverage plots are often used to evaluate the accuracy of different approaches to estimating statistical significance. To create these plots, the list of pairwise comparisons is sorted by statistical significance and then examined from the most significant score to the least. Going down the list, the coverage count is increased by one if the two members of the pair are homologs, and the error count is increased by one if they are not. At a given point in the list, EPQ is the total number of errors incurred so far divided by the number of queries, and Coverage is the fraction of homolog pairs detected at that significance level. For each of the 86 queries, 2771 comparisons are performed, and EPQ vs. Coverage curves are plotted.

The EPQ is defined as

EPQ = F_num / Q_num,    (1)

where F_num is the total number of non-homologous sequences detected as homologs (i.e., false positives) and Q_num is the total number of queries.
Coverage is given by

Coverage = H_d / H_t,    (2)

where H_d and H_t are the number of homologous pairs detected and the total number of homologous pairs present in the sequence database, respectively.
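
As a minimal sketch of the procedure and of the definitions in Eqs. (1) and (2), the following Python function walks a ranked list of comparisons and records one (EPQ, Coverage) point after each pair. The function name, the input layout (a significance value plus a homolog flag per pair), and the convention that smaller values are more significant (as with P-values or E-values) are assumptions made for illustration and are not taken from the original study. The toy data are invented.

```python
from typing import Iterable, List, Tuple


def epq_coverage_curve(
    pairs: Iterable[Tuple[float, bool]],
    num_queries: int,
    total_homologs: int,
) -> List[Tuple[float, float]]:
    """Return one (EPQ, Coverage) point per ranked comparison.

    pairs          -- (significance, is_homolog) for every pairwise comparison;
                      ASSUMPTION: a smaller significance value is more significant.
    num_queries    -- Q_num in Eq. (1).
    total_homologs -- H_t in Eq. (2).
    """
    # Sort from most significant to least significant.
    ranked = sorted(pairs, key=lambda p: p[0])

    errors = 0    # F_num: non-homologs seen so far (false positives)
    detected = 0  # H_d: homolog pairs seen so far
    curve = []
    for _, is_homolog in ranked:
        if is_homolog:
            detected += 1
        else:
            errors += 1
        curve.append((errors / num_queries, detected / total_homologs))
    return curve


# Toy example (hypothetical data): 2 queries, 4 homolog pairs in the database,
# of which only 3 appear among the 5 comparisons listed here.
toy_pairs = [(1e-10, True), (1e-6, True), (1e-3, False), (0.01, True), (0.5, False)]
curve = epq_coverage_curve(toy_pairs, num_queries=2, total_homologs=4)
for epq, coverage in curve:
    print(f"EPQ = {epq:.2f}, Coverage = {coverage:.2f}")
```

Plotting the second element of each point against the first gives an EPQ vs. Coverage curve. A method whose curve reaches higher Coverage at the same EPQ ranks homologs ahead of non-homologs more reliably.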

See Ref. 1 for more details on the concept of Error per Query (EPQ) versus Coverage plots.

Reference:

1. Agrawal, A., V. Brendel, et al. (2008). "Pairwise statistical significance versus database statistical significance for local alignment of protein sequences." Bioinformatics Research and Applications: 50-61.
