TY - JOUR
T1 - Overlap and diversity in antimicrobial peptide databases
T2 - Compiling a non-redundant set of sequences
AU - Aguilera-Mendoza, Longendri
AU - Marrero-Ponce, Yovani
AU - Tellez-Ibarra, Roberto
AU - Llorente-Quesada, Monica T.
AU - Salgado, Jesús
AU - Barigye, Stephen J.
AU - Liu, Jun
N1 - Publisher Copyright:
© The Author 2015. Published by Oxford University Press. All rights reserved.
PY - 2015/8/1
Y1 - 2015/8/1
N2 - Motivation: The large variety of antimicrobial peptide (AMP) databases developed to date are characterized by a substantial overlap of data and similarity of sequences. Our goals are to analyze the levels of redundancy for all available AMP databases and use this information to build a new nonredundant sequence database. For this purpose, a new software tool is introduced. Results: A comparative study of 25 AMP databases reveals the overlap and diversity among them and the internal diversity within each database. The overlap analysis shows that only one database (Peptaibol) contains exclusive data, not present in any other, whereas all sequences in the LAMP-Patent database are included in CAMP-Patent. However, the majority of databases have their own set of unique sequences, as well as some overlap with other databases. The complete set of non-duplicate sequences comprises 16 990 cases, which is almost half of the total number of reported peptides. On the other hand, the diversity analysis identifies the most and least diverse databases and proves that all databases exhibit some level of redundancy. Finally, we present a new parallel-free software, named Dover Analyzer, developed to compute the overlap and diversity between any number of databases and compile a set of non-redundant sequences. These results are useful for selecting or building a suitable representative set of AMPs, according to specific needs.
UR - http://www.scopus.com/inward/record.url?scp=84943601205&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btv180
DO - 10.1093/bioinformatics/btv180
M3 - Article
C2 - 25819673
AN - SCOPUS:84943601205
SN - 1367-4803
VL - 31
SP - 2553
EP - 2559
JO - Bioinformatics
JF - Bioinformatics
IS - 15
ER -