
Automatic construction of molecular similarity networks for visual graph mining in chemical space of bioactive peptides: an unsupervised learning approach

  • Longendri Aguilera-Mendoza
  • Yovani Marrero-Ponce*
  • César R. García-Jacas
  • Edgar Chavez
  • Jesus A. Beltran
  • Hugo A. Guillen-Ramirez
  • Carlos A. Brizuela*
  • *Corresponding author for this work
  • Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE)
  • Grupo GINUMED, Corporacion Universitaria Rafael Nuñez. Facultad de Salud
  • Universitat de València
  • University of California at Irvine
  • University of Bern

Research output: Contribution to journal › Article › peer-review

48 Scopus citations

Abstract

The increasing interest in bioactive peptides with therapeutic potential has been reflected in the large variety of biological databases published in recent years. However, discovering knowledge from these heterogeneous data sources is a nontrivial task, and it is the focus of our research. We therefore devise a unified data model based on molecular similarity networks to represent a chemical reference space of bioactive peptides, which holds implicit knowledge that existing biological databases do not currently expose explicitly. Our main contribution is a novel workflow for the automatic construction of such similarity networks, enabling visual graph mining techniques to uncover new insights from the “ocean” of known bioactive peptides. The workflow relies on the following sequential steps: (i) calculation of molecular descriptors by applying statistical and aggregation operators on amino acid property vectors; (ii) a two-stage unsupervised feature selection method that identifies an optimized subset of descriptors using the concepts of entropy and mutual information; (iii) generation of sparse networks in which nodes represent bioactive peptides and edges between two nodes denote their pairwise similarity/distance relationships in the defined descriptor space; and (iv) exploratory analysis using visual inspection in combination with clustering and network science techniques. For practical purposes, the proposed workflow has been implemented in our visual analytics software tool (http://mobiosd-hub.com/starpep/) to assist researchers in extracting useful information from an integrated collection of 45120 bioactive peptides, which is one of the largest and most diverse datasets in its field. Finally, we illustrate the applicability of the proposed workflow for discovering central nodes in molecular similarity networks that may represent the biologically relevant chemical space known to date.
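
The four workflow steps can be illustrated with a minimal Python sketch. It is not the authors' implementation or the software tool's API: the choice of the Kyte-Doolittle hydropathy scale as the single amino acid property, the four aggregation operators, equal-width binning, the entropy and redundancy thresholds, min-max scaling with Euclidean distance, and every function name below are assumptions made only for illustration. The peptide sequences are toy strings, not entries from the peptide collection.

```python
import math
from collections import Counter
from itertools import combinations

# Kyte-Doolittle hydropathy scale (assumed here as the only property vector).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def descriptors(seq):
    """Step (i): aggregate a property vector into [mean, min, max, std]."""
    v = [KD[a] for a in seq]
    mean = sum(v) / len(v)
    std = math.sqrt(sum((x - mean) ** 2 for x in v) / len(v))
    return [mean, min(v), max(v), std]

def discretize(col, bins=4):
    """Equal-width binning so entropy can be estimated from counts."""
    lo, hi = min(col), max(col)
    if hi == lo:
        return [0] * len(col)
    return [min(int((x - lo) / (hi - lo) * bins), bins - 1) for x in col]

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two discrete label sequences."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def select_features(rows, min_entropy=0.5, max_redundancy=0.9):
    """Step (ii), simplified: entropy filter, then greedy redundancy filter."""
    cols = [discretize(list(c)) for c in zip(*rows)]
    # Stage 1: drop descriptors that carry little information.
    informative = [j for j, c in enumerate(cols)
                   if entropy(c) > 0 and entropy(c) >= min_entropy]
    informative.sort(key=lambda j: -entropy(cols[j]))
    # Stage 2: drop descriptors largely explained by one already kept.
    kept = []
    for j in informative:
        redundant = any(
            mutual_information(cols[j], cols[k])
            / min(entropy(cols[j]), entropy(cols[k])) > max_redundancy
            for k in kept)
        if not redundant:
            kept.append(j)
    return sorted(kept)

def similarity_network(rows, threshold=0.7):
    """Step (iii): keep only edges whose similarity reaches the threshold."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    scaled = [[(x - l) / s for x, l, s in zip(r, lo, span)] for r in rows]
    dist = {(i, j): math.dist(scaled[i], scaled[j])
            for i, j in combinations(range(len(rows)), 2)}
    d_max = max(dist.values(), default=0.0) or 1.0
    sims = {pair: 1 - d / d_max for pair, d in dist.items()}
    return {pair: s for pair, s in sims.items() if s >= threshold}

def weighted_degree(edges, n):
    """Step (iv): sum of incident edge weights as a simple centrality score."""
    score = [0.0] * n
    for (i, j), w in edges.items():
        score[i] += w
        score[j] += w
    return score

# Toy usage: four made-up sequences.
peptides = ["KKLLKK", "KKLLKR", "AAVVII", "GGSSTT"]
rows = [descriptors(p) for p in peptides]
kept = select_features(rows)
reduced = [[r[j] for j in kept] for r in rows]
edges = similarity_network(reduced, threshold=0.7)
print(kept, edges, weighted_degree(edges, len(peptides)))
```

Raising the similarity threshold makes the network sparser, which is the property that keeps a graph over tens of thousands of peptides readable; nodes with a high weighted degree are candidates for the "central nodes" mentioned at the end of the abstract.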

Original language: English
Article number: 18074
Journal: Scientific Reports
Volume: 10
Issue number: 1
State: Published - 1 Dec 2020
