Sharing Chemical Relationships Does Not Reveal Structures


Matlock, M., & Swamidass, S. J. (2013). Sharing Chemical Relationships Does Not Reveal Structures. Journal of Chemical Information and Modeling, 54(1), 37-48.

Abstract: In this study, we propose a new, secure method of sharing useful chemical information from small-molecule libraries, without revealing the structures of the libraries’ molecules. Our method shares the relationship between molecules rather than structural descriptors. This is an important advance because, over the past few years, several groups have developed and published new methods of analyzing small-molecule screening data. These methods include advanced hit-picking protocols, promiscuous active filters, economic optimization algorithms, and screening visualizations, which can identify patterns in the data that might otherwise be overlooked. Application of these methods to private data requires finding strategies for sharing useful chemical data without revealing chemical structures. This problem has been examined in the context of ADME prediction models, with results from information theory suggesting it is impossible to share useful chemical information without revealing structures. In contrast, we present a new strategy for encoding the relationships between molecules instead of their structures, based on anonymized scaffold networks and trees, that safely shares enough chemical information to be useful in analyzing chemical data, while also sufficiently blinding structures from discovery. We present the details of this encoding, an analysis of the usefulness of the information it conveys, and the security of the structures it encodes. This approach makes it possible to share data across institutions, and may securely enable collaborative analysis that can yield insight into both specific projects and screening technology as a whole.
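The core idea in the abstract is to share relationships (which molecules contain which scaffolds, and how scaffolds nest) rather than the scaffolds themselves. The sketch below illustrates that idea; it is not the authors' encoding. It assumes the scaffolds and the scaffold tree were already computed (for example, Murcko scaffolds from RDKit's `MurckoScaffold` module), given as SMILES strings. The function name `anonymize_scaffold_tree` and the input dictionaries are hypothetical. It also handles only trees, where each scaffold has one parent, not general scaffold networks.

```python
import secrets
from typing import Dict, List, Optional, Tuple


def anonymize_scaffold_tree(
    molecule_scaffolds: Dict[str, List[str]],
    scaffold_parent: Dict[str, Optional[str]],
) -> Tuple[Dict[str, List[str]], Dict[str, Optional[str]]]:
    """Replace every scaffold structure with an opaque random label.

    Hypothetical input format (an assumption, not the paper's):
      molecule_scaffolds: molecule ID -> scaffold SMILES it contains
      scaffold_parent:    scaffold SMILES -> parent scaffold (None for roots)
    Returns the same relationships with every SMILES replaced by a label.
    """
    labels: Dict[str, str] = {}

    def label(scaffold: str) -> str:
        # Random labels rather than hashes: a hash of a SMILES string can be
        # reversed by hashing candidate scaffolds and comparing the results.
        if scaffold not in labels:
            labels[scaffold] = "S" + secrets.token_hex(8)
        return labels[scaffold]

    # Which molecules share which (now anonymous) scaffolds.
    shared_memberships = {
        mol_id: [label(s) for s in scaffolds]
        for mol_id, scaffolds in molecule_scaffolds.items()
    }
    # Parent-child edges of the scaffold tree, expressed only in labels.
    shared_tree = {
        label(child): (label(parent) if parent is not None else None)
        for child, parent in scaffold_parent.items()
    }
    return shared_memberships, shared_tree


if __name__ == "__main__":
    memberships, tree = anonymize_scaffold_tree(
        {"mol-1": ["c1ccccc1", "c1ccc2ccccc2c1"], "mol-2": ["c1ccccc1"]},
        {"c1ccc2ccccc2c1": "c1ccccc1", "c1ccccc1": None},
    )
    # Both molecules contain the benzene scaffold, so they share one label,
    # but no SMILES string appears in the shared output.
    print(memberships)
    print(tree)
```

A recipient can still group molecules by shared scaffolds and walk the hierarchy without seeing any structure. Whether such an encoding is actually secure is the question the paper analyzes. This sketch makes no claim about it.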