Seminar Abstract:
Machine learning (ML) models have emerged as powerful tools for scientific discovery. Modern computing hardware makes it possible to generate large data sets that can be used to train accurate, generalizable models of the physicochemical behavior of molecular systems. However, the predictive power of such models is bounded by how chemical knowledge is encoded within them. Unlike general-purpose neural networks, which treat molecular data as abstract numerical arrays, chemically informed architectures incorporate physical and structural constraints directly into their mathematical form. This talk examines the design principles behind such architectures and traces their application across intra- and intermolecular scales of organization.
At the level of individual molecules, graph neural networks represent atoms as nodes and bonds as edges, propagating information through the molecular graph in a manner that mirrors molecular topology. We will compare different parameterizations of this information in the context of predicting bond dissociation enthalpies and adiabatic singlet-triplet excitation energies, and present novel analytic tools that connect learned model behaviors with chemical intuition.
We then extend this approach to predict how molecular species interact with their surroundings. We show that learned mixing operators that mirror established chemical intuition can reliably predict nonlinear blending behavior, and we introduce a new tool inspired by operator symmetrization to analyze these interactions.
Lastly, we will demonstrate how advances in ML algorithm design enable the accurate prediction of properties that cannot be simulated through conventional means. By adapting techniques from computational topology and geometric deep learning, we construct a novel neural network architecture capable of predicting protein solubility from predicted protein structures. We show that the model’s internal representations of protein structure align with those obtained through conventional molecular dynamics simulations.
Taken together, these results reflect a common organizing principle: the most transferable and physically meaningful models are those whose structures recapitulate the interactions that govern chemical behavior.
