Senior Principal Scientist/Computational Toxicologist, Genentech, South San Francisco, California, United States
Individual-animal raw data in SEND format, while valuable, are insufficient for robust predictive modeling and comprehensive database searching. SEND data primarily capture individual animal measurements and lack the context of overall study interpretation. Study reports, by contrast, state which effects are test article-related, distinguishing true toxicological findings from incidental or spontaneous observations. Without these interpreted conclusions, building accurate predictive models and running meaningful database searches for structure-activity relationship analyses in computational toxicology are significantly hampered. To overcome this limitation, artificial intelligence (AI) approaches are being used to extract and structure study conclusions. Retrospectively, natural language processing (NLP) techniques, such as those used in the internally developed HAWK system, mine historical PDF reports, identifying test article-related findings and extracting them into a SEND-like "Study Report (SR) Domain" format. The structured data then become searchable and usable for downstream applications, including AI/ML model building. Prospectively, natural language generation (NLG) tools such as ARK help study directors capture key study outcomes directly within their workflow, generating abbreviated report abstracts and further populating the SR Domain. Together, these approaches support a comprehensive database linking chemical structures, observed effects, and dose/exposure information, so that researchers can build more accurate predictive models and run more targeted toxicity searches, ultimately improving decision-making in new drug development projects.
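To make the idea of a searchable, SEND-like SR Domain more concrete, the following is a minimal Python sketch of what one structured finding and a simple search over such findings might look like. Everything in it is an assumption made for illustration: the `SRFinding` field names, the helper functions `related_findings` and `lowest_effect_dose`, and the sample records (including the placeholder SMILES strings) are hypothetical and do not reflect the actual SR Domain schema or the output of HAWK or ARK.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SRFinding:
    """One interpreted study conclusion (hypothetical schema, not the real SR Domain)."""
    study_id: str
    compound_id: str
    smiles: str                 # chemical structure, placeholder values below
    species: str
    dose_mg_per_kg: float
    target_organ: str
    finding: str
    test_article_related: bool  # the interpretation that raw SEND data lack


def related_findings(records: List[SRFinding], target_organ: str) -> List[SRFinding]:
    """Return only test article-related findings for the given organ."""
    organ = target_organ.lower()
    return [
        r for r in records
        if r.test_article_related and r.target_organ.lower() == organ
    ]


def lowest_effect_dose(
    records: List[SRFinding], compound_id: str, target_organ: str
) -> Optional[float]:
    """Lowest dose with a test article-related finding, or None if there is none."""
    doses = [
        r.dose_mg_per_kg
        for r in related_findings(records, target_organ)
        if r.compound_id == compound_id
    ]
    return min(doses) if doses else None


# Invented example records, for illustration only.
RECORDS = [
    SRFinding("STUDY-001", "CMPD-A", "CCO", "rat", 10.0, "liver",
              "hepatocellular hypertrophy", True),
    SRFinding("STUDY-001", "CMPD-A", "CCO", "rat", 30.0, "liver",
              "hepatocellular necrosis", True),
    SRFinding("STUDY-001", "CMPD-A", "CCO", "rat", 30.0, "kidney",
              "chronic progressive nephropathy", False),  # incidental finding
    SRFinding("STUDY-002", "CMPD-B", "c1ccccc1", "dog", 5.0, "liver",
              "increased ALT", True),
]

if __name__ == "__main__":
    for r in related_findings(RECORDS, "liver"):
        print(r.compound_id, r.dose_mg_per_kg, r.finding)
    print(lowest_effect_dose(RECORDS, "CMPD-A", "liver"))
```

The `test_article_related` flag carries the study-level interpretation described above: filtering on it is what separates true toxicological findings from incidental ones, and pairing it with structure and dose fields is what would let such records feed structure-activity searches or model training.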