Greetings 👋
My name is Nhat Tran, and I also go by Jonny.
I’m a CS Ph.D. graduate from the University of Texas at Arlington with a background in machine learning and bioinformatics. I’m currently leading Extralit, an open-source platform that transforms how researchers extract structured data from scientific literature using AI and human validation workflows.
My research background is in bioinformatics, with techniques centered on graph neural networks and multimodal data integration. I developed computational methods that combine sequence representations with heterogeneous network interactions among RNA sequences to help characterize non-coding RNA functions. This work led to open-source tools such as OpenOmics, which help researchers integrate and explore these rich, complex multi-omics datasets.
Previously, as a Research Scientist at the Gates Foundation, I experienced firsthand the challenges research organizations face in extracting structured data from scientific literature. This inspired me to develop Extralit, which dramatically accelerates the process of turning unstructured research papers into analysis-ready datasets. Before that, at Genentech, I worked on large-scale data pipelines and machine learning models to establish QC standards for detecting low-quality sequencing parameters in NGS genomics.
My technical expertise spans developing and deploying optimized machine learning models on GPU-accelerated infrastructure, tailoring graph and NLP algorithms to complex bioinformatics challenges, and transforming unstructured data into actionable insights. Before focusing on machine learning and data science, I trained as a full-stack software developer through internships and hackathons, building web and mobile applications and gaining DevOps experience.
In my off time, I research the science of espresso coffee. I use machine learning and software development extensively; feel free to check out some of my work!
Research interests
- Graph neural networks (Heterogeneous graph, Graph representation learning)
- Natural language processing (Text classification, Text generation)
- Machine learning (Deep learning, Transfer learning, Multimodal learning)
- Bioinformatics (Non-coding RNA, RNA-protein interaction, RNA structure)
- Data science (Data engineering, Data visualization, Data integration)
- Software development (Full-stack web development, Mobile development, DevOps)
Recent news
- Mar 2025: Extralit is now affiliated with Open Science Labs and is participating in the Google Summer of Code 2025 program!
- Jan 2024: LATTE2GO has been accepted to BIBM’23!
- Sep 2023: I joined the Gates Foundation as a Research Scientist on the malaria team at the Institute for Disease Modeling!
- Feb 2023: A new work titled “Protein function prediction by incorporating knowledge graph representation of heterogeneous interactions and gene ontology” has been submitted!
- Dec 2022: I successfully defended my Ph.D. dissertation!
- Aug 2021: I joined Genentech as a Data Scientist intern in the Oncology Bioinformatics group!
- May 2021: “OpenOmics: A bioinformatics API to integrate multi-omics datasets and interface with public databases” is now published in the Journal of Open Source Software!
- Sep 2020: A new preprint entitled “Layer-stacked Attention for Heterogeneous Network Embedding” is now on arXiv!
- Jan 2020: “Network Representation of Large-Scale Heterogeneous RNA Sequences with Integration of Diverse Multi-omics, Interactions, and Annotations Data” is now published in the Pacific Symposium on Biocomputing (PSB) 2020!
- Dec 2018: “MicroRNA dysregulational synergistic network: discovering microRNA dysregulatory modules across subtypes in non-small cell lung cancers” is now published in BMC Bioinformatics!
- Jul 2017: “Improved microRNA biomarkers for pathological stages in lung adenocarcinoma via clustering of dysregulated microRNA-target associations” is now published at IEEE EMBC’17!