Greetings 👋

My name is Nhat Tran, and I also go by Jonny.

I’m a CS Ph.D. graduate from the University of Texas at Arlington with a background in machine learning and bioinformatics. I’m currently leading Extralit, an open-source platform that transforms how researchers extract structured data from scientific literature using AI and human validation workflows.

My research background focuses on bioinformatics, with techniques centered around graph neural networks and multimodal data integration. I developed computational methods that combine sequence representations with heterogeneous network interactions among RNA sequences to aid in understanding non-coding RNA functions. This work led to open-source tools like OpenOmics that help researchers make fuller use of these rich, complex datasets.

Previously, as a Research Scientist at the Gates Foundation, I experienced firsthand the challenges research organizations face in extracting structured data from scientific literature. This inspired me to develop Extralit, which dramatically accelerates the process of turning unstructured research papers into analysis-ready datasets. Before that, at Genentech, I worked on large-scale data pipelines and machine learning models to establish QC standards for detecting low-quality sequencing parameters in NGS genomics.

My technical expertise spans developing and deploying optimized machine learning models on GPU-accelerated infrastructure, tailoring graph and NLP algorithms for complex bioinformatics challenges, and transforming unstructured data into actionable insights. Before focusing on machine learning and data science, I trained as a full-stack software developer through various internships and hackathons to build web and mobile applications in the DevOps space.

In my off time, I research espresso coffee science. I use machine learning and software development extensively, so feel free to check out some of my work!

Research interests

  • Graph neural networks (Heterogeneous graph, Graph representation learning)
  • Natural language processing (Text classification, Text generation)
  • Machine learning (Deep learning, Transfer learning, Multimodal learning)
  • Bioinformatics (Non-coding RNA, RNA-protein interaction, RNA structure)
  • Data science (Data engineering, Data visualization, Data integration)
  • Software development (Full-stack web development, Mobile development, DevOps)

Recent news