Extralit: Structured Data Extraction from Scientific Literature

Extralit is an open-source platform that helps researchers extract structured data from scientific literature by:

  1. Automating extraction of tables, figures, and unstructured text using AI document processing
  2. Providing schema-driven extraction with human validation workflows for high accuracy
  3. Integrating with existing research workflows for systematic reviews and meta-analyses
  4. Creating analysis-ready datasets that maintain complex relationships between data points

Features:

  • OCR and table extraction pipeline powered by Vision Transformers
  • LLM-assisted structured data extraction with human validation
  • Collaborative annotation interface for quality control
  • Flexible schema definition for any scientific domain
  • Integration with popular data science tools
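To illustrate what schema-driven extraction with human validation can look like in practice, here is a minimal sketch using only the Python standard library. It is not Extralit's actual API: the names Column, OBSERVATION_SCHEMA, validate_row, and triage, along with the example fields, are hypothetical and chosen only for illustration. The idea is that extracted rows are checked against a declared schema, and any row that fails is routed to a reviewer rather than silently accepted.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Column:
    """One field in a hypothetical extraction schema."""
    name: str
    dtype: type
    required: bool = True


# Hypothetical schema for a table of study observations (illustrative only).
OBSERVATION_SCHEMA = [
    Column("study_id", str),
    Column("sample_size", int),
    Column("effect_size", float),
    Column("notes", str, required=False),
]


def validate_row(row: dict[str, Any], schema: list[Column]) -> list[str]:
    """Return human-readable problems; an empty list means the row passes."""
    problems = []
    for col in schema:
        value = row.get(col.name)
        if value is None:
            if col.required:
                problems.append(f"{col.name}: missing required value")
            continue
        if not isinstance(value, col.dtype):
            problems.append(
                f"{col.name}: expected {col.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


def triage(rows: list[dict[str, Any]], schema: list[Column]):
    """Split rows into those that pass the schema and those needing human review."""
    accepted, needs_review = [], []
    for row in rows:
        problems = validate_row(row, schema)
        if problems:
            needs_review.append((row, problems))
        else:
            accepted.append(row)
    return accepted, needs_review


# Made-up rows standing in for machine-extracted output.
rows = [
    {"study_id": "S1", "sample_size": 120, "effect_size": 0.42},
    {"study_id": "S2", "sample_size": "n/a", "effect_size": 0.10},
    {"study_id": "S3", "effect_size": 0.30},
]

accepted, needs_review = triage(rows, OBSERVATION_SCHEMA)
for row, problems in needs_review:
    print(row["study_id"], problems)
```

In this sketch only the first row is accepted; the other two are queued for review with a reason attached, which is the kind of signal an annotation interface could show to a human validator.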

Built with:

  • Document AI: Marker, PyMuPDF, Table-Transformer
  • ML/AI: LlamaIndex, Hugging Face Transformers
  • Backend: FastAPI, PostgreSQL, Elasticsearch
  • Frontend: Vue.js, TypeScript

Documentation: https://docs.extralit.ai

GitHub: https://github.com/extralit/extralit

Demo: https://demo.extralit.ai