Extralit: Structured Data Extraction from Scientific Literature
Extralit is an open-source platform that helps researchers extract structured data from scientific literature by:
- Automating extraction of tables, figures, and unstructured text using AI document processing
- Providing schema-driven extraction with human validation workflows for high accuracy
- Integrating with existing research workflows for systematic reviews and meta-analyses
- Creating analysis-ready datasets that maintain complex relationships between data points
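As an illustration of what "maintaining relationships between data points" can mean in practice, the sketch below links two tables extracted from the same paper through a shared reference column. It is a minimal stand-alone example, not Extralit's actual API: the column name `study_ref`, the helper `join_on_reference`, and all values are hypothetical.

```python
# Two hypothetical tables extracted from the same paper, linked by "study_ref".
# All values are made up for illustration.
studies = [
    {"study_ref": "S1", "country": "Kenya", "year": 2019},
    {"study_ref": "S2", "country": "Ghana", "year": 2021},
]
outcomes = [
    {"study_ref": "S1", "outcome": "mortality", "value": 0.12},
    {"study_ref": "S1", "outcome": "incidence", "value": 0.30},
    {"study_ref": "S2", "outcome": "mortality", "value": 0.08},
    {"study_ref": "S9", "outcome": "mortality", "value": 0.50},  # dangling reference
]


def join_on_reference(parents, children, key):
    """Attach each child row to its parent row via `key`.

    Returns (joined_rows, orphans), where orphans are child rows whose
    reference does not match any parent and therefore need manual review.
    """
    index = {parent[key]: parent for parent in parents}
    joined, orphans = [], []
    for child in children:
        parent = index.get(child[key])
        if parent is None:
            orphans.append(child)
        else:
            # Flatten parent and child into one analysis-ready row.
            joined.append({**parent, **child})
    return joined, orphans


rows, orphans = join_on_reference(studies, outcomes, "study_ref")
```

Here `rows` holds one flattened record per outcome with its study metadata attached, while `orphans` collects rows whose reference could not be resolved, which is the kind of inconsistency a human reviewer would be asked to fix.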
Features:
- OCR and table extraction pipeline powered by Vision Transformers
- LLM-assisted structured data extraction with human validation
- Collaborative annotation interface for quality control
- Flexible schema definition for any scientific domain
- Integration with popular data science tools
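To make the idea of schema-driven extraction with human validation concrete, here is a minimal sketch using only the Python standard library. It is an assumption-laden illustration rather than Extralit's real schema format: the `Field` class, `validate_record`, the `TRIAL_SCHEMA` columns, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Field:
    """One column in a hypothetical extraction schema."""
    name: str
    dtype: type
    required: bool = True
    check: Optional[Callable[[Any], bool]] = None  # optional value constraint


def validate_record(record: dict, schema: list) -> list:
    """Return a list of readable problems; an empty list means the record passes."""
    problems = []
    for field in schema:
        value = record.get(field.name)
        if value is None:
            if field.required:
                problems.append(f"{field.name}: missing required value")
            continue
        if not isinstance(value, field.dtype):
            problems.append(
                f"{field.name}: expected {field.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
            continue
        if field.check is not None and not field.check(value):
            problems.append(f"{field.name}: value {value!r} failed its constraint")
    return problems


# Hypothetical schema for a table of trial results.
TRIAL_SCHEMA = [
    Field("study_id", str),
    Field("sample_size", int, check=lambda n: n > 0),
    Field("effect_size", float),
    Field("p_value", float, required=False, check=lambda p: 0.0 <= p <= 1.0),
]

# Records as an extraction model might return them (made-up values).
extracted = [
    {"study_id": "S1", "sample_size": 120, "effect_size": 0.42, "p_value": 0.03},
    {"study_id": "S2", "sample_size": -5, "effect_size": 0.10},
    {"study_id": "S3", "sample_size": 80, "effect_size": "0.3", "p_value": 1.7},
]

# Records with problems go to a human review queue; the rest are accepted.
review_queue = {r["study_id"]: validate_record(r, TRIAL_SCHEMA) for r in extracted}
needs_review = {study: issues for study, issues in review_queue.items() if issues}
```

The design choice illustrated is that the schema, not the extraction model, decides what counts as valid: any record that violates a type or constraint is routed to a reviewer instead of being silently accepted.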
Built with:
- Document AI: Marker, PyMuPDF, Table-Transformer
- ML/AI: LlamaIndex, Hugging Face Transformers
- Backend: FastAPI, PostgreSQL, Elasticsearch
- Frontend: Vue.js, TypeScript
Documentation: https://docs.extralit.ai
GitHub: https://github.com/extralit/extralit
Demo: https://demo.extralit.ai