
Modding Argilla for validating AI-assisted document data extractions


extralit/extralit-frontend

Argilla

Open-source feedback layer for LLM-assisted data extractions

What is Argilla [For Document Data Extraction]?


Argilla is a UI and platform for LLM-based document data extraction. It integrates human and model feedback loops to continuously refine LLMs and to oversee data extraction.

With Argilla's Python SDK and adaptable UI, you can create human and model-in-the-loop workflows for:

  • Data extraction validation
  • Supervised fine-tuning
  • Preference tuning (RLHF, DPO, RLAIF, and more)
  • Small, specialized NLP models
  • Scalable evaluation

🚀 Development Quickstart

Install the prerequisites

These steps are required to run and develop Argilla locally.

  1. Install Docker Desktop
  2. Install kind
  3. Install ctlptl
  4. Install Tilt

Set up local infrastructure with kind

  1. Create a local registry and a kind cluster
ctlptl create registry ctlptl-registry --port=5005
ctlptl create cluster kind --registry=ctlptl-registry
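  2. Apply the config that mounts the local directory, then remove the control-plane NoSchedule taint so pods can be scheduled on the single kind node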
ctlptl apply -f k8s/kind/kind-config.yaml
kubectl taint node kind-control-plane node-role.kubernetes.io/control-plane:NoSchedule-
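The two setup steps above can also be scripted. The sketch below is a hypothetical helper (not part of this repository) that assumes `ctlptl` and `kubectl` are on your `PATH` and that it is run from the repository root, since the kind config path is relative. It runs the same four commands in order and stops at the first failure; `dry_run=True` only prints them.

```python
import shlex
import subprocess

# The same commands listed above, in the order they must run.
KIND_SETUP_COMMANDS = [
    ["ctlptl", "create", "registry", "ctlptl-registry", "--port=5005"],
    ["ctlptl", "create", "cluster", "kind", "--registry=ctlptl-registry"],
    ["ctlptl", "apply", "-f", "k8s/kind/kind-config.yaml"],
    # The trailing "-" removes the control-plane NoSchedule taint,
    # so workloads can run on the single kind node.
    ["kubectl", "taint", "node", "kind-control-plane",
     "node-role.kubernetes.io/control-plane:NoSchedule-"],
]


def run_kind_setup(dry_run=False):
    """Print each setup command and, unless dry_run, execute it.

    Returns the list of commands that were processed.
    """
    executed = []
    for argv in KIND_SETUP_COMMANDS:
        print("$", shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit,
            # which stops the sequence at the failing step.
            subprocess.run(argv, check=True)
        executed.append(argv)
    return executed


if __name__ == "__main__":
    run_kind_setup(dry_run=True)
```

Note that `ctlptl create` fails if the registry or cluster already exists, so on a machine that was set up before, only the later steps need to be rerun.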

Start local development

  1. Run Tilt

Select the K8s cluster context (for the kind cluster created above, ctlptl usually names it kind-kind):

kubectl config use-context <context_name>

Setting the ENV variable to dev enables hot-reloading of Docker containers for 🚀 rapid deployment:

kubectl create ns <namespace>
ENV=dev tilt up --namespace=<namespace>

Start a staging/prod K8s deployment

ENV=dev DOCKER_REPO=<remote docker repository> tilt up --namespace <namespace> --context <K8s cluster context>
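Both Tilt invocations above differ only in their environment variables and flags. The sketch below shows one way to build them from a script; `tilt_up_command` and `start_tilt` are hypothetical helper names, and the flags and variables are exactly those used in the shell commands above.

```python
import os
import subprocess


def tilt_up_command(namespace, env="dev", docker_repo=None, context=None):
    """Return (environment, argv) for `tilt up`, mirroring the commands above.

    ENV and DOCKER_REPO are passed as environment variables, as in the shell
    examples; `context` selects the target Kubernetes context.
    """
    environ = dict(os.environ, ENV=env)
    if docker_repo is not None:
        environ["DOCKER_REPO"] = docker_repo
    argv = ["tilt", "up", "--namespace", namespace]
    if context is not None:
        argv += ["--context", context]
    return environ, argv


def start_tilt(namespace, **kwargs):
    """Create the namespace, then start Tilt in it (requires kubectl and tilt)."""
    # `kubectl create ns` fails if the namespace already exists, so its exit code is ignored.
    subprocess.run(["kubectl", "create", "ns", namespace], check=False)
    environ, argv = tilt_up_command(namespace, **kwargs)
    subprocess.run(argv, env=environ, check=True)
```

For local development this would be called as `start_tilt("<namespace>")`; for a remote deployment, as `start_tilt("<namespace>", docker_repo="<remote docker repository>", context="<K8s cluster context>")`, with the placeholders filled in for your setup.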

🛠️ Developer guide

Editing the database schema:

Editing the database schema files at src/argilla/server/models/*.py requires running the following commands to apply the revisions to the database.

  1. Create a revision
cd src/argilla
alembic revision -m "<message>"

If you run into errors caused by revisions from the upstream argilla-io/argilla repo, open the revision "7552df94427a" in src/argilla/server/alembic/versions and set its down_revision to the latest upstream revision.
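Alembic links revisions through each file's `down_revision`, and `alembic heads` lists the current head revision(s). Two heads usually mean an upstream branch and a local branch have diverged from the same parent. The stdlib-only sketch below illustrates what that check does by reading `revision`/`down_revision` from the files in a versions directory; `find_heads` is a hypothetical helper, and it only understands literal values (a string, `None`, or a tuple for merge revisions).

```python
import ast
from pathlib import Path


def _module_constants(source):
    """Return top-level `name = <literal>` assignments, plain or annotated."""
    values = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign) and len(node.targets) == 1:
            target, value = node.targets[0], node.value
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            target, value = node.target, node.value
        else:
            continue
        if isinstance(target, ast.Name):
            try:
                values[target.id] = ast.literal_eval(value)
            except (ValueError, TypeError, SyntaxError):
                pass  # skip non-literal assignments
    return values


def find_heads(versions_dir):
    """Return the sorted revisions that no other revision points to."""
    parents = {}
    for path in Path(versions_dir).glob("*.py"):
        consts = _module_constants(path.read_text(encoding="utf-8"))
        rev = consts.get("revision")
        if not isinstance(rev, str):
            continue  # not a revision file
        down = consts.get("down_revision")
        if down is None:
            parents[rev] = ()
        elif isinstance(down, str):
            parents[rev] = (down,)
        else:
            parents[rev] = tuple(down)  # merge revision with several parents
    referenced = {p for ps in parents.values() for p in ps}
    return sorted(r for r in parents if r not in referenced)
```

Calling `find_heads("src/argilla/server/alembic/versions")` from the repository root should return a single revision once the chain is consistent.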

  2. Apply the revision
# Be sure to set environment variables ARGILLA_ELASTICSEARCH and ARGILLA_DATABASE_URL
python -m argilla server database migrate
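Because the migrate command depends on ARGILLA_ELASTICSEARCH and ARGILLA_DATABASE_URL, a small pre-flight check can fail fast with a clear message before touching the database. This is a hypothetical wrapper, not part of the Argilla CLI; it runs the same `python -m argilla server database migrate` command with the current interpreter.

```python
import os
import subprocess
import sys

# Variables the migrate step above expects to be set.
REQUIRED_VARS = ("ARGILLA_ELASTICSEARCH", "ARGILLA_DATABASE_URL")


def missing_env_vars(environ=None):
    """Return the required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def migrate():
    """Check the environment, then run the database migration."""
    missing = missing_env_vars()
    if missing:
        sys.exit("Set these environment variables first: " + ", ".join(missing))
    # Same command as above, run with the current Python interpreter.
    subprocess.run(
        [sys.executable, "-m", "argilla", "server", "database", "migrate"],
        check=True,
    )
```

Passing an explicit mapping to `missing_env_vars` makes the check easy to exercise without changing the real environment.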
  3. Rebuild the frontend and package it with the API backend
bash scripts/build_frontend.sh
python setup.py bdist_wheel

🛠️ Project Architecture

Argilla is built on 5 core components:

  • Python SDK: A Python SDK, installable with pip install argilla, for interacting with the Argilla Server and the Argilla UI. It provides an API to manage data, configuration and annotation workflows.
  • FastAPI Server: The core of Argilla is a Python FastAPI server that manages the data by pre-processing it and storing it in the vector database. It also stores application information in the relational database. It provides a REST API for the Python SDK and the Argilla UI to interact with the data, and a web interface to visualize it.
  • Relational Database: A relational database that stores the metadata of the records and the annotations. SQLite is the default built-in option and is deployed together with the Argilla Server, but a separate PostgreSQL database can be used instead.
  • Vector Database: A vector database that stores the records data and performs scalable vector similarity searches and basic document searches. We currently support Elasticsearch and AWS OpenSearch, both of which can be deployed as separate Docker images.
  • Vue.js UI: A web application to visualize and annotate your data, users and teams. It is built with Vue.js and is directly deployed alongside the Argilla Server within our Argilla Docker image.
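A vector similarity search ranks stored record vectors by how close they are to a query vector. The brute-force, pure-Python sketch below illustrates the idea with cosine similarity; it is not how Argilla queries Elasticsearch or OpenSearch, which use approximate nearest-neighbour indexes to avoid scanning every record. The function names, record ids and 2-d vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, records, k=2):
    """Return the ids of the k records whose vectors are most similar to `query`."""
    scored = sorted(records.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [record_id for record_id, _ in scored[:k]]


records = {"r1": [1.0, 0.0], "r2": [0.0, 1.0], "r3": [0.7, 0.7]}
print(top_k([1.0, 0.1], records))  # r1 first, then r3
```

Brute force is exact but costs one comparison per stored record for every query, which is why a dedicated vector database is used at scale.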


Languages

  • Python 53.2%
  • Vue 22.3%
  • JavaScript 12.0%
  • TypeScript 11.4%
  • SCSS 1.1%
  • HTML 0.0%