Anuvaad

Service	Build Status
Zuul
NMT
Workflow Manager
Aligner
User Management
Tokeniser
Translator

Anuvaad is an AI-based, open-source document translation platform for translating documents in Indic languages at scale. It provides easy-to-edit capabilities on top of plug-and-play NMT models. Separate instances of Anuvaad are deployed for Diksha (NCERT), the Supreme Court of India (SUVAS) and the Supreme Court of Bangladesh (Amar Vasha).

Components

Component	Details
Workflow Manager (WM)	Centralized orchestrator driven by user requests.
Auditor	Python package/library used for formatting and exception handling.
File Uploader	Microservice to upload and maintain user documents.
File Converter	Microservice to convert files from one format to another, e.g. .doc to .pdf.
Aligner	Microservice that accepts source and target sentences and aligns them to form a parallel corpus.
Tokenizer	Microservice that tokenises paragraphs into independently translatable sentences.
Layout Detector	Microservice interface for the layout detection model.
Block Segmenter	Handles layout detection misclassifications and region unification.
Word Detector	Word detection.
Block Merger	An OCR system that extracts text, images, tables, blocks, etc. from the input file and makes them available in a format that downstream services can use for translation. It can also be used as an independent product to perform OCR on files, images, PPTs, etc.
Translator	Translator pushes sentences to IndicTrans, which translates them and pushes the results back during the document translation flow.
Content Handler	Repository microservice that maintains and manages all translated documents.
Translation Memory X (TMX)	System translation memory that lets a user-preferred translation override the NMT translation. TMX provides three levels of caching: Global, User and Organisation.
User Translation Memory (UTM)	Tracks and remembers each user's translations or corrections and applies them automatically when the same sentences are encountered again.
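
The TMX row describes a translation memory with three levels that can override NMT output. The following is a minimal sketch of how such a lookup could work, not Anuvaad's actual implementation. The class and function names, the in-memory dictionary store and the lookup order (User first, then Organisation, then Global) are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical lookup order, most specific first. This precedence is an
# assumption for illustration; it is not taken from the Anuvaad source.
LEVELS = ("user", "organisation", "global")
GLOBAL_SCOPE = "*"  # hypothetical scope id shared by all global entries


class TranslationMemory:
    """Toy in-memory translation memory with three scopes."""

    def __init__(self) -> None:
        # (level, scope_id, source sentence) -> preferred translation
        self._store: Dict[Tuple[str, str, str], str] = {}

    def put(self, level: str, scope_id: str, source: str, target: str) -> None:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        if level == "global":
            scope_id = GLOBAL_SCOPE
        self._store[(level, scope_id, source)] = target

    def lookup(self, source: str, user_id: str, org_id: str) -> Optional[str]:
        scope_ids = {"user": user_id, "organisation": org_id, "global": GLOBAL_SCOPE}
        for level in LEVELS:
            hit = self._store.get((level, scope_ids[level], source))
            if hit is not None:
                return hit
        return None


def translate(source: str, user_id: str, org_id: str,
              tm: TranslationMemory, nmt: Callable[[str], str]) -> str:
    """Return the stored preferred translation if one exists, else call the model."""
    override = tm.lookup(source, user_id, org_id)
    return override if override is not None else nmt(source)
```

In this sketch, `nmt` is any callable that maps a source sentence to a translation, so the memory is consulted first and the model is only called on a miss.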

AI/ML Assets

Component	Details
PRIMA	Layout detection model.
CRAFT	Used for line detection.
Tesseract	Custom trained Tesseract used for OCR.
IndicTrans	Custom trained Indic NMT model used for translation.

Technology Stack

Component	Details
Apache Kafka	Translator and IndicTrans are integrated through Kafka messaging.
MongoDB	Primary data storage.
Redis	Secondary in-memory storage.
Cloud Storage	Samba storage is used to store user input files.
NGINX	Serves as a redirection server and handles system-level configuration. NGINX acts as the gateway.
Zuul	API gateway that applies filters to client requests and authenticates, authorizes and throttles them.
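
The Apache Kafka row says that Translator and IndicTrans exchange sentences through messaging. The sketch below illustrates that request/reply pattern only. It uses Python's standard `queue.Queue` as a stand-in for two Kafka topics, and the message fields (`id`, `src`, `tgt`), function names and sentinel-based shutdown are hypothetical, not taken from the Anuvaad code. A real deployment would use a Kafka client instead.

```python
import queue
import threading
from typing import Callable, List


def model_worker(requests: queue.Queue, replies: queue.Queue,
                 translate_fn: Callable[[str], str]) -> None:
    """Stand-in for the model side: consume requests, publish one reply each.

    Stops when it receives a None sentinel (an assumption of this sketch).
    """
    while True:
        msg = requests.get()
        if msg is None:
            break
        replies.put({"id": msg["id"], "tgt": translate_fn(msg["src"])})


def translate_document(sentences: List[str],
                       translate_fn: Callable[[str], str]) -> List[str]:
    """Stand-in for the Translator side: push sentences, collect replies by id."""
    requests: queue.Queue = queue.Queue()  # plays the role of the input topic
    replies: queue.Queue = queue.Queue()   # plays the role of the output topic
    worker = threading.Thread(target=model_worker,
                              args=(requests, replies, translate_fn))
    worker.start()

    for i, sentence in enumerate(sentences):
        requests.put({"id": i, "src": sentence})
    requests.put(None)  # tell the worker there is nothing more to translate

    results = {}
    for _ in sentences:
        reply = replies.get()
        results[reply["id"]] = reply["tgt"]
    worker.join()

    # Reassemble in document order, since replies may arrive in any order
    # on a real message bus.
    return [results[i] for i in range(len(sentences))]
```

Tagging each message with an id lets the caller rebuild the document order regardless of the order in which replies arrive.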
