timbmg/acl-search

A search-as-you-type web app for the ACL Anthology

Architecture

The search engine relies on Elasticsearch, specifically its search-as-you-type feature. Instead of indexing only full words, it also indexes n-grams of the text, so partially typed queries can be matched quickly.
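To make the n-gram idea concrete, here is a minimal sketch. It assumes the index has a single `title` field mapped with Elasticsearch's `search_as_you_type` field type; the field name and the mapping body are illustrative assumptions, not taken from this repository. The `edge_ngrams` helper is a hypothetical, simplified illustration of prefix indexing and is not how Elasticsearch is implemented internally.

```python
import json
from typing import List

# Assumed mapping: a single `title` field using the search_as_you_type type.
# The actual mapping used by this project may differ.
INDEX_BODY = {
    "mappings": {
        "properties": {
            "title": {"type": "search_as_you_type"},
        }
    }
}


def edge_ngrams(term: str, min_len: int = 1) -> List[str]:
    """Return every prefix of `term` that has at least `min_len` characters.

    Indexing these prefixes lets a partially typed word be found with an
    exact term lookup instead of a scan over all indexed words.
    """
    return [term[:end] for end in range(min_len, len(term) + 1)]


if __name__ == "__main__":
    print(json.dumps(INDEX_BODY, indent=2))
    print(edge_ngrams("parse"))
```

With prefixes indexed ahead of time, a user who has typed only `pars` already matches a title containing `parse` or `parsing`.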

The app follows the principles of the Twelve-Factor App.

Core Services

Search

The search service provides an API for searching the publication index in Elasticsearch. Currently, only publication titles are searched. Future updates will add search over available abstracts and over all publications of a specific author.
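A title search of this kind could be expressed roughly as follows. This is a sketch under stated assumptions: the field name `title` and the helper names `build_title_query` and `extract_titles` are hypothetical, and the service's real query may differ. The query body uses the `multi_match` query with type `bool_prefix` over the field and its `._2gram` and `._3gram` subfields, which is the pattern the Elasticsearch documentation describes for `search_as_you_type` fields.

```python
from typing import Any, Dict, List


def build_title_query(text: str, size: int = 10) -> Dict[str, Any]:
    """Build a search-as-you-type query body for an assumed `title` field."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # bool_prefix treats the last term as a prefix, the rest as full terms
                "type": "bool_prefix",
                "fields": ["title", "title._2gram", "title._3gram"],
            }
        },
    }


def extract_titles(response: Dict[str, Any]) -> List[str]:
    """Pull the titles out of a standard Elasticsearch search response."""
    return [hit["_source"]["title"] for hit in response["hits"]["hits"]]


if __name__ == "__main__":
    print(build_title_query("neural mach"))
```

The returned dictionary can be sent as the body of a search request; `extract_titles` then reduces the response to the list of matching titles.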

Index

The index service regularly checks the ACL Anthology GitHub repository for new or updated files. When an update is discovered, the affected publications are imported into an Elasticsearch index. The schedule is implemented in the index-beat service using Celery beat. The index-worker service processes and indexes the files that have been updated.
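The split between scheduling and processing can be sketched as below. Everything here is an assumption made for illustration: the task name, the one-hour interval, the file names, and the hash-based change detection are hypothetical and not taken from this repository. `BEAT_SCHEDULE` only mirrors the shape of a Celery `beat_schedule` entry (a task name plus a schedule in seconds), and the sketch deliberately does not import Celery so that it runs on its own.

```python
import hashlib
from typing import Dict, List

# Shape of a Celery beat_schedule entry. Task name and interval are hypothetical.
BEAT_SCHEDULE = {
    "check-anthology-updates": {
        "task": "index.check_for_updates",  # hypothetical task name
        "schedule": 3600.0,  # run every hour, in seconds (assumed interval)
    }
}


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest identifying one version of a file."""
    return hashlib.sha256(content).hexdigest()


def changed_files(previous: Dict[str, str], current: Dict[str, bytes]) -> List[str]:
    """Return the sorted names of files that are new or whose content changed.

    `previous` maps file names to fingerprints stored after the last run;
    `current` maps file names to the content fetched in this run.
    """
    return sorted(
        name
        for name, content in current.items()
        if previous.get(name) != fingerprint(content)
    )


if __name__ == "__main__":
    seen = {"a.xml": fingerprint(b"old")}
    print(changed_files(seen, {"a.xml": b"new", "b.xml": b"first version"}))
```

In such a design, the scheduled task would only compute the list of changed files and hand each one to a worker, which keeps the beat process lightweight.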

Preview