tokenizer
A grammar describes the syntax of a programming language, and might be defined in Backus-Naur form (BNF). A lexer performs lexical analysis, turning text into tokens. A parser takes tokens and builds a data structure such as an abstract syntax tree (AST). The parser is concerned with structure: does the sequence of tokens fit the grammar? A compiler's front end combines a lexer and a parser built for a specific grammar; later stages then translate the resulting tree into target code.
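To make the lexer/parser split concrete, here is a minimal lexer sketch in Python (all names are illustrative, not taken from any repository on this page). It turns an arithmetic expression into (kind, text) tokens; a parser would then check that this token sequence fits the grammar and build an AST from it.

```python
import re

# Token kinds and the regex that matches each; order matters for ties.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),       # integer literals
    ("OP",     r"[+\-*/]"),   # arithmetic operators
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),       # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))

def tokenize(text):
    """Lexical analysis: turn raw text into a list of (kind, lexeme) tokens."""
    tokens = []
    for match in MASTER.finditer(text):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens

print(tokenize("12 + 3 * 4"))
# [('NUMBER', '12'), ('OP', '+'), ('NUMBER', '3'), ('OP', '*'), ('NUMBER', '4')]
```

Note that the lexer knows nothing about operator precedence or balanced parentheses; that structural knowledge belongs to the parser.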
Here are 1,086 public repositories matching this topic...
- DOM-aware tokenization for Hugging Face language models (HTML, updated Jun 1, 2024)
- Sentiment analysis models using NLP, other NLP fundamentals including subwords, and a song lyric generator (Jupyter Notebook, updated Jun 1, 2024)
- ⛄ Possibly the smallest Lua compiler ever (Lua, updated May 31, 2024)
- Tools and resources for the computational processing of Nheengatu (Modern Tupi) (Python, updated May 31, 2024)
- Simple multilingual lemmatizer for Python, built for speed and efficiency (Python, updated May 31, 2024)
- Web tool to count LLM tokens (GPT, Claude, Llama, ...) (TypeScript, updated May 31, 2024)
- Oxide is a hybrid database and streaming messaging system (think Kafka + MySQL), supporting data access via REST and SQL (Rust, updated May 31, 2024)
- Retro-style tokenization for language models (Python, updated May 30, 2024)
- [READ ONLY] Locate available classes by parent, interface, or trait. Subtree split of the Spiral Tokenizer component (see spiral/framework) (PHP, updated May 30, 2024)
- Byte-Pair Encoding tokenizer for large language models (Python, updated May 30, 2024)
- 🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer (Rust, updated May 31, 2024)
- 🎤 vibrato: Viterbi-based accelerated tokenizer (Rust, updated May 30, 2024)