tokenizer
A grammar describes the syntax of a programming language and is often written in Backus-Naur form (BNF). A lexer performs lexical analysis, turning raw text into tokens. A parser takes those tokens and builds a data structure such as an abstract syntax tree (AST). The parser is concerned with structure: does the sequence of tokens fit the grammar? A compiler contains a lexer and a parser built for a specific grammar, followed by later stages such as semantic analysis and code generation.
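To make the lexer/parser split concrete, here is a minimal Python sketch for a small arithmetic grammar. The grammar, the `Token`/`Num`/`BinOp` types, and the `tokenize`/`Parser` names are illustrative assumptions, not taken from any repository listed on this page. The lexer is built from regular expressions, and the parser is a hand-written recursive-descent parser that either rejects the token sequence or returns an AST.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union

# Hypothetical toy grammar (illustration only), written in BNF:
#   <expr>   ::= <term>   | <expr> "+" <term>   | <expr> "-" <term>
#   <term>   ::= <factor> | <term> "*" <factor> | <term> "/" <factor>
#   <factor> ::= NUMBER   | "(" <expr> ")"

@dataclass
class Token:
    kind: str  # one of "NUMBER", "OP", "LPAREN", "RPAREN"
    text: str

# Lexer rules, tried in order; SKIP drops whitespace, MISMATCH catches anything else.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("OP", r"[+\-*/]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text: str) -> List[Token]:
    """Lexer: turn raw text into a flat list of tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup  # name of the rule that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {match.group()!r} at {match.start()}")
        tokens.append(Token(kind, match.group()))
    return tokens

# AST node types.
@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"

Node = Union[Num, BinOp]

class Parser:
    """Recursive-descent parser: checks tokens against the grammar and builds an AST."""

    def __init__(self, tokens: List[Token]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[Token]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> Token:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse(self) -> Node:
        node = self.expr()
        if self.peek() is not None:  # leftover tokens mean the input does not fit the grammar
            raise SyntaxError(f"unexpected token {self.peek().text!r}")
        return node

    def expr(self) -> Node:
        # Left-recursive BNF rule implemented as a loop, so "+" and "-" associate left.
        node = self.term()
        while (tok := self.peek()) and tok.kind == "OP" and tok.text in ("+", "-"):
            self.advance()
            node = BinOp(tok.text, node, self.term())
        return node

    def term(self) -> Node:
        node = self.factor()
        while (tok := self.peek()) and tok.kind == "OP" and tok.text in ("*", "/"):
            self.advance()
            node = BinOp(tok.text, node, self.factor())
        return node

    def factor(self) -> Node:
        tok = self.advance()
        if tok.kind == "NUMBER":
            return Num(int(tok.text))
        if tok.kind == "LPAREN":
            node = self.expr()
            if self.advance().kind != "RPAREN":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok.text!r}")

print(Parser(tokenize("2 * (3 + 4)")).parse())
# prints: BinOp(op='*', left=Num(value=2), right=BinOp(op='+', left=Num(value=3), right=Num(value=4)))
```

The left-recursive grammar rules are written as loops because a naive recursive-descent parser would call `expr` from `expr` forever. The grammar's precedence (multiplication binding tighter than addition) falls out of the `expr` → `term` → `factor` call order.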
Here are 1,086 public repositories matching this topic...
- Natural language tokenizer for English and Japanese documents in Python | Updated Jul 16, 2017 | Python
- A platform-agnostic Mash creation algorithm | Updated Sep 10, 2018 | JavaScript
- Mastering NLP step by step | Updated Aug 11, 2019 | Jupyter Notebook
- (no description available) | Updated Jun 21, 2019 | Jupyter Notebook
- Simpler interactive interpreter | Updated Oct 5, 2020 | Python
- .NET Windows Forms app applying concepts of tokenization, custom file extensions, and computer graphics | Updated Aug 4, 2021 | C#
- Movie Recommendation Engine | Updated Nov 15, 2020 | Jupyter Notebook
- (no description available) | Updated Jun 21, 2021 | Python
- Implementation of a C++ lexical analyzer, demonstrating how lexical analysis works as part of a compiler | Updated Dec 13, 2021 | C++
- Korrektor-Py: a Python library and wrapper for the API of https://korrektor.uz | Updated Aug 17, 2023 | Python
- Simple generic lexer/tokeniser for use in Rust programs and beyond | Updated Nov 16, 2022 | Rust
- A systems programming language that doesn't suck | Updated Jul 27, 2022 | Rust
- A POSIX regex utilities library for C | Updated Aug 3, 2022 | C
- 10.1k followers
- Wikipedia