Welcome to one of the most ultra-detailed versions of the
Frozen-Lake Q-Learning project
Ver. 2.1.0

This project was made in a learning context, so the code may contain errors.
(If you would like to help, a list of known bugs is in the "Bug List" file in the doc folder.)

This project is built with Gymnasium from the Farama Foundation, a Python library for reinforcement learning, used here for Q-Learning.
(To learn more about Gymnasium, see its GitHub page.)

For more information about Q-Learning and the Frozen Lake game, read the Medium article "Q-Learning For Beginners" by Maxime Labonne. It helped me a lot to understand how Q-Learning works.

About

As its name suggests, this project is an ultra-detailed version of the Frozen-Lake Q-Learning project.
The program trains an agent on the Frozen-Lake game for a number of episodes that the user enters at startup. Training uses the Exploration X Exploitation method: the agent explores the environment, but also uses the Q-Table as it is being updated, so that the final Q-Table is better.
The program then lets the user test the Q-Table obtained from the training.
During both training and testing, a lot of data is printed in detail to the console.
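
To make the Exploration X Exploitation idea concrete, here is a minimal sketch of one common way to implement it, an epsilon-greedy rule. The function name `choose_action` and the parameter `epsilon` are assumptions made for illustration and may not match the names or the exact rule used in main.py.

```python
import random

def choose_action(q_row, epsilon, rng=random):
    """Pick an action index for one state with an epsilon-greedy rule.

    q_row   -- list of Q-values for the current state, one per action
    epsilon -- probability of exploring (hypothetical parameter name)
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_row))
    # Exploit: pick the action with the highest Q-value so far.
    return max(range(len(q_row)), key=lambda a: q_row[a])

# With epsilon = 0 the agent never explores and always follows the table.
print(choose_action([0.0, 0.5, 0.1, 0.0], epsilon=0.0))  # 1
```

A high `epsilon` early in training favors exploration, and lowering it over time shifts the agent toward exploiting what the Q-Table has already learned.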

Packages

To run this project correctly, install the following packages:

gymnasium (toy-text extra): pip install "gymnasium[toy-text]"
matplotlib (provides matplotlib.pyplot): pip install matplotlib
numpy: pip install numpy
pygame: pip install pygame
time: part of the Python standard library, nothing to install
warnings: part of the Python standard library, nothing to install (optional, only used to hide an error)

Obtainable data

nb_success: Used in the formula nb_success / episodes * 100 to calculate the success rate of the training and of the test
best_sequence: List of states in the best (shortest) episode that reaches the goal
longest_best_sequence: List of states in the longest episode that reaches the goal
longest_sequence: List of states in the longest episode that does not reach the goal
shortest_sequence: List of states in the shortest episode that does not reach the goal
(All sequences are shown in both the input format (0, 1, 2, 3) and the word format (LEFT, DOWN, RIGHT, UP))
reward_counter: Number of times the agent obtains the reward
reward_episode: List of the episodes in which the agent obtains the reward
reward_sequence: List of the states in the episodes in which the agent obtains the reward
recurent_sequence: Number of episodes in which the agent reaches the goal with the same sequence as the best sequence
total_actions: Total number of actions in the episodes where the agent reaches the goal
action_counts[action_words[action]]: Number of actions taken, by type of action (LEFT, DOWN, RIGHT, UP)
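
As an illustration of how a few of these values could be computed, here is a minimal sketch. The helper names `success_rate`, `to_words` and `count_actions` are hypothetical and are not taken from main.py; only the formula and the action numbering come from the list above.

```python
# Mapping between the input format and the word format listed above.
ACTION_WORDS = {0: "LEFT", 1: "DOWN", 2: "RIGHT", 3: "UP"}

def success_rate(nb_success, episodes):
    """Success rate in percent: nb_success / episodes * 100."""
    return nb_success / episodes * 100

def to_words(actions):
    """Convert a sequence from the input format to the word format."""
    return [ACTION_WORDS[a] for a in actions]

def count_actions(actions):
    """Count how many times each type of action was taken."""
    counts = {word: 0 for word in ACTION_WORDS.values()}
    for a in actions:
        counts[ACTION_WORDS[a]] += 1
    return counts

print(success_rate(250, 1000))   # 25.0
print(to_words([2, 2, 1]))       # ['RIGHT', 'RIGHT', 'DOWN']
print(count_actions([2, 2, 1]))  # {'LEFT': 0, 'DOWN': 1, 'RIGHT': 2, 'UP': 0}
```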

Tools

Maps

2x2 map
4x4 map
8x8 map
16x16 map
(The predefined maps and the randomly generated ones are listed in the map.txt file in the tools folder.)
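
For readers new to Frozen Lake maps, here is a minimal sketch of how a map and its state numbers relate. Gymnasium describes a Frozen Lake map as a list of strings made of the letters S (start), F (frozen), H (hole) and G (goal), and numbers the states row by row. The 2x2 map below is a hypothetical example and is not copied from map.txt; the helper `tile_at` is also an assumption for illustration.

```python
# Hypothetical 2x2 map: S = start, F = frozen (safe), H = hole, G = goal.
MAP_2X2 = ["SF",
           "HG"]

def tile_at(desc, state):
    """Return the tile letter for a state number.

    States are numbered row by row: state = row * ncols + col.
    """
    ncols = len(desc[0])
    row, col = divmod(state, ncols)
    return desc[row][col]

for state in range(4):
    print(state, tile_at(MAP_2X2, state))
# 0 S
# 1 F
# 2 H
# 3 G
```

On this map the Q-Table would have 4 rows (one per state) and 4 columns (one per action).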

Q-Injection

Q-Injection is a feature for testing existing Q-Tables, such as:

Randomized Q-Table
Trained Q-Table (obtained from a training run done by our team)
A Q-Table at the start of training (three values)

It can also train them further, using the Exploration X Exploitation method, to obtain better results.
(For more information about Q-Injection, read the injection.md file in the tools folder)
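
As a rough idea of what injecting a table can look like, here is a minimal sketch that builds a randomized Q-Table and reads the action it would prefer in each state. It uses plain Python lists to stay dependency-free, whereas the project works with NumPy arrays. The functions `random_qtable` and `greedy_policy` are hypothetical and are not the API of QInjection.py.

```python
import random

def random_qtable(n_states, n_actions, seed=None):
    """Build a Q-Table filled with random values in [0, 1)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_actions)]
            for _ in range(n_states)]

def greedy_policy(qtable):
    """For each state, return the action with the highest Q-value."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in qtable]

# A randomized table for a 4x4 map: 16 states, 4 actions.
qtable = random_qtable(n_states=16, n_actions=4, seed=0)
print(len(qtable), len(qtable[0]))  # 16 4

# A small hand-written table makes the greedy policy easy to check.
print(greedy_policy([[0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]]))  # [1, 2]
```

Testing an injected table then amounts to running episodes that always follow `greedy_policy`, while training it further means continuing the usual updates starting from these values instead of zeros.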

For those interested in how the Q-Table is calculated, here is an explanation:
(Hope it helps you understand Q-Learning)

qtable[state, action] = qtable[state, action] + alpha * (reward + gamma * np.max(qtable[next_state, :]) - qtable[state, action])

qtable[state, action]: This is the current value of action (0, 1, 2, 3, meaning LEFT, DOWN, RIGHT, UP) in state (the number of the tile) in the Q-Table. This is the value we will update.
alpha: This is the learning rate. It controls how much new information is integrated into the old values of the Q-Table. A high value means that new information has a greater impact on existing values, while a low value means it has a lesser impact.
reward: This is the immediate reward obtained after taking action in state. In Frozen Lake it is a positive float (1.0) when the agent reaches the goal, and 0 otherwise.
gamma: This is the discount factor. It represents the importance of future rewards compared to immediate rewards. A gamma close to 1 gives great importance to future rewards, while a gamma close to 0 makes the agent care almost only about the immediate reward.
np.max(qtable[next_state, :]): This is the maximum value among all possible actions in the next state (next_state). It represents the best estimate of the future value that the agent can obtain from the next state.
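
To see the formula at work, here is a small worked example. It is a sketch that uses plain Python lists instead of the NumPy array of the formula, so `max(qtable[next_state])` plays the role of `np.max(qtable[next_state, :])`. The function name `q_update`, the 4-state table and the values alpha = 0.5 and gamma = 0.9 are assumptions chosen for illustration.

```python
def q_update(qtable, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-Learning update in place and return the new value."""
    best_next = max(qtable[next_state])      # best estimate from next state
    target = reward + gamma * best_next      # what the value should move toward
    qtable[state][action] += alpha * (target - qtable[state][action])
    return qtable[state][action]

# 4 states x 4 actions, all values start at zero.
qtable = [[0.0] * 4 for _ in range(4)]

# The agent is in state 1, takes DOWN (1), lands on the goal (state 3)
# and receives the reward 1.0:
# 0 + 0.5 * (1.0 + 0.9 * 0 - 0) = 0.5
print(q_update(qtable, state=1, action=1, reward=1.0,
               next_state=3, alpha=0.5, gamma=0.9))  # 0.5
```

If the agent later moves from state 0 to state 1 with no reward, the update uses the 0.5 now stored in state 1, so the value learned at the goal propagates backward one step at a time.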

VOCdevShy/Q-Learning_Frozen_Lake
