
Reinforcement Learning Grid World

This project applies reinforcement learning (RL) to a navigational task in a defined environment, a "Lawnmower Grid World". The environment consists of 16 states, and the goal is to maximize collected rewards while avoiding penalties. The agent must travel from the initial state at the top-left corner of the grid to the goal state at the bottom-right corner, collecting batteries (positive rewards) and avoiding pebbles (negative rewards).

Environment Setup

  • States: The states are defined as tuples representing the row and column indices on a 4x4 grid, ranging from (0, 0) to (3, 3).
  • Actions: Four possible actions include moving Up, Down, Right, and Left.
  • Rewards: Pebbles carry negative rewards (-5, -6), batteries carry positive rewards (+5, +6), and the goal state offers the highest reward (+10).
  • Objective: Navigate the grid to reach the goal state at (3, 3) while maximizing the total reward.
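
A minimal sketch of this environment is shown below. The reward magnitudes come from the list above, but the exact cells holding batteries and pebbles, the clipping behavior at the grid edges, and the helper names (`REWARDS`, `GOAL`, `step`) are illustrative assumptions, not the repository's actual implementation.

```python
# Minimal Lawnmower Grid World sketch (assumed layout, not the repo's exact code).
GRID_SIZE = 4
ACTIONS = ["Up", "Down", "Right", "Left"]

# Reward magnitudes from the README; the cell positions are assumptions.
REWARDS = {
    (1, 1): +5,   # battery
    (2, 3): +6,   # battery
    (0, 2): -5,   # pebble
    (2, 1): -6,   # pebble
    (3, 3): +10,  # goal state
}
GOAL = (3, 3)

def step(state, action):
    """Apply an action, clipping at the grid edges; return (next_state, reward, done)."""
    row, col = state
    if action == "Up":
        row = max(row - 1, 0)
    elif action == "Down":
        row = min(row + 1, GRID_SIZE - 1)
    elif action == "Right":
        col = min(col + 1, GRID_SIZE - 1)
    else:  # "Left"
        col = max(col - 1, 0)
    next_state = (row, col)
    return next_state, REWARDS.get(next_state, 0), next_state == GOAL
```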

Visualization

The environment visualization uses Matplotlib to depict the 4x4 grid where:

  • Grid cells are color-coded to differentiate between positive (warmer) and negative (cooler) rewards.
  • The goal state is highlighted in red, and the lawnmower's current position is marked appropriately on the grid.
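
A Matplotlib sketch of this rendering, reusing the assumed `GRID_SIZE`, `REWARDS`, and `GOAL` from the environment sketch above (the colormap choice and marker style are likewise illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np

def render(agent_state):
    """Draw the 4x4 grid with color-coded rewards, the goal, and the agent."""
    grid = np.zeros((GRID_SIZE, GRID_SIZE))
    for (r, c), reward in REWARDS.items():
        grid[r, c] = reward
    fig, ax = plt.subplots()
    # Warmer colors for positive rewards, cooler colors for negative ones.
    ax.imshow(grid, cmap="coolwarm", vmin=-10, vmax=10)
    # Outline the goal cell in red and mark the lawnmower's current cell.
    ax.add_patch(plt.Rectangle((GOAL[1] - 0.5, GOAL[0] - 0.5), 1, 1,
                               fill=False, edgecolor="red", linewidth=3))
    ax.plot(agent_state[1], agent_state[0], marker="s",
            markersize=18, color="black")
    ax.set_xticks(range(GRID_SIZE))
    ax.set_yticks(range(GRID_SIZE))
    plt.show()

render((0, 0))  # e.g., show the starting position
```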

Algorithms Implemented

SARSA (State-Action-Reward-State-Action)

  • Approach: On-policy algorithm in which the agent updates its value estimates using the action it actually selects under its current policy.
  • Update Formula: Q(s, a) = Q(s, a) + α * (r + γ * Q(s', a') - Q(s, a)), with parameters for learning rate (α), reward (r), discount factor (γ), and Q-values of the current and next state-action pairs.
  • Features: Learns directly from the transitions the agent actually experiences, so the learned values reflect its balance between exploration and exploitation (e.g., an ε-greedy policy).
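
A minimal SARSA training-loop sketch under the assumptions above (the `step` and `ACTIONS` helpers come from the environment sketch, and the hyperparameter values α = 0.1, γ = 0.9, ε = 0.1 are illustrative, not taken from the repository):

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # illustrative hyperparameters
Q = defaultdict(float)                  # Q[(state, action)] -> value estimate

def epsilon_greedy(state):
    """Explore with probability epsilon, otherwise act greedily w.r.t. Q."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(500):
    state = (0, 0)
    action = epsilon_greedy(state)
    done = False
    while not done:
        next_state, reward, done = step(state, action)
        next_action = epsilon_greedy(next_state)
        # On-policy target: bootstrap from the action actually chosen next.
        target = reward + gamma * Q[(next_state, next_action)] * (not done)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state, action = next_state, next_action
```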

Q-Learning

  • Approach: Off-policy algorithm that learns the optimal policy by bootstrapping from the greedy (maximum-value) action in the next state, independently of the action the behavior policy actually takes.
  • Update Formula: Q(s, a) = Q(s, a) + α * (r + γ * max(Q(s', a')) - Q(s, a)).
  • Features: It tends to overestimate Q-values due to its maximization step but is robust in large state spaces.
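
The corresponding Q-Learning loop, reusing the same assumed helpers and illustrative hyperparameters as the SARSA sketch:

```python
Q = defaultdict(float)  # start from a fresh value table

for episode in range(500):
    state = (0, 0)
    done = False
    while not done:
        action = epsilon_greedy(state)
        next_state, reward, done = step(state, action)
        # Off-policy target: bootstrap from the greedy action, regardless of
        # which action the (ε-greedy) behavior policy will take next.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        target = reward + gamma * best_next * (not done)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = next_state
```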

The project showcases the application of SARSA and Q-Learning to a simple yet illustrative RL problem, emphasizing the importance of hyperparameter tuning and algorithm selection based on the specific characteristics of the environment and task objectives.
