
Bandits-using-UCB-algorithm

Thompson Sampling for bandits, using the UCB policy

For background on the multi-armed bandit problem, see https://en.wikipedia.org/wiki/Multi-armed_bandit

Suppose there are K slot machines, each of which provides rewards drawn from its own distribution. A gambler has to decide which arm to pull at each time step, and in what order, to maximize the total reward he or she will receive.

The UCB algorithm specifies that at time t, we pull the arm a_t with the maximum value of (observed mean reward of arm a) + (UCB confidence bound).
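
As a minimal sketch of that selection rule, assuming the standard UCB1 bound sqrt(2 ln t / n_a) (the exact bound used in this repository may differ):

```python
import math

def ucb1_choose(counts, sums, t):
    """Pick the arm maximizing empirical mean + UCB1 confidence bound.

    counts[a] is the number of times arm a has been pulled, sums[a] is the
    total reward observed from arm a, and t is the current time step.
    """
    # Pull each arm once first, so the empirical mean and bound are defined.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    # UCB1 rule: a_t = argmax_a [ sums[a]/counts[a] + sqrt(2 ln t / counts[a]) ]
    return max(
        range(len(counts)),
        key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
    )
```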

This program assumes K = 5, and the reward from each arm follows a Bernoulli distribution. Adopting a Bayesian point of view, our prior belief is that the success probability of each arm is distributed according to a Beta(1, 1) distribution (i.e., our prior is uniform for each arm). A sketch of the resulting Thompson Sampling loop is shown below.
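
Under these assumptions, a minimal Thompson Sampling loop looks like the following sketch (the values in `true_probs` are hypothetical, for illustration only, and are not taken from this repository):

```python
import random

K = 5
true_probs = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical arm success probabilities
alpha = [1.0] * K  # Beta(1, 1) prior: 1 + number of observed successes
beta = [1.0] * K   # Beta(1, 1) prior: 1 + number of observed failures

total_reward = 0
for t in range(10000):
    # Sample a plausible success probability for each arm from its posterior,
    # then pull the arm with the highest sample.
    samples = [random.betavariate(alpha[a], beta[a]) for a in range(K)]
    arm = max(range(K), key=lambda a: samples[a])
    reward = 1 if random.random() < true_probs[arm] else 0
    # Conjugate Beta-Bernoulli update of the chosen arm's posterior.
    alpha[arm] += reward
    beta[arm] += 1 - reward
    total_reward += reward

print(f"total reward after 10000 pulls: {total_reward}")
```

Because the Beta distribution is conjugate to the Bernoulli likelihood, the posterior after each pull stays a Beta distribution, so the update is just a counter increment.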
