One of my favorite data science blogs comes from James McCaffrey, a software engineer and researcher at Microsoft. He recently wrote a blog post on a method for allocating turns in a multi-armed bandit problem.
I really liked his post, and decided to take a look at the algorithm he described and code up a function to do the simulation in R.
Note: this is strictly an implementation of Dr. McCaffrey’s ideas from his blog post, and should not be taken as my own.
You can find the .Rmd file for this post on my GitHub.
Slot Machine Simulator
The basic idea of a multi-armed bandit is that you have a fixed amount of resources (e.g. money at a casino) and a number of competing places where you can allocate those resources (e.g. four slot machines at the casino). These allocations occur sequentially, so in the casino example, we choose a slot machine, observe the success or failure from our play, and then make the next allocation decision. Since we’re data scientists at a casino, hopefully we’re using the information we’re gathering to make better gambling decisions (is that an oxymoron?).
We want to choose the best place to allocate our resources, and maximize our reward for each allocation. However, we should shy away from a greedy strategy (just play the winner), because it doesn’t allow us to explore our other options.
There are different strategies for choosing where to allocate your next resource. One of the more popular choices is Thompson sampling, which usually involves sampling from a Beta distribution, and using the results of that sampling to determine your next allocation (out of scope for this blog post!).
The following function implements the roulette wheel allocation, for a flexible number of slot machines.
The function starts by generating a warm start with the data. We need to gather information about our different slot machines, so we allocate a small number of resources to each one to collect information. After we do this, we start the real allocation. We pick a winner based on how its cumulative probability compares to a draw from a random uniform distribution.
So, if our observed success probabilities are
machine | observed_prob | cumulative_prob | selection_range |
---|---|---|---|
1 | 0.2 | 0.2 | 0.0-0.2 |
2 | 0.3 | 0.5 | 0.2-0.5 |
3 | 0.5 | 1.0 | 0.5-1.0 |
And our draw from the random uniform was 0.7, we’d pick the third arm (0.7 falls between 0.5 and 1). This selection criterion is the main point of Dr. McCaffrey’s algorithm. For a better and more thorough explanation, I’d suggest reading his blog post.
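To make the selection step concrete, here is a minimal sketch of it. It is written in C++ to match the code later on, not in the R used for the original function, and `pick_machine` is a name I made up for illustration. It assumes the observed probabilities may not sum to 1, so it normalises them first, and it uses zero-based indices (the third machine in the table is index 2).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the original R function): choose a machine by
// comparing a uniform draw in [0, 1) against the cumulative observed
// probabilities. Returns a zero-based machine index.
std::size_t pick_machine(const std::vector<double>& probs, double draw) {
    assert(!probs.empty());

    // Normalise so the selection ranges cover exactly [0, 1).
    double total = 0.0;
    for (double p : probs) total += p;
    assert(total > 0.0);

    double cumulative = 0.0;
    for (std::size_t i = 0; i < probs.size(); ++i) {
        cumulative += probs[i] / total;
        if (draw < cumulative) return i;  // draw landed in this range
    }
    return probs.size() - 1;  // guard against floating-point round-off
}
```

With the probabilities from the table and a draw of 0.7, this should return index 2, i.e. the third machine.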
We then continue this process (playing a slot machine, observing the outcome, recalculating observed probabilities, and picking the next slot machine) until we run out of coins.
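The whole loop (warm start, then play, observe, recalculate, and pick again until the coins run out) could be sketched roughly as follows. This is a C++ illustration under my own assumptions, not the R function from this post: `MachineStats`, `simulate`, the `warm_start_plays` parameter, and the fallback to a random machine when every observed probability is zero are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical per-machine record of what we have observed so far.
struct MachineStats {
    int successes = 0;
    int failures = 0;
    int plays() const { return successes + failures; }
    double observed_prob() const {
        return plays() == 0 ? 0.0 : static_cast<double>(successes) / plays();
    }
};

// Sketch of the simulation: warm start, then roulette wheel allocation.
std::vector<MachineStats> simulate(const std::vector<double>& true_probs,
                                   int coins, int warm_start_plays,
                                   unsigned seed) {
    assert(!true_probs.empty());
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    std::vector<MachineStats> stats(true_probs.size());

    // Play one coin on machine m and record the outcome.
    auto play = [&](std::size_t m) {
        if (uniform(rng) < true_probs[m]) ++stats[m].successes;
        else ++stats[m].failures;
        --coins;
    };

    // Warm start: a few plays on every machine to gather information.
    for (std::size_t m = 0; m < stats.size(); ++m)
        for (int k = 0; k < warm_start_plays && coins > 0; ++k) play(m);

    // Real allocation: spin the roulette wheel until the coins run out.
    while (coins > 0) {
        double total = 0.0;
        for (const auto& s : stats) total += s.observed_prob();

        std::size_t chosen = stats.size() - 1;
        if (total > 0.0) {
            const double draw = uniform(rng) * total;
            double cumulative = 0.0;
            for (std::size_t i = 0; i < stats.size(); ++i) {
                cumulative += stats[i].observed_prob();
                if (draw < cumulative) { chosen = i; break; }
            }
        } else {
            // Assumption: with no successes anywhere, pick uniformly.
            std::uniform_int_distribution<std::size_t> any(0, stats.size() - 1);
            chosen = any(rng);
        }
        play(chosen);
    }
    return stats;
}
```

Calling something like `simulate({0.1, 0.25, 0.5, 0.65}, 5000, 10, 42)` would return one record per machine, and the plays across all machines should add up to the number of coins.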
And here’s the code
I’ll show a brief example of what we can do with the data generated from this function.
machine | true_probabilities | observed_probs | successes | failures | plays | machine_played | coins_left |
---|---|---|---|---|---|---|---|
1 | 0.10 | 0.0658 | 15 | 213 | 228 | FALSE | 0 |
2 | 0.25 | 0.2562 | 228 | 662 | 890 | FALSE | 0 |
3 | 0.50 | 0.5027 | 835 | 826 | 1661 | FALSE | 0 |
4 | 0.65 | 0.6709 | 1490 | 731 | 2221 | TRUE | 0 |
Let’s look at how the observed probabilities changed over time:
And how did our plays for each machine accumulate through time?
Boring!
Maybe if we run a smaller number of simulations, we might get a better sense of variation in our number of plays.
machine | true_probabilities | observed_probs | successes | failures | plays | machine_played | coins_left |
---|---|---|---|---|---|---|---|
1 | 0.10 | 0.0833 | 1 | 11 | 12 | FALSE | 0 |
2 | 0.30 | 0.3810 | 16 | 26 | 42 | FALSE | 0 |
3 | 0.65 | 0.6087 | 28 | 18 | 46 | TRUE | 0 |
That shows our allocations a little bit better than the previous visualization.
This was a fun exercise for me, and it reminded me of a presentation I did in graduate school about a very similar topic. I also wrote a roulette wheel function in Python, and was moderately successful at that (it runs faster than my R function, but I’m less confident in how “pythonic” it is).
My biggest concern with this implementation is the situation in which our warm start results in all failures for a given slot machine. If the machine fails across the whole warm start, its observed probability is zero, its selection range has zero width, and it will not be selected for the rest of the simulation. To offset this, you could add a little “jitter” (technical term: epsilon) to the observed probabilities at each iteration. Another option would be to generate a second random uniform variable, and if that value is very small, you pull a random lever rather than the one determined by the simulation.
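Both ideas fit into one small selection function. The following is only a sketch of how they might look, in C++ rather than R: `pick_with_exploration`, `jitter`, and `explore_rate` are names and parameters I invented for illustration, and I have not studied how either option changes the statistical behavior of the algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper, not part of the original function.
// jitter:       small constant added to every observed probability
// explore_rate: chance of ignoring the wheel and pulling a random lever
std::size_t pick_with_exploration(const std::vector<double>& probs,
                                  double jitter, double explore_rate,
                                  std::mt19937& rng) {
    assert(!probs.empty());
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    // Second option: a second uniform draw occasionally forces a random lever.
    if (uniform(rng) < explore_rate) {
        std::uniform_int_distribution<std::size_t> any(0, probs.size() - 1);
        return any(rng);
    }

    // First option: jitter gives every machine a selection range wider than zero.
    double total = 0.0;
    for (double p : probs) total += p + jitter;
    assert(total > 0.0);

    const double draw = uniform(rng) * total;
    double cumulative = 0.0;
    for (std::size_t i = 0; i < probs.size(); ++i) {
        cumulative += probs[i] + jitter;
        if (draw < cumulative) return i;
    }
    return probs.size() - 1;  // guard against floating-point round-off
}
```

With `jitter` and `explore_rate` both set to zero this reduces to the plain roulette wheel, so a machine with an observed probability of zero is never picked. Setting either one above zero gives that machine a chance to recover.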
Finally, I’d be interested in comparing the statistical properties of this algorithm and others that are used in sequential allocation problems…if I have the time.
I was bored and that can be a dangerous thing. Like doodling on the phone book while you are talking on the phone, I doodle code while answering questions on DIC. Yeah, it means I have no life and yes it means I was born a coder. During this little doodle I decided to make a slot machine. Not your standard slot machine per se, but one designed a little bit more like the real thing. Sure, it could have been done a little simpler, without even using a Wheel class at all, but what fun is that? In this entry I show the creation of a slot machine from a more mechanical aspect than a purely computerized one. It should provide a small sampling of classes and how they can represent real life machines. We cover it all right here on the Programming Underground!
So as I have already said, this little project was just something to play around with. It turned out kinda nice, so I thought I would share it. But what did I mean about it being mechanical in nature? Well, if you have ever played a real slot machine, not the digital ones they have in casinos now, you would see a metal case with a series of wheels. Typically it would be three wheels with pictures on them. When you put your money in and pull the handle the wheels would be set into motion. They would spin and then the first wheel would stop, followed by the second and then the third. After they have all stopped, the winnings are determined and you are paid out in coinage or credits.
I thought, why not be a bit mechanical in this slot machine design and create the wheels as a class called “Wheel” and give it the ability to spin independently of the other wheels? Have the wheel keep track of which picture (or in our case number) is flying by and report the results to the actual slot machine class. I could have done this mechanism without the need of a wheel at all and instead load up an array and have it randomly pick a number from the wheel. Little slimmer, little more efficient but wouldn’t show much programming theory.
What do we gain by recreating these Wheel classes and spinning them independently? Well, you gain a bit of flexibility. We can control the speed of each wheel independently if we want to, and we can grasp the wheel as a concept in our mind and manipulate it. We could easily build in features, such as having the wheel adjust itself if it lands on a certain number. Like some slots in Vegas, if you land on, let's say, a rocket in the center line, the machine would see the rocket and correct the wheel to spin backwards 1 spot (in the direction of the rocket, as if the rocket was controlling the wheel). We could spin one wheel one way and another wheel another. We could inherit from that wheel and create a specialized wheel that adds a slew of new behaviors. All of it is encapsulated into one solid object, making the actual Machine class oblivious to the trickery of the wheel itself… encapsulation at its finest!
The Machine class we create will contain 3 pointers, one to each of the wheels. The machine itself will be in charge of a few different tasks: taking money, issuing and removing credits, determining when to spin, telling each of the wheels to spin, and checking our winnings based on some chart we create. It has enough on its plate without also worrying about how the wheels track and report their values.
So let's start with our Wheel class and its declaration/implementation…
wheel.h
As you can see the wheel itself is not a difficult concept to envision. The bulk of the work is in the read() method. Here we simply read the values from our internal array of integers (the values on the wheel) and return those values as an array of the three integers… representing the visible column. This column will then be loaded into our 2-Dimensional Array back in the Machine class. The 2D array represents the view or screen by which the user sees the results. Remember that the user never gets to see the entire wheel. Only the 3 consecutive values on the face of the wheel.
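To illustrate the idea described above, here is a small sketch of what such a wheel might look like. It is not the original wheel.h listing: the number of stops (10), the `advance` helper, the member names, and the use of `std::array` in place of a raw array are all assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>

// Sketch of a wheel that spins on its own and reports its visible column.
class Wheel {
public:
    static constexpr std::size_t kSize = 10;  // assumed number of stops

    Wheel() {
        // Fill the wheel with the numbers 0..kSize-1 in order.
        for (std::size_t i = 0; i < kSize; ++i) values_[i] = static_cast<int>(i);
    }

    // Rotate the wheel forward by a fixed number of stops (wraps around).
    void advance(std::size_t steps) { position_ = (position_ + steps) % kSize; }

    // Spin by a random number of stops, independently of any other wheel.
    void spin(std::mt19937& rng) {
        std::uniform_int_distribution<std::size_t> steps(0, kSize - 1);
        advance(steps(rng));
    }

    // Return the three consecutive values currently on the face of the wheel.
    std::array<int, 3> read() const {
        return {values_[position_],
                values_[(position_ + 1) % kSize],
                values_[(position_ + 2) % kSize]};
    }

private:
    std::array<int, kSize> values_{};
    std::size_t position_ = 0;
};
```

A fresh wheel in this sketch shows 0, 1, 2. After advancing 9 stops it shows 9, 0, 1, because the visible column wraps around the end of the internal array.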
Here is how it may look in the real world. We have our machine with the three wheels and our 2D array called “Screen” which acts as our viewing window. Each wheel will report its values and those values will be put into the screen…
Below is our machine class…
machine.h
This looks like a lot of code, but it really is not if you look at each function. Most of them are very simple to understand. We have a spin method which spins each of the wheels and reads their values back from the Wheel class into a pointer (one per column). The columns are then loaded into the 2D array (our view screen) one at a time, the screen is printed so the user can see the results, and lastly the winnings are checked. The checkwinnings() method determines which lines to check based on the amount of the bet. A 1-line bet checks for winning combinations on the middle row only. A 2-line bet checks the middle and top rows, a 3-line bet checks all three horizontal rows, a 4-line bet adds the first diagonal, and a 5-line bet checks both diagonals in addition to the rows.
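The column loading and the bet-to-lines mapping can be sketched like this. It is an illustration only, not the machine.h listing: `load_screen` and `lines_for_bet` are hypothetical names, the screen is indexed as `screen[row][col]`, and I am assuming the "first diagonal" runs from top-left to bottom-right.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Line = std::array<int, 3>;
using Screen = std::array<Line, 3>;  // screen[row][col]

// Load three wheel columns into the 3x3 view screen, one column at a time.
Screen load_screen(const Line& col0, const Line& col1, const Line& col2) {
    Screen screen{};
    const Line* cols[3] = {&col0, &col1, &col2};
    for (std::size_t c = 0; c < 3; ++c)
        for (std::size_t r = 0; r < 3; ++r)
            screen[r][c] = (*cols[c])[r];
    return screen;
}

// Return the lines a bet of 1 to 5 covers, in the order described above:
// middle row, top row, bottom row, first diagonal, second diagonal.
std::vector<Line> lines_for_bet(const Screen& s, int bet) {
    assert(bet >= 1 && bet <= 5);
    const Line candidates[5] = {
        {s[1][0], s[1][1], s[1][2]},  // middle row
        {s[0][0], s[0][1], s[0][2]},  // top row
        {s[2][0], s[2][1], s[2][2]},  // bottom row
        {s[0][0], s[1][1], s[2][2]},  // diagonal, top-left to bottom-right (assumed)
        {s[2][0], s[1][1], s[0][2]},  // diagonal, bottom-left to top-right
    };
    return std::vector<Line>(candidates, candidates + bet);
}
```

Each returned line can then be handed to whatever routine scores a single line, so the bet size only controls how many lines get scored.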
How does it check the lines? Well each line is given to the checkline() helper function which compares the 3 values of the line against an enumerated type of various symbols. Here we are just assigning a symbol against each numbered value to help the programmer determine which numbers correspond to which winning combos. For instance, luckyseven represents the number 3 in the enumeration. So if it runs across a line with 3 number 3s, then it knows it hit the grand jackpot and credits the player 1000. This method makes things easy because if we ever wanted to change the win patterns later, we could change the enum and checkline method to do so. We could also build in multiple types of symbols and even let the user choose what slot machine game they want to go by. It becomes very flexible and is a testament to great design!
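A stripped-down sketch of that idea is shown below. Only two details come from the description above: `luckyseven` has the value 3, and three of them pay 1000 credits. The other symbol names, the `checkline_sketch` name and signature, and the decision to pay nothing for every other combination are placeholders for illustration.

```cpp
#include <array>
#include <cassert>

// Symbols on the wheel. Only luckyseven = 3 is taken from the text;
// the other names are hypothetical placeholders.
enum Symbol { cherry = 0, bell = 1, bar = 2, luckyseven = 3 };

// Return the credits won on one line of three values.
int checkline_sketch(const std::array<int, 3>& line) {
    const bool all_match = line[0] == line[1] && line[1] == line[2];
    if (!all_match) return 0;

    switch (line[0]) {
        case luckyseven:
            return 1000;  // grand jackpot, as described in the text
        default:
            return 0;     // payouts for other combos are not specified here
    }
}
```

Changing the win patterns later would then mean editing the enum and the switch, and nothing else.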
Lastly we can put some tests together just to show some of the various aspects of how this thing works and how the programmer can use the classes…
slotmachine.cpp
This simply inserts a 5 dollar bill and a coin for good luck, then bets 5 lines and spins. Regardless of the outcome, we bet five lines again and spin once more. Hopefully we win something this time around! Either way, those are the classes for you and I hope you like them. As always, all code here on the Programming Underground is in the public domain and free for the taking (just don’t cause a mess in aisle 3, I am tired of running out there for cleanup). Thanks for stopping by and reading my blog. 🙂