I saw an interesting video on YouTube where someone created a reinforcement algoritm to learn to play flappy bird. This inspired me to try something similar, so I wanted to see if I could apply Model Predictive Control (MPC) to dictate the control of the bird.
The premise of flappy bird is simple: pairs of pipes move across the screen, and it is the goal to pass through as many pipes as possible. The only control input for the player is to tap the screen, which makes the bird jump, and gain altitude.
Building a clone of flappy bird for manipulation was the first step. Once I had a version that I felt matched the original game close enough, I started to conceptualize the control algorithm.
MPC is a control method that works by optimizing the control horizon of a given system. Applied to flappy bird, this means choosing an optimal set of N different inputs. Since there is only one input for flappy bird, we can consider the two options 0 or 1, where 1 is a jump (tap) and 0 indicates doing nothing.
For an input sequence of length N, we have 2^N diffferent sequences to evaluate. To evaluate each sequence, I devised a cost function that is dependant on the distance of the bird from the center axis of the pipe gap, the velocity of the bird, the choice to jump and the crash scenario. The sequence with the lowest cost will be chosen as the optimal control path.
Evaluating 2^N sequences every frame is very computationally expensive...time complexity of O(N · 2^N) expensive. Scaling this algorithm to predict further in the future required improving the core algorithm.
To do this, I altered algorithm V1 to incorporate a version of the Beam Search algorithm.
Beam search is a heuristic search algorithm. Instead of evaluating every possible sequence (2^N), it keeps only the best 'B' sequences. Now, for a sequence len N, only 2B(N-log2(B)+1)−2 sequences are evaluated.
For a sequence len of 100 and beam width of 4, we now only have to evaluate 790 sequences per timestep.
With the original algorithm, instead we would have to calculate 1 267 650 600 228 229 401 496 703 205 376 (1.267 nonillion) sequences per timestep.