Autonomous Flappy Bird Player

Class
ME 2145
Year
May 2026
Duration
2 Weeks
Status
Complete
github

Motive

I saw an interesting video on YouTube where someone created a reinforcement algoritm to learn to play flappy bird. This inspired me to try something similar, so I wanted to see if I could apply Model Predictive Control (MPC) to dictate the control of the bird.

The Foundation

The premise of flappy bird is simple: pairs of pipes move across the screen, and it is the goal to pass through as many pipes as possible. The only control input for the player is to tap the screen, which makes the bird jump, and gain altitude.

Building a clone of flappy bird for manipulation was the first step. Once I had a version that I felt matched the original game close enough, I started to conceptualize the control algorithm.

Game Comparison

Algorithm

MPC is a control method that works by optimizing the control horizon of a given system. Applied to flappy bird, this means choosing an optimal set of N different inputs. Since there is only one input for flappy bird, we can consider the two options 0 or 1, where 1 is a jump (tap) and 0 indicates doing nothing.

V1

For an input sequence of length N, we have 2^N diffferent sequences to evaluate. To evaluate each sequence, I devised a cost function that is dependant on the distance of the bird from the center axis of the pipe gap, the velocity of the bird, the choice to jump and the crash scenario. The sequence with the lowest cost will be chosen as the optimal control path.

V2

Evaluating 2^N sequences every frame is very computationally expensive...time complexity of O(N · 2^N) expensive. Scaling this algorithm to predict further in the future required improving the core algorithm.

To do this, I altered algorithm V1 to incorporate a version of the Beam Search algorithm.

Beam search is a heuristic search algorithm. Instead of evaluating every possible sequence (2^N), it keeps only the best 'B' sequences. Now, for a sequence len N, only 2B(N-log2(​B)+1)−2 sequences are evaluated.

For a sequence len of 100 and beam width of 4, we now only have to evaluate 790 sequences per timestep.

With the original algorithm, instead we would have to calculate 1 267 650 600 228 229 401 496 703 205 376 (1.267 nonillion) sequences per timestep.

Demo

Technology Stack

Python
pygame
Model Predictive Control

Interested in similar work?

Get in Touch View More Projects