Reinforcement Learning for Flappy Bird
Published: 2019-05-21


Flappy Bird RL

Flappy Bird hack using Reinforcement Learning


The Hack.

This is a hack for the popular game, Flappy Bird. Although the game is no longer on Google Play or the App Store, that did not stop folks from creating very good replicas for the web. People have also created some interesting variants of the game.

After playing the game a few times (read: a few hours), I saw the opportunity to practice my machine learning skills and try to get Flappy Bird to learn how to play the game by itself. The video above shows the results of a really well trained Flappy Bird that basically keeps dodging pipes forever.

The How.

Initially, I wanted to create this hack for the Android app, and I was planning to use a tool to get screenshots and send click commands. But it takes about 1 - 2 seconds to get a screenshot, and that was definitely not fast or responsive enough.

Then I found the Omega500 game engine and its author's variant of Flappy Bird for typing practice. I ripped out the typing component and added some JavaScript Q Learning code to it.

Reinforcement Learning

Here's the basic principle: the agent, Flappy Bird in this case, performs a certain action in a state. It then finds itself in a new state and gets a reward based on that. There are many variants to be used in different situations: Policy Iteration, Value Iteration, Q Learning, etc.

Q Learning

I used Q Learning because it is a model-free form of reinforcement learning. That means that I didn't have to model the dynamics of Flappy Bird: how it rises and falls, reacts to clicks, and other things of that nature.

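The original post linked to a nice, concise description of Q Learning here, along with a figure of the algorithm. The update rule it boils down to appears in Step 3 of the learning loop below.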

State Space

I discretized my space over the following parameters.

  • Vertical distance from lower pipe
  • Horizontal distance from next pair of pipes
  • Life: Dead or Living
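One way to picture this discretization is a function that turns raw pixel distances into a lookup key for the Q array. This is only a sketch: the bin size, the key format, and the names `BIN_SIZE` and `toStateKey` are assumptions for illustration, since the post does not say what resolution or encoding was used.

```javascript
// Hypothetical bin size in pixels; the post does not state the resolution used.
const BIN_SIZE = 10;

// Map raw game measurements to a discrete state key.
// dy: vertical distance from the lower pipe (can be negative)
// dx: horizontal distance from the next pair of pipes
// alive: the "Life" parameter
function toStateKey(dy, dx, alive) {
  const yBin = Math.floor(dy / BIN_SIZE);
  const xBin = Math.floor(dx / BIN_SIZE);
  return `${yBin},${xBin},${alive ? "living" : "dead"}`;
}

console.log(toStateKey(37, 122, true)); // "3,12,living"
console.log(toStateKey(-4, 122, true)); // "-1,12,living"
```

A coarser bin size means fewer states to learn but less precise control; a finer one means the opposite.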

Actions

For each state, I have two possible actions

  • Click
  • Do Nothing

Rewards

The reward structure is purely based on the "Life" parameter.

  • +1 if Flappy Bird is still alive
  • -1000 if Flappy Bird is dead
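As a small illustration, the reward structure can be written as a function of the "Life" parameter alone. The function names here are hypothetical; only the two reward values come from the list above.

```javascript
const REWARD_ALIVE = 1;
const REWARD_DEAD = -1000;

// Reward depends only on whether the bird survived the tick.
function reward(alive) {
  return alive ? REWARD_ALIVE : REWARD_DEAD;
}

// Total undiscounted reward for a run that survives `ticks` ticks and then dies.
function totalReward(ticks) {
  return ticks * reward(true) + reward(false);
}

console.log(totalReward(100)); // 100 * 1 + (-1000) = -900
```

With these values, a single death outweighs a thousand ticks of survival, so the learner is pushed hard away from any action that leads to a crash.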

The Learning Loop

The array Q is initialized with zeros, and I always chose the best action: the action that will maximize my expected reward. To break ties I chose "Do Nothing" because that is the more common action.
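The greedy choice with its tie-breaking rule might look like the sketch below. Storing Q in a `Map` keyed by a state string and creating zero entries lazily are assumptions made here for brevity; the post only says that Q is an array initialized with zeros.

```javascript
const DO_NOTHING = 0;
const CLICK = 1;

// Look up the pair [Q(s, do nothing), Q(s, click)], creating a zero entry on first use.
function qValues(Q, state) {
  if (!Q.has(state)) Q.set(state, [0, 0]);
  return Q.get(state);
}

// Pick the action with the higher Q value; a tie goes to "Do Nothing".
function bestAction(Q, state) {
  const q = qValues(Q, state);
  return q[CLICK] > q[DO_NOTHING] ? CLICK : DO_NOTHING;
}

const Q = new Map();
console.log(bestAction(Q, "3,12,living")); // 0: a fresh state is a tie, so "Do Nothing"
Q.set("3,12,living", [-700, 0]);
console.log(bestAction(Q, "3,12,living")); // 1: "Click" now has the higher value
```

Because the comparison is a strict `>`, "Click" is chosen only when it is strictly better, which is what sends ties to "Do Nothing".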

Step 1: Observe what state Flappy Bird is in and perform the action that maximizes expected reward.

Let the game engine perform its "tick". Now Flappy Bird is in the next state, s'.

Step 2: Observe new state, s', and the reward associated with it. +1 if the bird is still alive, -1000 otherwise.

Step 3: Update the Q array according to the Q Learning rule.

Q[s,a] ← Q[s,a] + α (r + γ*V(s') - Q[s,a])

where V(s') = max over a' of Q[s',a'], the value of the best action available in the new state.

The alpha I chose is 0.7 because we have a deterministic state and I wanted it to be pretty hard to un-learn something. Also, the discount factor, gamma (γ in the update rule), was 1.
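To make the update rule concrete, here is a minimal sketch with the step size 0.7 and discount factor 1 from the text, taking V(s') as the larger of the two Q values in the new state (the standard Q Learning choice). The plain-object Q table and the state names `s0` and `s1` are made up for illustration.

```javascript
const ALPHA = 0.7; // learning rate from the post
const GAMMA = 1;   // discount factor from the post

// Apply one Q Learning update for the transition (s, a) -> sNext with reward r.
// Q maps a state name to [Q(s, do nothing), Q(s, click)].
function updateQ(Q, s, a, r, sNext) {
  const vNext = Math.max(...Q[sNext]); // V(s'): value of the best action in s'
  Q[s][a] += ALPHA * (r + GAMMA * vNext - Q[s][a]);
  return Q[s][a];
}

const Q = { s0: [0, 0], s1: [0, 0] };

// The bird clicked in s0 and died: 0 + 0.7 * (-1000 + 1 * 0 - 0) = -700
console.log(updateQ(Q, "s0", 1, -1000, "s1"));

// The bird did nothing in s0 and survived: 0 + 0.7 * (1 + 1 * 0 - 0) = 0.7
console.log(updateQ(Q, "s0", 0, 1, "s1"));
```

After a single crash the fatal action already sits at -700, so with greedy selection it is avoided the next time that state comes up.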

Step 4: Set the current state to s' and start over.
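Steps 1 through 4 can be tied together in a short sketch. The environment below is a toy stand-in and not the Omega500 game: heights 0 to 4 are safe, "Click" moves the bird up by one, "Do Nothing" moves it down by one, and anything outside that range kills it. The function names, the tick cap, and the episode count are likewise assumptions for illustration.

```javascript
const DO_NOTHING = 0;
const CLICK = 1;
const ALPHA = 0.7; // learning rate from the post
const GAMMA = 1;   // discount factor from the post

// Toy stand-in for the game engine's tick (an assumption, not the real game).
function tick(height, action) {
  const next = action === CLICK ? height + 1 : height - 1;
  const alive = next >= 0 && next <= 4;
  return { state: alive ? String(next) : "dead", height: next, alive };
}

// Zero-initialized Q values, created lazily per state.
function qValues(Q, state) {
  if (!Q.has(state)) Q.set(state, [0, 0]);
  return Q.get(state);
}

// Greedy choice; ties go to "Do Nothing".
function bestAction(Q, state) {
  const q = qValues(Q, state);
  return q[CLICK] > q[DO_NOTHING] ? CLICK : DO_NOTHING;
}

// Play one episode, learning as we go; returns the number of ticks survived.
function runEpisode(Q, maxTicks) {
  let height = 2;
  let s = String(height);
  let ticksAlive = 0;
  for (let t = 0; t < maxTicks; t++) {
    const a = bestAction(Q, s);                        // Step 1: act greedily
    const next = tick(height, a);                      // engine tick -> s'
    const r = next.alive ? 1 : -1000;                  // Step 2: observe reward
    const vNext = Math.max(...qValues(Q, next.state)); // V(s')
    const q = qValues(Q, s);
    q[a] += ALPHA * (r + GAMMA * vNext - q[a]);        // Step 3: update Q
    if (!next.alive) break;
    ticksAlive++;
    height = next.height;                              // Step 4: s <- s'
    s = next.state;
  }
  return ticksAlive;
}

const Q = new Map();
const scores = [];
for (let episode = 0; episode < 5; episode++) {
  scores.push(runEpisode(Q, 200));
}
console.log(scores);
```

In this toy world the bird falls to its death once, and the -1000 penalty is enough for the greedy policy to click at the bottom from then on. The real game has far more states, which is why training there takes hours rather than one episode. Note also that with a discount factor of 1 and +1 per tick, the Q values of safe states keep growing for as long as the bird survives.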

The Next Steps.

  • It took about 6-7 hours to train Flappy Bird to be good enough (150 score). This can be improved by instantiating more than 1 bird in the beginning and having all of them contribute their "learnings" to the same Q array.
  • Another way to make the learning faster would be to let users provide "good" input. Right now, you can click on the game to make Flappy Bird jump. But, that input is not taken into account by the learner.
  • Get this to work on a mobile phone!! If anyone has any ideas, please let me know in the comments :)

Credits.

I'd like to give a shout out to the author of the Omega500 game engine for creating it and making it open source!


Reposted from: http://dstzn.baihongyu.com/
