A Minimal Beginner's Guide to Reinforcement Learning for Robot Control

With deep reinforcement learning making a splash in robotics recently, interest in this direction has been growing rapidly. People keep asking me related questions, so I took the opportunity to write this minimal introduction to reinforcement learning, which mostly answers the question of what material to read when getting started. I am a beginner myself, so please bear with any omissions or mistakes, and feel free to raise questions for discussion. This is the first article in the series; I plan to follow up with more concrete environment-setup and getting-started tutorials, and will try not to flake on that.

Reference material for reinforcement learning / deep learning

  • OpenAI's Spinning Up, which covers the basic concepts of deep reinforcement learning and the main ideas behind the mainstream algorithms. It comes with companion code that is well suited to beginners, although setting up the environment is a bit painful because it involves OpenMPI; if you really cannot get the code running, let it go, being able to read and understand it is good enough. Strongly recommended to work through in full.
    https://spinningup.openai.com/en/latest/

  • OpenAI's Gym (now renamed Gymnasium, but essentially still Gym): the classic set of reinforcement learning environments, well worth playing with (see the first sketch after this list).
    https://gymnasium.farama.org/
    https://github.com/Farama-Foundation/Gymnasium

  • Stable-Baselines3, a collection of classic, commonly used deep reinforcement learning algorithms, meant to be used together with Gym. Gym only provides the environment (the simulated agent to be controlled); this library provides the algorithms that learn the controller, i.e., the policy.
    Use it alongside Gym and just get familiar with the API (see the second sketch after this list); don't read the internal implementation, it is far too complex.
    https://stable-baselines3.readthedocs.io/en/master/

  • A collection of reinforcement learning algorithms written by someone online. It covers essentially the same algorithms as Stable-Baselines3 but is much simpler, making it a good fit for beginners; read it alongside Spinning Up. The code is easy to read and easy to get running.
    https://github.com/Lizhi-sjtu/DRL-code-pytorch

  • Another good reinforcement learning algorithm library:
    https://github.com/thu-ml/tianshou

  • Finally, a recommendation for a book titled Reinforcement Learning, a classic that systematically introduces the foundational theory of reinforcement learning. Its material is somewhat dated, but it still helps with getting started; the first few chapters in particular are worth reading carefully, since understanding the concepts of dynamic programming, Monte Carlo methods, and temporal-difference learning is important (see the TD(0) sketch after this list).

  • PyTorch, the deep learning framework you must learn. Personally I feel the official tutorials on the website are enough to learn PyTorch; for any API you don't understand, just ask GPT.
    https://pytorch.org/
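
To get a hands-on feel for the Gymnasium environments recommended above, here is a minimal sketch of the standard reset/step interaction loop. It assumes the CartPole-v1 classic-control environment and uses random actions purely for illustration:

```python
import gymnasium as gym

# Create a classic-control environment and run a few episodes with random actions.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

for _ in range(500):
    action = env.action_space.sample()  # random action; a trained policy would go here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:         # episode ended (failure or time limit)
        obs, info = env.reset()

env.close()
```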
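
As a sketch of how Stable-Baselines3 is typically used together with Gym, the snippet below trains PPO on CartPole-v1 and then evaluates the learned policy. It assumes a recent Stable-Baselines3 release that works with Gymnasium; the timestep budget is an arbitrary small number chosen for illustration:

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Train a PPO policy on CartPole-v1 with a small step budget, just to see the workflow.
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=20_000)

# Evaluate the learned policy and save it for later use.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean episode reward: {mean_reward:.1f} +/- {std_reward:.1f}")
model.save("ppo_cartpole")
```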
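
To illustrate the temporal-difference idea mentioned in the book recommendation, here is a tiny sketch of tabular TD(0) policy evaluation on a made-up 5-state random walk (a toy example, not taken from any of the resources above): the value estimate of each state is nudged toward the reward plus the discounted value of the next state.

```python
import random

# Toy random walk: states 0..4, episodes start in state 2, move left/right at random.
# Reaching state 4 gives reward 1; reaching state 0 gives reward 0. Both are terminal.
ALPHA, GAMMA = 0.1, 1.0
V = [0.0] * 5                          # tabular value estimates V(s)

for _ in range(5000):
    s = 2
    while s not in (0, 4):
        s_next = s + random.choice((-1, 1))
        r = 1.0 if s_next == 4 else 0.0
        bootstrap = 0.0 if s_next in (0, 4) else V[s_next]
        # TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
        V[s] += ALPHA * (r + GAMMA * bootstrap - V[s])
        s = s_next

print(V)  # interior states should converge near [_, 0.25, 0.5, 0.75, _]
```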

Some material that may also be helpful:
https://docs.anaconda.com/free/miniconda/index.html
https://code.visualstudio.com/docs
https://zh.cppreference.com/w/
https://www.liaoxuefeng.com/wiki/896043488029600
https://www.runoob.com/python3/python3-tutorial.html
https://www.runoob.com/linux/linux-tutorial.html
https://www.runoob.com/docker/docker-tutorial.html

That covers the reinforcement learning / deep learning material. The suggestion is to first get comfortable with basic reinforcement learning, along with Linux, Python, and PyTorch, and only then move on to applying reinforcement learning to robots. I strongly recommend finishing Spinning Up and Gym; the other frameworks only need a quick skim. Don't spend too much time on algorithm frameworks, since many of them won't fit your needs and they are too complex, and don't spend too much time pondering algorithm details either, because initial use will not involve modifying the learning algorithm itself. Spend more time on simulators and on running real projects; writing and practicing more is the fastest way to improve.

Reference material for robot control and simulators

Ah, we finally get to the main topic.

Classic papers on reinforcement learning for biped/quadruped robots

Algorithm theory

  • X. B. Peng, Z. Ma, P. Abbeel, S. Levine and A. Kanazawa, “AMP: adversarial motion priors for stylized physics-based character control,” ACM Transactions on Graphics, vol. 40, p. 144:1–144:20, July 2021.

  • X. B. Peng, P. Abbeel, S. Levine and M. van de Panne, “DeepMimic: example-guided deep reinforcement learning of physics-based character skills,” ACM Transactions on Graphics, vol. 37, p. 143:1–143:14, July 2018.

  • J. Schulman, F. Wolski, P. Dhariwal, A. Radford and O. Klimov, “Proximal Policy Optimization Algorithms,” 2017.

  • V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, p. 529–533, February 2015.

  • D. Kalashnikov, J. Varley, Y. Chebotar, B. Swanson, R. Jonschkowski, C. Finn, S. Levine and K. Hausman, “MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale,” 2021.

  • E. Jang, A. Irpan, M. Khansari, D. Kappler, F. Ebert, C. Lynch, S. Levine and C. Finn, “BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning,” in Proceedings of the 5th Conference on Robot Learning, 2022.

  • C. Finn, P. Christiano, P. Abbeel and S. Levine, “A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models,” 2016.

  • A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta and A. A. Bharath, “Generative Adversarial Networks: An Overview,” IEEE Signal Processing Magazine, vol. 35, p. 53–65, January 2018.

Simulation tools

  • N. Rudin, D. Hoeller, P. Reist and M. Hutter, “Learning to walk in minutes using massively parallel deep reinforcement learning,” in Conference on Robot Learning, 2022.

  • V. Makoviychuk, L. Wawrzyniak, Y. Guo, M. Lu, K. Storey, M. Macklin, D. Hoeller, N. Rudin, A. Allshire, A. Handa and G. State, “Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning,” 2021.

  • M. Körber, J. Lange, S. Rediske, S. Steinmann and R. Glück, “Comparing Popular Simulation Environments in the Scope of Robotics and Reinforcement Learning,” 2021.

Biped/quadruped control

  • A. Tang, T. Hiraoka, N. Hiraoka, F. Shi, K. Kawaharazuka, K. Kojima, K. Okada and M. Inaba, “HumanMimic: Learning Natural Locomotion and Transitions for Humanoid Robot via Wasserstein Adversarial Imitation,” September 2023.

  • J. Siekmann, Y. Godse, A. Fern and J. Hurst, “Sim-to-Real Learning of All Common Bipedal Gaits via Periodic Reward Composition,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 2021.

  • T. Miki, J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun and M. Hutter, “Learning robust perceptive locomotion for quadrupedal robots in the wild,” Science Robotics, vol. 7, p. eabk2822, 2022.

  • Z. Li, X. B. Peng, P. Abbeel, S. Levine, G. Berseth and K. Sreenath, “Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control,” January 2024.

  • J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun and M. Hutter, “Learning quadrupedal locomotion over challenging terrain,” Science Robotics, vol. 5, p. eabc5986, October 2020.

  • A. Kumar, Z. Fu, D. Pathak and J. Malik, “RMA: Rapid Motor Adaptation for Legged Robots,” 2021.

  • Y. Jin, X. Liu, Y. Shao, H. Wang and W. Yang, “High-speed quadrupedal locomotion by imitation-relaxation reinforcement learning,” Nature Machine Intelligence, vol. 4, p. 1198–1208, December 2022.

  • F. Jenelten, J. He, F. Farshidian and M. Hutter, “DTC: Deep Tracking Control,” Science Robotics, vol. 9, p. eadh5401, January 2024.

  • D. Hoeller, N. Rudin, D. Sako and M. Hutter, “ANYmal parkour: Learning agile navigation for quadrupedal robots,” Science Robotics, vol. 9, p. eadi7566, March 2024.

  • H. Duan, B. Pandit, M. S. Gadde, B. van Marum, J. Dao, C. Kim and A. Fern, “Learning Vision-Based Bipedal Locomotion for Challenging Terrain,” September 2023.

  • D. Baek, A. Purushottam and J. Ramos, “Hybrid LMC: Hybrid Learning and Model-based Control for Wheeled Humanoid Robot via Ensemble Deep Reinforcement Learning,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022.

  • I. M. Aswin Nahrendra, B. Yu and H. Myung, “DreamWaQ: Learning Robust Quadrupedal Locomotion With Implicit Terrain Imagination via Deep Reinforcement Learning,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023.

Appendix

A reinforcement learning introduction slide deck worth referencing; click here to download.