erwan.lecarpentier@isae-supaero.fr

Hi, I just completed my PhD in Computer Science on the topic of “Reinforcement Learning in Non-Stationary Markov Decision Processes”. The thesis was carried out at ISAE-SUPAERO in the awesome city of Toulouse. I was honored to work under the supervision of Prof. Emmanuel Rachelson, Dr. Guillaume Infantes, and Dr. Charles Lesire.

I am interested in Artificial Intelligence and was introduced to the field via the Reinforcement Learning (RL) paradigm. Currently, I am focusing on several different questions, including the following:

- How can we build RL algorithms that realize a trade-off between performance guarantees and level of approximation? Existing algorithms tend to be all-or-nothing: either they are exact/tabular and suffer from high sample complexity, or they rely on function approximators that make them efficient at the cost of losing performance guarantees. Could an in-between exist?
- How can we build good metrics to measure the similarity between MDPs? Such a tool could be used to build efficient transfer learning methods (see our work on Lifelong RL). A key point would be learning to detect features of MDPs that indicate when knowledge transfer is possible.
- How can we build efficient state and/or action abstractions to prune the complexity of an RL task? Specifically, I am interested in ways to formalize a good optimization criterion that would produce those abstractions. I am thinking of criteria similar to the ones humans might use (e.g. bio-inspired), rather than criteria that directly optimize an objective such as the discounted sum of rewards in an MDP.
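As a toy illustration of the MDP-similarity question above, here is a sketch of a simple pseudo-metric between two tabular MDPs sharing the same state and action spaces. This is an illustrative construction for this page, not the metric studied in the Lipschitz Lifelong RL paper: it simply combines the largest reward gap with the largest L1 gap between transition models.

```python
import numpy as np

def mdp_distance(R1, T1, R2, T2, gamma=0.9):
    """Toy pseudo-metric between two tabular MDPs.

    R1, R2: reward arrays of shape (S, A).
    T1, T2: transition arrays of shape (S, A, S), where T[s, a, s']
    is the probability of moving to s' after taking a in s.
    """
    # Largest reward difference over all (state, action) pairs
    reward_gap = np.abs(R1 - R2).max()
    # Largest L1 distance between next-state distributions
    transition_gap = np.abs(T1 - T2).sum(axis=2).max()
    return reward_gap + gamma * transition_gap

# Two 2-state, 1-action MDPs differing only in one reward entry
R1 = np.array([[1.0], [0.0]])
R2 = np.array([[0.5], [0.0]])
T = np.zeros((2, 1, 2))
T[:, 0, 1] = 1.0  # deterministic: always move to state 1
print(mdp_distance(R1, T, R2, T))  # -> 0.5
```

Two MDPs at distance zero under such a pseudo-metric share the same optimal policy, which is the kind of structure a transfer method could exploit.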

2021 Erwan Lecarpentier, David Abel, Kavosh Asadi, Yuu Jinnai, Emmanuel Rachelson, Michael L. Littman. Lipschitz Lifelong Reinforcement Learning. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, AAAI 2021.

**PDF** - **arXiv**

2019 Erwan Lecarpentier and Emmanuel Rachelson. Non-Stationary Markov Decision Processes, a Worst-Case Approach using Model-Based Reinforcement Learning. In Proceedings of the 33rd Conference on Neural Information Processing Systems, NeurIPS 2019.

**NeurIPS** - **PDF** - **arXiv**

2018 Erwan Lecarpentier, Guillaume Infantes, Charles Lesire, and Emmanuel Rachelson. Open Loop Execution of Tree-Search Algorithms. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018.

**IJCAI** - **PDF** - **arXiv**

2017 Erwan Lecarpentier, Sebastian Rapp, Marc Melo, and Emmanuel Rachelson. Empirical Evaluation of a Q-Learning Algorithm for Model-free Autonomous Soaring. In JFPDA 2017.

**JFPDA** - **PDF** - **arXiv**

PhD Thesis: Reinforcement Learning in Non-Stationary Environments, ISAE-SUPAERO - Université de Toulouse.

**PDF**

Some RL environments I created:

**Dyna Gym**
A pip package implementing Reinforcement Learning algorithms for non-stationary environments, built on top of the OpenAI Gym toolkit.
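To give an idea of what a non-stationary environment looks like, here is a minimal, self-contained sketch with a Gym-like step/reset interface. The class and its dynamics are made up for this example and do not reflect Dyna Gym's actual API: a 1-D chain whose rewarding end drifts over time, so the reward function depends on the time step.

```python
class NonStationaryChain:
    """Toy non-stationary environment with a Gym-like interface.

    The agent walks on a chain of states; the rewarding end of the
    chain alternates every `drift_period` time steps, making the
    reward function time-dependent.
    """

    def __init__(self, n_states=5, drift_period=10):
        self.n_states = n_states
        self.drift_period = drift_period
        self.reset()

    def reset(self):
        self.state = self.n_states // 2  # start in the middle
        self.t = 0
        return self.state

    def step(self, action):  # action in {0: left, 1: right}
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.n_states - 1, self.state + move))
        # The goal state alternates between the two ends over time
        if (self.t // self.drift_period) % 2 == 0:
            goal = 0
        else:
            goal = self.n_states - 1
        reward = 1.0 if self.state == goal else 0.0
        done = self.state == goal
        self.t += 1
        return self.state, reward, done, {"t": self.t}
```

An agent that learned to walk left would see its policy invalidated once the goal drifts to the other end, which is precisely the difficulty non-stationary RL algorithms must handle.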

**Flatland environment**
A C++ library for navigation tasks in a 2D environment. The settings allow different policies, environments, and action spaces to be used. The state space can also be chosen, so that the agent evolves either within a discrete gridworld or a continuous-state world.

**Learning2Fly**
A C++ library simulating the flight of a glider UAV in a non-stationary atmosphere featuring thermal currents. The dynamics model is borrowed from Beeler et al. (2003).

**Traveler**
Traveler is a graph-based non-stationary MDP simulating travel between waypoints. Each node of the graph represents a location and each edge a route between locations. The travel duration associated with an edge is time-dependent, making the environment non-stationary. The goal of the agent is to reach a unique termination node as quickly as possible.
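To make the setting concrete, here is a small illustrative sketch of planning in such a time-dependent graph. This is hypothetical code, not the actual Traveler implementation: a Dijkstra-style search over arrival times, which is correct when the durations satisfy the FIFO property (departing later never yields an earlier arrival).

```python
import heapq
import math

def earliest_arrival(edges, durations, source, goal, t0=0.0):
    """Earliest-arrival search in a time-dependent graph.

    `edges` maps each node to its list of neighbours.
    `durations[(u, v)]` is a function of the departure time returning
    the travel duration on edge (u, v) -- the time-dependence is what
    makes the process non-stationary.
    """
    best = {source: t0}
    queue = [(t0, source)]
    while queue:
        t, u = heapq.heappop(queue)
        if u == goal:
            return t
        if t > best.get(u, math.inf):
            continue  # stale queue entry
        for v in edges[u]:
            arrival = t + durations[(u, v)](t)
            if arrival < best.get(v, math.inf):
                best[v] = arrival
                heapq.heappush(queue, (arrival, v))
    return math.inf  # goal unreachable

edges = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
durations = {
    ("A", "B"): lambda t: 1.0,
    ("A", "C"): lambda t: 3.0,
    ("B", "D"): lambda t: max(1.0, 5.0 - t),  # congestion that decays
    ("C", "D"): lambda t: 1.0,
}
print(earliest_arrival(edges, durations, "A", "D"))  # -> 4.0
```

The shortest route in hops (A-B-D) is not the fastest here because of the early congestion on edge (B, D); an RL agent in Traveler faces the same kind of time-dependent trade-off without knowing the duration functions in advance.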