While machine learning models are powering more and more everyday devices, there is a growing need to explain them. This especially applies to the use of deep reinforcement learning in safety-critical applications such as vehicle motion planning. In this paper, we propose a method for understanding what the RL agent's decisions are based on. The method relies on conducting a statistical analysis over a massive set of state-decision samples. It indicates which input features have an impact on the agent's decisions, and reveals the relationships between the decisions, the significance of the input features, and their values. The method allows us to determine where the agent's decision-making process is coherent with human intuition and where it contradicts it. We applied the proposed method to an RL motion planning agent that is supposed to drive a vehicle safely and efficiently on a highway. We find that such an analysis allows for a better understanding of the agent's decisions, inspecting its behavior, debugging the ANN model, and verifying the correctness of the input values, which increases its credibility.

The recent advances in XRL have been categorized into two major groups: transparent algorithms and post hoc explainability. The group of transparent algorithms includes those whose models are built to support their interpretability. Such an approach is implemented in hierarchical RL, where the main task is decomposed into sub-tasks handled by a trained higher-level agent and lower-level agents. The hierarchical structure is designed to provide an understanding of the agent's decision-making processes. Another approach is simultaneous learning, which learns both the policy and the explanation at the same time. An example work proposed to learn multiple Q-functions, one for each meaningful part of the reward, to make the predictions about future rewards understandable. The last type of transparent learning is representation learning, which involves learning latent features to facilitate the extraction of meaningful information by the agent models. Representative works propose to reconstruct the observation with autoencoders, to train a model to predict the next state, or to train an inverse model to predict the action from the previous state.

We train the Maneuver and ACC agents (Section 3.1) with the PPO algorithm. The agents are based on a similar neural network architecture (Figure 1), which differs only in the final control-module part and slightly in the first layers due to the different definitions of the input.
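As a rough illustration of the kind of statistical analysis described above, the sketch below gathers a set of (state, decision) samples and, for each input feature, measures how strongly its distribution shifts between decisions. The feature names, the synthetic data, and the crude effect-size measure are illustrative assumptions, not the paper's actual procedure:

```python
# Hypothetical sketch: estimate which input features impact the agent's
# decisions by comparing each feature's per-decision mean against its
# global mean, in units of the feature's standard deviation.
# Feature names are assumed for illustration only.
import numpy as np

FEATURES = ["ego_speed", "gap_ahead", "left_lane_free"]

def feature_impact(states: np.ndarray, decisions: np.ndarray) -> dict:
    """For each feature, return the largest standardized deviation of a
    per-decision mean from the global mean (a crude effect size)."""
    impact = {}
    for j, name in enumerate(FEATURES):
        col = states[:, j]
        global_mean, std = col.mean(), col.std() + 1e-9
        deviations = [
            abs(col[decisions == d].mean() - global_mean) / std
            for d in np.unique(decisions)
        ]
        impact[name] = max(deviations)
    return impact

# Synthetic example: decision 1 is taken mostly when gap_ahead is small,
# so gap_ahead should receive the highest impact score.
rng = np.random.default_rng(0)
states = rng.normal(size=(10_000, 3))
decisions = (states[:, 1] < -0.5).astype(int)
scores = feature_impact(states, decisions)
print(max(scores, key=scores.get))  # gap_ahead
```

In practice one would replace the crude mean-shift score with proper statistical tests or distribution comparisons over the collected samples, but the overall shape of the analysis, conditioning feature statistics on the decision taken, is the same.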