Optimal Rewards for Cooperative Agents


Following work on designing optimal rewards for single agents, we define a multiagent optimal rewards problem (ORP) in cooperative (specifically, common-payoff or team) settings. The problem is to find individual agent reward functions that lead to better overall team performance than is achieved when all agents guide their behavior with the same given team-reward function. We present a multiagent architecture in which each agent learns a good reward function from experience using a gradient-based algorithm, in addition to performing the usual task of planning good policies (here with respect to the learned rather than the given reward function). Multiagency introduces the challenge of nonstationarity: because the agents learn simultaneously, each agent's reward-learning problem is nonstationary and depends on the other agents' evolving reward functions. We demonstrate on two simple domains that (1) the proposed architecture outperforms the conventional approach in which all agents use the same given team-reward function, even when accounting for the resource overhead of reward learning; (2) the learning algorithm performs stably despite the nonstationarity; and (3) learning individual reward functions can lead to better specialization of roles than is possible with a shared reward, whether learned or given.
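To make the architecture concrete, the following is a minimal illustrative sketch, not the algorithm evaluated in this work. It assumes a hypothetical one-shot, two-agent common-payoff matrix game (`TEAM_REWARD`), stands in a softmax over each agent's own learned reward vector for the planner, and uses a REINFORCE-style update with a running baseline as the gradient-based reward learner. All names, the payoff matrix, and the hyperparameters are assumptions made for illustration. Both agents update simultaneously, so each one faces the nonstationary learning problem described above.

```python
import numpy as np

# Hypothetical common-payoff game: entry [a0, a1] is the team reward
# both agents receive when agent 0 plays a0 and agent 1 plays a1.
TEAM_REWARD = np.array([[1.0, 0.0],
                        [0.0, 0.5]])


def softmax(x):
    """Numerically stable softmax."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def learn_individual_rewards(team_reward, steps=5000, lr=0.1,
                             temperature=1.0, seed=0):
    """Each agent learns its own reward vector over its own actions.

    The 'planner' is a softmax over the learned reward (an assumption);
    the reward parameters follow a stochastic gradient of the team reward.
    """
    rng = np.random.default_rng(seed)
    # One learned reward vector per agent, initialised to zero.
    thetas = [np.zeros(n) for n in team_reward.shape]
    baseline = 0.0  # running average of team reward, reduces variance

    for _ in range(steps):
        # Planning step: act on the learned reward, not the team reward.
        policies = [softmax(th / temperature) for th in thetas]
        actions = [rng.choice(len(p), p=p) for p in policies]
        r = team_reward[actions[0], actions[1]]

        # Reward-learning step: all agents update at the same time.
        advantage = r - baseline
        for th, p, a in zip(thetas, policies, actions):
            grad_log_pi = -p / temperature       # d log pi(a) / d theta
            grad_log_pi[a] += 1.0 / temperature
            th += lr * advantage * grad_log_pi
        baseline += 0.05 * (r - baseline)

    return thetas


def expected_team_reward(team_reward, thetas, temperature=1.0):
    """Expected team reward under the agents' current softmax policies."""
    p0, p1 = (softmax(th / temperature) for th in thetas)
    return float(p0 @ team_reward @ p1)


if __name__ == "__main__":
    learned = learn_individual_rewards(TEAM_REWARD)
    print("learned reward vectors:", learned)
    print("expected team reward:", expected_team_reward(TEAM_REWARD, learned))
```

In this sketch each agent ends up with its own reward vector, which need not match the rewards learned by its teammate. A sequential domain would replace the softmax with a real planner and differentiate through it, which this toy example does not attempt.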


