Distributed Multi-Agent Online Learning Based on Global Feedback - 2015
In this paper, we develop online learning algorithms that enable the agents to cooperatively learn how to maximize the overall reward in scenarios where only noisy global feedback is available, without exchanging any information among themselves. We prove that our algorithms' learning regrets (the losses incurred by the algorithms due to uncertainty) grow logarithmically in time, and hence the time-average reward converges to the optimal average reward. Moreover, we illustrate how the regret depends on the size of the action space, and we show that this relationship is influenced by the informativeness of the reward structure with respect to each agent's individual action. When the overall reward is fully informative, the regret is shown to be linear in the total number of actions of all the agents. When the reward function is not informative, the regret is linear in the number of joint actions. Our analytic and numerical results show that the proposed learning algorithms significantly outperform existing online learning solutions in terms of regret and learning speed. We illustrate how our theoretical framework can be applied in practice to online Big Data mining using distributed classifiers.
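To make the informative-feedback setting concrete, here is a minimal sketch (not the authors' exact algorithm) in which each agent runs an independent UCB1-style index rule over its own actions and updates its estimates using only the shared noisy global reward, with no data exchanged between agents. All names and parameters (N_AGENTS, N_ACTIONS, the reward model) are illustrative assumptions.

```python
# Sketch: distributed bandit learning from noisy global feedback.
# Each agent keeps UCB1 statistics over its OWN actions only.
import math
import random

N_AGENTS = 3      # number of distributed agents (assumed)
N_ACTIONS = 4     # actions per agent (assumed)
HORIZON = 5000    # number of rounds (assumed)

# Hypothetical reward model: the global reward is the average of per-agent
# action qualities plus Gaussian noise. It is "informative" in the sense that
# it moves monotonically with each agent's own action quality.
quality = [[random.random() for _ in range(N_ACTIONS)] for _ in range(N_AGENTS)]

def global_reward(joint_action):
    mean = sum(quality[i][a] for i, a in enumerate(joint_action)) / N_AGENTS
    return mean + random.gauss(0.0, 0.1)   # noisy global feedback

# Per-agent statistics; agents never see each other's counts or estimates.
counts = [[0] * N_ACTIONS for _ in range(N_AGENTS)]
means = [[0.0] * N_ACTIONS for _ in range(N_AGENTS)]

for t in range(1, HORIZON + 1):
    joint = []
    for i in range(N_AGENTS):
        # Play each action once, then pick the highest UCB index.
        untried = [a for a in range(N_ACTIONS) if counts[i][a] == 0]
        if untried:
            a = untried[0]
        else:
            a = max(range(N_ACTIONS),
                    key=lambda a: means[i][a]
                    + math.sqrt(2.0 * math.log(t) / counts[i][a]))
        joint.append(a)

    r = global_reward(joint)           # only the global reward is observed
    for i, a in enumerate(joint):      # each agent updates its own estimate
        counts[i][a] += 1
        means[i][a] += (r - means[i][a]) / counts[i][a]

best = [max(range(N_ACTIONS), key=lambda a: quality[i][a]) for i in range(N_AGENTS)]
learned = [max(range(N_ACTIONS), key=lambda a: means[i][a]) for i in range(N_AGENTS)]
print("learned actions:", learned)
print("optimal actions:", best)
```

In this sketch the per-agent action spaces are explored separately, so exploration cost scales with the total number of individual actions rather than with the exponentially larger number of joint actions, which is the intuition behind the informative-reward regret bound stated above.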