This article proposes a reinforcement learning (RL) method based on the Actor-Critic architecture, which can be applied to partially-observable multi-agent competitive games. As an example, we consider the card game "Hearts". The learning problem then becomes a partially-observable Markov decision process (POMDP). However, the card distribution can be inferred from the information disclosed as a single game proceeds. In addition, the strategies (models) of the other players can be learned from their actual plays over repeated games. In our method, a single Hearts game is divided into three stages, and three actors are prepared so that each plays and learns separately in its own stage. In particular, the actor for the middle stage selects its plays so as to maximize the expected temporal-difference (TD) error, which is calculated from the evaluation function approximated by the critic and the estimated state transition. After a learning player trained by our RL method plays several thousand training games against three heuristic players, the RL player becomes strong enough to beat the heuristic players.
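As a rough illustration of the middle-stage actor's action selection described above, the sketch below greedily picks the legal play whose expected TD error is largest under an estimated state-transition model. The interfaces `critic_value`, `transition_model`, and `reward_fn`, as well as the purely greedy selection, are assumptions made for illustration, not the paper's exact implementation.

```python
import numpy as np

def select_action(state, legal_actions, critic_value, transition_model,
                  reward_fn, gamma=1.0):
    """Choose the action with the largest expected TD error.

    critic_value(s)          -> scalar state value V(s)      (assumed interface)
    transition_model(s, a)   -> iterable of (next_state, prob) pairs
                                from the estimated transition (assumed interface)
    reward_fn(s, a, s_next)  -> immediate reward             (assumed interface)
    """
    v_s = critic_value(state)
    best_action, best_td = None, -np.inf
    for a in legal_actions:
        # Expected TD error under the estimated state transition:
        #   E[delta] = sum_s' P(s'|s,a) * (r(s,a,s') + gamma * V(s')) - V(s)
        expected_td = sum(
            p * (reward_fn(state, a, s_next) + gamma * critic_value(s_next))
            for s_next, p in transition_model(state, a)
        ) - v_s
        if expected_td > best_td:
            best_action, best_td = a, expected_td
    return best_action
```

In practice a stochastic policy (e.g. softmax over the expected TD errors) could replace the argmax to preserve exploration during training; the greedy form is shown only to make the selection criterion explicit.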