Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions