The document presents Retrace(λ), a new algorithm for off-policy reinforcement learning. Retrace(λ) combines three properties: low variance, safety (it can use samples from an arbitrary behavior policy without diverging), and efficiency (it makes full use of returns when the behavior policy is close to the target policy). The authors also prove convergence to the optimal action-value function without the standard GLIE (greedy in the limit with infinite exploration) assumption.
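To make the idea concrete, here is a minimal sketch of the Retrace(λ) update for a single state-action pair. It assumes the published form of the correction, ΔQ(x₀, a₀) = Σₜ γᵗ (c₁⋯cₜ) δₜ with truncated importance weights cₛ = λ · min(1, π(aₛ|xₛ)/μ(aₛ|xₛ)); the function name and argument layout are illustrative, not from the document.

```python
def retrace_correction(q_sa, rewards, exp_q_next, pi_probs, mu_probs,
                       gamma=0.99, lam=1.0):
    """Retrace(lambda) correction for Q(x_0, a_0) from one sampled trajectory.

    q_sa[t]       : Q(x_t, a_t) along the trajectory
    rewards[t]    : reward r_t
    exp_q_next[t] : E_{a ~ pi} Q(x_{t+1}, a)  (expectation under the target policy)
    pi_probs[t]   : pi(a_t | x_t), target-policy probability of the taken action
    mu_probs[t]   : mu(a_t | x_t), behavior-policy probability of the taken action
    """
    correction, discount, trace = 0.0, 1.0, 1.0
    n = len(rewards)
    for t in range(n):
        # TD error using the target policy's expected next-state value.
        delta = rewards[t] + gamma * exp_q_next[t] - q_sa[t]
        # trace holds the product c_1 * ... * c_t (empty product = 1 at t = 0).
        correction += discount * trace * delta
        if t + 1 < n:
            # Truncated importance weight keeps variance bounded: c <= lam.
            trace *= lam * min(1.0, pi_probs[t + 1] / mu_probs[t + 1])
            discount *= gamma
    return correction
```

Because each cₛ is capped at λ ≤ 1, the product of traces cannot blow up (low variance, safe for any μ), yet near on-policy data (π ≈ μ) gives cₛ ≈ λ, so multi-step returns are not cut off (efficiency).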