This document presents a study of learning-based route management in mobile ad hoc networks (MANETs). It examines the established routing protocols DSDV (Destination-Sequenced Distance Vector), AODV (Ad hoc On-Demand Distance Vector), and DSR (Dynamic Source Routing), together with Q routing, a routing method based on reinforcement learning. The proposed approach, dual reinforcement confidence-based Q routing, selects routes using estimated link delays and confidence values that indicate how reliable each estimate is. The findings indicate that this approach outperforms the traditional protocols in both packet delivery ratio and end-to-end delay.
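To make the mechanism concrete, the following is a minimal Python sketch of a confidence-weighted, dual reinforcement Q-routing update. It assumes one common formulation from the Q-routing literature. Each node keeps, for every (destination, neighbor) pair, an estimated delivery delay and a confidence value in [0, 1]. The learning rate is max(remote confidence, 1 - local confidence), so a node learns quickly when the neighbor's estimate is trusted or its own estimate is not. Dual reinforcement adds a backward update: the receiving node also refines its estimate of the route back toward the packet's source. The class `Node`, the function `forward_hop`, the single lumped per-hop `delay`, the pessimistic initial estimate of 100.0, and the decay factor are all hypothetical illustration choices, not the exact algorithm evaluated in this study.

```python
from dataclasses import dataclass, field

# Hypothetical default for entries a node has never updated (pessimistic guess).
INITIAL_ESTIMATE = 100.0


@dataclass
class Node:
    """Per-node routing state (illustrative structure, not the study's)."""
    name: str
    neighbors: list
    # q[(dest, nbr)]: estimated time to deliver a packet to dest via nbr
    q: dict = field(default_factory=dict)
    # c[(dest, nbr)]: confidence in that estimate, in [0, 1]
    c: dict = field(default_factory=dict)

    def estimate(self, dest, nbr):
        # Unknown entries start pessimistic, with zero confidence
        return self.q.get((dest, nbr), INITIAL_ESTIMATE), self.c.get((dest, nbr), 0.0)

    def best(self, dest):
        """Return (neighbor, estimate, confidence) for the lowest-delay route."""
        if dest == self.name:
            # Already at the destination: zero remaining delay, fully certain
            return None, 0.0, 1.0
        nbr = min(self.neighbors, key=lambda y: self.estimate(dest, y)[0])
        est, conf = self.estimate(dest, nbr)
        return nbr, est, conf

    def update(self, dest, nbr, delay, remote_est, remote_conf):
        """Confidence-weighted update of the (dest, nbr) entry."""
        old_est, old_conf = self.estimate(dest, nbr)
        # Large step if the remote estimate is trusted or the local one is not
        eta = max(remote_conf, 1.0 - old_conf)
        self.q[(dest, nbr)] = old_est + eta * (delay + remote_est - old_est)
        self.c[(dest, nbr)] = old_conf + eta * (remote_conf - old_conf)

    def decay(self, factor=0.99):
        # Confidence in entries that are not refreshed fades over time
        for key in self.c:
            self.c[key] *= factor


def forward_hop(x, y, src, dest, delay):
    """Node x forwards a packet (from src, bound for dest) to neighbor y."""
    # Forward exploration: y reports its best estimate toward dest to x
    _, y_est, y_conf = y.best(dest)
    x.update(dest, y.name, delay, y_est, y_conf)
    # Backward exploration (dual reinforcement): the packet carries x's best
    # estimate toward src, letting y refine its route back to the source
    _, x_est, x_conf = x.best(src)
    y.update(src, x.name, delay, x_est, x_conf)
```

As a usage example, on a three-node chain A-B-C with unit hop delays, sending two packets from A to C makes A's estimate for reaching C via B settle at 2.0. Through the backward updates, C learns the same value for the reverse route, without any packet ever having travelled from C to A.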