Dynamic treatment regimes (DTRs) are sequential decision regimes for individual patients that can adapt over time to an evolving illness. The goal is to find the DTRs tailored to individual characteristics that lead to the best long term outcome if implemented. In many clinical applications, it is desirable to provide a fixed decision rule over time for the patients. We introduce a general learning framework on constructing stabilized dynamic treatment regimes (SDTRs), where we can make stabilized decisions over time using the repeated measured information on the same variable. The proposed method is based on directly max- imizing a doubly robust estimator of the expected long-term outcome over all constrained DTRs. Numerical studies showed a superior performance of the proposed method.