All mammals exhibit flexible decision policies that depend, at least in part, on the cortico-basal ganglia-thalamic (CBGT) pathways. Yet understanding how the complex connectivity, dynamics, and plasticity of CBGT circuits translate into experience-dependent shifts in decision policy is a longstanding challenge in neuroscience. Here we take a computational approach to this problem. Specifically, we used a spiking neural network to simulate decisions driven by CBGT circuits under baseline, unrewarded conditions, and fit an evidence accumulation model to the resulting behavior. Using canonical correlation analysis, we then replicated the identification of three control ensembles […]

AUTHOR SUMMARY: The task of selecting an action among multiple options can be framed as a process of accumulating streams of evidence, both internal and external, up to a decision threshold. A decision policy can be defined by the unique configuration of factors, such as accumulation rate and threshold height, that determine the dynamics of the evidence accumulation process. In mammals, this process is thought to be regulated by low-dimensional subnetworks, called control ensembles, within the CBGT pathways. These control ensembles act by tuning specific aspects of evidence accumulation during decision making. Here we use simulations and computational analysis to show that synaptic plasticity at cortico-striatal synapses, mediated by choice-related reward signals, adjusts CBGT control ensemble activity in a way that improves accuracy and reduces decision time, so as to maximize the increase in reward rate during learning.
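The summary defines a decision policy by its accumulation rate and threshold height. As an illustration only, the following Python sketch assumes the standard drift-diffusion model (the text does not say which evidence accumulation model was fit) with symmetric bounds at ±a, an unbiased start at 0, drift v, and noise σ. It computes the closed-form accuracy and mean decision time, a reward rate defined as accuracy divided by total trial time (the non-decision time `t_nd` and inter-trial interval `iti` are hypothetical values), and an Euler simulation for comparison. All function names and parameter values are illustrative assumptions.

```python
import numpy as np


def ddm_accuracy(v, a, sigma=1.0):
    """P(hitting the correct bound +a) for an unbiased start at 0 and bounds at +/-a."""
    return 1.0 / (1.0 + np.exp(-2.0 * v * a / sigma**2))


def ddm_mean_decision_time(v, a, sigma=1.0):
    """Mean first-passage time to either bound (requires v > 0)."""
    return (a / v) * np.tanh(a * v / sigma**2)


def reward_rate(v, a, sigma=1.0, t_nd=0.3, iti=1.0):
    """Expected correct responses per unit time (t_nd and iti are hypothetical)."""
    return ddm_accuracy(v, a, sigma) / (ddm_mean_decision_time(v, a, sigma) + t_nd + iti)


def simulate_ddm(v, a, sigma=1.0, dt=1e-3, n_trials=2000, seed=0, max_steps=100_000):
    """Euler-Maruyama simulation; returns choices (+1 correct, -1 error, 0 unfinished)
    and decision times (NaN for unfinished trials)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)
    choice = np.zeros(n_trials, dtype=int)
    rt = np.full(n_trials, np.nan)
    active = np.ones(n_trials, dtype=bool)
    for step in range(1, max_steps + 1):
        n_active = int(active.sum())
        if n_active == 0:
            break
        # Accumulate drift plus Gaussian noise for trials that have not yet decided.
        x[active] += v * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_active)
        hit_up = active & (x >= a)
        hit_down = active & (x <= -a)
        choice[hit_up] = 1
        choice[hit_down] = -1
        rt[hit_up | hit_down] = step * dt
        active &= ~(hit_up | hit_down)
    return choice, rt


if __name__ == "__main__":
    for v in (0.5, 1.0, 2.0):
        print(f"v={v}: accuracy={ddm_accuracy(v, 1.0):.3f}, "
              f"mean DT={ddm_mean_decision_time(v, 1.0):.3f} s, "
              f"reward rate={reward_rate(v, 1.0):.3f}/s")
```

In this model, raising the drift rate `v` at a fixed threshold both increases accuracy and shortens mean decision time, which is one way the reward-rate gains described in the summary could be expressed in model parameters. Raising the threshold `a` instead trades speed for accuracy, so reward rate peaks at an intermediate threshold.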
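Canonical correlation analysis finds paired linear combinations of two sets of variables that are maximally correlated with each other. The following is a minimal NumPy sketch of classical CCA (whitening each covariance, then taking an SVD of the whitened cross-covariance), applied to synthetic data: a hypothetical matrix `rates` standing in for CBGT population firing rates and a hypothetical matrix `params` standing in for fitted accumulation-model parameters, both driven by one shared latent factor. This is not the study's analysis pipeline or data; the small ridge term `reg` is an assumption added for numerical stability.

```python
import numpy as np


def _inv_sqrt(C, eps=1e-12):
    """Inverse square root of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T


def cca(X, Y, reg=1e-6):
    """Classical CCA. Returns canonical correlations (descending) and weight matrices A, B
    such that column k of (X - mean) @ A and (Y - mean) @ B form the k-th canonical pair."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)
    return s, Wx @ U, Wy @ Vt.T


# Synthetic example (hypothetical data): one latent factor drives both variable sets.
rng = np.random.default_rng(1)
n = 500
latent = rng.standard_normal((n, 1))
rates = latent @ np.linspace(0.5, 1.5, 10)[None, :] + 0.5 * rng.standard_normal((n, 10))
params = latent @ np.array([[1.0, -0.8, 0.6]]) + 0.5 * rng.standard_normal((n, 3))

corrs, A, B = cca(rates, params)
print(np.round(corrs, 3))  # expect one large correlation followed by small ones
```

With real data, each column of `A` would give the weighting of network populations that co-varies with the matching weighting of model parameters in `B`; here only the first pair reflects genuine shared structure.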
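To illustrate how a choice-related reward signal at cortico-striatal synapses could shift action selection, the following sketch uses a deliberately simplified, rate-based rule, not the plasticity rule implemented in the study's spiking network. On each trial, a dopamine-like reward prediction error changes the chosen action's hypothetical direct-pathway ("go") weight `w_d` and its indirect-pathway ("no-go") weight `w_i` in opposite directions. The softmax choice rule, learning rate `eta`, inverse temperature `beta`, and reward probabilities are all assumptions.

```python
import numpy as np


def run_learning(p_reward=(0.8, 0.2), n_trials=500, eta=0.1, beta=5.0, seed=0):
    """Two-choice task with opponent go/no-go weights updated by a reward prediction error.
    All names and parameters are hypothetical illustrations."""
    rng = np.random.default_rng(seed)
    n_actions = len(p_reward)
    w_d = np.full(n_actions, 0.5)   # weights onto direct-pathway ("go") units
    w_i = np.full(n_actions, 0.5)   # weights onto indirect-pathway ("no-go") units
    expected = np.zeros(n_actions)  # running estimate of reward for each action
    choices = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        drive = w_d - w_i                        # net go-minus-no-go drive per action
        p = np.exp(beta * (drive - drive.max()))
        p /= p.sum()                             # softmax action probabilities
        c = int(rng.choice(n_actions, p=p))
        reward = float(rng.random() < p_reward[c])
        rpe = reward - expected[c]               # dopamine-like reward prediction error
        expected[c] += eta * rpe
        # Positive RPE strengthens "go" and weakens "no-go" for the chosen action;
        # negative RPE does the reverse. Weights are kept in [0, 1].
        w_d[c] = np.clip(w_d[c] + eta * rpe, 0.0, 1.0)
        w_i[c] = np.clip(w_i[c] - eta * rpe, 0.0, 1.0)
        choices[t] = c
    return choices, w_d, w_i


choices, w_d, w_i = run_learning()
print("P(better option), first 100 trials:", np.mean(choices[:100] == 0))
print("P(better option), last 100 trials: ", np.mean(choices[-100:] == 0))
```

In this toy version, learning widens the drive gap between the better and worse options, so choices become more accurate; mapping such weight changes onto accumulation-model parameters such as drift rate or threshold is a separate modeling step that the sketch does not perform.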