We study the problem of choosing optimal policy rules in uncertain environments using models that may be incomplete and/or partially identified. We consider a policymaker who wishes to choose a policy to maximize a particular counterfactual quantity called a policy transform. We characterize learnability of a set of policy options by the existence of a decision rule that closely approximates the maximin optimal value of the policy transform with high probability. Sufficient conditions are provided for the existence of such a rule. However, learnability of an optimal policy is an ex-ante notion (i.e. before observing a sample), and so ex-post (i.e. after observing a sample) theoretical guarantees for certain policy rules are also provided. Our entire approach is applicable when the distribution of unobservables is not parametrically specified, although we discuss how semiparametric restrictions can be used. Finally, we show possible applications of the procedure to a simultaneous discrete choice example and a program evaluation example.