Demand response (DR) programs aim to engage distributed demand-side resources in providing ancillary services for electric power systems. Previously, aggregated thermostatically controlled loads (TCLs) have been demonstrated as a technically viable and economically valuable provider of such services that can effectively compete with conventional generation resources in reducing load peaks and smoothing demand fluctuations. Yet, to provide these services at scale, a large number of TCLs must be accurately aggregated and operated in sync. This paper describes a Markov Decision Process (MDP) that aggregates and models an ensemble of TCLs. Using the MDP framework, we propose to internalize the exogenous uncertain dynamics of TCLs by means of stochastic and distributionally robust optimization. First, under mild assumptions on the underlying uncertainty, we derive analytical stochastic and distributionally robust control policies for dispatching a given TCL ensemble. Second, we further relax these mild assumptions to allow for a more delicate treatment of uncertainty, which leads to distributionally robust MDP formulations with moment- and Wasserstein-based ambiguity sets that can be efficiently solved numerically. The case study compares the analytical and numerical control policies using a simulated ensemble of 1,000 air conditioners.