In recent years, researchers have contributed promising new techniques for allocating cloud resources in more robust, efficient, and ecologically sustainable ways. Unfortunately, the wide-spread use of these techniques in production systems has, to date, remained elusive. One reason for this is that the state of the art for investigating these innovations at scale often relies solely on model-driven simulation. Production-grade cloud software, however, demands certainty and precision for development and business planning that only comes from validating simulation against empirical observation.
In this work, we take an alternative approach to facilitating cloud research and engineering in order to transition innovations to production deployment faster. In particular, we present a new methodology that complements existing model-driven simulation with platform-specific and statistically trustworthy results. We simulate systems at scales and on time frames that are testable, and then, based on the statistical validation of these simulations, investigate scenarios beyond those feasibly observable in practice. We demonstrate the approach by developing an energy-aware cloud scheduler and evaluating it using production and synthetic traces in faster than real time. Our results show that we can accurately simulate a production IaaS system, ease capacity planning, and expedite the reliable development of its components and extensions.