In this report, we investigate and characterize the
behavior of “big” and “fast” data analysis frameworks in multi-tenant,
shared settings in which computing resources (CPU
and memory) are limited. Such settings and frameworks are
frequently employed in both public and private cloud deployments.
Resource constraints stem from both physical limitations
(private clouds) and what the user is willing to pay (public
clouds). Because of these constraints, users increasingly attempt
to maximize resource utilization and sharing in these settings.
To understand how popular analytics frameworks behave and
interfere with each other under such constraints, we investigate
the use of Mesos to provide fair resource sharing for
resource-constrained private cloud systems. We empirically evaluate such
systems using Hadoop, Spark, and Storm multi-tenant workloads.
Our results show that in constrained environments, there is
significant performance interference that manifests in multiple
ways. First, Mesos is unable to achieve fair resource sharing
for many configurations. Second, the performance of an application
that competes with other frameworks depends on the order in which
Mesos makes resource offers and is highly variable.
Finally, we find that resource allocation among
tenants that employ coarse-grained and fine-grained framework
scheduling can lead to a form of deadlock for fine-grained
frameworks and underutilization of system resources.
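
As a rough illustration of the last two findings, the sketch below is a toy model of offer-based sharing. It is not Mesos's allocator and does not reproduce our experiments. It assumes a hypothetical cluster of 8 CPUs, one coarse-grained framework that accepts every offer and never releases what it holds, and one fine-grained framework that takes 2 CPUs per task and releases them at the end of each round. The names `Framework` and `simulate` are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Framework:
    name: str
    coarse_grained: bool  # True: keeps every CPU it accepts for the whole run
    cpus_per_task: int    # hypothetical per-task demand
    held: int = 0         # CPUs currently held
    tasks_run: int = 0    # tasks completed (tracked for the fine-grained one)


def simulate(total_cpus, order, rounds):
    """Offer all free CPUs to each framework in `order`, once per round."""
    frameworks = {
        "coarse": Framework("coarse", coarse_grained=True, cpus_per_task=2),
        "fine": Framework("fine", coarse_grained=False, cpus_per_task=2),
    }
    free = total_cpus
    for _ in range(rounds):
        for name in order:
            fw = frameworks[name]
            if fw.coarse_grained:
                # Accepts the entire offer and holds it indefinitely.
                fw.held += free
                free = 0
            elif free >= fw.cpus_per_task:
                # Accepts just enough for one task and runs it.
                fw.held = fw.cpus_per_task
                free -= fw.cpus_per_task
                fw.tasks_run += 1
        # End of round: only the fine-grained framework returns resources.
        for fw in frameworks.values():
            if not fw.coarse_grained:
                free += fw.held
                fw.held = 0
    return frameworks


if __name__ == "__main__":
    for order in (["coarse", "fine"], ["fine", "coarse"]):
        result = simulate(total_cpus=8, order=order, rounds=5)
        print(order, {n: (f.held, f.tasks_run) for n, f in result.items()})
```

Under these assumptions, when the coarse-grained framework receives the first offer, it holds all 8 CPUs and the fine-grained framework never runs a task. When the fine-grained framework is offered resources first, it completes one task per round while the coarse-grained framework holds the remaining 6 CPUs. The outcome depends entirely on offer order, which is the kind of sensitivity described above.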