Systems


HOPS is the first open platform-as-a-service distribution of Hadoop v2. It supports virtualized Hadoop on several platforms and can currently be deployed and managed on AWS EC2, OpenStack, and bare metal. HOPS also provides a highly available, more scalable distribution of HDFS, in which the NameNode is replaced by a highly available, replicated in-memory database. Our HDFS distribution supports much larger amounts of metadata, which is now customizable. HOPS also supports many data-intensive computing platforms, such as MapReduce and Spark, through YARN. Support for Stratosphere is coming soon. Read more here.


Flink
Apache Flink is a platform for efficient, distributed, general-purpose data processing. It features powerful programming abstractions in Java and Scala, a high-performance runtime, and automatic program optimization. It has native support for iterations, incremental iterations, and programs consisting of large DAGs of operations. Flink Streaming is an extension of the core Flink API for high-throughput, low-latency data stream processing. The system can connect to and process data streams from many data sources, such as RabbitMQ, Flume, Twitter, and ZeroMQ, as well as from any user-defined data source. Read more here.


Karamel
Karamel is an orchestration engine for Chef Solo that enables the deployment of arbitrarily large distributed systems on both virtualized platforms, e.g., AWS, and bare-metal hosts. A distributed system is defined in YAML as a set of node groups that each implement a number of Chef recipes, where the Chef cookbooks are hosted on GitHub. Karamel orchestrates the execution of Chef recipes using a set of ordering rules defined in a YAML file (Karamelfile) in each cookbook. For each recipe, the Karamelfile can define a set of dependent (possibly external) recipes that should be executed before it. At the system level, the set of Karamelfiles defines a directed acyclic graph (DAG) of service dependencies. Karamel system definitions are very compact. We leverage Berkshelf to transparently download and install transitive cookbook dependencies, so large systems can be defined in a few lines of code. Finally, the Karamel runtime builds and manages the execution of the DAG of Chef recipes by first launching the virtual machines or configuring the bare-metal hosts and then executing recipes with Chef Solo. The Karamel runtime executes the node setup steps using JClouds or SSH. Because virtual machine creation and configuration are not always reliable or timely, Karamel transparently handles faults by retrying. Read more here.
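The dependency ordering described above can be illustrated with a small sketch. It is not Karamel's implementation or its Karamelfile format: the recipe names and the plain dictionary below are hypothetical, and the ordering relies on Python's standard graphlib module (Python 3.9 or later) to show how a DAG of "run before" rules yields a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical recipe names, for illustration only.
# Each key maps to the set of recipes that must finish before it runs.
dependencies = {
    "java::install": set(),
    "hadoop::namenode": {"java::install"},
    "hadoop::datanode": {"java::install", "hadoop::namenode"},
    "spark::master": {"hadoop::namenode"},
}


def execution_order(deps):
    """Return one order in which every recipe runs after its dependencies.

    Raises graphlib.CycleError if the rules do not form a DAG.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Several orders can be valid for the same graph; the only guarantee is that a recipe never appears before something it depends on.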


Kompics
Kompics is a message-passing component model for building distributed systems by putting together protocols programmed as event-driven components. Systems built with Kompics leverage multi-core machines out of the box and can be dynamically reconfigured to support hot software upgrades. A simulation framework enables deterministic debugging and reproducible performance evaluation of unmodified Kompics distributed systems. Read more here.
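To make the idea of event-driven components concrete, here is a minimal sketch in Python. It does not use or reproduce the Kompics API; the Scheduler, Pinger, and Ponger classes and the Ping and Pong events are hypothetical names chosen for illustration. Components only interact by triggering events, and a single-threaded scheduler delivers them in FIFO order, which is one simple way to get the deterministic, repeatable execution that a simulation mode relies on.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Ping:
    n: int


@dataclass
class Pong:
    n: int


class Scheduler:
    """Delivers queued events to subscribed handlers, one at a time."""

    def __init__(self):
        self.queue = deque()
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def trigger(self, event):
        self.queue.append(event)

    def run(self):
        # FIFO delivery on one thread makes every run identical.
        while self.queue:
            event = self.queue.popleft()
            for handler in self.handlers[type(event)]:
                handler(event)


class Pinger:
    """Sends pings and counts the pongs it receives."""

    def __init__(self, scheduler, rounds):
        self.scheduler = scheduler
        self.rounds = rounds
        self.received = 0
        scheduler.subscribe(Pong, self.on_pong)

    def start(self):
        self.scheduler.trigger(Ping(1))

    def on_pong(self, pong):
        self.received += 1
        if pong.n < self.rounds:
            self.scheduler.trigger(Ping(pong.n + 1))


class Ponger:
    """Answers every ping with a pong carrying the same number."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        scheduler.subscribe(Ping, self.on_ping)

    def on_ping(self, ping):
        self.scheduler.trigger(Pong(ping.n))


scheduler = Scheduler()
pinger = Pinger(scheduler, rounds=3)
ponger = Ponger(scheduler)
pinger.start()
scheduler.run()
```

Because neither component calls the other directly, either one could be replaced by a different implementation that handles the same events, which is the property that makes reconfiguration possible in a component model.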


CATS
Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but they fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks when data is distributed and replicated automatically by the principle of consistent hashing. CATS is a distributed key-value store that uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing, which are key properties for modern cloud storage middleware. Read more here.
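The two ideas named above, consistent hashing and consistent quorums, can be sketched as follows. This is a simplified illustration and not the CATS protocol: it assumes that a consistent quorum means a majority of a replica group whose members all report the same view of that group, and the Ring class, the function names, and the node identifiers are hypothetical.

```python
import hashlib
from bisect import bisect_right


def ring_position(key: str) -> int:
    """Map a key or node name to a position on the hash ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class Ring:
    """Consistent-hashing ring: a key is stored on its successor nodes."""

    def __init__(self, nodes, replication=3):
        self.replication = replication
        self.points = sorted((ring_position(n), n) for n in nodes)

    def replica_group(self, key):
        positions = [p for p, _ in self.points]
        start = bisect_right(positions, ring_position(key)) % len(self.points)
        count = min(self.replication, len(self.points))
        return tuple(
            self.points[(start + i) % len(self.points)][1] for i in range(count)
        )


def is_consistent_quorum(replies):
    """Check replies (node -> replica-group view reported by that node).

    Assumption: the quorum is consistent when a majority of one view's
    members replied and all of them reported exactly that view.
    """
    for view in set(replies.values()):
        agreeing = [n for n, v in replies.items() if v == view and n in view]
        if len(agreeing) > len(view) // 2:
            return True
    return False


ring = Ring(["n1", "n2", "n3", "n4", "n5"])
group = ring.replica_group("user:42")
```

A plain majority is not enough under this reading: two replies out of three count only if both nodes agree on who the three replicas are, which is what rules out acting on stale membership after a partition or a node joining.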


ElastMan
ElastMan is an elasticity controller for elastic cloud-based services. ElastMan combines feedforward and feedback control. Feedforward control is used to respond to spikes in the workload by quickly resizing the service to meet SLOs at a minimal cost. Feedback control is used to correct modeling errors and to handle diurnal workload variations. To address nonlinearities, our design of ElastMan leverages the near-linear scalability of elastic cloud services in order to build a scale-independent model of the service. Read more here.
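The combination of the two controllers can be sketched roughly as below. This is not ElastMan's actual model or control law: the per-server capacity, the proportional gain, the spike threshold, and all function names are assumptions made for illustration. The per-server capacity stands in for a scale-independent model, since under near-linear scalability it does not change with the number of servers.

```python
import math


def feedforward_servers(workload_rps, per_server_capacity_rps):
    """Size the service directly from the workload using a per-server model."""
    return max(1, math.ceil(workload_rps / per_server_capacity_rps))


def feedback_adjustment(latency_ms, target_ms, servers, gain=0.5):
    """Proportional correction based on the relative latency error."""
    error = (latency_ms - target_ms) / target_ms
    return round(gain * error * servers)


def next_size(servers, workload_rps, prev_workload_rps, latency_ms,
              target_ms, per_server_capacity_rps, spike_ratio=1.5):
    """Pick the next cluster size.

    Assumption: a workload jump above spike_ratio counts as a spike and is
    handled by the feedforward path; everything else goes through feedback.
    """
    if workload_rps > spike_ratio * prev_workload_rps:
        return feedforward_servers(workload_rps, per_server_capacity_rps)
    return max(1, servers + feedback_adjustment(latency_ms, target_ms, servers))


# Spike: workload triples, so the model resizes in one step.
spike_size = next_size(4, 3000, 1000, 250, 100, 250)

# Slow drift: latency is 20% over target, so feedback adds capacity gradually.
drift_size = next_size(10, 1000, 1000, 120, 100, 250)
```

The split reflects the trade-off in the text: feedforward reacts immediately but is only as good as its model, while feedback is slower but corrects whatever the model gets wrong.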


PonIC
PonIC is an initial implementation of an integration of Pig and Stratosphere. The current prototype supports a subset of the most common Pig operations and can easily be extended to support the complete set of Pig Latin statements. Stratosphere has desirable properties that significantly simplify plan generation. We argue that Pig can benefit greatly from using Stratosphere as its back-end system and gain performance without any loss of expressiveness. Read more here.