
An Automated Implementation of Hybrid Cloud for Performance Evaluation of Distributed Databases (2006.02833v1)

Published 4 Jun 2020 in cs.DC

Abstract: A hybrid cloud is an integration of resources between private and public clouds. It enables users to horizontally scale their on-premises infrastructure up to public clouds in order to improve performance and cut up-front investment cost. This model of application deployment, called cloud bursting, allows data-intensive applications, especially distributed database systems, to benefit from both private and public clouds. In this work, we present an automated implementation of a hybrid cloud using (i) a robust and zero-cost Linux-based VPN to make a secure connection between private and public clouds, and (ii) Terraform as a software tool to deploy infrastructure resources based on the requirements of hybrid cloud. We also explore performance evaluation of cloud bursting for six modern and distributed database systems on the hybrid cloud spanning over local OpenStack and Microsoft Azure. Our results reveal that MongoDB and MySQL Cluster work efficiently in terms of throughput and operation latency if they burst into a public cloud to supply their resources. In contrast, the performance of Cassandra, Riak, Redis, and CouchDB degrades if they obtain a significant share of their required resources via cloud bursting.

Authors (3)
  1. Yaser Mansouri (4 papers)
  2. Victor Prokhorenko (4 papers)
  3. M. Ali Babar (71 papers)
Citations (31)

Summary

  • The paper presents an automated hybrid cloud implementation using WireGuard VPN and Terraform to enable on-demand cloud bursting for performance evaluation.
  • The study shows that MongoDB and MySQL Cluster maintain stable read performance, while Cassandra, Riak, CouchDB, and Redis experience significant degradation.
  • The evaluation highlights that cost-effective automation approaches can achieve robust, reproducible deployments despite critical WAN latency bottlenecks.

This paper, "An Automated Implementation of Hybrid Cloud for Performance Evaluation of Distributed Databases" (Mansouri et al., 2020), presents a practical approach to building and evaluating a hybrid cloud environment, specifically focusing on the performance of distributed databases under cloud bursting scenarios.

The paper addresses two linked problems: how to connect private and public cloud resources securely, robustly, and cost-effectively so that cloud bursting is possible, and how distributed databases perform when scaled across the resulting hybrid infrastructure, particularly under Wide Area Network (WAN) latency.

The authors propose an automated solution for implementing a hybrid cloud using two key technologies:

  1. WireGuard VPN: A Linux kernel-based VPN is used to establish a secure, encrypted connection between the private cloud (OpenStack) and the public cloud (Microsoft Azure). WireGuard is chosen over traditional VPN options like Azure VPN Gateway due to its reported simplicity, robustness, zero cost, and better performance characteristics (lower ping time and higher throughput compared to IPsec).
  2. Terraform: This Infrastructure as Code (IaC) tool is leveraged to automate the provisioning and management of infrastructure resources (VMs, networks, security groups) across both OpenStack and Azure, ensuring consistent and repeatable deployments.

The paper outlines three hybrid cloud usage models: On-demand (cloud bursting from private to public), Fragmented (connecting multiple private clouds), and Collaborative (connecting various private and public clouds under one organization or several collaborating organizations). The implemented solution focuses on the On-demand model, deemed one of the most conventional. The implementation proceeds in phases: creating consumer (private) and donor (public) broker VMs, configuring WireGuard connectivity between them using public/private keys, expanding shared networks, peering networks, and configuring data routing.
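The key-based link between the two broker VMs can be illustrated with a minimal Python sketch that renders a WireGuard configuration file for one side of the tunnel. This is not the authors' script: the `Broker` class, the `render_wg_conf` helper, the addresses, the port, and the key placeholders are all hypothetical. Only the configuration keys themselves (`PrivateKey`, `Address`, `ListenPort`, `PublicKey`, `AllowedIPs`, `Endpoint`, `PersistentKeepalive`) are standard WireGuard/`wg-quick` settings.

```python
from dataclasses import dataclass


@dataclass
class Broker:
    """One broker VM; every value used below is an illustrative placeholder."""
    name: str
    private_key: str    # would come from `wg genkey`
    public_key: str     # would come from `wg pubkey`
    tunnel_ip: str      # address of this broker inside the VPN
    subnet: str         # cloud-side subnet reachable through this broker
    endpoint: str = ""  # public host:port; empty if the broker sits behind NAT


def render_wg_conf(local: Broker, remote: Broker, listen_port: int = 51820) -> str:
    """Render a wg-quick style config for `local` with `remote` as its only peer."""
    lines = [
        "[Interface]",
        f"PrivateKey = {local.private_key}",
        f"Address = {local.tunnel_ip}/24",
        f"ListenPort = {listen_port}",
        "",
        "[Peer]",
        f"PublicKey = {remote.public_key}",
        # Route the peer's tunnel address and its cloud subnet through the tunnel.
        f"AllowedIPs = {remote.tunnel_ip}/32, {remote.subnet}",
    ]
    if remote.endpoint:
        lines.append(f"Endpoint = {remote.endpoint}")
        # Keepalives help the side behind NAT keep its mapping open.
        lines.append("PersistentKeepalive = 25")
    return "\n".join(lines) + "\n"


# Hypothetical consumer (private, OpenStack) and donor (public, Azure) brokers.
consumer = Broker("consumer", "<consumer-private-key>", "<consumer-public-key>",
                  "10.200.0.1", "192.168.10.0/24")
donor = Broker("donor", "<donor-private-key>", "<donor-public-key>",
               "10.200.0.2", "10.1.0.0/16", endpoint="203.0.113.10:51820")

consumer_conf = render_wg_conf(consumer, donor)
donor_conf = render_wg_conf(donor, consumer)
print(consumer_conf)
```

In this sketch only the donor has a public endpoint, so the consumer initiates the connection. Each side lists the other's public key and the subnets it should reach through the tunnel, which loosely corresponds to the key exchange and data routing phases described above.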

To evaluate the performance impact of cloud bursting, the authors deploy six popular distributed databases (MongoDB, Cassandra, Riak, CouchDB, Redis, and MySQL Cluster) across the implemented hybrid cloud. The experimental setup consists of a total of 8 VMs, distributed between the local OpenStack cloud and Microsoft Azure in varying configurations (e.g., 8 nodes in private, 0 in public; 4 in private, 4 in public; 1 in private, 7 in public). The OpenStack cluster is located in Australia, and the Azure cluster is in the East US region, ensuring a significant WAN distance. The databases are benchmarked using the YCSB tool with various workloads (read-intensive, write-intensive, read-only, read-latest, scan, read-modify-write).
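The shape of this experiment can be sketched as a small run matrix in Python. The placements listed are only the three examples named above, not necessarily the full set the authors tested, and the `build_runs` helper and its dictionary keys are hypothetical names introduced for illustration.

```python
from itertools import product

TOTAL_VMS = 8

# (private nodes, public nodes): only the example splits mentioned in the text.
PLACEMENTS = [(8, 0), (4, 4), (1, 7)]

DATABASES = ["MongoDB", "Cassandra", "Riak", "CouchDB", "Redis", "MySQL Cluster"]

# Workload labels as described in the text, not YCSB file names.
WORKLOADS = ["read-intensive", "write-intensive", "read-only",
             "read-latest", "scan", "read-modify-write"]


def build_runs():
    """Return one record per (placement, database, workload) combination."""
    runs = []
    for (private, public), db, workload in product(PLACEMENTS, DATABASES, WORKLOADS):
        # Every placement must account for all VMs in the cluster.
        assert private + public == TOTAL_VMS
        runs.append({
            "private_nodes": private,
            "public_nodes": public,
            "database": db,
            "workload": workload,
        })
    return runs


runs = build_runs()
print(len(runs), "benchmark runs in this illustrative matrix")
```

Treating each run as a plain record makes it straightforward to hand the placement to a provisioning step and the database and workload to a benchmarking step, which is the kind of repeatability the automated setup aims for.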

Key findings from the performance evaluation include:

  • Network Performance: The latency between the private and public clouds is high and unpredictable (220-230 ms), and bandwidth is limited (around 1 MB/s upload, 11 MB/s download). This confirms that the network is a significant bottleneck in a hybrid cloud spanning distant data centers.
  • Database Performance:
    • MongoDB and MySQL Cluster: These databases generally performed better or showed less degradation under cloud bursting compared to others. They achieved the lowest read and write latencies among the tested databases. For read-related workloads, their throughput either remained stable or slightly improved in hybrid configurations, although write-intensive workloads could see degradation when a majority of nodes were in the public cloud.
    • Cassandra, Riak, CouchDB, and Redis: These databases exhibited substantial performance degradation (reduced throughput, increased latency) as the number of nodes in the public cloud increased. Databases using quorum-based consistency (Cassandra, CouchDB) were particularly affected by high latency. Redis, an in-memory database, also saw significant performance drops because it is not optimized for WAN deployment.
  • Error Rates: Riak and MySQL Cluster had 0% operation errors in hybrid configurations. Cassandra had the highest error percentage (20%-26%), followed by MongoDB (around 11%).
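One way to see why quorum-based databases are sensitive to the measured WAN latency is a simplified model in which an operation completes once the quorum-th fastest replica has replied. The Python sketch below is an idealized illustration, not the paper's analysis: it assumes a coordinator in the private cloud, a replication factor of 3, a hypothetical 1 ms local round trip, and a 225 ms WAN round trip (the midpoint of the reported 220-230 ms range). It ignores bandwidth limits, retries, and server-side processing time.

```python
LOCAL_RTT_MS = 1.0   # hypothetical round trip inside the private cloud
WAN_RTT_MS = 225.0   # midpoint of the 220-230 ms range reported above


def quorum_size(replication_factor: int) -> int:
    """Majority quorum: more than half of the replicas."""
    return replication_factor // 2 + 1


def quorum_latency_ms(replica_rtts_ms, quorum: int) -> float:
    """Latency of the quorum-th fastest reply (the coordinator waits for it)."""
    if not 1 <= quorum <= len(replica_rtts_ms):
        raise ValueError("quorum must be between 1 and the number of replicas")
    return sorted(replica_rtts_ms)[quorum - 1]


def placement_latency_ms(local_replicas: int, remote_replicas: int) -> float:
    """Quorum latency for a given split of replicas across the two clouds."""
    rtts = [LOCAL_RTT_MS] * local_replicas + [WAN_RTT_MS] * remote_replicas
    return quorum_latency_ms(rtts, quorum_size(len(rtts)))


# Replication factor 3, moving replicas one by one to the public cloud.
for local, remote in [(3, 0), (2, 1), (1, 2), (0, 3)]:
    print(f"{local} local / {remote} remote -> "
          f"{placement_latency_ms(local, remote):.0f} ms")
```

Under these assumptions the latency stays near the local round trip while a majority of replicas remain in the private cloud, and jumps to the WAN round trip as soon as the quorum must include a remote replica. This is consistent in spirit with the degradation reported as more nodes move to the public cloud, though real systems add further effects the model leaves out.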

Practical observations highlight the simplicity, reliability, and cost-effectiveness of WireGuard compared to Azure VPN for hybrid cloud connectivity. Terraform proved effective for automating infrastructure deployment and ensuring reproducibility. Database-specific complexities in installation and clustering required flexible scripting.

The authors conclude that while cloud bursting may not always improve database performance, it can be beneficial for capacity expansion under certain conditions. The high latency between distant data centers is a critical factor. The paper suggests that databases like MongoDB and MySQL Cluster are better suited for hybrid cloud bursting over WAN compared to Cassandra, Riak, CouchDB, and Redis, unless database configurations or architectures are adapted for such environments.

Future work proposed includes investigating dynamic scaling, the impact of distance and replication factor, cross-region public cloud deployments, and optimal data placement strategies for hybrid clouds.