Making real-time business decisions based on data from social, mobile and cloud platforms is a requirement in today's world. That's why an ecosystem of next-generation technologies for operational reporting, including Apache Spark, Apache Cassandra, Apache Kafka, Docker, Mesos and Marathon, has come to life. Enterprises are using these technologies to re-architect their data storage and processing frameworks to achieve lower costs, scalability and faster response times. However, as enterprises onboard business-critical applications on this next-generation technology stack, they need to prepare for failures or errors that may disrupt customer-facing applications.
One of the key benefits that operational reporting platforms provide is real-time visibility, which allows businesses to make strategic decisions based on actionable insights. However, these new applications generate large volumes of data, and legacy operational reporting platforms (those based on traditional data warehouses, data marts and ETL logic) are unable to keep up. As a result:
- Business decisions are based on stale data
- Fragmented data increases the risk of errors
- Multiple copies of data increase storage costs
These issues have driven many progressive, data-centric enterprises to re-architect their operational reporting platforms. In this new architecture, Kafka serves as a persistent message bus that ingests data from relational stores and provides failure resiliency and message buffering. Spark or Spark Streaming may be used for transformation, aggregation and other lightweight stream processing, and these processing nodes may be deployed in Docker containers. Finally, Apache Cassandra serves as a distributed data storage layer that provides linear scalability and high availability. Kafka also makes it possible to add further data consumers, such as Hadoop/HBase through Flafka.
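The flow above can be sketched in a few lines. This is a minimal, illustrative sketch only: a list stands in for a Kafka topic, a dict stands in for a Cassandra table, and a simple aggregation function stands in for a Spark Streaming job. All names and the per-customer-total workload are hypothetical, not taken from the article.

```python
from collections import defaultdict

def ingest(topic):
    """Consume raw events from the message bus (stand-in for Kafka)."""
    for event in topic:
        yield event

def aggregate(events):
    """Lightweight stream aggregation (stand-in for a Spark job):
    sum order amounts per customer."""
    totals = defaultdict(float)
    for event in events:
        totals[event["customer"]] += event["amount"]
    return dict(totals)

def store(table, aggregates):
    """Upsert aggregates into the storage layer (stand-in for Cassandra)."""
    table.update(aggregates)
    return table

# Hypothetical events flowing through the pipeline.
topic = [
    {"customer": "acme", "amount": 120.0},
    {"customer": "acme", "amount": 30.0},
    {"customer": "globex", "amount": 75.0},
]
table = {}
store(table, aggregate(ingest(topic)))
print(table)  # {'acme': 150.0, 'globex': 75.0}
```

In a real deployment each stage would be a separate, independently scalable service (Kafka consumers, Spark executors, Cassandra nodes), which is what gives the architecture its buffering and resiliency properties.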
However, as enterprises adopt operational reporting platforms, they want to ensure that bad data feeds and human error do not result in data loss. Apache Cassandra natively replicates data, but replication alone does not provide a way to go back in time and recover data efficiently and at scale.
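The distinction between replication and recoverability can be made concrete. In this hedged sketch, dicts stand in for replicas and a deep copy stands in for a point-in-time snapshot; real systems use commit logs and on-disk snapshots (e.g. Cassandra's SSTable snapshots) rather than in-memory copies, and all names here are hypothetical.

```python
import copy

# Three replicas of the same record, as a replicated store would hold them.
replicas = [{"price": 100}, {"price": 100}, {"price": 100}]
snapshots = []

def snapshot(replica):
    """Capture a point-in-time copy before new writes arrive."""
    snapshots.append(copy.deepcopy(replica))

def write(replicas, key, value):
    """Replicated write: the mutation reaches every replica."""
    for r in replicas:
        r[key] = value

snapshot(replicas[0])
write(replicas, "price", -1)   # a bad data feed corrupts the value everywhere
# Replication has faithfully propagated the error to all copies...
assert all(r["price"] == -1 for r in replicas)
# ...so recovery is only possible from the point-in-time snapshot.
restored = snapshots[-1]
assert restored["price"] == 100
```

This is why replication protects against node failure but not against bad writes: every replica applies the same mutation, so only a separate, time-indexed copy allows rollback.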
As enterprises onboard their critical applications to achieve development agility, scalability and lower operating costs, they must also explore data protection solutions that are purpose-built for scale-out databases.