Let's take a deep-dive into the Kloudfuse platform, the unified observability data lake.
By Krishna Yadappanavar
Published on 10/28/2023
Modern application development and deployment need agile frameworks. Kubernetes has become the de facto standard for container orchestration. Today, half of the organizations running containers use Kubernetes, whether in self-managed clusters or through a managed service such as GKE, AKS, or EKS from the three major cloud vendors. Kubernetes adoption has more than doubled since 2017 and continues to grow steadily.
Usage of serverless container technologies from all major cloud providers, including AWS App Runner and Fargate, Azure Container Apps and Container Instances (ACI), and Google Cloud Run (GCR), increased from 21 percent in 2020 to 36 percent in 2022.
Users rely on dashboards for an overview and on alerts as monitoring signals, but observability and troubleshooting need finer-grained data for debugging, root cause analysis, and forecasting in order to reduce mean time to resolution. Monitoring, observing, and troubleshooting deployments based on Kubernetes and serverless requires internal real-time analytics tools that work with more data and offer an intuitive understanding of the relationships between all the entities involved. Automation of troubleshooting steps and root cause analysis needs to be designed from the ground up. Faster CI/CD pipelines and containerization have caused an explosion in the number of daily deployments, compounding the high-dimensionality and high-cardinality problems of observability and troubleshooting.
Modern application deployments on Kubernetes and serverless technologies have introduced a data sprawl problem, which can be quantified by the high dimensionality (too many attributes in the dataset) and high cardinality (too many distinct values per attribute) of the data needed for observability and troubleshooting. Newer observability streams, such as distributed tracing for application performance monitoring (APM), continuous profiling, front-end Real User Monitoring (RUM), and code analysis, have increased both dimensionality and cardinality further. At the same time, these streams have become a must for end-to-end troubleshooting.
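To make the two terms concrete, here is a minimal Python sketch that measures dimensionality and cardinality for a handful of labeled samples. The label names and values are hypothetical and only serve as an illustration; this is not how any particular platform computes these numbers.

```python
from collections import defaultdict


def describe_labels(samples):
    """Return (dimension count, per-label cardinality, distinct series count)."""
    values_per_label = defaultdict(set)
    series = set()
    for labels in samples:
        for key, value in labels.items():
            values_per_label[key].add(value)
        # A series is identified by its full, order-independent label set.
        series.add(frozenset(labels.items()))
    cardinality = {key: len(vals) for key, vals in values_per_label.items()}
    return len(values_per_label), cardinality, len(series)


# Hypothetical samples: every new pod name adds a new distinct value.
samples = [
    {"service": "checkout", "pod": "checkout-7d9f", "region": "us-east-1"},
    {"service": "checkout", "pod": "checkout-a41c", "region": "us-east-1"},
    {"service": "cart", "pod": "cart-55b2", "region": "eu-west-1"},
    {"service": "checkout", "pod": "checkout-7d9f", "region": "us-east-1"},
]

dimensions, cardinality, series_count = describe_labels(samples)
print(dimensions, cardinality, series_count)
```

Short-lived labels such as pod names are what drive cardinality up: each redeployment creates new values, and with them new series to index and store.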
When we spoke with more than 100 customers about their current pain points with internal-facing observability data, the following points came up.
With commercial SaaS vendors such as Datadog, Splunk, Dynatrace, and New Relic, customers have the following concerns:
Architecturally broken, because customers have to send their data to the vendor’s cloud.
Vendor lock-in and inflexibility, because these vendors do not support open source query languages.
No unlimited data access, because the vendors rate-limit access through their APIs, which gets in the way of offline analysis.
Privacy and compliance issues, because the vendor dictates the compliance levels.
Expensive, because customers cannot apply their negotiated cloud pricing or credits and also have to pay egress costs.
In the open source world, there are many self-managed solutions such as Prometheus and Elasticsearch, but they have the following issues:
No unified observability solution exists in open source.
Operationally heavy, because they are hard to manage.
Hard to scale, because of architectural limitations with respect to clustering, high availability, and infrastructure costs.
No enterprise-grade UI/UX for traversing the data.
In the world of real-time data analytics, we can map how different companies use real-time and batch analytics on both internal and external data, as shown here. In the quadrant for external-facing real-time analytics, distributed, highly scalable OLAP databases such as Apache Druid, Pinot, and ClickHouse are gaining the most traction.
Against this background of customer problems and the current landscape of OLAP databases, the Kloudfuse platform is the industry’s first full-stack observability platform:
Built on top of a distributed OLAP database with custom-built indexes for metadata-intensive observability data, including metrics and traces, and for high-volume logs.
Industry’s first log deduplication technology, built on top of fingerprint indexing, to achieve storage efficiency and automatic facet extraction.
Unified datastore for all streams of observability data, including metrics, events, logs, traces, profiles, Real User Monitoring, CI/CD, GitOps, and infrastructure such as Kubernetes and serverless.
Unified data observability for data streams, data warehouses, and data lakes.
Full compatibility with open source query languages and standards (PromQL, LogQL, OTel), with multi-level aggregation pushed down to the database.
Decoupled data plane and control plane, so that customer observability data is stored within the customer’s VPC while operations are fully managed through the control plane.
Support for any agent and any instrumentation, so that the customer’s existing investment is preserved.
A promise to migrate from commercial vendors such as Datadog, Wavefront, and New Relic to OSS-compatible dashboards and alerts within 7 days.
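To give a feel for the idea behind fingerprint-based log deduplication mentioned in the list above, here is a minimal Python sketch. It is an illustration under simple assumptions, not the Kloudfuse implementation: any token containing a digit is assumed to be a variable part of the message, and the names `fingerprint` and `deduplicate` are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

# Assumed rule: a token containing a digit is a variable part of the message.
VARIABLE = re.compile(r"\S*\d\S*")


def fingerprint(line):
    """Return (fingerprint, template, variable values) for one log line."""
    values = VARIABLE.findall(line)
    template = VARIABLE.sub("<*>", line)
    digest = hashlib.sha1(template.encode("utf-8")).hexdigest()[:12]
    return digest, template, values


def deduplicate(lines):
    """Store each template once and keep only the variable values per line."""
    templates = {}
    occurrences = defaultdict(list)
    for line in lines:
        digest, template, values = fingerprint(line)
        templates[digest] = template
        occurrences[digest].append(values)
    return templates, occurrences


logs = [
    "user 1042 logged in from 10.0.0.7",
    "user 77 logged in from 10.0.0.9",
    "disk usage at 91% on node-3",
    "user 5 logged in from 192.168.1.20",
]

templates, occurrences = deduplicate(logs)
for digest, template in templates.items():
    print(digest, template, len(occurrences[digest]))
```

The repeated text of a message is stored once per fingerprint, while the extracted values remain available per occurrence and can act as facets for filtering. A production system would need a far more careful notion of what counts as a variable token.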
At Kloudfuse, our vision is to superpower developers, DevOps teams, and SREs before and after deployments.
Our solution lets you deploy the data plane, a Kubernetes cluster running our platform, which ingests and stores customer observability data within the customer’s VPC, while the control plane manages the complete life cycle of the data plane using only the data plane’s own observability data. We support ingestion of all streams of observability data across all clouds, customer applications, platform components such as Kafka and Cassandra, and CI/CD pipelines, with 600+ integrations using any open source agent, such as the Datadog agent, Elastic, Filebeat, Fluent Bit, the OTel agent, or Prometheus remote write. The platform ingests data with minimal compute, storage, and memory, stores it with the highest storage efficiency, and serves queries with minimal latency and resources. Our conversion tool transforms the queries and layouts of dashboards and alerts from commercial vendors such as Datadog, Wavefront, and New Relic into open source compatible queries and Grafana-compatible dashboards and alerts. As part of the data platform, we offer a rich set of advanced analytics functions, such as outlier detection, anomaly detection, and correlation. The platform generates intelligent alerts and analyzes them using correlated data from different observability streams.
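As a rough illustration of what query conversion involves, the Python sketch below rewrites one simplified Datadog-style metric query shape into PromQL. It is a toy example, not the Kloudfuse conversion tool: the function name `to_promql` is hypothetical, only the shape `<agg>:<metric>{tag:value,...} by {tag,...}` is handled, and mapping dots in metric names to underscores is an assumption.

```python
import re

# Assumed input shape: "<agg>:<metric>{tag:value,...} by {tag,...}".
QUERY = re.compile(
    r"^(?P<agg>avg|sum|min|max):(?P<metric>[\w.]+)"
    r"\{(?P<filters>[^}]*)\}"
    r"(?:\s+by\s+\{(?P<by>[^}]*)\})?$"
)


def to_promql(query):
    """Translate one simplified Datadog-style query into a PromQL expression."""
    match = QUERY.match(query.strip())
    if match is None:
        raise ValueError(f"unsupported query shape: {query!r}")
    agg = match["agg"]
    # Assumption: dotted metric names map to underscore-separated names.
    metric = match["metric"].replace(".", "_")
    filters = match["filters"].strip()
    matchers = []
    if filters and filters != "*":
        for pair in filters.split(","):
            key, _, value = pair.strip().partition(":")
            matchers.append(f'{key}="{value}"')
    selector = metric
    if matchers:
        selector += "{" + ",".join(matchers) + "}"
    group_by = match["by"]
    if group_by:
        labels = ", ".join(label.strip() for label in group_by.split(","))
        return f"{agg} by ({labels}) ({selector})"
    return f"{agg}({selector})"


print(to_promql("avg:system.cpu.user{env:prod} by {host}"))
print(to_promql("sum:http.requests{*}"))
```

A real converter also has to handle functions, arithmetic between queries, rollups, and dashboard layout, none of which this sketch attempts.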
In upcoming posts, we will go into more detail on the Kloudfuse Unified Observability Data Platform and the innovations in each of the streams, the analytics, and the conversion tool.