The Making of Kloudfuse 3.5: Custom Metrics SLOs - Monitoring What Matters to Your Business

Define service level objectives using any metric with PromQL, tracking business outcomes

Service Level Objectives traditionally focus on technical performance: 99% of requests complete within 200ms, 99.9% of requests succeed without errors. These SLOs matter for understanding system health, but they don't always reflect what users or businesses actually care about.

A payment API might meet its latency and availability SLOs while checkout completion rates drop. A data pipeline might process requests successfully while falling behind on data freshness commitments. A recommendation service might respond quickly while recommendation quality degrades.

Technical SLOs measure infrastructure health. Business SLOs measure outcomes. Kloudfuse 3.5 bridges this gap with Custom Metrics SLOs.

The Limitations of Traditional SLOs

Kloudfuse introduced Service Level Objectives in version 2.6.7, focusing on APM metrics derived from distributed tracing. Teams could set SLOs like "99% of requests to checkout-service complete within 500ms" with alerting when error budgets approached depletion.

These traditional SLOs work well for technical performance targets. Latency percentiles. Error rates. Request success ratios. All derived from distributed traces captured through APM instrumentation.

The limitation: not everything that matters shows up in APM traces. Businesses care about conversion rates tracked through custom application metrics. Data teams commit to processing lag measured in minutes or hours, not request latency. Platform teams monitor queue depths, cache hit ratios, and resource utilization, all of which shape service quality but none of which follow a request-response pattern.

Traditional SLO systems force you to choose: monitor technical performance with proper SLO tracking, or monitor business metrics without formal objectives and error budget management.

Custom Metrics SLOs Using PromQL

Kloudfuse 3.5 removes this limitation. With Custom Metrics SLOs, you can define service level objectives on any metric you can query with PromQL, not just pre-defined APM metrics.

Define SLO calculations using custom numerator and denominator queries. The numerator represents good events—successful outcomes you're measuring. The denominator represents total events—all attempts or opportunities. The ratio becomes your SLI (Service Level Indicator), which you measure against your SLO target.
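
To make the numerator and denominator idea concrete, here is a minimal Python sketch of a ratio SLI. The metric names (checkout_completed_total, checkout_initiated_total), the 30-day window, and the RatioSLI class are illustrative assumptions, not part of the Kloudfuse product. The sketch only shows the arithmetic that would be applied to whatever the two queries return.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RatioSLI:
    """A ratio SLI described by two PromQL queries (hypothetical names)."""
    numerator_query: str    # counts good events
    denominator_query: str  # counts all events


def compute_sli(good: float, total: float) -> Optional[float]:
    """Return good/total, or None when there were no events to judge."""
    if total <= 0:
        return None
    return good / total


def meets_target(sli: Optional[float], target: float) -> bool:
    """Treat an empty window (sli is None) as not violating the SLO."""
    return sli is None or sli >= target


# Assumed metric names; substitute whatever your application exports.
checkout_sli = RatioSLI(
    numerator_query="sum(increase(checkout_completed_total[30d]))",
    denominator_query="sum(increase(checkout_initiated_total[30d]))",
)

# Example values, as they might come back from evaluating the two queries.
sli = compute_sli(good=9_620, total=10_000)
print(sli, meets_target(sli, target=0.95))
```

Returning None for an empty window is a design choice in this sketch: with no events there is nothing to violate, so the check passes rather than dividing by zero.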

This flexibility enables SLOs for any quantifiable business or technical objective:

Business-driven SLOs track outcomes users care about. E-commerce platforms define SLOs for checkout completion rates: numerator counts completed purchases, denominator counts initiated checkouts. The target might be a 95% completion rate. When completion drops below that target, alerts fire before revenue impact becomes severe.

Data pipeline SLOs track processing freshness. Data platforms commit to processing events within 5 minutes of generation. Numerator counts events processed within SLA, denominator counts total events. The SLO tracks whether data remains fresh enough for downstream consumers.

Infrastructure SLOs track resource efficiency. Platform teams monitor cache hit ratios as SLOs: numerator counts cache hits, denominator counts total requests. Degrading cache performance appears as SLO violations, triggering investigation before user-facing latency increases.

Custom application SLOs track domain-specific quality. Recommendation systems measure recommendation acceptance rates. Search platforms track query success rates. Fraud detection systems monitor false positive ratios. Each domain has metrics defining quality that don't fit traditional latency/availability patterns.
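
As an illustration of the data pipeline case above, the sketch below builds a numerator and denominator pair for "events processed within 5 minutes", assuming the pipeline exports processing lag as a Prometheus histogram. The histogram name pipeline_event_lag_seconds and the helper functions are hypothetical. One detail worth noting: the le value must match a bucket boundary the histogram actually exports, including its exact string form.

```python
from typing import Dict, Tuple


def freshness_queries(histogram: str, threshold: str, window: str = "5m") -> Tuple[str, str]:
    """Build (numerator, denominator) PromQL for 'processed within threshold'.

    threshold is a string because it must equal the exported `le` label
    exactly (for example "300" versus "300.0", depending on the client).
    """
    good = f'sum(rate({histogram}_bucket{{le="{threshold}"}}[{window}]))'
    total = f"sum(rate({histogram}_count[{window}]))"
    return good, total


def fraction_within(cumulative_buckets: Dict[str, float], threshold: str) -> float:
    """Compute the same ratio locally from cumulative bucket counts.

    Prometheus histogram buckets are cumulative, so the bucket at `threshold`
    already includes every faster event, and "+Inf" holds the total.
    """
    total = cumulative_buckets["+Inf"]
    if total == 0:
        return 1.0  # no events, so nothing was late
    return cumulative_buckets[threshold] / total


# Hypothetical histogram name with a bucket boundary at 300 seconds.
good_q, total_q = freshness_queries("pipeline_event_lag_seconds", "300")
print(good_q)
print(total_q)
print(fraction_within({"60": 700, "300": 980, "+Inf": 1000}, "300"))
```

If the histogram has no bucket at the committed threshold, the numerator query returns nothing, so in this approach the bucket layout has to be chosen with the SLO in mind.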

Integration with Alerting and Error Budgets

Custom Metrics SLOs maintain full integration with Kloudfuse's alerting system and error budget tracking. This isn't a separate monitoring system—it's the same SLO framework extended to custom metrics.

Define SLO targets and error budgets. When your checkout completion rate drops below 95%, alerts fire to configured contact points. Error budget tracking shows how much budget remains: a 95% completion target leaves a 5% error budget, so if completion is currently running at 94.5%, failures are arriving at 5.5%, about 1.1 times the rate the budget allows.
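
The error budget arithmetic for the 95% target and 94.5% observed rate can be sketched in a few lines of Python. The function name and the event counts are assumptions chosen for illustration. The calculation is the generic error budget definition, not a description of how Kloudfuse computes it internally.

```python
def budget_consumed(good: float, total: float, target: float) -> float:
    """Fraction of the error budget used, relative to the events seen so far.

    1.0 means the allowed number of bad events has been reached; a value
    above 1.0 means the SLO would be missed if the window ended now.
    """
    allowed_bad = (1.0 - target) * total
    if allowed_bad <= 0:
        raise ValueError("target must be below 1.0 and total must be positive")
    return (total - good) / allowed_bad


# 10,000 initiated checkouts, 9,450 completed (94.5%), target 95%.
consumed = budget_consumed(good=9_450, total=10_000, target=0.95)
print(f"budget consumed: {consumed:.0%}")
print(f"budget remaining: {1.0 - consumed:.0%}")
```

With these numbers the target allows 500 failed checkouts and 550 occurred, so roughly 110% of the budget is consumed and the remaining budget is negative.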

Error budget burn rate alerts provide early warning. Rather than waiting until SLO violations occur, configure alerts that fire when the burn rate accelerates. If checkout completion normally consumes error budget slowly but then drops sharply, burn rate alerts fire right away, so teams can respond before the monthly SLO target is missed.
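
Burn rate is commonly defined as the observed error rate divided by the error rate the target allows, so a burn rate of 1.0 spends the budget exactly over the SLO window. The sketch below uses that definition and a two-window check (a short and a long window must both exceed the threshold) to avoid alerting on brief blips. The function names, window sizes, and the threshold of 4.0 are illustrative assumptions, not Kloudfuse defaults.

```python
def burn_rate(good: float, total: float, target: float) -> float:
    """Observed error rate divided by the error rate the target allows."""
    if total <= 0:
        return 0.0  # no traffic, no budget burned
    return (1.0 - good / total) / (1.0 - target)


def days_until_exhausted(rate: float, window_days: float = 30.0) -> float:
    """Days for a full budget to run out if this burn rate holds."""
    return float("inf") if rate <= 0 else window_days / rate


def should_alert(short_rate: float, long_rate: float, threshold: float) -> bool:
    """Fire only when both windows burn faster than the threshold."""
    return short_rate >= threshold and long_rate >= threshold


target = 0.95
normal = burn_rate(good=9_700, total=10_000, target=target)   # steady state
last_5m = burn_rate(good=120, total=200, target=target)       # sharp drop
last_1h = burn_rate(good=1_500, total=2_000, target=target)

print(round(normal, 2), round(last_5m, 2), round(last_1h, 2))
print(should_alert(last_5m, last_1h, threshold=4.0))
print(round(days_until_exhausted(last_5m), 2))
```

In the example, the steady-state burn rate is below 1.0, while the sharp drop burns budget several times faster than allowed in both windows, which is the situation the alert is meant to catch.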

This integration means business-driven objectives receive the same operational rigor as technical performance targets. Checkout completion rates aren't just dashboards someone checks occasionally. They're formal service level objectives with defined targets, error budgets, and automated alerting.

PromQL Flexibility

Using PromQL queries for SLO calculation means an SLO definition can use everything the query language offers. Label matchers scope an SLO to a subset of traffic, and aggregation operators group or combine series into the numerator and denominator.

Calculate SLOs across specific dimensions. An e-commerce platform might track checkout completion rates separately by region, by payment method, or by customer segment. Each becomes an independent SLO with its own target and error budget. When completion rates drop for a specific payment provider, that SLO fires alerts while overall completion remains healthy.
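
A per-dimension SLO can be pictured as the same ratio evaluated once per label value. The sketch below assumes the numerator and denominator queries group by a hypothetical payment_method label (shown in the comments) and that their results have been collected into dictionaries. All names and counts are invented for illustration.

```python
from typing import Dict, List

# Assumed PromQL behind the two dictionaries below (hypothetical metrics):
#   sum by (payment_method) (increase(checkout_completed_total[1h]))
#   sum by (payment_method) (increase(checkout_initiated_total[1h]))


def breaching_dimensions(good: Dict[str, float],
                         total: Dict[str, float],
                         target: float) -> List[str]:
    """Return the label values whose own ratio falls below the target."""
    breaches = []
    for label, total_count in total.items():
        if total_count <= 0:
            continue  # no traffic for this label value, nothing to judge
        if good.get(label, 0.0) / total_count < target:
            breaches.append(label)
    return sorted(breaches)


good = {"card": 970, "wallet": 176, "bank_transfer": 49}
total = {"card": 1000, "wallet": 200, "bank_transfer": 50}

print(breaching_dimensions(good, total, target=0.95))
print(sum(good.values()) / sum(total.values()))
```

Here only the wallet ratio falls below 95% while the overall completion rate stays above it, which is why a single aggregate SLO would have hidden the problem.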

Combine multiple metrics into composite SLOs. A data pipeline SLO might require both processing freshness (events processed within 5 minutes) AND processing accuracy (fewer than 0.1% errors). Both conditions must be met for events to count as "good" in the SLO calculation.
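
One way to read "both conditions must be met" is per event: an event is good only if it was processed on time and without error. The Python sketch below shows that counting rule on a handful of made-up events. In a metrics-based setup this would likely require the pipeline to export a counter carrying both outcomes as labels, for example a hypothetical pipeline_events_total{fresh="true",status="ok"} for the numerator and pipeline_events_total for the denominator. That metric layout is an assumption, not something the product prescribes.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ProcessedEvent:
    lag_seconds: float  # time from generation to processing
    succeeded: bool     # processed without error


def is_good(event: ProcessedEvent, max_lag_seconds: float = 300.0) -> bool:
    """An event is good only if it is both on time AND error free."""
    return event.succeeded and event.lag_seconds <= max_lag_seconds


def composite_sli(events: Iterable[ProcessedEvent],
                  max_lag_seconds: float = 300.0) -> float:
    """Share of events that satisfy both conditions."""
    events = list(events)
    if not events:
        return 1.0  # no events, so nothing failed
    good = sum(1 for e in events if is_good(e, max_lag_seconds))
    return good / len(events)


events = [
    ProcessedEvent(lag_seconds=12.0, succeeded=True),    # good
    ProcessedEvent(lag_seconds=450.0, succeeded=True),   # too late
    ProcessedEvent(lag_seconds=30.0, succeeded=False),   # errored
    ProcessedEvent(lag_seconds=299.0, succeeded=True),   # good
]
print(composite_sli(events))
```

Two of the four events meet both conditions, so the composite SLI is 0.5 even though three events were on time and three succeeded.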

Leverage existing Prometheus instrumentation. If you're already collecting custom application metrics through Prometheus exporters, those metrics become available for SLO definitions immediately. No new instrumentation required—just define the PromQL queries expressing your objectives.

Operational Use Cases

Custom Metrics SLOs enable operational patterns that weren't practical with traditional APM-only SLOs.

Product teams define SLOs for user-facing features. A video streaming platform tracks buffering rates as SLOs: percentage of playback sessions with less than 2% time spent buffering. When buffering increases, product teams receive alerts tied to actual user experience degradation, not just infrastructure metrics.

Data teams manage pipeline SLAs through SLOs. A real-time analytics platform commits to 5-minute data freshness. Custom Metrics SLOs track this commitment formally, with error budgets and alerting. When processing lag increases, data teams know immediately—before downstream consumers complain about stale data.

Platform teams track efficiency metrics as SLOs. Cache hit ratios, connection pool utilization, message queue depths—all become formal objectives with targets and budgets. Degradation in these efficiency metrics triggers investigation before they cascade into user-facing performance problems.

Finance teams monitor transaction success rates as SLOs. Payment processing platforms track authorization rates, settlement rates, and fraud detection accuracy. These business-critical metrics receive SLO-level visibility and management rather than ad-hoc monitoring.

What We Built

Custom Metrics SLOs in Kloudfuse 3.5 deliver:

  • PromQL-based SLO definitions for any metric, not just APM latency and availability

  • Custom numerator and denominator queries for flexible SLI calculation

  • Full integration with the alerting system and contact points

  • Error budget tracking and burn rate alerts for custom metrics

  • Multi-dimensional SLOs across regions, segments, or custom attributes

  • Composite SLOs combining multiple metrics into single objectives

Service Level Objectives shouldn't be limited to technical performance metrics. Businesses care about outcomes. Custom Metrics SLOs enable tracking what actually matters—with the same rigor, alerting, and error budget management as traditional SLOs.

Observe. Analyze. Automate.

All Rights Reserved ® Kloudfuse 2025

Terms and Conditions