Introducing Kloudfuse 4.0

Certified. AI-Ready. Built for Scale.

Why This Release Exists

Enterprise observability has a structural problem. The platforms that handle massive scale require you to send every metric, log, and trace to someone else's cloud. The platforms that keep data inside your infrastructure force you to compromise on features, AI capabilities, or compliance readiness. And across the industry, security is treated as a tier, something you unlock with a federal SKU or a premium contract, not something that ships with the product.

Kloudfuse 4.0 was built to resolve these trade-offs.

This release delivers three things that most enterprises have been told they cannot get from a single platform: NIST-validated cryptography embedded across every service, an AI observability layer governed with the same rigor as production data access, and platform engineering controls that let teams separate, scale, and govern workloads independently. It runs in your VPC on AWS, Azure, or GCP. No data leaves your environment.

Three pillars define the release: Certified Secure, AI-Ready, and Built for Scale. Here is what each one means and why it matters.

“At our scale, reliability depends on how quickly teams can identify issues, understand service dependencies, and take action with confidence. Kloudfuse has helped simplify that by giving our teams a more unified view of production behavior across the platform. With Kloudfuse 4.0’s workload isolation, we can also scale observability infrastructure more deliberately as demand grows, without creating new operational bottlenecks. That combination strengthens both reliability execution and long-term resilience.”
- Michael Kuperman, Chief Reliability Officer & GM, Zscaler 


Certified Secure

FIPS 140-3 Certification - Certificates #5186 and #5209

Kloudfuse has achieved FIPS 140-3 certification, validated by the National Institute of Standards and Technology (NIST) Cryptographic Module Validation Program (CMVP) under Certificates #5186 and #5209. The cryptographic module, SafeLogic CryptoComply, is embedded directly in the Kloudfuse platform. It is not an external dependency that customers source and configure. It is not a flag in a settings file that an administrator enables before first startup. It ships with every deployment by default.

That distinction (embedded versus configured, default versus opt-in) is the difference between a compliance claim and a certified property of the architecture.

Most observability platforms approach FIPS in one of three ways. Some offer a FIPS-enabled agent or proxy that covers a specific segment of the data path, leaving the rest of the platform outside the scope of the certificate. Others support FIPS as a configuration option: the validated module exists, but it must be explicitly enabled, sometimes before the first startup, sometimes through a settings file after deployment. A third group offers FIPS-validated cryptography only in a separate federal product tier; the standard platform does not include it.

Kloudfuse does not follow any of these patterns. Certificates #5186 and #5209 cover the standard platform. Every commercial deployment ships with STIG-hardened container images and FIPS 140-3 compliant cryptographic algorithms by default. The same validated cryptography that federal customers require is embedded in the architecture from day one, not bolted on after procurement. Organizations with additional federal compliance requirements can deploy Kloudfuse's dedicated federal configuration, but the core security posture is never downgraded for commercial customers. Validated cryptography is a property of the platform, not a premium tier.

The FIPS 140-2 sunset is approaching. NIST has set September 22, 2026 as the date when all FIPS 140-2 certificates move to the Historical List. After that date, FIPS 140-2 can no longer support new federal acquisitions. The CMVP validation process takes, on average, more than 542 days from submission to certificate, which means the window to begin a new validation has already closed for most vendors. Kloudfuse customers deploying today are already on the standard that will be required six months from now.

STIG Hardened Containers on Red Hat UBI9

Every container image in Kloudfuse 4.0 has been migrated to Red Hat UBI9-minimal base images and STIG hardened. FIPS crypto policy is enabled at the operating system level across all services. This is not a hardening guide that customers follow post-deployment. It is how the containers ship. For organizations operating under DoD mandates, CMMC Level 2 requirements, or agency-specific STIG policies, Kloudfuse 4.0 meets the baseline without additional remediation.

Signed Supply Chain and Non-Root Execution

All container images and Helm charts are cryptographically signed. Customers can verify image integrity before a single pod runs. OIDC federation support allows external partners and CI/CD systems to pull images from AWS ECR without long-lived credentials, an architectural requirement for organizations that have eliminated static secrets from their deployment pipelines.

Every service runs as non-root by default. Service accounts and security context are configurable via Helm values, giving platform teams control over privilege boundaries without modifying the application layer.
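To make that concrete, here is a hypothetical Helm values override using standard Kubernetes security context fields. The key paths (`podSecurityContext`, `containerSecurityContext`, and so on) are illustrative assumptions, not the actual Kloudfuse chart schema:

```yaml
# Hypothetical values override; key paths are illustrative, not the
# real Kloudfuse chart schema. The field names under each context are
# standard Kubernetes securityContext settings.
serviceAccount:
  create: true
  name: kloudfuse-query        # placeholder service account name
podSecurityContext:
  runAsNonRoot: true
  runAsUser: 10001
containerSecurityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
```

The point is that privilege boundaries are tuned in values files, not by patching container images or the application layer.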

Compliance Posture

Kloudfuse holds SOC 2 certification. ISO certification is in progress. FIPS 140-3 certification supports alignment with FedRAMP, CMMC, STIG, HIPAA, and PCI DSS requirements. Combined with the Self-SaaS deployment model, where observability data never leaves the customer's VPC, Kloudfuse eliminates entire categories of compliance concerns that organizations encounter when evaluating SaaS observability tools.

The encryption is validated. The containers are hardened. The supply chain is signed. The processes are non-root. The data stays inside your infrastructure. Security is not a configuration. It is a certified property of the platform.

“As we scale, we need an observability platform that can grow with us without adding operational complexity. Kloudfuse gave us a unified platform that runs inside our environment and supports the level of visibility we need without forcing trade-offs on cost or control. With Kloudfuse 4.0’s FIPS certification, we can take that same observability stack into regulated federal environments without maintaining a separate toolchain. That kind of consistency is a meaningful operational advantage at enterprise scale.”
- Kishore Thakur, Senior Director, Cloud Platform Engineering, Zscaler


AI-Ready: The Enterprise MCP Server

The Kloudfuse MCP Server gives AI agents direct access to production observability data using natural language. Connect Claude, ChatGPT, custom models, or IDE-embedded agents and ask questions like "What caused the latency spike in checkout?" or "Show me services with error rates above 2%", and get answers from live data without building a dashboard, writing a query, or switching tools.

That capability alone is useful. What makes it enterprise-ready is how the server is governed.

Centrally Managed with Built-In Authentication

The MCP Server runs within the Kloudfuse stack as a remote, centrally managed service. There is no local installation. There is no per-developer configuration file. Every query is tied to a user identity through built-in authentication. IT and security teams manage a single deployment, not individual instances scattered across developer machines. This is the difference between a tool that engineers discover on their own and a capability that the security team can approve for production use.

Query Safety Mode

Every AI-generated query is validated before execution. The MCP Server rejects bare metric selectors that lack label filters, queries with lookback-to-step ratios that would scan excessive data, and queries that would return more data points than the system can handle. These are the same guardrails a platform team would apply to any production data consumer, applied automatically to every AI agent interaction, regardless of which LLM generated the query.

Query safety is not just a performance protection mechanism. It is a governance layer. In environments where observability data includes sensitive infrastructure metadata, unscoped queries are a data exposure risk. The MCP Server enforces scope before the query touches the data.
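The checks described above can be sketched as a small validator. The function shape, threshold, and messages are illustrative assumptions, not the actual MCP Server implementation:

```python
# Illustrative sketch of query-safety validation. The threshold and
# message strings are assumptions, not the Kloudfuse MCP Server's.
MAX_LOOKBACK_TO_STEP = 11_000  # assumed cap on data points per series

def validate_query(metric: str, label_filters: dict[str, str],
                   lookback_s: int, step_s: int) -> list[str]:
    """Return a list of violations; an empty list means the query may run."""
    violations = []
    # Reject bare metric selectors that carry no label filters.
    if not label_filters:
        violations.append(f"bare selector without label filters: {metric}")
    # Reject lookback-to-step ratios that would scan excessive data.
    if step_s <= 0 or lookback_s / step_s > MAX_LOOKBACK_TO_STEP:
        violations.append("lookback-to-step ratio exceeds the allowed scan budget")
    return violations
```

A scoped query such as `{"service": "checkout"}` over one hour at a 60-second step passes; a bare selector or a 30-day lookback at a 1-second step is rejected before it ever touches storage.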

Full Observability Coverage

The MCP Server covers the entire Kloudfuse data model. Metrics, logs, and traces are queryable through natural language with automatic FuseQL translation. Beyond the core signals:

  • Profiling: Query and compare application performance profiles through natural language. Discover profile types, retrieve flame graphs, query time series, and compare profiles across time ranges. Performance analysis that previously required navigating a dedicated profiling UI is now a conversation.

  • RUM: Frontend performance, user sessions, Core Web Vitals trends, and browser metrics. Ask about slow page loads, session errors, or geographic performance patterns without switching to the RUM interface.

  • APM Execution Breakdowns: Service execution time and downstream dependency contributions surfaced through natural language. During incident response, identifying which downstream service is adding latency becomes a question instead of a multi-tab investigation.

Audit Logging and Horizontal Scaling

Every MCP tool invocation and outbound API call is logged with duration and error information. The AI observability layer itself is auditable, a requirement that most compliance teams will impose before approving AI access to production systems. The server scales horizontally behind a load balancer, operating as an enterprise service rather than a single-process tool.


Built for Scale: Platform Engineering

The features in this section do not make for flashy headlines. They are the reason platform engineering teams commit to a deployment. They address the operational realities of running observability at enterprise scale: workload separation, infrastructure modernization, cost visibility, query performance, and data integrity.

Workload Isolation

Kloudfuse supports separating ingestion, query, and control plane workloads into components that scale independently. Each workload gets its own resource allocation. Tune ingestion memory without degrading query performance. Scale query capacity during business hours without over-provisioning ingestion during off-peak. Right-size each component for its actual demand rather than sizing the entire platform for the worst-case combination.

This is how mature data infrastructure operates. Kafka separates brokers and consumers. Data warehouses separate compute and storage. Observability platforms should separate the workloads that have fundamentally different resource profiles, and now Kloudfuse does.

Envoy Gateway: Moving Beyond NGINX

NGINX Ingress Controller is approaching end-of-life. For organizations standardizing on the Kubernetes Gateway API, this creates an infrastructure dependency that is aging in place. Kloudfuse 4.0 supports Envoy Gateway with the Kubernetes Gateway API as an alternative ingress controller, aligning with the direction the Kubernetes ecosystem is moving.

For existing deployments running NGINX Ingress, we built a zero-downtime, three-step migration path. No data loss. No traffic interruption. The migration can happen at the team's own pace, without scheduling a maintenance window or coordinating a cutover. This is the kind of infrastructure transition that typically requires a quarter of planning. With Kloudfuse 4.0, it is a documented, tested migration path.
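For readers less familiar with the Gateway API, the destination of such a migration typically looks like the following standard Gateway and HTTPRoute resources. The names, namespace, gateway class, and backend service are placeholders, not Kloudfuse-specific values:

```yaml
# Illustrative Gateway API resources for Envoy Gateway ingress.
# All names here are placeholders.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: kfuse-gateway
  namespace: kfuse
spec:
  gatewayClassName: envoy-gateway
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: kfuse-tls
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: kfuse-ui
  namespace: kfuse
spec:
  parentRefs:
    - name: kfuse-gateway
  rules:
    - backendRefs:
        - name: kfuse-ui-service
          port: 80
```

Because routes and gateways are decoupled resources, they can be introduced alongside an existing NGINX Ingress and cut over route by route, which is what makes a zero-downtime migration practical.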

Multi-Rollup Resolution: Eliminating Recording Rules

Metrics rollup tables now support multiple resolutions. Queries automatically select the appropriate rollup level and backfill gaps from the raw table. Long-range queries that previously timed out or returned incomplete results now complete faster with accurate data, without any configuration changes.

The operational impact goes further than query speed. Multi-rollup eliminates the need for recording rules entirely. SLO metrics, for example, are now computed at query time directly from raw data; multi-rollup delivers the same performance without the operational overhead of maintaining recording rule configurations. For platform teams managing hundreds of recording rules across dozens of services, each one a potential source of drift, staleness, or misconfiguration, this is a meaningful reduction in operational toil.

Metrics Cardinality Explorer

High-cardinality metrics are one of the most common reasons observability costs spiral. A single label with unbounded values (a request ID, a user ID, a session token accidentally promoted to a metric label) can multiply cardinality by orders of magnitude. The problem is often invisible until the bill arrives.

The Cardinality Explorer provides a unified interface for discovering metrics, analyzing label distributions, and identifying exactly which label combinations produce the most series. It supports progressive filtering with promoted label filters and an optional series count display. Instead of writing ad hoc PromQL to hunt for cardinality offenders, teams get a purpose-built tool that surfaces the information in a single workflow.

Unique Log IDs and Deterministic Sorting

Every ingested log line now receives a unique identifier at ingestion time, regardless of agent type or ingestion source. Log query results can be sorted by any column (timestamp, extracted facets, custom columns) in both query builder and FuseQL modes. Sort order persists across mode switches, dashboard saves, and alert creation. Downloads respect the selected sort order.

This sounds incremental until you are running a compliance audit or an incident post-mortem. Deterministic log ordering is the difference between a reliable audit trail and a best-effort approximation. When a regulator asks for the exact sequence of events between 2:14 and 2:17 AM, approximate ordering is not an answer.

Query Performance

Regex alternation queries (patterns like service_name matching "frontend" or "backend" or "api") now execute orders of magnitude faster across all telemetry streams (logs, metrics, traces, events, RUM). High-cardinality queries use less memory and complete faster with series limits enforced at every processing stage. Dynamic range query splitting automatically breaks large queries to prevent series limit errors.

On the logs side, the underlying data layout has been significantly optimized for faster filtering and aggregation, particularly on long time range queries. Log hydration is now parallel with support for multiple concurrent jobs, pause/resume, and graceful handling of schema changes across time ranges. These are the performance improvements that compound: they affect every dashboard load, every alert evaluation, every ad hoc investigation.

Traces Streaming

Traces, trace errors, and trace analytics load progressively via streaming. Results appear as they arrive rather than waiting for the full query to complete. Long-running trace queries can be cancelled from the UI. Streaming extends to dashboard panels and alert evaluation views, bringing the same progressive loading pattern to every context where trace data is consumed.

“At our scale, the observability platform itself must never become a constraint. It needs to grow with the business without increasing the complexity teams face when ingesting, querying, or managing telemetry. Kloudfuse 4.0’s workload isolation is critical because it allows each layer to scale independently based on real usage patterns, which better reflects how cloud environments operate. This approach gives us a more resilient and efficient foundation as our platform continues to expand.”
- Kasi Sockalingam, Cloud Engineering Leader, Automation Anywhere

Query Power and Operational Depth

A platform is measured not just by its headline capabilities but by how well it supports the daily work of the teams that depend on it. Kloudfuse 4.0 delivers improvements across the query language, SLO management, alerting, dashboards, and access control that collectively demonstrate operational maturity.

FuseQL

Subqueries enable nested analysis where results from one query feed into another: find the hosts with the highest error rate, then pull their logs in a single query. The compare operator analyzes data across different time periods for before-and-after analysis during deployments or incidents. Parse multi extracts multiple patterns from log data with anchoring and nodrop options. Contextual autocomplete provides intelligent suggestions based on the current query structure, for both FuseQL and PromQL. FuseQL now also supports a sort by clause before aggregation operators. These are the features that make FuseQL a primary query language for daily operations, not a secondary interface.

SLOs

SLO metrics are computed at query time from raw data using multi-rollup, which means recording rules are no longer required for SLO accuracy. Configurable time windows (7-day and 30-day) match how different teams think about reliability targets. SLO labels enable organization and filtering across large SLO sets. SLO versioning tracks creation date, modification date, and version number on every SLO. SLO alerts now include history and versions tabs, giving teams a clear audit trail of how their reliability targets have evolved.

Alerts

Multi-contact point notification policies with label-based routing send different alerts to different teams based on alert metadata; no more monolithic notification channels. Alerts created in the Kloudfuse UI can only be edited in Kloudfuse; alerts created in Grafana can only be edited in Grafana. This boundary prevents the unintended cross-platform modifications that have caused alert outages in dual-UI environments. Pending silence display shows scheduled-but-not-yet-active silences in the UI, so teams know in advance when alerts will be temporarily suppressed during maintenance windows. Datasource-based minimum step intervals and configurable evaluation frequency for non-metric alerts (logs, traces) round out the maturity improvements.

Dashboards and UX

Table sort persistence across reloads and saves. Stat panel custom font sizes and units. Default time range configuration. Trace list template variable filtering. RUM-specific template variable support. Folder-level RBAC with permission inheritance. Shortened URLs for bookmarking and sharing across messaging tools. Dashboard list limits increased to 5,000 for organizations with large dashboard libraries. Custom log columns persist when switching between query builder and FuseQL modes. These are the refinements that reduce friction in daily workflows.

Infrastructure and Integrations

Grafana upgraded to v12.4.1 with security fixes and improved compatibility. Scheduled views now support historical data backfill: specify a start time and the view catches up automatically. AWS RDS Aurora cluster-level metrics for organizations monitoring Aurora at the cluster layer. Multiple GCP project IDs for Stackdriver metrics collection across multi-project GCP environments. Application-specific RUM custom facets and custom metrics for Core Web Vitals (LCP, FCP, INP).

The Self-SaaS Advantage

Every capability described in this post runs inside your VPC. On your cloud account. On your infrastructure. Your observability data (every metric, every log line, every trace, every session) stays within your environment.

Kloudfuse delivers SaaS-grade simplicity (managed control plane, automatic updates, expert support) without the SaaS trade-offs. No data egress fees. No per-GB ingestion pricing surprises. No compliance risk from sending production telemetry to a third-party cloud. You control retention. You control storage costs. You control who sees what.

For organizations that have been told they must choose between the operational simplicity of SaaS and the control of self-hosted, Kloudfuse 4.0 demonstrates that the choice is unnecessary.

What 4.0 Means

The FIPS 140-3 certification is not a roadmap item. It is Certificate #5186, validated by NIST CMVP, running on STIG-hardened Red Hat UBI9 containers with a signed supply chain. The MCP Server is not a prototype. It is a centrally managed, query-safe, audited, horizontally scaling enterprise service. Workload isolation is not a future architecture. It is how the platform operates today.

Kloudfuse 4.0 is built for organizations that need unified observability across metrics, logs, traces, events, and real user monitoring. That need AI-native capabilities governed with production-grade rigor. That require federal-grade security without a separate product tier. And that refuse to send their data to someone else's cloud to get it.

About Kloudfuse: Kloudfuse is a unified observability platform integrating with over 700 infrastructures, cloud services, and applications. Built on open standards like OpenTelemetry and PromQL, Kloudfuse deploys within customer VPCs to deliver metrics, logs, traces, events, and real user monitoring with enterprise security, AI-native capabilities, and cost control. Trusted by Zscaler, GE Healthcare, Workday, Tata 1mg, and Automation Anywhere.

Observe. Analyze. Automate.


Copyright © 2026 Kloudfuse. All Rights Reserved

Terms and Conditions
