The Making of Kloudfuse 3.5: Visualizing Kubernetes Topology Using Pure OpenTelemetry
Real-time Kubernetes topology powered entirely by OpenTelemetry, with no proprietary agents.
Published on Dec 2, 2025
Kubernetes topology visualization has traditionally required proprietary agents. Vendors install custom collectors that scrape Kubernetes APIs, discover pod relationships, map services to workloads, and build dependency graphs. These agents work, but they create vendor lock-in. Your infrastructure instrumentation becomes tied to a specific observability platform.
We built Kubernetes topology visualization in Kloudfuse 3.5 using pure OpenTelemetry. No proprietary agents. No vendor-specific collectors. Just standard OTel instrumentation that remains portable regardless of which observability backend you choose.
The Proprietary Agent Problem
Most observability vendors approach Kubernetes monitoring with custom agents. Datadog has the Datadog Agent. New Relic has the New Relic Infrastructure Agent. Dynatrace has the OneAgent. These agents deeply integrate with vendor platforms, providing rich topology visualization and dependency mapping.
The trade-off is lock-in. Your instrumentation becomes specific to that vendor. Switching observability platforms means replacing agents across your entire infrastructure. The migration complexity creates switching costs that vendors rely on for retention.
OpenTelemetry emerged as the industry standard for vendor-neutral instrumentation. Metrics, logs, and traces flow through standard protocols to any compatible backend. But Kubernetes topology has remained a gap. Most vendors still require proprietary agents for full infrastructure visibility, even when they support OpenTelemetry for application telemetry.
We wanted to close this gap. Kubernetes topology should be observable through pure OpenTelemetry, with no proprietary components required.
OpenTelemetry Events for Kubernetes
OpenTelemetry includes support for Kubernetes through OTel Events. The OpenTelemetry Collector can receive Kubernetes events—pod creations, deployments, service updates, node status changes—and forward them as structured telemetry.
These events contain rich metadata: pod labels, namespace information, owner references, container statuses, resource requests and limits. Everything needed to build comprehensive topology views exists in standard Kubernetes events accessible through OTel.
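For concreteness, here is the shape of a raw Kubernetes event before the Collector picks it up. The workload and namespace names below are hypothetical; the field layout is standard core/v1 Kubernetes:

```yaml
# A core/v1 Event as the API server stores it (names are hypothetical).
# The Collector forwards these fields as structured attributes.
apiVersion: v1
kind: Event
metadata:
  name: checkout-5f6d8-x2kqp.scheduled
  namespace: shop
type: Normal
reason: Scheduled
message: Successfully assigned shop/checkout-5f6d8-x2kqp to node-3
involvedObject:
  kind: Pod
  namespace: shop
  name: checkout-5f6d8-x2kqp
source:
  component: default-scheduler
count: 1
```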
Kloudfuse 3.5 leverages this capability. Deploy the OpenTelemetry Collector with a Kubernetes receiver configured. Events flow to Kloudfuse through the standard OTLP protocol. No proprietary agents. No custom collectors. Just OTel doing what it's designed to do.
The architecture is straightforward. The OpenTelemetry Collector watches Kubernetes API events. It transforms these events into OTel semantic conventions. Events flow to Kloudfuse like any other telemetry. The platform processes them into queryable topology data.
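A minimal Collector configuration sketch, assuming the contrib distribution's k8sobjects receiver and a placeholder Kloudfuse endpoint:

```yaml
# Sketch: watch Kubernetes events and ship them over OTLP.
# The endpoint is a placeholder; substitute your Kloudfuse ingest address.
receivers:
  k8sobjects:
    objects:
      - name: events
        mode: watch          # stream events as they happen
      - name: pods
        mode: pull           # periodically snapshot full pod state
        interval: 60s
exporters:
  otlp:
    endpoint: kloudfuse.example.com:4317
service:
  pipelines:
    logs:                    # the k8sobjects receiver emits log records
      receivers: [k8sobjects]
      exporters: [otlp]
```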
Building Topology from Events
Kubernetes events alone don't provide topology. They're discrete notifications about state changes. Building a living map of your infrastructure requires correlating these events, maintaining entity relationships, and tracking state over time.
We built this correlation layer inside Kloudfuse. As Kubernetes events arrive, we extract entity information: pods, nodes, services, deployments, replica sets, namespaces. We parse owner references to build hierarchies: which deployment owns which replica set, which replica set owns which pods, which pods run on which nodes.
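The hierarchy comes straight from object metadata. A pod created through a deployment carries an owner reference to its replica set, and its spec records node placement; the names here are hypothetical:

```yaml
# Pod metadata as it appears in a watch event (names are hypothetical)
metadata:
  name: checkout-5f6d8-x2kqp
  namespace: shop
  ownerReferences:
    - apiVersion: apps/v1
      kind: ReplicaSet       # the replica set's own ownerReferences
      name: checkout-5f6d8   # point up to the Deployment
      controller: true
spec:
  nodeName: node-3           # which node the pod runs on
```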
Label selectors define service-to-pod mappings. We track which services select which pods based on matching labels, building the service mesh topology automatically. Resource requests and limits provide capacity context. Health status from liveness and readiness probes indicates which pods are actually serving traffic.
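The selector matching itself is plain label equality; a minimal illustration, again with hypothetical names:

```yaml
# This Service selects every pod in its namespace labeled app: checkout
apiVersion: v1
kind: Service
metadata:
  name: checkout
  namespace: shop
spec:
  selector:
    app: checkout
---
# A pod that matches: its labels satisfy the selector above
apiVersion: v1
kind: Pod
metadata:
  name: checkout-5f6d8-x2kqp
  namespace: shop
  labels:
    app: checkout
```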
This entity graph updates continuously as events arrive. Pod creations add new nodes. Deletions remove them. Status updates change health states. Deployments create new replica sets. The topology reflects current cluster state without polling Kubernetes APIs repeatedly.
Integrating with Distributed Tracing
Kubernetes topology becomes powerful when integrated with application observability. Which services are running on which pods? When a service experiences high latency, which specific pods are affected? When a node has CPU pressure, which applications are impacted?
Kloudfuse automatically correlates distributed traces with Kubernetes topology. APM spans include pod identifiers from standard Kubernetes environment variables. We map these identifiers to entities in the topology graph, connecting application behavior to infrastructure state.
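One standard way those identifiers reach spans is the Kubernetes Downward API combined with the OTEL_RESOURCE_ATTRIBUTES environment variable that OTel SDKs read. A sketch of the pod spec fragment:

```yaml
# Expose pod identity to the application's OTel SDK (sketch)
env:
  - name: K8S_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: K8S_NAMESPACE_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  # OTel SDKs attach these as resource attributes on every span
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: k8s.pod.name=$(K8S_POD_NAME),k8s.namespace.name=$(K8S_NAMESPACE_NAME)
```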
When you investigate a slow service, the topology view shows which pods serve that service, their resource utilization, their health status, and their node placement. Drill into traces from specific pods. Correlate latency spikes with pod restarts or node issues. The infrastructure context enriches application observability automatically.
This correlation works because everything uses standard conventions. OpenTelemetry semantic conventions define how Kubernetes metadata appears in traces. OTel Events provide infrastructure topology. The standards enable integration without custom glue code.
Multi-Cluster Visibility
Modern deployments span multiple Kubernetes clusters. Production might use three clusters for high availability. Regional deployments might have clusters per geography. Development and staging run separate clusters.
Kloudfuse's topology visualization spans clusters automatically. Deploy OpenTelemetry Collectors in each cluster. Events from all clusters flow to Kloudfuse through OTLP. The topology view aggregates across clusters, showing which services run where.
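Each cluster's Collector just needs to stamp its identity onto outgoing telemetry. One way to do that, assuming the contrib resource processor; the cluster name is a placeholder set per cluster:

```yaml
processors:
  resource:
    attributes:
      - key: k8s.cluster.name
        value: prod-us-east    # placeholder; unique per cluster
        action: upsert
service:
  pipelines:
    logs:
      receivers: [k8sobjects]
      processors: [resource]
      exporters: [otlp]
```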
Filter topology by cluster, namespace, or label selectors. Zoom into a specific cluster for detailed views. See cross-cluster service dependencies when services in different clusters communicate. The multi-cluster visibility emerges naturally from aggregating OTel events from multiple sources.
This matters for understanding blast radius during incidents. A service deployed across three clusters experiences issues in one. The topology view shows which cluster is affected, which pods are unhealthy, and whether other clusters are compensating. Engineers understand scope immediately without manual correlation.
Standards Enable Portability
Using pure OpenTelemetry for Kubernetes topology preserves instrumentation portability. Your OpenTelemetry Collector configuration works with any OTel-compatible backend. Switch from Kloudfuse to another vendor, and the same events flow to the new platform.
This matters for reducing vendor lock-in risk. Observability decisions are long-term commitments. Knowing you can switch if requirements change provides negotiating leverage and strategic flexibility. Standards-based instrumentation makes switching practical, not theoretical.
It also simplifies multi-vendor scenarios. Some organizations use different observability platforms for different purposes. Pure OTel means the same Kubernetes instrumentation feeds multiple backends simultaneously. No managing multiple proprietary agents with conflicting requirements.
The OpenTelemetry Collector configuration becomes portable infrastructure-as-code. Store it in Git repositories. Deploy it through standard Kubernetes manifests or Helm charts. Version control changes. Apply the same configuration across environments. Kubernetes topology instrumentation becomes infrastructure you manage like any other component.
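As a sketch, assuming the community open-telemetry/opentelemetry-collector Helm chart, the same receiver settings travel in a values file alongside the rest of your manifests:

```yaml
# values.yaml for the open-telemetry/opentelemetry-collector chart (assumed)
mode: deployment          # one replica is enough to watch cluster-scoped events
config:                   # merged into the chart's base Collector config
  receivers:
    k8sobjects:
      objects:
        - name: events
          mode: watch
  service:
    pipelines:
      logs:
        receivers: [k8sobjects]
        exporters: [otlp]
```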
Performance Considerations
In large clusters, Kubernetes events arrive at high volume. Hundreds of pods scaling up and down. Rolling deployments creating and destroying replica sets. Frequent health check status changes. Naive implementations create event storms that overwhelm ingestion pipelines.
We optimized event processing through several mechanisms. Intelligent sampling reduces redundant events without losing critical state changes. Batching aggregates related events before transmission. State reconciliation periodically synchronizes full topology without processing every intermediate event.
The OpenTelemetry Collector configuration provides control over event volume. Filter by namespace to focus on production workloads. Sample health check updates while capturing all pod lifecycle events. Adjust batch sizes based on cluster scale and network capacity.
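Concretely, namespace filtering and batching look like this in Collector configuration; the namespace list and batch settings are placeholders to tune per cluster:

```yaml
receivers:
  k8sobjects:
    objects:
      - name: events
        mode: watch
        namespaces: [production]   # placeholder: watch only the workloads you need
processors:
  batch:
    send_batch_size: 1024          # tune to cluster scale
    timeout: 5s                    # and to network capacity
```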
These optimizations mean Kubernetes topology visualization scales to large deployments without creating observability data explosions. Clusters with thousands of pods remain manageable because the event stream is intelligently filtered and sampled.
What This Enables
Visualizing Kubernetes topology using pure OpenTelemetry eliminates a major source of vendor lock-in. Your infrastructure instrumentation remains portable. You get comprehensive topology views, service-to-pod mapping, multi-cluster visibility, and integration with distributed tracing without proprietary agents.
For customers evaluating observability platforms, this reduces switching costs and strategic risk. For customers already using OpenTelemetry, it completes the standards-based observability stack. Infrastructure, application, and business telemetry all flow through open protocols.
Observability platforms should compete on capabilities, not lock-in. Pure OpenTelemetry for Kubernetes topology in Kloudfuse 3.5 makes that possible.

