Migrating Observability from New Relic to Kloudfuse
A 5-step migration guide
Many organizations face rising costs with legacy observability platforms like New Relic, especially as their microservices scale. Kloudfuse is a unified observability platform built on open standards (OpenTelemetry, PromQL, Grafana) with a predictable pricing model that does not charge for overages or additional seats.
This guide walks you through migrating your existing New Relic (NR) workloads to Kloudfuse in five structured steps. We cover inventorying your current NR usage, choosing a migration strategy, and migrating traces/APM, logs, and metrics, followed by alerts/dashboards and the final cutover. Along the way, we compare NR vs Kloudfuse on cost, retention, and openness, and highlight tips, config examples, and known limitations.
Why Migrate to Kloudfuse?
New Relic paved the way for full-stack observability, but modern engineering teams now face new challenges such as data volume growth, tool sprawl, and rising costs. Kloudfuse takes a fresh approach, and here’s where it stands out:
1. Unified Data Lake for All Telemetry: Unlike New Relic’s siloed architecture (separate backends for metrics, logs, traces), Kloudfuse uses a single storage and query layer across all telemetry types. This means no more reconciling timelines or stitching context across tools. A single query can traverse traces and logs without data transformation or external joins.
2. Open Standards and Source-Based Flexibility: Kloudfuse is built around OpenTelemetry from the ground up, not layered onto it. That means easier instrumentation, better portability, and no proprietary agents. It also avoids the need for a proprietary query language, instead supporting popular query languages (e.g., PromQL, LogQL, TraceQL, SQL, GraphQL) you're already familiar with. You control what you collect and how it’s enriched, reducing vendor lock-in and enabling deeper integration into your CI/CD pipelines.
3. Cost Predictability at Scale: New Relic’s pricing often scales with data volume and retention, forcing teams to choose between visibility and budget. Kloudfuse takes a source-deduplication-first approach and optimizes storage upstream. The result is predictable costs even as your observability footprint grows, without aggressive sampling or restrictive retention policies.
Step-by-Step Migration Plan
Migrating from New Relic doesn’t have to be overwhelming. Breaking things down into a few focused steps helps you move forward with confidence and clarity. We’ll begin by taking stock of what’s currently hooked into New Relic, then look at a few different ways you can approach the migration depending on your setup and goals.
Inventory Existing NR Integrations
Before making any changes, catalog how each service currently uses New Relic. For every application or microservice, list:
APM Agents: For each application or microservice, check if it uses a New Relic APM agent (Java, Node.js, etc.). Look for newrelic.agent initialization in the code or configuration files.
Distributed Tracing: Verify if distributed tracing is enabled. New Relic supports the W3C trace context, so check for that or any custom tracing setup.
Custom Metrics: Identify any custom metrics being sent. Search for code using newrelic.recordMetric() or scripts calling the New Relic REST API.
Logs: Find out how logs are forwarded to New Relic. Common tools include Fluent Bit, Fluentd, or Filebeat for sending container or host logs.
Alerts & Dashboards: Review dashboards and alert policies. Use the New Relic UI or NerdGraph API to list dashboards and alerts tagged with your service names.
Aim for a per-service inventory and create a summary table, for example:
| Service | Agent | Tracing | Custom Metrics | Log Forwarding | Dashboards |
| --- | --- | --- | --- | --- | --- |
| Auth Service | Node.js vX | Enabled | Yes (via REST API) | Fluent Bit | 3 Dashboards |
This “assessment of current state” helps identify what to migrate. In practice, you may need to scan code repos, deployment manifests, and NR’s own APIs. Document key metrics/traces to preserve. For each integration, note its purpose and any special settings like API keys or custom tags.
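As a hedged sketch, a NerdGraph query like the following can help pull a dashboard inventory programmatically (the search filter is illustrative; adjust it to your account and tagging scheme):

```graphql
{
  actor {
    entitySearch(query: "type = 'DASHBOARD'") {
      results {
        entities {
          name
          guid
          tags {
            key
            values
          }
        }
      }
    }
  }
}
```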
Practical tip: Engage stakeholders and cross-functional teams to ensure effective collaboration. Involving SREs, devs, and ops ensures no data source is missed. Break the inventory work into manageable parts and use scripts where possible. Treat this like an audit: list every telemetry flow and any “hidden” data pipelines. Only once you have a complete picture (the first phase of any observability migration) can you plan the next steps effectively.
Choose Your Migration Strategy
There are two main approaches, Parallel Shipping (Dual-Run) and Cutover (Big Bang), with a phased rollout available as a middle ground.
Parallel Shipping (Dual-Run): In this strategy, you configure each service to send data to both New Relic and Kloudfuse simultaneously. You gradually validate in parallel, then cut off NR when ready.
Pros: Low risk. You can compare data in both systems to ensure parity, test dashboards and alerts with real traffic, and roll back easily if something breaks. Running both systems in parallel provides an ideal setup to ensure your dashboards and alerts are functioning correctly.
Cons: Dual-shipping temporarily doubles ingestion costs and adds some overhead to your applications. Services may experience increased network traffic and a slight performance impact while dual-writing. You also need to keep New Relic running until the migration is complete.
Phased migration and rollout: You can migrate one stream at a time, for example metrics first, followed by logs or APM. You can also migrate certain environments first (Dev/QA, then parts of production) or certain applications.
Pros: Phased migration builds user confidence over time. Lower risk streams and environments can be migrated first to manage risks.
Cons: Takes longer, keeping resources and personnel engaged in the migration for an extended period. It can also add effort for support and reliability teams, since two systems must be maintained instead of one.
Cutover (Big Bang): Switch completely from NR to Kloudfuse at a chosen point (e.g., deploy OTel agents that point only to Kloudfuse, then disable NR agents).
Pros: Simpler setup in the short term (no dual publishing), no duplicate data ingestion costs, and once switched, you have a single system of truth.
Cons: Higher risk. If something is misconfigured, you might temporarily lose visibility or alerts, and there is no easy fallback to NR without reconfiguration. You must be highly confident in the migration, and you should schedule the cutover during low-traffic windows to mitigate risk.
If you have dozens of services and need assurance, parallel shipping is safer. It’s often recommended for large-scale migrations. Use it if you want a gradual roll-out or if risk tolerance is low.
If your setup is small (e.g., a handful of services) or if cost/time constraints demand a faster switch, you might opt for cutover. In practice, many teams employ a hybrid/phased approach: run a subset of non-critical services in parallel first (a pilot), then cut them over, building confidence before migrating the rest.
In summary, parallel migration ensures correctness, while cutover is quicker but riskier. Document your strategy and communicate it. Adjust based on scale and operational constraints.
Migration by Stream
This section covers the migration of each telemetry type. In all cases, Kloudfuse supports open standards (OpenTelemetry, PromQL, etc.), so plan to replace vendor-specific plugins with OTel/Prometheus where possible.
Traces / APM
For distributed tracing and APM, the goal is to replace New Relic agents/SDKs with OpenTelemetry SDKs and exporters. Kloudfuse can ingest OTLP traces directly. For Node.js/JavaScript services, you might:
Install OpenTelemetry SDK: In each service, remove or disable the New Relic agent. Instead, install OTel packages, e.g.:
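A typical install for a Node.js/Express service might look like the following (package names follow the standard OpenTelemetry JS distribution; pin versions that match your SDK):

```bash
npm install @opentelemetry/sdk-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/instrumentation-http \
  @opentelemetry/instrumentation-express
```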
Initialize instrumentation: Add code on service startup to configure tracing. For example, in a Node.js service, you can use the OTel NodeSDK:
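A minimal sketch is shown below. The ingest path matches the Kloudfuse trace ingester described next, while the service name and auth header are illustrative, so confirm the exact token mechanism with your Kloudfuse deployment:

```js
// tracing.js: load this before the rest of the app (e.g., node -r ./tracing.js server.js)
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { ExpressInstrumentation } = require('@opentelemetry/instrumentation-express');

const sdk = new NodeSDK({
  serviceName: 'auth-service', // illustrative service name
  traceExporter: new OTLPTraceExporter({
    // Point the OTLP exporter at Kloudfuse's trace ingester.
    url: 'https://<KLOUDFUSE_ADDRESS>/ingester/otlp/traces',
    // Header name is an assumption; use whatever API key/token header your deployment expects.
    headers: { 'x-api-key': process.env.KLOUDFUSE_API_KEY },
  }),
  instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()],
});

sdk.start();
```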
This snippet uses HTTP and Express instrumentations; adjust for your framework. The key part is pointing the OTLP exporter to Kloudfuse’s trace ingester (/ingester/otlp/traces) and using your API key or token.
Trace context propagation: OTel automatically injects W3C Trace Context headers in outgoing HTTP or gRPC calls, and extracts them on the receiving end (via instrumentations). This preserves end-to-end traces across services. No extra work is needed if you use the standard OTel instrumentations. Ensure that all services use compatible propagators (the default W3CTraceContext is common).
Deploy gradually: Start with one service at a time. After instrumenting it, deploy it to staging or production with dual shipping (if performing a parallel migration) and confirm traces appear in Kloudfuse. In Kloudfuse’s UI, you should see your service name and operations. Once verified, proceed service-by-service. It’s often wise to “start small” with a non-critical service first, learn from it, then migrate core services.
If rewriting code is too involved, Kloudfuse can even ingest data from existing New Relic agents. You can repoint a New Relic agent by setting its host to Kloudfuse:
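A hedged sketch using environment variables (the exact hostname to use, and whether an API key goes in the license-key slot, come from your Kloudfuse deployment docs):

```bash
# Point the existing New Relic agent at Kloudfuse instead of New Relic's collector.
export NEW_RELIC_HOST=<KLOUDFUSE_ADDRESS>
# If your deployment expects a key here, supply the Kloudfuse API key.
export NEW_RELIC_LICENSE_KEY=<KLOUDFUSE_API_KEY>
```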
This is a temporary bridge but retains the proprietary agent. It’s not preferred in the long term, as it’d keep vendor lock-in, but it can speed up initial migration tests. Note that currently only specific streams are supported for this method, and other streams need to be disabled on the agent side.
Logs
To start sending your logs to Kloudfuse instead of New Relic, you'll need to update your log shipping tool, usually Fluent Bit or Fluentd. Kloudfuse works with both and provides HTTP endpoints to receive log data.
Using Fluent Bit
If you're using Fluent Bit, you’ll need to update the configuration file, usually fluent-bit.conf or a Helm values.yaml file if you're running it in Kubernetes.
The basic idea is to direct Fluent Bit to send log data over HTTPS to Kloudfuse. Kloudfuse expects logs in JSON format, and the endpoint usually looks like this.
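A minimal [OUTPUT] block might look like the sketch below; the URI is a placeholder, so substitute the log ingest path from your Kloudfuse documentation:

```
[OUTPUT]
    Name    http
    Match   *
    Host    <KLOUDFUSE_ADDRESS>
    Port    443
    # Placeholder path: use the log ingest path from your Kloudfuse docs.
    URI     <KLOUDFUSE_LOG_INGEST_PATH>
    Format  json
    TLS     On
```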
Replace the Host value with the DNS name or IP address of your Kloudfuse stack. You can also narrow the Match rule to just your app logs if needed. If you're working in an internal network without TLS, you can switch the port to 80 and set TLS Off.
Using Fluentd
Fluentd works similarly. You'll just add or update a block in your configuration file:
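A sketch of the equivalent Fluentd block (the endpoint path is a placeholder, and the buffer settings are starting points to tune):

```
<match **>
  @type http
  endpoint https://<KLOUDFUSE_ADDRESS>/<KLOUDFUSE_LOG_INGEST_PATH>
  content_type application/json
  <format>
    @type json
  </format>
  <buffer>
    flush_interval 10s
    chunk_limit_size 1m
  </buffer>
</match>
```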
Again, use https:// and port 443 if TLS is enabled. You can tune the buffer size and flush intervals to fit your traffic volume.
Enriching Your Logs
Adding helpful metadata to your logs makes them easier to search and analyze in Kloudfuse (see the sketch after this list):
Kubernetes metadata: If you're running in Kubernetes, use Fluent Bit or Fluentd’s built-in filters to add pod, container, and namespace info. For Fluent Bit, that means enabling the kubernetes filter; for Fluentd, use the kubernetes_metadata plugin.
Static tags: You can add custom tags like service names or environments directly in your configuration. These show up as searchable fields in the Kloudfuse UI.
Message field: Kloudfuse expects a message or log field in each record. If your logs use a different format, you may need to tweak parsing on Kloudfuse’s side, but most JSON logs work without changes.
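In Fluent Bit, both kinds of enrichment might look like this (the static tag values are illustrative):

```
[FILTER]
    Name        kubernetes
    Match       kube.*
    Merge_Log   On

[FILTER]
    Name    modify
    Match   *
    Add     env production
    Add     service auth-service
```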
Final Steps
Once your config is ready, test it by deploying Fluent Bit or Fluentd to a single node first. Check the Kloudfuse UI to confirm logs are flowing and metadata appears as expected. Once it’s working, you can safely roll it out to your whole cluster.
Metrics
There are two main types of metrics you'll want to migrate:
Infrastructure metrics (e.g., from Prometheus or cloud providers)
Custom application metrics (e.g., created via New Relic SDKs or APIs)
Prometheus metrics
If you use Prometheus to scrape metrics (common in Kubernetes), the simplest migration is to enable remote write to Kloudfuse. Edit your Prometheus config (prometheus.yml) or Helm values:
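A sketch of the remote_write section (the write path is a placeholder; use the endpoint from your Kloudfuse documentation):

```yaml
# prometheus.yml
remote_write:
  - url: https://<KLOUDFUSE_ADDRESS>/<KLOUDFUSE_REMOTE_WRITE_PATH>
```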
Replace <KLOUDFUSE_ADDRESS> with your actual Kloudfuse ingress IP or domain.
Use https:// if your ingress is secured with TLS.
If you’re using kube-prometheus-stack, update its Helm values.yaml to include a similar remoteWrite entry.
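For example, a minimal values override for kube-prometheus-stack might look like this:

```yaml
# values.yaml (kube-prometheus-stack)
prometheus:
  prometheusSpec:
    remoteWrite:
      - url: https://<KLOUDFUSE_ADDRESS>/<KLOUDFUSE_REMOTE_WRITE_PATH>
```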
After restarting Prometheus, metric samples will stream into Kloudfuse. From there, you can use PromQL in the Kloudfuse UI to recreate dashboards and alerts.
Custom Application Metrics
If you used New Relic’s SDK or REST API for custom metrics, you now have a few options depending on your setup:
Option 1: Use OpenTelemetry Metrics API
Replace New Relic SDK calls with OpenTelemetry instrumentation. For example, in Node.js:
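A minimal sketch, assuming a meter provider with an OTLP exporter is already registered (the meter, metric, and attribute names are illustrative):

```js
const { metrics } = require('@opentelemetry/api');

// Replace calls like newrelic.recordMetric() with OTel instruments.
const meter = metrics.getMeter('checkout-service');
const checkoutCounter = meter.createCounter('checkout.completed', {
  description: 'Number of completed checkouts',
});

function onCheckoutCompleted(paymentMethod) {
  // Attributes become searchable dimensions in Kloudfuse.
  checkoutCounter.add(1, { 'payment.method': paymentMethod });
}
```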
Ensure that you configure an OTLP exporter to send metrics to Kloudfuse (e.g., via the OpenTelemetry Collector or in-app SDK configuration). If you have already set this up for tracing, you can often reuse the same pipeline.
Option 2: Emit in Prometheus Format
If refactoring code is difficult, expose your metrics using Prometheus client libraries:
prom-client in Node.js
Spring Boot Actuator
The Prometheus client library for Go (or your language of choice), exposing a /metrics endpoint
Then either:
Let Prometheus scrape them and remote-write to Kloudfuse, or
Use a Grafana Agent to scrape and forward metrics directly.
Kloudfuse accepts metrics from both Prometheus and the Grafana Agent.
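As a hedged sketch, exposing a /metrics endpoint with prom-client in an Express app might look like this (metric names and the port are illustrative):

```js
const client = require('prom-client');
const express = require('express');

const register = new client.Registry();
client.collectDefaultMetrics({ register });

// Illustrative custom metric, analogous to a New Relic custom metric.
const checkouts = new client.Counter({
  name: 'checkout_completed_total',
  help: 'Number of completed checkouts',
  registers: [register],
});

const app = express();
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
app.listen(9464); // scrape port; adjust to your environment
```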
Option 3: Repoint Custom Metric APIs (If Applicable)
If your app used New Relic's REST API to send metrics directly, you’ll need to replace those calls. Kloudfuse does not accept New Relic’s proprietary format, so you must:
Emit metrics using the OpenTelemetry format, or
Expose them in Prometheus format as described above.
There is currently no direct drop-in REST endpoint for New Relic's custom metric format.
Alerts & Dashboards
Once your metrics and logs are flowing into Kloudfuse, it's time to set up your dashboards and alerts.
Dashboards
Export your New Relic dashboards (e.g., via API or manually) and recreate them in Kloudfuse. If you use Grafana, Kloudfuse includes it as well, allowing you to recreate dashboards using the same panels and PromQL queries. Here’s how:
Migration Steps:
Export from New Relic: Use New Relic's UI or API to export your existing dashboards.
Import into Kloudfuse: Use Kloudfuse's provided Python scripts (dashboard.py) to upload your dashboards into Grafana within Kloudfuse. These scripts support uploading single dashboards or entire directories.
Adjust Queries: Since Kloudfuse uses PromQL, you'll need to convert any NRQL queries from New Relic to PromQL. Kloudfuse supports PromQL in its Metrics Explorer and dashboards.
Alerts
In Kloudfuse, alerts are defined as rules based on metrics and logs, and use PromQL-style conditions. You’ll need to create equivalent alert rules (thresholds, spike detection, anomaly detection, etc.) in Kloudfuse. Here are the steps:
Migration Steps:
Define Alert Rules: Use Kloudfuse's alerting interface to create new alert rules based on your metrics and logs. You can set conditions using PromQL-style queries, as shown in the sketch after these steps.
Test Alerts: Temporarily lower alert thresholds to force alerts and confirm they fire as expected; one migration case study used exactly this approach to validate alerting before cutover.
Set Up Notification Channels: Configure your preferred notification methods (e.g., email, Slack, PagerDuty) in Kloudfuse. This ensures you receive timely alerts through your chosen channels.
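For example, a PromQL-style condition for a 5% error-rate alert might look like this (metric and label names are illustrative):

```
sum(rate(http_requests_total{status=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m])) > 0.05
```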
At this stage, major metric pipelines have been redirected, and the system is ready for final testing.
Final Testing & Cutover
Before fully decommissioning New Relic, perform thorough testing under realistic conditions:
Simulate production scenarios: Generate traffic and data as normal. For example, run load tests or use canaries to produce traces, logs, and metrics in all services. Monitor Kloudfuse to ensure nothing is missing or delayed.
Data integrity checks: Compare key metrics and traces between NR and Kloudfuse during the parallel period. Run sample queries in both systems (e.g., request rate, error rate, tail latency) and confirm the numbers match within reason. Look for any data gaps or discrepancies.
Alert verification: Confirm that alerts fire as expected. As described earlier, intentionally breach thresholds (or lower them) to trigger alerts, and check that Kloudfuse notifies your channels. Use this to tune any misconfigurations.
Retention and performance: Check that Kloudfuse’s retention policies and query performance meet your needs. Kloudfuse is designed for high-cardinality data at low cost, but verify that queries (e.g., complex logs or high-frequency metrics) are fast enough. Adjust retention settings if necessary.
Document and rollback plan: Ensure you have a rollback plan so you can quickly switch back to New Relic if anything goes wrong right after cutover.
Once confident, execute the cutover. Here are the steps:
Redirect All Remaining Data Sources: Turn off or redirect all remaining data sources so they only send data to Kloudfuse. For example, remove the New Relic agent from your apps or repoint the NEW_RELIC_HOST setting.
Monitor for Errors and Misconfigurations: Monitor Kloudfuse closely for any errors (e.g., agent misconfigured). Keep New Relic running but inactive for a short time as a backup.
Final Configuration Cleanup: Update any final configurations (such as alert endpoints) to remove NR references.
Decommission New Relic: When stable (typically a few days of observation), you can decommission the New Relic accounts, cancel licenses, etc.
Communicate clearly with your teams throughout this phase. Cutover can be a significant change, so confirm that everyone agrees that monitoring is sound.
These steps can help you confidently move from New Relic to Kloudfuse. The result will be a unified, vendor-neutral observability stack that controls costs and gives you ownership of your telemetry data.
Conclusion
Migrating observability workloads is a strategic opportunity to modernize your telemetry stack. While New Relic has been a reliable choice, evolving needs around scale, cost control, and open standards often demand a more adaptable solution.
Kloudfuse addresses modern observability needs with a unified architecture that’s native to OpenTelemetry and a cost model built for long-term sustainability.
In this article, we started by taking inventory of our existing telemetry. This included APM agents, custom metrics, logs, and distributed traces. Next, we selected a migration strategy: parallel shipping to reduce risk, a phased rollout to build confidence, or a direct cutover for a faster transition.
Each type of telemetry had its own migration path. For traces, we replaced New Relic agents with OpenTelemetry SDKs. For logs, we reconfigured Fluent Bit. For metrics, we rerouted Prometheus or custom pipelines. We also rebuilt alerts and dashboards to keep everything running smoothly. Finally, we tested the setup thoroughly before completing the full cutover.