How to Architect Observability Storage with EBS Without Hitting AWS Limits
Published on Jun 12, 2025
Introduction
Observability tools rely heavily on persistent storage to retain metrics, events, logs, and traces over time, what we at Kloudfuse call MELT data. A large part of observability is storing this data so that it remains reliably accessible for analytics, alerting, and troubleshooting.
Companies on average store terabytes of MELT data daily, according to industry benchmarks. Some large organizations ingest and store over 10TB of data per day.
MELT data can be archived to several backends, such as Amazon S3, S3 Glacier, or Amazon EBS. While S3 is ideal for long-term, low-cost storage, our Apache Pinot-based architecture leads us to recommend Amazon EBS to our customers for the following reasons:
Faster query times when reloading data.
Tighter integration with EC2-based analytics workloads.
Simpler IAM and VPC configurations in certain architectures.
To help you configure such setups properly and avoid hitting AWS limits during volume adjustments, we've put together this quick guide.
What is Amazon Elastic Block Store?
Amazon Elastic Block Store (EBS) is a critical component for storing persistent data in the AWS ecosystem. As applications grow and evolve, the need to resize EBS volumes becomes a common operational task. However, AWS places some constraints around how often and how much you can resize a volume. In this blog, we'll dive into the hard limits you need to be aware of when modifying EBS volumes.
Why Resize an EBS Volume?
Before we jump into the limits, let’s quickly recap why you might resize an EBS volume:
Scaling storage capacity to accommodate growing data.
Changing volume types (e.g., gp2 to gp3) to optimize cost or performance.
Increasing IOPS or throughput for performance-sensitive workloads.
Hard Limit #1: Maximum Volume Size
As documented in the Amazon EBS volume types reference, different EBS volume types have different maximum sizes:
General Purpose SSD (gp2, gp3): Up to 16 TiB
Provisioned IOPS SSD: Up to 16 TiB for io1, and up to 64 TiB for io2 Block Express
Throughput Optimized HDD (st1) and Cold HDD (sc1): Up to 16 TiB
You can increase the volume size in 1 GiB increments, but you cannot reduce the size of an existing volume. If you need to shrink a volume, you must create a new one and migrate your data.
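For example, growing a volume with the AWS CLI looks roughly like this (the volume ID is a placeholder):

# Grow the volume to 500 GiB; sizes can only increase, in 1 GiB increments
aws ec2 modify-volume --volume-id vol-0abc123example --size 500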
Hard Limit #2: Resize Frequency
Here’s where things get interesting once you start modifying EBS volumes: AWS allows a maximum of 8 modification requests per volume within a 6-hour window.
This includes all modification types:
Size increases
Volume type changes
IOPS or throughput adjustments
Once you hit this limit, AWS will block additional requests with an error like:
“Volume modification limit exceeded. You can modify this volume up to 8 times in a 6-hour period.”
This is a hard limit and cannot be increased via AWS Support.
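If you are keeping count yourself, you can at least check a volume's most recent modification request and when it started with the AWS CLI, along the lines of this sketch (the volume ID is a placeholder):

# Show the most recent modification request for the volume and when it started
aws ec2 describe-volumes-modifications --volume-ids vol-0abc123example \
  --query 'VolumesModifications[].{State:ModificationState,Started:StartTime,TargetSize:TargetSize}' \
  --output table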
Best Practices
To avoid hitting these limits, keep these tips in mind:
Batch your modifications where possible. Plan ahead to minimize the number of changes.
Use threshold and forecast alerts to proactively monitor volume performance.
If you're experimenting or testing, use disposable volumes instead of modifying production volumes repeatedly.
What to Do If You Hit the Limit
If you exceed the limit, your only option is to wait until 6 hours have passed since the first of the 8 modifications. In the meantime, here are some steps and strategies you can use:
Step 1: Monitor the volume's performance:
Go to the EC2 Dashboard → Elastic Block Store → Volumes → Select the volume → Click Monitoring tab.
Alternatively, go to CloudWatch → Metrics → EBS to get detailed insights.
Key metrics to monitor:
VolumeReadOps / VolumeWriteOps: IOPS utilization.
VolumeQueueLength: Number of pending I/O requests.
VolumeThroughputPercentage / VolumeConsumedReadWriteOps: Useful for Provisioned IOPS (io1, io2) volumes.
BurstBalance: For burstable volume types like gp2, shows remaining burst credits.
Set up CloudWatch Alarms to get notified when thresholds are breached (e.g., high queue length or low burst balance).
Optionally, use CloudWatch Dashboards to visualize trends over time and correlate performance with application behavior.
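For instance, the alarm on low burst balance mentioned above might look like this with the AWS CLI (the volume ID and SNS topic ARN are placeholders):

# Alert when burst credits for a gp2 volume drop below 20%
aws cloudwatch put-metric-alarm \
  --alarm-name ebs-burst-balance-low \
  --namespace AWS/EBS \
  --metric-name BurstBalance \
  --dimensions Name=VolumeId,Value=vol-0abc123example \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 20 \
  --comparison-operator LessThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ebs-alerts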
Step 2: Use Snapshots to Create a New Volume:
For EC2 Instances
Navigate to EC2 → Elastic Block Store → Volumes.
Select the volume you want to replace and choose Actions → Create Snapshot.
Provide a name and description, then click Create Snapshot.
Once the snapshot is complete (EC2 → Snapshots → Status: completed), create a new volume:
Go to Snapshots → Select Snapshot → Actions → Create Volume.
Choose the size, type (e.g., gp3), and availability zone.
Detach the old volume (Volumes → Select → Actions → Detach Volume) after stopping the instance or ensuring it's safe to do so.
Attach the new volume (Volumes → Select → Actions → Attach Volume) using the original device name (e.g., /dev/xvdf).
On the EC2 instance, run sudo growpart on the partition and then resize2fs to expand the filesystem, as shown in the sketch below.
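If you prefer the CLI, the same snapshot-and-replace flow looks roughly like this (the volume, snapshot, and instance IDs, device name, and availability zone are all placeholders):

# 1. Snapshot the existing volume
aws ec2 create-snapshot --volume-id vol-0abc123example --description "pre-replacement snapshot"

# 2. Once the snapshot is completed, create a larger gp3 volume from it
aws ec2 create-volume --snapshot-id snap-0def456example --volume-type gp3 \
  --size 500 --availability-zone us-east-1a

# 3. Swap the volumes (stop the instance or quiesce I/O first)
aws ec2 detach-volume --volume-id vol-0abc123example
aws ec2 attach-volume --volume-id vol-0new789example --instance-id i-0123456789abcdef0 --device /dev/xvdf

# 4. On the instance, grow the partition and the filesystem
sudo growpart /dev/xvdf 1
sudo resize2fs /dev/xvdf1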
For EKS (Elastic Kubernetes Service):
If the volume is managed by a PersistentVolumeClaim (PVC), the process is a bit different:
Identify the PersistentVolume backing the PVC, and from it the underlying EBS volume ID:
kubectl get pvc <your-pvc-name> -n <namespace> -o jsonpath='{.spec.volumeName}'
Use the AWS CLI or Console to snapshot that volume.
Create a new EBS volume from the snapshot with the desired size.
Create a new Kubernetes PersistentVolume (PV) object that references the new EBS volume:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: new-ebs-pv
spec:
  capacity:
    storage: 200Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: gp3
  awsElasticBlockStore:
    volumeID: <new-volume-id>
    fsType: ext4
Update or create a new PVC that binds to the new PV.
Restart your Pod with the new PVC. If using StatefulSets, update the PVC reference in the volumeClaimTemplates or scale down and recreate the StatefulSet accordingly.
Verify volume mounting and filesystem size from inside the Pod.
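A short sketch of the lookup and verification commands above; placeholders follow the same <angle-bracket> convention, and the volume ID path assumes the in-tree EBS plugin (CSI-provisioned volumes expose it under .spec.csi.volumeHandle instead):

# Resolve the PV backing the PVC, then the EBS volume ID behind it
PV_NAME=$(kubectl get pvc <your-pvc-name> -n <namespace> -o jsonpath='{.spec.volumeName}')
kubectl get pv "$PV_NAME" -o jsonpath='{.spec.awsElasticBlockStore.volumeID}'

# After the new PV and PVC are bound, confirm the mount and filesystem size inside the Pod
kubectl exec -n <namespace> <your-pod-name> -- df -h <mount-path>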
This method helps bypass the modification limit and provides greater control over how the new volume is integrated into your workloads.
Step 3: Plan and batch future modifications:
Instead of resizing or modifying one parameter at a time, combine multiple changes in a single modification request.
For example, increase the volume size and switch from gp2 to gp3 in the same operation (see the CLI sketch after this list).
Document your intended changes and validate them in a test environment before applying to production.
Limit on-the-fly experimentation, especially on live workloads.
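The batched change from the example above could be a single CLI call along these lines (IDs and values are illustrative):

# One modification request: bigger volume, new type, and tuned IOPS/throughput
aws ec2 modify-volume --volume-id vol-0abc123example \
  --size 500 --volume-type gp3 --iops 6000 --throughput 250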
Step 4: Use EBS Elastic Volumes (if applicable):
Elastic Volumes allow live resizing and configuration changes with minimal downtime.
Ensure your EC2 instance type supports this feature (most modern instances do).
Supported operating systems include Amazon Linux 2, Ubuntu 18.04+, and RHEL 7+.
To use it, modify the volume via the EC2 console or CLI; the changes are applied while the volume remains attached and in use.
Still subject to the 8 changes per 6-hour limit, but offers more flexibility during those changes.
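A sketch of applying a live modification and then watching it progress (the volume ID is a placeholder):

# Apply the change while the volume stays attached, then poll its state
aws ec2 modify-volume --volume-id vol-0abc123example --volume-type gp3 --throughput 250
aws ec2 describe-volumes-modifications --volume-ids vol-0abc123example \
  --query 'VolumesModifications[0].ModificationState'
# The state moves from "modifying" to "optimizing" to "completed"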
Step 5: Consider temporary scaling alternatives:
Add an additional EBS volume temporarily and mount it to your instance (see the sketch after this list).
Spread data across volumes using logical volume managers (LVM) or application-level sharding.
Move non-critical or infrequently accessed data to S3 or other storage tiers.
Use in-memory caching (e.g., Redis, Memcached) to reduce storage I/O demand.
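For the first option, temporarily attaching and mounting an extra volume might look like this (IDs, zone, device, and mount point are placeholders):

# Create and attach a temporary 200 GiB gp3 volume in the instance's availability zone
aws ec2 create-volume --volume-type gp3 --size 200 --availability-zone us-east-1a
aws ec2 attach-volume --volume-id vol-0temp123example --instance-id i-0123456789abcdef0 --device /dev/xvdg

# On the instance: format, mount, and move less critical data onto it
# (on Nitro instances the device will appear as /dev/nvme1n1 or similar)
sudo mkfs -t ext4 /dev/xvdg
sudo mkdir -p /mnt/overflow
sudo mount /dev/xvdg /mnt/overflow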
Step 6: Document your changes:
Keep a change log that includes timestamp, requested modification, reason, and outcome.
Track modification attempts to avoid hitting limits unintentionally.
Use tags on volumes to note when they were last modified or resized.
Consider creating an automated script or CloudWatch Log Insights query to detect frequent changes.
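For example, the tagging suggestion above could be a one-liner in your change process (the volume ID and tag values are illustrative):

# Record the last modification directly on the volume
aws ec2 create-tags --resources vol-0abc123example \
  --tags Key=LastModified,Value=2025-06-12 Key=ModifiedBy,Value=ops-team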
These steps can help you stay operational and plan effectively even under AWS's strict modification limits.
Final Thoughts
AWS EBS is a powerful and flexible storage option, but understanding its limitations is key to maintaining performance and availability. The 8-request-per-6-hour rule might seem strict, but it encourages thoughtful planning and resource management.
With Kloudfuse, you keep observability fully inside your VPC; no data leaves your environment. We don’t just show you what’s happening; we help you spot architectural weaknesses early, like service bottlenecks or scaling gaps, so you can stay ahead of production incidents.