Your CloudWatch Logs Are Silently Eating Your AWS Budget — Here’s How to Stop It

Intermediate

Introduction: The $4,200 Surprise on My AWS Bill

Last quarter, I was reviewing our AWS bill for what I thought was a “medium-sized” production account — a handful of Lambda functions, an EKS cluster, some EC2 instances. Total monthly spend was around $12,000. Reasonable. Then I drilled into the line items and saw it: CloudWatch Logs — $4,200/month. That’s 35% of the entire bill, spent on logs nobody was reading.

If you’ve never specifically checked your CloudWatch Logs costs, I guarantee you’re overpaying. Most engineers know ingestion is $0.50/GB but have no idea about storage, API calls, or the fact that every log group defaults to never expire. This article walks you through exactly how CloudWatch pricing works, how to find your most expensive log groups, and five concrete fixes I used to cut that $4,200 down to $740.

This is for anyone running production workloads on AWS — especially teams using Lambda, EKS, or VPC Flow Logs.

How CloudWatch Logs Pricing Actually Works

Most engineers only know one number: ingestion costs $0.50/GB. That’s just the beginning. Here’s the full pricing breakdown for us-east-1:

Component                              Price
Log Ingestion (Collection)             $0.50 per GB
Log Storage (Archival)                 $0.03 per GB per month
Logs Insights Queries                  $0.005 per GB scanned
Vended Logs (VPC Flow Logs, etc.)      $0.25 per GB (first 10 TB)

Here’s the trap: storage is cheap per GB, but it compounds monthly. If you ingest 100 GB/month with no retention policy, after 12 months you’re storing 1.2 TB and paying $36/month just in storage — on top of $50/month in ingestion. After 3 years? $108/month in storage for logs you’ll never touch. And vended logs (VPC Flow Logs, Route 53 query logs) have their own pricing tier that catches people off guard.

Why Costs Explode Silently

Here’s a breakdown of the biggest line items from the account that shocked me:

Source                               Monthly Volume        Monthly Cost
Lambda functions (47 functions)      1,800 GB ingested     $900
EKS control plane logs               950 GB ingested       $475
VPC Flow Logs (3 VPCs)               3,200 GB ingested     $800 (vended rate)
Application logs (ECS services)      600 GB ingested       $300
Accumulated storage (2+ years)       18 TB stored          $540
Logs Insights queries (team of 12)   ~37 TB scanned        $185
Total                                                      ~$3,200

The biggest offenders: Lambda functions logging full request/response bodies on every invocation, EKS control plane with all five log types enabled (api, audit, authenticator, controllerManager, scheduler), and VPC Flow Logs shipping every ACCEPT and REJECT to CloudWatch instead of S3. Nobody configured retention on any of these — logs from two years ago were still sitting there, accumulating storage charges.

Finding Your Most Expensive Log Groups

Before fixing anything, you need visibility. Use Cost Explorer first: go to AWS Cost Explorer → Filter by Service: CloudWatch → Group by: Usage Type. Look for usage types containing DataProcessing-Bytes (ingestion) and TimedStorage-ByteHrs (storage).

Then find the specific log groups burning money. This CLI command lists all log groups sorted by stored bytes:

aws logs describe-log-groups \
  --query 'logGroups[*].[logGroupName,storedBytes,retentionInDays]' \
  --output text | sort -k2 -rn | head -20

For a cleaner view with human-readable sizes:

aws logs describe-log-groups \
  --query 'logGroups[*].[logGroupName,storedBytes,retentionInDays]' \
  --output json | jq -r '.[] | "\(.[0])\t\((.[1] // 0) / 1073741824 * 100 | round / 100) GB\t\(.[2] // "Never expires")"' | sort -t$'\t' -k2 -rn | head -20

Expected output looks like:

/aws/vpc-flow-logs/vpc-0a1b2c   1203.87 GB   Never expires
/aws/eks/prod-cluster/cluster    892.34 GB   Never expires
/aws/lambda/order-processor      445.12 GB   Never expires
/ecs/prod-api-service            234.56 GB   Never expires

Every “Never expires” is money burning. Let’s fix that.

Fix 1: Set Log Retention Policies in Bulk

The single highest-impact change. Most log groups have no retention policy, meaning logs are kept forever. Set retention to match how long you actually query logs — for most teams, that’s 14 to 30 days for application logs and 90 days for audit logs.

Set retention on a single log group:

aws logs put-retention-policy \
  --log-group-name "/aws/lambda/order-processor" \
  --retention-in-days 14

Set 30-day retention on ALL log groups that currently have no retention policy:

aws logs describe-log-groups \
  --query 'logGroups[?!retentionInDays].logGroupName' \
  --output text | tr '\t' '\n' | while read -r group; do
  echo "Setting 30-day retention on: $group"
  aws logs put-retention-policy \
    --log-group-name "$group" \
    --retention-in-days 30
done

Valid retention values: 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1096, 1827, 2192, 2557, 2922, 3288, 3653. For Lambda and EKS logs, 14 days is usually plenty. For compliance-related logs, export to S3 first (Fix 5), then set retention.
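Because put-retention-policy rejects anything outside that list, it’s worth guarding bulk scripts. A small sketch, where is_valid_retention is a hypothetical helper of my own (not part of the AWS CLI) that checks a value before you call the API:

```shell
# is_valid_retention: succeed only if $1 is a retention period CloudWatch
# accepts. The list mirrors the valid values for put-retention-policy.
# (Hypothetical convenience helper, not an AWS CLI feature.)
is_valid_retention() {
  for v in 1 3 5 7 14 30 60 90 120 150 180 365 400 545 731 1096 \
           1827 2192 2557 2922 3288 3653; do
    [ "$1" = "$v" ] && return 0
  done
  return 1
}

# Example guard before the API call:
days=14
if is_valid_retention "$days"; then
  echo "ok: $days is a valid retention period"
else
  echo "error: $days is not accepted by put-retention-policy" >&2
fi
```

Passing an invalid value like 15 to the real API fails anyway, but failing fast in your own loop gives a clearer error than a mid-run CLI exception.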

Fix 2: Filter and Reduce What Gets Logged

Stop logging noise at the source. For Lambda, avoid logging full event payloads — use Lambda Advanced Logging Controls to set log level at the function level:

aws lambda update-function-configuration \
  --function-name order-processor \
  --logging-config '{"LogFormat":"JSON","ApplicationLogLevel":"WARN","SystemLogLevel":"WARN"}'

For EKS, you probably don’t need all five control plane log types. Disable the noisy ones:

aws eks update-cluster-config \
  --name prod-cluster \
  --logging '{"clusterLogging":[{"types":["api","authenticator"],"enabled":true},{"types":["audit","controllerManager","scheduler"],"enabled":false}]}'

The audit log type alone can generate hundreds of GBs in busy clusters. Keep api and authenticator for security, disable the rest unless you’re actively debugging.

Fix 3: Route High-Volume Logs to S3 Instead of CloudWatch

VPC Flow Logs are the biggest offender here. Sending them to CloudWatch costs $0.25/GB (vended log rate). Sending them directly to S3 costs $0.25/GB for the first 10 TB as well — but storage in S3 is $0.023/GB/month versus $0.03/GB in CloudWatch, and you can query them with Athena at $5/TB scanned (with columnar format optimization bringing real costs much lower).
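To see what that difference means at the volumes from the earlier breakdown, here’s a rough sketch comparing one month of 3,200 GB of flow logs in each destination. The ~10x Parquet compression ratio is an assumption (it varies with traffic patterns), and only the first month of storage is counted:

```shell
# First-month cost of 3,200 GB of VPC Flow Logs: CloudWatch destination
# vs. S3 with Parquet. Delivery is $0.25/GB either way; the gap is storage,
# and it widens every month the CloudWatch copy is retained.
awk 'BEGIN {
  gb = 3200
  cw = gb * 0.25 + gb * 0.03            # vended delivery + CloudWatch storage
  s3 = gb * 0.25 + (gb / 10) * 0.023    # vended delivery + S3 Standard, ~10x-compressed Parquet
  printf "CloudWatch destination: $%.2f\n", cw
  printf "S3 + Parquet:           $%.2f\n", s3
  printf "First-month savings:    $%.2f\n", cw - s3
}'
```

The gap compounds: the CloudWatch copy keeps accruing $0.03/GB every month it’s retained, while Athena queries run against the compressed Parquet instead of raw text.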

Create VPC Flow Logs targeting S3 instead of CloudWatch:

aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-0a1b2c3d4e5f \
  --traffic-type ALL \
  --log-destination-type s3 \
  --log-destination "arn:aws:s3:::my-flow-logs-bucket/vpc-flow-logs/" \
  --max-aggregation-interval 600 \
  --destination-options '{"FileFormat":"parquet","HiveCompatiblePartitions":true,"PerHourPartition":true}'

Using Parquet format is critical — it compresses ~10x compared to plain text, and Athena scans far less data. A 3.2 TB/month VPC Flow Log in CloudWatch becomes roughly 320 GB in Parquet on S3, queryable with Athena for pennies.

Fix 4: Use CloudWatch Logs Insights Efficiently

Logs Insights charges $0.005 per GB scanned. Sounds cheap until your team of 12 runs broad queries across multi-TB log groups. Three rules:

1. Always constrain the time range. A 1-hour window on a 100 GB/day log group scans ~4 GB. A 7-day window scans 700 GB ($3.50 per query).

2. Use filter early in your query. Logs Insights processes commands in order, so filtering first means downstream commands handle fewer events. Note that billed GB scanned is determined by the time range and log groups you query, so early filters make queries faster rather than cheaper; rules 1 and 3 are what actually shrink the bill.

# Slower: retrieves fields from every event, then filters
fields @timestamp, @message | filter @message like /ERROR/

# Faster: filter first so later commands process fewer events
filter @message like /ERROR/ | fields @timestamp, @message

3. Query specific log groups, not wildcards. Querying /aws/lambda/* across 47 functions multiplies your scanned data. Target the specific function you’re debugging.
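The arithmetic in rule 1 generalizes into a quick estimator you can run before firing off a broad query. This is a convenience sketch of my own, not an AWS tool, and it assumes ingestion is spread evenly across the day:

```shell
# Estimate the cost of a Logs Insights query at $0.005 per GB scanned,
# from the log group's average daily volume and the query window in hours.
insights_query_cost() {
  gb_per_day=$1
  window_hours=$2
  awk -v g="$gb_per_day" -v h="$window_hours" \
      'BEGIN { printf "$%.2f\n", g / 24 * h * 0.005 }'
}

insights_query_cost 100 1     # 1-hour window on a 100 GB/day group -> $0.02
insights_query_cost 100 168   # 7-day window -> $3.50
```

Multiply by the number of log groups in the query and the number of times your team reruns it during an incident, and the "cheap" $0.005/GB rate gets real fast.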

Logs Insights query spend itself shows up in Cost Explorer (look for usage types containing DataScanned). To see which log groups are big enough to make broad queries expensive, check their ingestion volume with the IncomingBytes metric:

aws cloudwatch get-metric-statistics \
  --namespace "AWS/Logs" \
  --metric-name "IncomingBytes" \
  --start-time "$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --period 2592000 \
  --statistics Sum \
  --dimensions Name=LogGroupName,Value=/aws/lambda/order-processor

Fix 5: Export Old Logs to S3 Glacier

For compliance requirements where you must retain logs for years, export to S3 and transition to Glacier. First, create an export task:

aws logs create-export-task \
  --task-name "export-old-lambda-logs" \
  --log-group-name "/aws/lambda/order-processor" \
  --from 1672531200000 \
  --to 1688169600000 \
  --destination "my-log-archive-bucket" \
  --destination-prefix "cloudwatch-exports/lambda/order-processor"
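The --from and --to values are Unix epoch timestamps in milliseconds, which are awkward to write by hand. With GNU date (Linux; BSD/macOS date uses -j -f instead, so this assumes a GNU environment) you can derive them:

```shell
# Convert calendar dates (UTC) into the epoch-millisecond values
# that create-export-task expects for --from and --to.
from_ms=$(( $(date -u -d '2023-01-01T00:00:00Z' +%s) * 1000 ))
to_ms=$(( $(date -u -d '2023-07-01T00:00:00Z' +%s) * 1000 ))
echo "--from $from_ms --to $to_ms"
# prints: --from 1672531200000 --to 1688169600000
```

Those are the same two values used in the export task above, covering January through June 2023.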

Then set an S3 lifecycle rule to transition to Glacier:

aws s3api put-bucket-lifecycle-configuration \
  --bucket my-log-archive-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "ArchiveOldLogs",
      "Filter": {"Prefix": "cloudwatch-exports/"},
      "Status": "Enabled",
      "Transitions": [
        {"Days": 30, "StorageClass": "GLACIER"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
      ]
    }]
  }'

S3 Glacier storage costs $0.004/GB/month — that’s 7.5x cheaper than CloudWatch storage. Glacier Deep Archive drops to $0.00099/GB/month. After exporting, set retention on the CloudWatch log group so the originals get cleaned up.

Note: create-export-task only runs one task at a time per account. For bulk exports, script it with waits between tasks, or use CloudWatch Logs subscription filters with a Kinesis Firehose delivery stream for continuous export.

Common Mistakes Engineers Make

1. Leaving Lambda log retention at “Never Expire.” Every Lambda function auto-creates a /aws/lambda/<function-name> log group with no retention. If you deploy with CloudFormation or Terraform, explicitly set retention in your IaC templates — don’t rely on manual fixes.

2. Enabling all EKS control plane log types “just in case.” The audit and scheduler log types are extremely verbose. One busy EKS cluster can generate 30+ GB/day from audit logs alone — that’s $450/month in ingestion for one cluster.

3. Not using Cost Allocation Tags on log groups. Without tags, you can’t attribute CloudWatch costs to specific teams or services in Cost Explorer. Tag your log groups.

4. Querying CloudWatch Logs Insights with wide time ranges during incidents. Panic-querying 30 days of logs across multiple groups can easily cost $50-100 per query session. Narrow your time range first.

5. Ignoring vended log pricing. VPC Flow Logs, Route 53 resolver query logs, and API Gateway access logs use vended pricing. They’re cheaper per GB on ingestion ($0.25 vs $0.50), but the volume is typically massive — making them the top cost driver.

Conclusion

CloudWatch Logs is one of the most overlooked cost drivers in AWS, primarily because teams enable logging everywhere (good practice) but never configure retention or optimize delivery (expensive practice). The five fixes above took me a single afternoon to implement and dropped our monthly CloudWatch bill from $4,200 to $740.

  • Set retention policies on every log group — the default “never expire” is the single biggest waste.
  • Route high-volume logs (VPC Flow Logs, EKS audit) to S3 in Parquet format — cheaper storage, cheaper queries.
  • Use Lambda Advanced Logging Controls to reduce log verbosity at the source.
  • Constrain Logs Insights queries by time range and use early filters to minimize GB scanned.
  • Audit your log groups monthly — run the CLI commands above and make it part of your cost review.

Found this helpful? Share it with your team. For more practical AWS and DevOps guides, visit riseofcloud.com.

