Highly Available, Cost-Optimized Auto Scaling with Spot and On-Demand Instances

By Khushi Carpenter, Piyush Jalan / May 06, 2025

Introduction 

Amazon EC2 Spot Instances can cost up to 90% less than On-Demand Instances, but EC2 can reclaim them with only a two-minute interruption notice. A predictive auto-scaling strategy can deliver these savings without introducing downtime.

This approach monitors Spot market conditions, using the Spot price as a signal of interruption risk, and pre-emptively shifts the Auto Scaling group to On-Demand Instances when that risk rises. It provides:

  • Maximum cost savings
  • Zero downtime
  • Fully automated logic

Solution Overview

This solution uses a serverless AWS Lambda function to:

  • Monitor Spot instance pricing via ec2:DescribeSpotPriceHistory
  • Dynamically adjust Auto Scaling Group (ASG) instance mix
  • Optionally send alerts (e.g., Slack) when switching occurs

Key Architecture Features:

  • Mixed Instances Policy in the ASG
  • Lambda-based Spot price monitor
  • Scheduled execution using EventBridge
  • Slack notifications for visibility

Outcome: A self-adjusting compute layer that ensures uptime while optimizing cost.

Use Case Example

1. Create a Mixed Instances Auto Scaling Group (ASG)

Launch Template

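The following settings are essential when creating the launch template, because the Auto Scaling group's mixed instances policy, not the template, must control the instance types and the Spot/On-Demand split: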

  • For Instance Type, select ‘Don’t Include in the Template’
  • For Purchasing Type, select ‘Don’t Include in the Template’
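If you prefer to script this step, the sketch below builds the keyword arguments for the EC2 `create_launch_template` API with `InstanceType` and `InstanceMarketOptions` deliberately left out, so the Auto Scaling group decides both. The template name, AMI ID, and security group ID are hypothetical placeholders, and the helper function itself is illustrative rather than part of the original solution.

```python
from typing import Any, Dict, List


def build_launch_template_request(
    template_name: str,
    image_id: str,
    security_group_ids: List[str],
) -> Dict[str, Any]:
    """Builds create_launch_template kwargs for use with a mixed instances ASG.

    InstanceType and InstanceMarketOptions are intentionally omitted: the
    Auto Scaling group's mixed instances policy supplies the instance types
    and the Spot/On-Demand split instead.
    """
    return {
        "LaunchTemplateName": template_name,
        "LaunchTemplateData": {
            "ImageId": image_id,
            "SecurityGroupIds": security_group_ids,
        },
    }


# Hypothetical placeholder values
request = build_launch_template_request(
    "spot-watcher-template", "ami-0123456789abcdef0", ["sg-0123456789abcdef0"]
)
print(request["LaunchTemplateData"])
```

The resulting dictionary can be passed as `boto3.client("ec2").create_launch_template(**request)`; keeping it as plain data makes the "don't include" rule easy to check before anything is sent to AWS.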

Auto Scaling Group

To set up an Auto Scaling Group with a mix of Spot and On-Demand instances, use the following steps:

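  • In the AWS Console, open EC2 and select Auto Scaling Groups.
  • Choose Create Auto Scaling group, enter a name, and select the launch template created above.
  • Click Next and configure the Instance Options, choosing the instance types and the mix of Spot and On-Demand purchase options.
  • Set the On-Demand base capacity to 0 so that no instances are reserved as On-Demand and the whole group follows the Spot/On-Demand percentage split; this lets the workload run mostly on Spot Instances.
  • Click Next and configure the Network Options.
  • Click Next and configure the Scaling Options.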
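The console steps above can also be expressed as a single `create_auto_scaling_group` request. The sketch below only builds the request dictionary; the instance types, subnet IDs, and group sizes are hypothetical placeholders, and the 50% On-Demand starting point is an assumption chosen to match the split the Lambda function later reverts to.

```python
from typing import Any, Dict, List


def build_mixed_asg_request(
    asg_name: str,
    template_name: str,
    instance_types: List[str],
    subnet_ids: List[str],
    min_size: int = 1,
    max_size: int = 4,
    desired: int = 2,
    on_demand_base: int = 0,
    on_demand_percent: int = 50,
) -> Dict[str, Any]:
    """Builds create_auto_scaling_group kwargs for a Spot/On-Demand mix."""
    if not 0 <= on_demand_percent <= 100:
        raise ValueError("on_demand_percent must be between 0 and 100")
    return {
        "AutoScalingGroupName": asg_name,
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
        # Comma-separated subnet IDs, typically one per Availability Zone
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": template_name,
                    "Version": "$Latest",
                },
                # Each override is an instance type the ASG may launch
                "Overrides": [{"InstanceType": t} for t in instance_types],
            },
            "InstancesDistribution": {
                # 0 means no fixed On-Demand floor: every instance follows
                # the percentage split below
                "OnDemandBaseCapacity": on_demand_base,
                "OnDemandPercentageAboveBaseCapacity": on_demand_percent,
                "SpotAllocationStrategy": "capacity-optimized",
            },
        },
    }


# Hypothetical names and subnets
request = build_mixed_asg_request(
    "example-asg",
    "spot-watcher-template",
    ["t3.medium", "t3a.medium"],
    ["subnet-aaa", "subnet-bbb"],
)
print(request["MixedInstancesPolicy"]["InstancesDistribution"])
```

Pass the result to `boto3.client("autoscaling").create_auto_scaling_group(**request)`. Note that the Lambda function in this article watches the price of a single instance type, so listing several overrides widens the Spot pool without widening what is monitored.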

2. Build the Lambda Spot Watcher

A Python-based Lambda function performs the following:

  • Retrieves recent Spot pricing
  • Compares against a configurable PRICE_THRESHOLD
  • If pricing exceeds the threshold → switch to 100% On-Demand
  • If pricing is stable → revert to 50/50 Spot-On-Demand split
IAM Permissions
	
    	{
		  "Effect": "Allow",
		  "Action": [
		    "ec2:DescribeSpotPriceHistory",
		    "autoscaling:UpdateAutoScalingGroup",
		    "autoscaling:DescribeAutoScalingGroups"
		  ],
		  "Resource": "*"
		}
			
Lambda Logic (Highlights)

    if spot_price > threshold:
        update_asg(100)  # All On-Demand
    else:
        update_asg(50)   # Half Spot, Half On-Demand

Slack Webhook integration is optional for alerting.
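As written, the handler calls `update_asg` and posts an alert on every scheduled run, even when the split is already correct. One way to alert only when switching actually occurs is to read the current setting first with `describe_auto_scaling_groups`, which the IAM policy above already allows. The helpers below are a hypothetical sketch (the function names are not part of the original code) that works on the response dictionary, so it can be tested without AWS access.

```python
from typing import Any, Dict, Optional


def desired_on_demand_percent(spot_price: float, threshold: float) -> int:
    """Mirrors the article's rule: 100% On-Demand above the threshold, else 50%."""
    return 100 if spot_price > threshold else 50


def current_on_demand_percent(describe_response: Dict[str, Any]) -> Optional[int]:
    """Reads OnDemandPercentageAboveBaseCapacity from a DescribeAutoScalingGroups response."""
    groups = describe_response.get("AutoScalingGroups", [])
    if not groups:
        return None
    policy = groups[0].get("MixedInstancesPolicy")
    if policy is None:
        return None
    return policy.get("InstancesDistribution", {}).get(
        "OnDemandPercentageAboveBaseCapacity"
    )


def plan_change(
    spot_price: float, threshold: float, describe_response: Dict[str, Any]
) -> Optional[int]:
    """Returns the new percentage if a switch is needed, or None to skip update and alert."""
    desired = desired_on_demand_percent(spot_price, threshold)
    if current_on_demand_percent(describe_response) == desired:
        return None
    return desired


# Hand-written response shaped like DescribeAutoScalingGroups output
sample = {
    "AutoScalingGroups": [
        {
            "MixedInstancesPolicy": {
                "InstancesDistribution": {"OnDemandPercentageAboveBaseCapacity": 50}
            }
        }
    ]
}
print(plan_change(0.03, 0.05, sample))
print(plan_change(0.08, 0.05, sample))
```

Inside the handler, this would be fed with `asg.describe_auto_scaling_groups(AutoScalingGroupNames=[ASG_NAME])`, and `update_asg` and `send_alert` would run only when `plan_change` returns a number.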

Lambda Environment Variables

In the Lambda console, open the function and go to Configuration > Environment variables, then add:

- INSTANCE_TYPE = t3.medium

- ASG_NAME = your-asg-name

- PRICE_THRESHOLD = 0.05 (USD per hour, compared with the Spot price)

- AZ = us-east-1a

- SLACK_WEBHOOK = (optional)
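The Lambda code below reads these variables with `os.getenv` and silently falls back to defaults, so a typo in `ASG_NAME` would make it act on a placeholder group. A hypothetical stricter loader, sketched below under the assumption that failing fast is preferable, validates the values once at start-up. `SpotWatcherConfig` and `load_config` are illustrative names, not part of the original code.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SpotWatcherConfig:
    instance_type: str
    asg_name: str
    price_threshold: float  # USD per hour
    availability_zone: str
    slack_webhook: Optional[str]


def load_config(env: Mapping[str, str]) -> SpotWatcherConfig:
    """Reads the environment variables listed above, failing fast on bad values."""
    asg_name = env.get("ASG_NAME")
    if not asg_name:
        raise ValueError("ASG_NAME must be set")
    try:
        threshold = float(env.get("PRICE_THRESHOLD", "0.05"))
    except ValueError as exc:
        raise ValueError("PRICE_THRESHOLD must be a number, e.g. 0.05") from exc
    if threshold <= 0:
        raise ValueError("PRICE_THRESHOLD must be positive (USD per hour)")
    return SpotWatcherConfig(
        instance_type=env.get("INSTANCE_TYPE", "t3.medium"),
        asg_name=asg_name,
        price_threshold=threshold,
        availability_zone=env.get("AZ", "us-east-1a"),
        # Treat an empty string the same as "not configured"
        slack_webhook=env.get("SLACK_WEBHOOK") or None,
    )


config = load_config({"ASG_NAME": "your-asg-name", "PRICE_THRESHOLD": "0.05"})
print(config)
```

In the function itself, this would be called as `load_config(os.environ)`.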

Lambda Code

    import datetime
    import json
    import os
    import urllib.request

    import boto3

    # AWS clients (created once per execution environment and reused)
    ec2 = boto3.client('ec2')
    asg = boto3.client('autoscaling')

    # Configuration from environment variables
    INSTANCE_TYPE = os.getenv("INSTANCE_TYPE", "t3.medium")
    ASG_NAME = os.getenv("ASG_NAME", "example-asg")
    PRICE_THRESHOLD = float(os.getenv("PRICE_THRESHOLD", "0.05"))
    AVAILABILITY_ZONE = os.getenv("AZ", "us-east-1a")
    SLACK_WEBHOOK = os.getenv("SLACK_WEBHOOK")

    def get_latest_spot_price():
        """Fetches the latest Spot price for the specified instance type and AZ."""
        now = datetime.datetime.now(datetime.timezone.utc)
        # With a start time set, the API also returns the price in effect at
        # StartTime, so a quiet 15-minute window still yields a record.
        prices = ec2.describe_spot_price_history(
            InstanceTypes=[INSTANCE_TYPE],
            ProductDescriptions=['Linux/UNIX'],
            StartTime=now - datetime.timedelta(minutes=15),
            EndTime=now,
            AvailabilityZone=AVAILABILITY_ZONE,
        )
        history = prices['SpotPriceHistory']
        if not history:
            raise RuntimeError("No Spot price history available.")
        # Select the newest record explicitly instead of relying on result order
        latest = max(history, key=lambda record: record['Timestamp'])
        return float(latest['SpotPrice'])

    def update_asg(on_demand_percent):
        """Sets the share of On-Demand capacity above the On-Demand base."""
        return asg.update_auto_scaling_group(
            AutoScalingGroupName=ASG_NAME,
            MixedInstancesPolicy={
                'InstancesDistribution': {
                    'OnDemandPercentageAboveBaseCapacity': on_demand_percent,
                    'SpotAllocationStrategy': 'capacity-optimized'
                }
            }
        )

    def send_alert(message):
        """Sends an optional Slack alert.

        Uses urllib from the standard library because the `requests` package
        is not included in the AWS Lambda Python runtime.
        """
        if not SLACK_WEBHOOK:
            return
        try:
            request = urllib.request.Request(
                SLACK_WEBHOOK,
                data=json.dumps({"text": message}).encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(request, timeout=5):
                pass
        except Exception as e:
            print(f"[ERROR] Failed to send Slack alert: {e}")

    def lambda_handler(event, context):
        """Main Lambda entry point."""
        try:
            price = get_latest_spot_price()
            print(f"[INFO] Current Spot Price: ${price:.4f}")

            if price > PRICE_THRESHOLD:
                update_asg(100)  # All On-Demand
                msg = (
                    f"Spot price is ${price:.4f}, above threshold (${PRICE_THRESHOLD}). "
                    f"Switched ASG '{ASG_NAME}' to 100% On-Demand to preserve uptime."
                )
            else:
                update_asg(50)  # 50% Spot, 50% On-Demand
                msg = (
                    f"Spot price is ${price:.4f}, at or below threshold (${PRICE_THRESHOLD}). "
                    f"Using 50% Spot in ASG '{ASG_NAME}' for optimized savings."
                )

            print(msg)
            send_alert(msg)

        except Exception as e:
            print(f"[ERROR] {e}")
            send_alert(f"Lambda error in spot_monitor: {e}")

3. Schedule with EventBridge

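Run the Lambda function every 5 minutes so it responds promptly to price changes. To set up the EventBridge schedule, use the following AWS CLI commands. The last command grants EventBridge permission to invoke the function; without it the rule fires but the function is never called. The rule ARN it needs is printed by put-rule.

    aws events put-rule \
        --name SpotMonitorSchedule \
        --schedule-expression "rate(5 minutes)"

    aws events put-targets \
        --rule SpotMonitorSchedule \
        --targets "Id"="1","Arn"="{your-lambda-arn}"

    aws lambda add-permission \
        --function-name {your-lambda-name} \
        --statement-id SpotMonitorSchedule \
        --action lambda:InvokeFunction \
        --principal events.amazonaws.com \
        --source-arn {your-rule-arn}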
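The same schedule can be created from Python with boto3. This sketch also grants EventBridge permission to invoke the function through `add_permission`, which the rule needs in order to trigger it. The clients are passed in as parameters (an assumption made so the logic can be exercised with a stand-in), and the function name and ARNs in the example are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def schedule_lambda(
    events_client: Any,
    lambda_client: Any,
    function_name: str,
    function_arn: str,
    rule_name: str = "SpotMonitorSchedule",
    schedule: str = "rate(5 minutes)",
) -> str:
    """Creates the schedule rule, targets the function, and allows EventBridge to invoke it."""
    rule = events_client.put_rule(Name=rule_name, ScheduleExpression=schedule)
    events_client.put_targets(
        Rule=rule_name, Targets=[{"Id": "1", "Arn": function_arn}]
    )
    # Without this resource-based permission, EventBridge cannot invoke the function
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    return rule["RuleArn"]


class FakeClient:
    """Records calls in place of a real boto3 client (for local testing only)."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def put_rule(self, **kwargs: Any) -> Dict[str, str]:
        self.calls.append(("put_rule", kwargs))
        return {"RuleArn": "arn:aws:events:us-east-1:123456789012:rule/" + kwargs["Name"]}

    def put_targets(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(("put_targets", kwargs))
        return {}

    def add_permission(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(("add_permission", kwargs))
        return {}


events, lam = FakeClient(), FakeClient()
lambda_arn = "arn:aws:lambda:us-east-1:123456789012:function:spot-monitor"
rule_arn = schedule_lambda(events, lam, "spot-monitor", lambda_arn)
print(rule_arn)
```

Against AWS, pass `boto3.client("events")` and `boto3.client("lambda")` instead of the fakes. `add_permission` rejects a statement ID that already exists, so this is intended as a one-time setup step.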

4. Track Results in Cost Explorer

AWS Cost Explorer can visualize usage trends, highlighting reduced costs during Spot usage and stable operation during On-Demand transitions.
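The same breakdown can be pulled programmatically with the Cost Explorer `get_cost_and_usage` API, grouping EC2 compute cost by the `PURCHASE_TYPE` dimension so Spot and On-Demand spend appear side by side. In this sketch the SERVICE filter value is an assumption worth verifying in your own account, and the sample response is hand-written illustrative data, not real results.

```python
from collections import defaultdict
from typing import Any, Dict


def build_cost_request(start: str, end: str) -> Dict[str, Any]:
    """Daily EC2 compute cost grouped by purchase type (e.g. Spot vs On-Demand)."""
    return {
        "TimePeriod": {"Start": start, "End": end},  # End date is exclusive
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "Filter": {
            "Dimensions": {
                "Key": "SERVICE",
                "Values": ["Amazon Elastic Compute Cloud - Compute"],
            }
        },
        "GroupBy": [{"Type": "DIMENSION", "Key": "PURCHASE_TYPE"}],
    }


def total_by_group(response: Dict[str, Any]) -> Dict[str, float]:
    """Sums UnblendedCost per group key across all returned days."""
    totals: Dict[str, float] = defaultdict(float)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            key = group["Keys"][0]
            totals[key] += float(group["Metrics"]["UnblendedCost"]["Amount"])
    return dict(totals)


# Illustrative response shape with made-up amounts and group names
sample = {
    "ResultsByTime": [
        {"Groups": [
            {"Keys": ["Spot Instances"], "Metrics": {"UnblendedCost": {"Amount": "1.25", "Unit": "USD"}}},
            {"Keys": ["On Demand Instances"], "Metrics": {"UnblendedCost": {"Amount": "2.00", "Unit": "USD"}}},
        ]},
        {"Groups": [
            {"Keys": ["Spot Instances"], "Metrics": {"UnblendedCost": {"Amount": "0.75", "Unit": "USD"}}},
        ]},
    ]
}
print(total_by_group(sample))
```

With real data, call `boto3.client("ce").get_cost_and_usage(**build_cost_request("2025-05-01", "2025-05-08"))` and pass the response to `total_by_group`.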

Results

Expected outcomes from this solution:

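  • Significant cost savings: Spot capacity can cost up to 90% less than On-Demand, so with the default 50/50 split the overall EC2 saving is at most about half of that (roughly 45%) while Spot prices stay low.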
  • Maintained uptime through proactive switching to On-Demand before Spot instance interruptions.
  • Improved visibility via real-time alerts that notify of instance mix changes.
  • Automatic recovery by reverting to Spot usage when pricing stabilizes.
  • Reduced operational overhead with fully automated scaling decisions.
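To put the savings figures in context, the sketch below computes the blended hourly cost of the fleet for a given On-Demand percentage (with an On-Demand base of 0) and compares it with running everything On-Demand. The prices are hypothetical round numbers, not published AWS rates, and the function names are illustrative.

```python
def blended_hourly_cost(
    instances: int, on_demand_price: float, spot_price: float, on_demand_percent: int
) -> float:
    """Hourly cost of a fleet with On-Demand base 0 and the given On-Demand share.

    Simplification: the split is treated as exact, ignoring how the ASG
    rounds to whole instances.
    """
    on_demand_count = instances * on_demand_percent / 100
    spot_count = instances - on_demand_count
    return on_demand_count * on_demand_price + spot_count * spot_price


def savings_vs_all_on_demand(
    instances: int, on_demand_price: float, spot_price: float, on_demand_percent: int
) -> float:
    """Fractional saving compared with running every instance On-Demand."""
    baseline = instances * on_demand_price
    blended = blended_hourly_cost(instances, on_demand_price, spot_price, on_demand_percent)
    return 1 - blended / baseline


# Hypothetical prices: $0.10/hour On-Demand, $0.01/hour Spot (a 90% discount)
for percent in (0, 50, 100):
    saving = savings_vs_all_on_demand(4, 0.10, 0.01, percent)
    print(f"{percent}% On-Demand -> {saving:.0%} saving")
```

With these assumed prices, the 50/50 split saves about 45%, the full 90% is reached only with 0% On-Demand, and the 100% On-Demand fallback saves nothing while it is active.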

Conclusion

A predictive auto-scaling model eliminates the trade-off between savings and reliability.

  • Spot usage during stable market conditions.
  • On-Demand fallback during spikes.