Highly Available, Cost-Optimized Auto Scaling with Spot and On-Demand Instances

By Khushi Carpenter, Piyush Jalan / May 06, 2025

Contents

Introduction
Solution Overview
Use Case Example

1. Create a Mixed Instances Auto Scaling Group (ASG)
2. Build the Lambda Spot Watcher
3. Schedule with EventBridge
4. Track Results in Cost Explorer

Results
Conclusion

Introduction

AWS EC2 Spot Instances can reduce cloud bills by up to 90%, though this comes with the risk of potential interruptions. A predictive auto-scaling strategy can deliver these savings without introducing downtime.

This approach monitors Spot market conditions and pre-emptively switches to On-Demand EC2 instances when interruption risk rises, providing:

Maximum cost savings
Zero downtime
Fully automated logic

Solution Overview

This solution uses a serverless AWS Lambda function to:

Monitor Spot instance pricing via ec2:DescribeSpotPriceHistory
Dynamically adjust Auto Scaling Group (ASG) instance mix
Optionally send alerts (e.g., Slack) when switching occurs

Key Architecture Features:

Mixed Instances Policy in the ASG
Lambda-based Spot price monitor
Scheduled execution using EventBridge
Slack notifications for visibility

Outcome: A self-adjusting compute layer that ensures uptime while optimizing cost.

Use Case Example

1. Create a Mixed Instances Auto Scaling Group (ASG)

Launch Template

The following configurations are essential when creating the launch template:

For Instance Type, select ‘Don’t Include in the Template’
For Purchasing Type, select ‘Don’t Include in the Template’

Auto Scaling Group

To set up an Auto Scaling Group with a mix of Spot and On-Demand instances, use the following steps:

Go to the AWS Console, search for EC2, and select Auto Scaling Groups.
Select Create and choose the options for the group.

Click Next and configure the Instance Options.

Click Next and configure the Network Options.

Click Next and configure the Scaling Options.
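
The same group can also be created programmatically. The following is a minimal sketch, not the authors' setup: the function names, group sizes, subnet IDs, and instance types are placeholder assumptions, and the launch template is assumed to already exist. It builds the request for boto3's create_auto_scaling_group with a Mixed Instances Policy that starts at a 50/50 split.

```python
def build_mixed_asg_request(asg_name, launch_template_name, subnet_ids,
                            instance_types, on_demand_percent=50):
    """Build keyword arguments for autoscaling.create_auto_scaling_group."""
    return {
        "AutoScalingGroupName": asg_name,
        # Sizes are illustrative placeholders; adjust for the workload.
        "MinSize": 2,
        "MaxSize": 6,
        "DesiredCapacity": 2,
        # Subnets are passed as one comma-separated string.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template_name,
                    "Version": "$Latest",
                },
                # Instance types are set here because the launch template
                # deliberately leaves them out.
                "Overrides": [{"InstanceType": t} for t in instance_types],
            },
            "InstancesDistribution": {
                "OnDemandBaseCapacity": 0,
                "OnDemandPercentageAboveBaseCapacity": on_demand_percent,
                "SpotAllocationStrategy": "capacity-optimized",
            },
        },
    }


def create_mixed_asg(**kwargs):
    """Send the request to AWS (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder can be used without AWS access

    client = boto3.client("autoscaling")
    return client.create_auto_scaling_group(**build_mixed_asg_request(**kwargs))
```

Listing more than one instance type in the overrides gives the capacity-optimized strategy more Spot pools to choose from.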

2. Build the Lambda Spot Watcher

A Python-based Lambda function performs the following:

Retrieves recent Spot pricing
Compares against a configurable PRICE_THRESHOLD
If pricing exceeds the threshold → switch to 100% On-Demand
If pricing is stable → revert to 50/50 Spot-On-Demand split

IAM Permissions

	
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeSpotPriceHistory",
        "autoscaling:UpdateAutoScalingGroup",
        "autoscaling:DescribeAutoScalingGroups"
      ],
      "Resource": "*"
    }

Lambda Logic (Highlights)

	
    if spot_price > threshold:
        update_asg(100)  # All On-Demand
    else:
        update_asg(50)   # Half Spot, Half On-Demand

Slack Webhook integration is optional for alerting.

Lambda Environment Variables

In the Lambda console, open the function, go to Configuration, then Environment variables, and set:

- INSTANCE_TYPE = t3.medium

- ASG_NAME = your-asg-name

- PRICE_THRESHOLD = 0.05

- AZ = us-east-1a

- SLACK_WEBHOOK = (optional)

Lambda Code


import boto3
import datetime
import os
import requests  # not bundled with the Lambda Python runtime; package it with the function or attach a layer

# AWS clients
ec2 = boto3.client('ec2')
asg = boto3.client('autoscaling')

# Configuration from environment variables
INSTANCE_TYPE = os.getenv("INSTANCE_TYPE", "t3.medium")
ASG_NAME = os.getenv("ASG_NAME", "example-asg")
PRICE_THRESHOLD = float(os.getenv("PRICE_THRESHOLD", "0.05"))
AVAILABILITY_ZONE = os.getenv("AZ", "us-east-1a")
SLACK_WEBHOOK = os.getenv("SLACK_WEBHOOK")

def get_latest_spot_price():
    """Fetches the latest Spot price for the specified instance type and AZ."""
    # Use a timezone-aware timestamp; datetime.utcnow() is deprecated in newer Python versions.
    now = datetime.datetime.now(datetime.timezone.utc)
    prices = ec2.describe_spot_price_history(
        InstanceTypes=[INSTANCE_TYPE],
        ProductDescriptions=['Linux/UNIX'],
        StartTime=now - datetime.timedelta(minutes=15),
        EndTime=now,
        AvailabilityZone=AVAILABILITY_ZONE
    )
    history = prices['SpotPriceHistory']
    if not history:
        raise RuntimeError("No Spot price history available.")
    # Pick the newest entry explicitly rather than relying on response order.
    latest = max(history, key=lambda entry: entry['Timestamp'])
    return float(latest['SpotPrice'])

def update_asg(on_demand_percent):
    """Updates the ASG with a new On-Demand percentage."""
    response = asg.update_auto_scaling_group(
        AutoScalingGroupName=ASG_NAME,
        MixedInstancesPolicy={
            'InstancesDistribution': {
                'OnDemandPercentageAboveBaseCapacity': on_demand_percent,
                'SpotAllocationStrategy': 'capacity-optimized'
            }
        }
    )
    return response

def send_alert(message):
    """Sends an optional Slack alert."""
    if SLACK_WEBHOOK:
        try:
            # A timeout keeps a slow webhook from stalling the function.
            requests.post(SLACK_WEBHOOK, json={"text": message}, timeout=5)
        except Exception as e:
            print(f"[ERROR] Failed to send Slack alert: {e}")

def lambda_handler(event, context):
    """Main Lambda entry point."""
    try:
        price = get_latest_spot_price()
        print(f"[INFO] Current Spot Price: ${price:.4f}")

        if price > PRICE_THRESHOLD:
            update_asg(100)  # All On-Demand
            msg = (
                f"Spot price is ${price:.4f}, above threshold (${PRICE_THRESHOLD}). "
                f"Switched ASG '{ASG_NAME}' to 100% On-Demand to preserve uptime."
            )
        else:
            update_asg(50)  # 50% Spot, 50% On-Demand
            msg = (
                f"Spot price is ${price:.4f}, at or below threshold (${PRICE_THRESHOLD}). "
                f"Using 50% Spot in ASG '{ASG_NAME}' for optimized savings."
            )

        print(msg)
        send_alert(msg)

    except Exception as e:
        error_msg = f"[ERROR] {str(e)}"
        print(error_msg)
        send_alert(f"Lambda error in spot_monitor: {str(e)}")
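
The IAM statement above grants autoscaling:DescribeAutoScalingGroups, but the handler never calls it, so every 5-minute run sends an update and a Slack message even when nothing changes. One possible use of that permission is to read the current split first and update only on a change. This is a sketch under assumptions: current_on_demand_percent and set_on_demand_percent are hypothetical helpers, not part of the original function, and the client is passed in as a parameter.

```python
def current_on_demand_percent(asg_client, asg_name):
    """Return the group's current On-Demand percentage, or None if unset."""
    resp = asg_client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )
    groups = resp["AutoScalingGroups"]
    if not groups:
        raise RuntimeError(f"Auto Scaling group '{asg_name}' not found.")
    policy = groups[0].get("MixedInstancesPolicy", {})
    distribution = policy.get("InstancesDistribution", {})
    return distribution.get("OnDemandPercentageAboveBaseCapacity")


def set_on_demand_percent(asg_client, asg_name, target_percent):
    """Update the group only when the split changes.

    Returns True if an update was sent, False if the group already matched.
    """
    if current_on_demand_percent(asg_client, asg_name) == target_percent:
        return False
    asg_client.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MixedInstancesPolicy={
            "InstancesDistribution": {
                "OnDemandPercentageAboveBaseCapacity": target_percent,
                "SpotAllocationStrategy": "capacity-optimized",
            }
        },
    )
    return True
```

In the handler, the return value could decide whether send_alert is called, so that Slack only receives a message when the instance mix actually switches.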

3. Schedule with EventBridge

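Run the Lambda function on a 5-minute interval for continuous responsiveness. To set up the EventBridge schedule, use the following CLI commands:

aws events put-rule \
    --name SpotMonitorSchedule \
    --schedule-expression "rate(5 minutes)"
aws events put-targets \
    --rule SpotMonitorSchedule \
    --targets "Id"="1","Arn"="{your-lambda-arn}"

EventBridge also needs permission to invoke the function, which can be granted with aws lambda add-permission (action lambda:InvokeFunction, principal events.amazonaws.com).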
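
For teams that prefer to script this step in Python, the schedule can be sketched with boto3 as well. The helper name schedule_spot_monitor and the statement ID are illustrative assumptions; the clients are passed in as parameters. The sketch also adds the resource-based permission that lets EventBridge invoke the function.

```python
def schedule_spot_monitor(events_client, lambda_client, function_arn,
                          rule_name="SpotMonitorSchedule",
                          rate="rate(5 minutes)"):
    """Create the schedule rule, allow it to invoke the Lambda, attach the target."""
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=rate,
        State="ENABLED",
    )
    # Without this permission the rule fires but the invocation is denied.
    # Note: add_permission fails if the same StatementId already exists.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```

With real clients this would be called as schedule_spot_monitor(boto3.client("events"), boto3.client("lambda"), "{your-lambda-arn}").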

4. Track Results in Cost Explorer

AWS Cost Explorer can visualize usage trends, highlighting reduced costs during Spot usage and stable operation during On-Demand transitions.
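
The same view can be pulled through the Cost Explorer API. The following is a rough sketch, assuming Cost Explorer is enabled on the account and the caller has ce:GetCostAndUsage; the function names and the 30-day window are illustrative choices. It builds a get_cost_and_usage request that groups daily EC2 compute cost by purchase option.

```python
import datetime


def build_purchase_type_query(days=30, today=None):
    """Build keyword arguments for ce.get_cost_and_usage."""
    today = today or datetime.date.today()
    start = today - datetime.timedelta(days=days)
    return {
        # The End date is exclusive in the Cost Explorer API.
        "TimePeriod": {"Start": start.isoformat(), "End": today.isoformat()},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "Filter": {
            "Dimensions": {
                "Key": "SERVICE",
                "Values": ["Amazon Elastic Compute Cloud - Compute"],
            }
        },
        # One cost series per purchase option (for example Spot vs. On-Demand).
        "GroupBy": [{"Type": "DIMENSION", "Key": "PURCHASE_TYPE"}],
    }


def fetch_daily_costs(**kwargs):
    """Run the query (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder works without AWS access

    ce = boto3.client("ce")
    return ce.get_cost_and_usage(**build_purchase_type_query(**kwargs))
```

Comparing the per-day groups before and after enabling the watcher gives a rough picture of how much spend moved from On-Demand to Spot.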

Results

Expected outcomes from this solution:

Significant cost savings, with up to 90% reduction in EC2 spend during Spot usage.
Maintained uptime through proactive switching to On-Demand before Spot instance interruptions.
Improved visibility via real-time alerts that notify of instance mix changes.
Automatic recovery by reverting to Spot usage when pricing stabilizes.
Reduced operational overhead with fully automated scaling decisions.
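
To put the savings figure in context, the effect of the instance mix on the overall bill can be estimated with simple arithmetic. This is a back-of-the-envelope sketch under stated assumptions: a flat Spot discount, an On-Demand base capacity of zero, and a hypothetical helper named blended_savings.

```python
def blended_savings(spot_discount, on_demand_percent):
    """Estimate the fractional saving versus running 100% On-Demand.

    spot_discount: Spot discount relative to On-Demand (0.9 means 90% cheaper).
    on_demand_percent: value of OnDemandPercentageAboveBaseCapacity (0-100).
    """
    spot_share = (100 - on_demand_percent) / 100
    # Only the Spot share of the fleet benefits from the discount.
    return spot_share * spot_discount


# 50/50 split at a 90% Spot discount: about 45% off the all-On-Demand bill.
half_and_half = blended_savings(0.9, 50)
# 100% On-Demand (the fallback state): no saving.
fallback = blended_savings(0.9, 100)
```

Under these assumptions, the 90% figure applies to the Spot portion of the fleet, while the 50/50 split yields roughly half of that across the whole group.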

Conclusion

A predictive auto-scaling model eliminates the trade-off between savings and reliability.

Spot usage during stable market conditions.
On-Demand fallback during spikes.