Enhanced resiliency: Backup restore testing plan in action

By Saumil Shah, Piyush Jalan / Jun 06, 2024

Introduction:

In our rapidly evolving technological landscape, the rise of cyber threats underscores the heightened responsibility to protect our data. As technology advances, so too do the methods of potential attackers, necessitating a proactive approach to safeguarding critical information. While creating immutable backups is a fundamental step in data protection, the true test of resilience lies in ensuring their reliability through rigorous testing. In this blog, we explore the vital importance of implementing a thorough backup restore testing plan as a proactive defense against data loss and cyber-attacks.

An AWS Backup restore testing plan, combined with custom backup validation tools, helps confirm that backups are safe and healthy to use. The custom validation could involve installing an agent, running a custom script, or running a manual Amazon Inspector scan to check the restored resource against your security expectations.

Architecture Overview:


This architecture uses an AWS Backup restore testing plan to restore an EC2 instance from a backup vault, then uses ‘SSM Run Command’ to run a script from an S3 bucket on the restored instance. The script scans the restored instance for ransomware files with the ‘.ryk’ extension.

Step-by-step guide:

1. Create a restore testing plan

2. Assign resources to the restore testing plan
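
Steps 1 and 2 can be completed in the AWS Backup console. If you would rather script them, the following is a minimal sketch using the boto3 ‘backup’ client. The plan name, selection name, schedule, selection window, wildcard vault and resource selections, and role ARN are all placeholder assumptions for illustration, not values from this walkthrough.

```python
def build_plan_request(plan_name, schedule):
    """Build the request for create_restore_testing_plan (illustrative values)."""
    return {
        "RestoreTestingPlan": {
            "RestoreTestingPlanName": plan_name,
            "ScheduleExpression": schedule,
            "StartWindowHours": 1,
            "RecoveryPointSelection": {
                "Algorithm": "LATEST_WITHIN_WINDOW",
                "IncludeVaults": ["*"],            # assumption: consider every vault
                "RecoveryPointTypes": ["SNAPSHOT"],
                "SelectionWindowDays": 7,          # assumption: last 7 days
            },
        }
    }


def build_selection_request(plan_name, selection_name, role_arn):
    """Build the request for create_restore_testing_selection (illustrative values)."""
    return {
        "RestoreTestingPlanName": plan_name,
        "RestoreTestingSelection": {
            "RestoreTestingSelectionName": selection_name,
            "ProtectedResourceType": "EC2",
            "ProtectedResourceArns": ["*"],        # assumption: all protected EC2 resources
            "IamRoleArn": role_arn,                # role AWS Backup assumes for the restore
            # Keep the restored instance around long enough for the scan to report back.
            "ValidationWindowHours": 2,
        },
    }


def create_plan_and_selection(plan_name, selection_name, role_arn, schedule):
    """Create the plan, then assign resources to it."""
    import boto3  # imported here so the builders above can be used without AWS access

    client = boto3.client("backup")
    client.create_restore_testing_plan(**build_plan_request(plan_name, schedule))
    client.create_restore_testing_selection(
        **build_selection_request(plan_name, selection_name, role_arn)
    )
```

A call such as create_plan_and_selection("ec2_restore_test", "ec2_selection", "<your-restore-role-arn>", "cron(0 5 ? * * *)") would schedule a daily test. The validation window matters here: it needs to be long enough for the scan script to run and report its result before the restored instance is cleaned up.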

3. Create an S3 bucket

  • Create an S3 bucket to host a custom script that scans the restored EC2 machine for ransomware or performs any other custom security check.
  • SSM is used to deploy the agent or script to the restored EC2 machine and run the necessary scans and checks.

For instance:

  • It could be an agent installation that performs security checks.
  • It could be a custom Python script that scans the EC2 machine for specific file types, such as ‘.ryk’ files.
  • It could be anything else that helps you validate that the restored EC2 machine is healthy.
  • The script or agent can also send a scan report and update the restore testing result in the AWS Backup console.

Below is a custom Python script that looks for ‘.ryk’ encrypted ransomware files in the /data folder of the EC2 machine, sends the results to the respective owner via SES, and updates the health of the backup in the restore testing section of AWS Backup.

check_ransomware.py:

import os
import boto3
import sys
 
# AWS SES Configuration
AWS_REGION = 'primary-region'
SENDER_EMAIL = 'source-email'
RECIPIENT_EMAIL = 'destination-email'
 
def send_email(subject, body):
    client = boto3.client('ses', region_name=AWS_REGION)
    response = client.send_email(
        Destination={'ToAddresses': [RECIPIENT_EMAIL]},
        Message={
            'Body': {'Text': {'Charset': 'UTF-8', 'Data': body}},
            'Subject': {'Charset': 'UTF-8', 'Data': subject},
        },
        Source=SENDER_EMAIL
    )
    print("Email sent successfully")
 
def scan_for_ryk_files(directory):
    found_ryk_files = False
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith('.ryk'):
                found_ryk_files = True
                break
        if found_ryk_files:
            break
    return found_ryk_files
 
def send_restore_validation_result(restore_job_id, validation_status):
    client = boto3.client('backup', region_name=AWS_REGION)
    response = client.put_restore_validation_result(
        RestoreJobId=restore_job_id,
        ValidationStatus=validation_status,
        ValidationStatusMessage='Ryk files found' if validation_status == 'FAILED' else 'No Ryk files found'
    )
    print("Restore validation result sent successfully")
 
def main():
    # Check if the Restore Job ID is provided as a command-line argument
    if len(sys.argv) < 2:
        print("Usage: python3 check_ransomware.py <RestoreJobId>")
        return
 
    restore_job_id = sys.argv[1]  # Retrieve Restore Job ID from command-line argument
    
    # Perform the scan and send email based on the results
    data_folder = '/data'
    if scan_for_ryk_files(data_folder):
        subject = "Virus Alert: Infected Files Detected"
        body = "The system has detected files with the '.ryk' extension in the /data folder. Please take necessary actions immediately."
        send_restore_validation_result(restore_job_id, 'FAILED')
    else:
        subject = "System Clean: No Infected Files Detected"
        body = "The system has scanned the /data folder and found no files with the '.ryk' extension. The system is clean."
        send_restore_validation_result(restore_job_id, 'SUCCESSFUL')
    send_email(subject, body)
 
if __name__ == "__main__":
    main()
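
The script above stops at the first ‘.ryk’ file and reports only a yes/no result. If you want the email to list which files were found, or to check more than one extension, a variation like the following could be used instead. This is a sketch only; the function name find_files_with_extensions and the idea of passing several extensions are assumptions added for illustration.

```python
import os
import tempfile


def find_files_with_extensions(directory, extensions):
    """Return the sorted paths under `directory` whose names end with any of `extensions`."""
    suffixes = tuple(ext.lower() for ext in extensions)
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # Compare case-insensitively so '.RYK' is caught as well.
            if name.lower().endswith(suffixes):
                matches.append(os.path.join(root, name))
    return sorted(matches)


if __name__ == "__main__":
    # Small self-contained demo on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "report.txt"), "w").close()
        open(os.path.join(tmp, "invoice.ryk"), "w").close()
        for path in find_files_with_extensions(tmp, [".ryk"]):
            print("Suspicious file:", path)
```

The returned list can be joined into the email body, and an empty list maps to the ‘SUCCESSFUL’ validation status.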

4. Create a Lambda function

- Create a Lambda function with the Python 3.12 runtime and the necessary IAM role and permissions.
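
As a rough guide to what "necessary permissions" means here, the sketch below builds two IAM policy documents: one for the Lambda execution role and one for the role behind the instance profile attached to the restored EC2 instance. The action lists are inferred from the API calls made in this walkthrough, and the function names and parameters are hypothetical. Treat it as a starting point and tighten the resources to suit your account.

```python
import json


def lambda_role_policy(instance_role_arn):
    """Permissions the Lambda function needs for the calls it makes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ec2:AssociateIamInstanceProfile", "ssm:SendCommand"],
                "Resource": "*",  # assumption: narrow this down where possible
            },
            {
                # Required to hand the instance profile's role to EC2.
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": instance_role_arn,
            },
        ],
    }


def instance_role_policy(bucket_name, script_key):
    """Permissions the scan script needs on the restored instance."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/{script_key}",
            },
            {
                "Effect": "Allow",
                "Action": ["ses:SendEmail", "backup:PutRestoreValidationResult"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(instance_role_policy("<your-bucket>", "check_ransomware.py"), indent=2))
```

In addition to these, the instance role usually needs the AWS managed policy AmazonSSMManagedInstanceCore so that SSM Run Command can reach the instance, and the Lambda role needs permission to write its logs to CloudWatch Logs.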

Lambda Function Code:

Replace the following values in the Python code:

  1. ARN: the ARN of the IAM instance profile to attach to the restored EC2 instance (it must have the permissions needed to access the S3 bucket that holds the custom script).
  2. S3 bucket name and script name: replace the bucket name and script name in the ‘aws s3 cp’ command with your own.
    import boto3
    import time
     
    def lambda_handler(event, context):
        # Extract ARN from resources list
        resource_arn = event['resources'][0]
        
        # Extract Restore Job ID from the event
        restore_job_id = event['detail']['restoreJobId']
     
        # AWS Services
        ec2_client = boto3.client('ec2')
        ssm_client = boto3.client('ssm')
     
        created_resource_arn = event['detail']['createdResourceArn']
        arn_segments = created_resource_arn.split(':')
        restored_instance_id = arn_segments[-1].split('/')[-1]
        
        # Associate the IAM instance profile with the EC2 instance restored by AWS Backup
        try:
            response = ec2_client.associate_iam_instance_profile(
                IamInstanceProfile={
                    'Arn': ''  # Replace with the ARN of your IAM instance profile
                },
                InstanceId=restored_instance_id
            )
        except Exception as e:
            print(f"Error associating IAM role to instance: {e}")
            return
     
        # Wait for a few seconds for IAM role association to take effect
        time.sleep(5)
     
        # Download and run the scan script on the restored EC2 instance using SSM Run Command.
        # All commands go in one invocation so they run in order; separate send_command
        # calls are asynchronous, so the script could start before the download finishes.
        try:
            ssm_response = ssm_client.send_command(
                InstanceIds=[restored_instance_id],
                DocumentName="AWS-RunShellScript",
                Parameters={
                    'commands': [
                        'sudo aws s3 cp s3://logicalmabdapythoncode/check_ransomware.py /tmp/check_ransomware.py',
                        'sudo pip3 install boto3',
                        f'sudo python3 /tmp/check_ransomware.py {restore_job_id}'  # Pass Restore Job ID as an argument
                    ]
                }
            )
        except Exception as e:
            print(f"Error running scan script on instance: {e}")
            return
     
        return "Lambda execution completed successfully"
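
The handler above returns as soon as the command is submitted, so a failed download or scan would only be visible in the SSM console. If you want the Lambda function to log the outcome, one option is to poll the command status. The helpers below are a sketch under that assumption. Their names, the polling interval, and the retry limit are illustrative, and the ARN parser expects the instance ARN form ‘arn:aws:ec2:region:account:instance/i-...’.

```python
import time

# Statuses after which an SSM command invocation will not change again.
TERMINAL_STATUSES = {"Success", "Failed", "Cancelled", "TimedOut"}


def instance_id_from_arn(created_resource_arn):
    """Extract the instance ID from an EC2 instance ARN, or raise ValueError."""
    resource = created_resource_arn.split(":", 5)[-1]
    resource_type, _, resource_id = resource.partition("/")
    if resource_type != "instance" or not resource_id:
        raise ValueError(f"Not an EC2 instance ARN: {created_resource_arn}")
    return resource_id


def wait_for_command(ssm_client, command_id, instance_id, poll_seconds=5, max_polls=60):
    """Poll get_command_invocation until the command reaches a terminal status."""
    for _ in range(max_polls):
        result = ssm_client.get_command_invocation(
            CommandId=command_id, InstanceId=instance_id
        )
        if result["Status"] in TERMINAL_STATUSES:
            return result["Status"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"Command {command_id} did not finish after {max_polls} polls")
```

In the handler, the command ID is available as ssm_response['Command']['CommandId']. Two caveats apply: the Lambda timeout has to be long enough to cover the polling, and get_command_invocation can raise an InvocationDoesNotExist error if it is called immediately after send_command, so a short initial delay or a retry may be needed.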

5. Create an EventBridge rule to trigger the Lambda function created above

{
  "source": ["aws.backup"],
  "detail-type": ["Restore Job State Change"],
  "detail": {
    "status": ["COMPLETED"]
  }
}
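
The rule can be created in the EventBridge console with the pattern above. For a scripted setup, the following sketch uses boto3 to create the rule, add the Lambda function as its target, and allow EventBridge to invoke the function. The rule name, target ID, and statement ID are hypothetical placeholders.

```python
import json


def build_event_pattern():
    """Event pattern matching completed AWS Backup restore jobs."""
    return {
        "source": ["aws.backup"],
        "detail-type": ["Restore Job State Change"],
        "detail": {"status": ["COMPLETED"]},
    }


def create_rule(rule_name, lambda_arn, statement_id="AllowEventBridgeInvoke"):
    """Create the rule, attach the Lambda target, and grant invoke permission."""
    import boto3  # imported here so build_event_pattern works without AWS access

    events = boto3.client("events")
    rule_arn = events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern()),
        State="ENABLED",
    )["RuleArn"]
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "restore-validation-lambda", "Arn": lambda_arn}],
    )
    # Without this resource-based permission, EventBridge cannot invoke the function.
    boto3.client("lambda").add_permission(
        FunctionName=lambda_arn,
        StatementId=statement_id,
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

Note that this pattern matches every completed restore job in the account and Region, not only those started by the restore testing plan, so you may want to narrow it further.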

Conclusion:

Implementing a robust backup restore testing plan is crucial for ensuring the resilience and reliability of your data protection strategy. By leveraging AWS Backup and custom validation tools, organizations can proactively verify the integrity and security of their backups. This approach not only enhances data security but also builds confidence in the ability to recover from potential data loss and cyber threats. As technology continues to evolve, maintaining a proactive stance on data protection through regular testing will remain a cornerstone of effective cybersecurity practices.
