
A more resilient and efficient DR in AWS: A case study of a digital healthcare solutions company

Mar 28, 2023

A successful migration to the cloud and consolidation of workloads would enable the company to transform multiple on-prem platforms and reduce operational costs. While cloud migration was the ultimate goal, the company first needed to build DR capability to meet the business continuity requirements of its large customer base.

A U.S.-based Digital Healthcare Solutions and Strategic Consulting company, whose solutions and services are used by leading healthcare plans, wanted to consolidate applications running in multiple datacenters into the AWS cloud. These healthcare plans are used by more than 42 million people nationwide.

Background and Challenges

Disaster Recovery Site build in AWS

The company initially worked with its technology infrastructure hosting provider and attempted to set up DR across its multiple datacenters. During this effort, the company realized that:

  • On-premises DR infrastructure would be neither cost-effective nor scalable, and would make it difficult to meet the RTO/RPO requirements specified in the SLAs with its client base
  • Dynamic allocation and autoscaling of DR compute resources would not be easily achievable, so the cost of configuring and maintaining DR alone with the hosting provider would be extremely high
  • It would not be the right step toward the ultimate goal of cloud migration

The company decided to move from its “On-prem Hosted DR Environment” to an AWS-based DR solution. The overall solution would follow the AWS Landing Zone design and be segmented into multiple AWS accounts, using a Networking Account with Palo Alto firewalls that connect to a Transit Gateway.

The company initially decided to embark on the cloud journey with its internal resources. The migration program faced several challenges in meeting its primary goals for planning, timing, and cost. Intuitive Technology Partners was engaged to reassess the cloud migration initiative holistically and deliver a plan for a successful migration, starting with configuring a complete DR solution in AWS as the first step.

Intuitive implemented the DR solution in AWS using different technologies for different workloads to sync and maintain data, application configuration, and content:

  • CloudEndure
  • Oracle Physical Standby
  • SQL Server Log-shipping
  • Custom-built solution for 100s of analytical workloads

Business Goals

  • Enable the business to continue operating from the DR site if on-prem is unavailable
  • Maintain application performance at or above existing levels
  • Maintain data integrity and security without any unplanned downtime for customers
  • Retain all controls needed to comply with industry regulations (HITRUST, HIPAA)
  • Provide MPLS and VPN connectivity to AWS DR for customers and the business
  • Automate infrastructure provisioning (IaC)
  • Dynamically scale DR compute resources as on-prem infrastructure grows

Additional item (added by customer): Move MongoDB workloads to MongoDB Atlas Multi-Cloud Database Service:

  • Provision a high-performing, low-cost, resilient production and DR database environment
  • Readily available elastic primary and secondary infrastructure, plus a hot secondary site for DR
  • Run on the latest MongoDB technology stack
  • Move hundreds of MongoDB databases from an on-prem hosting facility to the cloud and provision DR as part of the move, in order to:
    • Cut costs by running databases in MongoDB Atlas instead of the on-prem hosting setup
    • Gain efficiencies and reduce operational overhead
    • Achieve the flexibility to spin up, terminate, or pause any cluster on demand
  • Ability to autoscale compute resources

Implementation Strategy - ‘DR Build in AWS’ and ‘MongoDB Migration to Atlas’:

DR Build in AWS & MongoDB Migration:

To build the company’s DR solution and begin modernizing its core applications and database platform on AWS, the organization engaged Intuitive Technology Partners to assess the app/db infrastructure and determine the course of action for migrating to AWS. Intuitive engineers worked with the customer’s technology and business teams to design a DR solution that mapped all applications and related supporting processes.

Intuitive engineers conducted a series of sessions to:

  • Understand the apps and databases in migration scope
  • Inventory all related infrastructure assets
  • Identify the software licensing in use
  • Capture expectations, downtimes, and the automation framework
  • Identify third-party tools and integrations
  • Review backup and monitoring methods

The assessment outcome listed 100+ servers hosting enterprise applications and databases.

Leveraging its Cloud Migration & Modernization Practice, Intuitive was able to:

  1. Fully guide and execute the build of DR infrastructure for all core applications and databases in AWS
  2. Migrate a large, multi-cluster MongoDB footprint from on-premises to MongoDB Atlas
  3. Host MongoDB instances on the latest supported versions in Atlas, with redundancy; DR comes bundled with this setup (no separate effort required to build a MongoDB DR setup)

Major Wins:

  • Fully automated AWS Infrastructure provisioning with CloudFormation (IaC)
  • Automated AWS resource provisioning via autoscaling, reducing delivery time to within minutes
  • With DR in AWS, the company achieved a 30-minute RTO with less than 5 minutes of data loss (RPO)
  • Frequent syncs of analytics configuration data from on-prem to AWS
  • Ability to use backups of AWS resources when needed to debug production issues
  • Ability to rapidly clone any AWS resource for any testing need
  • Readily available MongoDB DR site for on-demand DR drills/testing
  • Ability to keep resources shut down during idle periods to reduce costs

Business Value – DR in AWS:

Disaster Readiness: The company is able to maintain disaster readiness as part of its daily operations, validating its recovery time objectives quarterly with automation that verifies all systems, while paying for resources only when it needs them.
DR Environment Ongoing Maintenance: Company IT staff are trained to bring up any DR resource and maintain any environment with ease through well-established automated processes and procedures.
Ongoing Testing/Validation: Business teams are able to perform DR drills and are prepared for an actual failover to AWS as needed.
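
The quarterly validation described above can be pictured as a simple check of drill timings against the stated objectives. The following is only a minimal sketch: the 30-minute RTO and 5-minute RPO come from this case study, while the `DrillResult` class, its field names, and the sample timestamps are hypothetical and not taken from the company's actual automation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Targets from the case study: 30-minute RTO, less than 5 minutes of data loss (RPO).
RTO_TARGET = timedelta(minutes=30)
RPO_TARGET = timedelta(minutes=5)


@dataclass
class DrillResult:
    """Hypothetical record of one DR drill (names are illustrative)."""
    outage_start: datetime      # when the primary site was declared unavailable
    last_replicated: datetime   # timestamp of the newest data present at the DR site
    service_restored: datetime  # when applications were serving from the DR site

    @property
    def achieved_rto(self) -> timedelta:
        # Time from outage declaration to service restoration.
        return self.service_restored - self.outage_start

    @property
    def achieved_rpo(self) -> timedelta:
        # Window of data written before the outage but not yet replicated.
        return self.outage_start - self.last_replicated

    def meets_targets(self) -> bool:
        return self.achieved_rto <= RTO_TARGET and self.achieved_rpo < RPO_TARGET


# Made-up drill timings, for illustration only.
drill = DrillResult(
    outage_start=datetime(2023, 1, 15, 10, 0),
    last_replicated=datetime(2023, 1, 15, 9, 57),
    service_restored=datetime(2023, 1, 15, 10, 24),
)
print(drill.achieved_rto, drill.achieved_rpo, drill.meets_targets())
```

In practice the three timestamps would come from monitoring and replication tooling rather than being entered by hand.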

Business Value – Migration of MongoDB to MongoDB Atlas:

  • Discovered 381 on-prem databases; migrated only the 89 prod/non-prod databases that were required (the others were decommissioned during migration planning)
  • Discovered ~15 TB of storage in use on-prem; reduced to within 500 GB in MongoDB Atlas (because many on-prem databases were no longer needed)
  • Discovered outdated MongoDB versions 2.6 and 3.0; modernized/migrated to 4.2 in MongoDB Atlas
  • No DR capability on-prem; modernized to a multi-AZ MongoDB cluster in MongoDB Atlas
  • On-demand build of additional isolated clusters within minutes
  • Copy of the latest backup snapshots from any source to any target cluster with a couple of clicks
  • Isolated, dedicated DR drill cluster, ready for use within an hour (copied from production)
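
To put the first two figures in proportion, the reductions can be worked out directly. This is a rough calculation only: it assumes 1 TB = 1,000 GB, and both storage figures are approximate in the case study ("~15TB" and "within 500GB"), so the storage result should be read as an estimate.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# Database counts as reported: 381 discovered, 89 migrated.
db_reduction = percent_reduction(381, 89)

# Storage as reported: ~15 TB on-prem, within 500 GB in Atlas.
# Assumption: 1 TB = 1,000 GB.
storage_reduction = percent_reduction(15_000, 500)

print(f"Databases decommissioned: {381 - 89}")
print(f"Database count reduction: {db_reduction:.1f}%")
print(f"Storage reduction (approx.): {storage_reduction:.1f}%")
```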

Core Technologies used:

AWS Services
  • Network: Direct Connect, Transit Gateway, VPCs, Route 53
  • Storage: EBS, EFS and S3 Buckets
  • Core resources: EC2, Lambda, Secrets Manager, Systems Manager
  • Backups: AWS Backup
  • Monitoring/Alerting: CloudWatch, CloudTrail, SES, SNS
Databases Sync from on-prem to AWS
  • Oracle Databases: Oracle Data Guard
  • SQL Server Databases: Log-shipping
  • Analytics Databases: Custom Bash/SQL Scripts
Application/Web/Content Servers Sync from on-prem to AWS
  • CloudEndure
Build and Configuration Management (IaC)
  • Terraform
MongoDB Migration to MongoDB Atlas
  • Custom scripts:
    • Take data dumps on the source
    • Import data into MongoDB Atlas
    • Create all users and privileges using the MongoDB Atlas APIs
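
The three custom-script steps above might look roughly like the sketch below. It is not the scripts used in this project: the helper names, host names, and connection string are hypothetical, and it assumes that `mongodump`/`mongorestore` versions supporting these flags can be run against the source servers. The user payload follows the database-users resource of the Atlas Administration API as commonly documented; field names should be verified against the current API reference before use.

```python
from typing import Dict, List


def dump_command(source_host: str, database: str, archive_path: str) -> List[str]:
    """Build a mongodump command that writes one database to a gzipped archive."""
    return [
        "mongodump",
        f"--host={source_host}",
        f"--db={database}",
        "--gzip",
        f"--archive={archive_path}",
    ]


def restore_command(atlas_uri: str, database: str, archive_path: str) -> List[str]:
    """Build a mongorestore command that loads the archive into an Atlas cluster."""
    return [
        "mongorestore",
        f"--uri={atlas_uri}",
        "--gzip",
        f"--archive={archive_path}",
        f"--nsInclude={database}.*",  # restore only this database's collections
    ]


def atlas_user_payload(username: str, password: str, database: str,
                       role: str = "readWrite") -> Dict[str, object]:
    """Build a JSON body for creating a database user through the Atlas API.

    Assumption: field names match the Atlas Administration API's
    database-users resource; check the current reference before relying on it.
    """
    return {
        "databaseName": "admin",  # authentication database for the user
        "username": username,
        "password": password,
        "roles": [{"roleName": role, "databaseName": database}],
    }


# Hypothetical values, for illustration only.
dump = dump_command("onprem-mongo01:27017", "claims", "/tmp/claims.archive.gz")
restore = restore_command("mongodb+srv://cluster0.example.mongodb.net",
                          "claims", "/tmp/claims.archive.gz")
user = atlas_user_payload("claims_app", "change-me", "claims")
print(" ".join(dump))
print(" ".join(restore))
```

Each command list can be passed to `subprocess.run(cmd, check=True)`; building the commands separately from running them keeps the per-database loop easy to test and repeat across many databases.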

Lessons Learned

It is important to frame business problems in a way that takes into account the scale at which the business operates. In this case, based on Intuitive’s prior experience, the migration factory approach yielded significant benefits, enabling the customer to reduce costs, modernize its technology footprint, and gain more resilient disaster recovery capabilities at the same time. It is also critical that communication channels across the various organizations and types of resources are well maintained, as organizational and technical complexity at this scale can quickly get out of hand. Repeatability at scale is key: it not only makes migrations more efficient, but also delivers significant ongoing benefits through standardization and lifecycle management.
