Query AWS CloudTrail Events

By Bhuvaneswari Subramani / Mar 14, 2023

This post explains how to store CloudTrail logs in AWS CloudTrail Lake and how to use SQL queries to analyze the CloudTrail events stored in the lake.

AWS is a big container housing a huge list of varied services. Once you create an AWS account, there are multiple ways to create, update, delete, or access AWS resources: the AWS Console, the AWS SDKs, and the AWS CLI.

Ultimately, each of these events is either user activity or an API call. Monitoring who did what, where, and when is called account monitoring, and AWS CloudTrail was purpose-built for that in 2013. Since then, CloudTrail has been the single source of truth for tracking user activity and API usage.

Later, in 2022, AWS CloudTrail Lake was launched to aggregate, immutably store, and query your activity logs, which simplifies auditing, security investigation, and operational troubleshooting.

In one product, CloudTrail Lake collects, stores, optimizes, and queries activity logs. As a result of combining these features into one environment, CloudTrail Lake eliminates the need for separate data pipelines across teams and products.

Recently, AWS CloudTrail Lake has also added support for integrating non-AWS event sources.

Irrespective of the data source, the success of a service depends on how the data is stored and how seamlessly it can be utilised or accessed. This blog post focuses on two important features: storing events in CloudTrail Lake and querying them.

So let's dive into the steps to store CloudTrail logs in CloudTrail Lake and run SQL queries on the CloudTrail events stored there.

Table of Contents

  • Create CloudTrail
  • Create CloudTrail Lake
  • Create Query
  • Run Query
  • Validate Query
  • Cleanup
  • Learning Reference
  • Conclusion

Create CloudTrail

  • Sign in to the AWS Management Console and open the CloudTrail console at https://console.aws.amazon.com/cloudtrail/.
  • Create a trail named cloudtrail-demo in the Region you are exploring, and have it deliver its logs to an S3 bucket named aws-cloudtrail-logs-demo-321321.
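The console steps above can also be scripted. The following is a minimal sketch, not the author's method: it assumes `cloudtrail` is a boto3 CloudTrail client (for example `boto3.client("cloudtrail")`), that the S3 bucket already exists, and that its bucket policy allows CloudTrail to write to it. The helper name `create_demo_trail` is hypothetical.

```python
def create_demo_trail(cloudtrail, trail_name, bucket_name):
    """Create a trail that delivers logs to an existing S3 bucket.

    `cloudtrail` is expected to be a boto3 CloudTrail client.
    Assumption: the bucket exists and its policy lets CloudTrail write to it.
    """
    response = cloudtrail.create_trail(Name=trail_name, S3BucketName=bucket_name)
    # A trail created through the API does not record events until
    # logging is started explicitly.
    cloudtrail.start_logging(Name=trail_name)
    return response["TrailARN"]
```

With real credentials this might be called as `create_demo_trail(boto3.client("cloudtrail"), "cloudtrail-demo", "aws-cloudtrail-logs-demo-321321")`.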

Create CloudTrail Lake

  • Stay on the CloudTrail console, navigate to CloudTrail Lake, click Event data stores, and create an event data store.

  • Configure: Create an event data store named cloudtrail-event-ds-demo.

  • Choose events: Make sure you select at least one event type (management or data); otherwise you will be shown the following message when moving to the next screen:

Select at least one of management events or data events.

In this example, only management events are selected, and existing trail events are copied from the specified S3 bucket.

  • Review and create

Once the event data store is created successfully, you will see a confirmation.
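For readers who prefer the API, here is a hedged sketch of creating a comparable event data store. It assumes `cloudtrail` is a boto3 CloudTrail client, selects management events only, and uses a 90-day retention period purely as an illustrative value. The helper names are hypothetical, and the sketch does not cover copying existing trail events from S3.

```python
def management_event_selectors():
    """Advanced event selectors that capture management events only."""
    return [
        {
            "Name": "Management events only",
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Management"]},
            ],
        }
    ]


def event_data_store_id(arn):
    """Return the ID portion of an event data store ARN (text after the last '/')."""
    return arn.rsplit("/", 1)[-1]


def create_demo_event_data_store(cloudtrail, name, retention_days=90):
    """Create the event data store and return the ID used in FROM clauses."""
    response = cloudtrail.create_event_data_store(
        Name=name,
        AdvancedEventSelectors=management_event_selectors(),
        RetentionPeriod=retention_days,  # illustrative value, in days
        TerminationProtectionEnabled=True,
    )
    return event_data_store_id(response["EventDataStoreArn"])
```

The returned ID is what replaces the event data store placeholder in the FROM clause of a query.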

Create Query

Create a query to run against the event data store created above, cloudtrail-event-ds-demo. Go back to the Event data stores screen and select Run query.

Queries in CloudTrail Lake are authored in SQL. You can build a query on the CloudTrail Lake Editor tab by writing the query in SQL from scratch, or by opening a saved or sample query and editing it.

First, run one of the sample queries to get a feel for the query format and results; then frame your own query and execute it.

Sample Query:

Run the sample query that finds the number of API calls, grouped by event name and event source, within the past week.

The Query results tab for an active query displays rows of results based on a query. You can filter results by entering all or part of an event field value.

On the Command output tab, you can review metadata about the query run, such as the event data store ID, run time, number of results scanned, and the success or failure of the query. Query results saved to an Amazon S3 bucket will also have a link to the S3 bucket in the metadata.
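When results are fetched programmatically rather than viewed in the console, the same kind of filtering can be approximated client-side. The sketch below assumes each result row is a list of single-key dictionaries (column name to value), which is the row shape the boto3 `get_query_results` call returns. The case-insensitive substring match is my own choice, not a statement about how the console filter works.

```python
def row_to_dict(row):
    """Merge a row's single-column cells into one dictionary."""
    merged = {}
    for cell in row:
        merged.update(cell)
    return merged


def filter_rows(rows, text):
    """Keep rows where any field value contains `text` (case-insensitive)."""
    needle = text.lower()
    matches = []
    for row in rows:
        record = row_to_dict(row)
        if any(needle in str(value).lower() for value in record.values()):
            matches.append(record)
    return matches
```

For example, `filter_rows(rows, "ec2")` keeps only the rows in which some field mentions EC2.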

Custom Query 1:

To list EC2-related events, including the principal, eventName, eventTime, and source IP address, that occurred after a specified date and time (say, within the last 5 days):

SELECT
    userIdentity.principalid, userIdentity.userName, eventName, eventTime, sourceIPAddress
FROM
    event_data_store_ID
WHERE
    userIdentity.principalid IS NOT NULL
    AND eventTime > 'yyyy-mm-dd hh:mm:ss'
    AND eventSource = 'ec2.amazonaws.com'

[Note: replace event_data_store_ID with your event data store ID, and 'yyyy-mm-dd hh:mm:ss' with the date and time you want to start from.]
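Filling in the two placeholders by hand is error-prone, so here is a small sketch that does it in Python. It assumes the "last 5 days" window is measured back from the current UTC time and that the event data store ID is a UUID, as in the console. The function name `build_ec2_events_query` and its parameters are hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone

QUERY_TEMPLATE = """SELECT
    userIdentity.principalid, userIdentity.userName, eventName, eventTime, sourceIPAddress
FROM
    {eds_id}
WHERE
    userIdentity.principalid IS NOT NULL
    AND eventTime > '{cutoff}'
    AND eventSource = 'ec2.amazonaws.com'"""


def build_ec2_events_query(eds_id, days=5, now=None):
    """Return the query text with the store ID and cutoff timestamp filled in."""
    # Reject anything that is not a UUID before placing it in the SQL text.
    safe_id = str(uuid.UUID(eds_id))
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S")
    return QUERY_TEMPLATE.format(eds_id=safe_id, cutoff=cutoff)
```

Passing an explicit `now` keeps the output reproducible, which helps when saving the generated query for later comparison.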

In Save query, enter a name and description for the query. Choose Save query to save your changes as a new query. To discard changes to a query, choose Cancel or close the Save query window.

Try it yourself

Custom Query 2:

To list events in which EC2 instances were terminated or stopped after a specified date and time (say, within the last 5 days):

SELECT
    userIdentity.principalid, userIdentity.userName, eventName, eventTime, sourceIPAddress
FROM
    d56bf0c1-fee5-4667-986d-b0d9e6048e4b
WHERE
    userIdentity.principalid IS NOT NULL
    AND eventTime > '2023-02-01 00:00:00'
    AND (eventName = 'TerminateInstances' OR eventName = 'StopInstances')
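In standard SQL, AND binds more tightly than OR, so the way the two eventName conditions are grouped decides whether the date filter applies to both of them. The sketch below illustrates this with Python's built-in sqlite3 module and four made-up rows. SQLite is only a stand-in here; it is not the engine behind CloudTrail Lake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (eventName TEXT, eventTime TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("TerminateInstances", "2023-02-10 10:00:00"),
        ("StopInstances", "2023-01-05 09:00:00"),  # older than the cutoff
        ("StopInstances", "2023-02-12 08:00:00"),
        ("RunInstances", "2023-02-11 07:00:00"),
    ],
)

# Ungrouped: parsed as (time AND Terminate) OR Stop, so old Stop events leak in.
UNGROUPED = (
    "SELECT COUNT(*) FROM events "
    "WHERE eventTime > '2023-02-01 00:00:00' "
    "AND eventName = 'TerminateInstances' OR eventName = 'StopInstances'"
)

# Grouped: the date filter applies to both event names.
GROUPED = (
    "SELECT COUNT(*) FROM events "
    "WHERE eventTime > '2023-02-01 00:00:00' "
    "AND (eventName = 'TerminateInstances' OR eventName = 'StopInstances')"
)

ungrouped_count = conn.execute(UNGROUPED).fetchone()[0]
grouped_count = conn.execute(GROUPED).fetchone()[0]
print(ungrouped_count, grouped_count)  # prints: 3 2
```

The ungrouped form counts the January StopInstances event even though it predates the cutoff; the grouped form does not.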

Run Query

Here are the steps to run a query in CloudTrail Lake:

  • Sign in to the AWS Management Console and open the CloudTrail console at https://console.aws.amazon.com/cloudtrail/.
  • From the navigation pane, open the Lake submenu, then choose Query.
  • On the Saved queries or Sample queries tabs, choose a query to run by choosing the value in the Query SQL column.
  • On the Editor tab, for Event data store, choose an event data store from the drop-down list.
  • (Optional) On the Editor tab, choose Save results to S3 to save the query results to an S3 bucket.
  • Choose Run to run the query.

Query results can be saved to an S3 bucket.

Key points to remember while saving query results to S3:

  • CloudTrail delivers the query results to the S3 bucket in compressed gzip format.
  • On average, after the query scan completes, you can expect a latency of 6 minutes for every GB of data delivered to the S3 bucket.
  • Queries that run for longer than one hour might time out. Partial results are not saved to S3, so fine-tune your query to limit the data scanned and let it complete within an hour.
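The console steps for running a query and saving its results map onto three CloudTrail API calls. The following is a minimal sketch under stated assumptions: `cloudtrail` is a reasonably recent boto3 CloudTrail client, the event data store ID is already part of the query's FROM clause, and `delivery_s3_uri` (if given) points at a bucket CloudTrail is allowed to write to. The function name and the polling interval are hypothetical choices.

```python
import time

TERMINAL_STATES = {"FINISHED", "FAILED", "CANCELLED", "TIMED_OUT"}


def run_lake_query(cloudtrail, statement, delivery_s3_uri=None, poll_seconds=2.0):
    """Start a query, wait for it to end, and return all result rows."""
    params = {"QueryStatement": statement}
    if delivery_s3_uri:
        # Optional: also deliver the results to S3, e.g. "s3://bucket/prefix".
        params["DeliveryS3Uri"] = delivery_s3_uri
    query_id = cloudtrail.start_query(**params)["QueryId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = cloudtrail.describe_query(QueryId=query_id)["QueryStatus"]
        if status in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    if status != "FINISHED":
        raise RuntimeError(f"Query {query_id} ended with status {status}")

    # Results are paginated; follow NextToken until it is absent.
    rows, token = [], None
    while True:
        kwargs = {"QueryId": query_id}
        if token:
            kwargs["NextToken"] = token
        page = cloudtrail.get_query_results(**kwargs)
        rows.extend(page.get("QueryResultRows", []))
        token = page.get("NextToken")
        if not token:
            return rows
```

Raising on any status other than FINISHED keeps a timed-out query from being mistaken for an empty result set.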

Validate Query

If you want to determine whether the query results have been modified, deleted, or unchanged after CloudTrail delivered them, you can use CloudTrail query results integrity validation.
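The idea underneath integrity validation is comparing cryptographic hashes of the delivered files against hashes recorded at delivery time. The sketch below shows only that comparison for files downloaded to a local directory. It is not CloudTrail's actual validation, which also verifies a signature, and the `expected_hashes` mapping (file name to SHA-256 hex digest) is a hypothetical input.

```python
import hashlib
from pathlib import Path


def sha256_hex(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_delivered_files(directory, expected_hashes):
    """Classify each expected file as 'unchanged', 'modified', or 'deleted'."""
    report = {}
    for name, expected in expected_hashes.items():
        path = Path(directory) / name
        if not path.exists():
            report[name] = "deleted"
        elif sha256_hex(path) == expected.lower():
            report[name] = "unchanged"
        else:
            report[name] = "modified"
    return report
```

For the real check, the AWS CLI provides a verify-query-results command under aws cloudtrail; the CloudTrail user guide describes its options.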

Cleanup

Delete the trail:

  • Sign in to the AWS Management Console and open the CloudTrail console at https://console.aws.amazon.com/cloudtrail/.
  • From the navigation pane, choose Trails, select the trail (cloudtrail-demo), and click the Delete button.
  • You will be prompted to confirm the deletion.

  • Click Delete to delete the trail, and then proceed to delete the S3 bucket as detailed below.

Delete the CloudTrail S3 bucket:

  • Go to the Amazon S3 console, select the radio button next to the aws-cloudtrail-logs-demo-321321 bucket, and click the Delete button.
  • You might see an error if the bucket still contains CloudTrail events.

  • In that case, empty the bucket to permanently delete all objects, and then delete the bucket.
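The empty-then-delete sequence can be sketched with the S3 API as follows. Assumptions: `s3` is a boto3 S3 client (for example `boto3.client("s3")`), the bucket is not versioned, and you really do want every object gone, since the deletion is permanent. The function name is hypothetical.

```python
def empty_and_delete_bucket(s3, bucket_name):
    """Delete every object in an unversioned bucket, then the bucket itself.

    Returns the number of objects deleted.
    """
    deleted = 0
    kwargs = {"Bucket": bucket_name}
    while True:
        # Each page holds at most 1000 keys, which is also the
        # per-request limit for delete_objects.
        page = s3.list_objects_v2(**kwargs)
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3.delete_objects(Bucket=bucket_name, Delete={"Objects": keys})
            deleted += len(keys)
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
    # The bucket can only be deleted once it is empty.
    s3.delete_bucket(Bucket=bucket_name)
    return deleted
```

A versioned bucket would additionally need its object versions and delete markers removed, which this sketch does not handle.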

Delete the event data store:

  • Click the Event data stores tab in the Lake console.
  • Select the event data store from the list (cloudtrail-event-ds-demo).
  • From the Actions menu, select “Change termination protection”.
  • In the Change termination protection pop-up, select Disabled and click “Save”.
  • From the Actions menu, select Delete, and confirm that you want to delete it by entering the name of the data store. Then click “Delete”. This places your event data store in the pending deletion state.
  • The data store is disabled immediately and is deleted permanently after seven days.
  • If you realise during this period that you deleted it by mistake, you can restore it from the Actions menu. (I was curious: since the event data store goes into a pending deletion state, it should be restorable too.)
  • Additionally, delete the S3 bucket if one was created to store the query results in this demo. Example: aws-cloudtrail-lake-query-results--
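The same disable-protection, delete, and restore sequence is available through the API. This is a hedged sketch assuming `cloudtrail` is a boto3 CloudTrail client and `eds_arn` is the ARN (or ID) of the event data store; both helper names are hypothetical.

```python
def schedule_event_data_store_deletion(cloudtrail, eds_arn):
    """Turn off termination protection, then request deletion.

    The store enters a pending deletion state rather than vanishing at once.
    """
    cloudtrail.update_event_data_store(
        EventDataStore=eds_arn,
        TerminationProtectionEnabled=False,
    )
    cloudtrail.delete_event_data_store(EventDataStore=eds_arn)


def undo_event_data_store_deletion(cloudtrail, eds_arn):
    """Restore a store that is still pending deletion; return its new status."""
    response = cloudtrail.restore_event_data_store(EventDataStore=eds_arn)
    return response["Status"]
```

Deletion fails while termination protection is enabled, which is why the update call comes first.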

Learning Reference

  • AWS CloudTrail user guide
  • AWS CloudTrail pricing