KINTO Tech Blog
DBRE

Introducing a Secure Method for Database Password Rotation


Hello, I am _awache (@_awache), from DBRE at KINTO Technologies (KTC).

In this article, I’ll provide a comprehensive overview of how I implemented a safe password rotation mechanism for database users primarily registered in Aurora MySQL, the challenges I encountered, and the peripheral developments that arose during the process.

To start, here's a brief summary, as this will be a lengthy blog post.

Summary

Background

Our company has implemented a policy requiring database users to rotate their passwords at regular intervals.

Solution

Considered

  • MySQL Dual Password: Set primary and secondary passwords using the Dual Password function available in MySQL 8.0.14 and later.
  • AWS Secrets Manager rotation function: Update passwords automatically and strengthen security using Secrets Manager.

Adopted

We adopted the rotation function of AWS Secrets Manager because it is easy to set up and comprehensive.

Project Kickoff

At the beginning of the project, we created an inception deck and clarified key boundaries regarding cost, security, and resources.

What was developed in this project

Lambda functions

After thorough research, we developed multiple Lambda functions because the AWS-provided rotation mechanism did not fully meet KTC's requirements.

  1. Lambda function for single user strategy
    • Purpose: To rotate passwords for a single user
    • Settings: Managed by Secrets Manager. These functions execute at the designated rotation times in Secrets Manager to update passwords.
  2. Lambda function for alternate users rotation strategy
    • Purpose: This function updates passwords for two users alternately to enhance availability.
    • Settings: Managed by Secrets Manager. In the initial rotation, a second user (a clone) is created; passwords are switched in subsequent rotations.
  3. Lambda function for Secret Rotation Notifications
    • Purpose: This function reports the results of secret rotations.
    • Trigger: CloudTrail events for RotationStarted, RotationSucceeded, and RotationFailed
    • Function: To store the rotation results in DynamoDB and send notifications to Slack. Additionally, it posts a follow-up message with a timestamp to the Slack thread.
  4. Lambda function for Managing DynamoDB storage of rotation results
    • Purpose: To store rotation results in DynamoDB as evidence for submission to the security team.
    • Function: Executes in response to CloudTrail events to save the rotation results to DynamoDB and send SLI notifications based on the stored data.
  5. Lambda function for SLI notification
    • Purpose: To monitor the status of rotation and to send SLI notifications.
    • Function: Retrieves information from DynamoDB to track the progress of secret rotation and sends notifications to Slack as needed.
  6. Lambda function for rotation schedule management
    • Purpose: To determine the execution time of rotation for a DBClusterID.
    • Function: Generates a new schedule based on the settings of existing secret rotations, saves it to DynamoDB, and sets the rotation window and duration.
  7. Lambda function for applying rotation settings
    • Purpose: To apply the scheduled rotation settings to Secrets Manager
    • Function: Configures secret rotation at the designated times using information from DynamoDB.

A Tool for Registering Secret Rotations

We developed an additional tool to facilitate local registration of secret rotations.

  • Tool for setting Secrets Rotation schedule
    • Purpose: To set secret rotation schedules per database user.
    • Function: Applies the secret rotation settings based on data saved in DynamoDB for the specified DBClusterID and DBUser.

Final Architecture Overview

We initially believed it could be done much more simply, but it turned out to be more complex than expected...

The whole image

Results

  • Automated the entire secret rotation process, reducing security and management efforts.
  • Developed a comprehensive architecture that meets governance requirements.
  • Leveraged secret rotation to enhance database safety and efficiency, with ongoing improvement efforts.

Now, let's explore the main story.

Introduction

KTC has implemented a policy requiring database users to rotate their passwords at regular intervals. However, rotating passwords is not a straightforward process.

To change a database user's password, the system must first be stopped. Then, the password in the database is updated, system settings files are adjusted, and finally, system operations must be verified. In other words, we need to perform a maintenance operation that provides no direct value by stopping the system just to change a database user's password. It would be highly inconvenient to perform this for every service at extremely short intervals.

This article explains how we addressed this challenge through specific examples.

Solution Considerations

We considered two major solutions.

  1. To use functions of MySQL Dual Password
  2. To make use of the rotation function of Secrets Manager

MySQL Dual Password

The Dual Password function is available in MySQL starting from version 8.0.14. Using this function allows us to set both a primary and a secondary password, enabling password changes without stopping the system or applications.

The basic steps for using the Dual Password function are as follows:

  1. Set a new primary password while keeping the current password as the secondary one: ALTER USER 'user'@'host' IDENTIFIED BY 'new_password' RETAIN CURRENT PASSWORD;
  2. Update all applications to connect with the new password.
  3. Delete the secondary password: ALTER USER 'user'@'host' DISCARD OLD PASSWORD;

Rotation function of Secrets Manager

AWS Secrets Manager supports periodic automatic updates of secrets. Activating secret rotation not only reduces the effort of managing passwords manually but also helps to enhance security.

To activate it, one only needs to configure the rotation policy in Secrets Manager and assign a Lambda function to handle the rotation.

Rotation setting screen

  • Lambda rotation function
    • Creating the rotation function
      • By automatically deploying the code provided by AWS, we can use it immediately without the need to create custom Lambda functions.
    • Using rotation function from Account
      • You can either create a custom Lambda function or select the one created earlier under 'Creating the Rotation Function' if you wish to reuse it.
  • Rotation strategy
    • Single user
      • Method to rotate passwords for a single user.
      • Existing database connections are maintained while the authentication information is updated; an appropriate retry strategy reduces the risk of access being denied.
      • After rotation, new connections require the updated authentication information (password).
    • Alternate user
      • Initially, I found it challenging to grasp the alternate user strategy, even after reading the manual. However, after careful consideration, I’ve articulated it as follows:
        • This method alternates password updates by rotation, where the authentication information (a combination of username and password) is updated in a secret. After creating a second user (a clone) during the initial rotation, the passwords are switched in subsequent rotations.
        • This approach is ideal for applications that require high database availability, as it ensures that valid authentication information is available even during rotations.
        • The clone user has the same access rights as the original user. It's important to synchronize the permissions of both users when updating their access rights.
      • Below is an image illustrating the concept explained above.
        • Changes before and after rotation
          • Before/after rotation
            • Though it may be a bit difficult to see, '_clone' is appended to the username during password rotation.
            • In the first rotation, a new user with the same privileges as the existing user is created on the database side.
            • From the second rotation onward, the two users are reused and their passwords are updated alternately.
          • Alternate user

The Solution Adopted

We decided to use the rotation function of Secrets Manager for the following reasons:

  1. Easy to set up
    • MySQL Dual Password
      • A script for the password change must be prepared, and the updated password must then be applied to the application.
    • Rotation function of Secrets Manager
      • The product side does not need to modify code as long as the service consistently retrieves connection information from Secrets Manager.
  2. Comprehensiveness
    • MySQL Dual Password
      • Supported only in MySQL 8.0.14 and later (Aurora 3.0 or later)
    • Secrets Manager Rotation Function
      • Supports all RDBMS used by KTC
        • Amazon Aurora
        • Redshift
      • Providing additional support beyond database passwords
        • Can also manage API keys and other credentials used in the product.
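
As a rough sketch of what "consistently retrieving connection information from Secrets Manager" can look like on the product side: the code below takes an already created Secrets Manager client (for example one returned by boto3) and a hypothetical `connect` callable standing in for the database driver. The JSON keys `username`, `password`, and `host` and the use of `PermissionError` as the driver's authentication error are assumptions made for illustration.

```python
import json


def load_db_credentials(client, secret_id):
    """Fetch the current credentials from Secrets Manager."""
    response = client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return {
        "user": secret["username"],
        "password": secret["password"],
        "host": secret.get("host"),
    }


def connect_with_refresh(client, secret_id, connect, max_attempts=2):
    """Connect, re-reading the secret if authentication is rejected.

    `connect` is a hypothetical callable(user, password, host) that raises
    PermissionError when the credentials are no longer valid.
    """
    last_error = None
    for _ in range(max_attempts):
        creds = load_db_credentials(client, secret_id)
        try:
            return connect(**creds)
        except PermissionError as exc:
            last_error = exc  # credentials may have just been rotated
    raise last_error
```

With this shape, a rotation needs no code change on the product side; the application simply picks up the new values the next time it reads the secret.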

Toward the Project Kickoff

Before starting the project, we first clarified our boundaries for cost, security, and resources to determine what should and shouldn’t be done. We also created an inception deck.

The following is an outline of what was discussed:

Breakdown of responsibilities

Cost
  • Product team: Responsible for the cost of Secrets Manager for storing database passwords.
  • DBRE team: Responsible for the cost associated with the secret rotation mechanism.
Security
  • Product team:
    • Products using this mechanism must always retrieve database connection information from Secrets Manager.
    • After a rotation, connection information must be updated by redeploying the application and other components before the next rotation occurs.
  • DBRE team:
    • Ensuring that rotations are completed within the company's defined governance limits.
    • Providing records of secret rotations to the security team as required.
    • Passwords must not be stored in plain text to maintain traceability.
    • Sufficient security must be maintained in the mechanism used for rotation.
Resources
  • Product team: Ensuring that all database users are managed by Secrets Manager.
  • DBRE team: Ensuring that the implementation of secret rotation resources is executed with the minimum necessary configuration.

What needed to be done

  • Execute secret rotation within the company’s defined governance limits.
  • Detect and notify the start, completion, success, or failure of a secret rotation to the relevant product teams.
  • Ensure recovery from a failed secret rotation without affecting the product.
  • Align rotation timing with the schedule set by users registered in the same DB Cluster.
  • Monitor compliance with the company’s governance standards.

Inception deck (an excerpt)

  • Why are we here
    • To develop and implement a system that complies with the company’s security policy and automatically rotates database passwords at regular intervals.
    • To strengthen security, reduce management efforts, and ensure compliance through automation.
    • Led by the DBRE team, to achieve safer and more efficient password management by leveraging AWS's rotation strategy.
  • Elevator pitch
    • Our goal is to reduce the risk of security breaches and ensure compliance.
    • We offer a service called Secret Rotation,
    • designed for product teams and the security group,
    • to manage database passwords.
    • It provides automation that strengthens security and reduces management effort.
    • Unlike MySQL’s Dual Password feature,
    • it is compatible with all AWS RDBMS options.
    • Through AWS services, we utilize the latest cloud technologies to provide flexible and scalable security measures that meet enterprise data protection standards.

Proof of Concept (PoC)

To execute the PoC we prepared the necessary resources in our testing environment, such as a DB Cluster for our own verification. We discovered that implementing the rotation mechanism through the console was straightforward, leading us to anticipate a rapid deployment of the service with high expectations.

However, at that time, I had no way of knowing that trouble was just around the corner...

Architecture

Providing secret rotation alone is not enough without a notification mechanism for users. I’ll introduce an architecture that includes this essential feature.

Secret Rotation Overview

The whole architecture

  • Secret rotation is managed per secret registered in Secrets Manager.

    • Here’s an example of a monthly update for clarity.
      • In this case, the same password can be used for up to 2 months due to the monthly rotation schedule.
      • As long as you redeploy at some point during this period, at a time that suits your product releases, you comply with the company's rotation rules with minimal effort.
  • Rotation results are stored in DynamoDB

    • During Secret Rotation, the status is written to CloudTrail as an event at the following points:

      • Process start; RotationStarted
      • Process failure; RotationFailed
      • Process end; RotationSucceeded
    • We configured a CloudWatch Event so that the above events would serve to trigger the Lambda function for notification.

      • Below are some of the Terraform code snippets used:

```hcl
cloudwatch_event_name        = "${var.environment}-${var.sid}-cloudwatch-event"
cloudwatch_event_description = "Secrets Manager Secrets Rotation. (For ${var.environment})"
event_pattern = jsonencode({
  "source" : ["aws.secretsmanager"],
  "$or" : [{
    "detail-type" : ["AWS API Call via CloudTrail"]
    }, {
    "detail-type" : ["AWS Service Event via CloudTrail"]
  }],
  "detail" : {
    "eventSource" : ["secretsmanager.amazonaws.com"],
    "eventName" : [
      "RotationStarted",
      "RotationFailed",
      "RotationSucceeded",
      "TestRotationStarted",
      "TestRotationSucceeded",
      "TestRotationFailed"
    ]
  }
})
```

    • Stored rotation results can be used as evidence for submission to the security team.
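
To make the trigger side more concrete, here is a minimal sketch of how a notification Lambda might classify the incoming event. It is not the actual KTC handler. The event names come from the pattern above; reading the secret identifier from `detail.additionalEventData.SecretId` is an assumption about the CloudTrail record layout, and the returned dictionary keys are hypothetical.

```python
from __future__ import annotations

_STATUS_BY_EVENT = {
    "RotationStarted": "started",
    "RotationSucceeded": "succeeded",
    "RotationFailed": "failed",
}


def parse_rotation_event(event: dict) -> dict | None:
    """Extract the fields a notifier needs, or None for unrelated events."""
    detail = event.get("detail", {})
    name = detail.get("eventName", "")
    # The event pattern also matches the Test* variants of each event.
    is_test = name.startswith("Test")
    base_name = name[len("Test"):] if is_test else name
    status = _STATUS_BY_EVENT.get(base_name)
    if status is None:
        return None
    extra = detail.get("additionalEventData") or {}  # assumed location of SecretId
    return {
        "event_name": name,
        "status": status,
        "is_test": is_test,
        "secret_id": extra.get("SecretId"),
        "event_time": detail.get("eventTime"),
    }
```

Returning `None` for anything outside the three rotation states keeps the handler safe if the event pattern is later widened.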

The architecture reflecting the components discussed so far is as follows:

Architecture only for Secret Rotation

AWS resources needed for providing functions

  • Lambda functions for the alternate user strategy (separate functions are required for MySQL and Redshift)
    • These are the alternate user rotation functions set in Secrets Manager.
      • We developed these in-house to meet company rules for infrastructure compliance. The automatically generated Lambda functions could not satisfy several of them, such as Lambda function settings and IAM configurations.
  • Lambda functions for the single user strategy (separate functions are required for MySQL and Redshift)
    • These are the single user rotation functions set in Secrets Manager.
      • The alternate user strategy cannot be applied to the administrator user's password.
  • Lambda function for Secret Rotation Notifications
    • We had to build the mechanism that notifies users of a rotation ourselves.
      • Since the status and results are recorded in CloudTrail, we can use those events as triggers to notify Slack.
      • Note that each event triggers its own, separate Lambda invocation.
  • DynamoDB for storing rotation results
    • Rotation results are stored in DynamoDB.
    • Additionally, the timestamp of the Slack thread is stored so that later notifications can be tied to the thread they relate to.

Why we chose to manage the Lambda function for secret rotation ourselves

As a premise, we use the Lambda code provided by AWS.

Since AWS provides the ability to automatically deploy code, we can use it immediately without the need to create individual Lambda functions.

However, we deploy it using Terraform after committing the code set to our repository.

Main reasons for this are as follows:

  1. Multiple services exist within KTC's AWS account.
    • When several services share the same AWS account, the IAM privileges become too broad.
    • Services are also provided across regions.
      • Since Lambda functions cannot be executed across regions, the same code must be deployed to each region using Terraform.
  2. We have a large number of database users that require Secret Rotation settings.
    • Fewer than 200 database clusters and fewer than 1,000 database users
    • The workload would be overwhelming if we manually built the system for each secret.
  3. Applying company rules
    • Tags must be set in addition to IAM.
      • Functions created automatically and individually would need their tags set afterwards.
  4. AWS-provided code is updated periodically.
    • Since the code is provided by AWS, this is unavoidable.
    • An update could unexpectedly cause trouble.

I have listed several reasons, but in a nutshell, managing the code ourselves was more convenient given our in-company rules.

How we managed Lambda functions for Secrets Rotation

This was really hard work.

At first, we thought it would be easy because AWS provides sample Lambda code. However, we saw many kinds of errors after deploying based on the samples. While we had some success during our own verification, we faced significant challenges when errors occurred only in specific database clusters.

On the other hand, we found that the code generated automatically from the console was error-free and stable, so we needed a way to use it effectively.

There are several approaches; here are the ones we tried.

  1. Deploying from the sample code
    • The code itself can be viewed from the link mentioned above.
    • However, it is hard to match all the required modules, including their versions. In addition, this Lambda code is updated frequently, and we would have to keep up with it.
      • We gave up on this approach because it was too much work.
      • We also realized that, as long as we had to control this code ourselves, another method would serve us better.
  2. Downloading the Lambda code after automatically generating the Secret Rotation function from the console
    • This method generates the code automatically each time, downloads it locally, and applies it to our Lambda functions. It is not too difficult to do.
    • However, depending on when the code is generated, the downloaded code may differ from the code that is already deployed and working.
      • This approach would have worked, but we found it burdensome to repeat every time the code needed updating.
  3. Checking how the function is deployed, using the CloudFormation template that runs behind the scenes when the Secret Rotation function is automatically generated from the console
    • When automatically generated from the console, AWS CloudFormation operates in the background.
    • By examining the template at this stage, we can obtain the S3 path of the code automatically generated by AWS.

We adopted the third method because obtaining the Zip file directly from S3 was the most efficient; it eliminates the need to generate the Secret Rotation code each time.

The actual script to download the code from S3 is as follows:

#!/bin/bash

set -eu -o pipefail

# Navigate to the script directory
cd "$(dirname "$0")"

source secrets_rotation.conf

# Function to download and extract the Lambda function from S3
download_and_extract_lambda_function() {
    local s3_path="$1"
    local target_dir="../lambda-code/$2"
    local dist_dir="${target_dir}/dist"

    echo "Downloading ${s3_path} to ${target_dir}/lambda_function.zip..."

    # Remove existing lambda_function.zip and dist directory
    rm -f "${target_dir}/lambda_function.zip"
    rm -rf "${dist_dir}"

    if ! aws s3 cp "${s3_path}" "${target_dir}/lambda_function.zip"; then
        echo "Error: Failed to download file from S3."
        exit 1
    fi

    echo "Download complete."

    echo "Extracting lambda_function.zip to ${dist_dir}..."
    mkdir -p "${dist_dir}"
    unzip -o "${target_dir}/lambda_function.zip" -d "${dist_dir}"
    cp -p "${target_dir}/lambda_function.zip" "${dist_dir}/lambda_function.zip"
    echo "Extraction complete."
}

# Create directories if they don't exist
mkdir -p ../lambda-code/mysql-single-user
mkdir -p ../lambda-code/mysql-multi-user
mkdir -p ../lambda-code/redshift-single-user
mkdir -p ../lambda-code/redshift-multi-user

# Download and extract Lambda functions
download_and_extract_lambda_function "${MYSQL_SINGLE_USER_S3_PATH}" "mysql-single-user"
download_and_extract_lambda_function "${MYSQL_MULTI_USER_S3_PATH}" "mysql-multi-user"
download_and_extract_lambda_function "${REDSHIFT_SINGLE_USER_S3_PATH}" "redshift-single-user"
download_and_extract_lambda_function "${REDSHIFT_MULTI_USER_S3_PATH}" "redshift-multi-user"

echo "Build complete."

Running this script at deployment time updates the code. Conversely, unless we run this script, the existing code continues to be used.

Lambda function and DynamoDB to notify results of Secret Rotation

Notification of Secret Rotation results is triggered by the events PUT to CloudTrail. We considered modifying the Lambda function for Secret Rotation itself to simplify things. However, that would have been hard to reconcile with our approach of fully utilizing the code provided by AWS.

Before starting development, I thought all we needed was a PUT trigger for notifications. But things were not that easy.

Let’s see the whole picture again.

The whole architecture

The notification process creates a Slack thread when a rotation starts and adds a follow-up message to that thread when the rotation is completed.

Slack notification

The events we use are as follows:

  • Event at the start of processing
    • RotationStarted: PUT to CloudTrail when the processing starts
  • Events at the end of processing
    • RotationSucceeded: PUT to CloudTrail when the processing succeeds
    • RotationFailed: PUT to CloudTrail when the processing fails

At RotationStarted, the event at the start of processing, we store the Slack timestamp in DynamoDB so that we can later add follow-up messages to the thread.

With this in mind, we had to decide what makes an item in DynamoDB unique. We chose the combination of the Secrets Manager SecretID and the scheduled date of the next rotation.

The main attributes stored in DynamoDB are as follows (in practice, more information is stored):

  • SecretID: Partition key
  • NextRotationDate: Sort key
    • The scheduled date of the next rotation; obtainable with describe-secret
  • SlackTS: The timestamp returned by Slack for the first message, sent at the RotationStarted event
    • Using this timestamp, we can add follow-up messages to the Slack thread.
  • VersionID: The version of the secret at the RotationStarted event
    • Keeping the previous version lets us restore the password information from before the rotation at once if trouble occurs.
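
The following sketch shows why the (SecretID, NextRotationDate) pair works as a unique key. An in-memory dictionary stands in for the DynamoDB table, and the `Status` attribute and class names are hypothetical additions for illustration; only the four attribute names listed above come from the article.

```python
from dataclasses import asdict, dataclass


@dataclass
class RotationRecord:
    SecretID: str           # partition key
    NextRotationDate: str   # sort key
    SlackTS: str            # timestamp of the first Slack message
    VersionID: str          # secret version at RotationStarted
    Status: str = "started"  # hypothetical extra attribute


class InMemoryTable:
    """Stand-in for a DynamoDB table keyed by (SecretID, NextRotationDate)."""

    def __init__(self):
        self.items = {}

    def put(self, record):
        self.items[(record.SecretID, record.NextRotationDate)] = asdict(record)

    def get(self, secret_id, next_rotation_date):
        return self.items.get((secret_id, next_rotation_date))


if __name__ == "__main__":
    table = InMemoryTable()
    table.put(RotationRecord("prod/db/app", "2024-07-23", "1720000000.000100", "v1"))
    table.put(RotationRecord("prod/db/app", "2024-08-27", "1722000000.000200", "v2"))
    print(len(table.items))  # one item per rotation of the same secret
```

Each rotation of the same secret gets its own item, so the history accumulates instead of being overwritten.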

The biggest challenge was that the notification Lambda was triggered several times in succession, because several events are PUT during a single Secret Rotation process. Even though I understood this in theory, it proved to be extremely troublesome in practice.

As a result, we had to pay attention to the following:

  • Secret Rotation itself completes very quickly.
    • Since RotationStarted and RotationSucceeded (or RotationFailed) are PUT to CloudTrail at almost the same time, the notification Lambda is executed twice, almost simultaneously.
    • Because the notification Lambda also posts to Slack and writes to DynamoDB, the handler for the end-of-processing event may run before the RotationStarted handler has completed.
      • When this happens, the follow-up message is posted to Slack without knowing which thread it belongs to.

To solve this, we chose a simple approach: when the event name is anything other than RotationStarted, the notification Lambda waits a couple of seconds before notifying Slack.
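
A minimal sketch of that ordering workaround, under stated assumptions: `post_message` is a hypothetical callable that posts to Slack and returns the message timestamp, `thread_store` is any dictionary-like stand-in for the DynamoDB lookup, and the three-second default is illustrative rather than the value KTC uses.

```python
import time


def notify(event_name, key, text, post_message, thread_store,
           delay_seconds=3.0, sleep=time.sleep):
    """Post a rotation notification, keeping follow-ups in the right thread.

    post_message(text, thread_ts) is a hypothetical Slack helper that
    returns the timestamp of the message it posted.
    """
    if event_name == "RotationStarted":
        # First message: create the thread and remember its timestamp.
        ts = post_message(text, None)
        thread_store[key] = ts
        return ts
    # Completion events can arrive almost simultaneously with the start
    # event, so give the RotationStarted handler time to store the timestamp.
    sleep(delay_seconds)
    thread_ts = thread_store.get(key)
    post_message(text, thread_ts)
    return thread_ts
```

A fixed delay is the simplest option; polling the store a few times would be a sturdier variant, at the cost of more code.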

Secret Rotation may fail due to a configuration error or similar. In most cases this does not affect the product immediately, because the error occurs before the database password is updated.

In such a case, recovery can be performed with the following commands.

# Get the version ID staged as AWSPENDING from VersionIdsToStages
$ aws secretsmanager describe-secret --secret-id ${secret_id} --region ${region}
    - - - - - - - - - - Output sample of Versions - - - - - - - - - -
        "Versions": [
        {
            "VersionId": "7c9c0193-33c8-3bae-9vko-4129589p114bb",
            "VersionStages": [
                "AWSCURRENT"
            ],
            "LastAccessedDate": "2022-08-30T09:00:00+09:00",
            "CreatedDate": "2022-08-30T12:53:12.893000+09:00",
            "KmsKeyIds": [
                "DefaultEncryptionKey"
            ]
        },
        {
            "VersionId": "cb804c1c-6d1r-4ii3-o48b-17f638469318",
            "VersionStages": [
                "AWSPENDING"
            ],
            "LastAccessedDate": "2022-08-30T09:00:00+09:00",
            "CreatedDate": "2022-08-30T12:53:22.616000+09:00",
            "KmsKeyIds": [
                "DefaultEncryptionKey"
            ]
        }
    ],
    - - - - - - - - - - - - - - - - - - - - - - - - 

# Remove the AWSPENDING stage from that version
$ aws secretsmanager update-secret-version-stage --secret-id ${secret_id} --remove-from-version-id ${version_id} --version-stage AWSPENDING --region ${region}

# Then, from the console, trigger an immediate rotation of the secret

Although this has not occurred so far, if the database password has already been changed when an issue arises, we run the following commands to retrieve the previous password and restore it.

Since we also use alternate user rotation, product access to the database is not disabled immediately. We believe there will be no issue until the next rotation is executed.

$ aws secretsmanager get-secret-value --secret-id ${secret_id} --version-id ${version_id} --region ${region} --query 'SecretString' --output text | jq .

# For user and password, use the values obtained by aws secretsmanager get-secret-value
$ mysql --defaults-extra-file=/tmp/.<DB username for administration>.cnf -e "ALTER USER ${user} IDENTIFIED BY '${password}'"

# Check connection
$ mysql --defaults-extra-file=/tmp/.user.cnf -e "STATUS"

With the work up to this point, we were able to prepare a foundation to achieve the following:

  • Detect and notify the start, completion, success, or failure of a secret rotation to the relevant product teams.
  • Ensure recovery from a failed secret rotation without affecting the product.

Our battle did not stop here

Although we could prepare the major functions as described, we identified three additional tasks that we needed to address.

  • Execute secret rotation within the company’s defined governance limits.
  • Align rotation timing with the schedule set by users registered in the same DB Cluster.
  • Monitor compliance with the company’s governance standards.

In order to achieve them, we had to develop peripheral functions.

A mechanism to monitor compliance with the company's governance standards

In a nutshell, what we need to do is obtain the list of all users in every DB Cluster and check whether each user's last password change falls within the period required by corporate governance.

We can obtain each user's last password change date by logging in to each DB Cluster and running the following query.

mysql> SELECT User, password_last_changed FROM mysql.user;
+----------------+-----------------------+
| User           | password_last_changed |
+----------------+-----------------------+
| rot_test       | 2024-06-12 07:08:40   |
| rot_test_clone | 2024-07-10 07:09:10   |
            :
            :
            :
            :
            :
            :
            :
            :
+----------------+-----------------------+
10 rows in set (0.00 sec)

This has to be run against every DB Cluster. Fortunately, we already had a daily process that collects metadata from all DB Clusters, automatically generates Entity Relationship Diagrams and my.cnf files, and runs a script to check for inappropriate database settings.

We solved this simply by adding a step to that process that collects the list of users and their last password change dates and saves them in DynamoDB.

The main attributes stored in DynamoDB are as follows:

  • DBClusterID: Partition key
  • DBUserName: Sort key
  • PasswordLastChanged: Latest password updating date

In practice, the following users should not be counted as separate entries:

  1. Users that are generated automatically for RDS and that we cannot control
  2. Users whose names end with "_clone", generated by the Secret Rotation function

For this reason, we obtain only the data we really need with the following query.

SELECT
	CONCAT_WS(',', IF(RIGHT(User, 6) = '_clone', LEFT(User, LENGTH(User) - 6), User), Host, password_last_changed)
FROM
	mysql.user
WHERE
	User NOT IN ('AWS_COMPREHEND_ACCESS', 'AWS_LAMBDA_ACCESS', 'AWS_LOAD_S3_ACCESS', 'AWS_SAGEMAKER_ACCESS', 'AWS_SELECT_S3_ACCESS', 'AWS_BEDROCK_ACCESS', 'rds_superuser_role', 'mysql.infoschema', 'mysql.session', 'mysql.sys', 'rdsadmin', '');

In addition, we prepared a Lambda function for SLI that aggregates the information in DynamoDB. Its output looks like this:

SLI notification

Its output content is as follows:

  • Total Items: The number of all users existing in all DB Clusters
  • Secrets Exist Ratio: Ratio of SecretIDs that comply with the naming rule for Secrets Manager used in KINTO Technologies
  • Rotation Enabled Ratio: Ratio of activated Secret Rotation functions
  • Password Change Due Ratio: Ratio of users who comply with the corporate governance rule

The important thing is to keep Password Change Due Ratio at 100%. As long as this ratio is 100%, there is no need to depend on the Secret Rotation function.

With this SLI notification mechanism, we can achieve the following:

  • Monitor compliance with the company’s governance standards.
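
To illustrate how the four SLI values could be derived from the stored items, here is a small sketch. The item field names (`secret_exists`, `rotation_enabled`, `password_last_changed`) and the 90-day limit in the usage example are hypothetical; the article does not state the actual schema or the governance period.

```python
from datetime import datetime, timedelta


def compute_sli(items, now, max_age_days):
    """Aggregate per-user items into the four SLI values."""
    total = len(items)

    def ratio(count):
        # Percentage rounded to one decimal place; 0.0 when there is no data.
        return round(100.0 * count / total, 1) if total else 0.0

    limit = now - timedelta(days=max_age_days)
    return {
        "Total Items": total,
        "Secrets Exist Ratio": ratio(sum(1 for i in items if i["secret_exists"])),
        "Rotation Enabled Ratio": ratio(sum(1 for i in items if i["rotation_enabled"])),
        "Password Change Due Ratio": ratio(
            sum(1 for i in items if i["password_last_changed"] >= limit)
        ),
    }


if __name__ == "__main__":
    sample = [
        {"secret_exists": True, "rotation_enabled": True,
         "password_last_changed": datetime(2024, 6, 12)},
        {"secret_exists": False, "rotation_enabled": False,
         "password_last_changed": datetime(2024, 1, 1)},
    ]
    print(compute_sli(sample, now=datetime(2024, 7, 10), max_age_days=90))
```

Because the last ratio depends only on `password_last_changed`, it stays meaningful for users whose passwords are rotated by some other means.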

A mechanism to synchronize rotation timing with the schedule set by users registered in the same DB Cluster.

We had to write two sets of code to realize this mechanism.

  1. A mechanism to decide the execution time of rotation for a DBClusterID.
  2. A mechanism to set the rotation in Secrets Manager at the time determined above.

Each of these is described below.

The mechanism to decide the execution time of rotation for a DBClusterID.

As a premise, the execution time of Secret Rotation is described by a schedule called the rotation window. There are two ways to write a rotation window:

  • rate expression
    • Used when we want to set the rotation interval as a designated number of days
  • cron expression
    • Used when we want to set the rotation schedule in detail, such as a specific day of the week or time

We decided to use a cron expression because we wanted rotations to run during the daytime on weekdays.

Another setting is the “window duration” of a rotation. By combining the two, we can control the execution timing of a rotation to some extent.

The relationship between rotation window and window duration is as follows:

  1. The rotation window specifies the time by which a rotation ends, not when it starts.
  2. The window duration determines how far ahead of the time set in the rotation window the rotation may be executed.
  3. The default window duration is 24 hours.

For example, if the rotation window is set to 10:00 AM on the fourth Tuesday of every month and the window duration is not specified (24 hours), Secret Rotation will be executed sometime between 10:00 AM on the preceding Monday and 10:00 AM on the fourth Tuesday of every month.

This is hard to grasp intuitively. But without understanding this relationship, Secret Rotation may be executed at an unexpected time.
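
The arithmetic behind this example can be checked with a short sketch. It follows the reading of rotation window and window duration given in this article (the configured time is treated as the end of the interval) and is not a statement about the Secrets Manager API itself; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def nth_weekday(year, month, weekday, n):
    """Return the n-th given weekday of a month (Monday=0 ... Sunday=6)."""
    first = datetime(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def rotation_interval(window_end, duration_hours=24):
    """Interval in which the rotation may run, per the article's description."""
    return window_end - timedelta(hours=duration_hours), window_end


if __name__ == "__main__":
    # Fourth Tuesday of July 2024 at 10:00, default 24-hour duration.
    end = nth_weekday(2024, 7, weekday=1, n=4).replace(hour=10)
    start, end = rotation_interval(end)
    print(start, "->", end)
```

With a three-hour duration instead of the default, the same end time gives an interval of 07:00 to 10:00 on the Tuesday itself.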

With those assumptions in mind, we defined the requirements as follows:

  • Rotations for DB users in the same DBClusterID are executed in the same time slot
  • The window duration is three hours
    • If the window is too short, problems may occur one after another in the period between a trouble report and its recovery
  • Execution is scheduled between 09:00 and 18:00 on weekdays, Tuesday through Friday
    • We don’t execute on Mondays because public holidays are more likely to fall on that day
    • Because the window duration is fixed at three hours, the time that can be set in the cron expression falls within the six hours between 12:00 and 18:00
    • The cron expression can only be set in UTC
  • Execution timings should be dispersed as much as possible
    • If many Secret Rotations run at the same time, we may run into the limits of various APIs
    • If some kind of error occurs, many alerts fire at once and we cannot respond to all of them at the same time
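As an illustration of these requirements, the candidate slots could be enumerated roughly as below. This is a sketch under several assumptions that the article does not state: rotations are monthly and identified by the n-th weekday of the month, slots are hourly, and local time is JST (UTC+9). The `DAY#n` day-of-week form reflects our reading of the Secrets Manager cron syntax and should be verified against the AWS documentation.

```python
from itertools import product

WEEKS = (1, 2, 3, 4)                 # assumption: n-th weekday of the month
DAYS = ("TUE", "WED", "THU", "FRI")  # Mondays are excluded
LOCAL_HOURS = range(12, 19)          # window end between 12:00 and 18:00 local time
UTC_OFFSET_HOURS = 9                 # assumption: local time is JST (UTC+9)


def to_cron(week: int, day: str, local_hour: int) -> str:
    """Build a cron expression in UTC for one slot."""
    # 12:00-18:00 JST maps to 03:00-09:00 UTC on the same day,
    # so no day-of-week shift is needed here.
    utc_hour = local_hour - UTC_OFFSET_HOURS
    return f"cron(0 {utc_hour} ? * {day}#{week} *)"


# Every combination of week, day, and hour becomes one candidate slot.
SLOTS = [to_cron(w, d, h) for w, d, h in product(WEEKS, DAYS, LOCAL_HOURS)]

print(len(SLOTS))  # 112
print(SLOTS[0])    # cron(0 3 ? * TUE#1 *)
```

Under these assumptions there are 4 × 4 × 7 = 112 slots, which leaves room to spread clusters out so that few rotations share the same hour.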

The overall flow of the Lambda processing is as follows:

  • Data acquisition:
    • Acquire the list of DBClusterIDs from DynamoDB
    • Acquire the existing Secret Rotation settings from DynamoDB
  • Schedule generation:
    • Initialize all combinations (slots) of week, day, and hour
    • Check whether each DBClusterID already exists in the existing Secret Rotation settings
      • If it exists, place the DBClusterID in the same slot as in the existing settings
    • Distribute new DBClusterIDs evenly across the slots
      • Add a new DBClusterID to an empty slot; if the slot is not empty, add it to the next slot
    • Repeat until the end of the DBClusterID list
  • Storing data:
    • Store only the new Secret Rotation settings that do not duplicate existing data
  • Error handling and notification:
    • When a serious error occurs, an error message is sent to Slack
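The schedule generation step can be sketched as follows. This is not the actual Lambda code: the function `assign_slots` is hypothetical, DynamoDB access is replaced by plain Python values, and "distribute evenly" is interpreted here as "pick the first least-loaded slot", which is one possible reading of the flow above.

```python
from collections import defaultdict


def assign_slots(cluster_ids, existing, slots):
    """Assign a slot to every DBClusterID that does not have one yet.

    cluster_ids : list of DBClusterIDs
    existing    : dict of DBClusterID -> cron expression already stored
    slots       : ordered list of candidate cron expressions
    Returns only the new assignments, so existing data is never duplicated.
    """
    load = defaultdict(int)
    for slot in existing.values():
        load[slot] += 1  # existing clusters keep their slot

    new_assignments = {}
    for cluster_id in cluster_ids:
        if cluster_id in existing:
            continue
        # min() returns the first slot with the smallest load,
        # so empty slots are filled in order before any slot is reused.
        slot = min(slots, key=lambda s: load[s])
        load[slot] += 1
        new_assignments[cluster_id] = slot
    return new_assignments


slots = ["cron(0 3 ? * TUE#1 *)", "cron(0 4 ? * TUE#1 *)", "cron(0 5 ? * TUE#1 *)"]
existing = {"cluster-a": slots[0]}
new = assign_slots(["cluster-a", "cluster-b", "cluster-c", "cluster-d"], existing, slots)

# Shape of the records that would be stored (attribute names from this article).
items = [{"DBClusterID": cid, "CronExpression": cron} for cid, cron in new.items()]
```

In this example `cluster-b` and `cluster-c` go to the two empty slots, and `cluster-d` wraps around to share the first slot with `cluster-a`.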

The attributes stored in DynamoDB are as follows:

  • DBClusterID: Partition key
  • CronExpression: the cron expression to set for Secret Rotation

It’s a bit hard to follow, so the following image illustrates the resulting state:

Slot putting in image

That covers the mechanism to decide the execution time of rotation for a DBClusterID.

However, this alone does not set up the actual Secret Rotation, so we also need a mechanism that actually configures it.

The mechanism to set a rotation on Secrets Manager by the time determined by the above

We don’t believe that the Secret Rotation mechanism is the only way to maintain corporate governance. What matters more is compliance with the governance standards defined by the company. Accordingly, instead of forcing everyone to use this mechanism, we need to make it the safest and simplest option DBRE can offer, so that our users want to use it.

Within a single DB Cluster, one user may wish to use Secret Rotation, while another user prefers to manage passwords themselves in a different way.

To satisfy such requests, we need a command line tool that sets up Secret Rotation per database user linked to a given DBClusterID.

As DBRE, we have been developing a tool called dbre-toolkit that turns our daily work into commands. It is a package of tools, such as one that makes it easy to execute Point In Time Restore, and one that acquires DB connection users from Secrets Manager to create a defaults-extra-file.

This time, we added a subcommand to it:

% dbre-toolkit secrets-rotation -h
2024/08/01 20:51:12 dbre-toolkit version: 0.0.1
It is a command to set Secrets Rotation based on
Secrets Rotation schedule linked to a designated Aurora Cluster.

Usage:
  dbre-toolkit secrets-rotation [flags]

Flags:
  -d, --DBClusterId string   [Required]  DBClusterId of the subject service
  -u, --DBUser string        [Required]  a subject DBUser
  -h, --help                 help for secrets-rotation

This subcommand acquires the information for the designated combination of DBClusterID and DBUser from DynamoDB and registers it in Secrets Manager, which completes the Secret Rotation setup.
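The registration step might look something like the sketch below. It is not the dbre-toolkit implementation. The helper `build_rotation_request`, the secret name, and the Lambda ARN are hypothetical. The keys follow the parameters of the Secrets Manager `RotateSecret` API as exposed by boto3 (`SecretId`, `RotationLambdaARN`, `RotationRules`, `RotateImmediately`), and the three-hour duration comes from the requirements described earlier.

```python
def build_rotation_request(secret_id, lambda_arn, cron_expression, duration_hours=3):
    """Build the keyword arguments for a rotate_secret call."""
    return {
        "SecretId": secret_id,
        "RotationLambdaARN": lambda_arn,
        "RotationRules": {
            "ScheduleExpression": cron_expression,  # value read from DynamoDB
            "Duration": f"{duration_hours}h",       # window duration
        },
        # Assumption: only register the schedule, do not rotate right away.
        "RotateImmediately": False,
    }


request = build_rotation_request(
    secret_id="example-cluster/example-user",  # hypothetical secret name
    lambda_arn="arn:aws:lambda:ap-northeast-1:123456789012:function:example-rotation",
    cron_expression="cron(0 3 ? * TUE#4 *)",
)

# With AWS credentials configured, the request could then be sent with boto3:
#   import boto3
#   boto3.client("secretsmanager").rotate_secret(**request)
```

Keeping the request construction separate from the API call makes it easy to check the schedule and duration before anything is changed in Secrets Manager.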

We could achieve the following with this:

  • Execute secret rotation within the company’s defined governance limits.
  • Align rotation timing with the schedule set by users registered in the same DB Cluster.

With all of this in place, we finally completed everything we had set out to do.

Conclusion

Here’s what we have achieved:

  • Detect and notify relevant product teams about the start, completion, success, or failure of a secret rotation.
    • We created a system that detects CloudTrail PUT events and sends the appropriate notifications.
  • Ensure recovery from a failed secret rotation without affecting the product.
    • We prepared steps to handle potential issues.
      • We found that understanding how Secret Rotation works helps minimize the risk of fatal errors.
  • Execute secret rotation within the company’s defined governance limits.
    • We developed a mechanism for SLI notification.
    • We implemented a mechanism to perform secret rotation within the company’s defined governance limits.
  • Synchronize rotation timing with the schedule set by users registered in the same DB Cluster.
    • We developed a mechanism that stores cron expressions in DynamoDB, per DBClusterID, for use in Secret Rotation settings.
  • Monitor compliance with the company’s governance.
    • We developed a mechanism for SLI notification.

The overall picture looks like this:

The whole image

It is more complex than we imagined. In other words, we had assumed that a managed Secret Rotation would be simpler than it turned out to be.

The Secret Rotation function provided by AWS is very effective if you simply use it as is.

However, we discovered that we needed to prepare many elements in-house because the out-of-the-box solution did not fully meet our requirements. We went through numerous trials and errors to reach this point.

We aim to create a corporate environment where everyone can use the KTC database seamlessly with the Secret Rotation mechanism we've developed. We also strive to ensure that the database can be used safely and continuously.

KINTO Technologies’ DBRE team is currently recruiting new teammates! We welcome casual interviews as well. If you're interested, please feel free to contact us via DM on X. We would also appreciate it if you followed our corporate X account dedicated to recruitment!
