KINTO Tech Blog
AWS

How We Reduced AWS Costs by 65%—and What We Discovered Beyond That


This article is the entry for day 23 in the KINTO Technologies Advent Calendar 2024🎅🎄

1. Introduction

Hi there! I’m Nishida, a backend engineer at KINTO FACTORY.

Today, I’d love to share how we managed to slash our AWS costs.

2. What Made Us Start Working on Cost Reduction?

At KINTO Technologies, we use Amazon QuickSight to visualize our AWS usage fees, making it easy to track costs for each product.

A little while after launching KINTO FACTORY, we casually checked the project’s costs, and to our surprise, it turned out to be the second most expensive product in the entire company. We certainly didn’t anticipate incurring such high costs so early on. This "Wait, what?!" moment served as the catalyst for our applications team to spring into action and begin working on cost reduction efforts.

3. What We Actually Did

Now, let’s dive into the specific cost-cutting steps we took.

Reviewing ECS Fargate Usage

When we broke down the costs, ECS Fargate stood out as the clear front-runner in expenses. That wasn’t too surprising, since KINTO FACTORY’s applications run on ECS Fargate. Still, we figured there had to be ways to optimize it.

The first thing that caught our eye was: "Wait... is the number of instances in the development environment the same as in production?" That definitely didn’t make sense, because the development environment shouldn’t require nearly as much computing power. So, we reduced the number of instances in the development environment to only those that were absolutely essential.

Upon further investigation, we discovered that besides regular Fargate there is another capacity provider called Fargate Spot. Fargate Spot lets you tap into unused AWS capacity, offering discounts of up to 70% compared to regular Fargate. Honestly, why wouldn’t you use it?

Fargate Spot allows Amazon ECS tasks that can handle interruptions to run at a much lower cost. It works by running tasks on AWS’s spare compute capacity. When AWS needs that capacity back, tasks are interrupted with a two-minute warning.

-- https://docs.aws.amazon.com/en_us/AmazonECS/latest/developerguide/fargate-capacity-providers.html

That said, as noted in the documentation, Fargate Spot relies on spare compute capacity, so tasks may occasionally be interrupted if AWS reclaims those resources. Applying this to the production environment wasn’t really an option, so the settings were adjusted for the development environment, where occasional interruptions wouldn’t cause any issues.
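As a rough illustration of that split, the sketch below picks a capacity provider per environment. The function names and the environment labels are hypothetical and not taken from our actual setup; `FARGATE` and `FARGATE_SPOT` are the capacity provider names ECS uses, and the cluster is assumed to already have both associated with it.

```python
def capacity_provider_strategy(environment: str) -> list:
    """Return an ECS capacityProviderStrategy for the given environment."""
    if environment == "production":
        # Production stays on regular Fargate: interruptions are not acceptable.
        return [{"capacityProvider": "FARGATE", "weight": 1}]
    # Other environments tolerate interruptions, so run them on Fargate Spot.
    return [{"capacityProvider": "FARGATE_SPOT", "weight": 1}]


def apply_strategy(cluster: str, service: str, environment: str) -> None:
    """Apply the strategy to an existing service (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    boto3.client("ecs").update_service(
        cluster=cluster,
        service=service,
        capacityProviderStrategy=capacity_provider_strategy(environment),
        forceNewDeployment=True,  # redeploy tasks so the new strategy takes effect
    )


print(capacity_provider_strategy("development"))
```

A mixed strategy (for example, a small `base` on `FARGATE` plus weighted `FARGATE_SPOT`) is also possible, but a Spot-only strategy keeps the development setup simple.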

[Image: Fargate Spot]

Automating Startup and Shutdown of the Development Environment

The development environment had also been running 24/7, just like the production environment. To avoid unnecessary costs, it was set up to shut down automatically during off-hours (late nights, weekends, and holidays), when it wasn’t needed.

AWS Step Functions was used to automate the startup and shutdown processes for both the application and the database.
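To get a feel for how much running time a schedule like this removes, here is a small back-of-the-envelope calculation. The schedule in the example (up from 08:00 to 22:00 on weekdays) is an assumption for illustration only, not our actual operating hours, and holidays are ignored.

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours


def weekly_running_fraction(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of the week during which the environment is up."""
    return (hours_per_day * days_per_week) / HOURS_PER_WEEK


# Hypothetical schedule: up 08:00-22:00 (14 hours) on 5 weekdays.
fraction = weekly_running_fraction(hours_per_day=14, days_per_week=5)
print(f"running {fraction:.1%} of the week, stopped {1 - fraction:.1%}")
# prints: running 41.7% of the week, stopped 58.3%
```

Because Fargate is billed for the time tasks run, the stopped share translates fairly directly into compute savings for the development environment.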

[Image: Step Functions workflow]

In the first stage, EventBridge is used to send start and stop triggers to Step Functions based on schedules set with cron.
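One detail worth sketching is the cron expression itself. The helper below assumes the schedule is evaluated in UTC, as it is for EventBridge rules, and converts a weekday time in JST into the six-field EventBridge cron format. The function name and the example times are hypothetical, and holidays would need separate handling.

```python
JST_OFFSET_HOURS = 9  # JST is UTC+9 and has no daylight saving time


def weekday_cron_from_jst(hour_jst: int, minute: int = 0) -> str:
    """Build an EventBridge cron expression (UTC) for Mon-Fri at a JST time."""
    if not (0 <= hour_jst <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    hour_utc = hour_jst - JST_OFFSET_HOURS
    days = "MON-FRI"
    if hour_utc < 0:
        # Early-morning JST falls on the previous day in UTC.
        hour_utc += 24
        days = "SUN-THU"
    # Fields: minutes hours day-of-month month day-of-week year
    return f"cron({minute} {hour_utc} ? * {days} *)"


print(weekday_cron_from_jst(8))   # start trigger -> cron(0 23 ? * SUN-THU *)
print(weekday_cron_from_jst(22))  # stop trigger  -> cron(0 13 ? * MON-FRI *)
```

The day shift is the easy part to get wrong: 08:00 on Monday in JST is 23:00 on Sunday in UTC, so the start rule has to run Sunday through Thursday.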

One key point: database startup takes time. Starting ECS right after the DB starts can lead to connection errors. To prevent that, the system first checks the DB status before launching ECS.

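For reference, here’s an excerpt of the Step Functions workflow definition (written as JSON in Amazon States Language; ${db_cluster_identifier} is a template placeholder for the cluster identifier):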

    "DB Startup": {
      "Type": "Task",
      "Parameters": {
        "DbClusterIdentifier": "${db_cluster_identifier}"
      },
      "Resource": "arn:aws:states:::aws-sdk:rds:startDBCluster",
      "Next": "Wait"
    },
    "Wait": {
      "Type": "Wait",
      "Seconds": 300,
      "Next": "Checking DB status after startup"
    },
    "Checking DB status after startup": {
      "Type": "Task",
      "Parameters": {
        "DbClusterIdentifier": "${db_cluster_identifier}"
      },
      "Resource": "arn:aws:states:::aws-sdk:rds:describeDBClusters",
      "Next": "Check if startup is complete"
    },
    "Check if startup is complete": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.DbClusters[0].Status",
          "StringEquals": "available",
          "Next": "Start up each service"
        }
      ],
      "Default": "Wait"
    },
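For readers who find the state machine hard to follow, the same wait-and-check loop can be sketched in Python. This is only an illustration of the control flow, not code we run: `wait_until_available` is a hypothetical helper, and the `max_checks` limit is an addition that the excerpt above does not have (as written, the workflow keeps looping until the status becomes "available").

```python
import time
from typing import Callable


def wait_until_available(
    get_status: Callable[[], str],
    wait_seconds: float = 300,
    max_checks: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_status() until it returns "available"; return the number of checks."""
    for check in range(1, max_checks + 1):
        sleep(wait_seconds)              # the "Wait" state
        if get_status() == "available":  # the describe + Choice states
            return check                 # safe to start the ECS services now
    raise TimeoutError(f"DB not available after {max_checks} checks")


# Hypothetical wiring with boto3 (needs AWS credentials):
#   rds = boto3.client("rds")
#   rds.start_db_cluster(DBClusterIdentifier="my-dev-cluster")
#   wait_until_available(lambda: rds.describe_db_clusters(
#       DBClusterIdentifier="my-dev-cluster")["DBClusters"][0]["Status"])
```

Passing `get_status` and `sleep` in as arguments keeps the loop testable without touching AWS.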

Moving to a Serverless Architecture

Next, while reviewing the batch processing for a certain feature, it became clear that the job, which took less than a minute per run, was deployed as an always-on service on ECS.

This setup wasn’t exactly cost-friendly either, so we moved to a serverless architecture using Lambda. Lambda, with its pay-as-you-go model based on request count and execution time, is perfect for short, quick processes or tasks that don’t need to run constantly.
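The difference between the two billing models can be sketched with a little arithmetic. The functions below are a simplified model, and every rate in the example is a made-up placeholder rather than an actual AWS price, so check the current pricing pages before drawing conclusions for your own workload.

```python
def lambda_monthly_cost(invocations, avg_duration_s, memory_gb,
                        price_per_request, price_per_gb_second):
    """Pay-as-you-go: billed per request plus per GB-second of execution."""
    request_cost = invocations * price_per_request
    compute_cost = invocations * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost


def always_on_monthly_cost(hourly_rate, hours_per_month=730):
    """Resident service: billed for every hour it is up, busy or idle."""
    return hourly_rate * hours_per_month


# Hypothetical workload: one 60-second batch run per day with 0.5 GB of memory.
# All rates below are placeholders for illustration, not real prices.
batch = lambda_monthly_cost(invocations=30, avg_duration_s=60, memory_gb=0.5,
                            price_per_request=2e-7, price_per_gb_second=1.7e-5)
resident = always_on_monthly_cost(hourly_rate=0.012)
print(f"pay-as-you-go: {batch:.4f} / month, always-on: {resident:.2f} / month")
```

The point of the model is that the pay-as-you-go cost scales with the roughly 30 minutes of work per month, while the resident service is billed for all 730 hours.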

[Image: serverless architecture]

4. Cost-Saving Results

The efforts paid off quickly, with costs dropping by 65% compared to the peak. That leaves only about 35% of the peak spend, well under half. Who would’ve thought?

[Image: AWS cost]

Seeing just how much room there was to cut costs was honestly surprising.

5. Conclusion

This time, the focus was on sharing cost reduction efforts.

The steps taken weren’t complicated or anything fancy, but they still delivered solid results.

Along the way, some valuable insights emerged. Cost reduction isn’t just about saying, “Look, we saved money!”

  • By reviewing and eliminating unnecessary resources, we were able to gain better visibility into the entire system.
  • It also provided a great opportunity to revisit the architecture.
  • Asking questions like, "Is this resource really necessary?" naturally became part of the process.

That mindset shift brings long-term benefits, making the effort more than worth it.

Hopefully, this can be helpful for anyone facing similar challenges.
