From a New Graduate Data Analyst to a Data Engineer
Introduction
Hello. I am Koike, a data engineer in the Analytics Group. At my previous job, which I started as a new graduate, I mainly performed analyses for service growth, but in my current position, I am developing a data analysis platform. To put it simply, I changed my career from a data analyst to a data engineer. In this article, I would like to talk about my journey from being a data analyst to a data engineer.
Data Analysts and Data Engineers
As there might be some who are not familiar with the roles of each, I'd like to begin by outlining the responsibilities of a data analyst and a data engineer. The responsibilities of each role are shown in the diagram below.
Data engineers are responsible for preparing data for data analysts to aggregate and analyze. Their specific tasks are as follows.
- Acquiring data from other systems and data sources.
- Processing the acquired data into a form that is easy for data analysts to use.
- Delivering the processed data so that data analysts can access it.
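As a rough illustration of these three tasks, here is a minimal sketch in plain Python. The CSV content, the column names, and the function names `acquire`, `process`, and `deliver` are all assumptions made up for this example; a real pipeline would read from actual source systems and write to shared storage.

```python
import csv
import io

# Hypothetical raw export from a source system (an assumption for illustration).
RAW_CSV = """user_id,signup_date,plan
1,2023/01/05,basic
2,2023/01/06,
3,2023/01/06,premium
"""


def acquire(raw_text):
    """Task 1: read rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def process(rows):
    """Task 2: clean the rows into a shape that is easy to analyze."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "user_id": int(row["user_id"]),
            # Normalize dates to ISO format so they sort and filter correctly.
            "signup_date": row["signup_date"].replace("/", "-"),
            # Make missing values explicit instead of leaving empty strings.
            "plan": row["plan"] or "unknown",
        })
    return cleaned


def deliver(rows):
    """Task 3: write the processed rows to a place analysts can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["user_id", "signup_date", "plan"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


processed = process(acquire(RAW_CSV))
delivered = deliver(processed)
print(delivered)
```

Keeping each task in its own function makes it easier to test and replace one stage without touching the others.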
In contrast, a data analyst is responsible for aggregating and analyzing the data and suggesting how to improve the business. Their specific tasks are as follows.
- Aggregating data prepared by data engineers using SQL or other tools.
- Analyzing the aggregated data.
- Suggesting how to improve the business based on the results of the analysis.
Alternatively, they may do the following tasks.
- Aggregating data prepared by data engineers using SQL or other tools.
- Summarizing the aggregated data in a dashboard and creating an environment where the figures can be monitored regularly.
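To make the aggregation task concrete, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for the real query engine. The `contracts` table and its columns are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database standing in for data prepared by data engineers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts (contract_id INTEGER, contract_date TEXT, plan TEXT)"
)
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?)",
    [
        (1, "2023-01-05", "basic"),
        (2, "2023-01-05", "premium"),
        (3, "2023-01-06", "basic"),
        (4, "2023-01-06", "basic"),
    ],
)

# Aggregate contracts per day and plan, the kind of figure a dashboard might track.
daily_counts = conn.execute(
    """
    SELECT contract_date, plan, COUNT(*) AS contracts
    FROM contracts
    GROUP BY contract_date, plan
    ORDER BY contract_date, plan
    """
).fetchall()

for row in daily_counts:
    print(row)
```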
I hope this gives you a general idea. With this in mind, in this article, I would like to show you how I changed my career from a data analyst to a data engineer and started working as the latter. I hope this information will serve as a reference for those considering a career transition to become data engineers.
The Data Architecture of KINTO Technologies
Before talking about data platform development, I will explain the company's data architecture first. It is mainly composed of AWS services, and the general flow of data is as follows.
- Using a service called Glue, data obtained from external sources is converted, processed, and stored in S3.
- SQL queries are run against the data in S3 using a service called Athena.
Our data engineers mainly handle the first part, creating an environment in which various data can be aggregated and analyzed in Athena.
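For readers who have not used Athena, the sketch below shows roughly what submitting a query looks like from Python with `boto3`. The database name, the SQL, and the S3 output location are placeholders invented for this example, not our actual settings. The request is built separately from the call so that it can be inspected without AWS credentials.

```python
def build_athena_request(sql, database, output_location):
    """Build the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so building a request needs no AWS setup

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]


# All values below are hypothetical placeholders.
request = build_athena_request(
    sql="SELECT plan, COUNT(*) AS contracts FROM contracts GROUP BY plan",
    database="analytics_db",
    output_location="s3://example-athena-results/queries/",
)
print(request)
```

Calling `run_query(request)` would return a query execution ID, which can then be used to poll for completion and fetch the results.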
Glue Workflow Development
Now that you understand the data architecture, I will talk about the Glue workflow I developed as my first step from data analyst to engineer. Glue has three main features.
- Job: A function that performs preprocessing for analyses (data extraction, conversion, and loading).
- Crawler: A function that creates metadata in the Data Catalog.
- Trigger: A function that runs jobs and crawlers manually or automatically.
A Job performs preprocessing for an analysis. For example, you can define a series of steps such as reading, processing, and outputting CSV data as a single job. A Crawler creates metadata in the Data Catalog. Put simply, it defines table metadata such as input/output formats, column names, and data types, and stores it in a container called the Data Catalog. A Trigger runs a Job or Crawler manually or automatically. It can fire at a fixed time every day, or when the previous job completes successfully. In addition, a workflow combines these three components into a single sequence to make them easier to manage.
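As a sketch of how these pieces fit together, the following Python defines two triggers for a workflow in the shape expected by the `boto3` Glue client's `create_trigger` call: one that starts a job at a fixed time every day, and one that starts a crawler once that job succeeds. The workflow, job, crawler, and trigger names are hypothetical, and this is not the actual workflow I built.

```python
WORKFLOW = "daily_import_workflow"  # hypothetical workflow name

# Runs the (hypothetical) conversion job every day at 01:00 UTC.
scheduled_trigger = {
    "Name": "start_daily_import",
    "WorkflowName": WORKFLOW,
    "Type": "SCHEDULED",
    "Schedule": "cron(0 1 * * ? *)",
    "Actions": [{"JobName": "convert_csv_job"}],
    "StartOnCreation": True,
}

# Runs the (hypothetical) crawler only after the job above has succeeded.
conditional_trigger = {
    "Name": "crawl_after_convert",
    "WorkflowName": WORKFLOW,
    "Type": "CONDITIONAL",
    "Predicate": {
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "JobName": "convert_csv_job",
                "State": "SUCCEEDED",
            }
        ]
    },
    "Actions": [{"CrawlerName": "converted_data_crawler"}],
    "StartOnCreation": True,
}


def create_workflow(triggers):
    """Create the workflow and its triggers. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the definitions can be inspected offline

    glue = boto3.client("glue")
    glue.create_workflow(Name=WORKFLOW)
    for trigger in triggers:
        glue.create_trigger(**trigger)


triggers = [scheduled_trigger, conditional_trigger]
print([t["Name"] for t in triggers])
```

The job and crawler themselves would need to exist in Glue before `create_workflow(triggers)` is called.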
By developing a workflow that allows data from external sources to be aggregated in Athena, I got a general understanding of the data flow and took my first step as a data engineer. When I was a data analyst in my previous job, I aggregated and analyzed data that data engineers had already formatted, so I did not pay much attention to how the data was created. Through this development work, however, I came to better understand how triggers and jobs are combined, along with other parts of data preprocessing. It was a great experience.
Comparing the Skill Set of Analysts and Engineers
So far, I have talked about the job of a data engineer. Next, I would like to outline the skill sets required of data analysts and data engineers.
The Data Analyst Skill Set
- Analysis and design
- Aggregation
- Analysis
- Explaining results of analyses
The first skill required of a data analyst is the ability to analyze and design. For example, suppose a marketer says, “I want this data.” You could simply output the data as requested, but doing so could lead to rework. Therefore, it is necessary to clarify the underlying purpose by asking why they want that data, and then determine what kind of data you should output and analyze to achieve that purpose. That is analysis and design. Next is the ability to aggregate. This refers to extracting the data you want using SQL or other tools. It is surprisingly difficult to learn how to verify that the figures extracted with SQL contain no mistakes, or how to write SQL that is less prone to mistakes in the first place. Next is the ability to analyze. Basically, it is the ability to think logically without subjectivity. To do this, you may need knowledge of statistics, machine learning, and so on. The last skill is the ability to explain the results of analyses. No matter how sophisticated your analysis is, it has no value unless it can be applied to the business. It only has value when you explain it properly to the decision makers and get them to understand it.
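One common way to check extracted figures is to reconcile a total before and after a join. The sketch below, again using `sqlite3` with hypothetical `orders` and `users` tables, shows how a duplicated key silently inflates a joined total and how a simple comparison catches it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount INTEGER);
    CREATE TABLE users (user_id INTEGER, segment TEXT);
    INSERT INTO orders VALUES (1, 10, 100), (2, 10, 200), (3, 20, 300);
    -- user 10 appears twice, which duplicates that user's orders in a join
    INSERT INTO users VALUES (10, 'new'), (10, 'returning'), (20, 'new');
    """
)

# Total taken directly from the source table.
source_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Total after joining to users; it should match the source total.
joined_total = conn.execute(
    "SELECT SUM(o.amount) FROM orders o JOIN users u ON o.user_id = u.user_id"
).fetchone()[0]

totals_match = source_total == joined_total

# Look for the cause: keys that appear more than once in the joined table.
duplicate_keys = conn.execute(
    "SELECT user_id, COUNT(*) FROM users GROUP BY user_id HAVING COUNT(*) > 1"
).fetchall()

print(source_total, joined_total, totals_match, duplicate_keys)
```

If the two totals differ, checking the join keys for duplicates is usually a good first step before trusting any downstream figure.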
The Data Engineer Skill Set
- Data pipeline design
- Code design
- Data processing
The first is the ability to design data pipelines. In terms of what I have explained in this article, you can think of it as the ability to determine how to configure the data processing workflow to produce the data you are looking for. Next is the ability to design code. I think this is true for all engineers, not just data engineers: coding does not end once the code is written, because the code may need to be modified later. Therefore, it is important to write code that is easy to maintain. The last skill is the ability to process data. We mainly use SQL and Python, so it is necessary to be proficient in both.
We have now compared the skill sets necessary for a data analyst and a data engineer.
Future Outlook
So far, I have described my first half year since changing my career from a data analyst to a data engineer. Looking back, I have been able to grow my skills as a data engineer little by little, but now I feel like I am losing my perspective as a data analyst because I'm concentrating too much on development. Therefore, going forward, I want to keep in mind that the platform should be easy for its users to work with. No matter how beautiful the data platform is for developers, it has no value to the business unless data analysts can effectively communicate its output to the business side. To keep that from happening, I will draw on my experience as both a data analyst and a data engineer and work hard every day to be able to connect data to value from start to finish!