KINTO Tech Blog

What are UUIDs and which version should you use?


Recently, we had to respond to an incident: a service was down because of duplicate keys in its database. My team and I were scratching our heads because these were UUIDs - you know, those supposedly 'unique' identifiers. How could we have duplicates?
It turns out the issue was caused by the service trying to add the same event twice, not by the same UUID being generated twice. This incident got me thinking about UUIDs. What are they? How are they generated? What are their use cases? And most importantly, which version should you use?

What is a UUID?

UUIDs are usually used to provide an ID for a resource. UUID stands for "Universally Unique IDentifier". As the name suggests, there are strong expectations of uniqueness for the values being generated, and with good reason: even if we generated a huge number of random (version 4) UUIDs, for instance a few quadrillion of them (quadrillion is what comes after trillion), there would still be a better than 99.999% chance that they are all unique.
If you are interested in the math behind these odds, I recommend reading this really great article.
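For a rough feel of these odds, here is a minimal sketch of the usual birthday-problem approximation. It assumes version 4 UUIDs, where 122 of the 128 bits are random (the other 6 encode the version and variant), and a perfect random number generator; the function name `collision_probability` is just for illustration.

```python
import math

RANDOM_BITS = 122  # version 4: 128 bits minus 4 version bits and 2 variant bits


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Approximate chance of at least one duplicate among n random IDs (birthday bound)."""
    space = 2 ** bits
    # p ≈ 1 - exp(-n(n-1) / 2N); expm1 keeps precision when p is tiny
    return -math.expm1(-n * (n - 1) / (2 * space))


for n in (10**12, 10**15, 5 * 10**15, 2_710_000_000_000_000_000):
    print(f"{n:.2e} UUIDs -> collision probability {collision_probability(n):.2e}")
```

Under these assumptions, a quadrillion (10^15) UUIDs gives a collision probability of roughly one in ten million, and you need around 2.7 quintillion UUIDs before the odds reach about 50%.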

UUIDs are 'practically unique' rather than 'guaranteed unique.' The probability of a collision is so small that for most applications, it's more likely that your hardware will fail or that a cosmic ray will flip a bit in your machine's memory than it is that you'll experience a UUID collision.

However, it's worth noting that these probabilities assume proper random number generation. If your random number generator is flawed or predictable, the actual probability of collisions can be much higher. I'll explain a bit more later in the article.

If you work in software, you probably already know what a UUID looks like, but just in case: UUIDs are 128 bits wide and are usually written as 32 hexadecimal digits split into 5 groups of 8, 4, 4, 4 and 12 digits, separated by hyphens. They look something like this:

  • ccba8c00-cbed-11ef-ad79-1da827afd7cd
  • 74febad9-d652-4f6b-901a-0246562e13a8
  • 1efcbedf-13bf-61e0-8fb8-fe3899c4f6f1
  • 01943a0e-dd73-72fd-81ad-0af7ce19104b

But wait! These UUIDs were actually generated using different versions of UUID! In order, the UUIDs above were generated with version 1, version 4, version 6 and version 7. Try to figure out where the version is indicated in the UUID.

Hint: it's somewhere in the middle.

Hopefully you noticed that the version of the UUID is indicated by the first character of the third group, right in the middle of the UUID. There is also a variant, indicated by the first character of the fourth group. The version tells you how the UUID was generated, and the variant tells you its layout. For the UUIDs you will usually encounter, the variant character is 8, 9, a or b, so you probably won't need to worry about it; the version is what matters most.
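If you'd rather check your answer than squint, here is a small sketch that reads both characters straight from the example UUIDs above and cross-checks them with Python's standard `uuid` module. The helper name `version_and_variant_chars` is hypothetical.

```python
import uuid

EXAMPLES = [
    "ccba8c00-cbed-11ef-ad79-1da827afd7cd",
    "74febad9-d652-4f6b-901a-0246562e13a8",
    "1efcbedf-13bf-61e0-8fb8-fe3899c4f6f1",
    "01943a0e-dd73-72fd-81ad-0af7ce19104b",
]


def version_and_variant_chars(text: str) -> tuple[str, str]:
    """Return the first character of the third and fourth hyphen-separated groups."""
    groups = text.split("-")
    return groups[2][0], groups[3][0]


for text in EXAMPLES:
    version_char, variant_char = version_and_variant_chars(text)
    parsed = uuid.UUID(text)
    # parsed.version decodes the same 4 bits; parsed.variant names the layout
    print(text, "| version:", version_char, parsed.version,
          "| variant char:", variant_char, parsed.variant)
```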

UUID structure diagram

So, as we discussed, there are multiple versions of UUIDs. Aside from the version indicator that we discovered earlier, what are the differences between each version? Are they all equally able to generate unique UUIDs? And why would you use one version over another? Obviously, you should use the latest and greatest version of UUID, right? Very good question! Let's take a look at the different versions of UUID.

Version 1 and Version 6

Version 1 and 6 UUIDs are generated using the current time and the MAC address of the computer that generated them. The timestamp is located at the front of the UUID. It is followed by a clock sequence (a 14-bit value that is usually initialized randomly and changed if the clock goes backwards), and the MAC address is located at the end, so on the same computer that last part never changes. Depending on the resolution of the system clock, implementations may also fill the lowest timestamp bits with a counter. Interestingly, because the MAC address can be retrieved from the UUID, generating Version 1 or 6 UUIDs carries a privacy risk. But that's also one of the pros of these versions: as long as MAC addresses are unique, two computers cannot generate the same UUID. That makes these versions useful in distributed systems where global uniqueness is needed.
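To see how much a Version 1 UUID gives away, here is a sketch using Python's built-in `uuid.uuid1()`. It decodes the creation time, clock sequence and node from the UUID. Note that when no MAC address is available, Python substitutes a random node value, so "node" is usually, but not always, your MAC address. The helper `describe_uuid1` is hypothetical; the offset constant is the number of 100-nanosecond intervals between the UUID epoch (15 October 1582) and the Unix epoch.

```python
import uuid
from datetime import datetime, timezone

# 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch)
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000


def describe_uuid1(u: uuid.UUID) -> dict:
    """Pull the generation time, clock sequence and node out of a version 1 UUID."""
    unix_seconds = (u.time - GREGORIAN_TO_UNIX_100NS) / 10_000_000
    return {
        "created_at": datetime.fromtimestamp(unix_seconds, tz=timezone.utc),
        "clock_seq": u.clock_seq,
        "node": f"{u.node:012x}",  # usually the MAC address of the generating machine
    }


first = uuid.uuid1()
second = uuid.uuid1()
print(first, describe_uuid1(first))
print(second, describe_uuid1(second))  # same node as `first`
```

Anyone holding one of these UUIDs can run the same decoding, which is exactly the privacy and predictability concern described here.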

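The difference between versions 1 and 6 is the order in which the parts of the timestamp are laid out in the UUID. Version 1 starts with the lowest 32 bits of the timestamp, so its UUIDs don't sort chronologically. Version 6 stores the timestamp from its most significant bits to its least significant ones, so version 6 UUIDs can be sorted chronologically, which can be useful for ordering in databases.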
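Here is a sketch of that reordering. It builds two Version 1 UUIDs from explicit, made-up 60-bit timestamps one second apart, then re-lays them out as Version 6 following the field order in RFC 9562 (32 high timestamp bits, 16 middle bits, version, 12 low bits). The helpers `v1_from_timestamp` and `v1_to_v6` are hypothetical, and the timestamps, clock sequence and node are synthetic values chosen for illustration.

```python
import uuid


def v1_from_timestamp(ts: int, clock_seq: int = 0x1234, node: int = 0x0A1B2C3D4E5F) -> uuid.UUID:
    """Build a version 1 UUID from an explicit 60-bit timestamp (illustration only)."""
    return uuid.UUID(fields=(
        ts & 0xFFFFFFFF,                   # time_low: the lowest 32 bits come first
        (ts >> 32) & 0xFFFF,               # time_mid
        ((ts >> 48) & 0x0FFF) | 0x1000,    # version 1 + the highest 12 bits
        ((clock_seq >> 8) & 0x3F) | 0x80,  # variant bits + clock_seq high bits
        clock_seq & 0xFF,                  # clock_seq low bits
        node,
    ))


def v1_to_v6(u: uuid.UUID) -> uuid.UUID:
    """Re-lay out a version 1 UUID as version 6: timestamp from most to least significant bits."""
    ts = ((u.time_hi_version & 0x0FFF) << 48) | (u.time_mid << 32) | u.time_low
    upper = ((ts >> 12) << 16) | (0x6 << 12) | (ts & 0x0FFF)
    lower = u.int & ((1 << 64) - 1)  # variant, clock sequence and node are unchanged
    return uuid.UUID(int=(upper << 64) | lower)


earlier = v1_from_timestamp(0x1EFCBEDFFF00000)
later = v1_from_timestamp(0x1EFCBEDFFF00000 + 10_000_000)  # one second later

print(str(earlier) < str(later))                      # False: v1 order is wrong
print(str(v1_to_v6(earlier)) < str(v1_to_v6(later)))  # True: v6 sorts by time
```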

UUID version 1 and 6 structure diagram

Because versions 1 and 6 use predictable elements (the time of generation and the MAC address), their UUIDs can be guessed, which makes them unsuitable for uses that require UUIDs to remain secret.

Version 2

Version 2 is similar to Version 1 in that both use a timestamp and the MAC address of the computer generating the UUIDs. However, Version 2 also embeds a local identifier, namely a POSIX UID or GID, in place of the lowest 32 bits of the timestamp. As a result, it keeps far fewer timestamp bits than Versions 1 and 6, so only a limited number of Version 2 UUIDs can be generated in a given time window, which makes it undesirable for most uses. It is rarely used and not supported by most libraries. Its details are also not spelled out in the UUID specification itself; it is defined by the older DCE Security specification.

Version 3 and 5

Versions 3 and 5 are quite different from the other versions. While the other versions aim to produce a new value every time, Version 3 and Version 5 aim to be deterministic. What does that mean? Both hash a namespace UUID together with a name (any input string) to produce the UUID, which makes the result reproducible. No randomness or timestamp is involved, so a given namespace and name always produce the same UUID. Version 3 uses the MD5 hashing algorithm, while Version 5 uses SHA-1.
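To make the hashing concrete, here is a sketch that builds a Version 5 UUID by hand: SHA-1 over the namespace bytes followed by the UTF-8 name, truncated to 16 bytes, with the version and variant bits overwritten. It is checked against Python's own `uuid.uuid5()`. The helper `manual_uuid5` and the example URL are hypothetical.

```python
import hashlib
import uuid


def manual_uuid5(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Version 5 by hand: SHA-1(namespace bytes + UTF-8 name), truncated to 16 bytes."""
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()  # 20 bytes
    raw = bytearray(digest[:16])
    raw[6] = (raw[6] & 0x0F) | 0x50  # version 5 in the high nibble of byte 6
    raw[8] = (raw[8] & 0x3F) | 0x80  # variant bits "10" at the top of byte 8
    return uuid.UUID(bytes=bytes(raw))


name = "https://example.com/products/42"
print(manual_uuid5(uuid.NAMESPACE_URL, name))
print(manual_uuid5(uuid.NAMESPACE_URL, name) == uuid.uuid5(uuid.NAMESPACE_URL, name))  # True
```

Swapping `hashlib.sha1` for `hashlib.md5` and `0x50` for `0x30` gives the Version 3 equivalent.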

These versions are particularly useful when you need to generate the same UUID repeatedly from the same input data. For example, imagine you're creating UUIDs for users based on their email addresses - you'd want the same email to always generate the same UUID, even across different servers or times. Another good example would be when you need to generate a primary key based on some data to avoid duplicates, but using the data itself as the primary key is not a good option.
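The email example could look like the sketch below. The namespace `USER_NAMESPACE` and the helper `user_id_from_email` are hypothetical. Any fixed UUID works as a namespace, as long as every server uses the same one. Trimming and lowercasing the address first is an assumption about your data, so that trivially different spellings map to the same ID.

```python
import uuid

# Hypothetical namespace for this example; every server must use the same one.
USER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "users.example.com")


def user_id_from_email(email: str) -> uuid.UUID:
    """Derive a stable user ID from an email address (normalisation is an assumption)."""
    return uuid.uuid5(USER_NAMESPACE, email.strip().lower())


print(user_id_from_email("alice@example.com"))
print(user_id_from_email(" Alice@Example.com ") == user_id_from_email("alice@example.com"))  # True
```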

When choosing between Version 3 and Version 5, keep in mind that SHA-1 is slightly more compute intensive than MD5. If that is a real concern for your use case, Version 3 will use a little less compute, but most of the time you should pick Version 5: the current specification (RFC 9562) recommends it over Version 3 where possible, because MD5 is the weaker of the two hashes. Accidental collisions are extremely unlikely with either, although MD5 collisions can be deliberately crafted much more easily than SHA-1 ones.

Version 4

Version 4 is the most widely used version of UUID. Apart from the 6 bits used for the version and variant, all 122 remaining bits are random, which makes these UUIDs unique and unpredictable. This means Version 4 relies heavily on random number generation, but not all random number generators are actually capable of producing truly unpredictable numbers. Shocking, I know.

Many programming languages use what's called a Pseudo-Random Number Generator (PRNG), which is fine most of the time, but for UUID generation you'll want to ensure your system uses a Cryptographically Secure PRNG (CSPRNG).

Why? A regular PRNG might be predictable if someone analyzes enough of its output. CSPRNGs, on the other hand, are specifically designed to make predicting their output practically impossible, even if an attacker knows all previously generated values. Most modern UUID libraries use a CSPRNG by default, but it's worth checking just to be sure.
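For reference, here is a sketch of a Version 4 generator built on the operating system's CSPRNG (`os.urandom`), which is also where CPython's own `uuid.uuid4()` gets its bytes. The thing to avoid would be filling those 16 bytes from a general-purpose PRNG such as Python's `random` module, which its documentation says should not be used for security purposes. The helper name `uuid4_from_csprng` is hypothetical.

```python
import os
import uuid


def uuid4_from_csprng() -> uuid.UUID:
    """Version 4 UUID from 16 bytes of OS-provided cryptographically secure randomness."""
    raw = bytearray(os.urandom(16))  # 128 bits from the OS CSPRNG
    raw[6] = (raw[6] & 0x0F) | 0x40  # version 4 in the high nibble of byte 6
    raw[8] = (raw[8] & 0x3F) | 0x80  # variant bits "10" at the top of byte 8
    return uuid.UUID(bytes=bytes(raw))


print(uuid4_from_csprng())
```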

Apart from the version indicator (and the variant bits), nothing in a Version 4 UUID is predictable, so guessing the version is about the only way to impress your friends.

Version 4 UUIDs are great for most uses, especially when you need to generate a large number of UUIDs and don't need to sort them or reproduce them later. They are often used as keys in databases.

Version 7

Version 7 is designed to be a chronologically sortable alternative to Version 4. It starts with a 48-bit timestamp and fills most of the rest with random bits, so the UUIDs sort by creation time (to millisecond precision, unless the implementation adds a counter) while staying unique. They can be a great alternative to Version 4 when you want uniqueness but also want to be able to sort by creation time.

Version 7 also uses Unix Epoch time (milliseconds since 1 January 1970) for its timestamp, while Versions 1 and 6 count 100-nanosecond intervals since 15 October 1582. This makes Version 7 a little easier to work with.
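Here is a minimal Version 7 sketch following the RFC 9562 layout: a 48-bit Unix timestamp in milliseconds, the version, 12 random bits, the variant, then 62 more random bits. Recent Python versions (3.14+) ship a built-in `uuid.uuid7()`, but this hand-rolled version, with the hypothetical helpers `uuid7_sketch` and `created_at`, runs on older ones too. It does not keep UUIDs generated within the same millisecond in order; RFC 9562 describes optional counters for that.

```python
import os
import time
import uuid
from datetime import datetime, timezone
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Minimal UUIDv7: 48-bit Unix ms timestamp, version, variant, 74 random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits; 74 are used
    value = (unix_ms & 0xFFFF_FFFF_FFFF) << 80    # unix_ts_ms: top 48 bits
    value |= 0x7 << 76                            # version 7
    value |= ((rand >> 62) & 0xFFF) << 64         # rand_a: 12 bits
    value |= 0b10 << 62                           # variant bits "10"
    value |= rand & ((1 << 62) - 1)               # rand_b: 62 bits
    return uuid.UUID(int=value)


def created_at(u: uuid.UUID) -> datetime:
    """Read the Unix millisecond timestamp back out of a version 7 UUID."""
    return datetime.fromtimestamp((u.int >> 80) / 1000, tz=timezone.utc)


u = uuid7_sketch()
print(u, created_at(u))
```

Because the timestamp sits in the most significant bits, comparing two of these UUIDs as strings or integers compares their creation times first.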

Version 8

Version 8 is a bit special because it is custom: vendors can implement it however they wish. You can implement it yourself too; you just need to set the version (8) in the first character of the third group and the variant bits in the fourth group. The remaining 122 bits are up to you. You will probably never need to use it.
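As an illustration only, here is a sketch that stamps the version and variant onto arbitrary custom bits. The layout (a 48-bit tenant ID in the top bits and a sequence number in the bottom bits) and the helper `make_uuid8` are hypothetical, not any standard scheme.

```python
import uuid


def make_uuid8(custom_bits: int) -> uuid.UUID:
    """Stamp version 8 and the RFC variant onto arbitrary 128-bit custom data."""
    value = custom_bits & ((1 << 128) - 1)
    value = (value & ~(0xF << 76)) | (0x8 << 76)  # version nibble = 8
    value = (value & ~(0x3 << 62)) | (0x2 << 62)  # variant bits "10"
    return uuid.UUID(int=value)


# Hypothetical layout: 48-bit tenant ID on top, 62-bit sequence number at the bottom.
tenant_id, sequence = 0xBEEF, 42
u = make_uuid8((tenant_id << 80) | sequence)
print(u)  # 00000000-beef-8000-8000-00000000002a
```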

So, what should you use?

For most people, it will be version 4. Its 122 random bits give it strong uniqueness guarantees, and it is relatively secure (as long as the random number generator is not predictable).
If you want to be able to sort your UUIDs by creation time, you can reach for version 7, or even version 6 as long as you have no privacy concerns about leaking your MAC address.
In some cases versions 3 and 5 are useful, but for most applications their use is limited.

Database keys?

Maybe you've seen discussions about using UUIDs as database keys. If you are thinking of doing so, there are a few facts you should keep in mind:

  • UUIDs are large: they take up 128 bits (16 bytes, or 36 characters if stored as text). If you do not plan to store huge numbers of rows, that extra space may be wasted. A signed 32-bit auto-incremented integer gives you up to 2,147,483,647 rows, and if that's not enough, a signed 64-bit BIGINT goes up to 9,223,372,036,854,775,807 (18,446,744,073,709,551,615 if your database supports unsigned integers). That should be enough for most use cases.
  • For some databases, using UUIDs as keys can hurt insert performance, because random values such as version 4 UUIDs land at scattered positions in the index instead of being appended at the end. If insert performance is a concern, consider an auto-incremented integer or a time-ordered version such as version 7, or at least test the performance of your database with UUIDs.
  • UUIDs make it easier to merge or migrate data: two tables using auto-incrementing integers will almost certainly have overlapping IDs, while independently generated UUIDs practically never collide.
  • Even when UUIDs are sortable, they are not easy to read. Looking at two UUIDs, it's quite hard to tell which one came first. That's quite minor, but it's something to keep in mind.
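To put the size point in numbers, here is a quick back-of-the-envelope sketch. `ROWS` is a hypothetical table size, and only the raw key bytes are counted, so any index or per-row overhead your database adds is ignored.

```python
import uuid

ROWS = 100_000_000  # hypothetical table size

# Raw key size in bytes; index and row overhead are ignored.
key_sizes_bytes = {
    "INT (32-bit)": 4,
    "BIGINT (64-bit)": 8,
    "UUID (binary)": len(uuid.uuid4().bytes),  # 16 bytes
    "UUID (text)": len(str(uuid.uuid4())),     # 36 characters
}

for name, size in key_sizes_bytes.items():
    print(f"{name:<16} {size:>2} bytes/key, ~{size * ROWS / 1e9:.1f} GB for {ROWS:,} keys")

print(f"max signed 32-bit:   {2**31 - 1:,}")
print(f"max signed 64-bit:   {2**63 - 1:,}")
print(f"max unsigned 64-bit: {2**64 - 1:,}")
```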

Most databases have some kind of module or function to generate UUIDs, so check your database's documentation to see how to generate them. It will usually also mention any performance issues or special considerations to take into account when using UUIDs.

Conclusion

Hopefully you now understand UUIDs and their different versions a bit better than before reading this article.

Version 4 UUIDs remain the go-to choice for most applications. They have strong uniqueness guarantees and unpredictability, which is probably what you want from UUIDs. They're mostly used for database keys, distributed systems, and any scenario where you need globally unique identifiers without coordination.

Version 7 is a good alternative when chronological sorting is desirable, as it offers a good balance between randomness and sortability.

Versions 1 and 6 are useful in distributed systems where global uniqueness is needed, but they come with privacy concerns due to the inclusion of MAC addresses.

Versions 3 and 5 are useful when you need to reproduce the UUID from a given input, but keep in mind that MD5 is not as secure as SHA-1.

If you plan to use UUIDs in your systems, keep these factors in mind when choosing a version:

  • Your uniqueness requirements
  • Whether chronological sorting is needed
  • Privacy concerns (especially if using versions that include MAC addresses)
  • Storage space constraints (maybe you don't need 128 bits for your keys)

While UUID collisions are theoretically possible, they're so improbable that they shouldn't be a primary concern in your system design - as long as you're using a proper implementation with a cryptographically secure random number generator.
If you do encounter what looks like a UUID collision, it's far more likely to be caused by an application logic issue, like duplicate event processing, than by an actual UUID generation collision (and if it really is one, congratulations on defying astronomical odds!). In such cases, focus on investigating your application's handling of unique constraints rather than questioning the UUID generation itself.
