Optimized Performance S3 Naming


Today, I failed the AWS Developer Associate certification exam. I wasn’t sad, but I was annoyed with myself: many of the questions on my test covered basic knowledge.

I got 2 questions about best practices for using S3 to store a large number of files, specifically the best practice for key names.

When: You expect a rapid increase in the request rate for a bucket to more than 300 PUT/LIST/DELETE requests per second or more than 800 GET requests per second.

But in my opinion, you should apply this rule as often as possible; it keeps you thinking about the best approach.

Key excerpt: AWS describes: “Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored in UTF-8 binary ordering across multiple partitions in the index. The key name dictates which partition the key is stored in. Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition. If you introduce some randomness in your key name prefixes, the key names, and therefore the I/O load, will be distributed across more than one partition.”

Principle: The only thing you need to care about is avoiding sequential key names.

I’m a developer, so I like to focus on the easiest approach for developers. Here is a clear example.

BAD Naming

examplebucket/2013-26-05-15-00-00/cust8474937/photo1.jpg
examplebucket/2013-26-05-15-00-00/cust1248473/photo2.jpg
...
examplebucket/2013-26-05-15-00-01/cust1248473/photo1.jpg
examplebucket/2013-26-05-15-00-01/cust1248473/photo2.jpg

GOOD Naming

Case 1: Add a Hex Hash Prefix to the Key Name: using a hexadecimal hash as the prefix is strongly recommended.

Pattern: s3://BUCKET_NAME/[GROUP_NAME_OPTIONAL]/hexadecimal[-_]FILE_NAME.extension

Sample:

 examplebucket/7b54-2013-26-05-15-00-00/animation1.obj
 examplebucket/921c-2013-26-05-15-00-00/cust125/animation2.obj
 examplebucket/animations/7b54-2013-26-05-15-00-00/cust385/animation1.obj
 examplebucket/animations/921c-2013-26-05-15-00-00/cust124/animation2.obj
 examplebucket/videos/ba65-2013-26-05-15-00-00/video1.mpg
 examplebucket/videos/8761-2013-26-05-15-00-00/video2.mpg
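To make Case 1 concrete, here is a minimal Python sketch. It assumes the prefix is taken from the first 4 hex characters of an MD5 hash of the original key; the function name `add_hash_prefix` and the `length` parameter are my own illustration, not an AWS API, and any stable hash would do.

```python
import hashlib


def add_hash_prefix(key: str, length: int = 4) -> str:
    """Prepend a short hex hash of the key so that keys no longer sort sequentially.

    Hypothetical helper: MD5 and a 4-character prefix are assumptions.
    """
    # Hash the full original key; the same key always yields the same prefix.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Keep only the first `length` hex characters and join them with "-".
    return f"{digest[:length]}-{key}"


if __name__ == "__main__":
    for k in (
        "2013-26-05-15-00-00/cust1248473/photo1.jpg",
        "2013-26-05-15-00-01/cust1248473/photo1.jpg",
    ):
        print(add_hash_prefix(k))
```

Because the prefix is computed from the key itself, you can rebuild the full key later from the original name. The trade-off is that listing objects in timestamp order is no longer a simple prefix listing.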

Case 2: Reverse the Key Name String: if your keys start with an incrementing sequence ID, you can reverse it to get well-distributed key names while still keeping your ID.

Pattern: s3://BUCKET_NAME/[REVERSE_ID_STRING]/[GROUP_NAME_OPTIONAL]/FILE_NAME.extension

Sample:

Normal:
 examplebucket/2134857/data/start.png
 examplebucket/2134857/data/resource.rsrc
 examplebucket/2134858/data/start.png
 examplebucket/2134858/data/resource.rsrc

Optimized:
 examplebucket/7584312/data/start.png
 examplebucket/7584312/data/resource.rsrc
 examplebucket/8584312/data/start.png
 examplebucket/8584312/data/resource.rsrc
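The reversal in Case 2 is a one-liner in most languages. Here is a small Python sketch, assuming the ID is a plain integer; the helper names `reverse_id_key` and `restore_id` are hypothetical and only for illustration.

```python
def reverse_id_key(seq_id: int, rest: str) -> str:
    """Build a key whose first segment is the sequence ID with its digits reversed."""
    # Reversing puts the fastest-changing digit first, so consecutive IDs
    # start with different characters.
    return f"{str(seq_id)[::-1]}/{rest}"


def restore_id(reversed_segment: str) -> int:
    """Recover the original ID by reversing the first key segment again."""
    return int(reversed_segment[::-1])


if __name__ == "__main__":
    print(reverse_id_key(2134857, "data/start.png"))  # 7584312/data/start.png
    print(reverse_id_key(2134858, "data/start.png"))  # 8584312/data/start.png
    print(restore_id("7584312"))                      # 2134857
```

Keep the reversed segment as a string: an ID ending in zero reverses to a string with a leading zero, which would be lost if you stored it as a number.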

Hope this brings you closer to AWS.

 

Reference:

S3 request-rate-perf-considerations

 

 
