A to Z with Amazon S3

In this article, I'm going to cover securing, storing, and serving assets with S3 and CloudFront.

I'm also going to take a deeper dive into AWS by serving our assets through a CNAME subdomain (e.g. cdn.jackrothrock.com), using signed URLs for downloads, and reducing hot-linking using WAF.

For those who like watching videos, here's a video I've created showing how to set this up. It's definitely viewed best at 2X speed.

For those of you who like text and pictures, follow along below. There's probably more information in the video, but the text closely matches it.

GitHub Repository with files: https://github.com/jrothrock/a-to-z-with-amazon-s3


What is S3?


S3 stands for Simple Storage Service, which is really just an object storage service. Object storage is used by many large providers, including Microsoft Azure's Blob Storage - with "blob" meaning Binary Large Object.

So that's cool, but what are the other ways of storing data? Why this particular method?

Another method is file system storage, which is usually the default when starting new web apps, and is what operating systems use when storing files.

Then there is block storage, which is usually found in SANs, but can also be used for special applications like databases and virtual machines. Examples include Google Cloud's and DigitalOcean's block storage products.


Creating our S3 Bucket.



There are a few things we need to understand before we can dive deeper into setting up S3. First, S3 is organized into buckets. One can have many buckets, and one won't be charged for them - you get charged for the amount of storage space you take up (plus requests and data transfer).

You can also name the buckets anything, but the names have to be globally unique. What I mean by this is that we can't just have a bucket named "test" - as that name has most likely been taken already. A great way to choose a unique name is just to use our domain name. However, we can't include the .(com/org/net/io) in the bucket name, as putting a period (.) in the name will cause our bucket to not be covered by Amazon's wildcard SSL certificate.

If you're looking to serve assets from a subdomain - say cdn.mywebsite.com - we can set that up using a CNAME to our CloudFront distribution, but for now let's just stick to creating a simple bucket name.


Setting up our bucket



Once we are in the Amazon Web Service portal, in the S3 section we are going to click on "Create Bucket".

Once the screen pops up, we have to choose a bucket name - try NOT to include periods, as this will cause the bucket's subdomain to NOT be covered by Amazon's wildcard SSL certificate - as well as a region for the bucket.

I'm going to choose jackrothrock - for this blog - as well as the location of US West as that is the closest to where I reside.

Setting Our Bucket Name

For now we aren't going to mess with any other settings. So, we are just going to click through the "Set Properties" page, the "Set Permissions" page, and the review page.

Once we have clicked "Create Bucket" on the review page, we should now see our newly created bucket in our list of buckets.

Our Newly Created Bucket

*While not necessary, I'm going to follow the above steps again and create another bucket called "jackrothrock-logging", which will be used for logging.


Setting up our bucket policy.



If we click on our bucket, we should see a modal popup which shows three sections - properties, permissions, and management. To secure our bucket, we will need to click on the permissions section, which will bring us to the access control area of our bucket.

While this area is important, we are going to leave it alone for now.

If we tab over between "Access Control List", "Bucket Policy", and "CORS Configuration", we will see that our bucket policy is empty, while our CORS configuration already has some default settings.

The default settings for the CORS section basically say that a page served from any origin can make a GET request to the content in our bucket, and that browsers should cache that permission for 3000 seconds.
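At the time of writing, that default CORS configuration should look something like this:

<CORSConfiguration>
    <CORSRule>
        <AllowedOrigin>*</AllowedOrigin>
        <AllowedMethod>GET</AllowedMethod>
        <MaxAgeSeconds>3000</MaxAgeSeconds>
        <AllowedHeader>Authorization</AllowedHeader>
    </CORSRule>
</CORSConfiguration>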

Heading back over to the bucket policy, we again note that it is empty. This basically means nothing can be done with this bucket from outside the dashboard. What I mean by this is, if we upload an image through the dashboard, we won't be able to view it publicly, as we haven't set up a bucket policy to allow it.

If we scroll down to the bottom, we will see "Policy generator" next to documentation. Let's select that.

Once on the policy page, we need to change the "Select Type of Policy" to be "S3 Bucket Policy". Beneath that, we will see that AWS service has changed to be Amazon S3.

In the next section, Principal, we are going to place a * - which basically just means wildcard, or in this case, anyone.

Since we only want to give everyone access to performing a GET request on our S3 bucket's objects, we're going to select "GetObject".

Beneath that, we need to place arn:aws:s3:::<bucket_name>/<key_name> - or in my case arn:aws:s3:::jackrothrock/*.

The * basically just means: apply this bucket policy to every object within the jackrothrock bucket.

###

If we were doing logging in the same bucket, we could restrict public access to just our content folder by setting the Amazon Resource Name to something like:
arn:aws:s3:::jackrothrock/content/*

###

We should end up with a form that looks like this:

The Completed Policy Generator Form

Let's then click "Add Statement" - and then "Generate Policy". Let's copy the generated policy and drop it back into our bucket policy and save it.
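If you're curious, the generated policy should look something like this (your Id and Sid values will differ):

{
  "Id": "Policy1505089947843",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1505089275790",
      "Action": "s3:GetObject",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::jackrothrock/*",
      "Principal": "*"
    }
  ]
}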

Sweet. So now we, and everyone else, can perform a GetObject request on our objects.

We can see this if we upload a photo to our bucket, then select the image and go to the link that it generates.

So this is great, but what if we want to add assets to our bucket from outside the S3 dashboard, like adding photos to our bucket that users upload from our website? Well, we could change the "Action": "s3:GetObject" to "Action": "s3:*".

The only issue with this is that anyone could perform any action on our bucket's folders and objects - such as a PutObject or DeleteObject. That's not really secure.

What we really want is for a single "user" to have that access; we then use that "user" to sign off on the requests that visitors want to perform on the bucket - other than a GetObject request, of course.

This is where IAM comes into play.


Setting up IAM.



In order for us to designate which users we want to have access to certain S3 privileges, we need to dive deeper into Amazon's IAM service.

Amazon's Identity and Access Management (IAM) "enables you to securely control access to AWS services and resources for your users." That's the crux of it really.

When we head into the IAM console we should see a Dashboard that shows how many users, groups, and roles we have.


Creating an IAM Group


On the left side let's go to groups and create a new group.

I'm personally just going to set the name to be jackrothrock, and on the attach-policy page, I'm just going to search for S3, add the "AmazonS3FullAccess" policy, and then create the group.

Cool. We now have a group, but we need to add a user to the group.


Creating an IAM user



On the left side, let's select the users link and add a user.

After having selected the "Add User" button, let's fill out a username - in my case I'll use "jack" - and select programmatic access, as this will generate an access key ID and secret which can be used with the API, CLI, SDKs, and other third-party software such as saws.

On the next page, let's add our user to the group we just created, and continue through to the review page, and then create that user.

  • By doing this, the user will inherit all of the policies attached to that group.

Once we've created the user we will be met with a page that says we have succeeded, and displays our Access Key ID as well as our secret.

Once we've either downloaded the credentials.csv, or copied the keys, we can close out of the page.
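If you plan on using these keys with the AWS CLI or an SDK, a common place to keep them is the ~/.aws/credentials file (the values below are placeholders):

[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY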


Securing our bucket.



Heading back to our bucket, we should double-check that our bucket policy currently only allows people to get objects - rather than perform full S3 actions.

Now, let's create a policy where we will harness the power of an IAM user to perform S3 actions from outside the S3 dashboard.

Open up a new tab, and go back to IAM. Once at the dashboard, go to the users section and select the user we created.

At the top we should see an ARN for that user. We need to copy that, as we're going to set it as the Principal in our bucket policy.

User ARN

Back in our bucket policy, we now need to add a section giving our IAM user all S3 permissions. This means permissions not only on the bucket's objects - "Resource": "arn:aws:s3:::jackrothrock/*" - but on the bucket itself: "Resource": "arn:aws:s3:::jackrothrock".

Once we've added that section, our policy should now look something like this:

{
  "Id": "Policy1505089947843",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AddPerm",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::530120981511:user/jackrothrock"
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::jackrothrock/*",
        "arn:aws:s3:::jackrothrock"
      ]
    },
    {
      "Sid": "Stmt1505089275790",
      "Action": "s3:GetObject",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::jackrothrock/*",
      "Principal": "*"
    }
  ]
}

What this is effectively saying is that anyone can get objects from our bucket, but the IAM user jackrothrock can perform any action to the bucket and the folders/files within.
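With a policy like this in place, server-side code holding the IAM user's keys can "sign off" on individual requests. In practice you'd use the aws-sdk-s3 gem's Aws::S3::Presigner for this, but as a sketch of what that signing involves, here's a Signature Version 4 presigned PUT URL built with only Ruby's standard library (the bucket, object key, region, and credentials are placeholders):

```ruby
require "digest"
require "openssl"
require "uri"

# Build a presigned PUT URL: anyone holding this URL can upload the given
# object until it expires, with the IAM user's keys vouching for the request.
def presigned_put_url(bucket:, object_key:, access_key:, secret_key:,
                      region: "us-west-1", expires_in: 900, now: Time.now.utc)
  host      = "#{bucket}.s3.amazonaws.com"
  amz_date  = now.strftime("%Y%m%dT%H%M%SZ")
  datestamp = now.strftime("%Y%m%d")
  scope     = "#{datestamp}/#{region}/s3/aws4_request"

  enc = ->(s) { URI.encode_www_form_component(s).gsub("+", "%20") }
  params = {
    "X-Amz-Algorithm"     => "AWS4-HMAC-SHA256",
    "X-Amz-Credential"    => "#{access_key}/#{scope}",
    "X-Amz-Date"          => amz_date,
    "X-Amz-Expires"       => expires_in.to_s,
    "X-Amz-SignedHeaders" => "host"
  }
  query = params.sort.map { |k, v| "#{k}=#{enc.(v)}" }.join("&")

  # Canonical request: method, path, query, headers, signed headers, payload.
  canonical_request = [
    "PUT", "/#{object_key}", query,
    "host:#{host}\n", "host", "UNSIGNED-PAYLOAD"
  ].join("\n")

  string_to_sign = [
    "AWS4-HMAC-SHA256", amz_date, scope,
    Digest::SHA256.hexdigest(canonical_request)
  ].join("\n")

  # Derive the signing key by chaining HMACs over date, region, and service.
  hmac        = ->(key, data) { OpenSSL::HMAC.digest("sha256", key, data) }
  signing_key = ["AWS4#{secret_key}", datestamp, region, "s3", "aws4_request"]
                  .reduce { |key, part| hmac.(key, part) }
  signature   = OpenSSL::HMAC.hexdigest("sha256", signing_key, string_to_sign)

  "https://#{host}/#{object_key}?#{query}&X-Amz-Signature=#{signature}"
end
```

Our server hands that URL to the browser, and the browser can PUT the file to it directly - without our secret key ever leaving the server.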


Setting up CloudFront



Before we go any further, I have to say that in order to use a CDN with S3 - with SSL - you have to use CloudFront (see comments). With that being said, let's see how we can set up CloudFront to A) serve assets from a subdomain for our website - cdn.jackrothrock.com - and B) block all S3 GET requests that don't go through CloudFront.

Once we're in the CloudFront console, we will create a new distribution. On the first page let's go ahead and select the web option.

Once on the web option's page, we will see a whole host of options for our distribution.

In the origin domain name, once we have clicked on it, a dropdown will appear showing all of our S3 buckets - for my case I'm going to select "jackrothrock.s3.amazonaws.com".

After that, toggle "Restrict Bucket Access" to yes. This will force all S3 requests to go through CloudFront, which is nice as it A) allows us to create a CNAME for our CloudFront endpoint, B) is cheaper, and C) lets us use signed URLs for downloads and such.

Once you've toggled "Restrict Bucket Access", we need to set "Origin Access Identity" to "Create a New Identity", and then set "Grant Read Permissions on Bucket" to "Yes, Update Bucket Policy". Once we're done setting up CloudFront, we will go back to our bucket to block all requests that aren't sent through CloudFront.

Leave the rest of the first section's fields as is, unless you want/need to include custom headers.

CloudFront Settings

In the Default Cache Behavior Settings area, we should redirect HTTP to HTTPS, allow "GET, HEAD, OPTIONS", and leave the rest of the "Default Cache Behavior Settings" as is.

If you want to ONLY use signed links, select "Yes" for "Restrict Viewer Access (Use Signed URLs or Signed Cookies)". If you only want to have some of your links be signed and others not - like you want to serve photos but still have downloads - you can select "NO", and later on I will show you how to do signed links using the Ruby AWS-SDK.
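As a preview of what those signed links involve: the aws-sdk gem ships Aws::CloudFront::UrlSigner, which does this for you, but here's a sketch of "canned policy" signing using only Ruby's standard library. The key-pair ID is a placeholder, and in production the private key would be the CloudFront key pair you generate under your account's Security Credentials:

```ruby
require "openssl"
require "base64"
require "json"

# Sign a CloudFront URL with the "canned" policy: the URL is valid until the
# given expiry, and CloudFront verifies the RSA signature against the public
# half of your CloudFront key pair.
def cloudfront_signed_url(url, key_pair_id:, private_key:, expires_at:)
  policy = JSON.generate(
    "Statement" => [{
      "Resource"  => url,
      "Condition" => { "DateLessThan" => { "AWS:EpochTime" => expires_at.to_i } }
    }]
  )
  signature = private_key.sign(OpenSSL::Digest.new("SHA1"), policy)
  # CloudFront expects URL-safe base64: '+' -> '-', '=' -> '_', '/' -> '~'
  encoded = Base64.strict_encode64(signature).tr("+=/", "-_~")
  sep = url.include?("?") ? "&" : "?"
  "#{url}#{sep}Expires=#{expires_at.to_i}&Signature=#{encoded}&Key-Pair-Id=#{key_pair_id}"
end
```

For example, cloudfront_signed_url("https://cdn.jackrothrock.com/photo.jpg", key_pair_id: "APKAEXAMPLE", private_key: key, expires_at: Time.now + 3600) yields a link that stops working after an hour.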

In the Distribution Settings, choose your pricing area. The fact of the matter is, when looking at CloudFront pricing and the pricing regions, Australia and India are nice additions to have, but by including them you also include South America - the most expensive region by far. Decisions. Decisions.

For my case, I'm going to select All and just hope I don't get peppered with too many South American requests. To anyone residing in South America reading this, please use a VPN. That way we can line our pockets.

Leave AWS WAF (Web Application Firewall) set to none for now - we will come back to this a little later.

In CNAMEs, let's add the endpoint subdomain that we're going to use for our site - in my case it's going to be cdn.jackrothrock.com.

In the SSL certificate section, we're going to select the button that says, "Request or Import a Certificate with ACM."

######

Now before we move any further, there are a few things we need to do. First off, we need an A record for our endpoint, cdn.jackrothrock.com. It can point to anything, but it needs to exist so that Amazon can then check who registered the domain.

If you use a mail service and have MX records, create an email alias for admin@yourwebsite.com, as this is one of the addresses Amazon will email in order to verify that you're in fact the domain owner.

######
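As an example, the temporary record at your DNS provider can be as simple as the following - 203.0.113.10 is a placeholder from the documentation IP range, since the address genuinely doesn't matter:

cdn.jackrothrock.com.    300    IN    A    203.0.113.10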

Once in the certificate manager console, let's add our subdomain endpoint - in my case cdn.jackrothrock.com. Then click "Review and request", then "Confirm and request".

In a few seconds, you should receive an email from Amazon asking you to confirm the request. If you haven't received an email, check the box on the left side of the certificate, then click the "Actions" button, and hit "Resend validation email".

Once you've opened up the email, selected the link, and the page has loaded, click the "I Approve" button. If you reload the certificate manager page, you'll see that the status has now changed to Issued.

Exiting out of the tab and going back to the tab where we are creating our CloudFront distribution, we can now see that the "Custom SSL Certificate (example.com)" is enabled. So let's select that and double check to make sure that the drop down is selected for the certificate that we just created.

Selecting Our Created SSL Certificate

Now, by doing this we are only going to be able to support Web Browsers which have support for Server Name Indication. Lucky for us, caniuse states that SNI is supported in 97% of currently used browsers - including IE9! Woo!

SNI Supported Browsers

If we are using the same bucket for both content and logging, we should set the "Origin Path" to the specific folder that we want CloudFront to distribute our assets from (the "Default Root Object" field only controls what is served for the root URL). If we used a separate bucket, we can leave these fields as is.

Let's turn logging on, and set it to the bucket that we want to house our logging. I'm going to specify a Log Prefix of CloudFront, which will place all logs in a folder of CloudFront. For now, I'm going to leave cookie logging to off.

Once I've done that, I will create the distribution.

If you get redirected to a "How-to Guide", on the left side at the top, click the distributions link.

When looking at our newly created distribution, you will see that its current status is "In Progress". It'll take around 5-15 minutes for this to resolve. If you haven't completed your certificate validation, the status will remain In Progress.
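While we wait, we can point our subdomain at the distribution. At your DNS provider, replace the temporary A record with a CNAME to the distribution's domain name, shown in the distributions list (d1234abcd8efgh.cloudfront.net below is a placeholder - use your own distribution's domain):

cdn.jackrothrock.com.    300    IN    CNAME    d1234abcd8efgh.cloudfront.net.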

Heading back over to our bucket, we will find that CloudFront has now added a rule so that assets can be accessed through the CloudFront origin access identity.

Now that this is done, let's go ahead and delete the statement that allows a getObject request from anywhere outside CloudFront.

Once that's deleted, our policy should look like this:

{
  "Id": "Policy1505089947843",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "1",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E1PVWZ64R6WX7U"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::jackrothrock-test/*"
    },
    {
      "Sid": "AddPerm",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::589174822833:user/jack"
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::jackrothrock-test/*",
        "arn:aws:s3:::jackrothrock-test"
      ]
    }
  ]
}

So now our bucket allows people to upload assets to it through our IAM user, when we sign off on it of course, and we are distributing the assets through CloudFront.

We've almost made it full circle. The one thing we probably still want to do is restrict hot-linking. Now, there may be a better way of doing this, and if you do know a better way, please write a comment in the section below.

Remember when we passed up on the "AWS WAF Web ACL" and left it as none? We're now going to head over into the Amazon Web Application Firewall and set up some rules for some of our headers.

So once we have selected "WAF & Shield" we may be greeted with a tutorial page - if so, select the "Create Web ACL" button.

###

If you're greeted with the console instead of the tutorial page, click the "Create Web ACL" button in the top left. Either way, make sure the filter is set to Global (CloudFront), or the Web ACL won't be created for the right scope - in our case CloudFront.

###

For the "Web ACL name" I'm going to fill in my website name, and just append rules to it. The "CloudWatch metric name" should look just like the "Web ACL name" but with all special characters and spaces removed.
We then need to set the Region to "Global (CloudFront)" and then select our CloudFront distribution.

Continuing, I'm going to scroll down and create a string match condition. For the name, I'm just going to use something generic such as "Match Domain".

For the "Part of the request to filter on" I'm going to select Header. From the "Header" field I'm going to select Referer. For the "Match type" field I'm going to set that as "Starts With". For Transformation I'll set that to None, I'll leave "Value is base64 encoded" unchecked.

Last but not least, I'll set the "Value to match" as the referrer - in my case, "https://jackrothrock.com/". Once those are all filled in, I'm going to click add filter, and it should then look something like this:

What String Match Condition Should Look Like

If it looks good, let's go ahead and create it. Once it's created, we should see a little green box next to "String match conditions" area.

Now that it's created, let's continue on.

So now we have to add the condition we just created to our Web ACL. So let's select "Create Rule" and fill in a name. Again, I'm just going to do something generic - "Web Domain Rule". I'm going to keep the "Rule type" as "Regular rule" and continue on.

In the "Add Conditions" section I'm going to keep the first select as is (with), I'm going to change the second select to "match at least one of the filters in the string match condition", and I'm going to select the condition we just created. I don't have any other conditions, so I'm then going to click create.

I should now see a "Rule Created Successfully" box. In the rule we just created we need to set the Action to Allow, and for the "Default action", I'm going to select "Allow all requests that don't match any rules".

What the Web ACL Rule Should Look Like

Once that's done, I'm now going to "Review and create", then "Confirm and create".

Now if we try to visit an image on our CloudFront distribution directly, we won't be able to view it, as the request won't carry the Referer header we matched on. Yet if we place that same image on our site, we'll be able to view it.

There you have it. We have created an S3 bucket that is not only secure, but serves our content globally through CloudFront according to the rules we set.


Discussions

Reddit: https://www.reddit.com/r/webdev/comments/70wipy/a_to_z_with_amazon_s3