De-mystifying AWS S3 Usage

AWS S3 is a fantastic resource for cloud object storage! The only complaint I often hear is about the lack of transparency around current usage. Although the monthly costs are visible, it's sometimes pretty hard to see exactly where you're using space most heavily.

Problem

Because S3 is an object storage engine, your files are not stored hierarchically or registered centrally as they would be in a traditional file system; each file stored on S3 is an individual object. Although the S3 dashboard presents them in a pseudo-hierarchical view, that view is simply a rendering of each object's key (path) metadata. This means there is no built-in traversal of files and directories, and no easily accessible way to see the total space used by a given bucket or bucket folder.
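To make this concrete, here's a minimal sketch of how a flat list of object keys gets rendered as the pseudo-hierarchy you see in the console. The keys below are made up for illustration, not from a real bucket:

```shell
# Flat object keys, exactly as S3 stores them (sample data, not a real bucket)
keys='dir1/images/load_balancer_dns.png
dir1/images/load_balancer_icon.png
dir2/readme.txt
logo.png'

# Derive the console's "folder" view: group keys on everything before the first "/"
view=$(printf '%s\n' "$keys" | awk -F'/' 'NF > 1 { print "PRE " $1 "/" } NF == 1 { print $0 }' | sort -u)
echo "$view"
```

This prints `PRE dir1/`, `PRE dir2/` and `logo.png` - the "folders" exist only because something grouped the keys by prefix after the fact.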

Solution

AWS CLI to the rescue! Using the modern CLI, we can query our buckets for more information.

NOTE: If this is your first time using the CLI, you need to follow the setup instructions, which cover how to install and authorize it.
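For reference, the usual setup flow looks roughly like this (pip is one common install route; check the official instructions for your platform):

```shell
# Install the AWS CLI (pip is one option; platform-specific installers also exist)
pip install awscli

# Interactively store your access key ID, secret access key, default region and output format
aws configure
```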

$ aws s3 ls s3://assets.cloud66.com
PRE dir1/  
PRE dir2/  
PRE dir3/  
...

This isn't massively useful just yet. Luckily, there is a --recursive option that will loop through every object in the bucket!

$ aws s3 ls s3://assets.cloud66.com --recursive
...
2016-08-11 12:44:38          0 dir1/  
2016-10-04 10:15:37          0 dir1/images/  
2016-10-04 10:20:09      21394 dir1/images/load_balancer_dns.png  
2016-10-04 10:20:08      40650 dir1/images/load_balancer_icon.png  
2016-10-04 10:20:09      19940 dir1/images/load_balancer_page.png  
...

We're getting somewhere! Next we can apply some grep magic to exclude non-object lines (summary headers, blanks and separators) from the results:

$ aws s3 ls s3://downloads.cloud66.com --recursive  | grep -v -E "(Bucket: |Prefix: |LastWriteTime|^$|--)"
...
2016-08-11 12:44:38          0 dir1/  
2016-10-04 10:15:37          0 dir1/images/  
2016-10-04 10:20:09      21394 dir1/images/load_balancer_dns.png  
2016-10-04 10:20:08      40650 dir1/images/load_balancer_icon.png  
2016-10-04 10:20:09      19940 dir1/images/load_balancer_page.png  
...
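If you want to see what that grep filter actually drops, you can run it over a canned sample. The lines below are made up purely to exercise each pattern:

```shell
# Made-up sample mixing a real listing line with the noise the pattern targets
sample='Bucket: example-bucket
Prefix: dir1/

2016-10-04 10:20:09 21394 dir1/images/pic.png
--'

# Same filter as above: drop summary headers, blank lines and separators
filtered=$(printf '%s\n' "$sample" | grep -v -E "(Bucket: |Prefix: |LastWriteTime|^$|--)")
echo "$filtered"
```

Only the listing line for `dir1/images/pic.png` survives the filter.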

The next step is to total up the sizes from our results. For this we can use the handy awk command:

$ aws s3 ls s3://downloads.cloud66.com --recursive  | grep -v -E "(Bucket: |Prefix: |LastWriteTime|^$|--)" | awk 'BEGIN {total=0}{total+=$3}END{print total/1024/1024" MB"}'
2631.37 MB  
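You can sanity-check the awk summing logic offline by feeding it a canned listing. The three lines below are sample data with sizes in column 3; the conversion to MB simply divides the byte total by 1024 twice:

```shell
# Three made-up listing lines; the size in bytes is the third column
listing='2016-10-04 10:20:09 21394 dir1/images/load_balancer_dns.png
2016-10-04 10:20:08 40650 dir1/images/load_balancer_icon.png
2016-10-04 10:20:09 19940 dir1/images/load_balancer_page.png'

# Same awk program as above: sum the size column and convert bytes to MB
total=$(printf '%s\n' "$listing" | awk 'BEGIN {total=0} {total+=$3} END {print total/1024/1024" MB"}')
echo "$total"   # 21394 + 40650 + 19940 = 81984 bytes, i.e. 0.078186 MB
```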

And there you have it: a single-line command to calculate the total AWS S3 space usage for a bucket or bucket folder.

...and now to delete some stuff we don't really need anymore!

Acknowledgements
  • StackOverflow