Bash Wizardry for Asset Pipeline Issues

Quite often, the task of investigating the voodoo wizardry behind asset pipeline issues falls to me. A case in point is this previous blog post, where software versioning was the issue. Another time, it was faulty vendoring.

The usual place to start is to see what files are actually different on the servers whose assets are not the same. At the very least, the application folder (under the environment variable $STACK_PATH on Cloud 66 Rails stacks) and the gem installation path should be checked.

Of course, running diff file by file is not acceptable - that is far too time consuming. Enter the wonders of bash!

The following one-liner should do the trick! It prints one combined SHA checksum for each immediate subdirectory of the directory you're in (and one for . itself), computed from the contents of every file underneath it. (See the appendix for a thorough explanation!)

find -L . -maxdepth 1 -type d -print0 | sort -z | xargs -0 -I{} bash -c "find -L '{}' -type f -print0 | sort -z | xargs -0 shasum | cut -d ' ' -f1 | shasum" | paste - <(find -L . -maxdepth 1 -type d | sort)

For example, given the following output for ls -al in my application folder:

total 96
drwxrwxr-x 13 cloud66-user cloud66-user 4096 Apr 4 15:51 .
drwxrwxr-x 3 cloud66-user cloud66-user 4096 Apr 4 11:53 ..
drwxrwxr-x 7 cloud66-user cloud66-user 4096 Apr 4 11:53 app
-rw-rw-r-- 1 cloud66-user cloud66-user 974 Apr 4 11:55 assets_manifest.json
drwxrwxr-x 2 cloud66-user cloud66-user 4096 Apr 4 11:53 bin
drwxrwxr-x 2 cloud66-user cloud66-user 4096 Apr 4 11:53 .bundle
drwxrwxr-x 2 cloud66-user cloud66-user 4096 Apr 4 11:53 .cloud66
drwxrwxr-x 5 cloud66-user cloud66-user 4096 Apr 4 11:53 config
-rw-rw-r-- 1 cloud66-user cloud66-user 156 Apr 4 11:53 config.ru
drwxrwxr-x 3 cloud66-user cloud66-user 4096 Apr 4 11:53 db
lrwxrwxrwx 1 cloud66-user cloud66-user 2 Apr 4 15:42 directory_symlink -> db
lrwxrwxrwx 1 cloud66-user cloud66-user 7 Apr 4 15:46 file_symlink -> Gemfile
-rw-rw-r-- 1 cloud66-user cloud66-user 1214 Apr 4 11:53 Gemfile
-rw-rw-r-- 1 cloud66-user cloud66-user 3891 Apr 4 11:53 Gemfile.lock
drwxrwxr-x 8 cloud66-user cloud66-user 4096 Apr 4 11:53 .git
-rw-rw-r-- 1 cloud66-user cloud66-user 455 Apr 4 11:53 .gitignore
drwxrwxr-x 4 cloud66-user cloud66-user 4096 Apr 4 11:53 lib
lrwxrwxrwx 1 cloud66-user cloud66-user 46 Apr 4 11:55 log -> /var/deploy/dl-local-rails/web_head/shared/log
-rw-rw-r-- 1 cloud66-user cloud66-user 73 Apr 4 11:53 Procfile
drwxr-xr-x 3 nginx nginx 4096 Apr 4 11:55 public
-rw-rw-r-- 1 cloud66-user cloud66-user 251 Apr 4 11:53 Rakefile
-rw-rw-r-- 1 cloud66-user cloud66-user 889 Apr 4 11:53 README.md
-rw-rw-r-- 1 cloud66-user cloud66-user 41 Apr 4 11:53 REVISION
-rw-rw-r-- 1 cloud66-user cloud66-user 19 Apr 4 11:53 .ruby-gemset
-rw-rw-r-- 1 cloud66-user cloud66-user 11 Apr 4 11:53 .ruby-version
drwxrwxr-x 6 cloud66-user cloud66-user 4096 Apr 4 11:53 test
drwxrwxr-x 2 nginx nginx 4096 Apr 4 11:55 tmp

The command gives the following output:

38403a327e4a5abc95f329f9c5bce7c7f8f4e278 -	.
7e6bd60c49d04b243904d3e7fb507657d9e97cee -	./app
6ffd41403eded24ea1e97c3d90107fc6b860d8e8 -	./bin
bffbcb319f0d7287eb5d1d7f92dd0e70079f873a -	./.bundle
b47aadba255a3a997f5a2ef58c99058261713978 -	./.cloud66
6a57ababb298c73b594e28cbed3ed7e07a157b96 -	./config
391983a36275ba6f048abfc424e9e2a7f68dc37d -	./db
391983a36275ba6f048abfc424e9e2a7f68dc37d -	./directory_symlink
98ae90c3e6eb3794b031e1ee7cccac646db393cd -	./.git
10a2462f3124522aa5dde0fa12d2cb29ab179100 -	./lib
054cf2441edf08bf080c1d6821886da213a6b237 -	./log
1f77c387482cd15e993c9de6201af7815922b1b4 -	./public
5b1244258ec888cd3c1b0f650789bd89f950f911 -	./test
b3ad3015ccff8ce01aa9b2ebfd70d1a5c2115bb1 -	./tmp

I can now compare values across the servers and see which subdirectories have different contents.
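Comparing the listings by eye gets tedious with many subdirectories. As a rough sketch, assuming the output of the command has been saved to one file per server (the file names and shortened hashes below are made-up placeholders, not real output), a small awk filter can print only the subdirectories whose checksums differ:

```shell
#!/usr/bin/env bash
# Hypothetical listings: placeholder hashes in the same "<sha> -<TAB><dir>" layout.
workdir=$(mktemp -d)
printf '%s\t%s\n' 'aaa111 -' './app' 'bbb222 -' './config' 'ccc333 -' './lib' > "$workdir/server_a.txt"
printf '%s\t%s\n' 'aaa111 -' './app' 'ddd444 -' './config' 'ccc333 -' './lib' > "$workdir/server_b.txt"

# Print every directory in the second listing whose checksum is missing
# from, or different to, the first listing.
changed_dirs() {
  awk -F'\t' 'NR == FNR { sum[$2] = $1; next } sum[$2] != $1 { print $2 }' "$1" "$2"
}

changed_dirs "$workdir/server_a.txt" "$workdir/server_b.txt"
```

Here only ./config is printed, which is the subdirectory to descend into next.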

If the SHA checksum is different for a given subdirectory, I can go into that subdirectory and repeat the SHA command. If all the subdirectories have the same SHA checksums, but the folder I am in (meaning . in the output above) has a different checksum, then the files in . are different. To find which file it is, run a simplified version of the SHA command that looks for files without going into subdirectories:

find -L . -maxdepth 1 -type f -print0 | sort -z | xargs -0 shasum

I hope this helps, and I wish you all happier debugging times!

Appendix

Command Explanation

Bash one-liners without comments are almost impossible to figure out. As a matter of fact, the explanation for the above command is already slowly but surely slipping from my brain - let's write it down!

find "$DIRECTORY" is a wonderful command that I use often - as the name suggests, it will output a list of all files and directories in "$DIRECTORY". By default, it will recursively dig into all the subdirectories.

Adding the -L flag to find before "$DIRECTORY" will make it follow symlinks. This is necessary if you have symlinks to directories, but will break if you have circular symlinks - use as required.
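To see the effect of -L in isolation, here is a small sketch using a throwaway directory (the names real and link are invented for the illustration). It counts the files find reports under a directory symlink with and without the flag:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
mkdir "$demo/real"
touch "$demo/real/file.txt"
ln -s real "$demo/link"   # link -> real, a directory symlink

# Without -L the symlink itself is the only entry, so no regular files are found.
without_l=$(find "$demo/link" -type f | wc -l | tr -d ' ')
# With -L find descends through the symlink into the real directory.
with_l=$(find -L "$demo/link" -type f | wc -l | tr -d ' ')

echo "without -L: $without_l file(s), with -L: $with_l file(s)"
```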

Flags for find after "$DIRECTORY" usually specify what kind of things you are searching for. -maxdepth 1 means I want to list only the immediate subdirectories of "$DIRECTORY" without going deeper, and -type d means I want only directories, not files. -print0 will make the delimiter between entries the \0 character instead of \n - it will become obvious why once we handle xargs.

So far, we have find -L "$DIRECTORY" -maxdepth 1 -type d -print0, which will list all immediate subdirectories of "$DIRECTORY". We will pipe this through sort -z for consistency across runs - the -z flag tells sort that the delimiter is the \0 character.
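The first stage can be tried on its own. The sketch below builds a throwaway directory (all names are invented) and turns the \0 delimiters back into newlines purely for display. LC_ALL=C is an assumption added here, not part of the original command: sort order depends on the locale, so pinning it may help when comparing servers configured differently.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
mkdir "$demo/b dir" "$demo/a" "$demo/a/nested"
touch "$demo/file.txt"

# List . and its immediate subdirectories, sorted, NUL-delimited;
# tr converts the NULs to newlines so the result is readable.
listing=$(cd "$demo" && find -L . -maxdepth 1 -type d -print0 | LC_ALL=C sort -z | tr '\0' '\n')
printf '%s\n' "$listing"
```

Neither file.txt nor a/nested shows up: the first is not a directory, and the second is deeper than -maxdepth 1 allows.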

Then we pipe to xargs, which will pass each argument from the preceding command to the one that follows. In essence, we are passing each subdirectory to the bash command that follows. The -0 flag tells xargs that the delimiter between arguments is the \0 character. Otherwise, xargs splits its input on whitespace (spaces as well as newlines), which would break for subdirectories with spaces in their names! Finally, the -I{} flag tells xargs to substitute each argument wherever it sees {} - otherwise, by default, it appends the arguments to the end of the command.
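A tiny sketch of -0 and -I{} working together, using two made-up directory names (one containing a space) and echo in place of the real command:

```shell
#!/usr/bin/env bash
# Two NUL-separated names; xargs runs echo once per name and
# substitutes the name wherever {} appears.
labelled=$(printf '%s\0' './my dir' './lib' | xargs -0 -I{} echo "checking {} now")
printf '%s\n' "$labelled"
```

The name with the space stays in one piece because the only delimiter xargs honours here is \0.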

So now, we are passing each subdirectory to the command inside bash -c "..." - we have to call bash because the command we run for each subdirectory contains pipes. Without it, xargs would run only the first command of that pipeline for each subdirectory, and everything after the first | would process the combined output of all those runs instead.
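One caveat with substituting {} directly into the command string is that a directory name containing a quote character can break the inner command. A common alternative, shown here as a sketch and not as the method used in the one-liner above, is to hand the name to the inner shell as a positional parameter. The example counts files instead of hashing them to keep it short, and the directory names are invented:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
mkdir "$demo/plain" "$demo/it's quoted"
touch "$demo/plain/one" "$demo/plain/two" "$demo/it's quoted/three"

# Each directory arrives as $1 inside the inner shell; "_" fills $0.
# The pipeline after find runs once per directory, inside bash -c.
counts=$(printf '%s\0' "$demo/plain" "$demo/it's quoted" |
  xargs -0 -I{} bash -c 'find -L "$1" -type f | wc -l | tr -d " "' _ {})
printf '%s\n' "$counts"
```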

The command inside bash -c "..." is similar to the previous one, except now we are finding -type f (that is, files) and there is no -maxdepth, so find descends all the way down. This will get the SHA checksum for each file in the subdirectory, and output it in the following format:

...
54f9b246c163b271ed47ac2088d0fbb105bb69be ./.git/refs/heads/master
...

We will now keep only the SHA checksum from this output using cut, which extracts specific columns from its input. -d ' ' means that the delimiter between columns is the space character, and -f1 means I want the first column - the SHA checksum. This is so that the file paths do not interfere with the next shasum: a directory and a symlink to it list the same files under different paths (./db/... versus ./directory_symlink/...), and hashing those paths would give the two different results.

We then pipe all the file SHA checksums through a shasum to get the overall SHA checksum for that subdirectory!

The final paste command is simply used to append the directory name to each subdirectory's SHA checksum. Without it, the output would look like

...
54f9b246c163b271ed47ac2088d0fbb105bb69be -
...

since the final shasum read a stream of SHA checksums from standard input instead of a named file.

Normally, paste takes two files and joins them line by line: each row of the first is followed by a tab and the matching row of the second. Since we are piping to paste though, we can use the - character to mean 'whatever came from the pipe'. Finally, I want the output of find -L . -maxdepth 1 -type d | sort to be appended to whatever came from the pipe (the SHA checksums). But I don't want to create a separate file for this purpose. I can get around this by using bash process substitution, denoted by <(...). This makes the output of ... available under a file name (an entry in /dev/fd or a named pipe, i.e. a FIFO), which can be used in the same way as a normal file.
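A minimal sketch of paste combining standard input with a process substitution, using placeholder strings in place of real checksums and directory names:

```shell
#!/usr/bin/env bash
# Left column comes from the pipe (-), right column from the <(...) "file".
joined=$(printf 'checksum-one\nchecksum-two\n' | paste - <(printf './app\n./bin\n'))
printf '%s\n' "$joined"
```

Each output line is a left-hand entry, a tab, then the matching right-hand entry, which is why both sides must be produced in the same order.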
