I have tried to make the guide as complete as possible and have flagged the related warnings and possible failures throughout. Even so, I ran into certain caveats and made mistakes which I believe are easy to make. This article lists the problems and mistakes that you might run into as well.
This guide from the very beginning has assumed that the code should live in the deploying user’s home directory to avoid issues with permissions. Yet, on two occasions I made mistakes:
The first mistake was that I ran a chmod command on the server instead of on my development machine (thanks to too many open tabs). This broke the script: it could not fetch the git commits, so new changes were not being applied. Since the main script does not perform many permission checks and just assumes that everything is accessible, it took me some time to figure out that the scm directory on the server had been set to 400, disabling all write operations.
In such a case, look at the output of the script carefully and check which steps succeeded and the point after which it failed. Since it is all bash commands, and most bash commands print an error when they hit insufficient permissions, you should be able to catch the culprit. Reading the output closely is the key here.
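A pre-flight check can surface this kind of permission problem before the deployment starts. The following is only a sketch, not part of the original script: the function name check_dir_access and the "$base_dir/scm" path in the usage comment are assumptions made for illustration.

```shell
# Hypothetical helper: report whether each directory passed as an argument
# exists and is readable, writable and traversable by the current user.
check_dir_access() {
  local dir status=0
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "MISSING: $dir"
      status=1
    elif [ ! -r "$dir" ] || [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
      # Show the current mode and owner so the wrong permission is obvious.
      echo "NO ACCESS: $(ls -ld "$dir")"
      status=1
    else
      echo "OK: $dir"
    fi
  done
  return "$status"
}

# Example use near the top of a deployment script (path is an assumption):
# check_dir_access "$base_dir/scm" || exit 1
```

Note that these tests always pass for the root user, so the check is only meaningful when run as the deploying user.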
Executable permissions on script
In one of the newer projects, I copied the script to the server but forgot to make it executable. Trying to run the script just said that the command could not be found (or something to that effect), which is a bit misleading. If you happen to encounter such an error, check the permissions on the script. Run chmod u+x on the script to make sure it runs.
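As a small sketch of that check, the helper below tests the executable bit and sets it only when it is missing. The function name ensure_executable and the deploy.sh file name in the usage comment are assumptions, not names from the guide.

```shell
# Hypothetical helper: make a script executable for its owner if it is not already.
ensure_executable() {
  local script="$1"
  if [ ! -f "$script" ]; then
    echo "not found: $script" >&2
    return 1
  fi
  if [ ! -x "$script" ]; then
    chmod u+x "$script"
    echo "made executable: $script"
  fi
}

# Example use (file name is an assumption):
# ensure_executable "$HOME/deploy.sh"
```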
The invisible deploy.lock file
This one is weird!
At a time when I had to deploy to production back to back while testing small changes, I found that some subsequent deployments were complaining about deploy.lock being present, despite the previous deploys being successful! I had another SSH session open in a second terminal tab, so I ran ls from that session in my base_dir and the file (deploy.lock) was not there. So I ran the deployment again (from a different tab, using an aliased command as explained here) and it worked.
This behaviour never occurred when I was not logged into the server from a separate tab! I am not sure why this happens; maybe the session in the second tab holds the file list in the OS cache, and the deployment script ends up using that stale cache, which causes the trouble.
Whatever the reason, if you come across this, run ls in the second terminal from which you SSHed into the server and that should solve it. Better yet, if you don’t need to be logged into the server, just don’t run the second SSH session.
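This does not explain the behaviour described here, but when a lock complaint looks wrong it helps if the lock file records who created it. The sketch below shows one common way to take such a lock in a single step using the shell's noclobber option. It is not the guide's own implementation; the function names and the "$base_dir/deploy.lock" path in the usage comment are assumptions.

```shell
# Hypothetical lock helpers. With noclobber set, '>' refuses to overwrite an
# existing file, so creating the lock and checking for it happen in one step.
acquire_lock() {
  ( set -o noclobber; echo "$$" > "$1" ) 2>/dev/null
}

release_lock() {
  rm -f "$1"
}

# Example use in a deployment script (path is an assumption):
# lock="$base_dir/deploy.lock"
# acquire_lock "$lock" || { echo "deploy.lock held by PID $(cat "$lock")"; exit 1; }
# trap 'release_lock "$lock"' EXIT
```

Because the lock stores the PID of the script that created it, a stale or unexpected lock can be traced back to a specific process.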
Puma restart fails
I cannot explain this one either, but it happens at times: the puma restart step reports success but puma does not restart. This causes a 502 (Bad Gateway) error because nginx fails to connect to the puma backend! Most of the time, re-running the deployment script (which takes barely 4-5 seconds) solves the issue.
The problem is that pumactl shows no error messages, so you don’t know whether it failed silently, or why!
Every time you deploy your app, try to access a page or API endpoint that does not change any data. If you get a 502, re-run the deployment. If the server responds as expected, puma did not fail.
Remember that you don’t need to restart nginx for the 502. The fact that you are getting any response means that nginx is working fine (unless you made any other changes to its config file and reloaded/restarted nginx).
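The check after each deploy can be scripted. The following is a sketch under assumptions: the function names and the example URL are hypothetical, and it relies on curl, whose -w '%{http_code}' option prints the response status (000 when no HTTP response was received at all).

```shell
# Hypothetical helper: print the HTTP status code for a URL.
fetch_status() {
  curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# Hypothetical helper: map a status code to a suggested next step.
interpret_status() {
  case "$1" in
    2??|3??) echo "ok" ;;           # the app responded, so puma is up
    502)     echo "redeploy" ;;     # nginx is up but cannot reach puma
    000)     echo "unreachable" ;;  # no HTTP response at all
    *)       echo "investigate" ;;  # some other application or server error
  esac
}

# Example use after a deployment (URL is an assumption; pick a read-only page):
# interpret_status "$(fetch_status "https://example.com/status")"
```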
Sidekiq PIDs not being reported
I have seen this happen with Sidekiq restarts at times: the script just reports a blank PID after sidekiq restarts. This is because sidekiq takes quite a bit of time to restart (especially if there are jobs running). It is best to introduce or increase the delay between sending the restart command to sidekiq and reading its PID file.
In my experience, waiting for 15 seconds mostly works (the PID file is refreshed with the new PID within that duration), but you may need longer if your app is larger.
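Instead of a fixed delay, the script could poll the PID file until it holds the PID of a live process. This is a sketch only: the function name wait_for_pid, its arguments and the PID file path in the usage comment are assumptions, not part of the original script.

```shell
# Hypothetical helper: wait until a PID file contains the PID of a running
# process, checking once per second.
#   $1 = PID file, $2 = timeout in seconds (default 15),
#   $3 = optional old PID to ignore, so a stale file is not mistaken for a restart.
wait_for_pid() {
  local pid_file="$1" timeout="${2:-15}" old_pid="${3:-}" waited=0 pid
  while [ "$waited" -lt "$timeout" ]; do
    if [ -s "$pid_file" ]; then
      pid=$(cat "$pid_file")
      # kill -0 sends no signal; it only checks that the process exists.
      if [ "$pid" != "$old_pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "$pid"
        return 0
      fi
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Example use after sending the restart command (path is an assumption):
# new_pid=$(wait_for_pid "$base_dir/tmp/pids/sidekiq.pid" 30 "$old_pid") \
#   || echo "sidekiq did not report a new PID within 30 seconds"
```

Polling returns as soon as the new PID appears, so a generous timeout costs nothing on fast restarts.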