Using Rsync to Deploy Hugo
The Hugo website provides a good guide for setting up ssh and rsync to deploy to your web host (see Deployment with Rsync). But these are general instructions, and I found that my shared hosting on Namecheap needed a bit more tweaking.
Update
It turns out the original solution I created worked well for uploading the site files generated by Hugo, but it didn’t remove old files from the server after I had removed them from my source. Since I was only including the current files, and telling rsync to ignore (i.e. exclude) any remote files that weren’t in the include list, old files were not being removed.
The alternative is to set up explicit exclusions (e.g. a list of files and directories on the server I want to keep) and let rsync remove everything else that isn’t in the hugo-generated public/ folder.
Warning
This new script is more dangerous if you get the exclusion list wrong, so make sure you have a good backup of the non-hugo-generated files and directories on the server before you run this new script!
Assumptions
This article assumes you have some experience or familiarity with using ssh and rsync, and that you have already set things up as instructed on the Hugo documentation site. It also assumes you have already verified that you can connect to your remote site with ssh and rsync; configuring these tools is outside the scope of this article.
Disclaimer
I provide this information in an attempt to be helpful, but you use this information at your own risk! Make sure any important files are backed up on your shared hosting provider! That way, if rsync does delete files you didn’t intend, you can restore them.
The Situation
As configured, my hosting provider sets up my document root (the root directory for my web site) in ~/public_html, but that folder already contains some other folders and files (including .htaccess) that would be deleted by rsync when I pushed the generated website files to the host. I also have no way to change the document root for the domain and no way to create Apache Virtual Folders (since this is a shared host). I even tried using the Apache rewrite module with rewrite rules in the .htaccess file, but this didn’t work well (I also didn’t have the patience to dig into the rewrite rule formats to figure out how to fix this).
Some possible solutions to this situation:
- Switch to a dedicated hosting option instead of shared hosting
  (Would provide full[er] access to the web server config, but costs more)
- Put in a request with my hosting provider to adjust my virtual directory settings for my site
  (Since this is a shared host, it’s uncertain if this would be worth the effort and it would take additional time to set up)
- Adjust my script to include exactly what gets rsync’d after generating the files
  (Makes sure rsync only uploads the site files and ignores any other existing files on the web server)
- Create a script to exclude specific files and directories on the server that I don’t want deleted, then let rsync upload the files generated by hugo (allowing rsync to delete any files that aren’t in the exclude list)
  (This keeps the site clean, with only the current hugo-generated site files and the server files and directories I chose to keep)
Being lazy, I chose option #3 originally and it does work — but it has limitations that tend to clutter my site (see the UPDATE at the beginning of this document). I have left the original script in this document for reference purposes.
The new (recommended) script implements option #4.
Deployment Script
General script configuration options
Both scripts use the following variables, so you will need to make the following changes to either script:
- USER: change this to the username used for accessing your hosting provider
- HOST: change this to the hostname (domain name) of your site
- PORT: set this to the ssh port required by your hosting provider
- DIR: set this to the document root directory on your hosting provider
Important
If the path to your document root contains a tilde (~) you will need to escape the tilde with a backslash (as shown in the script example). This is because bash treats a leading, unquoted tilde specially: when the DIR variable is assigned, bash expands the '~' into the full path of your local $HOME, and that local path is what would end up in the rsync part of the script. Since we want the tilde to refer to the home directory on the server, not on your own machine, we have to escape it.
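To see the difference the backslash makes, here is a minimal sketch. The variable names `expanded` and `literal` are only for illustration and are not part of the deployment scripts.

```shell
#!/bin/bash
# Unescaped: bash replaces the leading ~ with the LOCAL home directory
# at the moment the variable is assigned.
expanded=~/public_html

# Escaped: the tilde is stored as a literal character, so it is still
# a plain "~" when the value is later handed to rsync.
literal=\~/public_html

echo "unescaped: ${expanded}"
echo "escaped:   ${literal}"
```

The first line printed shows a path under your own home directory, while the second still begins with `~`, which is the form the scripts pass to rsync as the remote path.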
New script
The following shell script is what I’m currently using to deploy my site to my hosting provider. Make sure you have a backup of any non-hugo-generated files on the server that you want to keep safe (e.g. the .htaccess file and cgi-bin/ directory)! When creating this new script I made a mistake and ended up deleting the server files — but I had a backup and was able to quickly restore them.
If you are not routinely adding other web applications offered by your hosting provider (i.e. you are just running a hugo-generated site and only need to protect a few files and directories on the server), then this new script might suit your needs. Set up correctly, it makes sure to keep your site clean and automatically removes any old files that are no longer needed by your hugo-generated site.
Important
Make sure to adjust the RMT_EXCL variable to match your requirements (see the comments in the script).
Remember
Modify the other variables as needed (i.e. USER, HOST, DIR and PORT). See the general info about these variables in the previous section.
#!/bin/bash
USER=hosting-username   # replace with your username on your hosting site
HOST=example.com        # your domain name
DIR=\~/public_html      # might sometimes be empty! Be sure to escape any tilde '~'
PORT=22                 # port 22 is the ssh default, but yours might be different
### LIST OF FILES/DIRECTORIES TO EXCLUDE ####################################
# create a bash array containing the files and directories on the server
# you want to protect.
# IMPORTANT: Directories MUST end with a '/'! You only need the first
#            level of the directory -- all subdirectories below the
#            listed directory will also be excluded.
declare -a RMT_EXCL=( ".ht*"
                      "parking-page.shtml"
                      "cgi-bin/"
                      ".well-known/"
                      "dl/"
                      "nc_assets/"
                    )
#############################################################################
filename=${0##*/}           # get just the script filename (without path)
scriptname=${filename%.*}   # remove the extension (if it has one e.g. '.sh')
# set the filename for the exclude file
# (TMP is not set on every system, so fall back to TMPDIR and then /tmp)
tmp_exclude=${TMP:-${TMPDIR:-/tmp}}/${scriptname}-exclude-${RANDOM}.tmp
# make sure there isn't a previous exclude file
if [ -f "${tmp_exclude}" ]; then
    rm -f "${tmp_exclude}" > /dev/null 2>&1
fi
# Generate the exclude file: one '- pattern' rule per entry, plus a
# '- dir/**' rule for each directory to cover everything below it
for i in "${RMT_EXCL[@]}"; do
    echo "- $i" >> "${tmp_exclude}"
    if [[ $i == */ ]]; then
        echo "- ${i}**" >> "${tmp_exclude}"
    fi
done
# remember whether anything failed so the script can report it on exit
status=0
# run hugo
## adjust the hugo command line if needed
## (e.g. you might have additional options or you don't need --baseURL)
if hugo --cleanDestinationDir --baseURL "http://${HOST}"; then
    # run rsync with the exclude file created above
    /usr/bin/rsync -rtvz --exclude-from="${tmp_exclude}" -e "ssh -p ${PORT}" \
        --delete-after public/ "${USER}@${HOST}:${DIR}" || status=$?
else
    echo "** HUGO GENERATE FAILED **"
    echo "   (rsync did NOT run)"
    status=1
fi
# remove the exclude file
if [ -f "${tmp_exclude}" ]; then
    rm -f "${tmp_exclude}" > /dev/null 2>&1
fi
exit "${status}"
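Before running the script, it can help to look at the rules that the loop writes to the exclude file. The sketch below repeats the same loop logic but prints the rules instead of writing them to a file. The function name `print_exclude_rules` is hypothetical, and the two array entries are only sample values.

```shell
#!/bin/bash
# Same array format as RMT_EXCL in the script: directories end with '/'.
# (These two entries are only sample values.)
declare -a RMT_EXCL=( ".ht*" "cgi-bin/" )

# Print one rsync exclude rule per entry, plus a "dir/**" rule for
# every directory so that everything below it is covered too.
print_exclude_rules() {
    local entry
    for entry in "${RMT_EXCL[@]}"; do
        echo "- ${entry}"
        if [[ ${entry} == */ ]]; then
            echo "- ${entry}**"
        fi
    done
}

rules=$(print_exclude_rules)
printf '%s\n' "${rules}"
```

Each entry becomes a line starting with `- `, which rsync reads as an exclude rule. An entry that is missing its trailing '/' would not get the extra `**` rule, which is why the comment in the script insists on it.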
Original script (not optimal, but relatively safe)
The following shell script is what I originally used to deploy my site to my hosting provider. It seems to be pretty safe, since it includes only the hugo-generated files in public/ and excludes everything else, but this method will not remove old hugo-generated files from the server, so you would periodically need to remove them manually. This is not a major issue, and if you use this script it should be quite safe (though the standard “don’t blame me if something blows up” disclaimer applies…).
This is also a good script choice if you routinely install other web applications provided by your hosting provider, since these other apps will also create new folders in your document root and you don’t want rsync to delete these folders.
You need to run this from a bash shell (e.g. Git Bash on Windows).
#!/bin/bash
USER=hosting-username   # replace with your username on your hosting site
HOST=example.com        # your domain name
PORT=22                 # port 22 is the ssh default, but yours might be different
DIR=\~/public_html      # might sometimes be empty! Be sure to escape any tilde '~'
filename=${0##*/}           # get just the script filename (without path)
scriptname=${filename%.*}   # remove the extension (if it has one e.g. '.sh')
# create the temporary file that will contain all the directories and
# files to include in the rsync
# (TMP is not set on every system, so fall back to TMPDIR and then /tmp)
tmp_include=${TMP:-${TMPDIR:-/tmp}}/${scriptname}-include-${RANDOM}.tmp
# remember whether anything failed so the script can report it on exit
status=0
# run hugo and only do the rsync if it succeeds
if hugo --cleanDestinationDir --baseURL "http://${HOST}"; then
    # create rsync include-from file from the hugo-generated
    # directory (./public/)
    ## first, get a list of directories and prepend a '+ ' to each line
    find public/ -type d | sed 's,^[^/]*/,,' | sed '/^\s*$/d' | \
        sed 's,^\(.*\)$,+ \1\/,' > "${tmp_include}"
    ## second, get a list of files and prepend a '+ ' to each line
    find public/ -type f | sed 's,^[^/]*/\(.*\),+ \1,' | \
        sed '/^\s*$/d' >> "${tmp_include}"
    ### You now have a list of files for rsync to upload...
    ### rsync checks its filter rules in the order given and the first
    ### match wins, so the include list comes first and the catch-all
    ### excludes come last: include all the directories and files in the
    ### ./public/ folder, but exclude everything else (files and
    ### directories) that isn't part of the hugo-generated file structure
    ### (thus excluding all non-hugo remote files)
    # run rsync
    /usr/bin/rsync -rtvz --include-from="${tmp_include}" --exclude="*" \
        --exclude="*/**" -e "ssh -p ${PORT}" --delete \
        ./public/ "${USER}@${HOST}:${DIR}" || status=$?
else
    # Return an error message if hugo failed
    echo "** HUGO GENERATE FAILED **"
    echo "   (rsync did NOT run)"
    status=1
fi
# delete the temporary include file
if [ -f "${tmp_include}" ]; then
    rm -f "${tmp_include}" > /dev/null 2>&1
fi
exit "${status}"
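To make the include list less abstract, the sketch below runs the same find and sed steps against a small throwaway directory tree and prints the resulting rules. The `workdir` variable and the sample files are made up for illustration. It assumes GNU find (which prints `public/posts` rather than `public//posts`), and it uses `[[:space:]]` in place of the GNU-only `\s` so the sed part is more portable.

```shell
#!/bin/bash
# Build a throwaway site tree to stand in for hugo's public/ output.
workdir=$(mktemp -d)
mkdir -p "${workdir}/public/posts/hello"
touch "${workdir}/public/index.html" \
      "${workdir}/public/posts/hello/index.html"

include_rules=$(
    cd "${workdir}" || exit 1
    # Directories: strip the leading "public/", drop the empty line left
    # by the top-level directory, then wrap each path as "+ path/".
    find public/ -type d | sed 's,^[^/]*/,,' | sed '/^[[:space:]]*$/d' | \
        sed 's,^\(.*\)$,+ \1/,'
    # Files: strip the leading "public/" and prepend "+ ".
    find public/ -type f | sed 's,^[^/]*/\(.*\),+ \1,'
)
printf '%s\n' "${include_rules}"
```

Every directory and file under public/ ends up as its own `+ ` rule, relative to public/. Anything on the server that is not in this list falls through to the catch-all excludes, which is why stale files are left behind.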
Caveats
As I discovered while creating these scripts, rsync can be a bit quirky to work with. So make sure you have a good backup of your remote files before attempting the first sync: if something goes wrong and rsync deletes all the non-hugo-generated stuff on your remote host, you’ll be able to restore your non-hugo-generated files.
Both of these scripts work for me, but your mileage may vary depending how your hosting provider configures their servers and which versions of ssh and rsync are used on each end of the connection.