Using Rsync to Deploy Hugo

The Hugo website provides a good guide for setting up ssh and rsync to deploy to your web host (see Deployment with Rsync). But these are general instructions, and I found that my shared hosting on Namecheap needed a bit more tweaking.

Update

It turns out the original solution I created worked well for uploading the site files generated by Hugo, but it didn’t remove any old files from the server that I had removed from my source. Since I was only including the current files, and telling rsync to ignore (i.e. exclude) any remote files that weren’t in the include files, old files were not being removed.

The alternative is to set up explicit exclusions (e.g. a list of files and directories on the server I want to keep) and let rsync remove everything else that isn’t in the hugo-generated public/ folder.
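
To illustrate the behaviour described above, here is a minimal local sketch (no server involved). It assumes rsync is installed; the temporary directories and file names are made up for the demonstration and are not part of my real setup.

```shell
# Local demo only: "public" and "public_html" are throwaway temp directories.
demo_exclude_protects() {
    local work src dest
    work="$(mktemp -d)" || return 1
    src="${work}/public"
    dest="${work}/public_html"
    mkdir -p "${src}" "${dest}/cgi-bin"

    echo "new page"   > "${src}/index.html"
    echo "stale page" > "${dest}/old-page.html"     # removed from the source earlier
    echo "keep me"    > "${dest}/.htaccess"         # server file to protect
    echo "keep me"    > "${dest}/cgi-bin/tool.cgi"  # server directory to protect

    # excluded names are skipped by the transfer AND by the delete pass
    rsync -rt --delete-after --exclude=".ht*" --exclude="cgi-bin/" \
          "${src}/" "${dest}/"

    ls -A "${dest}"    # list what is left on the "server" side
    rm -rf "${work}"
}

# run the demo only when rsync is available
command -v rsync > /dev/null 2>&1 && demo_exclude_protects
```

After the run, old-page.html should be gone, while .htaccess and cgi-bin are still listed next to the new index.html.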

Warning

This new script is more dangerous if you get the exclusion list wrong, so make sure you have a good backup of the non-hugo-generated files and directories on the server before you run this new script!

Assumptions 

This article assumes you have some experience or familiarity with using ssh and rsync, and that you have already set things up as instructed on the hugo documentation site. It also assumes you have already verified that you can connect to your remote site with ssh and rsync; configuring these tools is outside the scope of this article.

Disclaimer

I provide this information in an attempt to be helpful, but you use this information at your own risk! Make sure any important files are backed up on your shared hosting provider! That way, if rsync does delete files you didn’t intend, you can restore them.

The Situation 

As configured, my hosting provider sets up my document root (the root directory for my web site) in ~/public_html but that folder already contains some other folders and files (including .htaccess) that would be deleted by rsync when I pushed the generated website files to the host. I also have no way to change the document root for the domain and no way to create Apache Virtual Folders (since this is a shared host). I even tried using the Apache rewrite module with rewrite rules in the .htaccess file, but this didn’t work well (I also didn’t have the patience to dig into the rewrite rule formats to figure out how to fix this).

Some possible solutions to this situation:

  1. Switch to a dedicated hosting option instead of shared hosting
    *(Would provide full[er] access to the web server config, but costs more)*
  2. Put in a request with my hosting provider to adjust my virtual directory settings for my site
    *(Since this is a shared host, it’s uncertain if this would be worth the effort and it would take additional time to set up)*
  3. Adjust my script to include exactly what gets rsync'd after generating the files
    *(Makes sure rsync only uploads the site files and ignores any other existing files on the web server)*
  4. Create a script to exclude specific files and directories on the server that I don’t want deleted, then let rsync upload the files generated by hugo (allowing rsync to delete any files that aren’t in the exclude list)
    *(This keeps the site clean, with only the current hugo-generated site files __and__ the server files and directories I chose to keep)*

Being lazy, I chose option #3 originally and it does work — but it has limitations that tend to clutter my site (see the UPDATE at the beginning of this document). I have left the original script in this document for reference purposes.

The new (recommended) script implements option #4.

Deployment Script 

General script configuration options 

Both scripts use the same variables, so you will need to make the following changes in whichever script you use:

  • USER — change this to the username used for accessing your hosting provider
  • HOST — change this to the hostname (domain name) of your site
  • PORT — set this to the ssh port required by your hosting provider
  • DIR — set this to the document root directory on your hosting provider

Important

If the path to your document root contains a tilde (~), you will need to escape the tilde with a backslash (as shown in the script example). This is because bash treats an unquoted tilde specially: it expands the '~' into the full path of your local $HOME when the DIR variable is assigned, so rsync would be handed your local home path instead of the remote one. Since we don’t want the tilde to be expanded locally, we have to escape it.
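
A quick way to see the difference is to compare the assignments in a shell. This is only an illustrative sketch; the variable names are made up for the demonstration.

```shell
# Unescaped: the shell replaces the tilde with the LOCAL home directory
# at assignment time.
expanded=~/public_html

# Escaped with a backslash: the tilde stays literal, so it is left for the
# remote side to interpret.
literal=\~/public_html

# Single quotes have the same effect as the backslash.
quoted='~/public_html'

echo "unescaped: ${expanded}"   # your local home path followed by /public_html
echo "escaped:   ${literal}"    # ~/public_html
echo "quoted:    ${quoted}"     # ~/public_html
```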

New script 

The following shell script is what I’m currently using to deploy my site to my hosting provider. Make sure you have a backup of any non-hugo-generated files on the server that you want to keep safe (e.g. the .htaccess file and cgi-bin/ directory)! When creating this new script I made a mistake and ended up deleting the server files — but I had a backup and was able to quickly restore them.

If you are not routinely adding other web applications offered by your hosting provider (i.e. you are just running a hugo-generated site and only need to protect a few files and directories on the server), then this new script might suit your needs. Set up correctly, it makes sure to keep your site clean and automatically removes any old files that are no longer needed by your hugo-generated site.

Important

Make sure to adjust the RMT_EXCL variable to match your requirements (see the comments in the script).

Remember

Modify the other variables as needed (i.e. USER, HOST, DIR and PORT). See the general info about these variables in the previous section.

#!/bin/bash
USER=hosting-username
HOST=example.com
DIR=\~/public_html   # might sometimes be empty!
PORT=22

### LIST OF FILES/DIRECTORIES TO EXCL #######################################
# create a bash array containing the files and directories on the server
# you want to protect.
# IMPORTANT: Directories MUST end with a '/'! You only need the first
#            level of the directory -- all subdirectories below the 
#            listed directory will also be excluded.
declare -a RMT_EXCL=( ".ht*" 
                    "parking-page.shtml"
                    "cgi-bin/" 
                    ".well-known/" 
                    "dl/" 
                    "nc_assets/"
                    )
#############################################################################

filename=${0##*/}
scriptname=${filename%.*}

# set the filename for the exclude file
# (falls back to /tmp if the TMP environment variable is not set)
tmp_exclude="${TMP:-/tmp}/${scriptname}-exclude-${RANDOM}.tmp"
# make sure there isn't a previous exclude file
if [ -f "${tmp_exclude}" ]; then
    rm -f "${tmp_exclude}" > /dev/null 2>&1
fi

# Generate the exclude file
for i in "${RMT_EXCL[@]}"; do
    echo "- $i" >> "${tmp_exclude}"
    if [[ $i == */ ]]; then
        echo "- ${i}**" >> "${tmp_exclude}"
    fi
done

# run hugo 
## adjust the hugo command line if needed
## (e.g. you might have additional options or you don't need --baseURL)
if hugo --cleanDestinationDir --baseURL "http://${HOST}"; then
    # run rsync with the exclude file created above
    /usr/bin/rsync -rtvz --exclude-from="${tmp_exclude}" -e "ssh -p ${PORT}" \
                         --delete-after public/ "${USER}@${HOST}:${DIR}"
else
    echo "** HUGO GENERATE FAILED **"
    echo "   (rsync did NOT run)"
    status=1
fi

# remove the exclude file
if [ -f "${tmp_exclude}" ]; then
    rm -f "${tmp_exclude}" > /dev/null 2>&1
fi

# report failure to the caller if hugo did not succeed
exit "${status:-0}"
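
For reference, here is a small sketch of what the generated exclude file contains. It uses a shortened, hypothetical protect list passed as arguments (rather than the RMT_EXCL array) and prints the rules instead of writing them to a temporary file.

```shell
# Print one rsync filter rule per line for each name given as an argument.
make_exclude_rules() {
    local i
    for i in "$@"; do
        echo "- $i"
        # a trailing '/' marks a directory: add a rule for everything below it
        case "$i" in
            */) echo "- ${i}**" ;;
        esac
    done
}

make_exclude_rules ".ht*" "cgi-bin/"
# prints:
# - .ht*
# - cgi-bin/
# - cgi-bin/**
```

Each line starting with "- " is an exclude rule, so rsync neither uploads over nor deletes anything on the server that matches it.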

Original script (not optimal, but relatively safe) 

The following shell script is what I originally used to deploy my site to my hosting provider. It seems to be pretty safe, since it includes only the hugo-generated files in public/ and excludes everything else, but this method will not remove old hugo-generated files from the server. You would periodically need to manually remove any old hugo-generated files from your site. This is not a major issue, and if you use this script it should be quite safe (though the standard “don’t blame me if something blows up” disclaimer applies).

This is also a good script choice if you routinely install other web applications provided by your hosting provider, since these other apps will also create new folders in your document root and you don’t want rsync to delete these folders.

You need to run this from a bash shell (e.g. Git Bash on Windows).

#!/bin/bash
USER=hosting-username   # replace with your username on your hosting site
HOST=example.com        # your domain name
PORT=22                 # port 22 is the ssh default, but yours might be different
DIR=\~/public_html      # might sometimes be empty! Be sure to escape any tilde '~'

filename=${0##*/}       # get just the script filename (without path)
scriptname=${filename%.*}  # remove the extension (if it has one e.g. '.sh')

# create the temporary file that will contain all the directories and
# files to include in the rsync
# (falls back to /tmp if the TMP environment variable is not set)
tmp_include="${TMP:-/tmp}/${scriptname}-include-${RANDOM}.tmp"

# run hugo and only do the rsync if it succeeds
if hugo --cleanDestinationDir --baseURL "http://${HOST}"; then
    # create rsync include-from file from the hugo-generated 
    # directory (./public/)
    ## first, get a list of directories and prepend a '+ ' to each line
    find public/ -type d | sed 's,^[^/]*/,,' | sed '/^\s*$/d' | \
                           sed 's,^\(.*\)$,+ \1\/,' > "${tmp_include}"
    ## second, get a list of files and prepend a '+ ' to each line
    find public/ -type f | sed 's,^[^/]*/\(.*\),+ \1,' | \
                           sed '/^\s*$/d' >> "${tmp_include}"
    ### You now have a list of files for rsync to upload...
    ### rsync checks its filter rules in the order given and the first match
    ### wins. The include rules come first, so every directory and file in
    ### the ./public/ folder is included; the exclude rules that follow catch
    ### everything else (files and directories) that isn't part of the
    ### hugo-generated file structure (thus excluding all non-hugo remote
    ### files, which also protects them from --delete)
    # run rsync
    /usr/bin/rsync -rtvz --include-from="${tmp_include}" --exclude="*" \
                         --exclude="*/**" -e "ssh -p ${PORT}" --delete \
                         ./public/ "${USER}@${HOST}:${DIR}"
else
    # Return an error message if hugo failed
    echo "** HUGO GENERATE FAILED **"
    echo "   (rsync did NOT run)"
    status=1
fi

# delete the temporary include file
if [ -f "${tmp_include}" ]; then
    rm -f "${tmp_include}" > /dev/null 2>&1
fi

# report failure to the caller if hugo did not succeed
exit "${status:-0}"
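
To make the include list easier to picture, here is a small sketch that runs the same kind of find and sed pipeline on a tiny, made-up public/ tree in a temporary directory and prints the rules instead of writing them to a file. It assumes GNU find and sed (as found in Git Bash and on most Linux systems); the file names are hypothetical.

```shell
# Local demo only: build a tiny fake "public/" tree and print the include
# rules that the pipeline would generate for it.
demo_include_list() {
    local work
    work="$(mktemp -d)" || return 1
    mkdir -p "${work}/public/posts"
    : > "${work}/public/index.html"
    : > "${work}/public/posts/hello.html"
    (
        cd "${work}" || exit 1
        # directories: strip the leading "public/", drop the empty line left
        # by "public/" itself, then wrap each name as "+ name/"
        find public/ -type d | sed 's,^[^/]*/,,' | sed '/^[[:space:]]*$/d' | \
                               sed 's,^\(.*\)$,+ \1/,'
        # files: strip the leading "public/" and prepend "+ "
        find public/ -type f | sed 's,^[^/]*/\(.*\),+ \1,'
    )
    rm -rf "${work}"
}

demo_include_list
```

For this tree the output should contain the rules "+ posts/", "+ index.html" and "+ posts/hello.html" (the order of the file lines depends on the filesystem).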

Caveats 

As I discovered while creating these scripts, rsync can be a bit quirky to work with. So make sure you have a good backup of your remote files before attempting the first sync — if something goes wrong and rsync deletes all the non-hugo-generated stuff on your remote host, you’ll be able to restore your non-hugo-generated files.
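
One way to reduce the risk is to preview a sync before running it for real. The following is a minimal local sketch of rsync's dry-run mode, assuming rsync is installed; the temporary directories and file names are made up for the demonstration.

```shell
# Local demo only: preview a sync between two throwaway temp directories.
demo_dry_run() {
    local work
    work="$(mktemp -d)" || return 1
    mkdir -p "${work}/public" "${work}/public_html"
    echo "new page"   > "${work}/public/index.html"
    echo "stale page" > "${work}/public_html/old-page.html"

    # -n (--dry-run): report what would happen, change nothing
    # -i (--itemize-changes): one line per planned change; removals are
    #                         marked with "*deleting"
    rsync -rtni --delete-after "${work}/public/" "${work}/public_html/"

    # the stale file is still there, because nothing was actually changed
    ls "${work}/public_html"
    rm -rf "${work}"
}

# run the demo only when rsync is available
command -v rsync > /dev/null 2>&1 && demo_dry_run
```

In the deployment scripts, adding -n (and optionally -i) to the rsync line should give the same kind of preview against the real server before anything is uploaded or deleted.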

Both of these scripts work for me, but your mileage may vary depending on how your hosting provider configures their servers and which versions of ssh and rsync are used on each end of the connection.
