Data Transfer with scp and rsync

The two tools scp and rsync are the principal methods for transferring data to and from Science IT infrastructure on the command line.

scp

You can transfer files with the scp command. The first argument is the source file while the second argument indicates the target location. For example, consider transferring my_local_file.txt to a remote system's home directory (~).

scp my_local_file.txt <username>@<remote_ip_or_url>:~

To copy a file from a remote system, specify the server and the remote path as the first argument and the local path as the second.

scp <username>@<remote_ip_or_url>:/path/to/file.txt .

The . (i.e., "dot") character stands for the current directory. You can specify any other location with either an absolute path or a path relative to your current directory.

You can also transfer a whole directory with the -r flag. Note the colon after the remote host: without it, scp treats the second argument as a local file name.

scp -r my/local/dir <username>@<remote_ip_or_url>:~

rsync

For transfers that involve many files or directories, it is often more efficient to use rsync. This program synchronises files between the source and the destination. If a transfer fails partway, or if only some of your files have changed, rsync is more efficient because it does not re-transfer data that is already identical in both locations. For example, the following command can be used in place of the previous scp command, here copying the directory to /target on the remote system:

rsync -az --progress my/local/dir <username>@<remote_ip_or_url>:/target

As with scp, the first location is the source file or directory while the second is the target location. The -a flag invokes archive mode, which, roughly speaking, recreates the structure, permissions and modification times of the source directory on the target machine. The -z flag instructs rsync to compress the data during the transfer, which can make the transfer faster, especially when your connection speed is low. As the name suggests, the --progress option shows progress information during the transfer.

Before running the synchronisation, you can run the command with -n (dry run) to preview which files would be transferred. Keep --progress (or add -v) in this case; otherwise, rsync will not list the files.

rsync -azn --progress my/local/dir <username>@<remote_ip_or_url>:/target
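To see that a dry run changes nothing, you can try it on local directories first. This is a minimal sketch that assumes rsync is installed; the temporary directories and the file a.txt are hypothetical stand-ins for your own data and the remote target.

```shell
#!/bin/sh
# Local sketch: temporary directories stand in for the remote target.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/src/dir" "$work/dst"
echo "sample" > "$work/src/dir/a.txt"

# -n reports what would be transferred without copying anything.
rsync -azn --progress "$work/src/dir" "$work/dst"

# The target is still empty after the dry run.
remaining=$(ls -A "$work/dst")
if [ -z "$remaining" ]; then
    echo "dry run left the target untouched"
fi
```

Once the preview looks right, remove -n and run the same command again to perform the transfer.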

You can exclude files and directories from synchronisation with --exclude. This parameter can be specified multiple times. For example, the following command previews a transfer (note the -n flag) that ignores all files and directories named cache as well as all files that have the .tmp extension.

rsync -azn --progress --exclude='cache' --exclude='*.tmp' my/local/dir <username>@<remote_ip_or_url>:/target

By default, rsync does not remove files from the target directory even if they have been deleted from the source directory. Deletion of such files can be enabled with --delete. It is strongly recommended to preview the changes with -n before running rsync with the --delete flag: if you specify the wrong target directory, files in that directory that do not exist in the source will be deleted without confirmation.

rsync -az --progress --delete <username>@<remote_ip_or_url>:/source my/local/target

A trailing slash at the end of the source directory instructs rsync to synchronise the contents of the source directory rather than the directory itself. Suppose, for example, that the source directory /source/data contains a single file, test.txt. If you do not specify the trailing slash (i.e., /), rsync will create a data directory inside your target directory and transfer the contents there.

rsync -az <username>@<remote_ip_or_url>:/source/data my/local/target

ls my/local/target
# data
ls my/local/target/data
# test.txt

If you add the trailing slash /, rsync will place test.txt directly into your target directory.

rsync -az <username>@<remote_ip_or_url>:/source/data/ my/local/target

ls my/local/target
# test.txt

ScienceCloud

The ScienceCloud training materials include a specific example on how to use scp when transferring to/from a ScienceCloud VM.

ScienceCluster

You can refer to this page for details on transferring data to the ScienceCluster, especially when sharing data amongst colleagues. There is also a relevant example in the ScienceCluster training materials.