# Command Line Training

----

To make the most out of computing infrastructures like the [ScienceCloud](https://www.zi.uzh.ch/en/teaching-and-research/science-it/computing/sciencecloud.html), [ScienceCluster](https://www.zi.uzh.ch/en/teaching-and-research/science-it/computing/sciencecluster.html), and the [Supercomputer - Alps](https://www.zi.uzh.ch/en/teaching-and-research/science-it/computing/supercomputer.html), you may find it helpful to learn to use the Command Line Interface (CLI) for scientific computing. All our computing services run on open source, Linux-based operating systems. You may have heard of [Ubuntu](https://ubuntu.com/desktop), a popular Linux distribution that also powers massive supercomputers. See [Context](#context) below for more info about open source and scientific computing.

---

These training materials should help you:

- Learn the fundamentals of the command line and how it relates to your research computing workflows.
- Get to know the structure of the Linux filesystem.
- Become acquainted with the basic commands and syntax of a shell programming language (which will transfer to multiple shells and operating systems).

!!! important "Bash"
    **Please note:** although many of these concepts apply across shell languages, the provided training materials use the Bash shell language. Using a different shell on your personal computer may result in slight variations from the examples. For the best experience, consider taking the [Command Line training workshop](https://zi-training.zi.uzh.ch/en/page/scientific-computing/science-it-linux-command-line).

!!! note "Getting Started Independently"
    If you would like to begin practicing on a command line immediately, consider publicly available sites that offer example CLIs: [WebVM](https://webvm.io/) (which allows user access to a sandboxed CLI) and [`container2wasm`](https://ktock.github.io/container2wasm-demo/amd64-debian-wasi.html) (which offers full `root` access to a sandboxed CLI).

---

## How do I work through these materials?

The **terminal** application you use to access a command line on your local computer depends on your operating system; here are the default options:

- **MacOS** and **Linux**: use the **Terminal** application
- **Windows**: use **PowerShell**, or consider [WSL](https://learn.microsoft.com/en-us/windows/wsl/) or [Multipass](https://documentation.ubuntu.com/multipass/stable/) - both will give you the ability to install an Ubuntu Linux virtual machine.

Either take the [training workshop](https://zi-training.zi.uzh.ch/en/page/scientific-computing/science-it-linux-command-line) to gain access to an Ubuntu Linux virtual machine, or use a **terminal** application on your local machine.

From these terminal applications you can either use your operating system's default shell (which may result in slightly different outputs than shown in these examples), or you can use the `ssh` command/program to connect to a machine that uses Bash (the shell language used to develop these materials).

For the course, as well as whenever you use the [ScienceCloud](https://docs.s3it.uzh.ch/training/cloud#4-login-to-your-instance) and the [ScienceCluster](https://docs.s3it.uzh.ch/cluster/quickstart#connecting-to-the-cluster), you will [use `ssh` to connect to a virtual machine](https://docs.s3it.uzh.ch/general/ssh_keys) (VM).

---

## What is the "command line"?

The "command line" interface (often abbreviated "CLI") is a system that allows users to interact with a computer using typed commands. Learning how to use a CLI will not only give you greater [skills with computers](https://docs.s3it.uzh.ch/training/commandline#open-source-operating-systems), it will allow you to customize your research workflows so that you can make optimal use of the most powerful [computing infrastructures](https://www.zi.uzh.ch/en/teaching-and-research/science-it/computing.html).

---

## Filesystems

---

### What is a "filesystem"?

At an abstract level, one could model a computer as a machine that necessarily includes:

- Datasets and a system for storing such data
- Programs and applications that both run the computer system itself and manipulate the available data

All data for a computer (i.e., datasets, user software, operating system software, etc.) is stored within what is called a **filesystem**. It is the filesystem that dictates how data is structured on any storage device (e.g., a hard drive, a USB stick, etc.).

---

Importantly, there are multiple types of filesystems, and not all filesystems are compatible with all computer operating systems. Examples of filesystems include:

- **vfat**: an older filesystem used by MS-DOS
- **ntfs**: the default filesystem for Windows
- **ext4**: the default filesystem in most GNU/Linux distributions; used for [ScienceCloud volumes](https://docs.s3it.uzh.ch/training/cloud#format-and-mount-the-volume)
- **apfs**: the MacOS filesystem

---

### Structure

Although not all filesystems are identical, many of them share a similar **hierarchical tree** structure. In a hierarchical tree filesystem everything starts from the **root directory**, which is represented in Bash and other command line languages as `/`.

!!! danger "⚠️ The `/` Character"
    The `/` character alone represents *the entire* root directory and *all* its subdirectories. If a command acts or operates on the `/` symbol, especially recursively, then it will affect *the entire* filesystem.

---

Here's a diagram of a sample filesystem:

```text
/
├─ bin/
│  └─ ...
├─ home/
│  ├─ user/
│  │  ├─ Documents/       ← this is an example directory!
│  │  │  └─ example.txt   ← this is an example file!
│  │  └─ Pictures/
│  │     └─ photo.png
│  └─ second_user/
│     └─ ...
├─ sbin/
│  └─ ...
├─ var/
│  └─ ...
└─ .../
```

---

Within a hierarchical filesystem, a **directory** is a "branch" on the hierarchical tree. When using a GUI to control a computer's files, directories are commonly represented as folders.
Thus, files can be thought of as being located at (or within) a specific directory (just as files can be considered as being within folders on a graphical desktop). Directories themselves can have directories within or under them, which are called **subdirectories**.

---

Some familiar directories/locations you will see in many filesystems are:

- `bin/` : includes user command binaries ("bin" is short for "binaries")
- `home/` : includes all `user` home directories for the system
- `sbin/` : includes essential system command binaries
- `var/` : includes "variable" data, i.e., files whose content changes while the system runs (e.g., logs, temporary files, etc.)

There are many other directories/locations you'll find across operating systems. It's important to remember: not all locations in the filesystem are safe to freely alter. Changing files in certain locations can lead to operating system failure or corruption.

---

#### Dotfiles

In order to help keep filesystems as accident-proof as possible, filesystems make use of **dotfiles**. A dotfile is exactly what the name states: a file (or a directory) whose name begins with a `.` character. Dotfiles are hidden by default; they are only displayed if you take a specific action (e.g., use the `-a` flag with the `ls` command).

!!! note "Dot Directories"
    Directories can also start with `.` (dot directories). As with dotfiles, they are hidden by default. Otherwise, they act like and can be treated like standard directories.

---

### Paths

As noted, a **directory** is a "branch" within a filesystem where files (or other subdirectories) can be located; i.e., a directory is a **location** in a filesystem. To refer to any location (i.e., directory or file) within a filesystem, a **path** to the location of interest is used.

There are two types of paths:

- **Absolute** paths: include the entire location of a directory or file starting from the root directory; absolute paths always start with `/`
- **Relative** paths: include the location of a directory or file in relation (i.e., *relative*) to the user's current location in the filesystem (see [below](https://docs.s3it.uzh.ch/training/commandline#beginning-on-the-cli))

---

From the sample filesystem above, an example of an absolute path to a file is:

```
/home/user/Documents/example.txt
```

To reiterate: all absolute paths start with `/`. The same `/` character is also used in paths (both absolute and relative) to distinguish between depths or levels of the hierarchical tree.

---

An example of a relative path is:

```
Documents/example.txt
```

In contrast to absolute paths, relative paths *never begin* with `/`. They describe the path to a file or directory with reference to your **current working directory**, which is the current location of your session within the filesystem. The current working directory in the `Documents/example.txt` example is (with reference to the [sample filesystem diagram](https://docs.s3it.uzh.ch/training/commandline#structure)) the `/home/user/` directory.

How do you know your current location? The [command prompt](https://docs.s3it.uzh.ch/training/commandline#command-prompt) tells you, or you can use the `pwd` command.

> Further info:
>
> Paths to files and directories are formatted identically, though some programmers prefer to write directory paths with a trailing `/` character.
>
> In most cases, including or omitting the final `/` character is equivalent. However, some command line tools will interpret a path with a trailing `/` character differently (e.g., [rsync](https://docs.s3it.uzh.ch/general/scp_and_rsync#rsync)).

---

## Beginning on the CLI

---

### Command Prompt

When you arrive at a CLI you see what is called the **command prompt**.
It is designed to help communicate **who** and **where** you are on a system. It often looks something like this:

```
username@hostname:~$
```

The values `username` and `hostname` are used in this example because they are two of the principal values that make up the command prompt.

---

Piece by piece, the example command prompt includes:

- the `username` is your current authenticated username on the computer
- the `hostname` is the name of the computer to which your command line session is connected
- the `@` character connects the `username` with the `hostname`
- the `:` separates the `username@hostname` information from the displayed location *within* the computer's filesystem
- the `~` is the special symbol used to denote the `home` directory for the user; this is often the default location when starting a command line session on a machine
    - the `~` symbol will change to show path locations as you navigate through a filesystem (e.g., with `cd`)
    - in other words, this part of the command prompt shows *your current location within the filesystem*
- the `$` denotes the end of the command prompt; your typed commands will come afterwards

> Further info:
>
> The specifics of your command prompt may vary according to your operating system.

---

### What is a "shell"?

When you type commands at a command prompt, what exactly happens to them?

To answer this question, it's necessary to understand that a computer's **operating system** is the entirety of the software (sometimes called the "software stack") that makes the computer functional.
Within the operating system exists a variety of software types, including:

- a "kernel": the software that directly controls hardware processes (e.g., memory management, process scheduling, etc.); one of the most commonly encountered kernels is [Linux](https://docs.s3it.uzh.ch/training/commandline#context)
- system libraries and utilities: collections of code and programs that allow installed applications to interact with the hardware via the kernel; these include the command line programs [mentioned below](https://docs.s3it.uzh.ch/training/commandline#commands) (e.g., `ls`, `cp`, etc.)
- user space programs: the software that the user can customize then utilize for their tasks

---

The **shell** is one of these user space programs. It's the specific software that interprets your commands then executes them. There are a variety of shells used across operating systems:

- **Bash**: the default shell for many [Linux distributions](https://docs.s3it.uzh.ch/training/commandline#open-source-operating-systems)
    - [ScienceCloud](https://docs.s3it.uzh.ch/cloud2/overview) and [ScienceCluster](https://docs.s3it.uzh.ch/cluster/overview) users will use Bash
- **Zsh**: the default shell for MacOS, but can also be used in Linux
- **PowerShell**: the default shell for Windows

Encouragingly, these shells share a broadly similar command [syntax](https://docs.s3it.uzh.ch/training/commandline#syntax), meaning the skills involved in using one shell language will translate to other shells (and [operating systems](https://docs.s3it.uzh.ch/training/commandline#open-source-operating-systems)).

---

## Syntax

---

### Command Structure

The basic structure of a shell command is as follows:

```bash
command [-optional_arguments] required_arguments
```

- the `command` is the specific command you're using (e.g., `ls`, `cd`)
- spaces are used to separate commands and arguments
- the `[-optional_arguments]` are inputted via **flags**; as they are *optional*, they are never strictly required
- the `required_arguments` are the specific inputs to the `command` you're using, often paths to files or directories

If the required arguments are omitted, the command will either fail or fall back to a default value. Accordingly, it's best to familiarize yourself, at least a little, with every command you run.

---

#### Flags

The `[-optional_arguments]` of a command are inputted via **flags**. Flags come in 2 types, **short** and **long**:

- **Short flags** use single letters, a single hyphen `-`, and can be combined; for example:
    - `ls -alh` is the same as `ls -a -l -h` (and any combination of the single letter flags).
    - order (generally) does not matter, and you can always check the documentation of any specific command to confirm
- **Long flags** use full words, double hyphens, and must be written individually; for example:
    - `ls --all --human-readable -l` (which is the same as `ls -ahl`)

---

### Special Symbols

There are a number of special characters in most shell languages (including Bash) that reduce how much you need to type. Here is a selection of them:

- `/`: the symbol for the root directory of the filesystem and the delimiter between directories and subdirectories (i.e., depths of the filesystem tree)
- `~`: an abbreviation for the home directory (i.e., shorthand for the path to the `home` directory for the user)
- `.`: refers to the current working directory; can be used in the same way as a file path
- `..`: refers to the parent of the current working directory; can be used in the same way as a file path
- `|`: called the pipe character, it forwards the textual output from one command directly into another command as input; e.g., into the `grep` command
- `>`: called the redirection operator, it "redirects" the textual output of a command to write to a new text file or overwrite the existing text file (or value)
    - ⚠️ use the `>` character carefully as it will overwrite existing files/values by default!
- `>>`: a variation on the redirection operator that *appends* textual output of a command to a text file (rather than overwriting)

---

## Fundamentals

---

### File Permissions

Before operating any commands on files in a filesystem, it's first helpful to understand **permissions**. Permissions are the concept in an operating system and filesystem that allow multi-user functionality in a safe, secure, and accident-reduced way.

Depending on your permissions, you (as a `user`) may or may not be able to:

- **read** a file/directory
- **write** to (i.e., change) a file/directory
- **execute** (i.e., run) a file

---

**File permissions** are structured so that multiple users on the same machine can have a unified, accident-protected, and secure way to manage their files.

The easiest way to see file permissions (in your current working directory) is to run [`ls -l`](https://docs.s3it.uzh.ch/training/commandline#metadata). The output should resemble the following (fabricated example):

```
drwxrwxrwx 2 user        group        4096 Jan 01 00:00 Documents
-rwxr--r-- 1 second_user second_group 4096 Jan 01 00:00 example.txt
```

---

The first 10 characters of each line share a common format:

- the first character will be a `d` for a directory or `-` for a regular file
- the next 9 characters are separated into 3 sets of 3 characters; each set of characters is identical in format, defining read (`r`), write (`w`), and execute (`x`) permissions for:
    - `user`, `group`, and `other`
    - a value of `-` means that specific permission is **not** assigned; a value of `r`, `w`, or `x` indicates the specific permission is assigned
- the first column of numbers (`2` and `1`) is the number of hard links to the entry (for a directory, this count grows with the number of subdirectories it contains)
- the next 2 columns denote the `user` and `group` assignment for the file/directory
    - the assigned `user` has read, write, execute permissions defined via the first series of 3 characters
    - the assigned `group` has read, write, execute permissions defined in the second series of 3 characters
    - users on the machine that are neither the assigned `user` nor a member of the entry's assigned `group` have permissions defined in the third series of 3 characters (i.e., `other`)

---

This text-based diagram may be helpful:

```
1 2 3 4 5 6 7 8 9 10
| | | | | | | | | |
- r w x r - - r - -
^ ^ ^ ^ ^ ^ ^ ^ ^ ^
| | | | | | | | | +--- `other` execute (-)
| | | | | | | | +----- `other` write (-)
| | | | | | | +------- `other` read (r)
| | | | | | +--------- `group` execute (-)
| | | | | +----------- `group` write (-)
| | | | +------------- `group` read (r)
| | | +--------------- `user` execute (x)
| | +----------------- `user` write (w)
| +------------------- `user` read (r)
+--------------------- entry type: d = directory
                                   - = regular file
```

---

The principal [commands to edit permissions](https://docs.s3it.uzh.ch/training/commandline#permissions) and ownership values are `chmod` and `chown`.

The special command `sudo` can be prepended to any other command to "elevate" it so it's treated as having been run by the special `root` user. The `root` user is a default user written into the operating system that has complete control over all aspects of a filesystem.

---

!!! important "`sudo` Access"
    Due to the security issues and accident-potential associated with `sudo` and `root` permissions, only specific systems from Science IT allow `sudo` access. Please plan your workflow accordingly:

    - ScienceCloud VMs, launched and managed by a user, come equipped with root access by default (secured by default using [SSH keys](https://docs.s3it.uzh.ch/general/ssh_keys)).
    - ScienceCluster and the Alps System do not allow users `sudo` and `root` permissions.
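
As a minimal sketch of how the permission string responds to `chmod`, the following creates a throwaway file and changes its permissions. The temporary directory and the variable names (`workdir`, `perms`, `perms_after`) are assumptions made purely for this illustration.

```bash
# Work in a throwaway directory (hypothetical location) so no real files are touched.
workdir=$(mktemp -d)
touch "$workdir/example.txt"

# Set the permissions explicitly: rwx for user, r-- for group, r-- for other.
chmod 744 "$workdir/example.txt"

# The first 10 characters of the `ls -l` output are the permission string.
perms=$(ls -l "$workdir/example.txt" | cut -c1-10)
echo "$perms"          # -rwxr--r--

# Remove execute for the user and add write for the group.
chmod u-x,g+w "$workdir/example.txt"
perms_after=$(ls -l "$workdir/example.txt" | cut -c1-10)
echo "$perms_after"    # -rw-rw-r--
```

The first `echo` reproduces the permission string from the diagram above; the second shows that only the two targeted positions changed.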

---

### File Types

When operating on a command line it's helpful to categorize files into 2 types:

- **Binary** files: require a specific program/application to be used or read; e.g., `.mp3`, `.pdf`, `.doc`
- **Text** files: as the name states, they contain plain text and can be edited interactively

To confirm a file's type, use the `file` command.

To open a binary file, pass it to the command of the program it requires; for example:

```
libreoffice example_libreoffice_file.odt
```

You as a user will need to ensure you select the correct program/application command for the target binary file.

---

### Editing Text Files

There are several ways to edit text directly from the command line. Some popular full-terminal text editors include:

- `nano` : the default editor on most GNU/Linux systems; beginner-friendly and easy to use
- `pico` : similar to nano but more lightweight
- `vi` : a powerful and efficient UNIX editor; has a steeper learning curve that may be challenging for beginners

---

For beginners, it's helpful to know how to start and stop `nano`:

- To start `nano`, simply execute the command `nano` and your terminal application will move to the `nano` interface, creating a blank document
    - To edit a specific text file with `nano`, run `nano filename`
- You can freely type with your cursor in this interface as well as paste text copied from your local computer
- When you are finished editing you can exit:
    - Press `control + X` to initiate the exit procedure
    - When asked `Save modified buffer?` type `y` to confirm that you want to save the changes (or `n` to exit without saving)
    - When prompted for the `File name: ...` either update the file name or press `enter` to confirm the inputted file name

---

## Commands

While there are innumerable commands on any command line, here are commands (with useful flags as noted) to consider for research computing:

!!! note "The `-h` / `--help` flag"
    For many, but not all, commands the `-h`/`--help` flag is conventionally used to display the help dialogue for a command.

---

### Metadata

- `man`: opens the manual for a command; i.e., it's used on other commands; e.g., `man man`
- `ls`: lists the content of a directory; common flags: `-a`, `-l`, `-h`
- `lsblk`: lists the storage devices on the system
- `df`: displays usage of the storage devices; common flags: `-h`
- `ps`: displays information about running processes; common flags: `aux`, `-ef`
- `top` and `htop`: used for monitoring and benchmarking

---

### Viewing Files

- `file`: confirm a file type
- `cat`: print the content of a text file
- `echo`: prints a character string of interest
    - `echo $USER`: prints your username, where `$USER` is an environment variable storing the name of the currently logged-in user
- `less`: open a text file to read it in your terminal; type `q` to exit
- `tail` and `head`: print the end/beginning of a file, respectively; common flags: `-n number` (the number of lines to print)
- `grep`: stands for "[global regular expression print](https://en.wikipedia.org/wiki/Regular_expression)"
    - used to find specific character strings within text, `grep pattern file`
    - often being fed data via the `|` operator: `ps aux | grep ssh`, to list all processes containing "ssh"

---

### Filesystem Navigation

- `pwd`: prints the current working directory
- `cd`: changes the current directory to a directory of your choosing
    - `cd -`: brings you to the previous directory you were in
    - `cd ..`: moves you one directory level up from your current location
    - `cd ~`: always brings you to `$HOME`

---

### Moving and Copying

- `cp`: copy files and directories
    - usage: `cp [options] source destination`
    - common flags: `-r` for recursive (i.e., apply to a directory and its contents)
- `mv`: moves files from one location to another; also used for renaming
    - usage: `mv [options] source destination`
    - `mv` is always recursive!
    - `mv -i`: prompt before overwrite
- `mkdir` and `rmdir` : make a directory and remove an empty directory, respectively
- `rm` : remove files and directories
    - `rm -i`: prompt before each removal, giving you a chance to confirm
    - common flags: `-r` for recursive (i.e., apply to a directory and its contents)
    - ⚠️ the `rm` command *does not* move files to a trash bin or temporary location, it immediately removes them; use with caution

---

### File Transfer

See [our documentation](https://docs.s3it.uzh.ch/general/scp_and_rsync) on `scp` and `rsync`.

To make a "clone" of a remote `git` repository, you can use:

```bash
git clone git@gitlab.uzh.ch:project/folder.git
```

---

### Permissions

- `chmod` : change the permissions of files and directories
    - usage: `chmod [options] mode file`
    - common modes: `[ugo]±r`, `[ugo]±w`, `[ugo]±x`
    - e.g., `chmod u+x,o-w file` adds `execute` permission for the `user` and removes `write` permission for `other`
    - alternatively, the tables below show how to compose the `chmod` numeric ([octal](https://en.wikipedia.org/wiki/Octal)) format for changing file permissions
    - the syntax for the example in the first table would be `chmod 754 file`

```
+----------+--------+--------+--------+--------------+----------+
| Class    | r (4)  | w (2)  | x (1)  | Total (sum)  | Symbolic |
+----------+--------+--------+--------+--------------+----------+
| Owner    |   4    |   2    |   1    | 4+2+1 = 7    | rwx      |
| Group    |   4    |   0    |   1    | 4+0+1 = 5    | r-x      |
| Others   |   4    |   0    |   0    | 4+0+0 = 4    | r--      |
+----------+--------+--------+--------+--------------+----------+
```

```
+-------+----------+------------------+
| Octal | Symbolic | Meaning          |
+-------+----------+------------------+
|   0   | ---      | none             |
|   1   | --x      | exec             |
|   2   | -w-      | write            |
|   3   | -wx      | write + exec     |
|   4   | r--      | read             |
|   5   | r-x      | read + exec      |
|   6   | rw-      | read + write     |
|   7   | rwx      | all              |
+-------+----------+------------------+
```

---

- `chown` : change ownership of files and directories
    - usage: `chown user:group file`
- `sudo` : prepended to a command to execute it as the superuser (i.e., "superuser do")
    - requires the current user to have `sudo`/`root` permissions
    - requires authentication

---

### Installing Software

- `apt` is the default package (i.e., software) manager on Ubuntu
    - make sure to run `apt update` (or `sudo apt update`) before running any `apt install` commands
    - `apt install` installs software system-wide on the local computer and therefore requires `sudo`/`root` permissions; this may work well for private/lab computers or [ScienceCloud VMs](https://docs.s3it.uzh.ch/training/cloud#5-install-packages-on-a-debianubuntu-instance), but consider other methods for software installation
    - alternative methods to consider are [`uv`](https://docs.s3it.uzh.ch/general/uv), [`conda` / `miniforge` / `mamba`](https://docs.s3it.uzh.ch/general/conda), and **most importantly** [containers](https://docs.s3it.uzh.ch/general/singularity_tutorial)

---

### Connecting to Remote Computers

- `ssh`: stands for "secure shell" and is the principal tool used to establish secure connections to remote machines

See [our documentation](https://docs.s3it.uzh.ch/general/ssh_keys) on `ssh`, `ssh` key generation, and more.

---

## Context

---

### Open Source Operating Systems

By working with virtual machines on the command line for scientific research, you will by default be exposed to an entire open-source operating system. Very often it will be a distribution of [Linux](https://en.wikipedia.org/wiki/Linux) called [Ubuntu](https://ubuntu.com/), but there are many variants (e.g., [Debian](https://www.debian.org/), [Fedora](https://www.fedoraproject.org/), [Arch](https://archlinux.org/)).

!!! note "'Distributions'"
    A "distribution" of Linux means a version of an operating system based on the Linux [kernel](https://docs.s3it.uzh.ch/training/commandline#what-is-a-shell). All Linux distributions share the same kernel but differ in other parts of the software stack. At Science IT, the recommended and default version is Ubuntu.
    It is widely considered one of the most user-friendly distributions, especially for beginners.

---

Open-source operating systems (and communities) like these form the basis of large scale scientific (and non-scientific) computing. As a researcher via the command line you can, for example, [install software](https://docs.s3it.uzh.ch/training/cloud#5-install-packages-on-a-debianubuntu-instance) to customize your runtime environment then share your software stack setup with other researchers so they can replicate your work on their own computing hardware. Moreover, by using open-source operating systems and software, researchers support the UZH's commitment to [Open Science](https://www.openscience.uzh.ch/en.html).

---

### Scripting

Here's an example of a for-loop in Bash, which prints the squares of the integers from 1 to 10.

```bash
for i in $(seq 1 10); do echo $((i*i)); done
```

Once you can run commands one at a time, the next step is to write shell scripts: small files that tell the computer to run those commands automatically.

First, create the file you want to run, in this case called `squares.sh`. To make it complete, start it with a "shebang line", which is a series of characters that always goes on the 1st line of the script and tells the system which program (i.e., binary) should be used to run it. An example shebang line for `bash` is:

```bash
#!/bin/bash
```

---

Then add your specific code of interest:

```bash
echo 'for i in $(seq 1 10); do echo $((i*i)); done' >> squares.sh
```

Meaning the full script would be:

```bash
#!/bin/bash
for i in $(seq 1 10); do echo $((i*i)); done
```

---

Then, run it:

```bash
bash squares.sh
```

or, because you have added a shebang line, make the script executable and run it directly by its path:

```bash
chmod u+x squares.sh
./squares.sh
```

Shell scripts form the basis for extending your control of a computer so the machine acts according to your instructions without requiring your presence. In other words, they let you automate computers to run workflows for you. Of particular note, shell scripts (Bash) are how users [submit jobs in cluster environments](https://docs.s3it.uzh.ch/training/cluster#submission-script).
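
The steps in this section can be sketched end to end as follows. This is only an illustration under stated assumptions: the temporary directory, the output file name `squares.txt`, and the variable names (`workdir`, `script`, `first`, `last`) are hypothetical, and the script is marked executable with `chmod u+x` so that it can be started by its path.

```bash
# Work in a throwaway directory (hypothetical location for this sketch).
workdir=$(mktemp -d)
script="$workdir/squares.sh"

# Write the script: the shebang goes on line 1, the loop on line 2.
printf '%s\n' '#!/bin/bash' 'for i in $(seq 1 10); do echo $((i*i)); done' > "$script"

# Give the user execute permission, then run the script by its path,
# redirecting its output into a text file.
chmod u+x "$script"
"$script" > "$workdir/squares.txt"

# Inspect the result with commands from this page.
first=$(head -n 1 "$workdir/squares.txt")   # first line of the output: 1
last=$(tail -n 1 "$workdir/squares.txt")    # last line of the output: 100
echo "first=$first last=$last"
```

Redirecting with `>` keeps the output in a file, which is a common pattern when a script runs unattended.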