How to use Conda environments on the ScienceCluster
Conda is one of the software environment management tools offered on the ScienceCluster. If you need a suite of tools in a specific software language (e.g., R or Python), Conda can help you manage this environment for both portability to other systems and reproducibility.
The example script for Python shows the basics of using a Conda environment, with parallel steps outlined below.
.condarc file in your home directory
Because Conda writes many files of various sizes when you create environments and install packages, it's best to locate all of your Conda environments within the /data/$USER directory of the file system.
To do so, write a .condarc file to your home directory before you create your first environment. This file tells Conda where to place all of your environments and their packages when they are created.
Users are currently equipped with a default .condarc in their home directory. You can confirm the contents of this file by running cat ~/.condarc. If you don't see any output when running this command, continue as directed below to create a new .condarc file.
Use the following command to create a new file, or to overwrite an existing file:
cat << EOF > /home/$USER/.condarc
# These two flags determine the default location of conda environments and pkgs
envs_dirs:
  - /data/$USER/conda/envs
pkgs_dirs:
  - /data/$USER/conda/pkgs
EOF
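To confirm that the file was written as intended, a quick check along these lines may help. This is a minimal sketch: check_condarc is a hypothetical helper name, not a ScienceCluster or Conda command, and the conda config --show call only runs if conda is already on your path (for example, after loading the Anaconda module).

```shell
# Hypothetical helper: report whether a .condarc defines both directory settings.
check_condarc() {
    file="${1:-$HOME/.condarc}"
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file"
        return 1
    fi
    if grep -q '^envs_dirs:' "$file" && grep -q '^pkgs_dirs:' "$file"; then
        echo "ok"
    else
        echo "incomplete: $file"
        return 1
    fi
}

# Check the file in your home directory; do not abort the shell if it is missing.
check_condarc "$HOME/.condarc" || true

# If conda is available, ask it which directories it will actually use.
if command -v conda >/dev/null 2>&1; then
    conda config --show envs_dirs pkgs_dirs
fi
```

If the helper prints anything other than ok, rewriting the file with the command above is the simplest fix.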
Create your environment
Before beginning, load the Anaconda module in the cluster using:
module load anaconda3
Then, following the example script for Python, the command used to create a Conda environment is:
conda create --name myenv python=3.10
⚠️ If you do not specify a version of Python when you create the environment, the system's default Python will be used. If you install additional packages using Conda, a newer version of Python may be installed and then made default in the environment.
Make sure to provide a suitable name for each environment so that you can keep track of them. To activate this newly created and empty environment, use:
source activate myenv
⚠️ Conda will direct you to use conda activate. Make sure to use source activate instead.
If activation succeeds, you will see the name of the environment prepended to your command line prompt, for example as (myenv). The login node shown in your prompt may be login0 or another login node number, such as login1; this is expected behavior.
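Besides reading the prompt, you can check the active environment from the shell itself. The sketch below relies on the CONDA_DEFAULT_ENV variable that Conda sets on activation; active_env is a hypothetical helper name introduced only for illustration.

```shell
# Hypothetical helper: print the active Conda environment, or "none" if nothing is active.
active_env() {
    echo "${CONDA_DEFAULT_ENV:-none}"
}

active_env

# Show which python the shell resolves; inside an activated environment
# this path should point into that environment's directory.
command -v python || echo "python not found on PATH"
```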
Once you have your Conda environment loaded, you can proceed with installing packages of interest.
conda install numpy
Always try to use conda install first when attempting to install packages and/or software. This command will search through Conda channels for the package(s) of interest. You can specify additional Conda channels using the -c flag; for more details on this flag, see the conda install documentation.
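As an illustration of the -c flag, the sketch below wraps conda install in a small function that takes a channel and a package. The function name install_from_channel is hypothetical, and conda-forge and scipy are used only as familiar examples; substitute the channel and package you actually need.

```shell
# Hypothetical wrapper: install one package from a named channel.
# Usage: install_from_channel <channel> <package>
install_from_channel() {
    if [ "$#" -ne 2 ]; then
        echo "usage: install_from_channel <channel> <package>" >&2
        return 2
    fi
    # -c adds the given channel to the ones searched for this install
    conda install -c "$1" "$2"
}

# Example (run inside an activated environment):
#   install_from_channel conda-forge scipy
```

Typing conda install -c conda-forge scipy directly is equivalent; the wrapper only makes the argument order explicit.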
Once your packages are installed in your environment, and the environment is activated, you'll be able to access those packages on the cluster. As such, all workflows depending on these packages will need to have the Conda environment activated at runtime. See the example script for Python for a demonstration of such a workflow.
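A job script that activates the environment at runtime might look like the following sketch. It assumes jobs are submitted with Slurm's sbatch, which this page does not state explicitly; the file name conda_job.sh, the job name, and the resource values are illustrative placeholders rather than recommended settings.

```shell
# Write a minimal job script that activates the environment before running Python.
# The quoted 'EOF' keeps the shell from expanding anything inside the script body.
cat << 'EOF' > conda_job.sh
#!/usr/bin/env bash
#SBATCH --job-name=conda_example
#SBATCH --time=00:10:00
#SBATCH --mem=1G
#SBATCH --output=conda_example_%j.out

module load anaconda3
source activate myenv

# Any command that depends on packages in the environment goes here.
python -c "import numpy; print(numpy.__version__)"
EOF

# Submit it from a login node with:
#   sbatch conda_job.sh
```

The key point is that module load and source activate appear inside the script, so the environment is active on the compute node and not only in your login shell.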
To deactivate your Conda environment and go back to the default ScienceCluster system environment, use:
If your Conda software environment requires the use of packages that can only be installed using pip, consider these best practices when doing so.
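One commonly suggested pattern, offered here as an assumption and not as ScienceCluster policy, is to install pip into the environment with Conda and then call it through the environment's own Python, so that packages land inside the environment. The helper pip_in_env below is hypothetical; it simply refuses to run when no environment is active.

```shell
# Hypothetical guard: refuse to run pip unless a Conda environment is active.
pip_in_env() {
    if [ -z "${CONDA_DEFAULT_ENV:-}" ]; then
        echo "no Conda environment is active; run 'source activate myenv' first" >&2
        return 1
    fi
    # python -m pip runs the pip module of whichever python is first on PATH
    python -m pip install "$@"
}

# Example, inside an activated environment:
#   conda install pip
#   pip_in_env <package-name>
```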
Mamba is a package manager that is fully compatible with Conda but performs faster than Conda on certain tasks. You can use Mamba instead of Conda by loading the mamba module and using the mamba command in place of conda.
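As a rough sketch of that substitution, the environment creation and installation steps from earlier on this page would use the same subcommands with mamba. The helper pick_manager is hypothetical and only reports which tool is currently on your path.

```shell
# Hypothetical helper: prefer mamba when it is on PATH, otherwise fall back to conda.
pick_manager() {
    if command -v mamba >/dev/null 2>&1; then
        echo mamba
    elif command -v conda >/dev/null 2>&1; then
        echo conda
    else
        echo none
    fi
}

# On the cluster, after `module load mamba`, the same subcommands apply:
#   mamba create --name myenv python=3.10
#   mamba install numpy
echo "package manager available: $(pick_manager)"
```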