How do I install my Python package?

PACE has managed a centralized Python install for quite a while, but differing versions of Python and package dependencies have made this an increasingly difficult task. Here, we show you how to take complete control over which packages are used across the many different compute nodes in our environment.

We maintain Anaconda, a free and open source Python package management system. We have an academic license that allows the optimized libraries to be used on multiple nodes. Our license covers clusters; each user can obtain a free personal academic license through Anaconda Cloud. We provide basic Python libraries through an environment named 'python-<version>'.

This environment can be cloned to your data directory and changed, providing complete control over what packages are added or removed. To install your own python modules:

  1. Create a symlink to your data directory so Python environments don't fill your home directory:
    1. cd ~/
    2. mkdir ~/data/.conda
    3. ln -s ~/data/.conda .conda
  2. Load anaconda:
    1. python 2: module load anaconda2/latest
    2. python 3: module load anaconda3/latest
  3. List all available environments. The * denotes the environment currently activated.
    1. conda env list

      # conda environments:
      #
      python-3.4               /usr/local/pacerepov1/anaconda/anaconda3/latest/envs/python-3.4
      tiny-20170608            /usr/local/pacerepov1/anaconda/anaconda3/latest/envs/tiny-20170608
      tiny-20171117            /usr/local/pacerepov1/anaconda/anaconda3/latest/envs/tiny-20171117
      tiny-20181119            /usr/local/pacerepov1/anaconda/anaconda3/latest/envs/tiny-20181119
      root                  *  /usr/local/pacerepov1/anaconda/anaconda3/latest

  4. Clone the latest environment. In this example, tiny-20181119 is the latest version, so we use that. We also give the clone a name (my-env is used in this example, but it can be any single word).
    1. conda create --clone tiny-20181119 --name my-env
  5. List available environments to check that your new environment is available. Note that the file path for your environment is different: you have write access to that location, so you can add new Python packages there:
    1. conda env list
      # conda environments:
      #
      my-env                  <your home directory>/.conda/envs/my-env
      python-3.4               /usr/local/pacerepov1/anaconda/anaconda3/latest/envs/python-3.4
      tiny-20170608            /usr/local/pacerepov1/anaconda/anaconda3/latest/envs/tiny-20170608
      tiny-20171117            /usr/local/pacerepov1/anaconda/anaconda3/latest/envs/tiny-20171117
      tiny-20181119            /usr/local/pacerepov1/anaconda/anaconda3/latest/envs/tiny-20181119
      root                  *  /usr/local/pacerepov1/anaconda/anaconda3/latest
  6. Activate your cloned environment. my-env is used in this example; the name just has to match what was listed in the previous step:
    1. source activate my-env  (if you use the tcsh shell, run "bash" before this step)
  7. Install whatever packages you need. Dependencies of packages from Anaconda Cloud are resolved automatically:
    1. e.g. for nltk:
      1. conda install nltk
    2. Some packages require a channel to be specified via '-c', e.g. for HTSeq:
      1. conda install -c bcbio htseq
  8. Run your Python! To access your environment in the future, load the anaconda2/latest or anaconda3/latest module and run:
    1. source activate my-env  (if you use the tcsh shell, run "bash" before this step)
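
Steps 1, 4, and 5 above can be wrapped in two small bash functions so they are safe to re-run. This is a minimal sketch, not a PACE-provided tool: the function names link_conda_dir and clone_env, and the optional home-directory argument, are our own illustrative choices. It assumes your data directory already exists at ~/data.

```shell
#!/bin/bash

# Step 1: point ~/.conda at the data directory so environments
# do not fill the home directory. Safe to run more than once.
# The optional argument (hypothetical) overrides $HOME for a dry run.
link_conda_dir() {
    local home_dir="${1:-$HOME}"
    local target="$home_dir/data/.conda"
    local link="$home_dir/.conda"

    if [ ! -d "$home_dir/data" ]; then
        echo "no data directory at $home_dir/data" >&2
        return 1
    fi
    mkdir -p "$target" || return 1

    if [ -L "$link" ]; then
        return 0    # symlink already in place, nothing to do
    elif [ -e "$link" ]; then
        echo "$link exists and is not a symlink; move it aside first" >&2
        return 1
    fi
    ln -s "$target" "$link"
}

# Steps 4-5: clone a base environment, then confirm that the new
# name appears at the start of a line in the environment listing.
clone_env() {
    local base="$1"
    local name="$2"
    conda create --clone "$base" --name "$name" || return 1
    conda env list | grep -q "^$name[[:space:]]"
}
```

After loading the anaconda module, a typical call would be: link_conda_dir && clone_env tiny-20181119 my-env. Activating the environment (step 6) is left out of the functions on purpose, since it has to happen in your interactive shell.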

After running python, you can deactivate the current environment to switch between conda environments:

  1. source deactivate

Finally, environments count against your file quotas, so it's a good idea to remove them when no longer needed:

  1. conda remove -n my-env --all
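
Before removing anything, it can help to see how much space each environment takes. The helper below is a small sketch; the function name env_sizes is our own, and the default path assumes your environments live under ~/.conda/envs as in the listing above.

```shell
# Print the disk usage of each environment in a conda envs directory.
# The optional argument overrides the default location.
env_sizes() {
    local envs_dir="${1:-$HOME/.conda/envs}"
    if [ ! -d "$envs_dir" ]; then
        echo "no such directory: $envs_dir" >&2
        return 1
    fi
    # -s: one total per environment, -h: human-readable units
    du -sh "$envs_dir"/*/ 2>/dev/null
}
```

Run env_sizes with no arguments to list your own environments, then remove the ones you no longer need with the conda remove command shown above.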

Tips for tcsh shell users running anaconda in a PBS script:

Add

#!/bin/bash

as the first line of the script, and add

if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

right after all #PBS lines.

These changes allow anaconda virtual environments to work in a PBS script even if your shell is tcsh.
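
Putting the pieces together, a complete job script might look like the sketch below, which writes an example script to a file. The job name, resource requests, file name conda-job.pbs, and my_script.py are placeholders we chose for illustration; replace them with values that fit your job and queue.

```shell
# Write an example PBS job script to a file (file name is an assumption).
job_file="${JOB_FILE:-conda-job.pbs}"

cat > "$job_file" <<'EOF'
#!/bin/bash
#PBS -N conda-job
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00

# Load the system-wide bash settings right after the last PBS directive
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

# Start in the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Load anaconda and activate the cloned environment
module load anaconda3/latest
source activate my-env

# my_script.py is a placeholder for your own program
python my_script.py
EOF

echo "wrote $job_file"
```

The resulting file can then be submitted with qsub conda-job.pbs.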