Set up your Mac for Python and Jupyter using virtual environments
It’s easy to clog up your machine with bits of software. So before downloading software and installing it, look at this method which keeps things nice and separate and clean.
Anaconda — the easiest way
If you are not very comfortable installing things from the command line (the Terminal app), or you only need to do data science on your laptop, then the best way to install on a Mac is to use the Anaconda distribution.
Here is a link to the official docs that give step-by-step installation of Anaconda.
It’s easy and fast, but it is not the best way if you are going to want to run lots of projects or are doing coding projects alongside your data science.
Virtualenv — the better way
If however you are going to do more than data science with Python on your laptop then Anaconda is probably not the best way.
For example:
- You are going to write some applications in Python or run other projects beyond just using Jupyter notebooks
- You are comfortable using the command line
In this case it is better to create an installation in a separate virtual environment for different projects. This enables you to keep a very clean separate installation for data science vs your different Python projects. They are easy to maintain and uninstall and don’t clog up your base machine environment.
Here’s how to do this.
These instructions are focused on Mac OS X. Linux setup is the same with the exception of step 1a. Windows install is quite different.
Step 1 — check that Python 3 is installed
Older versions of macOS come with Python 2.7 already installed (Apple removed it in macOS 12.3). If you type python on the command line in Terminal on one of those systems, it starts Python 2.7. But Python 2.7 reached end of life on January 1, 2020, so you really want to use Python 3.
Check if you already have Python 3 installed. On the command line type:
python3
If it is already installed it will load and show you the version of Python 3. Something like this:
Python 3.6.4 (default, Jan 6 2018, 11:51:15)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
If it is already installed then that is great and you can go onto step 2.
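If you would rather check from a script than read the banner, here is a minimal sketch. The function name is_python3 is my own, not part of any library. It uses old-style string formatting so that it also runs under Python 2 and can tell you so.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3 or later."""
    return version_info[0] >= 3


# Report which interpreter is running this code and where it lives.
print("Running Python %d.%d from %s"
      % (sys.version_info[0], sys.version_info[1], sys.executable))

if not is_python3():
    print("This is not Python 3; see step 1a.")
```

Save it to a file and run it with python3 followed by the file name, or paste it at the >>> prompt.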
Step 1a — install Python 3 if you don’t have it
There are many ways of installing Python 3 and many blogs on the web. I have found that the easiest is to use Homebrew. Follow the instructions here:
Step 2 — install virtual environments
A virtual environment is like a Python sandbox into which you can install whatever Python packages you like with the pip command, and they will only be installed there. It is nice and clean.
Install the virtualenv software from the command line:
pip install virtualenv
If you don’t have permission to install it, you may need to use the following and give your admin user password:
sudo pip install virtualenv
Now you should create a directory in which to store your virtual environments. You can put it wherever you like, but I have found that it is good to have a virtualenvs (or .virtualenvs if you prefer to make it hidden) directory just off your home directory. You can set this up by going to your home directory and creating it:
cd ~
mkdir virtualenvs
Step 2a — (optional) install virtualenvwrapper
To have an easier way to activate the different virtual environments, you can also install virtualenvwrapper. This requires a bit of configuration but it is worth it in my view.
First install the virtualenvwrapper package:
pip install virtualenvwrapper
Now you need to set up your bash configuration so that it knows where to get your virtual environments from and where the virtualenvwrapper program is stored.
To do this you first need:
- The directory path where you created your virtualenvs folder
- The directory path where your Python 3 is installed (type which python3 to see the path)
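Go to your home directory with cd ~ and carefully edit your .bashrc file, which may be empty. Note that Terminal on a Mac opens login shells, which read .bash_profile rather than .bashrc, so either edit .bash_profile instead or make sure it sources .bashrc. If your shell is zsh (the default since macOS Catalina), edit .zshrc. Nano is a decent editor to do this: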
nano .bashrc
Add the following 3 lines inserting the right paths for your own system and save the file.
export WORKON_HOME=~/virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python3
source /usr/local/bin/virtualenvwrapper.sh
Close your terminal and reopen it to ensure that these new configuration options are read. Then type workon to test that virtualenvwrapper is working. It should give you no errors.
Step 3 — create a new virtual environment for this Jupyter project
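Go to your virtualenvs folder and create a new virtual environment with the mkvirtualenv command. This command comes with virtualenvwrapper. If you skipped step 2a, run virtualenv --python=/usr/local/bin/python3 jupyter from inside the virtualenvs folder instead, then activate it as described in step 6.
The parameters you give it are the path to your Python 3 (found by typing which python3 on the command line) and a name for your new virtual environment. Let’s call this one jupyter.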
cd ~
cd virtualenvs
mkvirtualenv --python=/usr/local/bin/python3 jupyter
This will create your virtual environment and activate it, as shown by the name in brackets at the start of the prompt:
(jupyter) matt$
Type cd ~ to go back to your home directory.
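The brackets in the prompt are a good hint, but you can also ask Python itself whether it is running from the virtual environment. This is a minimal sketch, and the function name in_virtual_environment is my own. It assumes the two common cases: older virtualenv releases set sys.real_prefix, while newer ones (and the built-in venv module) make sys.base_prefix differ from sys.prefix.

```python
import sys


def in_virtual_environment():
    """Return True when this interpreter is running inside a virtual environment."""
    # Older virtualenv releases add a sys.real_prefix attribute.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv releases and the built-in venv module leave
    # sys.base_prefix pointing at the original Python installation.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("Interpreter:", sys.executable)
print("Environment root:", sys.prefix)
print("Inside a virtual environment:", in_virtual_environment())
```

With the jupyter environment active, the interpreter path should sit inside your virtualenvs/jupyter folder.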
Step 4 — install the software you need inside the virtual environment
When you have the virtual environment activated, whenever you install software using the pip install command, it will be installed within this virtual environment only and not clog up the rest of your machine.
Let’s install a bunch of tools we want for data science and Jupyter notebooks in this virtual environment.
First let’s install numpy, pandas, matplotlib as core data science tools
(jupyter) matt$ pip install --upgrade pip
(jupyter) matt$ pip install numpy pandas matplotlib
Let’s install the jupyter package of tools and also jupyterlab.
(jupyter) matt$ pip install jupyter jupyterlab
You can install any other packages you like using the same pip install [package_name] command.
To list what you already have installed use pip list
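If you want to confirm from Python that the packages landed in this environment, something like the sketch below should work. The helper name missing_packages is my own. The list uses import names, which are not always the same as the pip names. For example, the classic Jupyter notebook is imported as notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the packages installed in this step.
wanted = ["numpy", "pandas", "matplotlib", "notebook", "jupyterlab"]

missing = missing_packages(wanted)
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All packages are importable.")
```

Run it with the environment active. If anything is reported as missing, install it with pip install and run the check again.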
Step 5 — fire up your first notebook
Create a directory for your notebook projects to keep all the files together.
(jupyter) matt$ cd ~
(jupyter) matt$ mkdir my_project
(jupyter) matt$ cd my_project
Starting the Jupyter notebook (the classic notebook style) or JupyterLab (a more modern UX version of the notebook) is a single command:
(jupyter) matt$ jupyter notebook
Or
(jupyter) matt$ jupyter lab
You can now do your data science projects
Step 6 — Activating and deactivating your virtual environment
Using virtualenvwrapper
If you installed virtualenvwrapper then you can type workon on the command line to see all your virtual environments and workon [env_name] to activate any one of them.
matt$ workon
analytics
jupyter
campaigns
dummy
matt$ workon jupyter
(jupyter) matt$
Without virtualenvwrapper
You need to call the activate script buried within the bin folder within the virtual environment. To activate, use the source command:
matt$ source ~/virtualenvs/jupyter/bin/activate
(jupyter) matt$
To deactivate, type:
(jupyter) matt$ deactivate
matt$
Step 7 — Dealing with the ‘Python path’
Sometimes your programs might not work if the Python path is not set correctly. It should include the root directory of your project, from where you start the Jupyter notebook.
You can get around this by adding the python path to your virtual environment.
Without virtualenvwrapper
You can add a python path manually. Assuming your project is in the directory we created, on the command line you type the single line:
(jupyter) matt$ export PYTHONPATH="/Users/matt/my_project:$PYTHONPATH"
To see what is set you can type export from the command line.
Using virtualenvwrapper
The postactivate file in the bin folder of your virtual environment will run each time you activate this environment.
Using nano or another editor you can add Python paths to the environment:
export OLD_PYTHONPATH="$PYTHONPATH"
export PYTHONPATH="/Users/matt/my_project:$PYTHONPATH"
And for good practice you can add a line to the predeactivate file to reset the Python path when you exit the environment.
export PYTHONPATH="$OLD_PYTHONPATH"
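Whichever approach you use, you can check from Python (or from a notebook cell) whether the setting was picked up. This is a minimal sketch, and the helper name pythonpath_entries is my own. It splits the PYTHONPATH variable into its entries and reports whether each one appears on sys.path, the list Python actually searches when importing.

```python
import os
import sys


def pythonpath_entries(environ=None):
    """Split the PYTHONPATH variable into a list of non-empty directories."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PYTHONPATH", "")
    # Skip empty pieces, which appear when PYTHONPATH ends with a separator.
    return [entry for entry in raw.split(os.pathsep) if entry]


for entry in pythonpath_entries():
    status = "on sys.path" if entry in sys.path else "NOT on sys.path"
    print(entry, "->", status)
```

If nothing is printed, PYTHONPATH is empty in the process that started Python. The comparison is a plain string match, so an entry written with a trailing slash may be reported as missing even though it works.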
That’s it. Now you have a specific environment for your jupyter notebook work. You can install what you like in this environment and if it gets too messy just delete the virtual environment’s directory and rebuild it.
You can use separate virtual environments for different projects if they have different packages they need or specific settings. If you are also developing software you are likely to want to set up a different virtual environment for each software project you have.