Linux Package Management Guidelines
1 Generic Concepts
- Linux/Unix OSes generally have really nice ways of sharing, distributing, and installing packages
- Depending on the day, this can be very convenient or a huge pain
- Figuring out how to manage dependency resolution, compile packages from
source, debug missing packages, etc. is truly complicated, and something that
takes years of practice
- However, the reality is that this is just part of life as a software developer!
- These same issues crop up in macOS and Windows
- Generally when installing software on Linux there are two major ways of doing it
- Pre-built binaries
- Source-based installs
1.1 Pre-Built Binaries
- Almost all software packages contain code written in low-level languages that needs to be compiled into binaries, shared objects, etc.
- In pre-built packages, all of the compilation has already been done for you
- Especially convenient because it often takes many more dependencies to compile code than it does to run code
- Typically pre-built binaries are packaged into a single archive file that
contains both the pre-compiled code and a lot of metadata (see the example below)
- Dependencies
- Features provided
- Installation scripts
- There are several common forms of these archives and corresponding utilities
for reading and dealing with them
- Utilities are called package managers
- The choice of archive format and package manager is one of the biggest differences
between Linux flavors
RPMs
- Common front-ends include yum, zypp, urpmi
- These archives are used in Fedora, Red Hat, CentOS, Oracle Linux, openSUSE, etc.
debs
- dpkg is the low-level package manager; apt-get and aptitude are the most common front-ends
- This is used in Debian and its variants, including Ubuntu, Mint, SteamOS, CrunchBang, etc.
pkg.tar.xz
- Pacman is the front end for these archives
- This is what all Arch Linux variants use
- With this system it is much more common to download just source code and run a compilation script (called a PKGBUILD)
- Some archive formats don't actually include pre-built binaries
- Python, Perl, Ruby, etc.
- Source packages (some debs, and many Arch-based packages)
- They are still used for dependency management and build instructions
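Before moving on, here is a quick, hedged example of peeking inside one of these archives on a Debian-style system; the filename here is hypothetical:

# show the metadata (dependencies, description, etc.) stored in the archive
dpkg-deb --info foo_1.0_amd64.deb
# list the files the package would install
dpkg-deb --contents foo_1.0_amd64.deb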
1.1.1 Binary Packages on Ubuntu
- On Ubuntu, you will be installing nearly every pre-built binary from a .deb file
- dpkg is the low-level tool for unpacking and working with .deb files
  - You don't use dpkg directly all that often; it is more common to use apt-get or aptitude, which are higher-level front-ends
  - Common reasons for using dpkg directly are:
    - Correcting issues with packages installed on your system… sometimes it is easier to force things with dpkg than with apt-get
    - Installing different versions of packages than are contained in the available repositories
    - Installing packages that aren't on any repository… often people will distribute their code as a .deb file downloaded from their website
- The apt-like front-ends typically have several advantages over using dpkg directly (see the example commands at the end of this subsection)
  - The ability to look at a variety of online locations and automatically download .deb files for requested packages as well as their dependencies
    - The locations are called the sources and are typically listed in /etc/apt/sources.list and /etc/apt/sources.list.d/
    - PPAs are Personal Package Archives… people can set one up on Launchpad, and the packages get automatically built by Launchpad's build farm. Often if you are looking for a package that isn't in Ubuntu's official repositories, you can find a PPA
    - Sources are managed by editing the aforementioned files, by using apt-add-repository, or through the synaptic GUI
  - Caching archives that are already downloaded
  - Advanced dependency resolution
  - Automatically removing packages that are no longer required. For example, if a package was only installed because it was a dependency for another package, and that other package is removed, then apt can automatically remove the dependency as well
- If you are having trouble finding a .deb file, you can possibly use an RPM through something like alien, or you will have to turn to a source-based install
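Here is a hedged sketch of the commands behind the bullets above; the package name foo and the PPA name are hypothetical:

# install a .deb you downloaded yourself (dpkg does not resolve dependencies)
sudo dpkg -i foo_1.0_amd64.deb
# install the same package from the configured sources, with dependency resolution
sudo apt-get update
sudo apt-get install foo
# add a PPA as an additional source, then refresh the package lists
sudo apt-add-repository ppa:some-user/some-ppa
sudo apt-get update
# remove dependencies that are no longer required
sudo apt-get autoremove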
1.2 Source-Based Installation
- Typically, if you obtain a copy of the source code, there will be either a README or an INSTALL file with compilation instructions
- Source code for packages available on apt repositories can be downloaded using apt-get source
  - Typically, if the package is on apt, you wouldn't be compiling it from source. But let's say there was a bug that you want to try and fix; this is how you would do it.
- In order to compile a package, you'll likely need a variety of dependencies
  - Sometimes these are listed in instructions somewhere, and you can manually run apt-get commands to install any missing dependencies. Alternatively, if the package is available on apt, you can use apt-get build-dep to attempt to install the build dependencies for a source package (see the example below).
  - Compiling a big, complex package from source can sometimes lead down a rabbit hole of requiring you to compile lots of dependencies. You really want to try and avoid this situation.
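As a quick illustration of the two apt commands just mentioned, here is a minimal sketch; foo is a hypothetical package name, and apt-get source only works if your sources list includes deb-src entries:

# download and unpack the source for a package into the current directory
apt-get source foo
# install everything needed to build that package
sudo apt-get build-dep foo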
- The most common way of compiling source packages (at least on Debian-based Linux) is with a configure script and the following 3 commands:

./configure
make
sudo make install

- Generally speaking, those commands do the following:
  - ./configure: This is a script that was usually automatically generated using autoconf. Running this script scans your computer and uses pkg-config to look for dependencies. If all dependencies are properly met, it automatically generates a Makefile customized for your computer. Note that this script often takes arguments that can be used to customize how the Makefile gets generated, and thus how the package will be compiled. Examples include choosing whether to build the package using static or shared libraries, or setting the install location (see the example below).
  - make: Assuming the previous command ran successfully, this command parses the Makefile and calls various build tools (typically a compiler like gcc or g++) to actually compile the package.
  - sudo make install: This command runs the install target in the Makefile. Typically this just means copying the compiled binaries into a particular directory on your computer to make them globally usable. That is why this command often requires sudo; you are usually trying to copy files into /usr/local/.
- This blog post contains a really great high-level description of how these configure scripts are generated.
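For instance, here is a minimal, hedged sketch of customizing a build through configure arguments; --prefix is a standard autoconf option, while flags like --enable-shared exist only if the package defines them:

# list the options this particular configure script understands
./configure --help
# build shared libraries (if supported) and install under /opt/mypackage
./configure --prefix=/opt/mypackage --enable-shared
make
sudo make install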
- Important: You want to be very careful about randomly installing packages from source using the above commands!
  - When you do the above, dpkg has no way of knowing that you installed the package (because there was no .deb file). So this won't help you with dependency resolution issues.
  - In an ideal world, the Ubuntu filesystem would be separated so that all apt packages went in one place (/usr/) and all packages you compiled went into another place (/usr/local/), but it doesn't always work out so nicely! What this means is that running make install may actually overwrite files that were installed by apt-get or dpkg. If you unintentionally replace some library that other packages depend on with an incompatible version, you could find yourself with a very broken system!
  - Sometimes people set up their configure scripts to also create an uninstall target in the Makefile. This can help you remove the files that were copied, but it requires you to keep a copy of the source code (or at least the Makefile) around.
- There are a few alternatives to blindly running the 3 commands.
  - Sometimes people will set up their configure scripts to create a target for actually building a .deb file. So running something like make build-deb may create a .deb file for you that you can use to install the package. This is great because then you can also easily uninstall the package. This is pretty rare, but I have seen it. After running ./configure, I will typically type make and then hit tab a few times and just see what targets the Makefile contains.
  - It is possible to build your own .deb file using a tool like dh_make or checkinstall. This will take more time up front, but could save you debugging down the road. I've recently had a lot of luck with fpm (see the sketch below).
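As a rough sketch of that last alternative: checkinstall wraps the "make install" step and records it as a .deb, while fpm can turn a staged install tree into a .deb. The package name, version, and the assumption that the Makefile honors DESTDIR are hypothetical here:

./configure && make
# option 1: run checkinstall instead of "sudo make install"; it builds and installs a .deb
sudo checkinstall
# option 2: stage the install into a scratch directory, then package it with fpm
make install DESTDIR=$(pwd)/pkgroot
fpm -s dir -t deb -n mypackage -v 1.0.0 -C pkgroot .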
2 Python Packages
- Generally, to install a Python package, the first thing you should try doing is searching on apt-get. Many common packages have already been bundled by Ubuntu and are easily installable via apt-get. Typically the package names are preceded with python-, e.g. to install the NumPy module you'd run sudo apt-get install python-numpy
- Even if the module you want to use is available on apt-get, there may be a good reason why you want to install a different version (typically because of bug fixes or new features). Python has its own special way of distributing packages and managing dependencies, the Python Package Index (PyPI). It also has special tools for automatically downloading a particular package and its Python dependencies, compiling the source code, and installing it into a particular location. Note that even though Python code itself doesn't need to be compiled, many Python modules depend on low-level libraries written in C, C++, or Fortran, so installing a Python package often requires a compilation step.
- The recommended tool for getting packages from PyPI is pip. Installing NumPy would be as simple as
sudo -H pip install numpy
- Note that pip isn't great at obtaining non-Python dependencies. For example, if you want to install NumPy from source, you'll need access to the linear algebra library liblapack. This can be installed via apt-get with the package name liblapack-dev. If you don't have this package and you try to use pip to install NumPy, the install will simply fail. Then you'll have to interpret the error about missing files and manually install whatever packages are required (see the example at the end of this section).
- If you need to install some Python module that is not available on PyPI, then you somehow need to get a copy of the source code. Then inside the directory you are likely to find a setup.py file. These files are Python's way of defining package dependency information and writing build and install instructions. If you find one of these files, the most common install steps would be something like
python setup.py build
sudo python setup.py install
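To make the liblapack example above concrete, here is a minimal sketch of working around pip's inability to fetch non-Python dependencies; the exact apt packages required vary from module to module:

# install the system-level build dependency with apt first...
sudo apt-get install liblapack-dev
# ...then let pip download, compile, and install the Python package
sudo -H pip install numpy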
3 Working in an Isolated Environment
Occasionally package conflicts can be a real headache: it can be challenging to keep from breaking your system while also developing projects that rely on packages, or package versions, that are for one reason or another incompatible with what your system provides. Mastering tools and concepts such as CMake, GCC, and environment variables can help you avoid installing conflicting packages while still developing against them. Even so, it turns out that there are tools specifically designed for isolating a particular project and its dependencies from the rest of your system. Moreover, these tools usually also provide provisions for transporting the isolated environment from one machine to another.
Below I'll talk about two very common strategies for setting up and using isolated environments.
3.1 Virtual Environments
A Virtual Environment is an isolated Python environment created with the virtualenv tool. This is one of the best tools for isolated development in Python. There is a ton of good help online, but below is a quick example of setting up and using a virtual environment, as well as a few "gotchas" that can be worked around.
To create a virtual environment for a Python project, simply cd to the project root and type virtualenv <ENV NAME>. Example:
jarvis@test2018:~⟫ mkdir new_python_proj
jarvis@test2018:~⟫ cd new_python_proj/
jarvis@test2018:~/new_python_proj⟫ virtualenv venv
New python executable in /home/jarvis/new_python_proj/venv/bin/python
Installing setuptools, pip, wheel...done.
jarvis@test2018:~/new_python_proj⟫ ls
venv
The above created the venv
directory in my new_python_proj
directory, and
that venv
directory now contains all of the contents of the virtual
environment. Note, we would never want to track anything inside of the
virtual environment in Git.
To start using the virtual environment, you simply have to source the
activate
script. To stop using the virtual environment you then simply run
the deactivate
command. Note that both of these actions only apply to the
terminal you are currently working in.
jarvis@test2018:~/new_python_proj⟫ source venv/bin/activate
(venv) jarvis@test2018:~/new_python_proj⟫ which python
/home/jarvis/new_python_proj/venv/bin/python
(venv) jarvis@test2018:~/new_python_proj⟫ deactivate
jarvis@test2018:~/new_python_proj⟫ which python
/usr/bin/python
Once the virtual environment is active, you can install additional packages
with pip
, and they should be installed just in the virtual environment.
jarvis@test2018:~/new_python_proj⟫ source venv/bin/activate
(venv) jarvis@test2018:~/new_python_proj⟫ pip install ipython numpy
Collecting ipython
  Using cached https://files.pythonhosted.org/packages/b0/88/d996ab8be22cea1eaa18baee3678a11265e18cf09974728d683c51102148/ipython-5.8.0-py2-none-any.whl
Collecting numpy
  Using cached https://files.pythonhosted.org/packages/c9/16/1134977cc35d2f72dbe80efa75a8e989ac21289f8e7e2c9005444cd17cd5/numpy-1.15.1-cp27-cp27mu-manylinux1_x86_64.whl
.
.
.
Installing collected packages: simplegeneric, six, scandir, pathlib2, pickleshare, backports.shutil-get-terminal-size, ptyprocess, pexpect, enum34, ipython-genutils, decorator, traitlets, pygments, wcwidth, prompt-toolkit, ipython, numpy
Successfully installed backports.shutil-get-terminal-size-1.0.0 decorator-4.3.0 enum34-1.1.6 ipython-5.8.0 ipython-genutils-0.2.0 numpy-1.15.1 pathlib2-2.3.2 pexpect-4.6.0 pickleshare-0.7.4 prompt-toolkit-1.0.15 ptyprocess-0.6.0 pygments-2.2.0 scandir-1.9.0 simplegeneric-0.8.1 six-1.11.0 traitlets-4.3.2 wcwidth-0.1.7
(venv) jarvis@test2018:~/new_python_proj⟫ ipython
Python 2.7.15rc1 (default, Apr 15 2018, 21:51:34)
Type "copyright", "credits" or "license" for more information.

IPython 5.8.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import numpy

In [2]: print numpy.__file__
/home/jarvis/new_python_proj/venv/local/lib/python2.7/site-packages/numpy/__init__.pyc

In [3]:
Do you really want to exit ([y]/n)?
(venv) jarvis@test2018:~/new_python_proj⟫
3.1.1 Transferring a virtual environment
If you would like to reproduce a virtual environment somewhere else, you
need to create a requirements file. Then if you have that requirements file,
you can re-generate the virtual environment in one step. To generate the
file, simply use pip freeze > requirements.txt
in the virtual environment.
Feel free to track this file in Git to allow others to reproduce your
environment.
To use a requirements.txt file, simply create a virtual environment and then use pip install -r requirements.txt to install all of the packages, at the versions recorded in the file, inside of the virtual environment.
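Putting both halves together, a minimal sketch of the round trip looks like this:

# on the original machine, inside the active virtual environment
pip freeze > requirements.txt
# on another machine (or in a fresh clone of the project)
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt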
3.1.2 Using IPython with a virtual environment
IPython is a bit weird because, likely, it is already on your system path, and separately configured to scan specific directories for Python modules (in other words, the system IPython might break the virtual environment isolation). In my experience, if you want to use IPython within a virtual environment the best way to accomplish this is to also install IPython inside of the virtual environment. If, even after installing IPython in the virtual environment, you are greeted with an error such as:
/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py:726: UserWarning: Attempting to work in a virtualenv. If you encounter problems, please install IPython inside the virtualenv.
You can try running hash -r
to force Bash to update its understanding of
what executables are available and where they are located (read more here).
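In other words, assuming you are already inside the active virtual environment, the fix is usually just:

# install IPython into the virtual environment itself
pip install ipython
# make Bash forget its cached location of the system-wide ipython
hash -r
which ipython   # should now point inside venv/bin/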
3.1.3 Beware of PYTHONPATH
Note that if you have your PYTHONPATH
environment variable set, then even
if you run Python from within the virtual environment it will still scan the
directories on the PYTHONPATH
. So if you truly want an isolated
environment, you should unset the PYTHONPATH
when working in the virtual
environment.
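For example, a quick way to check for and clear it in the current shell:

# anything printed here can leak into the virtual environment
echo $PYTHONPATH
# clear it for this terminal session while you work in the virtual environment
unset PYTHONPATH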
3.1.4 Packages not on PyPI
One issue that can sometimes crop up is with packages that are not on PyPI. While this used to happen quite often, PyPI has grown to the point that I rarely encounter it anymore. Most Python packages are packaged and distributed using setuptools. The typical workflow is to run
python setup.py build
python setup.py install
Running the above in an active virtual environment will automatically
install the package into the virtual environment. Note that this breaks
portability because a requirements.txt
file won't be able to load this
package from PyPI.
3.1.5 Python extension modules
Occasionally, I've encountered Python modules that are not on PyPI and are not distributed with setuptools. This usually happens with packages that
. This usually happens with packages that
are mostly Python extension modules. An example of this that I've run into
multiple times is PyQt4
. In this case, complete isolation is quite
challenging and portability is out the window. However, you can usually
still work around this issue and only break isolation on the offending
packages. It's usually worth a bit of research, and likely you'll find the
workaround. In the PyQt4
situation, I can fix this issue with a few
symbolic links.
(venv) jarvis@test2018:~/new_python_proj⟫ python config-spring.py
Traceback (most recent call last):
  File "config-spring.py", line 8, in <module>
    import trep.visual as visual
  File "/home/jarvis/new_python_proj/venv/local/lib/python2.7/site-packages/trep-1.0.3.dev0-py2.7-linux-x86_64.egg/trep/visual/__init__.py", line 2, in <module>
    from visualitem import VisualItem2D, VisualItem3D
  File "/home/jarvis/new_python_proj/venv/local/lib/python2.7/site-packages/trep-1.0.3.dev0-py2.7-linux-x86_64.egg/trep/visual/visualitem.py", line 4, in <module>
    from PyQt4.QtCore import *
ImportError: No module named PyQt4.QtCore
(venv) jarvis@test2018:~/new_python_proj⟫ ln -s /usr/lib/python2.7/dist-packages/PyQt4* venv/lib/python2.7/site-packages/
(venv) jarvis@test2018:~/new_python_proj⟫ ln -s /usr/lib/python2.7/dist-packages/sip* venv/lib/python2.7/site-packages/
(venv) jarvis@test2018:~/new_python_proj⟫ python config-spring.py
(venv) jarvis@test2018:~/new_python_proj⟫
3.1.6 Specifying Python version
If you'd like to specify a particular Python version for the virtual
environment to use, simply use the -p
option when creating the virtual
environment. For example on my system I can use the following to enforce
Python 3:
virtualenv -p $(which python3) venv
3.2 Docker
Docker is a widely used tool for deploying complete software solutions across a wide range of hardware. Common applications for Docker include setting up build farms for cross-compiling, deploying web applications to cloud services (e.g. Google App Engine, AWS, Rackspace, Heroku), and setting up consistent isolated environments across systems. Quoting from the Docker Overview page (which is a great read):
Docker provides the ability to package and run an application in a loosely isolated environment called a container. The isolation and security allow you to run many containers simultaneously on a given host.
Basically that means we can create an "image" that contains a complete Linux distribution as well as all packages that we need, installed using any package manager, and across many different programming languages. Then we can start this image in a "container" and the processes running in the container are completely isolated from our host system (except where we explicitly allow interaction). Moreover, we can push images to online repositories and then pull them down at will on any machine. In this way, we can not only create isolated environments, but we can also share the environments across many machines.
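As a taste of what that workflow looks like in practice, here is a minimal, hedged sketch; the custom image and repository names are made up:

# download a complete Ubuntu image from an online registry
docker pull ubuntu:18.04
# start an isolated container from that image and open a shell inside it
docker run -it --rm ubuntu:18.04 /bin/bash
# share a customized image by tagging it and pushing it to a registry
docker tag my-custom-image myusername/my-custom-image:latest
docker push myusername/my-custom-image:latest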
Docker is very powerful, but it is also quite complicated. Learning all of the ins-and-outs is not a small task. If you are interested in just getting your feet wet and understanding the basics, I'd highly recommend reading and following along with Getting Started with Docker. When/if you are really faced with a situation where you need or want to roll a custom image you can do more research at that point. For now, I'd keep in mind that Docker exists and that it allows you to deploy nontrivial collections of Linux packages in a lightweight and computationally efficient manner.