Linux Package Management Guidelines
1 Generic Concepts
- Linux/Unix OSes generally have really nice ways of sharing, distributing, and installing packages
- Depending on the day, this can be very convenient or a huge pain
- Figuring out how to manage dependency resolution, compile packages from
source, debug missing packages, etc. is truly complicated, and something that
takes years of practice
- However, the reality is that this is just part of life as a software developer!
- These same issues crop up in macOS and Windows
- Generally when installing software on Linux there are two major ways of doing it
- Pre-built binaries
- Source-based installs
1.1 Pre-Built Binaries
- Almost all software packages contain code written in low-level languages that needs to be compiled into binaries, shared objects, etc.
- In pre-built packages, all of the compilation has already been done for you
- Especially convenient because it often takes many more dependencies to compile code than it does to run code
- Typically pre-built binaries are packaged into a single archive file that
contains both the pre-compiled code and a lot of metadata (see the example below)
- Dependencies
- Features provided
- Installation scripts
- There are several common forms of these archives and corresponding utilities
for reading and dealing with them
- Utilities are called package managers
- The choice of archive format and package manager is one of the biggest differences
between Linux flavors
RPMs
- Common front-ends include yum, zypp, urpmi
- These archives are used in Fedora, Red Hat, CentOS, Oracle Linux, openSUSE, etc.
debs
- dpkg is the low-level package manager; apt-get and aptitude are the most common front-ends
- This is used in Debian and its variants, including Ubuntu, Mint, SteamOS, CrunchBang, etc.
pkg.tar.xz
- Pacman is the front end for these archives
- This is what all Arch Linux variants use
- With this system it is much more common to download just source code and run a compilation script (called a PKGBUILD)
- Some archive formats don't actually include pre-built binaries
- Python, Perl, Ruby, etc.
- Source packages (some debs, and many Arch-based packages)
- They are still used for dependency management and build instructions
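Before moving on, here is a quick, hedged example of peeking inside one of these archives on a Debian-style system; the filename here is hypothetical:

# show the metadata (dependencies, description, etc.) stored in the archive
dpkg-deb --info foo_1.0_amd64.deb
# list the files the package would install
dpkg-deb --contents foo_1.0_amd64.deb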
1.1.1 Binary Packages on Ubuntu
- On Ubuntu, you will be installing nearly every pre-built binary from a .deb file
- dpkg is the low-level tool for unpacking and working with .deb files
  - You don't use dpkg directly all that often; it is more common to use apt-get or aptitude, which are higher-level front-ends
  - Common reasons for using dpkg directly are:
    - Correcting issues with packages installed on your system… sometimes it is easier to force things with dpkg than with apt-get
    - Installing different versions of packages than are contained in the available repositories
    - Installing packages that aren't on any repository… often people will distribute their code as a .deb file downloaded from their website
- The apt-like front-ends typically have several advantages over using dpkg directly (see the example commands at the end of this subsection)
  - The ability to look at a variety of online locations and automatically download .deb files for requested packages as well as their dependencies
    - The locations are called the sources and are typically listed in /etc/apt/sources.list and /etc/apt/sources.list.d/
    - PPAs are Personal Package Archives… people can set one up on Launchpad, and the packages get automatically built by Launchpad's build farm. Often if you are looking for a package that isn't in Ubuntu's official repositories, you can find a PPA
    - Sources are managed by editing the aforementioned files, by using apt-add-repository, or through the synaptic GUI
  - Caching archives that are already downloaded
  - Advanced dependency resolution
  - Automatically removing packages that are no longer required. For example, if a package was only installed because it was a dependency for another package, and that other package is removed, then apt can automatically remove the dependency as well
- If you are having trouble finding a .deb file, you can possibly use an RPM through something like alien, or you will have to turn to a source-based install
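Here is a hedged sketch of the commands behind the bullets above; the package name foo and the PPA name are hypothetical:

# install a .deb you downloaded yourself (dpkg does not resolve dependencies)
sudo dpkg -i foo_1.0_amd64.deb
# install the same package from the configured sources, with dependency resolution
sudo apt-get update
sudo apt-get install foo
# add a PPA as an additional source, then refresh the package lists
sudo apt-add-repository ppa:some-user/some-ppa
sudo apt-get update
# remove dependencies that are no longer required
sudo apt-get autoremove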
1.2 Source-Based Installation
- Typically, if you obtain a copy of the source code, there will be either a README or an INSTALL file with compilation instructions
- Source code for packages available on apt repositories can be downloaded using apt-get source
  - Typically, if the package is on apt, you wouldn't be compiling it from source. But let's say there was a bug that you want to try and fix; this is how you would do it.
- In order to compile a package, you'll likely need a variety of dependencies
  - Sometimes these are listed in instructions somewhere, and you can manually run apt-get commands to install any missing dependencies. Alternatively, if the package is available on apt, you can use apt-get build-dep to attempt to install the build dependencies for a source package (see the example below).
  - Compiling a big, complex package from source can sometimes lead down a rabbit hole of requiring you to compile lots of dependencies. You really want to try and avoid this situation.
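As a quick illustration of the two apt commands just mentioned, here is a minimal sketch; foo is a hypothetical package name, and apt-get source only works if your sources list includes deb-src entries:

# download and unpack the source for a package into the current directory
apt-get source foo
# install everything needed to build that package
sudo apt-get build-dep foo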
- The most common way of compiling source packages (at least on Debian-based Linux) is with a configure script and the following 3 commands:

./configure
make
sudo make install

- Generally speaking, those commands do the following:
  - ./configure: This is a script that was usually automatically generated using autoconf. Running this script scans your computer and uses pkg-config to look for dependencies. If all dependencies are properly met, it automatically generates a Makefile customized for your computer. Note that this script often takes arguments that can be used to customize how the Makefile gets generated, and thus how the package will be compiled. Examples include choosing whether to build the package using static or shared libraries, or setting the install location (see the example below).
  - make: Assuming the previous command ran successfully, this command parses the Makefile and calls various build tools (typically a compiler like gcc or g++) to actually compile the package.
  - sudo make install: This command runs the install target in the Makefile. Typically this just means copying the compiled binaries into a particular directory on your computer to make them globally usable. That is why this command often requires sudo; you are usually trying to copy files into /usr/local/.
- This blog post contains a really great high-level description of how these configure scripts are generated.
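For instance, here is a minimal, hedged sketch of customizing a build through configure arguments; --prefix is a standard autoconf option, while flags like --enable-shared exist only if the package defines them:

# list the options this particular configure script understands
./configure --help
# build shared libraries (if supported) and install under /opt/mypackage
./configure --prefix=/opt/mypackage --enable-shared
make
sudo make install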
- Important: You want to be very careful about randomly installing packages from source using the above commands!
  - When you do the above, dpkg has no way of knowing that you installed the package (because there was no .deb file). So this won't help you with dependency resolution issues.
  - In an ideal world, the Ubuntu filesystem would be separated so that all apt packages went in one place (/usr/) and all packages you compiled went into another place (/usr/local/), but it doesn't always work out so nicely! What this means is that running make install may actually overwrite files that were installed by apt-get or dpkg. If you unintentionally replace some library that other packages depend on with an incompatible version, you could find yourself with a very broken system!
  - Sometimes people set up their configure scripts to also create an uninstall target in the Makefile. This can help you remove the files that were copied, but it requires you to keep a copy of the source code (or at least the Makefile) around.
- There are a few alternatives to blindly running the 3 commands.
  - Sometimes people will set up their configure scripts to create a target for actually building a .deb file. So running something like make build-deb may create a .deb file for you that you can use to install the package. This is great because then you can also easily uninstall the package. This is pretty rare, but I have seen it. After running ./configure, I will typically type make and then hit tab a few times and just see what targets the Makefile contains.
  - It is possible to build your own .deb file using a tool like dh_make or checkinstall. This will take more time up front, but could save you debugging down the road. I've recently had a lot of luck with fpm (see the sketch below).
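As a rough sketch of that last alternative: checkinstall wraps the "make install" step and records it as a .deb, while fpm can turn a staged install tree into a .deb. The package name, version, and the assumption that the Makefile honors DESTDIR are hypothetical here:

./configure && make
# option 1: run checkinstall instead of "sudo make install"; it builds and installs a .deb
sudo checkinstall
# option 2: stage the install into a scratch directory, then package it with fpm
make install DESTDIR=$(pwd)/pkgroot
fpm -s dir -t deb -n mypackage -v 1.0.0 -C pkgroot .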
2 Python Packages
- Generally, to install a Python package, the first thing you should try doing is searching on apt-get. Many common packages have already been bundled by Ubuntu and are easily installable via apt-get. Typically the package names are preceded with python-, e.g. to install the NumPy module you'd run sudo apt-get install python-numpy
- Even if the module you want to use is available on apt-get, there may be a good reason why you want to install a different version (typically because of bug fixes or new features). Python has its own special way of distributing packages and managing dependencies, the Python Package Index (PyPI). It also has special tools for automatically downloading a particular package and its Python dependencies, compiling the source code, and installing it into a particular location. Note that even though Python code itself doesn't need to be compiled, many Python modules depend on low-level libraries written in C, C++, or Fortran, so installing a Python package often requires a compilation step.
- The recommended tool for getting packages from PyPI is pip. Installing NumPy would be as simple as
sudo -H pip install numpy
- Note that pip isn't great at obtaining non-Python dependencies. For example, if you want to install NumPy from source, you'll need access to the linear algebra library liblapack. This can be installed via apt-get with the package name liblapack-dev. If you don't have this package and you try to use pip to install NumPy, the install will simply fail. Then you'll have to interpret the error about missing files and manually install whatever packages are required (see the example at the end of this section).
- If you need to install some Python module that is not available on PyPI, then you somehow need to get a copy of the source code. Then inside the directory you are likely to find a setup.py file. These files are Python's way of defining package dependency information and writing build and install instructions. If you find one of these files, the most common install steps would be something like
python setup.py build
sudo python setup.py install
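To make the liblapack example above concrete, here is a minimal sketch of working around pip's inability to fetch non-Python dependencies; the exact apt packages required vary from module to module:

# install the system-level build dependency with apt first...
sudo apt-get install liblapack-dev
# ...then let pip download, compile, and install the Python package
sudo -H pip install numpy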
3 Working in an Isolated Environment
Occasionally package conflicts can be a real headache: it can be challenging to keep from breaking your system while also developing projects that rely on packages, or package versions, that are for one reason or another incompatible with what your system provides. Mastering tools and concepts such as CMake, GCC, and environment variables can help you avoid installing conflicting packages while still developing against them. Even so, it turns out that there are tools specifically designed for isolating a particular project and its dependencies from the rest of your system. Moreover, these tools usually also provide provisions for transporting the isolated environment from one machine to another.
Below I'll talk about two very common strategies for setting up and using isolated environments.
3.1 Virtual Environments
A Virtual Environment is an isolated Python environment created with the virtualenv tool. This is one of the best tools for isolated development in Python. There is a ton of good help online, but below is a quick example of setting up and using a virtual environment, as well as a few "gotchas" that can be worked around.
To create a virtual environment for a Python project, simply cd to the project root and type virtualenv <ENV NAME>. Example:
jarvis@test2018:~⟫ mkdir new_python_proj
jarvis@test2018:~⟫ cd new_python_proj/
jarvis@test2018:~/new_python_proj⟫ virtualenv venv
New python executable in /home/jarvis/new_python_proj/venv/bin/python
Installing setuptools, pip, wheel...done.
jarvis@test2018:~/new_python_proj⟫ ls
venv
The above created the venv
directory in my new_python_proj
directory, and
that venv
directory now contains all of the contents of the virtual
environment. Note, we would never want to track anything inside of the
virtual environment in Git.
To start using the virtual environment, you simply have to source the
activate
script. To stop using the virtual environment you then simply run
the deactivate
command. Note that both of these actions only apply to the
terminal you are currently working in.
jarvis@test2018:~/new_python_proj⟫ source venv/bin/activate
(venv) jarvis@test2018:~/new_python_proj⟫ which python
/home/jarvis/new_python_proj/venv/bin/python
(venv) jarvis@test2018:~/new_python_proj⟫ deactivate
jarvis@test2018:~/new_python_proj⟫ which python
/usr/bin/python
Once the virtual environment is active, you can install additional packages
with pip
, and they should be installed just in the virtual environment.
jarvis@test2018:~/new_python_proj⟫ source venv/bin/activate
(venv) jarvis@test2018:~/new_python_proj⟫ pip install ipython numpy
Collecting ipython
  Using cached https://files.pythonhosted.org/packages/b0/88/d996ab8be22cea1eaa18baee3678a11265e18cf09974728d683c51102148/ipython-5.8.0-py2-none-any.whl
Collecting numpy
  Using cached https://files.pythonhosted.org/packages/c9/16/1134977cc35d2f72dbe80efa75a8e989ac21289f8e7e2c9005444cd17cd5/numpy-1.15.1-cp27-cp27mu-manylinux1_x86_64.whl
.
.
.
Installing collected packages: simplegeneric, six, scandir, pathlib2, pickleshare, backports.shutil-get-terminal-size, ptyprocess, pexpect, enum34, ipython-genutils, decorator, traitlets, pygments, wcwidth, prompt-toolkit, ipython, numpy
Successfully installed backports.shutil-get-terminal-size-1.0.0 decorator-4.3.0 enum34-1.1.6 ipython-5.8.0 ipython-genutils-0.2.0 numpy-1.15.1 pathlib2-2.3.2 pexpect-4.6.0 pickleshare-0.7.4 prompt-toolkit-1.0.15 ptyprocess-0.6.0 pygments-2.2.0 scandir-1.9.0 simplegeneric-0.8.1 six-1.11.0 traitlets-4.3.2 wcwidth-0.1.7
(venv) jarvis@test2018:~/new_python_proj⟫ ipython
Python 2.7.15rc1 (default, Apr 15 2018, 21:51:34)
Type "copyright", "credits" or "license" for more information.

IPython 5.8.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import numpy

In [2]: print numpy.__file__
/home/jarvis/new_python_proj/venv/local/lib/python2.7/site-packages/numpy/__init__.pyc

In [3]:
Do you really want to exit ([y]/n)?
(venv) jarvis@test2018:~/new_python_proj⟫
3.1.1 Transferring a virtual environment
If you would like to reproduce a virtual environment somewhere else, you
need to create a requirements file. Then if you have that requirements file,
you can re-generate the virtual environment in one step. To generate the
file, simply use pip freeze > requirements.txt
in the virtual environment.
Feel free to track this file in Git to allow others to reproduce your
environment.
To use a requirements.txt file, simply create a virtual environment and then use pip install -r requirements.txt to install all of the packages, at the versions recorded in the file, inside of the virtual environment.
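Putting both halves together, a minimal sketch of the round trip looks like this:

# on the original machine, inside the active virtual environment
pip freeze > requirements.txt
# on another machine (or in a fresh clone of the project)
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt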
3.1.2 Using IPython with a virtual environment
IPython is a bit weird because, likely, it is already on your system path, and separately configured to scan specific directories for Python modules (in other words, the system IPython might break the virtual environment isolation). In my experience, if you want to use IPython within a virtual environment the best way to accomplish this is to also install IPython inside of the virtual environment. If, even after installing IPython in the virtual environment, you are greeted with an error such as:
/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py:726: UserWarning: Attempting to work in a virtualenv. If you encounter problems, please install IPython inside the virtualenv.
You can try running hash -r
to force Bash to update its understanding of
what executables are available and where they are located (read more here).
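In other words, assuming you are already inside the active virtual environment, the fix is usually just:

# install IPython into the virtual environment itself
pip install ipython
# make Bash forget its cached location of the system-wide ipython
hash -r
which ipython   # should now point inside venv/bin/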
3.1.3 Beware of PYTHONPATH
Note that if you have your PYTHONPATH
environment variable set, then even
if you run Python from within the virtual environment it will still scan the
directories on the PYTHONPATH
. So if you truly want an isolated
environment, you should unset the PYTHONPATH
when working in the virtual
environment.
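For example, a quick way to check for and clear it in the current shell:

# anything printed here can leak into the virtual environment
echo $PYTHONPATH
# clear it for this terminal session while you work in the virtual environment
unset PYTHONPATH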
3.1.4 Packages not on PyPI
One issue that can sometimes crop up is with packages that are not on PyPI. While this used to happen quite often, PyPI has grown to the point that I rarely encounter it anymore. Most Python packages are packaged and distributed using setuptools. The typical workflow is to run
python setup.py build
python setup.py install
Running the above in an active virtual environment will automatically
install the package into the virtual environment. Note that this breaks
portability because a requirements.txt
file won't be able to load this
package from PyPI.
3.1.5 Python extension modules
Occasionally, I've encountered Python modules that are not on PyPI and are not distributed with setuptools. This usually happens with packages that
. This usually happens with packages that
are mostly Python extension modules. An example of this that I've run into
multiple times is PyQt4
. In this case, complete isolation is quite
challenging and portability is out the window. However, you can usually
still work around this issue and only break isolation on the offending
packages. It's usually worth a bit of research, and likely you'll find the
workaround. In the PyQt4
situation, I can fix this issue with a few
symbolic links.
(venv) jarvis@test2018:~/new_python_proj⟫ python config-spring.py
Traceback (most recent call last):
  File "config-spring.py", line 8, in <module>
    import trep.visual as visual
  File "/home/jarvis/new_python_proj/venv/local/lib/python2.7/site-packages/trep-1.0.3.dev0-py2.7-linux-x86_64.egg/trep/visual/__init__.py", line 2, in <module>
    from visualitem import VisualItem2D, VisualItem3D
  File "/home/jarvis/new_python_proj/venv/local/lib/python2.7/site-packages/trep-1.0.3.dev0-py2.7-linux-x86_64.egg/trep/visual/visualitem.py", line 4, in <module>
    from PyQt4.QtCore import *
ImportError: No module named PyQt4.QtCore
(venv) jarvis@test2018:~/new_python_proj⟫ ln -s /usr/lib/python2.7/dist-packages/PyQt4* venv/lib/python2.7/site-packages/
(venv) jarvis@test2018:~/new_python_proj⟫ ln -s /usr/lib/python2.7/dist-packages/sip* venv/lib/python2.7/site-packages/
(venv) jarvis@test2018:~/new_python_proj⟫ python config-spring.py
(venv) jarvis@test2018:~/new_python_proj⟫
3.1.6 Specifying Python version
If you'd like to specify a particular Python version for the virtual
environment to use, simply use the -p
option when creating the virtual
environment. For example on my system I can use the following to enforce
Python 3:
virtualenv -p $(which python3) venv
3.2 Docker
Docker is a widely used tool for deploying complete software solutions across a wide range of hardware. Common applications for Docker include setting up build farms for cross-compiling, deploying web applications to cloud services (e.g. Google App Engine, AWS, Rackspace, Heroku), and setting up consistent isolated environments across systems. Quoting from the Docker Overview page (which is a great read):
Docker provides the ability to package and run an application in a loosely isolated environment called a container. The isolation and security allow you to run many containers simultaneously on a given host.
Basically that means we can create an "image" that contains a complete Linux distribution as well as all packages that we need, installed using any package manager, and across many different programming languages. Then we can start this image in a "container" and the processes running in the container are completely isolated from our host system (except where we explicitly allow interaction). Moreover, we can push images to online repositories and then pull them down at will on any machine. In this way, we can not only create isolated environments, but we can also share the environments across many machines.
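As a taste of what that workflow looks like in practice, here is a minimal, hedged sketch; the custom image and repository names are made up:

# download a complete Ubuntu image from an online registry
docker pull ubuntu:18.04
# start an isolated container from that image and open a shell inside it
docker run -it --rm ubuntu:18.04 /bin/bash
# share a customized image by tagging it and pushing it to a registry
docker tag my-custom-image myusername/my-custom-image:latest
docker push myusername/my-custom-image:latest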
Docker is very powerful, but it is also quite complicated. Learning all of the ins-and-outs is not a small task. If you are interested in just getting your feet wet and understanding the basics, I'd highly recommend reading and following along with Getting Started with Docker. When/if you are really faced with a situation where you need or want to roll a custom image you can do more research at that point. For now, I'd keep in mind that Docker exists and that it allows you to deploy nontrivial collections of Linux packages in a lightweight and computationally efficient manner.