Development

Koumbit developed several components to make this project work. Here’s an overview of the components:

Some of those components are more thoroughly described below.

Git-annex internals

This section shows some of the internals of git-annex and through that, explains some implementation decisions we have made regarding the way we use git-annex and how we communicate with it.

Fetching key names

We are optimising key lookups by bypassing the git-annex bootstrap and directly getting the information from git. So the command:

$ time git annex lookupkey films/Une\ contrehistoire\ de\ linternet.webm
SHA256E-s370358233--502d2cdbe609299f483c6172d7cc93a3be6e9057e007fd910da1f4f752a2ce27.webm
0.01user 0.01system 0:00.97elapsed 2%CPU (0avgtext+0avgdata 16288maxresident)k
26856inputs+0outputs (111major+1103minor)pagefaults 0swaps

simply becomes, with only git:

$ time basename $(readlink Une\ contrehistoire\ de\ linternet.webm)
SHA256E-s370358233--502d2cdbe609299f483c6172d7cc93a3be6e9057e007fd910da1f4f752a2ce27.webm
0.00user 0.00system 0:00.00elapsed ?%CPU (0avgtext+0avgdata 1576maxresident)k
0inputs+0outputs (0major+77minor)pagefaults 0swaps

This is much faster than the original and can be used directly on the website without caching. It is used to generate the S3 URL for remote viewing, by prefixing it with the S3 bucket name, to give, for example, the following URL:

http://s3.amazonaws.com/test/SHA256E-s370358233--502d2cdbe609299f483c6172d7cc93a3be6e9057e007fd910da1f4f752a2ce27.webm
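
As a rough illustration, the same lookup can be sketched in Python. The function names annex_key and s3_url are hypothetical and not taken from the Drupal module; the bucket name and endpoint are the ones from the example URL above.

```python
import os


def annex_key(path):
    """Return the git-annex key of an annexed file by reading its symlink.

    Assumes `path` is a git-annex symlink whose target ends with the key,
    which is what `basename $(readlink ...)` relies on.
    """
    return os.path.basename(os.readlink(path))


def s3_url(key, bucket, endpoint="http://s3.amazonaws.com"):
    """Build the remote viewing URL by prefixing the key with the bucket."""
    return "%s/%s/%s" % (endpoint, bucket, key)
```

Since no git-annex process is started, the cost is a single readlink() call, which is why no caching is needed.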

Location tracking

The next step is to figure out whether a given remote has a copy of the file. Here git-annex’s performance isn’t so great:

$ time git annex find --in 2f90b958-95e4-44e3-8d3b-e780b63936d1 Une\ contrehistoire\ de\ linternet.webm
Une contrehistoire de linternet.webm
0.18user 0.20system 0:07.19elapsed 5%CPU (0avgtext+0avgdata 31736maxresident)k
48336inputs+5952outputs (724major+10599minor)pagefaults 0swaps

It’s doing much more work here. Instead, we first look up the git-annex key using the previous procedure, then grep the git-annex branch using only git:

$ time sh -c 'file=SHA256E-s370358233--502d2cdbe609299f483c6172d7cc93a3be6e9057e007fd910da1f4f752a2ce27.webm; pref=$(printf "%s" "$file" | md5sum | sed "s/^\(...\)\(...\).*\$/\1\/\2/"); git cat-file -p "refs/heads/git-annex:$pref/$file.log" | grep 2f90b958-95e4-44e3-8d3b-e780b63936d1'
1407511627.234161s 1 2f90b958-95e4-44e3-8d3b-e780b63936d1
0.00user 0.00system 0:00.00elapsed 80%CPU (0avgtext+0avgdata 5788maxresident)k
0inputs+0outputs (0major+463minor)pagefaults 0swaps

Both this approach and the lookupkey mechanism have been reviewed upstream.
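
The path computation and the log check from the pipeline above can be sketched in Python as follows. This is only an illustration: location_log_path and is_present are hypothetical names, and the rule that the newest line wins when a UUID appears several times is our assumption about how the log should be read.

```python
import hashlib


def location_log_path(key):
    """Path of a key's location log inside the git-annex branch.

    Mirrors the md5sum/sed pipeline: the first six hex digits of the MD5
    of the key name form two three-character directory levels.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return "%s/%s/%s.log" % (digest[:3], digest[3:6], key)


def is_present(log_text, uuid):
    """Tell whether the location log says `uuid` holds the content.

    Each line looks like "<timestamp>s <status> <uuid>", status 1 meaning
    present. Assumption: the line with the newest timestamp wins.
    """
    latest = None
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[2] != uuid:
            continue
        stamp = float(fields[0].rstrip("s"))
        if latest is None or stamp >= latest[0]:
            latest = (stamp, fields[1] == "1")
    return latest is not None and latest[1]
```

The log text itself would still come from git cat-file -p git-annex:<path>, as in the shell example.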

Git-annex Drupal integration

We have built a Drupal module to integrate with git-annex. It is currently available for download at the Koumbit Redmine or with:

git clone git://git.koumbit.net/drupal-gitannex.git

There is built-in documentation, mostly in the gitannex.module file, that should be enough to get people started. The goal of the module is to allow Drupal to build URLs to the best location of a file. It does not handle adding files to or removing files from git-annex itself: this should be taken care of by an assistant running in the background. That assistant can be deployed as described in the Git-annex manual.

The module shouldn’t need any special configuration, once installed. There are two main entry points that should prove useful:

  • gitannex_get_internal_ip()
  • gitannex_get_preferred_url($file)

A copy of the module’s documentation is available below. More information is available directly in the source of the module, as PHP documentation strings.

gitannex_get_internal_ip

Find the internal IP of a media player

This is mainly used to determine if there is a media player available to the client.

This function will search for the IP of the currently connected client in the git-annex repository and find which remote has this IP mentioned in its remote.log file. It returns FALSE if there is no media player present; otherwise it returns an array of metadata about the remote server.

Wrapper around gitannex_get_remote() to easily get the internal IP of a given media player.

This more or less replaces the cachingserver_get_localserver() function in the previous API, but it talks to git-annex instead of the central server and returns only the IP address instead of a list. Use gitannex_get_remote() to get an associative array of media player properties (including the unique identifier and so on).

gitannex_get_preferred_url

Construct a valid URL for the given filename

This function will generate the best possible URL for a given uploaded file. It will look in the git-annex tracking information to see if the file is available in a nearby media player.

This replaces the cachingserver_get_url() function in the old 2.x API, with the difference that it treats all files equally and doesn’t accept restrictions such as “type” or “option”.
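
The selection logic described here can be pictured with the following Python sketch. It is not the module’s PHP code: preferred_url, its parameters and especially the local URL layout (http://<internal_ip>/<key>) are made up for illustration; only the S3 URL form comes from the key lookup section.

```python
def preferred_url(key, media_player, holders, bucket="test"):
    """Pick a URL for `key`: the nearby media player if it has a copy, else S3.

    media_player: dict with "uuid" and "internal_ip", or None when the
                  client has no media player nearby.
    holders:      set of remote UUIDs known to hold the key.
    The local URL layout below is a hypothetical placeholder.
    """
    if media_player and media_player["uuid"] in holders:
        return "http://%s/%s" % (media_player["internal_ip"], key)
    return "http://s3.amazonaws.com/%s/%s" % (bucket, key)
```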

Metadata sync script

The metadata sync script is a custom Python script that sends IP address information to the central server. The script is available in the gitannex Puppet module described above and can be easily deployed with the gitannex::metadata class.

The script writes the data in the remote.log file of the git-annex branch. The remote.log location was suggested in a discussion that took place upstream. That file then gets synced all around by the assistant, along with the other changes on the git-annex branch. The data currently synced is:

  • public IP address (external_ip_address field)
  • private IP address (internal_ip_address field)

This information is synced automatically by the git-annex assistant within around a minute after it is changed by the script, which runs every five minutes in a cron job configured by Puppet in the gitannex::metadata class.

The git-annex branch is written directly using the libgit2 Python bindings (pygit2). pygit2 was not available in Debian 7 “Wheezy”, so it required a significant backporting effort, including libgit2 and http-parser. pygit2 itself ended up not being backportable to “Wheezy” at all and is currently installed with pip through Puppet. See Redmine issue #17091 for more details.

The public IP is gleaned from public services, currently httpbin.org, ip.42.pl and ifconfig.me (in that order), with a one second timeout. If more privacy is desired or we get throttled, we can easily implement our own script to do this on the central server, but this is considered premature optimisation at this point. For now, changing the source of the public IP address means editing the script itself. A static IP can also be provided on the command line.
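
The fallback behaviour can be sketched as follows. This is not the deployed script: the exact URLs and response formats of the three services are assumptions, as are the names fetch, SOURCES and public_ip.

```python
import ipaddress
import json
import urllib.request

TIMEOUT = 1  # seconds, the one second timeout mentioned in the text


def fetch(url):
    """Download `url` and return the body as text, giving up after TIMEOUT."""
    with urllib.request.urlopen(url, timeout=TIMEOUT) as response:
        return response.read().decode("ascii", "replace").strip()


# Same order as in the text. URLs and answer formats are assumptions.
SOURCES = [
    lambda: json.loads(fetch("http://httpbin.org/ip"))["origin"],
    lambda: fetch("http://ip.42.pl/raw"),
    lambda: fetch("http://ifconfig.me/ip"),
]


def public_ip(sources=SOURCES, static_ip=None):
    """Return the first valid address from `sources`, or `static_ip` if given."""
    if static_ip is not None:
        return str(ipaddress.ip_address(static_ip))
    for source in sources:
        try:
            return str(ipaddress.ip_address(source()))
        except (OSError, ValueError, KeyError):
            # Timeout, HTTP error or malformed answer: try the next service.
            continue
    return None
```

Passing the sources as a list of callables keeps the order explicit and makes it easy to swap in a service of our own later.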

An IP address change should look something like this in the git history:

antoine@cs:/srv/gitannex-test$ git show git-annex
commit 7b21e94b8af7f914f65b3c9addad8a1f61f9be69
Author: Antoine Beaupré <anarcat@koumbit.org>
Date:   Mon Apr 6 17:29:20 2015 -0400

    saving metadata fields

diff --git a/remote.log b/remote.log
index 62d49da..7ad8d40 100644
--- a/remote.log
+++ b/remote.log
@@ -1 +1 @@
-d57de23d-0f38-4bef-b743-a9567beb853d external_ip=70.83.139.100 interna
+d57de23d-0f38-4bef-b743-a9567beb853d external_ip=127.0.0.1 internal_ip
antoine@cs:/srv/gitannex-test$ stat .git/objects/7b/21e94b8af7f914f65b3c9addad8a1f61f9be69
  File: `.git/objects/7b/21e94b8af7f914f65b3c9addad8a1f61f9be69'
  Size: 174             Blocks: 8          IO Block: 4096   regular file
Device: ca01h/51713d    Inode: 274888      Links: 1
Access: (0444/-r--r--r--)  Uid: (  999/gitannex)   Gid: (  999/gitannex)
Access: 2015-04-06 21:30:12.506830065 +0000
Modify: 2015-04-06 21:30:06.646904510 +0000
Change: 2015-04-06 21:30:06.646904510 +0000
 Birth: -

Notice how the change took less than a minute (46 seconds) to propagate to the central server. It is so fast because the media players and the central server are both running the assistant, so they are in a “connected” mode.

The presence of a media player on a given IP address can then be found with:

$ git cat-file -p git-annex:remote.log | grep 70.83.139.100
d57de23d-0f38-4bef-b743-a9567beb853d external_ip=70.83.139.100 internal_ip=192.168.20.108

This is effectively what the Drupal module does, more or less.
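
For illustration, here is a Python sketch of the same lookup, assuming remote.log lines have the form shown above (a UUID followed by key=value fields). The names parse_remote_log and find_remote_by_ip are hypothetical; the Drupal module itself is written in PHP.

```python
def parse_remote_log(text):
    """Parse remote.log lines of the form "<uuid> key=value key=value ..."."""
    remotes = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        remotes[fields[0]] = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
    return remotes


def find_remote_by_ip(text, client_ip):
    """Return (uuid, settings) of the remote whose external_ip matches, else None."""
    for uuid, settings in parse_remote_log(text).items():
        if settings.get("external_ip") == client_ip:
            return uuid, settings
    return None
```

Comparing the parsed field exactly avoids the loose matching of a plain grep, where the dots in the address match any character.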

Metadata purge script

The metadata purge script, described in Offline detection, is a Python script residing on the central server, in /usr/local/sbin/purge_stale_mps. It is run every minute through a cron job. Both the cron job and the script are deployed through the mediaplayers::purge Puppet class.

The script uses the same pygit2 library as the other metadata script, so the above comments about the backports and pip also apply here.

By default, the script looks in the remote.log file for entries having IP address information (the string external_ip=, more specifically) and then looks up the UUIDs of the media players through the PuppetDB REST API in order to find the last checkin time of each media player in Puppet. If the last checkin time is older than a certain timeout, the entries for the media player are removed from remote.log completely. The timeout is by default set to 35 minutes, to cover the regular 30 minute interval at which Puppet is run, plus 5 minutes for slower Puppet runs.
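
The expiry rule can be sketched like this. It is a simplified illustration, not the purge_stale_mps source: the PuppetDB query is replaced by a plain dictionary of checkin times, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# 30 minute Puppet interval plus 5 minutes of slack for slower runs.
DEFAULT_TIMEOUT = timedelta(minutes=35)


def expired_remotes(last_checkins, now, timeout=DEFAULT_TIMEOUT):
    """Return the UUIDs whose last Puppet checkin is older than `timeout`.

    last_checkins maps UUID -> datetime of the last checkin. In the real
    script that data comes from PuppetDB; here it is simply passed in.
    """
    return sorted(u for u, seen in last_checkins.items() if now - seen > timeout)


def purge_remote_log(text, expired):
    """Drop the remote.log lines belonging to the expired UUIDs."""
    kept = [line for line in text.splitlines()
            if line.split(" ", 1)[0] not in expired]
    return "".join(line + "\n" for line in kept)
```

With a timeout of zero every media player that has ever checked in counts as expired, which is a convenient way to force a purge during tests.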

The script can be run in --dryrun mode to simulate what it would do, during tests. An example run should look like this:

antoine@cs:~$ /usr/local/sbin/purge_stale_mps --repository /var/lib/git-annex/isuma-files/ --dryrun -v
found uuids in remote.log: ['a23c90e1-baf5-42d8-9bdf-c367eba3a4a8', '2d61a8de-a24e-44e3-9aa0-54f033fec1e9']
Starting new HTTP connection (1): localhost
Starting new HTTP connection (1): localhost
host koumbit-mp-test.office.koumbit.net age: 1:11:46.820404
Starting new HTTP connection (1): localhost
host mediaplayerv25n6.office.koumbit.net age: 0:01:28.489461
found expired remotes: [(u'2d61a8de-a24e-44e3-9aa0-54f033fec1e9', u'koumbit-mp-test.office.koumbit.net')]
rewriting remote.log to remove: [u'2d61a8de-a24e-44e3-9aa0-54f033fec1e9']
not generating commit because running in --drymode, expired: [(u'2d61a8de-a24e-44e3-9aa0-54f033fec1e9', u'koumbit-mp-test.office.koumbit.net')]

In the above case, it found the koumbit-mp-test.office.koumbit.net media player that is out of date (but didn’t remove its entry because of the --dryrun flag).

We could also have restricted the run to the other media player and changed the timeout to force a timeout:

antoine@cs:~$ /usr/local/sbin/purge_stale_mps --repository /var/lib/git-annex/isuma-files/ --dryrun -v --uuid a23c90e1-baf5-42d8-9bdf-c367eba3a4a8 --timeout 0
found uuids in remote.log: ['a23c90e1-baf5-42d8-9bdf-c367eba3a4a8']
Starting new HTTP connection (1): localhost
Starting new HTTP connection (1): localhost
host mediaplayerv25n6.office.koumbit.net age: 0:03:17.836652
found expired remotes: [(u'a23c90e1-baf5-42d8-9bdf-c367eba3a4a8', u'mediaplayerv25n6.office.koumbit.net')]
rewriting remote.log to remove: [u'a23c90e1-baf5-42d8-9bdf-c367eba3a4a8']
not generating commit because running in --drymode, expired: [(u'a23c90e1-baf5-42d8-9bdf-c367eba3a4a8', u'mediaplayerv25n6.office.koumbit.net')]

The commits are generated on the git-annex branch, and git annex sync is then called to make sure the synced/git-annex branch is updated as well.

Caution

We have sometimes had problems with the changes not propagating properly here, with the “union merge” driver of git-annex overriding our changes. This is a known issue, documented partly in Redmine issue #18262 and in the upstream issue “removing remote.log information completely”.

Debian packages

Caution

This entire section is deprecated. We are phasing out the use of Debian packages for now and progressively replacing them with Puppet manifests. Only the isuma-kiosk package remains now and will also eventually be replaced.

The Isuma Media Players make extensive use of Debian packaging to deploy software but also configuration and policy. This section describes how the packages are built and maintained.

Automated package build system

Isuma Debian packages are automatically built by Koumbit’s Jenkins server. The complete documentation about this server is available in the Koumbit wiki; this is only a summary applicable to Isuma packages.

When a change is pushed to the git repository of one of the Debian packages, the package is automatically rebuilt within an interval of around 15 minutes. The package is built within a Debian Wheezy environment and then uploaded into the Koumbit Debian archive, which is automatically signed.

Packages are uploaded to unstable by default. To migrate them to testing or stable, a manual operation must be performed on the Debian archive, a server to which only Koumbit personnel currently have access.

Automated package upgrades

Since isuma-local-servers 2.5.0, upgrades are automatically performed on all Media Players. This is done through the use of the unattended-upgrades package. Packages from the Koumbit archive and the main Debian archive are automatically updated. To update more packages automatically, create a new file in /etc/apt/apt.conf.d that specifies a new Origins-Pattern, which is appended to the existing list.

See /etc/apt/apt.conf.d/50unattended-upgrades or /usr/share/doc/unattended-upgrades/README for more information about this software.

Manually building a package

To build the current Debian packages by hand:

git clone gitolite@git.koumbit.net:isuma-local-servers.git
cd isuma-local-servers
git-buildpackage

To issue a new version, edit files, commit them, then bump the package version and rebuild:

edit file/foo.txt
git commit -m"update foo" file/foo.txt
dch -r -i "updating foo" # increments the version number and inserts an entry in debian/changelog
git-buildpackage # or debuild

If you want to make a hotfix for stable, make sure you pass -D stable to dch. The built package is then in .. or ../build-area.

To upload the package:

scp isuma-local-servers_* antoine@cs.isuma.tv:/var/www/debian/incoming

then on the central server:

sudo -u www-data reprepro -b /var/www/debian/ processincoming incoming

This is somewhat clunky, but it works.

Manually installing a package

Copy the package to the local server and run:

dpkg -i isuma-local-servers_<version>_all.deb

If it complains about some dependencies not being installed, run:

apt-get -f install

to install them.

After installing the package, you will need to perform a few additional steps:

# get the ssh private key for the site server and place it in ISUMA_ROOT with the name .id_rsa.
scp cachingserver@isuma.tv:/home/cachingserver/.ssh/id_rsa /var/isuma/.id_rsa
# (password in issue #187)

At this point you can check the logs in /var/isuma/log and make sure things are running properly.

Manually upgrading Media Players

Mass upgrades or installs can be performed with our scripts:

mp_ssh_config | grep Online
for s in mediaplayerv25n3 mediaplayerv25n4 mediaplayerv25n5; do mp_ssh_into $s apt-get update; done
for s in mediaplayerv25n3 mediaplayerv25n4 mediaplayerv25n5; do mp_ssh_into $s apt-get install isuma-local-servers; done

This should normally not be necessary as the Media Players are automatically upgraded.