Koumbit developed several components to make this project work. Here’s an overview of the components:
- Git-annex Drupal integration
- Metadata sync script
- Metadata purge script
- Git-annex Puppet module and facts
- Hard drive sync script
- Git-annex integrity check script, also in the git-annex puppet module
- http-parser and libgit2 official Debian 7 “Wheezy” backports to have
python-git2 working properly in Wheezy (required for the metadata sync script)
- vnstat Puppet facts
- improvements to the monkeysphere, sshd and apt shared modules
Some of those components are more thoroughly described below.
This section shows some of the internals of git-annex and through that, explains some implementation decisions we have made regarding the way we use git-annex and how we communicate with it.
Fetching key names
We are optimising key lookups by bypassing the git-annex bootstrap and directly getting the information from git. So the command:
$ time git annex lookupkey films/Une\ contrehistoire\ de\ linternet.webm
SHA256E-s370358233--502d2cdbe609299f483c6172d7cc93a3be6e9057e007fd910da1f4f752a2ce27.webm
0.01user 0.01system 0:00.97elapsed 2%CPU (0avgtext+0avgdata 16288maxresident)k
26856inputs+0outputs (111major+1103minor)pagefaults 0swaps
simply becomes, with only git:
$ time basename $(readlink Une\ contrehistoire\ de\ linternet.webm)
SHA256E-s370358233--502d2cdbe609299f483c6172d7cc93a3be6e9057e007fd910da1f4f752a2ce27.webm
0.00user 0.00system 0:00.00elapsed ?%CPU (0avgtext+0avgdata 1576maxresident)k
0inputs+0outputs (0major+77minor)pagefaults 0swaps
This is much faster than the original and can be used directly on the website without caching. It is used to generate the S3 URL for remote viewing, by prefixing it with the S3 bucket name, to give, for example, the following URL:
Next up is figuring out whether a given remote has a copy of the file. Here git-annex’s performance isn’t so great:
$ time git annex find --in 2f90b958-95e4-44e3-8d3b-e780b63936d1 Une\ contrehistoire\ de\ linternet.webm
Une contrehistoire de linternet.webm
0.18user 0.20system 0:07.19elapsed 5%CPU (0avgtext+0avgdata 31736maxresident)k
48336inputs+5952outputs (724major+10599minor)pagefaults 0swaps
It’s doing much more work here. Instead, we first look up the git-annex key using the previous procedure, then grep the git-annex branch using only git:
$ time sh -c "file=SHA256E-s370358233--502d2cdbe609299f483c6172d7cc93a3be6e9057e007fd910da1f4f752a2ce27.webm ; pref=$(printf $file | md5sum| sed 's/^\(...\)\(...\).*$/\1\/\2/'); git cat-file -p refs/heads/git-annex:$pref/$file.log | grep 2f90b958-95e4-44e3-8d3b-e780b63936d1"
1407511627.234161s 1 2f90b958-95e4-44e3-8d3b-e780b63936d1
0.00user 0.00system 0:00.00elapsed 80%CPU (0avgtext+0avgdata 5788maxresident)k
0inputs+0outputs (0major+463minor)pagefaults 0swaps
Both this approach and the lookupkey mechanism have been reviewed upstream.
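The two shortcuts above can be sketched in Python as follows. This is only an illustration, not the code used by the module or the scripts: the function names are hypothetical, the log path mirrors the md5sum and sed pipeline shown above, and reading the second log field as a presence flag (1 for present) is an assumption about the location log format that goes beyond the plain grep used above.

```python
import hashlib
import os
import subprocess


def key_from_symlink(path):
    """Return the git-annex key that an annexed file's symlink points to."""
    return os.path.basename(os.readlink(path))


def location_log_path(key):
    """Path of the key's location log inside the git-annex branch.

    Mirrors the shell pipeline: the first six hex digits of the key's
    MD5 sum, split into two three-character directories.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return "{}/{}/{}.log".format(digest[:3], digest[3:6], key)


def read_location_log(repository, key):
    """Read the location log with plain git, without starting git-annex."""
    ref = "refs/heads/git-annex:" + location_log_path(key)
    output = subprocess.check_output(["git", "cat-file", "-p", ref], cwd=repository)
    return output.decode("utf-8")


def remote_has_key(log_text, uuid):
    """Return True if the newest log line for uuid marks the key as present.

    Assumes lines of the form "<timestamp>s <0|1> <uuid>", as in the
    output shown above.
    """
    latest = None
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == uuid:
            timestamp = float(fields[0].rstrip("s"))
            if latest is None or timestamp >= latest[0]:
                latest = (timestamp, fields[1] == "1")
    return latest is not None and latest[1]
```

A caller would typically chain these: `remote_has_key(read_location_log(repo, key_from_symlink(path)), uuid)`.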
Git-annex Drupal integration
We have built a Drupal module to integrate with git-annex. It is currently available for download at the Koumbit Redmine or:
git clone git://git.koumbit.net/drupal-gitannex.git
There is builtin documentation, mostly in the
that should be good to get people started. The goal of the module is
to allow Drupal to build URLs to the best location of a file. It does
not handle adding or removing files into git-annex itself: this
should be taken care of by an assistant running in the
background. That assistant can be deployed as described in the
The module shouldn’t need any special configuration, once installed. There are two main entry points that should prove useful:
A copy of the module’s documentation is available below. More information is available directly in the source of the module, as PHP documentation strings.
Find the internal IP of a media player
This is mainly used to determine if there is a media player available to the client.
This function searches the git-annex repository for the IP of the
currently connected client and finds which remote has this IP mentioned
in the remote.log file. It returns
FALSE if there is no media
player present; otherwise it returns an array of metadata about
the remote server.
gitannex_get_remote() to easily get the internal IP
of a given media player.
This more or less replaces the
function in the previous API, but this talks to git-annex instead of
the central server and returns only the IP address instead of a
gitannex_get_remote() to get an associative array of media
players properties (including unique identifier and so on).
Construct a valid URL for the given filename
This function will generate the best possible URL for a given uploaded file. It will look in the git-annex tracking information to see if the file is available in a nearby media player.
This replaces the
cachingserver_get_url() function in the old 2.x
API, with the difference that it treats all files equally and doesn’t
accept restrictions such as “type” or “option”.
Metadata sync script
There is a metadata sync script that sends IP address information to
the central server with a custom Python script. The script is
available in the
gitannex Puppet module described above and can be
easily deployed with the
The script writes the data in the remote.log file of the git-annex
branch. A discussion also took place upstream,
where the remote.log location was suggested. That file then gets
synced all around by the assistant, along with the other changes on the
git-annex branch. The data currently synced is:
- public IP address (external_ip_address field)
- private IP address (internal_ip_address field)
This information is synced automatically by the git-annex assistant within around a minute after it is changed by the script, which runs every five minutes in a cron job configured by Puppet in the gitannex::metadata class.
The git-annex branch is written directly using the libgit2 Python
bindings (pygit2). pygit2 was not available
in Debian 7 “Wheezy”, so it required a significant backporting effort, including
libgit2 and http-parser. pygit2 itself ended up not being
backportable to “Wheezy” at all and is currently installed with
pip through Puppet. See Redmine issue #17091
for more details.
The public IP is gleaned from public services, currently httpbin.org, ip.42.pl and ifconfig.me (in that order), with a one-second timeout. If more privacy is desired or we get throttled, we can easily implement our own script to do this on the central server, but this is considered premature optimisation at this point. For now, changing the source of the public IP address requires editing the script itself. A static IP can also be provided on the command line.
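The fallback order described above could look roughly like the sketch below. It is not the actual sync script: the function names are hypothetical, and the exact endpoint paths (`/ip`, `/raw`) and the JSON `origin` field are assumptions about how those public services answer.

```python
import ipaddress
import json
import urllib.request

TIMEOUT = 1.0  # seconds; the one-second timeout mentioned above


def fetch_text(url, timeout=TIMEOUT):
    """Fetch a URL and return its body as stripped text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("ascii", "replace").strip()


def from_httpbin():
    # Assumed endpoint, expected to answer with JSON containing "origin".
    return json.loads(fetch_text("https://httpbin.org/ip"))["origin"]


def from_42pl():
    # Assumed endpoint, expected to answer with the bare address.
    return fetch_text("http://ip.42.pl/raw")


def from_ifconfig_me():
    # Assumed endpoint, expected to answer with the bare address.
    return fetch_text("https://ifconfig.me/ip")


SOURCES = [from_httpbin, from_42pl, from_ifconfig_me]


def public_ip(sources=SOURCES, static_ip=None):
    """Return the first valid address, or None if every source fails.

    A static address, when given, short-circuits the lookup entirely.
    """
    if static_ip is not None:
        return str(ipaddress.ip_address(static_ip))
    for source in sources:
        try:
            return str(ipaddress.ip_address(source().strip()))
        except (OSError, ValueError, KeyError, TypeError):
            # Network error, timeout, or an unusable answer: try the next one.
            continue
    return None
```

Passing the sources as a list keeps the order explicit and makes it easy to swap in a lookup hosted on the central server later.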
An IP address change should look something like this in the git history:
antoine@cs:/srv/gitannex-test$ git show git-annex
commit 7b21e94b8af7f914f65b3c9addad8a1f61f9be69
Author: Antoine Beaupré <email@example.com>
Date:   Mon Apr 6 17:29:20 2015 -0400

    saving metadata fields

diff --git a/remote.log b/remote.log
index 62d49da..7ad8d40 100644
--- a/remote.log
+++ b/remote.log
@@ -1 +1 @@
-d57de23d-0f38-4bef-b743-a9567beb853d external_ip=220.127.116.11 interna
+d57de23d-0f38-4bef-b743-a9567beb853d external_ip=127.0.0.1 internal_ip
antoine@cs:/srv/gitannex-test$ stat .git/objects/7b/21e94b8af7f914f65b3c9addad8a1f61f9be69
  File: `.git/objects/7b/21e94b8af7f914f65b3c9addad8a1f61f9be69'
  Size: 174 Blocks: 8 IO Block: 4096 regular file
Device: ca01h/51713d Inode: 274888 Links: 1
Access: (0444/-r--r--r--) Uid: ( 999/gitannex) Gid: ( 999/gitannex)
Access: 2015-04-06 21:30:12.506830065 +0000
Modify: 2015-04-06 21:30:06.646904510 +0000
Change: 2015-04-06 21:30:06.646904510 +0000
 Birth: -
Notice how the change took less than a minute (46 seconds) to propagate to the central server. It is so fast because the media players and the central server are both running the assistant, so they are in a “connected” mode.
The presence of a media player on a given IP address can then be found with:
$ git cat-file -p git-annex:remote.log | grep 18.104.22.168
d57de23d-0f38-4bef-b743-a9567beb853d external_ip=22.214.171.124 internal_ip=192.168.20.108
This is effectively what the Drupal module does, more or less.
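As a rough illustration of that lookup, here is a Python sketch that parses remote.log lines into dictionaries and searches them by IP. It is not the Drupal module's code (which is PHP): the function names are hypothetical, and it assumes each line is a UUID followed by whitespace-separated `key=value` pairs, as in the output above.

```python
def parse_remote_log(text):
    """Map each remote UUID to a dict of its key=value metadata."""
    remotes = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        uuid, pairs = fields[0], fields[1:]
        remotes[uuid] = dict(pair.split("=", 1) for pair in pairs if "=" in pair)
    return remotes


def find_remote_by_ip(text, ip, field="external_ip"):
    """Return the metadata of the remote whose field equals ip, or None."""
    for uuid, metadata in parse_remote_log(text).items():
        if metadata.get(field) == ip:
            return dict(metadata, uuid=uuid)
    return None
```

Unlike the grep above, this compares one whole field, so an address cannot match by accident as a substring of another address or of a different field.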
Metadata purge script
The metadata purge script, described in
Offline detection, is a Python script residing on the
central server, in
/usr/local/sbin/purge_stale_mps. It is run
every minute through a cron job. Both the cron job and the script are
deployed through the
mediaplayers::purge Puppet class.
The script uses the same
pygit2 library as the other metadata
script, so the above comments about the backports and pip also apply
here.
By default, the script looks in the
remote.log file for entries
having IP address information (the string
specifically) and then looks up the
UUIDs of the media player
through the PuppetDB REST API in order to find the last
check-in time of the media player in Puppet. If the last check-in time
is older than a certain timeout, the entries for the media player are
removed from remote.log completely. The timeout is by default set
to 35 minutes, to cover the regular 30-minute interval at which Puppet is
run, plus 5 minutes for slower Puppet runs.
The script can be run in
--dryrun mode to simulate what it would
do, during tests. An example run should look like this:
antoine@cs:~$ /usr/local/sbin/purge_stale_mps --repository /var/lib/git-annex/isuma-files/ --dryrun -v
found uuids in remote.log: ['a23c90e1-baf5-42d8-9bdf-c367eba3a4a8', '2d61a8de-a24e-44e3-9aa0-54f033fec1e9']
Starting new HTTP connection (1): localhost
Starting new HTTP connection (1): localhost
host koumbit-mp-test.office.koumbit.net age: 1:11:46.820404
Starting new HTTP connection (1): localhost
host mediaplayerv25n6.office.koumbit.net age: 0:01:28.489461
found expired remotes: [(u'2d61a8de-a24e-44e3-9aa0-54f033fec1e9', u'koumbit-mp-test.office.koumbit.net')]
rewriting remote.log to remove: [u'2d61a8de-a24e-44e3-9aa0-54f033fec1e9']
not generating commit because running in --drymode, expired: [(u'2d61a8de-a24e-44e3-9aa0-54f033fec1e9', u'koumbit-mp-test.office.koumbit.net')]
In the above case, it found the
koumbit-mp-test.office.koumbit.net
media player that is out of date (but didn’t remove its entry because
it was running in --dryrun mode).
We could also have restricted the run to the other media player and set the timeout to zero to force it to expire:
antoine@cs:~$ /usr/local/sbin/purge_stale_mps --repository /var/lib/git-annex/isuma-files/ --dryrun -v --uuid a23c90e1-baf5-42d8-9bdf-c367eba3a4a8 --timeout 0
found uuids in remote.log: ['a23c90e1-baf5-42d8-9bdf-c367eba3a4a8']
Starting new HTTP connection (1): localhost
Starting new HTTP connection (1): localhost
host mediaplayerv25n6.office.koumbit.net age: 0:03:17.836652
found expired remotes: [(u'a23c90e1-baf5-42d8-9bdf-c367eba3a4a8', u'mediaplayerv25n6.office.koumbit.net')]
rewriting remote.log to remove: [u'a23c90e1-baf5-42d8-9bdf-c367eba3a4a8']
not generating commit because running in --drymode, expired: [(u'a23c90e1-baf5-42d8-9bdf-c367eba3a4a8', u'mediaplayerv25n6.office.koumbit.net')]
The commits are generated on the
git-annex branch, and
sync is then called to make sure the
This entire section is deprecated. We are phasing out the use of
Debian packages for now and progressively replacing them with
Puppet manifests. Only the
isuma-kiosk package remains now and
will also eventually be replaced.
The Isuma Media Players make extensive use of Debian packaging to deploy not only software but also configuration and policy. This section describes how the packages are built and maintained.
Automated package build system
Isuma Debian packages are automatically built by Koumbit’s Jenkins server. The complete documentation about this server is available in the Koumbit wiki; this is only a summary applicable to Isuma packages.
When a change is pushed to the git repository of one of the Debian packages, the package is automatically rebuilt within around 15 minutes. It is built within a Debian Wheezy environment and then uploaded to the Koumbit Debian archive, which is automatically signed.
Packages are uploaded to
unstable by default. To migrate them to
stable, a manual operation
must be performed on the Debian archive, a server to which only Koumbit
personnel currently have access.
Automated package upgrades
Since isuma-local-servers 2.5.0, upgrades are automatically
performed on all Media Players. This is done through the use of the
unattended-upgrades package. Packages
from the Koumbit archive and the main Debian archive are automatically
updated. To update more packages automatically, create a new file in
/etc/apt/apt.conf.d that specifies a new
Origins-Pattern that is
appended to the existing list.
See /usr/share/doc/unattended-upgrades/README for more information
about this software.
Manually building a package
To build the current Debian packages by hand:
git clone firstname.lastname@example.org:isuma-local-servers.git
cd isuma-local-servers
git-buildpackage
To issue a new version, edit files, commit them, then bump the package version and rebuild:
edit file/foo.txt
git commit -m"update foo" file/foo.txt
dch -r -i "updating foo" # increments the version number and inserts a commit in debian/changelog
git-buildpackage # or debuild
Make sure you use -D stable, if you want to make a hotfix for
stable. Package is now in
To upload the package:
scp isuma-local-servers_* email@example.com:/var/www/debian/incoming
then on the central server:
sudo -u www-data reprepro -b /var/www/debian/ processincoming incoming
This is kind of clunky, but it works.
Manually installing a package
Copy the package to the local server and run:
dpkg -i isuma-local-servers_<version>_all.deb
If it complains about some dependencies not being installed, run:
to install them.
After installing the package, you will need to perform a few additional steps:
# get the ssh private key for the site server and place it in ISUMA_ROOT with the name .id_dsa.
scp firstname.lastname@example.org:/home/cachingserver/.ssh/id_rsa /var/isuma/.id_rsa # (password in issue #187)
At this point you can check the logs in /var/isuma/log and make sure things are running properly.
Manually upgrading Media Players
Mass upgrades or installs can be performed with our scripts:
mp_ssh_config | grep Online
for s in mediaplayerv25n3 mediaplayerv25n4 mediaplayerv25n5; do mp_ssh_into $s apt-get update; done
for s in mediaplayerv25n3 mediaplayerv25n4 mediaplayerv25n5; do mp_ssh_into $s apt-get install isuma-local-servers; done
This should normally not be necessary as the Media Players are automatically upgraded.