Troubleshooting

Test procedure

When a new media player is installed, it needs to be thoroughly tested. This procedure can also be used on existing media players to diagnose problems.

  1. ping/pong test
  2. new videos are downloaded on media players
  3. URL rewriting for recent and old videos is performed correctly on the website
  4. upload videos larger than 8MB without errors
  5. git-annex syncs metadata (files added, removed) with website
  6. git-annex uploads files to central server, and eventually to S3
  7. files uploaded by a media player are eventually transcoded and redistributed to other media players

Basics

Queue is full but media player sees it empty

If the queue is full of good stuff to download but the media player sees it as empty, the schedule may be too restrictive. Try disabling the schedule on the central server and try again.

Password resets

If the password for the media player is lost, it can be recovered by rebooting in a special Linux mode. See Koumbit’s documentation for that purpose.

This technique is difficult and error-prone, and should be considered a last resort, used only if other techniques do not work or are unavailable.

This technique is known as “booting into /bin/sh as init(8)”.

  1. reboot the machine (press control-alt-delete, or hold the power button for 5 seconds and then press it again)

  2. you will then see the BIOS screen flash by quickly, then the GRUB menu, which should be shown for a few seconds: quickly hit the shift key to disable the timeout.

  3. hit the e key to edit the menu item

  4. you are now in an editor: using the arrow keys, navigate to the end of the line starting with linux

  5. append init=/bin/sh to the line

  6. hit control-x to boot the resulting configuration

  7. you should end up at a commandline prompt; enter the following commands in order (do not type what is after the # symbol):

    mount -w -n -o remount / # -w read/write, -n don't write /etc/mtab
    mount /usr               # in case passwd is on /usr/bin
    /usr/bin/passwd          # set the new root password
    sync
    umount /usr
    sync
    sync                     # it cannot hurt
    umount -a                # will mount / read only
    reboot -nf               # -n don't sync or write anything, -f don't call shutdown
    
  8. the machine should reboot with the new root password

Logging in to media players on the console

Some media players are in “Kiosk” mode by default, which makes it difficult to diagnose problems or see what is going on. A Linux console should be available if you type control-alt-F2. Then log in with your user account.

Note

See Creating user accounts for information on how to grant access to users.

Remote login to media players

Note

This system replaces the old isuma-autossh package. Some media players may still use the old system. Refer to the 2.x documentation for those.

Each media player is configured to automatically log in to the central server with a reverse SSH tunnel. The tunnel should always be up if the media player is online, as it is supervised by the autossh command (which we are looking at replacing, see Redmine issue #17967).

This tunnel allows operators to log in to the media player over SSH, regardless of the firewall rules or network configuration at the other end.

By default, each media player has a random port assigned, but one can also be defined in the Puppet manifests. In any case, the most reliable way to find the port of a given media player is with lsof:

antoine@cs:~$ sudo lsof -c ssh -a -i 4TCP:22000-23000 -s TCP:LISTEN -P -a -u host-mp20120507-1.mp.isuma.tv
COMMAND  PID                          USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
sshd    5025 host-mp20120507-1.mp.isuma.tv    7u  IPv4 70526409      0t0  TCP *:22529 (LISTEN)

Note

The above media player name is from the standard hostname configuration. You may need to login to the Puppet Dashboard or look in the Puppet manifests to find that name.

In the above example, you can connect to the media player host-mp20120507-1.mp.isuma.tv on port 22529 on cs.isuma.tv. So let’s try that:

$ ssh -p 22529 -l antoine cs.isuma.tv
antoine@localhost's password:

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Could not chdir to home directory /home/antoine: No such file or directory
antoine@koumbit-mp-test:/$

The last warning is not important: it just indicates that my user has no home directory. Also note that you can log in to the media player from anywhere, not just from the central server.

I am now logged into the media player and can do various maintenance tasks.
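The port lookup above can be scripted. The following is a minimal sketch, not part of the existing media player tooling: tunnel_ports is a hypothetical shell function that reads lsof output in the format shown above and prints the port from the NAME column (*:22529), assuming that column is immediately followed by (LISTEN).

```shell
# Hypothetical helper: print the listening port(s) found in lsof output read
# on stdin, e.g. "sshd 5025 ... TCP *:22529 (LISTEN)" -> 22529.
tunnel_ports() {
    # NR > 1 skips the lsof header line. $(NF-1) is the NAME column ("*:22529")
    # because the last column is "(LISTEN)"; sub() strips up to the last colon.
    awk 'NR > 1 { sub(/.*:/, "", $(NF-1)); print $(NF-1) }'
}
```

For example, on the central server, sudo lsof -c ssh -a -i 4TCP:22000-23000 -s TCP:LISTEN -P -a -u host-mp20120507-1.mp.isuma.tv | tunnel_ports should print 22529 for the media player above, which can then be passed to ssh -p.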

Note

See Creating user accounts for information on how to grant access to users.

Inspecting status from the commandline

Note

This section is kept for historical purposes only.

There used to be a way to list media players from the commandline, but this has not been ported to the new system. Access the Puppet Dashboard to see a listing of media players.

Git-annex

Git-annex is an extension to the Git source control management software that allows you to store large files in git, and also to manage multiple repositories across many different storage backends. It was chosen because it supports S3, metadata and lots of other things; see the similar projects section for more information about why git-annex was chosen.

It is used in the media players project to keep track of files across all media players and sync them to S3, but also to minimally track the media players' locations so that the main website can determine whether a given file can be served through S3 or a local media player. See the Architecture overview for more information about this.

In a git-annex repository, files are stored as symbolic links pointing to the real file, which resides in the .git/annex/objects directory. The .git/annex directory is well described in the upstream “internals” documentation, but basically, a file can be present in some or all repositories, and git-annex tracks where the files are actually located in a special git branch called the git-annex branch.

The git-annex assistant is used to automatically manage the files (addition, removals, synchronisation to S3, etc).
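Since each annexed file is a symlink whose target ends with the git-annex key, the key can be read even when the content is not present locally. As a sketch, assuming the SHA256E-sSIZE--HASH key layout visible in the symlink targets elsewhere in this document, the hypothetical function annex_key_info below prints the key a file points to and the size in bytes encoded in it:

```shell
# Hypothetical helper: print the git-annex key a symlinked file points to,
# followed by the size in bytes encoded in the key (or "unknown").
annex_key_info() {
    target=$(readlink "$1") || { echo "not a symlink: $1" >&2; return 1; }
    # The key is the last path component of the symlink target.
    key=$(basename "$target")
    # Assumed layout: BACKEND-sSIZE--HASH[.ext], e.g. SHA256E-s151210348--f43d...mov
    size=$(printf '%s\n' "$key" | sed -n 's/^[A-Z0-9]*-s\([0-9]*\)-.*/\1/p')
    printf '%s %s\n' "$key" "${size:-unknown}"
}
```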

Caveats

Note that recent git-annex releases need a fairly recent version of git, at least 1.8.1. If you are running the git-annex binary directly, this is not a problem: the standalone version ships with its own copy of git, and the Debian package ensures dependencies are properly satisfied. But if you run git annex (i.e. first call git and use the annex subcommand), you may end up using the system's older version of git, which can cause problems.

The workaround is to use the absolute path to the git binary distributed in the standalone package. It can be in /usr/lib/git-annex.linux/exec/git, /opt/git-annex.linux/git or /usr/local/bin/git. Use the following command to figure out the git version:

$ git --version
git version 2.1.4

Above, the git version is 2.1.4, which is newer than 1.8.1, so there is no problem.
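The comparison can also be automated. This is a sketch only: version_at_least and check_git_version are hypothetical helper names, and the version comparison relies on sort -V (version sort) being available, as it is in GNU coreutils.

```shell
# Succeed if version $1 is at least version $2 (e.g. 2.1.4 >= 1.8.1).
# Relies on sort -V (version sort), available in GNU coreutils.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Check a git binary (default: the first git in $PATH) against the 1.8.1 minimum.
check_git_version() {
    git_bin="${1:-git}"
    # "git version 2.1.4" -> "2.1.4"
    current=$("$git_bin" --version | awk '{ print $3 }')
    if version_at_least "$current" 1.8.1; then
        echo "$git_bin $current is recent enough"
    else
        echo "$git_bin $current is older than 1.8.1" >&2
        return 1
    fi
}
```

For example, check_git_version /opt/git-annex.linux/git checks the copy shipped with the standalone package, while check_git_version with no argument checks whichever git comes first in $PATH.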

Troubleshooting stuck queues

  1. log in to the server (using the procedure above)
  2. become the proper user (su www-data -s /bin/bash)
  3. look at the git-annex logfiles (/var/isuma/git-annex/.git/annex/daemon.log*)
  4. if nothing obvious shows up, run git annex sync by hand

Diagnostics on the git-annex assistant

The assistant is run automatically on the media players. It is configured through Puppet in the gitannex::daemon class. It can be stopped and restarted using a fairly standard system-level startup script:

service git-annex stop
service git-annex start
service git-annex status

When it is running, the logs are in the git annex repository, for example in /var/isuma/media/video/.git/annex/daemon.log.

A rough idea of the state of the assistant can also be found in .git/annex/daemon.status. For example, while the assistant is starting up, it will look like this:

root@koumbit-mp-test:/var/isuma/git-annex# cat .git/annex/daemon.status
lastRunning:1434577176.692046s
scanComplete:False
sanityCheckRunning:False
lastSanityCheck:
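To read that file more easily, the sketch below summarises it. annex_status_summary is a hypothetical helper; it assumes the key:value layout shown above (with lastRunning as a Unix timestamp) and uses GNU date to convert that timestamp.

```shell
# Hypothetical helper: summarise a git-annex assistant status file such as
# .git/annex/daemon.status (key:value lines, as shown above).
annex_status_summary() {
    status_file="${1:-.git/annex/daemon.status}"
    # lastRunning is a Unix timestamp with a fractional part and a trailing "s";
    # keep only the integer seconds.
    last=$(sed -n 's/^lastRunning:\([0-9]*\).*/\1/p' "$status_file")
    scan=$(sed -n 's/^scanComplete://p' "$status_file")
    sanity=$(sed -n 's/^sanityCheckRunning://p' "$status_file")
    if [ -n "$last" ]; then
        # GNU date syntax: -d @EPOCH
        echo "last running: $(date -u -d "@$last" '+%Y-%m-%d %H:%M:%S UTC')"
    fi
    echo "scan complete: ${scan:-unknown}"
    echo "sanity check running: ${sanity:-unknown}"
}
```

Run it from the repository (for example /var/isuma/git-annex) with no argument, or pass the path to another status file.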

Stopping transfers

To make the assistant stop doing transfers, you can use the annex-ignore setting for a given remote. For example, to stop downloading from S3, you can use:

git config remote.s3.annex-ignore true

Also note that some old URLs are still stuck in the git history, so you will probably need to disable the web remote as well:

git config remote.web.annex-ignore true

See Redmine issue #17958 for more information about this.

The assistant may need to be restarted for those changes to take effect.
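When toggling these settings back and forth, a small helper avoids typos in the configuration keys. This is a sketch: annex_ignore_remotes is a hypothetical function name, and it defaults to the s3 and web remotes mentioned above.

```shell
# Hypothetical helper: set remote.<name>.annex-ignore for one or more remotes.
# Usage: annex_ignore_remotes true|false [remote...]
annex_ignore_remotes() {
    value="$1"
    case "$value" in
        true|false) ;;
        *) echo "usage: annex_ignore_remotes true|false [remote...]" >&2; return 2 ;;
    esac
    shift
    # Default to the two remotes discussed above when none are given.
    if [ "$#" -eq 0 ]; then
        set -- s3 web
    fi
    for remote in "$@"; do
        git config "remote.$remote.annex-ignore" "$value" || return 1
    done
}
```

Run it as the user owning the repository, from inside it (annex_ignore_remotes true to stop transfers, annex_ignore_remotes false to resume them), then restart the assistant with service git-annex restart.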

Media player not detected

If a media player running git-annex is not detected when visiting the website, the website will not load videos from it. Everything will be very slow or unusable for the users of the media player, and the green ball confirming that the media player is detected will not show on the main website.

Note

For the more trivial cases, you may want to start from the bottom of this list.

  1. To diagnose this, first make sure the media player has the cronjob configured:

    # crontab -u www-data -l
    # HEADER: This file was autogenerated at 2015-07-22 16:08:27 -0400 by puppet.
    # HEADER: While it can still be managed manually, it is definitely not recommended.
    # HEADER: Note particularly that the comments starting with 'Puppet Name' should
    # HEADER: not be deleted, as doing so could cause duplicate cron jobs.
    # Puppet Name: metadata
    */5 * * * * /usr/local/bin/save_repo_metadata --repository /var/isuma/git-annex
    

    The above is the cron job deployed by Puppet in the gitannex::metadata class.

  2. Then you can try to run the cron job by hand:

    sudo -u www-data /usr/local/bin/save_repo_metadata --repository /var/isuma/git-annex
    

    and see if any errors show up. You can add --verbose for more information:

    # sudo -u www-data /usr/local/bin/save_repo_metadata --repository /var/isuma/git-annex --verbose
    no change detected in IP addresses ({'external_ip': u'70.83.139.100', 'internal_ip': '192.168.20.227'}), nothing committed
    

    Note

    In the above example, the IP address hasn’t changed since the last run. If the IP address changed, you would get something like this:

    # sudo -u www-data /usr/local/bin/save_repo_metadata --repository /var/isuma/git-annex -v
    saved metadata {'external_ip': u'70.83.139.100', 'internal_ip': '192.168.20.227'} into git-annex commit 2a152045d43630c60595a27c557344350960d6f1
    

    Passing --verbose twice (--verbose --verbose) will also output debugging information, including the IP address discovery, the changes to the content of the remote.log file and so on.

    You can use the --external-ip and --internal-ip arguments to bypass the detection code if IP address detection is the part that fails.

  3. Those changes should show up in the remote.log file in the git-annex branch. You can inspect the branch on the media player with the command:

    cd /var/isuma/git-annex
    git cat-file -p git-annex:remote.log
    

    You should see a line like:

    2d61a8de-a24e-44e3-9aa0-54f033fec1e9 external_ip=70.83.139.100 internal_ip=192.168.20.227
    
  4. Then run the same command on the central server; the branch should be synchronised automatically by the assistant on both sides:

    cd /var/lib/git-annex/isuma-files
    git cat-file -p git-annex:remote.log
    
  5. Finally, also run this on the main website:

    cd /persistent/media
    git cat-file -p git-annex:remote.log
    
  6. The output should be the same on all three machines. If not, run git annex sync on the machine that doesn't have the right version to force a synchronisation of the branches.
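Comparing the raw output by eye across three machines is tedious. As a sketch, the hypothetical mp_ips function below reads remote.log content on stdin and prints only the IP fields recorded for one repository UUID, assuming the line format shown in step 3:

```shell
# Hypothetical helper: print the *_ip=VALUE fields recorded in remote.log for
# one repository UUID. Reads remote.log content on stdin.
mp_ips() {
    uuid="$1"
    # Keep the last line for that UUID, split it into fields, keep the IP ones.
    grep "^$uuid " | tail -n 1 | tr ' ' '\n' | grep '_ip='
}
```

For example, git cat-file -p git-annex:remote.log | mp_ips 2d61a8de-a24e-44e3-9aa0-54f033fec1e9 can be run on the media player, the central server and the main website. Piping the same git cat-file output through sha256sum instead is a quick way to check whether the whole files match.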

Offline media player not detected

If a media player is offline but is still seen as online, it is possible that the purge script has not timed it out yet. To purge it, use:

/usr/local/sbin/purge_stale_mps --repository /var/lib/git-annex/isuma-files/ -v --uuid a23c90e1-baf5-42d8-9bdf-c367eba3a4a8 --timeout 0

See Metadata purge script for more details.

Unblocking the assistant

It seems the assistant on the main website sometimes stops adding and moving files to S3. This procedure bypasses the assistant and manages files by hand on the main website.

Note

Permissions are important here! Run this as the user that owns the git-annex repository, for example:

sudo -u www-data -H <command>

or:

sudo -u www-data -i

  1. inspect the status of the git repository:

    www-data@ip-10-87-135-88:/persistent/media$ git status
    # On branch master
    # Changes to be committed:
    #   (use "git reset HEAD <file>..." to unstage)
    #
    #       new file:   video/original/ruth_mc_5_revisited.mov
    #       new file:   video/small/nitvyouthshow.mov.jpg
    #
    # Untracked files:
    #   (use "git add <file>..." to include in what will be committed)
    #
    #       video/large/2005kaugjajjuk.mov.jpg
    #       video/large/2005kaugjajjuk2.mov.jpg
    #       video/large/cofounderthoughtsfrance.mov.jpg
    #       video/large/essakaneroughedit.mov.jpg
    #       video/large/essakaneroughedit2.mov.jpg
    #       video/large/europeartcirqfounders.mov.jpg
    #       video/large/fibonaccimexicoandessakane.mov.jpg
    #       video/large/fibonaccimexicoroughedit.mov.jpg
    #       video/large/highschoolchristmasfoodbank.mov.jpg
    #       video/large/igloolikhighschoolchristmasconcert.mov.jpg
    #       video/large/iglooliktofrance.mov.jpg
    [...]
    

    Here you can see files were added to the git staging area (the “index”), presumably by git annex add, but were never committed. You can inspect those changes with:

    $ git diff --cached
    diff --git a/video/original/ruth_mc_5_revisited.mov b/video/original/ruth_mc_5_revisited.mov
    new file mode 120000
    index 0000000..9834110
    --- /dev/null
    +++ b/video/original/ruth_mc_5_revisited.mov
    @@ -0,0 +1 @@
    +../../.git/annex/objects/wK/Gp/SHA256E-s151210348--f43d44e93d6523728baee2acfce6a3a7a819e68a05299e28bd3c9b60522ed2ca.mov/SHA256E-s151210348--f4
    \ No newline at end of file
    
  2. the untracked files need to be inspected: we have sometimes seen files that are symlinks, as created by git-annex, but that were not staged for commit. Run this to list the files:

      $ git status --porcelain | sed -n 's/^?? //p' | xargs ls -l
      -rw-rw-r-- 1 www-data www-data      26857 Jun 22 13:51 video/large/2005kaugjajjuk2.mov.jpg
      -rw-rw-r-- 1 www-data www-data      24108 Jun 22 14:25 video/large/2005kaugjajjuk.mov.jpg
      -rw-rw-r-- 1 www-data www-data      23231 Jun 22 15:26 video/large/cofounderthoughtsfrance.mov.jpg
      -rw-rw-r-- 1 www-data www-data      17567 Jun 22 17:30 video/large/essakaneroughedit2.mov.jpg
      [...]
    
    Here you can see the files are not symlinks, which is fine.

    Note

    If symlinks were found above, we could have added them directly with:

    git add <symlink>

    Be careful here: adding a non-symlinked file will create major performance issues, so make sure the file is a symlink if you git add it by hand.
    
  3. all untracked files (after the symlink cleanup above) can then be added with git annex add:

    git annex add .
    
  4. then all of this can be committed into git:

    git commit -m"add uncommitted files by hand"
    
  5. files should then be moved to S3 by hand, since the assistant may not pick them up properly:

    git annex move --to s3
    

Now the files are properly moved to S3. You will probably want to restart the assistant to fix whatever was broken there:

sudo service git-annex restart

You should also file a bug on the upstream bugtracker to describe the problem that caused this in the first place.
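The symlink check from step 2 above can also be done in one pass. The sketch below is a hypothetical helper (list_untracked_by_type is not an existing command): it uses git ls-files --others --exclude-standard, which lists only untracked files, and assumes file names contain no newlines or characters that git would quote.

```shell
# Hypothetical helper: list untracked files, marking which ones are symlinks
# (candidates for a plain "git add") and which are regular files
# (to be added with "git annex add").
list_untracked_by_type() {
    # --others: untracked files only; --exclude-standard: honour .gitignore.
    git ls-files --others --exclude-standard |
    while IFS= read -r f; do
        if [ -L "$f" ]; then
            echo "symlink $f"
        else
            echo "file    $f"
        fi
    done
}
```

Run it from the top of the repository as the repository owner, for example after sudo -u www-data -i and cd /persistent/media.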

Changing files in git-annex

By default, git-annex doesn’t allow file modification. It is, however, possible to make modifications with a special set of commands.

  1. To edit a file, you first unlock it:

    git annex unlock <file>
    
  2. Then you can replace the file or edit it directly.

  3. When done, add the file back into git-annex:

    git annex add <file>
    

    Note

    To cancel changes on the file instead of saving the new version, use:

    git annex lock <file>
    

Errors running git-annex

If you get an error like this:

/opt/git-annex.linux/runshell: 51: /opt/git-annex.linux/runshell: cannot create /.ssh/git-annex-wrapper: Directory nonexistent

It is likely that your $HOME directory isn't set up properly. Ensure the $HOME variable is set to something reasonable (e.g. /var/www for www-data), or use sudo -u <user> -H <command> or sudo -i -u <user> when using sudo.

Evaluating disk usage

Because everything is a symlink in git-annex, traditional tools like du will not work as expected. This is a known issue with git-annex, with various workarounds. The one we use is the git annex info command, like this:

www-data@koumbit-mp-test:/var/isuma/git-annex/video$ git annex info --fast *
directory: large
local annex keys: 15700
local annex size: 486.52 megabytes
annexed files in working tree: 15700
size of annexed files in working tree: 486.52 megabytes
directory: mp4_sd
local annex keys: 7977
local annex size: 897.83 gigabytes
annexed files in working tree: 7977
size of annexed files in working tree: 897.83 gigabytes
directory: original
local annex keys: 0
local annex size: 0 bytes
annexed files in working tree: 15800
size of annexed files in working tree: 582.74 gigabytes
directory: small
local annex keys: 15698
local annex size: 48.51 megabytes
annexed files in working tree: 15698
size of annexed files in working tree: 48.51 megabytes
directory: xlarge
local annex keys: 6213
local annex size: 207.93 megabytes
annexed files in working tree: 6213
size of annexed files in working tree: 207.93 megabytes

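The local annex lines describe the files whose content is available locally, while the annexed files lines describe all the files tracked on that branch of git, whether or not their content is present. For example, the original directory above tracks 15800 files (582.74 gigabytes) but has none of them locally.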
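To get a single figure for local disk usage from that output, the per-directory local annex size lines can be added up. This is a sketch: annex_local_total_gb is a hypothetical function, and it assumes git-annex prints decimal (SI) units, where one megabyte is 10^6 bytes.

```shell
# Hypothetical helper: sum the "local annex size" lines printed by
# `git annex info --fast *` (read on stdin) and print the total in gigabytes.
# Assumption: units are SI (kilobyte = 10^3 bytes, megabyte = 10^6, ...).
annex_local_total_gb() {
    awk '
        /^local annex size:/ {
            value = $4; unit = $5
            if (unit ~ /^kilobyte/)      factor = 1e3
            else if (unit ~ /^megabyte/) factor = 1e6
            else if (unit ~ /^gigabyte/) factor = 1e9
            else if (unit ~ /^terabyte/) factor = 1e12
            else                         factor = 1      # plain "bytes"
            total += value * factor
        }
        END { printf "%.2f\n", total / 1e9 }
    '
}
```

For example, feeding it the output above (git annex info --fast * | annex_local_total_gb) gives 898.57, almost all of it in mp4_sd.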

Dealing with files committed by mistake

It can happen that files get committed into git (instead of git-annex) by mistake. In this case we absolutely want to remove those files from the whole git history. For this we use a tool called bfg because it can easily remove files larger than a certain threshold.

We need to do the following for every git repository:

  • install a java runtime:

    sudo apt-get install default-jre-headless
    
  • download a copy of bfg (unless it becomes available in Debian directly)

  • run these commands next to the repository (the mirror is cloned into the current directory):

    git clone --mirror /path/to/repo repo.git
    java -jar bfg-1.12.3.jar --strip-blobs-bigger-than 1M repo.git
    
  • examine the output

  • if you are satisfied and want to delete the remaining data, run this inside the mirror:

    cd repo.git
    git reflog expire --expire=now --all && \
    git gc --prune=now --aggressive
    

This needs to be repeated for every repository.

Inspecting the git-annex branch

Sometimes we need to look into the git-annex branch. There is good documentation upstream about how that branch is laid out, but it may not be immediately useful for git beginners. A few tricks:

  • to list the files in that branch, you can use the .git/annex/index file like this:

    $ GIT_INDEX_FILE=.git/annex/index git ls-files | tail -3
    schedule.log
    trust.log
    uuid.log
    
  • to read a specific file (already demonstrated above):

    $ git cat-file -p git-annex:uuid.log
    31912b57-62a5-475c-87a7-582b5492a216 WD green 1.5TB backup drive timestamp=1400246214.443942s
    31912b57-62a5-475c-87a7-582b5492a216 green_crypt timestamp=1400246182.491768s
    5adbab10-0f7a-467b-b0d8-5d7af2223103 anarcat@marcos:/srv/video timestamp=1397883325.873598s
    5adbab10-0f7a-467b-b0d8-5d7af2223103 main (anarcat@marcos:/srv/video) timestamp=1400245511.126472s
    

    in this case, we see the list of remotes and their recorded descriptions.

Removing refused commits

It is possible that the central server refuses to sync with a media player because the media player made an illegal modification. In this case you would see something like this:

www-data@koumbit-mp-test:/var/isuma/git-annex$ git push
Counting objects: 5, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 365 bytes, done.
Total 3 (delta 2), reused 0 (delta 0)
remote: WARNING: protected files modified, refusing commit: set(['trust.log'])
To host-mp20120507-1.mp.isuma.tv@cs.isuma.tv:/var/lib/git-annex/isuma-files
! [remote rejected] git-annex -> git-annex (pre-receive hook declined)
error: failed to push some refs to 'host-mp20120507-1.mp.isuma.tv@cs.isuma.tv:/var/lib/git-annex/isuma-files'

The way to recover from this is to reset the git-annex branch to a previously known good state. The commits that need to be removed can be found with:

$ git log --oneline --stat git-annex
2f682fc update
trust.log |    1 +
1 file changed, 1 insertion(+)
9a7b6b1 update
trust.log |    1 +
1 file changed, 1 insertion(+)
0326e0e update
c1d/68a/SHA256E-s368747492--2c1d01a79e8366e1d8ef12d14aeae8b941648f5853666fd09b95af7657d8c63d.mov.mp4.log |    3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

In the above we see the changes to the trust.log file which were refused. We also see an earlier commit to a location log. We will try to reset to that commit, assuming that it is safe. First we back up the current position, just in case we want to jump back:

git tag git-annex-bak 2f682fc

Then we update the git-annex branch to the older commit and try to push. Notice how we pass the current commit as well: git update-ref then only moves the branch if it still points to that commit, so we do not lose commits that may have been added in between:

$ git update-ref refs/heads/git-annex 0326e0e 2f682fc
$ git push
Everything up-to-date

It worked! It also seems that we were reset to the same commit that was already on the remote server; otherwise the push would have sent other commits up as well. We can now remove our backup:

git tag -d git-annex-bak
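When the stat output is long, the commits touching a protected file can be listed directly. A sketch, with commits_touching as a hypothetical helper name; it simply wraps git log restricted to one path on the git-annex branch:

```shell
# Hypothetical helper: list the commits on a branch (default: git-annex)
# that modified a given file (default: trust.log), newest first.
commits_touching() {
    branch="${1:-git-annex}"
    file="${2:-trust.log}"
    # "--" separates the branch name from the path, so "git-annex" is never
    # mistaken for a file.
    git log --format='%h %s' "$branch" -- "$file"
}
```

With the example above, commits_touching git-annex trust.log would include 2f682fc and 9a7b6b1. The parent of the oldest bad commit is a candidate reset point, but only if no good commits were added in between; see the caution below.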

Caution

It is possible that good commits become tangled up with bad commits, and simply resetting the branch as above will lose those commits. In this case, you will need to clone the repository aside and rebase on a new branch. First, tag the known good version (we take the origin remote branch, but you can also use git log to find a better, closer one):

git tag good origin/git-annex

Then clone the repository:

cd ..
git clone isuma-files isuma-files-fixup

Then rebase interactively against the good version:

cd isuma-files-fixup
git checkout git-annex
git rebase -i good

That will start an editor where you can drop the bad commits. Use git log --stat in another window to find which commits are problematic. When done, push the changes back to the original repository:

git push origin git-annex:git-annex-fixup

Then, back in the original repository, back up the current git-annex branch and push the new one:

cd ../isuma-files
git tag git-annex-bak git-annex
git update-ref refs/heads/git-annex git-annex-fixup
git push

Once this works, delete the backup tag:

git tag -d git-annex-bak

Note

If the above rescue procedure is too complicated, try to checkout the git-annex branch in a clone and revert the commits:

cd ..
git clone isuma-files isuma-files-fixup
cd isuma-files-fixup
git checkout git-annex
git revert <bad>
git push origin git-annex:git-annex
cd ../isuma-files
git push

Known issues

During this project, we have filed a number of issues upstream, some of which were fixed and some of which are still pending. This section lists the known problems with git-annex we have found so far.

Bugs

Missing features

Note that this list was basically created from anarcat’s contributions to git-annex. And of course, more bugs specific to Isuma are documented in the Redmine issue tracker.

S3 diagnostics

You can access the S3 buckets directly if you ever need some diagnostics. You can find the credentials on the main server in /persistent/media/.git/annex/creds/<UUID> where <UUID> is the UUID of the S3 remote, as shown in git annex info --fast. The first line is the access key, and the second one is the secret key.

Those credentials can then be used with the s3cmd software to do various operations on the repository. For example, you can list the contents of a bucket:

$ s3cmd ls s3://isuma.misc.test
2015-02-13 00:37     10441   s3://isuma.misc.test/SHA256E-s10441--533128ceb96cb2a6d8039453c3ecf202586c0e001dce312ecbd6a7a356b201dc.jpg
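Copying the keys by hand is error-prone. The hypothetical wrapper below reads the two lines of a git-annex credentials file and passes them to s3cmd through its --access_key and --secret_key options. Note that this makes the keys visible to other users in the process list while s3cmd runs, so only use it on a machine you trust.

```shell
# Hypothetical wrapper: run s3cmd with the keys from a git-annex creds file.
# Line 1 of the file is the access key, line 2 the secret key.
s3cmd_with_annex_creds() {
    creds="$1"
    shift
    access_key=$(sed -n '1p' "$creds")
    secret_key=$(sed -n '2p' "$creds")
    # Remaining arguments (e.g. "ls s3://bucket") are passed through to s3cmd.
    s3cmd --access_key="$access_key" --secret_key="$secret_key" "$@"
}
```

For example: s3cmd_with_annex_creds /persistent/media/.git/annex/creds/<UUID> ls s3://isuma.misc.test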