Similar projects

This section describes various services and software alternatives to this project. There seems to be an opportunity to build a more generic content distribution system. We should, however, consider the existing alternatives and work from there or at the very least study the way they operate to avoid running into similar problems.

Commercial CDNs

CDN services are currently mostly run by closed-source, for-profit corporations like Akamai, Cloudflare and others. We are essentially solving the same problem, although their use case is simpler: they can usually fetch the content on the fly and don’t necessarily deal with large files and low bandwidth.

Debian mirrors network

There is a free software project with similar goals and problems: Debian itself. The Debian project operates a large network of mirrors all over the world. Although a lot of the files on the mirrors are small (less than 1MiB), quite a few are much larger (Chromium is currently around 50MiB) and the total size of an archive snapshot is around 1TB, which is quite close to the size of the Isuma dataset right now.

They have some software to sync packages between the different archives, called ftpsync. It is, however, not well released or distributed (there isn’t even a Debian package!), nor is it well abstracted: it is tightly tied to the Debian mirrors network.

Git annex

Holy shit, git-annex does a lot of what we want: it implements a distributed file system on top of a variety of possible remote repositories!

If we integrate it with gitolite for centralized authentication and authorization, much of our required functionality would be available. Git hooks would make it pretty easy to customize and automate. git-annex even has a full metadata system implemented, which was a feature I had in mind for the next version of the media players. Building on this seems like a good direction!

Challenges with a git-annex implementation:

  • metadata is stored in a separate git branch, by synthesizing commits, which requires some git wizardry and adds some overhead, see the details of the internals
  • git-annex is still in heavy development and may move under our feet, although the Kickstarter phase is finishing now and the product is quite stable as a basis
  • git-annex may not scale well with lots of files and lots of clients, see this complaint for example

The git-annex author, Joey Hess, may be available for consulting. In this self-run crowdfunding campaign he was proposing a $300/hr consulting fee, but that is now sold out. After discussions in person with Joey, I get the impression it may be possible to hire him for specific improvements to git-annex, but those would need to be useful for the project as a whole, and not just for our use case. For example, he may be open to being contracted for improving scalability, but not for storing the “behind NAT” metadata that we need.

These are the things still missing from git-annex at this point:

  • bandwidth limiting - although we could use --bwlimit in rsync fairly easily
  • local uploads: git-annex won’t provide you with a web interface for uploads, but will certainly take care of local files
  • glue to automatically mount drives and sync: git-annex doesn’t automatically mount drives but can certainly sync
  • cache of the HTML files - unless the website is redone in static html
  • a good view, on the central server (CS), of sync progress in the various media players (MPs)
  • transcoding - if we keep doing that at all
  • glue for URL rewriting in the website
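
The bandwidth limiting item above can be sketched as a thin wrapper around rsync. This is only an illustration, not something git-annex provides: the function names, the paths and the 500 KiB/s default are made up for the example. The only thing relied on is rsync's own --bwlimit option, which takes a rate in KiB per second by default.

```python
import shlex
import subprocess


def build_rsync_command(source, destination, limit_kib_per_s):
    """Build an rsync command line capped at the given transfer rate.

    Hypothetical helper: the name and arguments are assumptions made for
    this sketch. rsync reads --bwlimit as KiB per second by default.
    """
    if limit_kib_per_s <= 0:
        raise ValueError("limit_kib_per_s must be positive")
    return [
        "rsync",
        "--archive",   # preserve permissions, times, symlinks, etc.
        "--partial",   # keep partially transferred files so large ones can resume
        f"--bwlimit={limit_kib_per_s}",
        source,
        destination,
    ]


def sync(source, destination, limit_kib_per_s=500):
    """Run the capped transfer; raises CalledProcessError if rsync fails."""
    command = build_rsync_command(source, destination, limit_kib_per_s)
    print("running:", shlex.join(command))
    subprocess.run(command, check=True)
```

Something like sync("/srv/media/", "mp1.example.org:/srv/media/", 200) would then cap a transfer to a (hypothetical) media player at 200 KiB/s. Keeping the command construction separate from the execution makes it easy to check the generated command line without touching the network.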

git-annex manages metadata, and can store arbitrary per-file metadata as well. That metadata can be exchanged through the git-annex assistant, over SSH or XMPP. There had been some talk of using Telehash, but this somewhat fell through because upstream development at telehash.org is still incomplete. Looking back at the previous development year, Joey mentioned that he will keep an eye on the project and consider other alternatives such as MaidSafe.

Those backends could be useful to help media players share data with each other and facilitate communication without talking to the central server.
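
As a rough idea of what the per-file metadata mentioned above looks like in practice, here is a minimal sketch that drives the git annex metadata command from Python. The helper names, the "site" field and the file path are made up for the example; only the --set and --get options of git annex metadata are assumed to behave as documented upstream.

```python
import subprocess


def metadata_set_command(path, field, value):
    """Command line that sets one metadata field on an annexed file."""
    return ["git", "annex", "metadata", "--set", f"{field}={value}", path]


def metadata_get_command(path, field):
    """Command line that prints the value(s) of one metadata field."""
    return ["git", "annex", "metadata", "--get", field, path]


def set_metadata(repo_dir, path, field, value):
    """Set a field; must be run inside an existing git-annex repository."""
    subprocess.run(metadata_set_command(path, field, value),
                   cwd=repo_dir, check=True)


def get_metadata(repo_dir, path, field):
    """Return the values of a field as a list of strings, one per line."""
    result = subprocess.run(metadata_get_command(path, field),
                            cwd=repo_dir, check=True,
                            capture_output=True, text=True)
    return result.stdout.splitlines()
```

For instance, set_metadata("/srv/annex", "videos/example.mp4", "site", "mp1") would tag a file with the (hypothetical) media player that holds it. Since the metadata lives in the git-annex branch, it would travel with a normal sync rather than needing a separate channel.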

Camlistore

Another storage system that may be interesting, similar to git-annex, is Camlistore.