Terminology

In this documentation, the following definitions are used:

Local servers
Physical machines installed in remote communities with low-bandwidth, high-latency or no internet connectivity, providing read and write access to a large media library.
Media players
The IsumaTV implementation of the local servers, mainly distributed in Canada’s northern native communities.
Central server
A Drupal website with a set of custom modules to keep an inventory of which files are on which local server, currently only deployed on cs.isuma.tv.
Website
The main Drupal site with all the content and user interface for the public, with custom Drupal modules to rewrite URLs to point to the media players if they are detected after a “ping pong test”. Currently isuma.tv.
v1.0
first generation
The first implementation of media players, distributed during the first phase of the project and designed jointly by Koumbit and Isuma.TV around 2010.
v2.0
second generation
3g
The second generation of media players was developed around 2012 to address some bugs and issues with the central server, add a remote diagnostic system (isuma-autossh) and make other updates. This is sometimes referred to as “3g” in the literature because part of the work on the second generation involved working on the design of a third generation.
v2.5
The “v2.5” is an incremental update to the second generation that improves stability and fixes many bugs to ease deployments. During that phase, Debian packaging, install procedures and documentation were improved significantly.
v3.0
third generation
3g
A new generation of media players, most likely a complete rewrite of the local servers code.
CDN

Content Distribution Network, to quote Wikipedia:

[A CDN is a] large distributed system of servers deployed in multiple [locations]. The goal of a CDN is to serve content to end-users with high availability and high performance.
SSD

Solid State Drive, to quote Wikipedia: “is a data storage device that uses integrated circuit assemblies as memory to store data persistently. [...] SSDs have no moving (mechanical) components. This distinguishes them from traditional electromechanical magnetic disks such as hard disk drives (HDDs) or floppy disks, which contain spinning disks and movable read/write heads.

Compared with electromechanical disks, SSDs are typically more resistant to physical shock, run silently, have lower access time, and less latency. However, while the price of SSDs has continued to decline over time, consumer-grade SSDs are still roughly six to seven times more expensive per unit of storage than consumer-grade HDDs.” (SSD article on Wikipedia)

SSDs are commonly found in laptops, music players, cell phones and embedded devices, and increasingly in servers.

HDD

Hard Disk Drives, to quote Wikipedia: “data storage device used for storing and retrieving digital information using rapidly rotating disks (platters) coated with magnetic material. An HDD retains its data even when powered off. Data is read in a random-access manner, meaning individual blocks of data can be stored or retrieved in any order rather than sequentially. An HDD consists of one or more rigid (“hard”) rapidly rotating disks (platters) with magnetic heads arranged on a moving actuator arm to read and write data to the surfaces.” (HDD article on Wikipedia)

HDDs are high-capacity, but use more power and are more fragile than SSDs. HDDs are commonly found in servers, workstations and external backup drives.

RAID

There are two ways of configuring two storage devices:

  • “RAID-1” or “mirroring”: two or more drives that hold the same copy of the data. If one fails, the other still has a copy.
  • “RAID-0” or “striping”: two or more drives that hold different data. If one fails, all the data is lost.

“Striping” allows for storage expansion because more than one drive appears as a single device. The problem is that if one of the drives fails (or is simply disconnected, if it’s an external drive), the whole data set can be lost. The usual workaround is to “stripe” multiple “mirrors”, which is known as “RAID-10”. Instead of having two drives, we have two times two drives, so four drives.
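The trade-off between the three layouts can be sketched with a little arithmetic. This is a minimal illustration, assuming identical drives and a RAID-10 built from two-drive mirrors; the function names and the drive size used below are hypothetical and not part of the media players code.

```python
def usable_capacity_tb(level: str, drives: int, drive_size_tb: float) -> float:
    """Return the usable capacity of an array of identical drives."""
    if drives < 2:
        raise ValueError("an array needs at least two drives")
    if level == "raid0":
        # Striping: every drive adds capacity, nothing is duplicated.
        return drives * drive_size_tb
    if level == "raid1":
        # Mirroring: every drive holds the same copy of the data.
        return drive_size_tb
    if level == "raid10":
        # A stripe over two-drive mirrors: half of the raw capacity.
        if drives < 4 or drives % 2 != 0:
            raise ValueError("RAID-10 needs an even number of drives, at least four")
        return (drives // 2) * drive_size_tb
    raise ValueError(f"unknown RAID level: {level}")


def failures_always_survived(level: str, drives: int) -> int:
    """Return how many drive failures the array survives in the worst case."""
    if level == "raid0":
        return 0  # any single failure loses the whole data set
    if level == "raid1":
        return drives - 1  # one surviving copy is enough
    if level == "raid10":
        return 1  # a second failure is fatal if it hits the same mirror
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    # Hypothetical 2 TB drives, purely for illustration.
    for level, drives in (("raid0", 2), ("raid1", 2), ("raid10", 4)):
        capacity = usable_capacity_tb(level, drives, 2.0)
        survived = failures_always_survived(level, drives)
        print(f"{level}: {drives} drives, {capacity} TB usable, "
              f"survives {survived} failure(s)")
```

With four drives, RAID-10 offers the same usable space as two striped drives while still surviving the loss of any single drive.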

storage enclosures

Some background: there are really 3 ways of storing disks:

  • internal, no hot-swap: disks are stored internally in the machine and cannot be swapped without stopping the machine, opening the case, taking out a screwdriver and fiddling with wires. This is the current design of the media players
  • externally-accessible trays, but no hotswap: disks are in trays that are accessible from the outside but can’t reliably be replaced or added while the machine is running, in other words, the machine needs to be turned off before a disk is added
  • fully hot-swappable trays: disks are in trays that can be removed while the machine is running

Even among trays, there are variations: some trays need a screwdriver to attach the disk, some don’t have screws, and some enclosures don’t have trays at all (you just slide the disk in the slot)!

This gives a scale of the skills required to replace a hard drive:

  1. internal, highest skill level: operator needs to know how to open the case, remove and install the drive, connect and tell apart the different wires
  2. non-hotswap trays, with screws: operator needs to be able to power-cycle the machine, remove and insert trays, remove and install disks in trays
  3. hotswap trays, with screws: operator needs to be able to remove and insert trays, remove and install disks in trays
  4. hotswap trays, without screws: operator needs to be able to remove and insert disks

The last disk layout should be manageable by anyone with a user manual. Most people should also be able to swap drives in and out of trays with a screwdriver, and probably power-cycle the machine as well. This is why hot-swappable drives are so interesting.

Keep in mind that, in all those situations, there’s always the risk that the operator removes too many disks from the array and destroys everything, so the first skill is to be able to interpret the status of the disk array. Visual indicators could help with that, but they would need to be part of the requirements.

Repository
Repositories
A git-annex repository is basically a git repository, a directory with files and subdirectories in it where git-annex manages the files as symlinks. There can be many different interconnected repositories that have their own copies of the files.
Remote
A git-annex remote is another git-annex repository, from the point of view of a given repository. It allows one repository to push changes into, and pull changes from, another repository. It can also be a special remote that is not really a git-annex repository, but some special storage. We particularly use the S3 special remote that way.
S3
Amazon S3 (Simple Storage Service) is a commonly used online file storage service that stores objects, often publicly, in “buckets”. Git-annex uses this to store unique copies of the files. Amazon guarantees 99.9% uptime on S3, that is, not more than about 43 minutes of downtime per month.
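The downtime figure follows directly from the uptime percentage. The sketch below shows the arithmetic, assuming a 30-day month; the function name is hypothetical.

```python
def max_downtime_minutes(availability: float, days: int = 30) -> float:
    """Return the downtime allowed by an availability ratio over a period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a ratio between 0 and 1")
    total_minutes = days * 24 * 60  # 43200 minutes in a 30-day month
    return (1.0 - availability) * total_minutes


if __name__ == "__main__":
    # 99.9% uptime leaves 0.1% of the month as allowed downtime.
    print(f"{max_downtime_minutes(0.999):.1f} minutes")
```

For 99.9% uptime this works out to about 43.2 minutes per 30-day month.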
PuppetDB REST API
A RESTful interface is one that follows the Representational State Transfer (REST) architectural style. The exact protocol varies according to the application, but in our case, we mostly use it to communicate with the PuppetDB REST API.
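As a rough illustration of what talking to such an API looks like, the sketch below builds and sends a node query over plain HTTP. The host, port and certname are hypothetical, and the `/pdb/query/v4/nodes` path is an assumption: the endpoint prefix differs between PuppetDB versions, so check it against the version actually deployed.

```python
import json
import urllib.parse
import urllib.request


def build_nodes_url(base_url: str, query: list) -> str:
    """Build the URL for a node query; the query is sent as a JSON array."""
    params = urllib.parse.urlencode({"query": json.dumps(query)})
    # Assumed endpoint path, it varies with the PuppetDB version.
    return f"{base_url.rstrip('/')}/pdb/query/v4/nodes?{params}"


def fetch_nodes(base_url: str, query: list) -> list:
    """Send the query and decode the JSON response (needs a reachable server)."""
    url = build_nodes_url(base_url, query)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical server address and node name, for illustration only.
    print(build_nodes_url("http://localhost:8080",
                          ["=", "certname", "mp1.example.com"]))
```

Only `build_nodes_url` runs without a server; `fetch_nodes` is there to show that the response is ordinary JSON over HTTP.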
Assistant

The git-annex assistant is a daemon that runs in the background and implements the various policies defined by the operator. It will automatically add or remove files from git as they are changed from the outside. It will also synchronise content to other repositories as determined by preferred content policies and sometimes even drop files locally if they are not required anymore.

If two git-annex remotes both run the assistant, they can also synchronise their content in real time.