Transition plan

This section documents the transition process from the old media players to the new media players. This consists mainly of replacing the old software shipped as Debian packages with Puppet configurations and git-annex (Redmine issue #16708), but also of replacing all the previous, old media content on media players with HTML5-ready media content (Redmine issue #14032).

We will therefore not maintain a backwards-compatible API for the new 3G software, so we will transition all media players at once. That means that non-transitioned media players will be broken until they can run Puppet and follow the transition.

General principles

We will make a clean break between the old and new systems. We will schedule a specific date and time for the transition. Media players that are offline on that date will not be updated until they come back online. We cannot afford to ship new physical machines installed from scratch, so we will need to implement an automated transition script between the two versions, which will require more time.

We will implement a typical “one, some, many” approach:

  • Create a well-defined update that will be distributed to all hosts. Nominate it for distribution. The nomination begins a buy-in phase to get it approved by all stakeholders. This practice prevents overly enthusiastic SAs from distributing trivial, non-business-critical software packages.
  • Establish a communication plan so that those affected don’t feel surprised by updates. Execute the plan the same way every time, because customers find comfort in consistency.
  • When you’re ready to implement your Some phase, define (and use!) a success metric, such as: “If there are no failures, each succeeding group is about 50 percent larger than the previous group. If there is a single failure, the group size returns to a single host and starts growing again.”
  • Finally, establish a way for customers to stop the deployment process if things go disastrously wrong. The process document should indicate who has the authority to request a halt, how to request it, who has the authority to approve the request, and what happens next.

—Section 3.1.2.2 of the PSNA
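
The batch-growth rule quoted above can be sketched as follows. This is a toy illustration of the rule only (the results list is invented), not part of the deployment tooling:

```shell
# Grow the batch by roughly 50% after a clean round; fall back to a
# single host after any failure, per the PSNA rule quoted above.
size=1
for result in ok ok ok fail ok ok; do
  echo "deploy to $size host(s): $result"
  if [ "$result" = ok ]; then
    grown=$(( size * 3 / 2 ))
    # integer growth stalls at small sizes, so always grow by at least one
    if [ "$grown" -le "$size" ]; then grown=$(( size + 1 )); fi
    size=$grown
  else
    size=1
  fi
done
```

With the invented results above, the batch sizes go 1, 2, 3, 4, then reset to 1 after the failure and start growing again.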

Timeline

The three phases above will be the following:

  • Phase “one” or “alpha”: koumbit media player - week of June 1st
  • Phase “some” or “beta”: isuma office media player - June 8-22
  • Phase “many” or “production”: all media players - July

The above roadmap and dates will be confirmed by Cara before it is put into play, and Cara will be responsible for contacting the various stakeholders affected by the deployment. Updates should be deployed on 3 to 5 machines at a time for better efficiency, in 2-hour maintenance windows. Before an update is actually deployed, Cara will be notified and will coordinate with the stakeholders to ensure everything still works correctly.

Metrics and communication

The success metric for a given media player is determined by the Test procedure, that is, a media player should be basically working as before, syncing new content and allowing uploads (although the latter also depends on the one-step upload work). The success criterion is that 75% of the media players updated in a phase be working before we move on to the next batch; that is, a problem with one machine in a batch is acceptable.

Koumbit and Isuma will be communicating during the migration. Cara can file a ticket with “Immediate” priority in the Media players 3.0 Redmine project to stop operations. Cara from Isuma, and Antoine, Cleve and Gabriel from Koumbit, have the authority to stop a deployment.

Implementation

The actual update performed on the media players will consist of the following steps. Unless otherwise noted, all steps are performed through Puppet and still remain to be implemented. Each step is grouped and numbered according to the (eventual) mediaplayers::transitionX class in which it will be implemented.

  1. have a hard drive sent up north and connected

  2. upgrade to wheezy (not through Puppet)

  3. prepare for the Puppet deployment

    1. add stringify_facts=false to [main] of /etc/puppet/puppet.conf

    2. generate the unique hostname for the mediaplayer using cs.isuma.tv:

      cd /var/www/cachingserver; drush sql-query 'SELECT uid,name,CONCAT("host-mp",YEAR(FROM_UNIXTIME(created)),MONTH(FROM_UNIXTIME(created)),DAY(FROM_UNIXTIME(created)),"-1.mp.isuma.tv") AS hostname FROM users WHERE name LIKE "%n6%" ;'
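
      The query derives each player’s hostname from the account’s creation date. Purely as an illustration (the timestamp is made up, and %-m/%-d are GNU date extensions), the same scheme in shell:

      ```shell
      # Hypothetical example: an account created 2013-06-01 16:00 UTC
      created=1370102400
      TZ=UTC date -d "@$created" '+host-mp%Y%-m%-d-1.mp.isuma.tv'
      # prints host-mp201361-1.mp.isuma.tv
      ```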
      
    3. make sure fail2ban is disabled on cs.isuma.tv (check the status, and stop the service if it is still running):

      sudo service fail2ban status
      sudo service fail2ban stop
      
  4. git-annex deployment (all of these steps are automated through Puppet in the mediaplayers class)

    • install git-annex (Puppet class gitannex)

    • configure the assistant in /var/isuma/git-annex (Puppet class gitannex::daemon and define gitannex::repository)

    • configure hard drive sync script in udev (Puppet class mediaplayers::syncdrive)

    • setup a cronjob for the metadata sync (Puppet class gitannex::metadata)

    • setup reverse SSH tunnel (Puppet class sshd::autossh and site_sshd::sandbox)

    • configure the remote and bandwidth limits in the assistant (Puppet define gitannex::remote and gitannex::bandwidth)

    • test checklist:

      • new files are synchronised from the central server to the player: look for the latest changes in the repository and check that they correspond to the latest videos uploaded on the main site:

        cd /var/isuma/git-annex
        git log --stat
        
      • the IP address is propagated in the git repo, see the Media player not detected procedure

      • the new reverse SSH tunnel is up and running, see the Remote login to media players procedure

      • downloading an actual file works:

        git annex enableremote s3
        git annex get video/mp4_sd/ZACK_QIA_.mov.mp4
        
      • the bandwidth limits are effective: there should be annex.rsync-upload-options and annex.web-download-command settings that reflect the upload and download parameters in the mediaplayers Puppet class. Then, when a download is in progress, you can look at the process list to see if the command-line flag is effective:

        ps axfu | grep wget
        

        You can still look at the actual bandwidth usage with vnstat --traffic.
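
        A quick way to confirm those settings are in place (assuming the repository lives at /var/isuma/git-annex, as elsewhere in this document):

        ```shell
        # Print the limits git-annex will apply; both should reflect the
        # parameters passed to the mediaplayers Puppet class.
        cd /var/isuma/git-annex
        git config annex.rsync-upload-options
        git config annex.web-download-command
        ```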

      • the schedules are effective: try to set the stop time to the next hour, for example, to stop at 17h00 and start at 18h00:

        class { 'mediaplayers':
          # [...]
          sched_start_hour => ['18'],
          sched_start_minute => 0,
          sched_stop_hour => ['17'],
          sched_stop_minute => 0,
        }
        
  5. Manual deployment (those steps need to be done by hand on the media player)

    • make sure the s3 remote is configured properly:

      cd /var/isuma/git-annex
      git annex enableremote s3
      
    • disable the web and s3 remotes so that downloads are not done from the internet until the drive sync is completed:

      cd /var/isuma/git-annex
      git config remote.web.annex-ignore true
      git config remote.s3.annex-ignore true
      
    • configure the group and wanted content:

      git annex group . mediaplayer
      git annex wanted . groupwanted
      
    • remove /var/isuma/media by hand:

      rm -r /var/isuma/media
      
    • run the syncdrive script, unless you are following the Updating a synchronisation drive procedure:

      /lib/udev/mediaplayers-syncdrive <drivepath>
      

      The drive path is the device without /dev and can be found with cat /proc/partitions or dmesg. See also Creating a new synchronisation drive for more detailed instructions on how to find the device number.

      If you are not running this by hand (if it’s already started or you don’t have a sync drive), at least disable the sneakernet remote to remove noise from the logs:

      git config remote.sneakernet.annex-ignore true
      
    • enable the web and s3 remotes so that downloads are properly done from the internet from now on:

      cd /var/isuma/git-annex
      git config remote.web.annex-ignore false
      git config remote.s3.annex-ignore false
      
    • test checklist:

      • new files are downloaded from S3 to the media player
      • the sync drive is working: content is being copied from the external hard drive to the git-annex repo
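
      Both checks can be spot-verified from the repository (assuming the usual /var/isuma/git-annex location used throughout this section):

      ```shell
      cd /var/isuma/git-annex
      # recent commits should show new files arriving (from S3 or the drive)
      git log --stat -5
      # summary of the repository; the local annex size should grow as
      # the drive sync progresses
      git annex info --fast
      ```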
  6. deploy remaining components

    • deploy the new upload.php and pong.js.php scripts in /var/www/isuma/ (done by hand for now, to be put in the mediaplayers::upload Puppet module)
    • configure Apache and PHP to serve those files (configured by hand, to be put in the mediaplayers::upload Puppet class)
    • setup automated upgrades (apt::unattended_upgrades Puppet class)
    • test checklist:
      • ping/pong test
      • url rewrite test
      • upload test
      • apt-get test
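
      One way to run those four checks by hand; the URLs and paths here are assumptions based on the deployment step above, not confirmed values:

      ```shell
      # ping/pong test: the pong script should answer over HTTP
      curl -sf http://localhost/pong.js.php
      # url rewrite and upload tests: upload.php should at least be reachable
      curl -s -o /dev/null -w '%{http_code}\n' http://localhost/upload.php
      # apt-get test: exercise unattended upgrades without changing anything
      sudo unattended-upgrade --dry-run --debug
      ```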
  7. remove the old infrastructure

    • remove the old packages: koumbit-archive-keyring, ffmpeg, php-xml-rpc, isuma-local-servers and isuma-autossh packages (mediaplayers::transition puppet class)
    • rerun the entire Test procedure
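
As a sketch, the package removal in step 7 amounts to something like the following; the mediaplayers::transition class may implement it differently:

```shell
# Remove the old media player packages and their now-unused dependencies.
apt-get purge -y koumbit-archive-keyring ffmpeg php-xml-rpc \
  isuma-local-servers isuma-autossh
apt-get autoremove -y
```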

Also, we will work on a separate set of AWS S3 buckets and then transition over to them during the break, dropping the old ones after two months. This will double the storage cost for that period, in exchange for a much safer transition.

See also Redmine issue #17242 for more detailed information on the various tasks in the transition, including blocking issues for the steps above. The larger 3G deployment plan is in Redmine issue #16707.