Podcast Transcription

Overview of process

We use Launchpad code hosting to keep track of the transcriptions of the Ubuntu UK Podcast. The same process could, however, be used for any podcast or other audio media.

A new team has been set up for our podcast, and other podcasts can potentially use it for transcribing theirs. People who want to participate in transcribing simply need to follow the instructions below, which cover the exact steps required in detail. This includes signing up for a free Launchpad account and installing two pieces of software, Bazaar and Transcriber.

You may already have done some of the following steps, but all of them are covered. If you are certain a step has already been done (such as opening a Launchpad account), go to the next step. It does look somewhat complicated, but there isn't that much to it; once your system is set up, it is a simple process to push updated transcripts back to Launchpad.

Note: Once you join the team, you can optionally join the mailing list. Simply click the subscribe link on the transcribers team page on Launchpad, then "update subscriptions" to ensure you receive the mails.

  • Generate an ssh key (if you haven't already)

$ ssh-keygen -t rsa

Alternatively, you can use "Applications" -> "Accessories" -> "Passwords and Encryption Keys" and create a "Secure Shell Key" from there.
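
Since the key only needs generating once, a quick check for an existing one can save a step. This is a minimal sketch, assuming the default key path ~/.ssh/id_rsa.pub (the same file shown with cat in the next step); the function name have_ssh_key is made up for illustration.

```shell
# Hypothetical helper: succeed if a non-empty public key file exists.
# With no argument it checks the default path, ~/.ssh/id_rsa.pub.
have_ssh_key() {
    key_file="${1:-$HOME/.ssh/id_rsa.pub}"
    [ -s "$key_file" ]
}

# Usage: only generate a key when none is present.
# have_ssh_key || ssh-keygen -t rsa
```

If you keep your key somewhere else, pass its path as the first argument.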

  • Add key to launchpad

$ cat ~/.ssh/id_rsa.pub

Copy and paste to the following address:

  • Install bzr and transcriber

$ sudo apt-get install bzr python-paramiko transcriber
  • Create folder in which to do transcriptions (I use uupc, but you can use anything)

$ mkdir -p ~/uupc/audio
  • Use bzr to check out the existing transcriptions

$ cd ~/uupc/
$ bzr checkout bzr+ssh://LAUNCHPAD_USERNAME@bazaar.launchpad.net/~transcribers/ubuntu-uk-podcast/Transcripts

TIP: Now that you have downloaded the current revision, you only need to run the following to keep up to date in the future:

$ cd ~/uupc/Transcripts
$ bzr update
  • Download episode(s) to transcribe into the audio folder - note you can use "high" or "low" and ".ogg" or ".mp3" depending on bandwidth and hatred level of freedom. (Clearly other podcasts will have their own method and naming scheme for downloading.)

$ cd ~/uupc/audio
$ wget http://podcast.ubuntu-uk.org/download/uupc_s01e01_high.ogg
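
If you download several episodes, the file name can be put together from its parts. The sketch below assumes every episode follows the same pattern as the s01e01 example above, which is the only URL given here, so check it against the podcast site; the function name episode_url is hypothetical.

```shell
# Hypothetical helper: build the download URL for one episode.
# Arguments: series number, episode number, quality (high|low), format (ogg|mp3).
# Pass plain numbers (8, not 08): printf would read a leading zero as octal.
# ASSUMPTION: all episodes follow the naming of the s01e01 example.
episode_url() {
    printf 'http://podcast.ubuntu-uk.org/download/uupc_s%02de%02d_%s.%s\n' \
        "$1" "$2" "${3:-high}" "${4:-ogg}"
}

# Usage: wget "$(episode_url 1 1 high ogg)"
```

Quality and format default to "high" and "ogg" when left out.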
  • Start transcribing
    • Open transcriber

Applications -> Sound & Video -> Transcriber

  • Open the transcription file for the episode you wish to help transcribe. There is currently only one series, so "S01" is correct; just change "E01" to the episode you want to help with. For example: ~/uupc/Transcripts/S01/E01
  • You will then be warned that the signal file cannot be located. This is prompting you to find the original audio file, which should be the file you downloaded earlier. If you downloaded the wrong file, or need additional files, switch to the console or a browser to get the file now and put it in ~/uupc/audio/. For example: ~/uupc/audio/uupc_s01e01_high.ogg
  • Wait for the "Shape info" dialog box to disappear.
  • For details on using Transcriber, go to Help -> User Guide.

  • Once you have finished transcribing, use File -> Save to store your changes.

  • Commit changes
    • Open a terminal
      • Applications -> Accessories -> Terminal

  • Navigate to the transcriptions folder

$ cd ~/uupc/Transcripts
  • If you started transcribing a NEW episode then run this command to add the new files you created, but if you modified existing files then you can skip this one line.

$ bzr add S01/E01/filename.trs
  • Add details of your changes - a short description is useful

$ bzr commit -m "Fixed a few typos"
  • Done, get a well deserved cup of tea

Summary of commands used

$ bzr checkout
  • Used to initially take a copy of the code from launchpad to your local filesystem

$ bzr update
  • Ensure local copy of code is up to date with respect to launchpad

$ bzr commit -m "Description of change"
  • Record the most recent changes as a new revision, with a comment describing them

$ bzr add <filename>
  • Add a file into the branch for subsequent uploading
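
The commands summarised above can be wrapped into two small functions, one for the start of a session and one for the end. This is only a sketch: the names run, start_session and finish_session are made up for illustration, and it assumes you are already inside ~/uupc/Transcripts and that running bzr add on a file that is already in the branch does no harm. Set DRY_RUN=1 to print the bzr commands instead of running them.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: bring the local copy up to date before transcribing.
start_session() {
    run bzr update
}

# Hypothetical wrapper: add a transcript and commit it.
# Arguments: path to the .trs file, commit message.
finish_session() {
    run bzr add "$1" &&
    run bzr commit -m "$2"
}

# Usage (from inside ~/uupc/Transcripts):
# start_session
# finish_session S01/E01/filename.trs "Fixed a few typos"
```

In dry-run mode the printed commit command loses the quotes around the message, so treat the output as a preview only.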

Some General Tips on Transcribing

Attempt to write accurate transcripts, however:

  • Don't document "Umms" and utterances
  • Miss out 'false starts' where speakers haven't picked the correct words or phrases, and restart.
  • If you can't work out what a spoken word is, write it as <UNKNOWN>, and somebody else (maybe the speaker) will be able to correct it.

  • If you can't work out who is speaking, just mark them as (unknown), but still mark it as a 'turn' - and it will be remedied later.
  • Set your Resolution to 10 seconds - this makes it easier to jump around and watch for spikes
  • If someone mentions a web address, include the http:// in the transcript as that may help in webifying the results later

Try to make these logs readable - similar to subtitles/CC on the television - and omit anything that doesn't need to be there.

(before a transcribing session, ensure you "bzr update")

PodcastTranscription (last edited 2008-08-06 17:00:19 by localhost)