Simple Full Disk Image Backups With dd

After obtaining a new laptop one of the first things I always do is to make an image of the primary hard drive.  I then copy this image to another computer with a lot of hard disk space and leave it there as a backup should something ever go really wrong with the laptop.  There are a variety of tools both commercial and Open Source that make this process relatively easy.  The most well known tools are probably Norton Ghost and Acronis TrueImage.  In the Open Source world there are some decent alternatives such as FOG and PartImage.  Most of these tools have varying levels of intelligence that allow them to copy only the bits of the filesystem that are in use, dramatically reducing the amount of space that it takes to create an image.

However, all of these solutions require me to obtain the software, and the images they create will only work with that specific software. This not only means downloading additional software, but also hoping that the software continues to be maintained in the future. As I don’t want to be locked into a single piece of software that may not be maintained in the future, I typically do my backups with a much simpler tool: the standard Unix tool dd.

The manual page for dd says that it “cop[ies] and converts a file”.  In this case, I take advantage of the fact that all block devices present themselves as files under Linux and simply make a copy of the hard drive using that method.  If I need to restore the image in the future, I just write the image back to the hard drive using dd.

Because more and more computers are shipping without CD/DVD drives, especially netbooks and ultraportables, we can’t just use a Linux live CD. In my case, I’ve used this method to back up an IBM Thinkpad x31 and a Lenovo Thinkpad x61 tablet, neither of which has an optical drive. The first step is to create a bootable USB stick with a Linux image on it. You’ll use this to boot into Linux and run dd; after all, you can’t image a drive if you’re currently using it. There are various tools to do this, but I’ve had really great luck with UNetbootin. If you want to use the very simple method, you can just download the software, and it will take care of automatically downloading the CD image and copying it to the USB stick to make it bootable. It also provides an option to take an ISO image already on your hard drive and make it bootable on the USB stick. Because it has excellent hardware support and the install CD is also a live CD, I usually use the latest version of Ubuntu.

With the USB stick in hand, simply boot the computer you wish to clone from the USB drive. Once the Linux desktop appears, open up a terminal and use sudo to become root (note: some Linux images have you running as root all the time). Now you’re going to use the dd command to make an image of the primary hard disk, which in newer machines is /dev/sda (some older machines may have it as /dev/hda). First, though, we need someplace to store the image. Most of the time, if you’ve put a Linux CD image on a USB drive it will not be writable, so unless you’ve got another large USB storage device, you’ll need to copy the image across the network to another machine. Here’s a diagram of what we’re going to do:

Drive Imaging Process

Orange is operations that will happen on the local computer, and blue is operations that will happen on the remote computer where the image will be stored.  Files generated through this process can be pretty large, so doing this over a wired network is going to work better.

In the first step, we use dd to read the data from the hard disk. This is then piped via a standard Unix pipe to bzip2, which compresses the data. Alternatively, you can use gzip at this point if you’re worried about the CPU overhead. This is a long process that I usually let run for a couple of hours. From there the output of bzip2 is sent to SSH, which connects to the SSH server on the remote machine. Then, as we’re still dealing with standard output, we can use cat to redirect the output to a file. This entire operation can be done with a single command and without the need to create any intermediate files.

dd if=/dev/sda of=/dev/stdout bs=1M | bzip2 | ssh USERNAME@remotehost "cat - > drive.img.bz2"

There are a few parameters which may need a bit of explanation. With regard to dd, the if and of arguments specify the input and output files; here we use /dev/stdout to indicate that we want dd to send the output to standard output. I also set the block size to 1 megabyte with the bs argument. The output is then sent to bzip2, which compresses the standard input stream and sends it to standard output. SSH then takes this data, logs onto the remote computer “remotehost” as user “USERNAME”, and uses cat to save the image to a file. This process will not show any sort of progress as it runs. You can get a rough idea of how much data has been processed by looking at the size of drive.img.bz2 on the remote machine. In my experience images created this way tend to be about 1/10th the size of the drive. Alternatively, you can work the pv command into the pipeline to get a better estimate of progress. To restore the drive image to the drive, you can use the following command:

ssh USERNAME@remotehost "cat drive.img.bz2" | bzip2 -dc | dd if=/dev/stdin of=/dev/sda bs=1M

In most cases you can even use this method if you upgrade to a different disk, but you’ll need to run something like gparted or Partition Magic to update the size of the filesystem and do some small repairs on the drive after restoring the image.
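For readers who want to see what the read, compress, and write stages of the pipeline amount to, here is a minimal Python sketch of the same loop with a progress hook. It is only an illustration, not a replacement for dd and bzip2: the function name compress_stream and its report callback are made up for this example, and it works on any pair of file-like objects rather than on /dev/sda directly.

```python
import bz2
import io


def compress_stream(src, dst, block_size=1024 * 1024, report=None):
    """Copy src to dst through bzip2, one block at a time.

    Returns (bytes_read, bytes_written). The optional report callback
    (a hypothetical hook, not part of dd or bzip2) is called after
    every block with the running totals.
    """
    compressor = bz2.BZ2Compressor()
    bytes_read = 0
    bytes_written = 0
    while True:
        block = src.read(block_size)  # like dd with bs=1M
        if not block:
            break
        bytes_read += len(block)
        chunk = compressor.compress(block)  # may be empty while data is buffered
        dst.write(chunk)
        bytes_written += len(chunk)
        if report is not None:
            report(bytes_read, bytes_written)
    tail = compressor.flush()  # emit whatever the compressor still holds
    dst.write(tail)
    bytes_written += len(tail)
    return bytes_read, bytes_written
```

Passing something like report=lambda read, written: print(read, written) gives the kind of running count that the plain pipeline lacks. For real imaging the shell pipeline is still the simpler choice.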

This method doesn’t generate the smallest files possible — notably if you’re working with a disk that has already been used many times, it will not generate good compression because many of the old bits are still floating around on the disk.  More advanced solutions take into account the structure of the filesystem and just code that large segments are supposed to have no data, which can provide substantial savings.  However, for a method that is quick and works with almost every system out there, I’ve found that this works wonders, and I know that dd isn’t going anywhere.
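The effect of leftover data on compression is easy to demonstrate. The sketch below compares how well bzip2 compresses a megabyte of zeros (standing in for never-written disk space) against a megabyte of random bytes (a rough, pessimistic stand-in for stale data from deleted files). The helper name compression_ratio is just for this example.

```python
import bz2
import os

SAMPLE_SIZE = 1024 * 1024  # one megabyte per sample


def compression_ratio(data):
    """Return compressed size divided by original size (lower is better)."""
    return len(bz2.compress(data)) / len(data)


# Never-written space tends to read back as zeros and compresses to almost nothing.
zero_ratio = compression_ratio(bytes(SAMPLE_SIZE))

# Random bytes are the worst case: they barely compress at all.
noise_ratio = compression_ratio(os.urandom(SAMPLE_SIZE))

print("zeros: %.4f  random: %.4f" % (zero_ratio, noise_ratio))
```

Real leftover data usually falls somewhere between these two extremes, which is why a well-used disk produces a larger image than a fresh one.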

Iron April 2009

I’m officially on the hook for competing in Iron April 2009. It’s an interesting event organized by a real Ironman, Kevin Haugh, that takes loads of folks who normally don’t do triathlons and convinces them to do the distance of an Ironman over the course of the month of April. It’s an informal sort of competition where you’re motivated by the other folks who are also trying to accomplish the same thing. There’s no prize for being the first person to finish, other than knowing that you managed to complete Iron April. Each person has a page where they post their updates and progress. I’ll be posting short blurbs here. In addition, sometime soon I’ll also create some sort of widget that I can use to post my overall progress on the sidebar of the blog.

I’m fairly certain that the running will be no problem.  Most weeks I break 26.2 miles without the extra motivation.  If I don’t hit 26.2 miles this month it will be because I’ve done something stupid and injured myself.  Swimming just requires me to go to the pool a couple of times and get a few hours of swimming in, not a big deal when you’re on a university campus.  The difficult thing will be biking.  I’m going to start biking to and from school every day, which gives me about 4 miles along the bike trails.  I’ll also have to do some longer bike routes on the weekend or on my off days from running.

In any case, it looks fun and is the first baby step toward me eventually doing a real Ironman race.

Safely Migrating Recordings in MythTV

As I mentioned in my previous article about using a Drobo as primary MythTV storage, the optimal solution for using a Drobo in MythTV is to record to an internal hard drive and then migrate the recordings later to the Drobo. When migrating recordings care is required because the recording could be in use, either as a result of someone watching the recording or because transcode/commercial flagging jobs are running on the recording. Fortunately, MythTV stores almost everything in a MySQL database, which makes it really easy to find out the status of a recording.

There are two tables that you’ll need to examine to ensure a recording is not currently in use. The first table to check is the jobqueue table. This table lists every recording that has pending jobs on it, whether user jobs or internal jobs such as commercial flagging. It also leaves entries in the table for logs, so you can’t just make sure there are no entries for a program in the table. The trick here is that you’ll want to make sure that the program you’re examining has a status that is greater than 255, which indicates that there isn’t a pending job. All statuses over 255 indicate some sort of completion.

The second step is to make sure the program isn’t currently recording or being watched. This is accomplished by looking at the inuseprogram table. If a program has an entry in here, you certainly should not try to move the recording, and instead just wait until some later time to move it.

When you combine these two checks, you can be fairly certain that a recording isn’t in use and that moving it will be safe. Rather than having to perform these checks by hand, I’ve created a script called MythMigrator which looks at a storage group and will migrate all recordings not in use to another group. I’ve made a git repository available at http://patrick.wagstrom.net/git/mythmigrate.git where you can download the script.
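To make the two checks concrete, here is a rough Python sketch of the logic. It is not the MythMigrator code. It uses an in-memory SQLite database as a stand-in for MythTV’s MySQL database, and the simplified table layouts (identifying a recording by chanid and starttime) are assumptions made for the example.

```python
import sqlite3

# Per the text above, job statuses greater than 255 indicate some sort of completion.
PENDING_STATUS_MAX = 255


def make_demo_db():
    """Build a tiny stand-in database with simplified, assumed table layouts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobqueue (chanid INTEGER, starttime TEXT, status INTEGER)")
    conn.execute("CREATE TABLE inuseprogram (chanid INTEGER, starttime TEXT)")
    return conn


def is_safe_to_move(conn, chanid, starttime):
    """Return True when the recording has no unfinished jobs and is not in use."""
    cur = conn.cursor()
    # Check 1: any jobqueue entry at or below the threshold is still pending or running.
    cur.execute(
        "SELECT COUNT(*) FROM jobqueue "
        "WHERE chanid = ? AND starttime = ? AND status <= ?",
        (chanid, starttime, PENDING_STATUS_MAX),
    )
    unfinished_jobs = cur.fetchone()[0]
    # Check 2: any inuseprogram entry means it is being recorded or watched.
    cur.execute(
        "SELECT COUNT(*) FROM inuseprogram WHERE chanid = ? AND starttime = ?",
        (chanid, starttime),
    )
    in_use = cur.fetchone()[0]
    return unfinished_jobs == 0 and in_use == 0
```

A recording with only finished jobs and no inuseprogram entry passes both checks. Anything else should be left alone until a later run.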

To use the script you’ll need to have two different storage groups defined, one where your recordings are initially stored, and another where they are archived for long term storage. In my case these are called “Default” and “Long Term”. Now it’s as simple as running a cron job at some lightly loaded time (for me this is about 9am) with the following command:

python MythMigrator.py "Default" "Long Term"

That’s all there is to it. My recordings are safely migrated every morning and I hope this script can help you too.

Drobo as Primary Storage Considered Harmful?

Advertisement from mwave.com billing Drobo as your new Primary Storage

In October 2008 I purchased a 1st generation Drobo to provide additional storage space for my growing collection of recordings from MythTV. The plan was to use the Drobo as primary storage for MythTV recordings, something I believed to be acceptable because recording high-definition television maxes out at a couple of megabytes a second. The Drobo is equipped with a USB2 connection and should be able to push well more than that. The Drobo also provided some great advantages, such as on the fly addition of disks to the array, and data redundancy. Along with the Drobo I purchased two Western Digital 1TB green drives. After migrating some drives from my old array I had a Drobo with 2×1TB and 2×320GB drives. I was able to take the number of drives in the actual case for my MythTV box down to just two, and in the process, reduce heat and noise from the machine.

I’m now about five months into this little experiment, and it’s been met with mixed results. After installing the Drobo I had about 700GB of a 1.6TB drive occupied. Writes were fast and I never had problems with recording multiple shows, watching recordings, and even transcoding recordings all at the same time. However, the Drobo allowed me to be lazy: I no longer needed to worry about diligently transcoding recordings. Soon I found my 1.6TB drive filling up and yellow lights appeared telling me that I had better upgrade a disk in the array. Rather than inserting a new disk, I took the opportunity to transcode hundreds of recordings from MythTV. When combined with cutting commercials and a small decrease in resolution, I can take a 4GB HDTV episode of The Office and drop it down to around 500MB with little degradation.

At this point I started to notice issues with MythTV under heavy usage. On Monday nights I often have both HD tuners recording, an analog tuner recording, transcode jobs running, commercials being flagged, and I’m often watching a high def program. This actually maxes out to a little under 10MB/s, something the Drobo should be able to do with little problem. However, recordings from that evening would frequently end up missing small bits of video and jumping ahead. For some shows this was no problem, but it could miss a critical moment in a show like 24. I quickly identified the problem as the Drobo not being able to keep up with the requests. During some down time I used Bonnie++ to calculate a synthetic estimate of disk throughput — the results were downright ugly for the Drobo:

Version 1.03c       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
spongebob        4G 10260  40 14116   8  8107   5 16746  57 18169   3  61.5   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 10085  40 +++++ +++ +++++ +++ 26325  98 +++++ +++ +++++ +++

For comparison, here is the output of Bonnie++ on a 320GB internal hard disk:

Version 1.03c       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
spongebob        4G 42271  94 55788  31 28871  14 50686  96 71219  16 128.7   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 22504  96 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

The most relevant stats are the block-level sequential output and input rates. In the case of the Drobo, writing at the block level, which should be the fastest, results in a throughput of about 14MB/s, while the 320GB internal hard disk whizzes by at 55MB/s. Likewise, the read rates are dramatically different: the Drobo maxes out at 18MB/s, while the single internal hard drive reads at 71MB/s. For comparison, my 4 drive software RAID 5 array used to read at 106MB/s.
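If you want to pull those numbers out of the Bonnie++ output yourself, a small Python sketch like the following does the job. It assumes the whitespace-separated layout shown in the two result rows above, with block write in the fifth field and block read in the eleventh. The function names are made up for this example.

```python
DROBO_ROW = "spongebob        4G 10260  40 14116   8  8107   5 16746  57 18169   3  61.5   0"
INTERNAL_ROW = "spongebob        4G 42271  94 55788  31 28871  14 50686  96 71219  16 128.7   1"


def parse_bonnie_row(line):
    """Extract block write and block read rates (K/sec) from a Bonnie++ 1.03 result row."""
    fields = line.split()
    return {
        "block_write_kps": int(fields[4]),   # Sequential Output, Block
        "block_read_kps": int(fields[10]),   # Sequential Input, Block
    }


def speed_ratio(fast, slow, key):
    """How many times faster the first device is than the second for one metric."""
    return fast[key] / slow[key]


drobo = parse_bonnie_row(DROBO_ROW)
internal = parse_bonnie_row(INTERNAL_ROW)

for key in ("block_write_kps", "block_read_kps"):
    print("%s: Drobo %.1f MB/s, internal %.1f MB/s, ratio %.2f" % (
        key, drobo[key] / 1000, internal[key] / 1000,
        speed_ratio(internal, drobo, key)))
```

On these numbers the internal disk comes out close to four times faster than the Drobo for both block writes and block reads.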

Under most circumstances, switching to a disk array results in an increase in performance. Both writes and reads are spread out across multiple disks, reducing issues with seek time and allowing parallel reads and writes. However, that is clearly not the case with the Drobo. Much of the bottleneck is probably a result of the USB2 interface to the Drobo, but that cannot be the entire reason for the slow throughput. Many other people have noted the slow performance of the Drobo (see MaximumPC, Mac360, and CNet reviews). On occasion a reviewer will get slightly confused because the device automatically slows down as it gets closer to being full; this is to ensure you have enough time to get another hard disk and replace a smaller one. But what’s really strange is how sometimes the Drobo just stops writing for a period of seconds at a time. Here’s some output from dstat when copying a large file from an internal hard drive to the Drobo:

----total-cpu-usage---- --dsk/sdc1- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  1  14   0  83   0   1|   0    21M| 236B 1064B|   0     0 |1675  3260
  3  14   0  81   0   2|   0    24M| 590B  448B|   0     0 |1745  3456
  3  17   0  79   0   2|4096B   21M| 236B  448B|   0     0 |1728  3416
 12  13   0  71   0   3|   0    26M|  29k   29k|   0     0 |1784  3654
  6  33   0  59   0   3|   0    23M| 907B 1139B|   0  8192B|2020  3285
  2  10   0  86   0   2|   0    18M| 596B 3586B|   0     0 |1708  3487
  3  17   0  77   0   2|   0    20M|6262B 3530B|   0     0 |1830  3684
  4  16   0  77   0   4|   0    24M| 354B  448B|   0     0 |1952  4022
  9  12   0  77   1   2|4096B   21M|5203B 8709B|   0     0 |1843  3875
 14   9   0  76   0   3|   0    22M|4848B 5200B|   0     0 |1938  4340
  5  29   0  65   1   2|   0    16M| 224B  586B|   0     0 |1686  3143
  4  12   0  82   0   1|   0  7684k| 354B  464B|   0     0 |1370  2609
  2   5   0  92   0   1|   0  6580k| 416B  536B|   0     0 |1118  2147
  2   9   0  89   1   1|   0  8856k| 210B  510B|   0     0 |1290  2449
  1   6   0  92   0   1|   0  7860k| 462B  600B|   0     0 |1177  2131
  2   3   0  94   0   1|   0  7156k| 328B  560B|   0     0 |1094  1970
  3   4   0  94   0   1|   0  8192k| 578B  798B|   0     0 |1090  2017
  2   3   0  95   0   1|   0  3616k| 573B  743B|   0     0 | 966  1605
  2   5   0  92   0   0|   0   920k| 354B  464B|   0     0 | 939  1610
  5  12   0  82   0   1|4096B 5656k|3555B 3787B|   0     0 |1239  2240
 25  21   0  53   0   1|   0  2416k|  38k   39k|   0     0 |1105  2323
  7   6   0  86   0   0|   0  3120k| 524B  894B|   0     0 |1032  1833
  7   7   0  85   0   0|   0  6152k| 925B 1673B|   0     0 |1153  2040
  4   4   0  93   0   1|   0   960k| 708B  680B|   0     0 | 947  1683
  8  15  27  50   0   0|   0  2176k|3086B 3121B|   0     0 | 973  1587
  6  30  48  16   0   1|   0  1080k|5286B 5664B|   0     0 | 936  1397
  6   5  49  39   0   1|   0   960k|8552B 8820B|   0     0 | 954  1606
  9  18  49  24   0   0|   0     0 |6230B 6486B|   0     0 | 919  1454
  7  28  49  16   0   0|   0     0 |  10k   11k|   0     0 | 897  1313
  4   4  49  43   0   1|   0     0 | 811B 1123B|   0     0 | 945  1690
  3   3  48  46   0   0|   0     0 | 236B  464B|   0     0 | 910  1553
  4   3  50  44   0   0|   0     0 | 250B  584B|   0     0 | 965  1760
  2   2   0  96   1   0|   0   584k| 354B  432B|   0     0 | 927  1589
  4   4   0  92   0   1|   0   120k| 236B  432B|   0     0 | 866  1437
  5   5   0  89   0   0|   0     0 | 118B  432B|   0     0 | 864  1520
  4   4   0  92   0   0|   0   600k| 236B  432B|   0     0 | 910  1631
  5   5   0  90   0   0|   0   960k| 354B  432B|   0     0 | 880  1588
  5   5   0  89   0   1|   0  1680k| 236B  448B|   0     0 | 923  1495
 13  29   0  57   1   0|   0    11M|  14k   14k|   0     0 |1341  2112
  8   5   0  87   1   0|   0  6704k|1693B 2067B|   0     0 |1094  1948

The writ column shows the amount of data being written to the Drobo, and you can see that for extended periods of the file copy the Drobo decides it would rather not write anything. What’s interesting is that this occurred right after the Drobo went into what seemed like a “Turbo Mode”, writing at rates of over 20MB/s. Clearly, this is a problem, and such blocking of writes means that the Drobo cannot be used as a primary storage device for MythTV. It works wonderfully for playing back recorded programs, storing music files, or holding my photographs, but anything where streaming directly to disk is required is not a good idea for the Drobo.
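Spotting these stalls by eye gets tedious, so here is a small Python sketch that finds the longest run of zero-write samples in a dstat write column. The sample values are copied from a stretch of the output above. The unit handling (k, M, and G as powers of 1024) is my assumption about how dstat abbreviates sizes, and the function names are made up for this example.

```python
# Assumed meaning of dstat's size suffixes.
UNITS = {"B": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a dstat size such as '21M', '7684k', '4096B', or '0' to bytes."""
    text = text.strip()
    if text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def longest_stall(write_column):
    """Return the longest run of consecutive samples with zero bytes written."""
    longest = 0
    current = 0
    for value in write_column:
        if parse_size(value) == 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Write column from part of the dstat output above, one sample per second.
SAMPLE_WRITES = ["2176k", "1080k", "960k", "0", "0", "0", "0", "0",
                 "584k", "120k", "0", "600k", "960k", "1680k"]

print("longest stall: %d seconds" % longest_stall(SAMPLE_WRITES))
```

With dstat’s default one-second interval, the run of five zero samples corresponds to about five seconds with nothing written.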

However, I have an alternative solution.

MythTV has the concept of Storage Groups. Originally they were designed to force MythTV to load balance recordings across multiple drives. When a recording is selected it appears that MythTV scans all storage groups for the recording. The solution I have adopted is to put the default recording group on an internal hard drive and create another storage group, called “Long Term”, which lives on the Drobo. Periodically I then copy the files from the internal hard drive to the Drobo, thus ensuring that I always have enough space to record more. Since I’ve done this I haven’t noticed any issues with disk load causing corruption of recorded files.

Grad School is Where You Lay Your Head

On Monday, March 9th I successfully defended my Ph.D. thesis.  I haven’t finished all the little revisions yet, in fact I haven’t even opened the email from my advisors yet, but I’m largely finished with grad school.  If you’re interested in the thesis or the presentation, you can take a look at my Patrick’s Awesome Thesis Defense Page that is part of my academic home page.

One nice aspect of grad school is that it afforded a reasonably flexible work schedule and the ability to travel a fair amount.  Some semesters, it was pretty crazy, other times it was pretty relaxed.  It all just depended on how productive I was being at work.  In that spirit, I decided to make a map of all the places that I slept while in grad school.  Each of the markers represents a physical building (or tent, etc), while the lines represent wonderful overnight flights.  I haven’t done anything to show how many times I’ve slept at a location or on a flight, but you get the general idea.

Grad School is Where You Lay Your Head

Confusion Abounds! DTV Transition Delayed Until June 12. Or is it?

Postponing The Rapture For Political Points!

The United States House of Representatives voted 264-158 to extend the DTV Transition until June 12th. The vote went largely along party lines with a handful of each party switching sides, but it wasn’t enough to be called a bi-partisan approval. However, it escapes me why this should be a partisan issue in the first place. In approving the delay, the House left in the provision that allows TV stations to switch over before June 12th if they choose. This is an appealing option to many stations because of the high cost of running two sets of transmitters. At a price of $0.12/kWh, a 100,000W transmitter costs $12/hour to run, $288/day, and $8640/month. A station with a 100,000W transmitter could therefore be expected to spend about $34,560 to keep their transmitter powered until June 12th. Clearly they have incentive to commit to transition early.
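The transmitter arithmetic is simple enough to check in a few lines of Python. The $34,560 figure corresponds to four 30-day months. Counting the actual 115 days between February 17 and June 12 gives a slightly lower number. The function name is made up for this example, and the price is kept in whole cents to avoid floating point surprises.

```python
from datetime import date


def running_cost_dollars(watts, cents_per_kwh, hours):
    """Cost in dollars of running a constant load for the given number of hours."""
    kilowatts = watts / 1000
    return kilowatts * cents_per_kwh * hours / 100


TRANSMITTER_WATTS = 100000
PRICE_CENTS_PER_KWH = 12

per_hour = running_cost_dollars(TRANSMITTER_WATTS, PRICE_CENTS_PER_KWH, 1)
per_day = running_cost_dollars(TRANSMITTER_WATTS, PRICE_CENTS_PER_KWH, 24)
per_month = running_cost_dollars(TRANSMITTER_WATTS, PRICE_CENTS_PER_KWH, 24 * 30)
four_months = 4 * per_month

# Actual number of days between the original and the delayed deadline.
delay_days = (date(2009, 6, 12) - date(2009, 2, 17)).days
exact_delay_cost = running_cost_dollars(
    TRANSMITTER_WATTS, PRICE_CENTS_PER_KWH, 24 * delay_days)

print(per_hour, per_day, per_month, four_months, delay_days, exact_delay_cost)
```

Either way the cost lands in the low $30,000s per transmitter, which is the point of the estimate.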

The primary reason why many folks voted for this bill was a chart that Nancy Pelosi circulated showing the number of consumers on the waiting list for DTV converter boxes in each congressional district (you can find the chart in Ars Technica’s coverage of the bill). Every single congressional district has between 1200 and 9200 people on the waiting list for coupons, and there are a shocking 53,372 people in Puerto Rico. Furthermore, every district saw their waiting list increase by at least 100 people, and some by as many as 800, over the course of a weekend. Ostensibly, approving a delay would ensure that these 181,013 new households and 1,781,218 previously registered households receive their coupons.

However, the program is out of money.  The only way that coupons are being sent out right now is if someone’s coupon expires.  If the list were capped at the number of people on it currently, and that seems unlikely, then the government still needs to come up with $156,983,520 to pay for those coupons.  Unfortunately, in approving the delay, no provision was made for this funding.

Furthermore, there is no evidence that delaying the transition will actually help people. Two areas have already transitioned to DTV: Wilmington, NC, which transitioned as a test in September 2008, and the entire state of Hawaii, which transitioned last month to not interfere with a species of bird that is active in February. If we look at Hawaii, however, there still are 2,974 people on the waiting list, with 129 new people just last weekend. These are people who failed to submit their coupon request until after the transition happened. My guess is that there’s a lot of grandmas out there who will be in the same boat, in addition to folks who have cable for most of their televisions, but may want to ditch it because of budget concerns (talk about a great lock-in for cable companies).

So, now we’re left with an utter mess. Instead of having a solid date for when analog broadcasting dies, stations can choose to switch anytime between February 17 and June 12. My hunch is that within a metro area, not all stations will switch at the same time, just confusing folks more: “I only get the station with the fifth grader show. Where’s my CSI:Albany?” Furthermore, it’s looking more and more unlikely that the government will come through with the estimated $300-$500 million necessary to fund the coupon program. They’ve got bigger things to worry about than whether or not people can continue to watch “How I Met Your Mother” and “Knight Rider”. Finally, President Obama has yet to sign the bill, but has indicated support for it. Therefore all the official advertising, even on dtv2009.gov, still lists February 17th as the date.

Yes, the transition has come at a really bad time. We’re fighting two wars. Our economy is collapsing. Now, the way they’ve handled the transition indicates that it will be a mess for a long time to come. In the end, the winners may be services like Dish Network and Comcast, who are sending out millions of flyers around the country advertising budget rate plans starting at $9.99 for local channels. The losers will almost certainly be the people who waited and haven’t requested their coupons, and will be left without whatever crappy summer programs the networks unleash on us in June. But hey, it’s June, it should be nice outside. I hear there’s lots of things to do outside.

Taking “nice” Processes Into Account With Powernowd

Almost every modern CPU supports the ability to dynamically change its clock frequency, typically to save power when fewer demands are placed on the system. While the benefits for a laptop are obvious (more battery life and less heat), they may be a bit harder to understand for a desktop machine. One application, however, where you should be particularly aware of CPU frequency scaling is media center boxes. The less often the CPU needs to speed up, the less heat, the quieter the fan, and the lower your power bill. On my system, the differences can be pretty dramatic, with the CPU taking an extra 20-30 watts when running at full speed versus slower rates.

Within Linux, there are numerous ways that the CPU frequency is controlled, for example with powernowd or the kernel level ondemand governor, which is the default with most Linux distributions. One of the aspects of these technologies is that when the system is heavily loaded with processes with high “nice” levels, the CPU will not speed up. For most cases this is fine, but when running a MythTV box, it may be desirable to allow nice processes to increase the CPU speed, particularly as these nice processes may be things like transcoding or commercial cutting jobs. Allowing the CPU to ramp up is particularly helpful for evenings where many programs are recorded. On a Monday night where my MythTV box grabs five HD programs, without allowing the CPU to speed up, it can take over a day to commercial flag and transcode everything. When the CPU can speed up this time drops to about 8 hours.

Under Ubuntu the default configuration is to use the “ondemand” governor, which means there is no daemon running and the CPU will scale with demand and load. However, there is a nasty line buried in the /etc/init.d/powernowd startup script:

 92  use_ondemand() {
 93      if [ "$OPTIONS" != "-q" ]; then
 94          return 1
 95      fi
 96      status=1  # return error, if no cpu dirs are found
 97      for x in /sys/devices/system/cpu/cpu[0-9]*/; do
 98          if [ ! -d $x ] || [ ! -f $x"cpufreq/scaling_governor" ]; then
 99              continue
100          fi
101          echo -n ondemand > $x"cpufreq/scaling_governor"
102          status=$?
103          if [ $status != 0 ]; then
104              return $status
105          fi
106          # The default behaviour of powernowd is to ignore nice load:
107          if [ -f $x"cpufreq/ondemand/ignore_nice_load" ]; then
108              echo -n 1 > $x"cpufreq/ondemand/ignore_nice_load"
109          fi
110      done
111      return $status
112  }

Line 108 is the culprit of our problems. By echoing the string “1” to /sys/devices/system/cpu/cpu[0-9]*/cpufreq/ondemand/ignore_nice_load, the script tells the ondemand governor not to take nice loads into account. Under previous versions of Ubuntu, or when you’re not using ondemand and actually need powernowd to run, this was easily changed by altering some settings in /etc/default, as I detailed 3 years ago. However, now you need to hack the script yourself. The simplest way is just to change the “1” to a “0” on line 108 of /etc/init.d/powernowd, giving the following line:

108              echo -n 0 > $x"cpufreq/ondemand/ignore_nice_load"

Change this setting, then you can either reboot, or run /etc/init.d/powernowd restart as root. If you watch your CPU speed (check out /proc/cpuinfo), you should notice that nice processes now speed up the CPU.
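To confirm that the change took effect, you can read the setting back from sysfs. Here is a small Python sketch that does so for every CPU, using the same path pattern as the init script above. The function name and the cpu_root parameter are made up for this example, and on systems where that path does not exist it simply returns an empty result.

```python
import glob
import os


def read_ignore_nice_load(cpu_root="/sys/devices/system/cpu"):
    """Return a dict mapping each CPU directory name to its ignore_nice_load value."""
    settings = {}
    pattern = os.path.join(
        cpu_root, "cpu[0-9]*", "cpufreq", "ondemand", "ignore_nice_load")
    for path in sorted(glob.glob(pattern)):
        # The first path component below cpu_root is the CPU name, e.g. "cpu0".
        cpu_name = os.path.relpath(path, cpu_root).split(os.sep)[0]
        with open(path) as handle:
            settings[cpu_name] = handle.read().strip()
    return settings


if __name__ == "__main__":
    for cpu, value in sorted(read_ignore_nice_load().items()):
        print("%s: ignore_nice_load=%s" % (cpu, value))
```

A value of 0 for every CPU means nice processes are being counted toward the load.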

Delay the DTV Transition Again?

Will Your TV Go Bye Bye?

For the last 12 months Americans have been incessantly pounded with messages about the upcoming DTV transition. In simple terminology, if you’ve got an old TV hooked up to rabbit ears, it’s going to break in the very near future. The history of this problem is long and complex, but it generally amounts to the fact that HDTV provides better quality in a smaller swath of frequency. That frequency is very valuable and can get the government a couple of billion dollars at auction. Last year Verizon managed to snag a good chunk of the 700MHz spectrum currently used for television for its next generation 4G phone services.

In order to make this transition a little easier for Americans, the government approved the DTV coupon program, which gives anyone who asks for it two $40 cards that can be used toward the purchase of a digital to analog converter. In essence these boxes downsample HD content so folks with SD televisions can still use them. When I was visiting relatives over the holiday season I saw that some of them had these and they generally worked pretty well. Unfortunately, the program has been such a success that they’ve now run out of money and folks who waited are being put on a waiting list until more money is made available for the program. In essence, these folks will lose their over the air TV signals on February 17th.

Congress, of course, wants to ensure that everyone can continue to watch their Wheel of Fortune and Nightline, and doesn’t want to be blamed for breaking their televisions by old people who vote, but who turn off their hearing aids during advertisements and haven’t paid attention to the transition. Two weeks ago the Senate voted to approve a delay until June. The House attempted to bring it up under special rules, which would require a 2/3rds vote. Republicans got together and found enough people to make the vote fail, largely because the process would forbid them from proposing amendments. The DTV transition was still February 17th.

Now, much more recently the Senate has again passed a bill to extend the DTV transition period until June, and the House will vote on it tomorrow (Tuesday, February 3rd). It’s anyone’s call about whether or not this bill will pass. There’s a good chance it will, as Democrats will most likely vote for it after shooting down Republican attempts to litter the bill with tax cuts. President Obama will then sign the bill sometime later this week, less than two weeks before the transition, and magically, after we’ve been hearing for over a year that the transition would be February 17th, it will be moved to June. Talk about confusion.

But wait, it gets even better than that. Many stations have already put plans together and scheduled contractors to take down their analog transmitters. If you’re a small station this is a no-brainer because for the last several years you’ve been broadcasting on two different transmitters, which can eat a large portion of your budget. The new bill will do nothing to stop stations from shutting down their analog transmitters. In other words, instead of having all channels going to static at once, we’ll be left with a situation where there is piecemeal and unpredictable deactivation of the analog spectrum taking place anytime between February 17th and the new deadline in June. I’m sure Grandma is going to love that.

So now we’re in the final countdown. 15 days to go, and it’s unclear if everything we’ve been hearing is correct or not. However, the bigger problem is that there has been information about this transition for more than a year now; you even see crawls running during television shows. Any television sold post 2007 must be able to accept ATSC digital television. Are there still people who don’t know about this? Plausible. After all, there are people who believe all sorts of crazy things, like that Saddam was behind 9/11. So you can’t expect news to get to these people and inform them. At what point will the government realize that some people will just have to be screwed by this transition? My hunch is that there are few people left who have heard about the transition and haven’t done anything about it, especially considering that most Americans get their television through pay services such as cable or satellite and won’t be affected regardless.  I feel a little bad for the people who haven’t gotten their coupons yet, and I hope this is remedied soon, but fortunately February and March are pretty boring times in the US: few natural disasters, no elections, and only hockey and basketball are playing.  Maybe they should try sitting down with a book for a bit instead?  Certainly that’s got more to it than the 5 phrases they see nightly on Wheel of Fortune.

Simple Script to Clean Up Your MP3 Directory

Since getting an iPhone I’ve put a bit more care into making sure that all of my music is ripped and available.  My music resides on a Drobo plugged into my Linux box and shared via Samba to Windows and iTunes.  I’ve chosen to have iTunes organize my music folder automatically.  One downside of this is that iTunes names files slightly differently and moves them around.  This resulted in hundreds of directories lying around with no content.  It also made it a pain when I wanted to find the actual file outside of iTunes: was “Everybody Hurts” in REM, R.E.M, or R.E.M_?  Because iTunes has organized my music, only one of those directories has any content in it, and I want to nuke the other ones.

I set about to create a script that will look through a directory tree and find all the directories that do not contain any music at lower levels and remove those directories.  Many of the directories have other crufty files in there, but I’m pretty certain there shouldn’t be anything other than music and covers in my music repository.  However, this means that I can’t just delete everything that isn’t a music file, because I want to keep covers in directories that still have music.

I accomplished this via a two stage script that first finds all subdirectories.  Then for each subdirectory find is executed again to get a count of the number of music files in the path.  If there are no music files in the directory, then the directory gets nuked.  Here’s the script:

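#!/bin/bash

# split the output of find on newlines only, so names with spaces survive
IFS=$'\n'
for td in $(find . -type d); do
    # skip directories that were already removed along with a parent
    [ -d "$td" ] || continue
    # the parentheses make -type f apply to every pattern, not just *.mp3
    FC=$(find "$td" -type f \( -iname "*.mp3" -or -iname "*.aac" -or -iname "*.m4?" \) | wc -l)
    if [ "$FC" -eq 0 ]; then
        echo "*** $td $FC"
        # uncomment these two lines if you'd like to be prompted to hit return to nuke stuff
        # ls -lR "$td"
        # read x
        rm -rf "$td"
    fi
done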

There are two commented lines in the middle there that you can uncomment; the script will then show a directory listing and wait for you to hit return before it nukes the directory. As near as I can tell this worked perfectly for me. Of course, it could easily eat your music files too.
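If you’d rather see what would be removed before letting anything loose on your collection, a dry-run version of the same idea might look like the sketch below. It only prints the directories that contain no music at any depth and deletes nothing. The function name list_empty_music_dirs is made up for illustration, and the sketch assumes GNU find (for -mindepth, -print0 and -quit), which is what you’ll have on a typical Linux box.

```shell
#!/bin/bash
# Dry-run sketch: print directories that hold no music files at any depth.
# Nothing is deleted. list_empty_music_dirs is a hypothetical helper name.

list_empty_music_dirs() {
    local root="${1:-.}"
    local dir hit
    # -mindepth 1 leaves the root itself out of the list;
    # -print0 with read -d '' copes with spaces and other odd characters.
    find "$root" -mindepth 1 -type d -print0 |
    while IFS= read -r -d '' dir; do
        # -print -quit stops at the first music file found under "$dir"
        hit=$(find "$dir" -type f \
            \( -iname '*.mp3' -o -iname '*.aac' -o -iname '*.m4?' \) \
            -print -quit)
        if [ -z "$hit" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Example (the path is a placeholder): list_empty_music_dirs /path/to/music
```

Once the list looks right, you can run the real script on the same directory with a bit more confidence.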

New Administration, New Website, and Lost Content

President Barack Obama

I have just wiped away eight years of internets content!

One of the more interesting aspects of the inauguration today was how the website at whitehouse.gov switched over from the Bush administration to the Obama site right around noon. Shortly thereafter folks noticed the drastic contrast in robots.txt files between the two administrations. Some folks are saying this represents a difference in attitudes between the administrations, which may be true, but it’s much more likely a result of differences in information architecture and the smaller amount of content on Obama’s whitehouse.gov.

However, this highlights something a bit more interesting: when the website was switched over, massive amounts of content were immediately rendered obsolete and, for most purposes, lost. I’d imagine that the National Archives has a copy of it, just like they archive email from the executive branch, but that doesn’t make it easy to get access to. While much of the information on the old whitehouse.gov site is partisan banter, there are some elements that are a real loss, like the Whitehouse Kids page (archive.org link) and the Hurricane Katrina pages (not archived).

Given the increased importance of information placed on the web and the importance of a smooth transition of power, shouldn’t some work be done to archive this information? I realize that it may not be possible to continue to provide interactive elements as the system administrators have left, but a static archive would be helpful and also serve as a valuable resource for folks looking back at previous presidencies.