Command Line Updating Pages on Google Sites

About eighteen months ago I migrated my academic web pages away from a self hosted solution on a Linux box in my living room to Google Sites. Mainly this was done because I was applying for jobs and wanted to make sure that the site would be reliable. But although I came for the reliability, I stayed for the features.

It’s true that Google Sites is somewhat limiting in what you can do. You can’t do fun stuff with jQuery and highly customized CSS is verboten. It’s not going to work for someone who needs to share a design portfolio. However, for an academic it works really well. Basically, I need a set of pages about my research, copies of my papers and presentations, and various forms of my résumé and cv.  These are all typically boring pages that can be created with some simple HTML.  Google Sites manages that and even helps them look good too.

At that time I also realized that I needed to be a bit more flexible in how I handled my resume and CV. Up until this point I had a highly customized LaTeX file that generated a very pretty PDF.  The beauty was only skin deep. Underneath the PDF it was ugly and difficult to maintain, and if I wanted to paste portions of the document into an email or someone requested a Word document version, I was out of luck.

At the time I still hadn’t gotten my head on straight regarding how XML should never be used by humans, so I chose the XML Résumé Library, an Open Source package that hasn’t been updated in a couple of years. The library consists of an XML DTD that defines elements of a resume and a set of XSLT files that translate your resume into various formats, including text, html, and FO. I can then use the FO files to generate PDF, DOCX, and ODT files. Simple enough. Now I have a single source document with a Makefile that compiles the file into both my résumé and cv.

The problem is that I provide each document in five different formats, which means that I needed to upload 10 documents every time that I changed something. This was not ideal at all.

Luckily, Google is in the process of making open APIs for all of their tools and last September they finally released the Google Sites API. It still isn’t 100% complete, but with the 2.0.7 release of the Python libraries it is finally to the point where the Python library is suitable for updating documents.

I whipped up a simple little Python script that uploads files from the command line to Google Sites. It only works with documents that have previously been uploaded by hand, so in that sense, it only updates documents. You can find sites_uploader.py as a github gist or it should be embedded below.

The script itself has only been tested on apps for domains and has a couple of mandatory options:

  • -s/--site: The name of the site to update. This isn’t the URL, but the name in your admin panel for the site.
  • -d/--domain: The domain name of your apps for domain setup. I’m not certain what happens if you don’t include this because all of my sites are hosted through Apps for Domains.
  • -u/--user: The username to use for accessing the Google Sites API.
  • -p/--pass: The password for the user account. The sites API provides multiple different authentication methods. For my own convenience I have my Makefile prompt me for a password with Zenity then invoke the script. I’m on a laptop which means the chance of someone else seeing my password in the process list is pretty slim.
  • ENTRY_ID: each document on your site has an entry_id that doesn’t change with updates. Think of it like a UUID.
  • NEW_DOCUMENT: the filename of the new document to store on Google sites.
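As a rough illustration of how the options above fit together, here is a minimal sketch of a command line parser. It is not the actual code from sites_uploader.py: the use of argparse, the function names, and the attribute names (password, list_entries) are my assumptions, and only the flags themselves come from the description above and the sample session below.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="sites_uploader.py")
    parser.add_argument("-s", "--site", required=True,
                        help="name of the site as shown in the admin panel")
    parser.add_argument("-d", "--domain", required=True,
                        help="Apps for Domains domain name")
    parser.add_argument("-u", "--user", required=True,
                        help="username for the Google Sites API")
    # "pass" is a Python keyword, so store the value under another name.
    parser.add_argument("-p", "--pass", dest="password", required=True,
                        help="password for the user account")
    parser.add_argument("--ssl", action="store_true",
                        help="talk to the API over SSL")
    parser.add_argument("--list", dest="list_entries", action="store_true",
                        help="list documents and their entry ids")
    # Both positionals are optional so that --list can be used alone.
    parser.add_argument("entry_id", nargs="?")
    parser.add_argument("new_document", nargs="?")
    return parser


def parse_args(argv):
    """Parse argv and insist on ENTRY_ID and NEW_DOCUMENT unless listing."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.list_entries and (args.entry_id is None
                                  or args.new_document is None):
        parser.error("ENTRY_ID and NEW_DOCUMENT are required unless --list is given")
    return args
```

Making the two positional arguments optional and checking them afterwards is just one way to let --list run without an ENTRY_ID; the real script may handle this differently.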

When you’re first getting started with sites_uploader you can also use the --list option to get a list of all the documents on the site and their entry_id values. Here’s what a simple session might look like:

patrick@wallaby$ python sites_uploader.py -s "dummy" -d "wagstrom.net" -u "patrick@wagstrom.net" -p "PASSWORD" --ssl --list
["attachment.png", "attachment", "https://sites.google.com/feeds/content/wagstrom.net/dummy/9999384153430219999"]
["Home", "webpage", "https://sites.google.com/feeds/content/wagstrom.net/dummy/9999953700077559999"]
["files", "filecabinet", "https://sites.google.com/feeds/content/wagstrom.net/test/9999182398032899999"]
patrick@wallaby$ python sites_uploader.py -s "dummy" -d "wagstrom.net" -u "patrick@wagstrom.net" -p "PASSWORD" --ssl https://sites.google.com/feeds/content/wagstrom.net/dummy/9999953700077559999 home.html


This examines the site “dummy” under “wagstrom.net” by first listing all the documents, of which there are three: a png file called “attachment.png”, a webpage called “home”, and a filecabinet called “files”. We note the id of the webpage called “home” and wish to replace its content with that of home.html. When an operation is successful it prints out nothing.
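If you'd rather not copy ids by hand, the listing can be turned into a lookup table. The sketch below assumes that every line printed by --list is a JSON array of title, type, and entry id, which is what the sample output above looks like; the function name index_entries is made up for this example.

```python
import json

# Sample listing copied from the session above.
SAMPLE_LISTING = """\
["attachment.png", "attachment", "https://sites.google.com/feeds/content/wagstrom.net/dummy/9999384153430219999"]
["Home", "webpage", "https://sites.google.com/feeds/content/wagstrom.net/dummy/9999953700077559999"]
["files", "filecabinet", "https://sites.google.com/feeds/content/wagstrom.net/test/9999182398032899999"]
"""


def index_entries(listing):
    """Map each document title to its (type, entry_id) pair."""
    entries = {}
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines and anything that is not a JSON array.
        if not line.startswith("["):
            continue
        title, kind, entry_id = json.loads(line)
        entries[title] = (kind, entry_id)
    return entries


if __name__ == "__main__":
    kind, entry_id = index_entries(SAMPLE_LISTING)["Home"]
    print(kind, entry_id)
```

The entry id found for "Home" is the value that goes in the ENTRY_ID position of the second command in the session.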

The beauty of using IDs is that they don’t change, so once you look them up and put them into your Makefile, you’ll never need to change them again. The other nice thing about Google Sites is that it sanitizes your HTML, so you can feed it a complete HTML file and it is smart enough to just take the part that belongs in the body of the document. Pretty neat stuff indeed.

The code for the tool should be pretty straightforward, but if folks have questions feel free to email me and I’ll attempt to answer.

Open Source Predictions for 2010

It’s become the in-vogue thing for experts on various issues to pontificate on events for the coming year.  As I’m now a Dr. and I received my doctorate for studying the socio-technical aspects of software engineering communities, I feel like I’m legitimately qualified to put forth predictions about Open Source, Free Software, and web technologies. Putting a prediction in this list does not necessarily mean that I want it to happen, merely that I think there is a significant chance it will happen.   I’m not an insider on any of these issues, but here’s what I think will happen in 2010, in rough order of likelihood:

  1. The Eclipse Foundation will undergo a major upheaval related to community/corporate structure and governance. This follows from some of the results of my Ph.D. research and also from more recent posts from Bjorn Freeman-Benson.  Much of this will arise from the structure of Eclipse as a 501(c)(6) trade association and the lack of individual participation relative to commercial control of development.
  2. Highly customized hardware running on Open Source will continue to under-perform. In particular, OLPC will continue to be a yawn with most people questioning the long term value of the project over basic needs, resulting in no new deployments of more than 150,000 units and Litl will be too different for most people to understand and will fold.
  3. Despite the pleas of Open Source users and zealots, Netflix will continue to be inaccessible from open platforms. This isn’t a Netflix issue, it’s related to the content providers who want their content to be protected.  Until Open Source can provide a suitable way to protect this content it will continue to lag behind. Therefore, for the most part, Netflix will continue to require Silverlight for streaming.

    As Netflix becomes more dominant in the streaming market this may lead to some firms claiming that Netflix is abusing its power as a market leader by not providing open access to their streams. These people will have a fundamental misunderstanding of both the cause for Netflix requiring Silverlight and the duties of a market leader in an unregulated market.

    While this would be awesome, it ain't gonna happen.


  4. The Boxee Box will be greeted with lots of nerd hype, but will ultimately lose out to more flexible HTPC solutions based on Intel’s Pinetrail and nVidia’s Ion 2. This is largely related to the previous two predictions, but addresses broader issues of media convergence. I’m excited about the Boxee Box, but I’ve seen too many great devices fail because of various problems. Without a huge marketing blitz and without the ability to work as a Windows Media Center extender or record live TV, the beautiful Boxee Box will underperform as nerds opt for a faster, more open platform.
  5. Google will suffer a major privacy leak and be very very quiet about the issue. Some will say this will hurt Google, but in the long run nothing will change. Google is too careful and conscientious about its position to allow this to happen to GMail.  It’s much more likely that this will occur in a smaller service such as Google Voice or Picasa.
  6. Tensions between Open Source developers and the Free Software Foundation will continue to grow, resulting in at least one major community issuing a formal statement about community and inclusiveness that puts it directly at odds with the Free Software Foundation. We’ve seen some signs of these issues, particularly in the GNOME community, which expressed outrage over Richard Stallman’s actions at the Gran Canaria Desktop Summit and also in a series of messages on various GNOME mailing lists in December. I’m not certain if it will be GNOME, but they are the community I’ve followed most closely on these issues.

    St. Ignucious - Harbinger of Doom? (Photo from Michael McCracken)

  7. GNOME will encounter difficulties related to the foundation’s reliance on corporate sponsors and the rise of alternative platforms. GNOME was making excellent progress in the mobile space, especially with the maemo platform.  Unfortunately, Android may be a force too powerful for GNOME to compete against. Some of this loss will be offset by the increasing interest in netbooks and cloud based services, however such gains will be short lived as Android moves into netbooks and Google releases Chrome OS.
  8. Ubuntu will remove at least one major application in favor of directing users toward a cloud based solution. I have no idea what application this might be.  For end users I could easily see them removing Evolution, however it is too entrenched in the desktop at the enterprise level.  My best bet is that the dumping of GIMP for F-Spot will go poorly and Ubuntu 10.10 will direct users to an online application for photo editing and management.
  9. The issue of ownership rights of open source software that originated with one author, and was later updated by another who progressively removed all of the original author’s code, will come to the forefront, resulting in threats of, but no actual, legal challenges. We’ve seen the beginnings of this with Bruce Perens and Rob Landley spatting over BusyBox.  I’d imagine that the future challenge will come as the result of a commercial firm using an Open Source project as scaffolding to build a new project, eventually removing all the original Open Source software and then changing the license to something decidedly less open.
  10. Commercial ventures will continue to shy away from the GPL in favor of more standard Apache or BSD licenses. At least one major project/community will begin the process of transitioning to more business friendly licensing. We’ve seen how more open solutions (I consider the BSD to be more free than GPL) have been gaining traction, especially in the web frameworks market. It’s only a matter of time until a community begins to recommend that all projects adopt something like the BSD.  Despite this success, trolls will continue to insist that BSD is dying.

I’ll revisit these predictions as we turn the calendar to 2011. Given the probability estimates in my head I’d say that five of these will come true with my 95% confidence interval between 2 and 7. However, this is my first attempt at prognostication, so go easy on me in December 2010 when I review these, okay?

What the TSA has is a failure to communicate

On Christmas day a young Nigerian man boarded a flight from Lagos to Amsterdam and then later boarded a flight from Amsterdam to Detroit.  About an hour outside of Detroit, as the plane was descending over Canadian airspace, he decided that his religion would martyr him if he blew up the plane using a slab of pentaerythritol tetranitrate (aka PETN) taped to his crotch.  Unfortunately for him and fortunately for the plane’s passengers the igniter failed and he merely set his crotch on fire and probably blew up his testicles.

As expected, the TSA felt a need to respond to yesterday’s threat today.  Apparently it is now a threat to have more than one carry on, electronics in the main cabin on international flights, anything in your lap, and to go to the bathroom within one hour of the flight landing. I’ll leave the sanity of these for other people.  What I’m interested in is the way that the TSA shared this information.

The primary way that travelers are told to stay up to date on the ever changing regulations regarding flight security is to visit the TSA home page.  Here’s a screen capture of the home page from December 27th, more than 48 hours after the incident on Northwest 253:

TSA.gov website on December 27th


The TSA has a statement on flight 253, but there is nothing about the new flight restrictions, despite the fact that I’m sure that I, and thousands of other people, were concerned about how their flights home may be impacted.

The digerati may have also checked Twitter for information.  The TSA doesn’t have an official Twitter feed, but their blog team has a Twitter feed.  Here’s a screen capture of the Twitter feed taken at the same time:

The TSA's Twitter Stream on December 27th


There is a link to an article on the TSA’s blog about Northwest 253, but the article is merely the official statement and contains nothing about the changing regulations.  For more than 48 hours passengers were left to wonder about the changing regulations and how these regulations impact travel in the holiday season. The primary source of the changing regulations was a message on Air Canada’s web site.

Late on the 27th the actual text of the regulations was finally made available, but not through an easy source on the TSA’s website.  Instead, we’re forced to go to Christopher Elliott’s full text of the directive.

The administration has been quiet on the issue, but I believe that is more by design.  There’s no need to send everyone off to Sunday morning talk shows because a rich Muslim decided to blow his own balls off.  But what isn’t acceptable is the long delay and horrible communication regarding the new restrictions for travelers.  There’s lots more that could be written on this, but I’m trying to minimize editorializing in this article.

Will hubris kill MySQL?

Round and Round They Go!  Where They'll Stop, Monty Knows!


Over the weekend the twitterwebs lit up with the #savemysql tag in response to a plea from Monty, the founder of MySQL. The basic gist of Monty’s plea is as follows:

  • MySQL is a critical piece of infrastructure
  • Thousands of businesses use MySQL in functions beyond the web
  • Oracle sells a variety of competing databases
  • Oracle doesn’t have the best track record with Open Source
  • Therefore, Oracle should have to make a set of promises related to the commercial viability of MySQL.

The first four are all facts that are not in dispute.  The final point is Monty’s conclusion from the facts, that, in and of itself, doesn’t sound all that bad. If you’re the European Commission, of course you’re going to be concerned if a major tool for thousands of small and medium enterprises goes by the wayside. This would force those companies to spend untold amounts of time and money to procure another database — one most likely provided by an American company. Therefore, the merger between Sun and Oracle could result in a large amount of money flowing out of Europe and into the hands of the United States.

However, if we examine this issue a bit deeper, we see that while Monty’s facts are true, the conclusion is completely unfounded.  First, the license that MySQL is distributed under is a dual license.  Anyone can download, modify, and distribute modifications to the GPL version of the database.  Once the code is available under the GPL it’s forever available under that same license.  Sure, the copyright holder can change the license, but that doesn’t nullify the GPL versions of the software.  So, in that sense, MySQL cannot die from the Oracle acquisition of Sun any more than it could die from when Sun purchased MySQL AB. The dual license portion is where many people got bit by the whole thing.  Many companies believed that when they purchased a license from MySQL AB or Sun to use MySQL for purposes not covered under the GPL that they were still getting an Open Source project.  The fact of the matter is that they were not.  The dual version of MySQL may be supported by the same tools as the Open Source version and may sound the same on the surface, but the license doesn’t guarantee the companies access to updated versions of the software.  What they were purchasing was not much different than a proprietary commercial piece of software that also has a free version.

For Monty to act like this is some travesty is entirely dishonest.  Of course he knew that this was the situation the whole time, he is the one who structured the MySQL licensing.  While there are a variety of companies that work in Open Source and made money from their expertise in the product area, performing consultations and developing custom extensions, MySQL AB went a step beyond this and also allowed firms to license the software out of the GPL.  This was because while other databases and packages were licensed under more business friendly licenses, the LGPL for JBoss and BSD for PostgreSQL, MySQL AB licensed MySQL under the GPL to create this business opportunity.  There’s nothing wrong with that, but it has always made MySQL adoption rather dicey inside of some enterprises because even the connectors for MySQL database access were GPL’d, making any program that used those connectors GPL.

One might argue that this licensing scheme is what made MySQL so valuable.  It was a critical piece of infrastructure and revenue could be derived from the dual licensing, something that while possible with PostgreSQL, is not nearly as easy.  Sun thought this too and they purchased MySQL for about $1 billion.  At this point I’m sure that Monty was now laughing all the way to the bank.  Then, oddly enough, Monty left Sun within seven months of the purchase.  Publicly stated reasons were his disagreement with Sun wanting to offer commercial extensions to some of the core of MySQL.  In February of 2009, about five months after leaving Sun, Monty announced the creation of Monty Program, a new software firm that oddly enough, seems to do exactly what MySQL would have been doing if they didn’t offer the dual license option.

However, now something was different: after the sale of MySQL AB to Sun, Monty no longer owned or had the rights to the source code.  That’s what the $1 billion in compensation was for.  Rather than being like other entrepreneurs and going into a different venture, Monty started nearly the exact same business he had before, only with no rights to the source code.  Therein lies the problem for Monty and his call for the EC to block the Oracle purchase of Sun.  Doing so would hurt his business.  It would hurt his business because of the license that he put the software under in the first place.  It would hurt other businesses because of the ruse that he managed to play on businesses that didn’t completely understand Open Source licensing, and he took little effort to explain what they were purchasing.

Yes, a purchase of Sun by Oracle could hurt thousands of small and medium enterprises that rely on the dual licensed version of MySQL, but so will the continued downfall of Sun.  If Sun goes out of business it could put the code in legal limbo, leaving no one able to get a license for the code.  Furthermore, it’s hurting the MySQL community right now.  With Sun being in limbo right now, I’m sure that they’re not winning many new customers; businesses hate uncertainty, and there isn’t much greater uncertainty than wondering if your supplier is going to be independent or not in the next few months.  This whole thing smacks of selfishness and I truly hope that the EC sees through this ruse and all the letters they get from the internet fanboys.

If that doesn’t work, then Michael Meeks has helpfully pointed out that Sun provides an LGPL connector to MySQL, but no one has actually requested the source code for it yet.  Just sayin’…

Wouldn’t it be great if there was a StackOverflow for that?

Since I started on my “seekrit project” I’ve asked a few questions on programming related topics at StackOverflow. Thus far I’ve found the answers to be really helpful. While most people know about StackOverflow, lots of folks don’t realize that there are three other sites in the StackOverflow empire: ServerFault – for system administration issues, SuperUser – for end user software issues, and Meta.StackOverflow – for site related questions about StackOverflow.

The Four Horsemen of the StackOverflow Empire


Today during my run in Central Park, I was thinking about the community that StackOverflow has. It’s really a remarkable thing, there are thousands of users and most posts in even the most obscure topics get lots of views quickly. If you’re lucky, you’ll even get a good answer or two — or, more likely, you’ll have someone who is smart tell you that you’re not asking the right question, and help you down the right path to rethink your problem in the correct way. This sort of input can be really helpful in many other domains, parenting, cooking, travel, etc.

It’s actually possible to run your own StackOverflow site. The software, StackExchange is available from FogCreek software who offer different hosted plans. At $129/mo for the most basic plan it’s not cheap at all, but when you consider that includes everything, it’s not a horrible idea either. It’s been a pretty decent hit so far too, the StackExchange website lists numerous sites around specific topics.

So here’s the sites that I’d really like to see:

  • NYCLikeALocal.com: Questions and answers about the self-proclaimed greatest city in the world. With a focus for people actually living in the city who may not be overly familiar with the city. In other words, folks like me who live in NYC, but aren’t native.
  • RunThroughTheWall.com: Questions and answers about training for marathons and other long distance endurance sports, like IronMen triathlons.
  • SnackExchange.com: Ideas for cooking, specifically desserts. Everyone loves desserts.

After browsing through the sites, I found out that in each of the cases someone had already beat me to it. For NYC local information there is PojoCity.com. For running someone created FitBulb which covers all sports. Unfortunately, they weren’t on the ball and let a squatter get FitBulb.com. For cooking there is cooking.stackexchange.com, which is probably the least branded StackExchange site in existence.

While I’ve contributed to FitBulb, most of the sites are rather sparse. This could be for a variety of reasons, it’s likely that they haven’t advertised themselves enough. StackOverflow launched and was hugely successful because Jeff Atwood and Joel Spolsky both had legions of followers on their weblogs. When the site was launched there was already a huge pool of users. By mentioning them here I’m hoping that other people will use these three StackExchange sites and help build mass.

The second reason why the sites may have difficulty in excelling is because they’re not generally perceived as a need. For programming there was a definite need for a site where people could ask questions and get reasonable answers. StackOverflow was aimed dead center at destroying ExpertsExchange by providing better answers, more information about people who provide answers, and not seeming all scammy. From the looks of it, they’re succeeding.

Stackoverflow will soon kill Experts-Exchange.  And the world will be a better place.


However, most of these other domains don’t have any large community at all, so it’s up to the site developers and maintainers to build one. This is no easy task. I’d like to think that I’m pretty savvy, but I haven’t seen much about communities for any of these topics.

Finally, there’s a small problem with the architecture of StackExchange. I would like to vote up some posts and answers on FitBulb, but I can’t do that until I get 15 karma. Unfortunately, when no one has any karma on the site, it’s difficult to build enough karma. So, if the site admins hear me, vote me up some so I can support other people on your fledgling site.

In any case, it’s exciting to see that there are StackExchange sites for the three sites that I really hoped would exist. I’ve already posted a bit to FitBulb and have found some useful stuff on Pojocity, but remember, community sites are only as good as the people who use them. So visit the sites and give them some love today.

Textile Markup in Trac

For a new secret project I’ve been putting together an installation of Trac to provide a web interface for task management. I found myself wanting to create a nice looking table within a Trac wiki page to document different dependencies needed by the software. In a normal installation there are three different ways to do this:

Trac’s Wiki Formatting

This is probably the simplest option, however it’s also the most limiting. There is no way to provide additional styling to elements and you can’t create cells that span multiple rows or columns.  The lack of additional styling is particularly difficult because you can’t correct it easily with CSS without modifying all of the tables.

|| '''Client Side Packages''' ||
|| '''Package''' || '''License''' || '''Purpose''' ||
|| [http://jquery.com/ jQuery] || [http://docs.jquery.com/Licensing MIT or GPL] (choice) || General Purpose javascript ||
|| [http://raphaeljs.com/ Raphaël] || [http://raphaeljs.com MIT] (see sidebar) || Vector graphics library ||
|| '''Server Side Packages''' ||
|| [http://www.djangoproject.com/ Django] || [http://code.djangoproject.com/browser/django/trunk/LICENSE BSD] || Server framework ||
|| [http://pinaxproject.com/ Pinax] || [http://github.com/pinax/pinax/blob/master/LICENSE MIT] || Server social components ||
|| [http://www.sqlite.org/ SQLite] || [http://www.sqlite.org/copyright.html Public Domain] || Server database ||
Table formatted using Trac Wiki Processing


HTML

HTML is of course, the go to for all of this.  However, the whole reason I started looking at this is that I didn’t want to embed a huge chunk of HTML into my wiki page.  I know I can do all the formatting I want, but this really isn’t an option.

ReStructured Text

ReStructured Text is the preferred markup language of Python documentation.  It’s more robust than Trac Wiki formatting, especially for tables.  The processor for ReStructured Text in Trac is smart enough to tag tables with the class ‘docutils’ which helps out immensely when trying to style the documents.  This allows for the creation of some prettier tables.  It also allows the creation of header columns and multi-column/row cells.  However, it’s not great for this sort of general purpose document, for a few reasons.  First, it doesn’t allow for a header row to appear in the middle of the table, and I’d really like the client and server packages to be part of the same table.  Secondly, you can’t have multiple short links with the same name.  So, I can’t have the name ‘MIT’ link to two different pages without doing implicit links, which embed the URL inline.  However, this introduces a third problem, which is that table columns are proportional in width to their RST source text, so the long inline URLs make the columns very wide.  Here’s the sample table structured in RST:

+--------------+----------------------------------------------------------------------+----------------------------+
| Client Side Packages                                                                                             |
+--------------+----------------------------------------------------------------------+----------------------------+
| Package      | License                                                              | Purpose                    |
+==============+======================================================================+============================+
| jQuery_      | `MIT or GPL <http://docs.jquery.com/Licensing>`__ (choice)           | General JavaScript library |
+--------------+----------------------------------------------------------------------+----------------------------+
| Raphaël_     | `MIT <http://raphaeljs.com/>`__ (see sidebar)                        | SVG/VML Library            |
+--------------+----------------------------------------------------------------------+----------------------------+
 
|
 
+--------------+----------------------------------------------------------------------+---------------------------+
| Server Side Packages                                                                                            |
+--------------+----------------------------------------------------------------------+---------------------------+
| Package      | License                                                              | Purpose                   |
+==============+======================================================================+===========================+
| Django_      | `BSD <http://code.djangoproject.com/browser/django/trunk/LICENSE>`__ | Server framework          |
+--------------+----------------------------------------------------------------------+---------------------------+
| Pinax_       | `MIT <http://github.com/pinax/pinax/blob/master/LICENSE>`__          | Server social components  |
+--------------+----------------------------------------------------------------------+---------------------------+
| SQLite_      | `Public Domain <http://www.sqlite.org/copyright.html>`__             | Server database           |
+--------------+----------------------------------------------------------------------+---------------------------+ 
 
.. _jQuery: http://jquery.com/
.. _Raphaël: http://raphaeljs.com/
.. _Django: http://www.djangoproject.com/
.. _Pinax: http://pinaxproject.com/
.. _SQLite: http://www.sqlite.org/

Unfortunately, this results in tables that are unnaturally wide and must be split into two different tables.  While the result is passable, especially after throwing down some CSS to help out and collapse those ugly borders, the width of the tables makes them look highly awkward, and therefore unacceptable.
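To see why the inline URLs hurt so much, it helps to work out how much of the table each column would claim. The sketch below assumes that the rendered column widths are simply proportional to the number of characters between the + markers in a grid table border, which matches the behaviour described above; the function name column_shares is made up, and the exact rounding used by the RST processor may differ.

```python
def column_shares(border):
    """Return each column's share of the table width, in percent.

    `border` is one horizontal border line of an RST grid table,
    such as "+----+----------+------+".
    """
    # Drop the outer "+" markers, then measure each run of dashes.
    segments = border.strip().strip("+").split("+")
    widths = [len(segment) for segment in segments]
    total = sum(widths)
    return [round(100 * width / total) for width in widths]


if __name__ == "__main__":
    # A short border with columns of 4, 10, and 6 characters.
    print(column_shares("+----+----------+------+"))
```

Feeding it a border line from the tables above shows the license column, padded out to fit the longest URL, taking up well over half of the total width.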

Table output using ReStructured Text processor


Markdown

Markdown is the hotness markup language that all of the web 3.0 kiddies like to use. And why not?  It’s from John Gruber, macboy extraordinaire!  Markdown is designed to be wholly readable without people needing to know how to speak HTML.  It’s quick and easy to use.  There’s a pretty good markdown processor available for Trac.  However, there are a few shortcomings.  First, by default Markdown has no support for tables, so you’ll need to drop back to HTML for the formatting.  This is less than ideal as it’s messy again.  Furthermore you cannot use Markdown inside of an HTML table.

Fortunately, there are some extensions to Markdown that provide support for tables.  As is normal for this sort of community, it’s been further extended to allow for all sorts of crazy formatting and alignment issues.  However, as near as I can tell it still does not allow cells to span multiple columns/rows.  Or, if it does, the Markdown plugin for Trac doesn’t support it.  I didn’t really get far enough to generate anything really interesting with Markdown.

Textile

In the world of Web 2.0 fanboys, the other competing markup, favored by the Ruby guys and folks at GitHub, is Textile.  It’s slightly similar to Markdown, in that it aims to be fairly readable, but advanced features let you do neat stuff at the expense of some readability.  Fortunately, using Textile I was able to make the table look exactly like I wanted to.  Although Textile allows you to use multiple header rows, I did have to apply some simple CSS styling to the header columns and rows.  Here’s the Textile I used.

table(clean).
|_\3. Client Side Packages                                                 |
(bbot). |_. Package |_. License                |_. Purpose                 |
| "jQuery":jq       | "MIT or GPL":jql (choice)| General purpose JavaScript|
| "Raphaël":ra      | "MIT":ral (see sidebar)  | Vector graphics library   |
(bbot btop). |_\3. Server Side Packages                                    |
| "Django":dj       | "BSD":djl                | Server framework          |
| "Pinax":pi        | "MIT":pil                | Server social components  |
| "SQLite":sq       | "Public Domain":sql      | Server Database           |
 
[jq]http://jquery.com/
[jql]http://docs.jquery.com/Licensing
[ra]http://raphaeljs.com/
[ral]http://raphaeljs.com/
[dj]http://www.djangoproject.com/
[djl]http://code.djangoproject.com/browser/django/trunk/LICENSE
[pi]http://pinaxproject.com/
[pil]http://github.com/pinax/pinax/blob/master/LICENSE
[sq]http://www.sqlite.org/
[sql]http://www.sqlite.org/copyright.html

And here is the actual output from Trac after rendering the Textile (after a little bit of help from some CSS):

Table output using Textile processor


Trac Textile Macro

To get this far I needed to write a simple little Trac processing macro for Textile. Using information from the excellent Markdown Macro I was able to hack together a simple Textile Macro for Trac.  Before installing it you’ll need to install the Python Textile libraries.  Then, you can clone the git repository from git://github.com/pridkett/tractextilemacro.git.  From here you can install it like a normal Trac component.  Enable it in your trac.ini using these lines:

[components]
Textile.* = enabled

Now, just embed your Textile code in a block like this and you’ll be on your way:

{{{
#!Textile
 
Hello from the __magical__ land of "Textile":http://textile.thresholdstate.com/!
}}}

Shutting Down PennAve

Several years ago I created a nifty little photo gallery webapp for F-Spot users called PennAve. PennAve was designed to be incredibly simple in its use: Tag a set of photos in F-Spot as “Public”, copy the database to your web server, and boom, those photos would be displayed. It was quite simple and just worked, well, for the most part.

However, in the design of PennAve I made a couple of critical errors, and since that point the marketplace has changed significantly.  Some of these are issues that I should have been more aware of; others I couldn’t have anticipated.

Issue 1: XML? What was I thinking‽

At what point did I think that code like this would be a good idea?


At the time I was all hyped up on the XML goofballs and believed that making the whole thing styleable using client side XSLT would be an awesome idea.  It was just really neat that I could send an XML document and Firefox could transform it.  Think of the other clients I could use to slurp that data with little effort! In retrospect, trying to do any sort of significant styling using XSLT makes me feel like I’m slowly driving an ice pick into my head. The clean syntax of XML gives way to the disaster of XSLT, templates, variables, and overly verbose syntax.

Issue 2: Strange dependencies and poor documentation.

PennAve required things like CherryPy and SQLObject (for a while it even required a SVN version of SQLObject, scary, eh?).  Folks know how to install a PHP script, they can probably figure out how to run a Django app.  Trying to explain to everyone how to compile mod_wsgi so they could run PennAve in the web server, or use mod_proxy to redirect to CherryPy’s internal web server was a big pain.  I sought to make it really easy to create your galleries, but the installation bordered on insanity for some users.

Issue 3: Slow development of F-Spot.

I don’t use F-Spot that much anymore.  I have most of my collection tagged and edited (more than 800 different tags), so I just pop in when I need to import some new photos.  However, while F-Spot was pretty good four years ago, its development is slow enough that numerous other solutions make it look ancient.  The tools for editing photos are slightly substandard. It doesn’t have face detection. It is sometimes prone to crashing. I’m not trying to blame Stephane Delcroix (the maintainer of F-Spot) for these problems, but that’s the reality of a project that only has a small community.  Perhaps these will change with Ubuntu’s decision to make F-Spot the default instead of GIMP, but that’s a little uncertain.

Issue 4: Cheap online storage

Google recently announced that you could get 20GB of storage for $5/yr.  That’s cheap enough that you can easily manage your entire collection online.  $20 for 80GB of storage without having to worry about bandwidth and server costs is a godsend.  Price reductions at Flickr contribute to this as well.

Issue 5: The Awesomeness of Face Detection

Not even my ugly mug could break Picasa's face detection!


F-Spot has a default category of people where you can create a tag for each person.  However, this merely associates the photo with a person, not an area of a photo with a person.  Using face detection you can find people in all your photos, many times automatically.  For example, I’ve found cases where I had photos of friends from before I became friends with them.  When I put these photos in F-Spot I didn’t tag those photos, but now I’d like to do that.  Face detection takes care of that with no interaction from me.

Issue 6: Geotagging

Once again, F-Spot has a default category of places.  I carefully created a hierarchy of locations so I could quickly find all of my photos taken in Pittsburgh, or Pennsylvania, etc.  However, it’s much better if photos can say exactly where they’re taken, which is where geotagging is helpful.  Photos from my iPhone already are geotagged and the integration of Picasa with Google Earth and Google Maps makes it dreadfully simple to tag all of my other photos.

Independently none of these issues was bad enough to cause me to stop development, but in sum, they’re huge.  I never started PennAve to get hordes of fans, mainly I needed a way to show my own photos and if a few other people might find it useful, that would be great.  I’m pleased that a handful of other people have found PennAve useful, but times change and it’s time to move on.  Therefore, I’m declaring officially that I’m stopping development on PennAve.  I’ll keep the site up for the reasonable future, but I won’t make efforts to update the software or ensure the website stays operational indefinitely.  Thanks for all the help from everyone!

Ohio LinuxFest 2009

This past weekend I left New York City and traveled to Columbus, OH for Ohio LinuxFest 2009 (OLF). Unlike many shows, such as LinuxCon and the now defunct OpenSourceWorld, OLF is entirely community run.  That means you’ll notice a couple of different things: it’s much cheaper, there’s a wider range of attendees, and while many top name speakers attend, you’ll also get a smattering of other folks making their debuts on the conference circuit.  Such was the case for me.  OLF is where I presented the talk “Be A Wonk” that discussed how policy and law get made and what we can do as geeks to influence these issues.

As promised, I’ve posted my slides as the original OpenOffice.org format and also have posted a PDF version of my slides, which is decidedly less sexy.  To make things even easier, here’s a copy of the slides on SlideShare.  At some point I might record some dialog to go along with the slides, but for now this is what you get.

In previous years I may have been a bit too critical about some of the talks at OLF; there would usually be a few periods during the day when I couldn’t find anything good to see.  I’m pleased to say that was never the case this year.  Among an awesome schedule of talks there were standout talks from Paul Cutler about GNOME 3.0 (GNOME Shell could be some new hotness), Jorge Castro about how to fail at building a project, Tom Calloway about licenses for projects, and Mackenzie Morgan about how to handle translating between package management systems.

The keynotes were both excellent.  Shawn Powers did a great job of setting up the conference in the morning, despite having lost his slides on the flight in. Although there was some Microsoft bashing in his talk, there was also plenty of realistic Linux bashing, the subject of which will be a future post.  He addressed some of the failures of Linux and people selling Linux machines (GMA 500 anyone) and also encouraged us to be honest about the faults of Linux, rather than just glossing over them.  The conference closed with Doug McIlroy — the man who invented Unix pipes — describing some of the problems with sophisticated software and how often we just need to approach the problem from a different aspect rather than adding in extra complexity.

I have to say that OLF 2009 was a smashing success and much better than previous years.  This was my fourth time attending and the conference has certainly evolved: the team that plans and runs the conference is much more polished and organized than in years past, and we’re starting to get some great PR from the conference.  There’s a solid pool of commercial sponsors along with numerous community based projects that return year after year. It’s good to see an all volunteer event thriving.

Going forward, there are good things in store for OLF.  OLF has already put itself in a position as one of the most gender and ethnic friendly conferences in Open Source, thanks to their commitment to get women and other minorities represented.  I’m pleased that many of the key organizers were women as were some of the great speakers.  After the main conference they also held a workshop on diversity in Open Source — which addressed many topics regarding participation in OSS, one of which is why only 2% of OSS developers are female, but nearly 50% of professional software developers are female.

If you can make it to Columbus on the last weekend of September 2010, I highly recommend attending OLF 2010, as I’m certain it will be another great community organized conference.

Be A Wonk on SlideShare: http://www.slideshare.net/pridkett/be-a-wonk

Antifeatures in Free Software

The Free Software Foundation has, rightfully in most cases, alerted computer and technology users to the problem of “antifeatures”. Briefly, an antifeature is when a program does something to offer a lesser experience that took significant effort to accomplish. Oftentimes antifeatures are used as differentiating elements between different versions of a product. When you buy the lesser version everything is still there; it’s just that there is a small amount of code that disables part of the program. Code that had to be written by a software developer and cost money to implement.

Common examples of antifeatures are software packages that arbitrarily limit the number of remote connections (e.g. Windows Server and Apple Remote Desktop) and inexpensive digital cameras that automatically process RAW image data to JPEG. The latter is an example from the Free Software Foundation that may not hold up because of the cost in allowing a user to access the image processing pipeline in the middle, although the CHDK hackers have made good headway.
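To make the remote connection example concrete, here is a minimal sketch of what such an antifeature might look like in code. Everything in it is hypothetical: the `RemoteDesktopHost` class, the edition names, and the limits are made up for illustration and are not taken from any real product.

```python
# Hypothetical per-edition session limits; None means "no artificial limit".
MAX_REMOTE_SESSIONS = {"home": 1, "server": None}


class RemoteDesktopHost:
    """Toy remote desktop host whose capability is identical in every edition."""

    def __init__(self, edition):
        self.edition = edition
        self.sessions = []

    def connect(self, user):
        limit = MAX_REMOTE_SESSIONS[self.edition]
        # The antifeature: nothing technical prevents another session, but
        # someone wrote this check so the cheaper edition refuses it anyway.
        if limit is not None and len(self.sessions) >= limit:
            raise PermissionError(
                f"the {self.edition} edition allows only {limit} remote session(s)"
            )
        self.sessions.append(user)
        return len(self.sessions)
```

Deleting the three lines of the check would give the cheaper edition the full functionality, which is exactly what makes it an antifeature rather than a missing feature.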

With the Free Software Foundation so adamantly anti-antifeature, you can imagine my surprise when I discovered GNU IceCat. IceCat is, first of all, a duplication of effort of Debian’s IceWeasel software: both are versions of Mozilla Firefox that lack the registered trademark of the Firefox branding, something that requires a usage agreement (note: IceCat originated before IceWeasel and originally was named IceWeasel, however the name was ceded to Debian when the latter became more dominant and higher profile). To be fair, Mozilla has little choice in this matter. You’re legally obligated to defend a registered trademark or you can lose it, a law that is fundamentally at odds with Free software. However, what is surprising about IceCat is that rather than providing the full access to the plugin ecosystem for Firefox, IceCat has chosen to provide access to only those plugins that are also Free software.

Mozilla Firefox vs. GNU IceCat vs. Debian IceWeasel
It’s like there’s a nerd fight and no one cares!

This is clearly an antifeature. Somewhere out there in Internetland a programmer spent time to code a feature in IceCat that restricts the ability of a user to download all the plugins for Firefox. While this is very much in line with the Free Software Foundation’s stance on having all software be Free, it’s not good from a user choice standpoint, and is clearly an antifeature that is wholly different than the primary reason for the project — avoiding trademark issues with the Firefox branding.

I’m not attempting to say that IceCat is a bad project — I don’t use it, but I can understand why it is there — it serves as a bit of protection should something crazy go on with the trademark agreement for Firefox. Furthermore, the Free Software Foundation is perfectly free to create a parallel plugin ecosystem, but they should be entirely clear about it — in this case Free software isn’t about choice or freedom from licensing agreements, it’s about their ideology and how they can create a tool to enforce their worldview.

How to Handle a Bug

It’s no secret that almost all software has bugs. Even if you are lucky enough to understand how to formally verify a program, odds are that it won’t work for your program. It’s just far too difficult. Rather than eliminate all bugs, which is next to impossible to do, software engineers have sought ways to minimize the number of bugs present in a system and ensure timely responses when bugs are discovered in the field.

For a software engineer this typically means a suite of test cases. Each one is a carefully constructed situation that can be programmatically reproduced. When a defect is found in the system, a test case is written that exercises the defect. Before future versions of the software are released the test cases must all pass. In rare cases you’ll find a team using full-on test-driven development where test cases are written before code is written for new functionality, but the basic process is the same.
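As a rough illustration of that process, here is a minimal sketch in Python. The function `parse_price` and the defect it once had (choking on thousands separators) are invented for the example; the point is only the shape of the workflow, where the second test is written when the bug is reported and then stays in the suite forever.

```python
def parse_price(text):
    """Hypothetical function under test: turn '$1,234.50' into a float."""
    # The fix for the made-up defect: strip thousands separators before parsing.
    return float(text.replace("$", "").replace(",", ""))


def test_plain_price():
    # Existing behavior that already worked before the bug report.
    assert parse_price("$5.00") == 5.0


def test_thousands_separator():
    # Written when the defect was reported. It failed until the fix landed,
    # and now guards every future release against the same bug.
    assert parse_price("$1,234.50") == 1234.5


if __name__ == "__main__":
    test_plain_price()
    test_thousands_separator()
    print("all regression tests passed")
```

A test runner such as pytest would pick up the `test_` functions automatically; the `__main__` block is only there so the sketch runs on its own.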

From the end user perspective, this usually means patches for the software. In Windows these are often delivered via Windows Update or Office Update, although a myriad of programs have other more annoying ways to tell you that they need to be upgraded. In Mac OS X there is the software update feature that checks for major updates. Within Linux there is apt, or another equivalent tool that checks for updates on almost all software on the system. Why Mac OS X and Windows only check for updates on apps provided by the OS manufacturer and not for the whole system, as Linux does, could be the subject of another whole article.

What happens when a bug slips through the cracks? What happens when a bug is known, has been reported, is routinely experienced by hundreds or thousands of individuals, but doesn’t get patched through these methods? The result is an exercise in frustration that will leave many users searching for a new solution.

I’m going to illustrate this problem with a bug that I recently encountered. The time was around 1am on the Monday morning of Labor Day weekend. Papers were due at 6am for a major international conference, and I was wrapping up final revisions so I could go to bed. We were using Microsoft Word 2007 to create an academic paper that had many equations present. Submissions to the conference were required to be in PDF form, for which I typically use PrimoPDF in Windows, although I realize that Office 2007 has native PDF support. I clicked through to have PrimoPDF create a PDF and this is what I saw:

Those white spaces are where my equations used to be.

That’s interesting, where did all of my equations go? I’m aware that Word sometimes has strange settings that change the actual print output by removing things such as graphics, so I looked at the print preview and double checked all my settings. Everything looked normal; the equations showed up in the print preview. I proceeded to try two or three other PDF engines but had no luck in the process. It appeared that my document did not want to retain its equations.

I started to panic with thoughts of having to convert the paper to a different format at this late stage of the game. I googled around and found a variety of forum posts that provided suggestions. The first was simple enough: just use the native PDF support in MS Office 2007. I clicked through and I had a PDF of my document, but the quality was less than stellar, especially when viewed on a Mac.

It seems like those curves are made up of straight lines, almost as though a kindergarten student cut them out of construction paper.
Windows is on the left, Mac is on the right.  Here I thought that PDF was supposed to look the same everywhere! Silly me.

The problem results from the fact that Office 2007 won’t embed all of the fonts into the PDF document, rather it will try to provide alternative options or convert the fonts to a bitmap format if it can’t. Why won’t it embed all the fonts? Concerns about licensing — basically unless Windows can verify it has a license to redistribute the font in the PDF file, it won’t embed the font and you’ll be left with something less than awesome.

The next suggestion was to simply save the doc as an Office 2003 document. The solution said that the equations wouldn’t be editable anymore, but that they’d still show up. That was fine as I was done with the paper and just needed a one-off version to submit to the conference. Unfortunately, rather than providing something like Wikipedia, which renders the equations to a high resolution PNG file and then embeds the image in a web page, MS Office converts the equations so they’re already pre-blurred, as if from having drunk too much. Once again, this was not going to work for academic work.

Can you spot the equations?  They look like they were rendered on a Hercules graphics card.

Although those two options “worked”, they were what are termed “workarounds”, and poor ones at that, since both provided decreased functionality. I work very hard to create high quality academic papers that look good, so I expect my tools to perform the tasks they advertised when I paid for them. I wasn’t going to submit something that looked amateurish.

Digging around some more and getting more creative with my Google query, I finally stumbled upon the Microsoft Support Page for my bug.  Apparently, this bug only affects systems that run the combination of Microsoft Office 2007 and Windows XP. If I were to use a system with Windows Vista, I should be fine.  Great, my wife has Vista Enterprise on her Tablet PC, so I copy the document over, and attempt to create a PDF of the document.  The equations look just fine, but wait, now it’s not printing half of my images.  Ughh.

I swear I designed both the left and right sides of this figure

Finally, I attempt the actual fix for Windows XP.  Apparently the problem is somehow tied up between Windows XP, Office 2007, and complex script and right-to-left language support (for languages like Thai and Arabic). The knowledge base article provides helpful instructions on how to install support for those languages.  After following the instructions you’ll get the warning:

You chose to install the Arabic, Armenian, Georgian, Hebrew, Indic, Thai and Vietnamese language files. This will require 10 MB or more of available disk space. The files will be installed after you click OK or Apply on the Regional and Language Options dialog box.

Oh no!  10 MB of disk space?  On my 7mbit DSL line that will take almost 20 seconds to download.  Not to mention it will take up such a small portion of the 500GB drive on my laptop that I’m not even going to worry about it.  But then the dreaded message pops up:

OMG!  Are you serious‽‽

Seriously, WTF? There is no option to download the patch from the Internet.  I see no way that such a small component could be considered a marketable piece of IP, as both Mac OS X and Linux have far superior support for multiple languages and there are no other operating systems that would want to steal this feature.  This left me with the realization that to fix my problem I was going to need to run around my still largely packed apartment in Minneapolis to find a CD that I hadn’t used in years. A CD to fix a bug that has been known about for a long time and is the result of strange interaction between two components, both of which have automatic update features. Did I mention it was now 2:30am?  I shudder to think of my frustration if I had been either 1) at my place in New York, knowing that the CD was in Minneapolis or 2) working on a machine without an optical drive and with no external drive around to install the upgrade.  Of course, at that point I probably wouldn’t have worried that much; instead I’d probably download a copy of the CD off the seedier side of the intarwebs.

I’m using this as an example of how not to handle a bug in multiple different ways.  Firstly, the bug seems common enough that MS should have had a test case for it.  It should have been caught in the development process and feature check for the equation editor.  Equations and graphics often go together on the same page, and it’s not just Ph.D. researchers who need them; I remember using equations and graphs on the same page when using Word 6.0 for Windows 3.1 to type up lab reports for my high school biochemistry class.  It’s great that MS sells a version of Office for Students and Teachers which has this wonderful bug; at least it gets them to set their software expectations low from the beginning.  Now, a test case may not have caught it before it was released, but I know that the bug has been around long enough that a test case could have been created for the numerous service packs for Office and Windows that have been released since the bug was first found.  Most glaring is the fact that the bug wasn’t even fixed with Windows Vista; it just manifests itself in a new and creative way.  Test driven development may not have caught this, but I can’t help but wonder if it’s part of the build process at Microsoft.

Second, reading the support document on Microsoft’s site makes it sound a little like the Office and Windows teams were left pointing fingers at each other.  The end user doesn’t care what component is broken, they just want it fixed.  They don’t care if it requires an extra 10MB of disk space. They just want their program to function as advertised.  While I’m fine with working random command lines and installing random components, most end users don’t like this at all.  This is a process that should have been fixed by a service patch for either Office 2007 or Windows XP.  Unfortunately, neither team decided it was worth their time and the end user is left to pay the price.

Finally, the solution of requiring the CD reminds me a little too much of 2001 (yes, I’m aware that XP is from 2001, but it still is the best OS from MS until Windows 7 is officially released in a few weeks).  Most computers have gone mobile since then and many have lost the CD drive in an effort to reduce size and power consumption.  Automatic update solutions have made users comfortable with pulling updates over the Internet, so why does this still require a CD?  This is a component that has no independent commercial value, and an update to Windows XP could have just as easily forced support on everyone (which would also eliminate a lot of those boxes you see on web pages written in right-to-left languages), or at the very least made it a download.

The biggest problem that I see here is that the bug was a complicated one.  I shudder to think of the tester who was responsible for finally figuring out that it was the third component, right-to-left language support, that caused the problems. However, the solution was needlessly complicated and filled with half-solutions that caused just as many problems as they fixed. As software engineers we can do better. Integrated update systems are a nice start — within APT it is possible to say that an update to an application requires an additional update to a different application.  For example, an update to Open Office could signify that it requires an updated version of iconv — a library for translating character sets.  When you update Open Office the new version of iconv would come along with it.  See, it’s not that hard, and it’s something that Linux has been doing right since 1995.
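The idea of one update dragging along another can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how APT actually resolves dependencies: the `AVAILABLE` index, the version numbers, and the `plan_upgrade` function are all made up for illustration.

```python
# Toy package index: name -> (available version, {dependency: minimum version}).
# Package names and version numbers are invented for this example.
AVAILABLE = {
    "openoffice": ((3, 1), {"iconv": (1, 13)}),
    "iconv": ((1, 13), {}),
}


def plan_upgrade(package, installed, available, seen=None):
    """Return the ordered (name, version) installs needed to upgrade package."""
    seen = set() if seen is None else seen
    if package in seen:
        # Already planned (or a dependency cycle); do not visit it twice.
        return []
    seen.add(package)
    version, depends = available[package]
    plan = []
    for dep, minimum in depends.items():
        # Pull in a dependency only when the installed copy is too old or absent.
        if installed.get(dep, (0,)) < minimum:
            plan.extend(plan_upgrade(dep, installed, available, seen))
    # Dependencies come first, so the package itself goes last.
    plan.append((package, version))
    return plan


if __name__ == "__main__":
    installed = {"openoffice": (3, 0), "iconv": (1, 12)}
    print(plan_upgrade("openoffice", installed, AVAILABLE))
    # [('iconv', (1, 13)), ('openoffice', (3, 1))]
```

Asking for the new Open Office yields a plan that installs the newer iconv first, with no CD hunting required. A real package manager also has to handle conflicts, alternatives, and version ranges, which this sketch ignores.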