ACM SIGS Style for Zotero

Academics live and die based on references. A variety of tools exist to manage references, from the sadly ubiquitous EndNote, to manually curated BibTeX files, to simply typing entries by hand for each paper. A variety of web-aware citation management tools are also available, for example Mendeley and Zotero.

For the past few years I’ve been using Zotero and have found it to be wonderful (with a few slight exceptions). It lives inside of Firefox and provides one click functionality to add a reference to my database and synchronize the change across computers automagically. It has good plugins for Microsoft Word and OpenOffice to provide citation management on a level that is similar to what one gets on EndNote. It also supports BibTeX export (with some slight key naming issues).

Unfortunately, almost every journal and conference wants slightly different formatting for their references. Zotero can handle this through the use of style files written in the Citation Style Language (CSL). I noticed that my submission to ICSE 2010 was dinged because I had the citations in the wrong format. I had mistakenly thought they used SIGCHI reference formatting, when in fact they use ACM SIGS reference formatting. Sadly, Zotero didn’t have a style for formatting ACM SIGS references, until now. While finishing up my paper on supporting stakeholders in enterprise software projects for the CHASE 2010 workshop, I decided it would be easier to bite the bullet and just write my own style that fits the specification. I don’t claim that the style is complete, but it seems to work well enough. You can find the style hosted as a GitHub gist, and I’ve also embedded the file below. If you’re using Zotero, download the raw file and drag it into Zotero; the style will be installed and you’ll see ACM SIGS as an option for reference formatting. Feel free to fork it and improve the formatting. In case the embed does not show up, please visit http://gist.github.com/320873 instead.

Dell Inspiron Zino HD and Failure

Around work and with friends I’ve obtained a reputation as the go-to guy for computer hardware, particularly home theater and media center computers. I was really excited about the new Dell Inspiron Zino HD, a tiny little box that can pack a dual core Athlon processor, a couple of gigs of RAM, and a decent enough video card to play some games. It starts around $299, and you can get a nicely loaded one for $500. Sounds like a great deal.

A couple of weeks ago I placed an order for an Inspiron Zino HD for my wife and our apartment in Minneapolis. I ordered minor upgrades to make it a bit beefier and faster:

  • AMD Athlon x2 6850
  • 3GB RAM
  • ATI Radeon HD 4330/512MB RAM video card
  • 320GB hard disk
  • Dell 1520 802.11 b/g/n wireless card

None of these require substantial modifications to the device.  Apparently adding in the video card requires a swap from the standard motherboard to one with an MXM video card slot, but that’s it.

Inspiron Zino HD with Blue Lid

My Former Future HTPC

The order was placed on January 31st with an expected delivery date of February 15th thanks to some extra money I paid for expedited shipping.  This was just in time for my monthly visit to Minneapolis where I could set the machine up with Windows 7 Media Center and put it on the wireless network so my wife wouldn’t need to worry about a thing. To surprise her I also ordered her an HD HomeRun and a nice Windows Media Center remote from Amazon. Instead of using Hulu for everything, she would be able to record everything and not have to wait until the next day to watch shows. This is a godsend to our relationship where otherwise I know that I can’t call her between 8pm and 10pm EST on Thursdays because of “30 Rock” and “The Office”.

Toward the middle of last week I started to get concerned. My order was still listed as “In Production” on Thursday, the day the machine would have had to ship to make its delivery date. As expected, on Friday the website was updated to say that the order was delayed. On Monday afternoon, more than 72 hours after the order was marked as delayed on the website, I finally got an email telling me the order was delayed. Apparently the tubes are really clogged at Dell.

Yesterday and today I took some time and investigated how many other people had delays and what my other options were. Some people were advocating the Dell Studio Slim; it looks promising, but it doesn’t ship until the end of March. Wow. Looking around more I found numerous tales of people who ordered their systems as far back as November and still haven’t received them.

Apparently Dell was insistent on giving 1980s order-from-TV commercials a good name by shipping products not in 6-8 weeks, but in 12+ weeks. In some cases they were actually in violation of FTC standards for how long it was taking to ship products. Awesome. I knew I had to do something to stop the pain now.

At first I called Dell just to try and get a refund for my expedited shipping. After about ten minutes I finally reached a human in India who asked me what my problem was. I stated very clearly that my order was delayed and therefore I should get a refund for the difference in shipping costs. I was transferred and put on hold for another 10 minutes until I spoke to a nice lady named Priyanka. Unfortunately, she didn’t understand why this would be a problem and said that they could not change my shipping option now that my system was “in production”. I told her it had been “in production” for the last two and a half weeks, so I was sure she could. No dice.

I attempted to figure out what the issue was. Rumors on the webternets indicate that there is a lack of black lids for this little guy. Priyanka didn’t know about that. I asked if she could change my order to a red lid and if things might ship faster; she said that I’d have to pay the difference in prices. I explained that if I didn’t get the red lid I’d probably just cancel the order, so either they eat the couple of bucks difference in plastic prices, or lose everything from the order.

As you can imagine, Dell chose the latter. Even though my order was “in production” she still allowed me to cancel it. I asked if I would get an email about it, but she had no idea what I meant. Finally she said that my order status page would be updated. At the end of the call she asked for my email. I told her that they had it. She said they wanted to update it. I told her my email was firstname at lastname dot net. She didn’t get it. So I spelled it out: p-a-t-r-i-c-k-@-w-a-g-s-t-r-o-m-.-n-e-t. She explained that my email address should end in hotmail.com or gmail.com. At this point I wanted to cry. I asked what they had on file; she read off patrick@wagstrom.net and insisted that it could not be a valid email address. Just shoot me. I thanked her for her time, informed her that the information was correct, and hung up.

Five minutes later I had placed an order for a larger, faster, slightly noisier machine from NewEgg. For $200 in savings and not having to deal with Dell, I can deal with a little bit of noise. Unfortunately, the full review won’t arrive until the end of March when I’m back in Minneapolis.

What Qualifies as an Open Source Project?

I spent the last few days at the FOSS2010 Workshop in Irvine, CA. It’s an interesting little workshop that brought together some of the best minds in Open Source research from academia, industry, and practice. The goal was to develop a roadmap for the future of Open Source research.

One issue that constantly comes up is how many Open Source projects exist. People most often point to SourceForge as an indicator of success, with its claim of 230,000 projects and more than 2 million users. However, how many of those are really projects?

First, let’s look at activity on SourceForge. It’s about halfway through the day on Saturday, and according to the home page here’s a summary of today’s activity:

Activity as of Noon PST on a Saturday.

If there are actually 230,000 active projects this amount is laughable. I’ll be generous and say that bug tracking activity is lower on a Saturday, so if we assume that the 12 hours of data represent 1/1,000 of yearly activity, we’re getting fewer than three bugs per project per year.

I polled some of the other folks at the workshop for a definition of a project and I think we’ve gotten a pretty good idea. First, there is a selection of general project process attributes:

  • Code available under an Open Source license
  • Publicly accessible web page, bug trackers, mailing lists, and version control
  • Governance model that allows outside contribution

The license is actually the only item that is necessary to call a project Open Source, while publicly accessible collaboration resources and a governance model that allows for participation are conditions that are required for any sort of community to emerge.  However, under these terms it still is possible to have a project that is run by a single person and isn’t really open.  I can still create a project, release it under an Open Source license, register it on SourceForge, and say I accept contributions, but it won’t really be Open Source in most people’s definition of the words.

After talking to various other quite intelligent people within the community I’d like to propose the following criteria for calling a project truly Open Source:

  • At least three different contributors
  • Code that can be compiled and run by someone with only moderate skill in the field and without significant external resources
  • A social process that not only allows contribution, but actively mentors new developers
  • Active participation by individuals who write no code

I’ll go over each of these criteria and explain why they’re necessary for a first cut of determining whether or not a project is truly Open Source.

One of the key elements of Open Source development is that it is a collaborative effort by multiple developers. Advantages include drawing on the expertise of each of the different developers and having many developers looking over the code to identify bugs. If a project is just one or two people, it is unlikely that there is enough overlap of individuals examining code. Furthermore, it’s unlikely that the strategy process for designing and developing new features needs to be collaborative at all. Fortunately, the number of contributors is something that can be easily ascertained through automated methods when mining software communities.
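As a rough illustration of that kind of automated check, here is a minimal Python sketch that counts distinct contributors from a list of commit author emails. The function names and the sample data are made up for illustration; in practice the input might come from something like `git log --format=%ae`, and real mining would also need to merge multiple addresses belonging to the same person.

```python
def count_contributors(author_emails):
    """Return the number of distinct authors, ignoring case and surrounding whitespace."""
    return len({email.strip().lower() for email in author_emails if email.strip()})


def has_enough_contributors(author_emails, minimum=3):
    """Check the 'at least three different contributors' criterion."""
    return count_contributors(author_emails) >= minimum


# Hypothetical sample: one author email per commit.
sample_log = [
    "alice@example.org",
    "bob@example.org",
    "Alice@example.org",  # same person as the first entry, different capitalization
    "carol@example.org",
]

print(count_contributors(sample_log))
print(has_enough_contributors(sample_log))
```

A count like this only covers the first criterion; the mentoring and non-code participation criteria are much harder to measure automatically.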

The next requirement is that the project actually be able to compile by people of moderate skill in the field. This is important because not everyone in the field of interest is going to be an expert. If you require everyone to be an expert then the community will have trouble growing. Requiring expensive libraries or exotic compilers has a similar effect and dramatically limits the pool of individuals. It also may signify that the community really isn’t interested in harnessing the resources of an open community. This raises some interesting questions about existing projects. For example, VISTA is a widely used system for managing hospital infrastructure from the Veterans Administration of the United States, but it is written in MUMPS, a language that few people know and with even fewer compilers (I’m aware that in the strictest sense VISTA isn’t Open Source anyway, as the core is public domain, but we’ll ignore that for now). How does this compare to open source code designed to run on Oracle’s high end databases? I don’t have a clear answer, but when weighing projects it’s something to be aware of.

Next up is a social process that mentors new developers.  Sadly this is one place where many Open Source projects end up falling flat.  For example, GLIBC under the iron hand of Ulrich Drepper fails this category.  When dealing with potential future developers, it’s necessary to actively ensure they learn the process.  This includes simple things such as helping developers understand the process roadmap, explaining why bugs are rejected or marked as dupes, and responding to users on mailing lists in a gentle manner.  Now clearly, at some point projects need to put some of the impetus for learning on the potential community members or we’ll end up with hundreds of Bowie J. Poags, but I’d imagine for every Bowie you’ll get a couple of genuinely helpful developers.

The final issue is obtaining contributions from individuals who don’t write code. For a long time there has been a perception that the code is all that really matters. This was reinforced through some early writing such as The Cathedral and the Bazaar and misinterpretations of Lessig’s Code. The most successful projects all have numerous individuals who write no code; instead they write documentation, create art, support users on the mailing lists and chat channels, and act as general promoters of the software across all media. The fact of the matter is that although I know many skilled coders who may excel in some of these additional areas, they can’t do it all alone. Without these other individuals, you don’t really have an Open Source project.

I don’t claim that these criteria are perfect for identifying “true” Open Source projects, but they do help to ensure that projects at least have some of the key elements of what is generally considered to be Open Source and have a chance of being successful. I also don’t see this as competing with something like Tony Wasserman’s Open BRR, rather, for individuals researching Open Source they should be a guideline for identifying projects that will be interesting and exhibit the behaviors that make Open Source interesting to study in the first place.

Command Line Updating Pages on Google Sites

About eighteen months ago I migrated my academic web pages away from a self hosted solution on a Linux box in my living room to Google Sites. Mainly this was done because I was applying for jobs and wanted to make sure that the site would be reliable. But although I came for the reliability, I stayed for the features.

It’s true that Google Sites is somewhat limiting in what you can do. You can’t do fun stuff with jQuery and highly customized CSS is verboten. It’s not going to work for someone who needs to share a design portfolio. However, for an academic it works really well. Basically, I need a set of pages about my research, copies of my papers and presentations, and various forms of my résumé and CV. These are all typically boring pages that can be created with some simple HTML. Google Sites manages that and even helps them look good too.

At that time I also realized that I needed to be a bit more flexible in how I handled my résumé and CV. Up until this point I had a highly customized LaTeX file that generated a very pretty PDF. The beauty was only skin deep. Underneath the PDF it was ugly and difficult to maintain, and if I wanted to paste portions of the document into an email or someone requested a Word document version, I was out of luck.

At the time I still hadn’t gotten my head on straight regarding how XML should never be used by humans, so I chose the XML Résumé Library, an Open Source package that hasn’t been updated in a couple of years. The library consists of an XML DTD that defines elements of a résumé and a set of XSLT files that translate your résumé into various formats, including text, HTML, and FO. I can then use the FO files to generate PDF, DOCX, and ODT files. Simple enough. Now I have a single source document with a Makefile that compiles the file into both my résumé and CV.

The problem is that I provide each document in five different formats, which means that I needed to upload 10 documents every time that I changed something. This was not ideal at all.

Luckily, Google is in the process of making open APIs for all of their tools and last September they finally released the Google Sites API. It still isn’t 100% complete, but as of the 2.0.7 release of the Python libraries it is finally suitable for updating documents.

I whipped up a simple little Python script that uploads files from the command line to Google Sites. It only works with documents that have previously been uploaded by hand, so in that sense, it only updates documents. You can find sites_uploader.py as a GitHub gist or it should be embedded below.

The script itself has only been tested on apps for domains and has a couple of mandatory options:

  • -s/--site: The name of the site to update. This isn’t the URL, but the name in your admin panel for the site.
  • -d/--domain: The domain name of your apps for domain setup. I’m not certain what happens if you don’t include this because all of my sites are hosted through Apps for Domains.
  • -u/--user: The username to use for accessing the Google Sites API.
  • -p/--pass: The password for the user account. The sites API provides multiple different authentication methods. For my own convenience I have my Makefile prompt me for a password with Zenity then invoke the script. I’m on a laptop which means the chance of someone else seeing my password in the process list is pretty slim.
  • ENTRY_ID: each document on your site has an entry_id that doesn’t change with updates. Think of it like a UUID.
  • NEW_DOCUMENT: the filename of the new document to store on Google sites.
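To make the shape of that command line concrete, here is a minimal sketch of how options like these could be parsed with Python’s argparse. This is a hypothetical reconstruction, not the code from the actual script: the function names, the help strings, the meaning given to --ssl, and the choice to make the two positional arguments optional when --list is used are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options described above (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(prog="sites_uploader.py")
    parser.add_argument("-s", "--site", required=True, help="site name from the admin panel")
    parser.add_argument("-d", "--domain", required=True, help="Apps for Domains domain name")
    parser.add_argument("-u", "--user", required=True, help="account used for the Sites API")
    # "pass" is a Python keyword, so the value is stored under a different attribute name.
    parser.add_argument("-p", "--pass", dest="password", required=True, help="account password")
    parser.add_argument("--ssl", action="store_true", help="assumed: use HTTPS for API calls")
    parser.add_argument("--list", action="store_true", help="list documents and their entry ids")
    parser.add_argument("entry_id", nargs="?", help="entry id of the document to replace")
    parser.add_argument("new_document", nargs="?", help="file holding the new content")
    return parser


def parse_args(argv):
    """Parse argv and insist on both positionals unless --list was requested."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.list and (args.entry_id is None or args.new_document is None):
        parser.error("ENTRY_ID and NEW_DOCUMENT are required unless --list is given")
    return args


args = parse_args([
    "-s", "dummy", "-d", "wagstrom.net",
    "-u", "patrick@wagstrom.net", "-p", "PASSWORD",
    "--ssl", "--list",
])
print(args.site, args.domain, args.list)
```

If passing the password on the command line is a concern, the standard library’s getpass module is one alternative to prompting from the Makefile.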

When you’re first getting started with sites_uploader you can also use the --list option to get a list of all the documents on the site and their entry_id values. Here’s what a simple session might look like:

patrick@wallaby$ python sites_uploader.py -s "dummy" -d "wagstrom.net" -u "patrick@wagstrom.net" -p "PASSWORD" --ssl --list
["attachment.png", "attachment", "https://sites.google.com/feeds/content/wagstrom.net/dummy/9999384153430219999"]
["Home", "webpage", "https://sites.google.com/feeds/content/wagstrom.net/dummy/9999953700077559999"]
["files", "filecabinet", "https://sites.google.com/feeds/content/wagstrom.net/test/9999182398032899999"]
patrick@wallaby$ python sites_uploader.py -s "dummy" -d "wagstrom.net" -u "patrick@wagstrom.net" -p "PASSWORD" --ssl https://sites.google.com/feeds/content/wagstrom.net/dummy/9999953700077559999 home.html


This examines the site “dummy” under “wagstrom.net” by first listing all the documents, of which there are three: a png file called “attachment.png”, a webpage called “home”, and a filecabinet called “files”. We note the id of the webpage called “home” and wish to replace its content with that of home.html. When an operation is successful it prints out nothing.
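Looking up an id by eye works, but the lookup can also be automated. The sketch below assumes that each line printed by --list is a JSON array of title, type, and entry id, which is what the session above looks like; the helper name `find_entry_id` is made up for illustration and is not part of the script.

```python
import json


def find_entry_id(listing_lines, title, kind=None):
    """Return the entry id of the first listed document matching title (and kind, if given)."""
    for line in listing_lines:
        line = line.strip()
        if not line:
            continue
        name, doc_kind, entry_id = json.loads(line)
        if name.lower() == title.lower() and (kind is None or doc_kind == kind):
            return entry_id
    return None


# Two lines copied from the sample session above.
listing = [
    '["attachment.png", "attachment", "https://sites.google.com/feeds/content/wagstrom.net/dummy/9999384153430219999"]',
    '["Home", "webpage", "https://sites.google.com/feeds/content/wagstrom.net/dummy/9999953700077559999"]',
]

print(find_entry_id(listing, "home", kind="webpage"))
```

The title comparison ignores case, so asking for “home” finds the page listed as “Home”.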

The beauty of using ids is that they don’t change, so once you look them up and put them into your Makefile, you’ll never need to change them again. The other nice thing about Google Sites is that it sanitizes your HTML, so you can feed it a complete HTML file and it is smart enough to just take the part that belongs in the body of the document. Pretty neat stuff indeed.

The code for the tool should be pretty straightforward, but if folks have questions, feel free to email me and I’ll attempt to answer.

Open Source Predictions for 2010

It’s become the in-vogue thing for experts on various issues to pontificate on events for the coming year.  As I’m now a Dr. and I received my doctorate for studying the socio-technical aspects of software engineering communities, I feel like I’m legitimately qualified to put forth predictions about Open Source, Free Software, and web technologies. Putting a prediction in this list does not necessarily mean that I want it to happen, merely that I think there is a significant chance it will happen.   I’m not an insider on any of these issues, but here’s what I think will happen in 2010, in rough order of likelihood:

  1. The Eclipse Foundation will undergo a major upheaval related to community/corporate structure and governance. This follows from some of the results of my Ph.D. research and also from more recent posts from Bjorn Freeman-Benson. Much of this will arise from the structure of Eclipse as a 501(c)(6) trade association and the lack of individual participation relative to commercial control of development.
  2. Highly customized hardware running on Open Source will continue to under-perform. In particular, OLPC will continue to be a yawn, with most people questioning the long term value of the project over basic needs, resulting in no new deployments of more than 150,000 units, and Litl will be too different for most people to understand and will fold.
  3. Despite the pleas of Open Source users and zealots, Netflix will continue to be inaccessible from open platforms. This isn’t a Netflix issue, it’s related to the content providers who want their content to be protected.  Until Open Source can provide a suitable way to protect this content it will continue to lag behind. Therefore, for the most part, Netflix will continue to require Silverlight for streaming.

    As Netflix becomes more dominant in the streaming market this may lead to some firms claiming that Netflix is abusing its power as a market leader by not providing open access to their streams. These people will have a fundamental misunderstanding of both the cause for Netflix requiring Silverlight and the duties of a market leader in an unregulated market.

    While this would be awesome, it ain't gonna happen.


  4. The Boxee Box will be greeted with lots of nerd hype, but will ultimately lose out to more flexible HTPC solutions based on Intel’s Pine Trail and nVidia’s Ion 2. This is largely related to the previous two predictions, but addresses broader issues of media convergence. I’m excited about the Boxee Box, but I’ve seen too many great devices fail because of various problems. Without a huge marketing blitz and without the ability to work as a Windows Media Center extender or record live TV, the beautiful Boxee Box will underperform as nerds opt for a faster, more open platform.
  5. Google will suffer a major privacy leak and be very very quiet about the issue. Some will say this will hurt Google, but in the long run nothing will change. Google is too careful and conscientious about its position to allow this to happen to GMail. It’s much more likely that this will occur in a smaller service such as Google Voice or Picasa.
  6. Tensions between Open Source developers and the Free Software Foundation will continue to grow, resulting in at least one major community issuing a formal statement about community and inclusiveness that puts it directly at odds with the Free Software Foundation. We’ve seen some signs of these issues, particularly in the GNOME community, which expressed outrage over Richard Stallman’s actions at the Gran Canaria Desktop Summit and also in a series of messages on various GNOME mailing lists in December. I’m not certain if it will be GNOME, but they are the community I’ve followed most closely on these issues.

    St. Ignucious - Harbinger of Doom? (Photo from Michael McCracken)

  7. GNOME will encounter difficulties related to the foundation’s reliance on corporate sponsors and the rise of alternative platforms. GNOME was making excellent progress in the mobile space, especially with the Maemo platform. Unfortunately, Android may be a force too powerful for GNOME to compete against. Some of this loss will be offset by the increasing interest in netbooks and cloud based services; however, such gains will be short lived as Android moves into netbooks and Google releases Chrome OS.
  8. Ubuntu will remove at least one major application in favor of directing users toward a cloud based solution. I have no idea what application this might be.  For end users I could easily see them removing Evolution, however it is too entrenched in the desktop at the enterprise level.  My best bet is that the dumping of GIMP for F-Spot will go poorly and Ubuntu 10.10 will direct users to an online application for photo editing and management.
  9. The issue of ownership rights of open source software that originated with one author, and was later updated by another who progressively removed all of the original author’s code, will come to the forefront, resulting in threats of, but no actual, legal challenges. We’ve seen the beginnings of this with Bruce Perens and Rob Landley spatting over BusyBox. I’d imagine that the future challenge will come as the result of a commercial firm using an Open Source project as scaffolding to build a new project, eventually removing all the original Open Source software and then changing the license to something decidedly less open.
  10. Commercial ventures will continue to shy away from the GPL in favor of more standard Apache or BSD licenses. At least one major project/community will begin the process of transitioning to more business friendly licensing. We’ve seen how more open solutions (I consider the BSD to be more free than GPL) have been gaining traction, especially in the web frameworks market. It’s only a matter of time until a community begins to recommend that all projects adopt something like the BSD.  Despite this success, trolls will continue to insist that BSD is dying.

I’ll revisit these predictions as we turn the calendar to 2011. Given the probability estimates in my head I’d say that five of these will come true with my 95% confidence interval between 2 and 7. However, this is my first attempt at prognostication, so go easy on me in December 2010 when I review these, okay?
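For a rough sense of what such an interval looks like, here is a small Python sketch that treats the ten predictions as independent coin flips. Both the independence and the equal 50% chance per prediction are simplifying assumptions made purely for illustration (the estimates described above are not stated individually), and the function names are made up.

```python
from math import comb


def binomial_cdf(k, n, p):
    """Probability of at most k successes in n independent trials with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def central_interval(n, p, level=0.95):
    """Smallest and largest counts bounding a central interval of the given level."""
    tail = (1 - level) / 2
    lower = next(k for k in range(n + 1) if binomial_cdf(k, n, p) >= tail)
    upper = next(k for k in range(n + 1) if binomial_cdf(k, n, p) >= 1 - tail)
    return lower, upper


n_predictions = 10
chance_each = 0.5  # assumed: every prediction is a coin flip

print(n_predictions * chance_each)                   # expected number of hits
print(central_interval(n_predictions, chance_each))  # (lower, upper) bounds
```

Under these assumptions the expected count is five and the central 95% interval runs from 2 to 8, a little wider than the 2 to 7 quoted above; unequal or correlated probabilities would shift it.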

What the TSA has is a failure to communicate

On Christmas day a young Nigerian man boarded a flight from Lagos to Amsterdam and then later boarded a flight from Amsterdam to Detroit. About an hour outside of Detroit, as the plane was descending through Canadian airspace, he decided that his religion would martyr him if he blew up the plane using a slab of pentaerythritol tetranitrate (aka PETN) taped to his crotch. Unfortunately for him and fortunately for the plane’s passengers, the igniter failed and he merely set his crotch on fire and probably blew up his testicles.

As expected, the TSA felt a need to respond to yesterday’s threat today.  Apparently it is now a threat to have more than one carry on, electronics in the main cabin on international flights, anything in your lap, and to go to the bathroom within one hour of the flight landing. I’ll leave the sanity of these for other people.  What I’m interested in is the way that the TSA shared this information.

The primary way that travelers are told to stay up to date on the ever changing regulations regarding flight security is to visit the TSA home page.  Here’s a screen capture of the home page from December 27th, more than 48 hours after the incident on Northwest 253:

TSA.gov website on December 27th


The TSA has a statement on flight 253, but there is nothing about the new flight restrictions, despite the fact that I’m sure that I, and thousands of other people, were concerned about how their flights home may be impacted.

The digerati may have also checked Twitter for information.  The TSA doesn’t have an official Twitter feed, but their blog team has a Twitter feed.  Here’s a screen capture of the Twitter feed taken at the same time:

The TSA's Twitter Stream on December 27th


There is a link to an article on the TSA’s blog about Northwest 253, but the article is merely the official statement and contains nothing about the changing regulations. For more than 48 hours passengers were left to wonder about the changing regulations and how these regulations impact travel in the holiday season. The primary source of the changing regulations was a message on Air Canada’s web site.

Late on the 27th the actual text of the regulations was finally made available, but not through an easy source on TSA’s website. Instead, we’re forced to go to Christopher Elliott’s full text of the directive.

The administration has been quiet on the issue, but I believe that is more by design.  There’s no need to send everyone off to Sunday morning talk shows because a rich Muslim decided to blow his own balls off.  But what isn’t acceptable is the long delay and horrible communication regarding the new restrictions for travelers.  There’s lots more that could be written on this, but I’m trying to minimize editorializing in this article.

Will hubris kill MySQL?

Round and Round They Go!  Where They'll Stop, Monty Knows!


Over the weekend the twitterwebs lit up with the #savemysql tag in response to a plea from Monty, the founder of MySQL. The basic points of Monty’s plea are as follows:

  • MySQL is a critical piece of infrastructure
  • Thousands of businesses use MySQL in functions beyond the web
  • Oracle sells a variety of competing databases
  • Oracle doesn’t have the best track record with Open Source
  • Therefore, Oracle should have to make a set of promises related to the commercial viability of MySQL.

The first four are all facts that are not in dispute.  The final point is Monty’s conclusion from the facts, that, in and of itself, doesn’t sound all that bad. If you’re the European Commission, of course you’re going to be concerned if a major tool for thousands of small and medium enterprises goes by the wayside. This would force those companies to spend untold amounts of time and money to procure another database — one most likely provided by an American company. Therefore, the merger between Sun and Oracle could result in a large amount of money flowing out of Europe and into the hands of the United States.

However, if we examine this issue a bit deeper, we see that while Monty’s facts are true, the conclusion is completely unfounded. First, the license that MySQL is distributed under is a dual license. Anyone can download, modify, and distribute modifications to the GPL version of the database. Once the code is available under the GPL it’s forever available under that same license. Sure, the copyright holder can change the license, but that doesn’t nullify the GPL versions of the software. So, in that sense, MySQL cannot die from the Oracle acquisition of Sun any more than it could have died when Sun purchased MySQL AB.

The dual license portion is where many people got bit by the whole thing. Many companies believed that when they purchased a license from MySQL AB or Sun to use MySQL for purposes not covered under the GPL, they were still getting an Open Source project. The fact of the matter is that they were not. The dual version of MySQL may be supported by the same tools as the Open Source version and may sound the same on the surface, but the license doesn’t guarantee the companies access to updated versions of the software. What they were purchasing was not much different than a proprietary commercial piece of software that also has a free version.

For Monty to act like this is some travesty is entirely dishonest. Of course he knew that this was the situation the whole time; he is the one who structured the MySQL licensing. While there are a variety of companies that work in Open Source and make money from their expertise in the product area, performing consultations and developing custom extensions, MySQL AB went a step beyond this and also allowed firms to license the software out of the GPL. While other databases and packages were licensed under more business friendly licenses, the LGPL for JBoss and BSD for PostgreSQL, MySQL AB licensed MySQL under the GPL to create this business opportunity. There’s nothing wrong with that, but it has always made MySQL adoption rather dicey inside of some enterprises because even the connectors for MySQL database access were GPL’d, making any program that used those connectors GPL.

One might argue that this licensing scheme is what made MySQL so valuable.  It was a critical piece of infrastructure and revenue could be derived from the dual licensing, something that while possible with PostgreSQL, is not nearly as easy.  Sun thought this too and they purchased MySQL for about $1 billion.  At this point I’m sure that Monty was now laughing all the way to the bank.  Then, oddly enough, Monty left Sun within seven months of the purchase.  Publicly stated reasons were his disagreement with Sun wanting to offer commercial extensions to some of the core of MySQL.  In February of 2009, about five months after leaving Sun, Monty announced the creation of Monty Program, a new software firm that oddly enough, seems to do exactly what MySQL would have been doing if they didn’t offer the dual license option.

However, now something was different.  After the sale of MySQL AB to Sun, Monty no longer owned or had the rights to the source code.  That’s what the $1 billion in compensation was for.  Rather than being like other entrepreneurs and going into a different venture, Monty started nearly the exact same business he had before, only with no rights to the source code.  Therein lies the problem for Monty and the reason for his call for the EC to block the Oracle purchase of Sun.  The purchase would hurt his business.  It would hurt his business because of the license that he put the software under in the first place.  It would hurt other businesses because of the ruse that he managed to play on businesses that didn’t completely understand Open Source licensing, while he made little effort to explain what they were purchasing.

Yes, a purchase of Sun by Oracle could hurt thousands of small and medium enterprises that rely on the dual licensed version of MySQL, but so will the continued downfall of Sun.  If Sun goes out of business it could put the code in legal limbo, leaving no one able to get a license for the code.  Furthermore, it’s hurting the MySQL community right now.  With Sun being in limbo, I’m sure that they’re not winning many new customers: businesses hate uncertainty, and there isn’t much greater uncertainty than wondering if your supplier is going to be independent or not in the next few months.  This whole thing smacks of selfishness and I truly hope that the EC sees through this ruse and all the letters they get from the internet fanboys.

If that doesn’t work, then Michael Meeks has helpfully pointed out that Sun provides an LGPL connector to MySQL, but no one has actually requested the source code for it yet.  Just sayin’…

Wouldn’t it be great if there was a StackOverflow for that?

Since I started on my “seekrit project” I’ve asked a few questions on programming related topics at StackOverflow. Thus far I’ve found the answers to be really helpful. While most people know about StackOverflow, lots of folks don’t realize that there are three other sites in the StackOverflow empire: ServerFault – for system administration issues, SuperUser – for end user software issues, and Meta.StackOverflow – for site related questions about StackOverflow.

The Four Horsemen of the StackOverflow Empire

Today during my run in Central Park, I was thinking about the community that StackOverflow has. It’s really a remarkable thing: there are thousands of users, and most posts in even the most obscure topics get lots of views quickly. If you’re lucky, you’ll even get a good answer or two. Or, more likely, you’ll have someone who is smart tell you that you’re not asking the right question, and help you down the right path to rethink your problem in the correct way. This sort of input can be really helpful in many other domains: parenting, cooking, travel, etc.

It’s actually possible to run your own StackOverflow site. The software, StackExchange, is available from Fog Creek Software, which offers different hosted plans. At $129/mo for the most basic plan it’s not cheap at all, but when you consider that includes everything, it’s not a horrible idea either. It’s been a pretty decent hit so far too; the StackExchange website lists numerous sites around specific topics.

So here are the sites that I’d really like to see:

  • NYCLikeALocal.com: Questions and answers about the self-proclaimed greatest city in the world, with a focus on people actually living in the city who may not be overly familiar with it. In other words, folks like me who live in NYC, but aren’t native.
  • RunThroughTheWall.com: Questions and answers about training for marathons and other long distance endurance sports, like Ironman triathlons.
  • SnackExchange.com: Ideas for cooking, specifically desserts. Everyone loves desserts.

After browsing through the sites, I found out that in each of the cases someone had already beaten me to it. For NYC local information there is PojoCity.com. For running someone created FitBulb, which covers all sports. Unfortunately, they weren’t on the ball and let a squatter get FitBulb.com. For cooking there is cooking.stackexchange.com, which is probably the least branded StackExchange site in existence.

While I’ve contributed to FitBulb, most of the sites are rather sparse. This could be for a variety of reasons. The first is likely that they haven’t advertised themselves enough. StackOverflow launched and was hugely successful because Jeff Atwood and Joel Spolsky both had legions of followers on their weblogs. When the site was launched there was already a huge pool of users. By mentioning them here I’m hoping that other people will use these three StackExchange sites and help build mass.

The second reason why the sites may have difficulty excelling is that they’re not generally perceived as a need. For programming there was a definite need for a site where people could ask questions and get reasonable answers. StackOverflow was aimed dead center at destroying ExpertsExchange by providing better answers, giving more information about the people who provide answers, and not seeming all scammy. From the looks of it, they’re succeeding.

Stackoverflow will soon kill Experts-Exchange.  And the world will be a better place.

However, most of these other domains don’t have any large community at all, so it’s up to the site developers and maintainers to build one. This is no easy task. I’d like to think that I’m pretty savvy, but I haven’t seen much about communities for any of these topics.

Finally, there’s a small problem with the architecture of StackExchange. I would like to vote up some posts and answers on FitBulb, but I can’t do that until I get 15 karma. Unfortunately, when no one has any karma on the site, it’s difficult to build enough karma. So, if the site admins hear me, vote me up some so I can support other people on your fledgling site.

In any case, it’s exciting to see that there are StackExchange sites for the three topics that I really hoped would be covered. I’ve already posted a bit to FitBulb and have found some useful stuff on PojoCity, but remember, community sites are only as good as the people who use them. So visit the sites and give them some love today.

Textile Markup in Trac

For a new secret project I’ve been putting together an installation of Trac to provide a web interface for task management. I found myself wanting to create a nice looking table within a Trac wiki page to document different dependencies needed by the software. In a normal installation there are three different ways to do this:

Trac’s Wiki Formatting

This is probably the simplest option; however, it’s also the most limiting.  There is no way to provide additional styling to elements and you can’t create cells that span multiple rows or columns.  The lack of additional styling is particularly difficult because you can’t correct it easily with CSS without modifying all of the tables.

|| '''Client Side Packages''' ||
|| '''Package''' || '''License''' || '''Purpose''' ||
|| [http://jquery.com/ jQuery] || [http://docs.jquery.com/Licensing MIT or GPL] (choice) || General Purpose javascript ||
|| [http://raphaeljs.com/ Raphaël] || [http://raphaeljs.com MIT] (see sidebar) || Vector graphics library ||
|| '''Server Side Packages''' ||
|| [http://www.djangoproject.com/ Django] || [http://code.djangoproject.com/browser/django/trunk/LICENSE BSD] || Server framework ||
|| [http://pinaxproject.com/ Pinax] || [http://github.com/pinax/pinax/blob/master/LICENSE MIT] || Server social components ||
|| [http://www.sqlite.org/ SQLite] || [http://www.sqlite.org/copyright.html Public Domain] || Server database ||
Table formatted using Trac Wiki Processing

HTML

HTML is, of course, the go-to for all of this.  However, the whole reason I started looking at this is that I didn’t want to embed a huge chunk of HTML into my wiki page.  I know I can do all the formatting I want, but this really isn’t an option.

ReStructured Text

ReStructured Text is the preferred markup language of Python documentation.  It’s more robust than Trac Wiki formatting, especially for tables.  The processor for ReStructured Text in Trac is smart enough to tag tables with the class ‘docutils’, which helps out immensely when trying to style the documents.  This allows for the creation of some prettier tables.  It also allows the creation of header columns and multi-column/row cells.  However, it’s not great for this sort of general purpose document, for two reasons.  First, it doesn’t allow for a header row to appear in the middle of the table, and I’d really like both sets of packages to be part of the same table.  Secondly, you can’t have multiple short links with the same name, so I can’t have the name ‘MIT’ link to two different pages without falling back to anonymous links (the double-underscore form).  Working around that introduces a third problem, which is that the columns in the rendered table are proportional in width to the columns in the RST source.  Here’s the sample table structured in RST:

+--------------+----------------------------------------------------------------------+----------------------------+
| Client Side Packages                                                                                             |
+--------------+----------------------------------------------------------------------+----------------------------+
| Package      | License                                                              | Purpose                    |
+==============+======================================================================+============================+
| jQuery_      | `MIT or GPL <http://docs.jquery.com/Licensing>`__ (choice)           | General JavaScript library |
+--------------+----------------------------------------------------------------------+----------------------------+
| Raphaël_     | `MIT <http://raphaeljs.com/>`__ (see sidebar)                        | SVG/VML Library            |
+--------------+----------------------------------------------------------------------+----------------------------+
 
|
 
+--------------+----------------------------------------------------------------------+---------------------------+
| Server Side Packages                                                                                            |
+--------------+----------------------------------------------------------------------+---------------------------+
| Package      | License                                                              | Purpose                   |
+==============+======================================================================+===========================+
| Django_      | `BSD <http://code.djangoproject.com/browser/django/trunk/LICENSE>`__ | Server framework          |
+--------------+----------------------------------------------------------------------+---------------------------+
| Pinax_       | `MIT <http://github.com/pinax/pinax/blob/master/LICENSE>`__          | Server social components  |
+--------------+----------------------------------------------------------------------+---------------------------+
| SQLite_      | `Public Domain <http://www.sqlite.org/copyright.html>`__             | Server database           |
+--------------+----------------------------------------------------------------------+---------------------------+ 
 
.. _jQuery: http://jquery.com/
.. _Raphaël: http://raphaeljs.com/
.. _Django: http://www.djangoproject.com/
.. _Pinax: http://pinaxproject.com/
.. _SQLite: http://www.sqlite.org/

Unfortunately, this results in tables that are unnaturally wide and must be split into two different tables.  While the result is passable, especially after throwing down some CSS to help out and collapse those ugly borders, the width of the tables makes them look highly awkward, and therefore unacceptable.
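
To see why the license column dominates, here is a rough Python sketch of the proportional-width behavior described above. It assumes the renderer sizes each column by its share of characters in the RST source; the function name column_shares is made up for this illustration, and the border line is rebuilt from the column widths of the first table (14, 70 and 28 dashes).

```python
def column_shares(border_line):
    """Return each column's share of the table width, in percent."""
    # A grid-table border looks like +----+--------+---+ ; the runs of
    # dashes between the plus signs are the column widths in characters.
    widths = [len(run) for run in border_line.strip().strip("+").split("+")]
    total = sum(widths)
    return [round(100.0 * width / total, 1) for width in widths]


# Border line rebuilt from the widths used in the first RST table.
border = "+" + "-" * 14 + "+" + "-" * 70 + "+" + "-" * 28 + "+"
print(column_shares(border))  # [12.5, 62.5, 25.0]
```

Under that assumption the long license URLs claim well over half of the table, even though the rendered link text is only a few characters long.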

Table output using ReStructured Text processor

Markdown

Markdown is the hotness markup language that all of the web 3.0 kiddies like to use. And why not?  It’s from John Gruber, macboy extraordinaire!  Markdown is designed to be wholly readable without people needing to know how to speak HTML.  It’s quick and easy to use.  There’s a pretty good Markdown processor available for Trac.  However, there are a few shortcomings.  First, by default Markdown has no support for tables, so you’ll need to drop back to HTML for the formatting.  This is less than ideal as it’s messy again.  Furthermore, you cannot use Markdown inside of an HTML table.

Fortunately, there are some extensions to Markdown that provide support for tables.  As is normal for this sort of community, it’s been further extended to allow for all sorts of crazy formatting and alignment options.  However, as near as I can tell it still does not allow cells to span multiple columns/rows.  Or, if it does, the Markdown plugin for Trac doesn’t support it.  I didn’t really get far enough to generate anything really interesting with Markdown.

Textile

In the world of Web 2.0 fanboys, the other competing markup, favored by the Ruby guys and folks at GitHub, is Textile.  It’s slightly similar to Markdown, in that it aims to be fairly readable, but advanced features let you do neat stuff at the expense of some readability.  Fortunately, using Textile I was able to make the table look exactly like I wanted to.  Although Textile allows you to use multiple header rows, I did have to apply some simple CSS styling to the header columns and rows.  Here’s the Textile I used.

table(clean).
|_\3. Client Side Packages                                                 |
(bbot). |_. Package |_. License                |_. Purpose                 |
| "jQuery":jq       | "MIT or GPL":jql (choice)| General purpose JavaScript|
| "Raphaël":ra      | "MIT":ral (see sidebar)  | Vector graphics library   |
(bbot btop). |_\3. Server Side Packages                                    |
| "Django":dj       | "BSD":djl                | Server framework          |
| "Pinax":pi        | "MIT":pil                | Server social components  |
| "SQLite":sq       | "Public Domain":sql      | Server Database           |
 
[jq]http://jquery.com/
[jql]http://docs.jquery.com/Licensing
[ra]http://raphaeljs.com/
[ral]http://raphaeljs.com/
[dj]http://www.djangoproject.com/
[djl]http://code.djangoproject.com/browser/django/trunk/LICENSE
[pi]http://pinaxproject.com/
[pil]http://github.com/pinax/pinax/blob/master/LICENSE
[sq]http://www.sqlite.org/
[sql]http://www.sqlite.org/copyright.html

And here is the actual output from Trac after rendering the Textile (after a little bit of help from some CSS):

Table output using Textile processor
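
If the `_\3.` prefixes in the Textile source look cryptic: in Textile table syntax a leading `_` marks a header cell, `\3` makes the cell span three columns, and a `(class).` in front of the first pipe applies a CSS class to the whole row. The following Python sketch decodes those prefixes for a single row. It is only an illustration of the notation; parse_cell and parse_row are made-up names, and this is not how the Textile library itself parses tables.

```python
import re

# Optional "_" (header), optional "\N" (colspan), optional "/N" (rowspan),
# then the "." that ends the modifiers and the cell text itself.
CELL_SPEC = re.compile(r"^(_)?(?:\\(\d+))?(?:/(\d+))?\.\s+(.*)$")


def parse_cell(raw):
    """Split one Textile table cell into its modifiers and its text."""
    raw = raw.strip()
    match = CELL_SPEC.match(raw)
    if match and any(match.group(i) for i in (1, 2, 3)):
        header, colspan, rowspan, text = match.groups()
        return {
            "header": header is not None,
            "colspan": int(colspan or 1),
            "rowspan": int(rowspan or 1),
            "text": text.strip(),
        }
    # No modifiers: an ordinary cell.
    return {"header": False, "colspan": 1, "rowspan": 1, "text": raw}


def parse_row(line):
    """Parse one table row, ignoring a row-level class such as '(bbot). '."""
    _, _, cells = line.partition("|")
    return [parse_cell(cell) for cell in cells.rstrip().rstrip("|").split("|")]


print(parse_row(r"(bbot btop). |_\3. Server Side Packages |"))
```

Run against the section rows from the table above, this reports a single header cell with a colspan of 3, which is exactly the mid-table header row that the other markup options could not express.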

Trac Textile Macro

To get this far I needed to write a simple little Trac processing macro for Textile. Using information from the excellent Markdown Macro I was able to hack together a simple Textile Macro for Trac.  Before installing it you’ll need to install the Python Textile libraries.  Then, you can clone the git repository from git://github.com/pridkett/tractextilemacro.git.  From here you can install it like a normal Trac component.  Enable it in your trac.ini using these lines:

[components]
Textile.* = enabled
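
If the processor doesn’t show up afterwards, it can help to check that the setting really landed in the [components] section. Here is a minimal Python sketch using only the standard library. The inline TRAC_INI string stands in for a real trac.ini, and enabled_components is a hypothetical helper for this post, not part of Trac.

```python
import configparser

# Stand-in for the contents of a real trac.ini.
TRAC_INI = """\
[components]
Textile.* = enabled
"""


def enabled_components(ini_text):
    """Return the [components] entries whose value is 'enabled'."""
    # Interpolation is switched off so stray % signs elsewhere in a real
    # trac.ini do not trip up the parser.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names such as "Textile.*" as written
    parser.read_string(ini_text)
    if not parser.has_section("components"):
        return []
    return [name for name, value in parser.items("components")
            if value.strip().lower() == "enabled"]


print(enabled_components(TRAC_INI))  # ['Textile.*']
```

For a real installation, read the text of your environment’s trac.ini from disk and pass it in instead of TRAC_INI.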

Now, just embed your Textile code in a block like this and you’ll be on your way:

{{{
#!Textile
 
Hello from the __magical__ land of "Textile":http://textile.thresholdstate.com/!
}}}

Shutting Down PennAve

Several years ago I created a nifty little photo gallery webapp for F-Spot users called PennAve. PennAve was designed to be incredibly simple in its use: Tag a set of photos in F-Spot as “Public”, copy the database to your web server, and boom, those photos would be displayed. It was quite simple and just worked, well, for the most part.

However, in the design of PennAve I made a couple of critical errors, and since that point the marketplace has changed significantly.  Some of these are issues that I should have been more aware of; others I couldn’t have anticipated.

Issue 1: XML? What was I thinking‽

At what point did I think that code like this would be a good idea?

At the time I was all hyped up on the XML goofballs and believed that making the whole thing styleable using client side XSLT would be an awesome idea.  It was just really neat that I could send an XML document and Firefox could transform it.  Think of the other clients I could use to slurp that data with little effort! In retrospect, trying to do any sort of significant styling using XSLT makes me feel like I’m slowly driving an ice pick into my head. The clean syntax of XML gives way to the disaster of XSLT, templates, variables, and overly verbose syntax.

Issue 2: Strange dependencies and poor documentation.

PennAve required things like CherryPy and SQLObject (for a while it even required an SVN version of SQLObject, scary, eh?).  Folks know how to install a PHP script, and they can probably figure out how to run a Django app.  Trying to explain to everyone how to compile mod_wsgi so they could run PennAve in the web server, or use mod_proxy to redirect to CherryPy’s internal web server, was a big pain.  I sought to make it really easy to create your galleries, but the installation bordered on insanity for some users.

Issue 3: Slow development of F-Spot.

I don’t use F-Spot that much anymore.  I have most of my collection tagged and edited (more than 800 different tags), so I just pop in when I need to import some new photos.  However, while F-Spot was pretty good four years ago, its development is slow enough that numerous other solutions make it look ancient.  The tools for editing photos are slightly substandard. It doesn’t have face detection. It is sometimes prone to crashing. I’m not trying to blame Stephane Delcroix (the maintainer of F-Spot) for these problems, but that’s the reality of a project that only has a small community.  Perhaps these will change with Ubuntu’s decision to make F-Spot the default instead of GIMP, but that’s a little uncertain.

Issue 4: Cheap online storage

Google recently announced that you could get 20GB of storage for $5/yr.  That’s cheap enough that you can easily manage your entire collection online.  $20 for 80GB of storage without having to worry about bandwidth and server costs is a godsend.  Reductions in Flickr’s prices also play a part.

Issue 5: The Awesomeness of Face Detection

Not even my ugly mug could break Picasa's face detection!

F-Spot has a default category of people where you can create a tag for each person.  However, this merely associates the photo with a person, not an area of a photo with a person.  Using face detection you can find people in all your photos, many times automatically.  For example, I’ve found cases where I had photos of friends from before I became friends with them.  When I put these photos in F-Spot I didn’t tag those photos, but now I’d like to do that.  Face detection takes care of that with no interaction from me.

Issue 6: Geotagging

Once again, F-Spot has a default category of places.  I carefully created a hierarchy of locations so I could quickly find all of my photos taken in Pittsburgh, or Pennsylvania, etc.  However, it’s much better if photos can say exactly where they’re taken, which is where geotagging is helpful.  Photos from my iPhone already are geotagged and the integration of Picasa with Google Earth and Google Maps makes it dreadfully simple to tag all of my other photos.

Individually, none of these issues was bad enough to cause me to stop development, but in sum, they’re huge.  I never started PennAve to get hordes of fans; mainly I needed a way to show my own photos, and if a few other people found it useful, that would be great.  I’m pleased that a handful of other people have found PennAve useful, but times change and it’s time to move on.  Therefore, I’m declaring officially that I’m stopping development on PennAve.  I’ll keep the site up for the foreseeable future, but I won’t make efforts to update the software or ensure the website stays operational indefinitely.  Thanks for all the help from everyone!