<?xml version="1.0"?>
<rss version="2.0">
    <channel>
        <title>My Delusional Dream Feed</title>
        <link>http://patrick.wagstrom.net</link>
        <description>The 10 most recent posts from http://patrick.wagstrom.net.</description>
                <lastBuildDate>Wed, 17 Apr 2013 13:04:00 UTC</lastBuildDate>
        <generator uri="http://github.com/tylerbutler/engineer">Engineer</generator>
                    <item>
    <title>New Paper&#x3a; A Network of Rails&#x3a; A Graph Dataset of Ruby on Rails and Associated Projects</title>
    <link>http://patrick.wagstrom.net/weblog/2013/04/17/a-network-of-rails/</link>
    <description><![CDATA[
                            <p>For the last year and a half I&#8217;ve been working with Anita Sarma, a professor at the University of Nebraska, Lincoln and her graduate student, Corey Jergensen, to try and understand some of the social dynamics around GitHub. As we began to dig at the ecosystem we realized that we had an opportunity to perform some novel analysis on the community. Specifically, GitHub is a highly networked ecosystem and most of the queries that we were doing were localized around single projects or developers. At this time graph databases were taking off so we decided to learn a new technology while getting some data at the same&nbsp;time.</p>
<p>This resulted in the creation of <a href="https://github.com/pridkett/gitminer">GitMiner</a>, a tool that utilizes the GitHub APIs to download all the data about a project and its related users, issues, pull requests, and basically everything else that you can get out of the <span class="caps">API</span>. It then stores this information inside of a graph database - something that I&#8217;ve written about before when I first published <a href="/weblog/2012/05/13/mining-github-followers-in-tinkerpop/">a dataset on the Tinkerpop family of projects</a>.</p>
<p>Now we&#8217;ve had a chance to formally publish a larger set of data, thousands of projects associated with Ruby on Rails. The data are published in this year&#8217;s <a href="http://2013.msrconf.org/">conference on Mining Software Repositories</a>. If you&#8217;d like to read the actual paper, here&#8217;s <a href="http://academic.patrick.wagstrom.net/publications/Wagstrom_2013_ANetworkOfRails.pdf?attredirects=0">the authors&#8217; pre-print of the paper</a> and <a href="https://github.com/pridkett/gitminer-data-rails">the GitHub repository with the actual data</a>.</p>
<p>In the coming weeks/months I&#8217;ll probably write more about how to use GitMiner to collect large amounts of data from GitHub and how to crawl this data. For the interim, however, I&#8217;ll leave you with this nifty picture of shared developers between projects, which is part of an upcoming submission of&nbsp;ours.</p>
<div class="image caption center">
    <a href="/weblog/media/2013/04/rails-network.png"><img src="/weblog/media/2013/04/rails-network-thumb.png" width="800" height="573"></a>
    <p>Developers shared between projects in our Ruby on Rails dataset. The size of nodes represents the number of developers on the project, edge width is the number of shared developers between projects, and color represents programming language. [<a href="/weblog/media/2013/04/rails-network.png">link to full size image</a>]</p>
</div>
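<p>The encoding used in the figure can be sketched in a few lines of Python. This is a minimal illustration and not the code behind the paper: the project names and developer sets are hypothetical, and <code>shared_developer_edges</code> is a helper invented for this example. Node size is the number of developers on a project, and edge width is the number of developers two projects have in common.</p>
<pre><code class="language-python">from itertools import combinations

# Hypothetical project-to-developer membership; the names are illustrative only.
project_developers = {
    "project-a": {"alice", "bob", "carol"},
    "project-b": {"bob", "carol"},
    "project-c": {"carol", "dave"},
}

def shared_developer_edges(membership):
    """Return {(project_a, project_b): shared_count} for pairs that share a developer."""
    edges = {}
    for a, b in combinations(sorted(membership), 2):
        shared = membership[a].intersection(membership[b])
        if shared:
            edges[(a, b)] = len(shared)
    return edges

# Node size: developers per project. Edge width: developers shared by a pair.
node_sizes = {name: len(devs) for name, devs in project_developers.items()}
edge_widths = shared_developer_edges(project_developers)
print(node_sizes)
print(edge_widths)
</code></pre>
<p>Pairs with no developers in common get no edge at all, which keeps the projected network sparse.</p>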

<p><strong>Citation:</strong> Wagstrom, P., Jergensen, C., and Sarma, A. <em><a href="http://academic.patrick.wagstrom.net/publications/Wagstrom_2013_ANetworkOfRails.pdf?attredirects=0">A Network of Rails: A Graph Dataset of Ruby on Rails and Associated Projects</a></em>. Proceedings of the 2013 Working Conference on Mining Software Repositories, <span class="caps">ACM</span>&nbsp;(2013).</p>
                        ]]></description>
    <pubDate>Wed, 17 Apr 2013 13:04:00 UTC</pubDate>
    <guid>http://patrick.wagstrom.net/weblog/2013/04/17/a-network-of-rails/</guid>
</item>
                    <item>
    <title>The KTHXBAI Experiment</title>
    <link>http://patrick.wagstrom.net/weblog/2013/03/06/the-kthxbai-experiment/</link>
    <description><![CDATA[
                            <p>On May 1st, 2012 I embarked on an experiment at work &mdash; I started signing work emails to my team and friends inside and outside the office with the words &#8220;<span class="caps">KTHXBYE</span>&#8221; or &#8220;<span class="caps">KTHXBAI</span>&#8221;. The goal was to see how long it would take until someone mentioned or asked about it. About two weeks after I started the experiment a friend from Microsoft noticed it and mentioned it to me. Of course, I replied with a&nbsp;meme:</p>
<div class="image caption center">
    <img src="/weblog/media/2013/03/KTHXBYE1.jpg" height="400" width="400">
</div>

<p>To which my friend at Microsoft was gracious enough to reply with a meme of his own. This experiment was clearly off to an awesome&nbsp;start.</p>
<div class="image caption center">
    <img src="/weblog/media/2013/03/KTHXBYE2.jpg" width="400" height="235">
</div>

<p>I expected that I&#8217;d hear back from other folks in a matter of days. But then the days turned to weeks and the weeks slowly turned into months. I concluded one of two things: either no one actually read my email, or no one actually caught the reference. Undaunted I persisted. Over the course of the experiment I sent out more than 450 messages with the signature &#8220;<span class="caps">KTHXBYE</span>&#8221; and about 65 with &#8220;<span class="caps">KTHXBAI</span>&#8221;, although I only realized that &#8220;<span class="caps">KTHXBAI</span>&#8221; was the appropriate spelling late into the&nbsp;experiment.</p>
<p>Finally, yesterday, March 5, 2013, the experiment came to an end. My manager asked about what it meant, and she googled the definition, which unfortunately led to the <a href="http://www.urbandictionary.com/define.php?term=kthxbai">Urban Dictionary definition of &#8220;<span class="caps">KTHXBAI</span>&#8221;</a>.</p>
<div class="image caption center">
    <a href="http://www.urbandictionary.com/define.php?term=kthxbai"><img src="/weblog/media/2013/03/KTHXBAI-UrbanDictionaryDefinition.png" width="484" height="226">
    <p>The Urban Dictionary definition of&nbsp;<span class="caps">KTHXBAI</span></p></a>
</div>

<p>My response: &#8220;Ughh…&#8221;. This led to an explanation that Urban Dictionary shouldn&#8217;t be trusted and that, no, I wasn&#8217;t telling my co-workers to get bent at the end of every email. I had to introduce the whole concept of LOLCats, which thankfully was backed up by the creation of my 2007 intern project called LOLJazz, which somehow lingers on as a zombie inside of our Rational Team Concert Server. Still, I wasn&#8217;t out of the woods; there was the chance that it could still be &#8220;actionable&#8221;. This is where I had an ace up my sleeve. During the development of Watson, <span class="caps">IBM</span>&#8217;s Jeopardy! playing computer, the team, which happens to be in my organization, fed the entire Urban Dictionary into Watson. As could be guessed, <a href="http://www.theatlantic.com/technology/archive/2013/01/ibms-watson-memorized-the-entire-urban-dictionary-then-his-overlords-had-to-delete-it/267047/">the importation of Urban Dictionary into Watson led to many hilarious and wholly inappropriate responses</a>. In short, Urban Dictionary was a cesspool and shouldn&#8217;t be used as canon. Rather in this case the <a href="http://icanhas.cheezburger.com/tag/kthxbai">Cheezburger kthxbai</a> is a much better&nbsp;source.</p>
<div class="image caption center">
    <a href="http://cheezburger.com/5280420352"><img src="https://i.chzbgr.com/maxW500/5280420352/h03E3E890/" width="500" height="322"></a>
    <p>A traditional use of <span class="caps">KTHXBAI</span> (even if it is&nbsp;misspelled)</p>
</div>

<p>And so, my experiment has come to an end. In the end it was sorta a drag as month after month passed with no one mentioning it. After I talked about it as an experiment everyone came out of the woodwork to say they had seen it and wondered what it meant, but didn&#8217;t bother to ask. Which leads me to wonder, how often does this happen? Do people even read my emails? Do they just ignore things they don&#8217;t understand or perceive as irrelevant? Do they do that to everyone, or just me? Could I start saying that we need to replace the fitzervalve on the flux capacitor in order to keep the servers from frobnicating themselves and get away with&nbsp;it?</p>
<p>Now, it&#8217;s time to find a new subversive work&nbsp;experiment…</p>
                        ]]></description>
    <pubDate>Wed, 06 Mar 2013 14:32:00 UTC</pubDate>
    <guid>http://patrick.wagstrom.net/weblog/2013/03/06/the-kthxbai-experiment/</guid>
</item>
                    <item>
    <title>Looking for an Intern for Summer 2013</title>
    <link>http://patrick.wagstrom.net/weblog/2013/01/22/looking-for-an-intern-for-summer-2013/</link>
    <description><![CDATA[
                            <p>Once again I&#8217;m looking for an amazingly bright Ph.D. student to work with me over the course of the summer. The position is open to Ph.D. students from any university and at any point of their studies, and I can nearly guarantee it&#8217;s going to be an awesome&nbsp;experience.</p>
<p>The primary task will be applying machine learning techniques (lexical analysis, network extraction, predictive analytics) to the usage data from a large piece of commercial software. With a little bit of luck the software will be instrumented by this point in time so you&#8217;ll just need to slice and dice the data and find awesome stuff. The goal, of course, is to publish an amazing paper that provides great insight into how users actually use this type of software and provide guidance to architects and developers of such a&nbsp;system.</p>
<p>A loose list of skills that are desirable&nbsp;are:</p>
<ul>
<li>Java: Most of our tools are written in Java. It took me a while to get used to this, but Java has some nice advantages for developing code to run in an enterprise. Here at <span class="caps">IBM</span> we really love it and most of our software, including the tool we&#8217;re looking at, is built in&nbsp;Java.</li>
<li>Software Engineering Processes: Domain expertise in understanding the relationships between the different levels of stakeholders in a software project is immensely helpful and will make it a lot easier to tease great bits of nuggets out of the&nbsp;data.</li>
<li>Machine Learning: We use various types of machine learning, both Java libraries and some R to understand the data. On the Java side knowledge of text analysis packages such as OpenNLP is&nbsp;helpful.</li>
<li>Statistics: I love R. If you love R it helps&nbsp;out.</li>
<li>Visualizations: I&#8217;m big on making great visualizations to show off our findings. If you&#8217;re a ninja with <a href="http://ggplot2.org/">ggplot</a> or <a href="http://d3js.org/">d3</a> then you probably&nbsp;qualify.</li>
</ul>
<p>Of course, there&#8217;s a variety of other skills that are helpful too. The intern absolutely must be self motivated and able to find answers to questions on their own. This isn&#8217;t an unsupervised position, but I travel a lot and am frequently out of the office, which limits my ability to provide direct daily supervision. As a result, excellent communication skills are also helpful &mdash; you should know how to ask questions over email in a way that is succinct while providing enough information for other people to answer the question. If you&#8217;ve got a great profile on <a href="http://www.stackoverflow.com/">StackOverflow</a> you&#8217;re probably already&nbsp;there.</p>
<p>There&#8217;s some great advantages to spending a summer working with me at <a href="http://www.research.ibm.com/labs/watson/index.shtml"><span class="caps">IBM</span> <span class="caps">TJ</span> Watson Research</a> in Yorktown Heights, <span class="caps">NY</span>. First, you&#8217;ll be working with some of the smartest people in the world at a facility that has an amazing legacy. <span class="caps">IBM</span> Research was the genesis of <span class="caps">DRAM</span>, the processors in all major video game consoles, Watson - the Jeopardy! playing computer, <span class="caps">LASIK</span>, and thousands of other things. We make the world&nbsp;awesome.</p>
<p>Second, our interns come from around the world and are generally smarter than we are. You know that feeling you get when you go to a conference? You&#8217;re always excited about new ideas and feel like you could go home and churn out your thesis in a week. Imagine that feeling for an entire summer! I had a blast when I interned here and met some incredible young researchers who I&#8217;m still friends&nbsp;with.</p>
<p>Thirdly, we&#8217;re just outside of New York in scenic Westchester County, <span class="caps">NY</span>. I took the train into the city every Friday, Saturday, and Sunday when I interned here. It was the perfect combination of excitement from New York City and a setting where you can really get work done. You may be saying &#8220;isn&#8217;t New York really expensive?&#8221;. You&#8217;re entirely right. Don&#8217;t worry, we pay enough that it&#8217;s totally worth your&nbsp;time.</p>
<p>Interested? You can either <a href="mailto:patrick@wagstrom.net">email me</a> or visit our <a href="https://jobs3.netmedia1.com/cp/faces/job_summary?job_id=RES-0546073">intern hiring page for more information</a>. We won&#8217;t be taking applications that much longer, so be sure to act&nbsp;soon.</p>
                        ]]></description>
    <pubDate>Tue, 22 Jan 2013 15:45:00 UTC</pubDate>
    <guid>http://patrick.wagstrom.net/weblog/2013/01/22/looking-for-an-intern-for-summer-2013/</guid>
</item>
                    <item>
    <title>Rules for Recruiters, Vol 1: GPA Doesn't Matter If You Have a Ph.D.</title>
    <link>http://patrick.wagstrom.net/weblog/2012/12/21/rules-for-recruiters-vol-1-gpa-doesn-t-matter-if-you-have-a-ph-d/</link>
    <description><![CDATA[
                            <p>Tech jobs are hot in New York right now. Last year while sitting in LaGuardia Airport waiting for a flight I was hacking on some code for work in Eclipse and a guy who was shoulder surfing me tried to persuade me to interview for positions he had available at his hedge fund. If you visit any Meetup in the city you&#8217;ll hear from dozens of people who are looking for the best and the brightest. When I combine these with a publicly visible GitHub profile, a resume that&#8217;s sitting on my web page, and a fairly complete LinkedIn profile it means that messages from recruiters are constantly flooding my&nbsp;mailbox.</p>
<p>They&#8217;re nearly all amateurish wastes of my&nbsp;time.</p>
<p>In this series of posts I&#8217;m going to chronicle why they&#8217;re such a waste of my time. Here&#8217;s a paraphrased recent message I&nbsp;got:</p>
<blockquote>
<p>Dear Dr. Wagstrom,<br><br>
I work for MegaHyperTech, a leading technology placement firm in New York City. We came across your profile on GitHub and later found your resume and think that you may have the talent that our client Quanttastic Solutions is looking for. They&#8217;re a hedge fund that makes it feel like you&#8217;re working at Google. They hire only the best and the brightest from schools like <span class="caps">MIT</span>, Berkeley, <span class="caps">CMU</span>, and Michigan. We&#8217;d love to send your information over there, but we noticed that you don&#8217;t list your <span class="caps">GPA</span> on your resume and they only hire individuals with exemplary GPAs. If you&#8217;re willing to update your resume to include that information for all your degrees we think that you&#8217;d enjoy the&nbsp;challenge.</p>
</blockquote>
<p>The recruiter is entirely correct, I don&#8217;t list my <span class="caps">GPA</span> on my resume. This is done for a couple of reasons: first, my degrees are intertwined. It&#8217;s really hard to differentiate the GPAs for computer science, electrical engineering, and computer engineering bachelor&#8217;s degrees. They&#8217;re all in the 3.5 - 3.9 range, but really I don&#8217;t remember what they were. Likewise, my master&#8217;s and Ph.D. from Carnegie Mellon are also intertwined and probably have a similar&nbsp;range.</p>
<p>But the bigger issue is that a Ph.D. isn&#8217;t about classes. In fact, while working on a Ph.D. if you&#8217;re taking a required class that isn&#8217;t directly related to your research you probably shouldn&#8217;t spend enough time to get an &#8216;A&#8217; in the class. The measure of the work for a Ph.D. is the thesis and the publications that come out as a result of doing the research. I think of all the times that I met with my advisors I was asked my grades only once, and it was over a concern that I was spending too much time on my homework for my machine learning&nbsp;class.</p>
<p>So here&#8217;s my hope that maybe at some point a recruiter will read this. If you ask me for my <span class="caps">GPA</span> you&#8217;re not going to get it. If your client insists on GPAs for their candidates, then they don&#8217;t know what they&#8217;re&nbsp;getting.</p>
                        ]]></description>
    <pubDate>Sat, 22 Dec 2012 02:10:00 UTC</pubDate>
    <guid>http://patrick.wagstrom.net/weblog/2012/12/21/rules-for-recruiters-vol-1-gpa-doesn-t-matter-if-you-have-a-ph-d/</guid>
</item>
                    <item>
    <title>30 Meters Underwater with a Dead Physical Layer Protocol</title>
    <link>http://patrick.wagstrom.net/weblog/2012/12/04/30-meters-underwater-with-a-dead-physical-layer-protocol/</link>
    <description><![CDATA[
                            <p>A couple of years ago I got the bright idea that I&#8217;d get my wife open water <span class="caps">SCUBA</span> certification as her Christmas present. She likes aquariums and fish and I thought it would be a fun way to do something different when we travel. Fast forward to the present day and I&#8217;ve got a closet filled with neoprene, BCDs, fins, first aid kits, and a dive log filled with all sorts of certification cards from&nbsp;<span class="caps">PADI</span>.</p>
<p>We purchased our own equipment relatively early in the process of learning how to <span class="caps">SCUBA</span> dive - shortly after getting our open water certification, thanks in large part to a nice tax rebate. For the most part we&#8217;ve been very happy with our purchases and I feel like it&#8217;s made us much more comfortable when we&#8217;re underwater. One of the key components of diving is a dive computer. The most basic dive computers tell you your depth and warn you if you&#8217;re ascending too fast or are going to need a decompression stop somewhere along the way. More advanced computers replace your entire diving console and provide a compass and wireless integration of you and your buddy&#8217;s pressure gauge. Yeah, we went for that kind of over the top dive computer and bought the <a href="http://www.scubapro.com/en-US/USA/instruments/computers/products/galileo-luna.aspx">Uwatec Galileo Luna</a>.</p>
<div class="image caption center">
    <img src="/weblog/media/2012/12/galileo_luna.jpg" alt="Uwatec/ScubaPro Galileo Luna Hoseless Air Integrated Dive Computer">
    <p>Uwatec/ScubaPro Galileo Luna Hoseless Air Integrated Dive&nbsp;Computer</p>
</div>

<p>I&#8217;ll be the first to admit that this probably wasn&#8217;t the wisest of ideas. I spent two weeks researching <a href="http://www.amazon.com/gp/product/B003BYRDK2/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=B003BYRDK2&amp;linkCode=as2&amp;tag=twesixandtwo-20">$55 gel pads for my standing desk<img src="http://www.assoc-amazon.com/e/ir?t=twesixandtwo-20&l=as2&o=1&a=B003BYRDK2" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /> </a> and here we just decided to drop $2000 on a couple of dive computers thanks to thirty minutes at our local dive shop. We&#8217;ve been completely thrilled with them under water. Where we&#8217;ve had more problems is getting data out of them above water. More advanced computers also take periodic samplings of your depth, remaining air, water temperature, etc. You can use this data to reconstruct a dive profile in a way that is much richer than what normally appears in your dive&nbsp;log.</p>
<div class="image caption center">
    <img src="/weblog/media/2012/12/divelog.png" alt="Screenshot of jTrak - Does it feel like it's 1999?">
    <p>Screenshot of jTrak - Does it feel like it&#8217;s&nbsp;1999?</p>
</div>

<p>Getting this data off your computer isn&#8217;t trivial. Dive computers are
expensive for a couple of reasons: they&#8217;re produced in relatively low
volumes, they often license patented algorithms for estimating your
air consumption and remaining bottom time, and, of course, they need to be waterproof. This
means that you can&#8217;t just drop a <span class="caps">USB</span> port on the outside of the
case. Nor can you just put a <span class="caps">USB</span> port under a rubber flap. At 30
meters you&#8217;re facing about 400kPa of pressure - four times the
pressure at the surface. Water will find a way in. If it gets in the
salt will corrode everything and it will die. Thus, dive computers
tend to be very well sealed and make even trivial things like changing
the battery a process that requires tools and new grease for the&nbsp;O-rings.</p>
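<p>The 400kPa figure can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch, assuming a typical seawater density of 1025 kg per cubic metre and one standard atmosphere at the surface; <code>absolute_pressure_kpa</code> is a helper name made up for the example.</p>
<pre><code class="language-python"># Absolute pressure at depth: surface pressure plus the weight of the water column.
SURFACE_PRESSURE_PA = 101_325   # one standard atmosphere, in pascal
SEAWATER_DENSITY = 1025         # kg per cubic metre (assumed typical value)
GRAVITY = 9.81                  # metres per second squared

def absolute_pressure_kpa(depth_m):
    """Return the absolute pressure in kPa at the given seawater depth in metres."""
    return (SURFACE_PRESSURE_PA + SEAWATER_DENSITY * GRAVITY * depth_m) / 1000

at_30m = absolute_pressure_kpa(30)
ratio = at_30m / absolute_pressure_kpa(0)
print(f"{at_30m:.0f} kPa, {ratio:.1f} times surface pressure")
# prints "403 kPa, 4.0 times surface pressure"
</code></pre>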
<p>There really isn&#8217;t a standard interface to these devices. It seems as
though a lot of devices, such as the <a href="http://www.mares.com/products/computers/wrist-computer/puck-wrist/371/?region=us">Mares puck computers</a>, have
corrosion resistant metallic contacts that connect to a <span class="caps">USB</span> controller
with an <span class="caps">FTDI</span> <span class="caps">USB</span>&rarr;Serial chip in it. However, the Uwatec Galileo
decided to be more advanced and use what I&#8217;m sure was the hip
protocol at the time:&nbsp;IrDA.</p>
<p>Now, in case you missed it, IrDA was all the rage in the 1990&#8217;s and
early 2000&#8217;s. Every laptop seemed to ship with an IrDA port built
in. You could use it to synchronize data with your Palm or Handspring
in the late 1990&#8217;s. Once cell phones were more common you could even
tether your laptop to your cell phone and get very slow data. In the
pre-wifi, pre-edge days this was pretty hot stuff. &#8220;Was&#8221; being the key
word. Hot stuff being around the speed of the 28.8k modem that I used
back in&nbsp;1994.</p>
<p>You can still find devices that use IrDA, most notably a lot of the
<a href="http://www.polarusa.com/">heart rate monitors from Polar</a>, but for the most part the technology
is from about 10 years ago. This also means that you&#8217;re dealing with
the headaches of 10 years ago, including the near total lack of Mac
support for devices. Those that do support the Mac often only support
the <span class="caps">PPC</span> Mac and never really fully supported it anyway. Did I mention
that MacOS X doesn&#8217;t even have full support for IrDA? Just try opening
up a socket using <code>AF_IRDA</code>. It doesn&#8217;t exist. Ughh. This was going to
be a great&nbsp;adventure.</p>
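<p>One quick way to see whether a platform exposes an IrDA socket family at all is to ask a scripting language for the constant. The sketch below is only a probe: Python&#8217;s <code>socket</code> module defines <code>AF_IRDA</code> only where the underlying platform headers provide it, and the presence of the constant does not guarantee a working IrDA stack. The function name <code>irda_socket_family</code> is invented for this example.</p>
<pre><code class="language-python">import socket

def irda_socket_family():
    """Return the AF_IRDA constant if this Python build exposes it, else None."""
    return getattr(socket, "AF_IRDA", None)

family = irda_socket_family()
if family is None:
    print("No AF_IRDA address family on this platform")
else:
    print("AF_IRDA is available as address family", int(family))
</code></pre>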
<h2>Setting Baseline&nbsp;Expectations</h2>
<p>My first naive attempts were to hack an IrDA driver into the framework
of <a href="http://www.divesoftware.org/libdc/">libdivecomputer</a>. There was already support for IrDA
dive computers under Windows and Linux, and I had confirmed that they
worked just fine with my computer. How hard could it be? The answer is
a lot more complicated than I thought. The first step was to find an
IrDA dongle that even worked with Mac <span class="caps">OS</span> X. I ordered a couple of cheap
ones off eBay and had no luck. I read a few comments from folks saying that
the official <a href="http://www.scubapro.com/en-US/USA/instruments/computers/products/usb-infrared-interface-(irda).aspx">Uwatec <span class="caps">USB</span>-&gt;IrDA devices</a> worked with <a href="http://www.frobese.de/jtrak/en/jtrak.html">JTrak</a> on Mac <span class="caps">OS</span> X, however
the official dongles are about $70 and JTrak is a bit less than what I&#8217;m looking for in a
dive logging software. Fortunately, I was able to find another device that looked nearly identical from the outside - the <a href="http://www.amazon.com/gp/product/B004FEQ9UW/ref=oh_details_o05_s00_i00">IRJoy <span class="caps">USB</span> 2.0 <span class="caps">USB</span> IrDA adapter</a> for $30. When this guy arrived a quick scan showed that it was the exact same hardware as the official Uwatec dongle - both were based on the MosChip 7780&nbsp;chipset.</p>
<p>Plugging the device into my trusty Thinkpad x31 showed that it quickly and easily worked both in Windows and Linux using the SmartTrak software from Uwatec, JTrak, and the test applications from libdivecomputer. I knew that I could at least make some progress. Next up was to test it on my Mac. I plugged in the IrDA stick and fired up JTrak and to my amazement it just worked. That <span class="caps">NEVER</span> happens. Poking around showed why it worked, the company behind JTrak had licensed a complete pure Java IrDA stack. Well, at least I could use JTrak if everything else failed. However I had my eyes set on something much prettier, <a href="http://www.mac-dive.com/">MacDive</a>.</p>
<h2>Writing a&nbsp;Driver</h2>
<p>I had heard people refer to the fact that the MosChip devices had a Mac driver, but most of those conversations ended many years ago &mdash; as if I needed more evidence that I was dealing with a dead protocol. After some digging around and emailing random customer service addresses I found that the <span class="caps">IP</span> for the MosChip devices was sold off to a company in Taiwan called Asix. They provided a couple of different versions of the driver and I eventually found one that worked in full 64 bit mode on Mac <span class="caps">OS</span> X Lion.&nbsp;Score.</p>
<p>The driver came with a simple test application that would let me read the data coming over the device as though it was a serial device. Using this test application I was able to position the reader in the line of sight of other IrDA devices and receive data. Neat. The problem is that I was getting the raw bytes of the IrDA sockets. There&#8217;s a lot of overhead in there that goes along with handshaking, setting speed, and resending data when connections are interrupted. None of this seemed to be enabled in the driver. The driver simply provided a couple of serial devices that I could open up and use to smack bits back and forth. If I wanted this to work I would need to write a complete IrDA stack on top of this serial&nbsp;device.</p>
<p>The problem is that the <a href="http://en.wikipedia.org/wiki/Infrared_Data_Association">IrDA stack is actually fairly complex</a>. There&#8217;s a myriad of different protocols that stack on top of IrDA to make everything work. This was basically the equivalent of trying to implement <span class="caps">TCP</span>/<span class="caps">IP</span> using just the raw bits coming over the 802.11 physical layer. In other words, it was a nasty layer mismatch that was not going to do me any&nbsp;favors.</p>
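<p>To give a sense of what even the lowest layer involves, here is a rough sketch of the byte stuffing used by the IrDA serial (<span class="caps">SIR</span>) framing. It is an illustration under stated assumptions, not a working stack: the marker values reflect my reading of the IrLAP framing rules and should be checked against the specification, the 16-bit frame check sequence and extra start markers are left out, and <code>wrap_frame</code> and <code>unwrap_frame</code> are names invented for this example.</p>
<pre><code class="language-python"># Framing bytes for the SIR async wrapper (assumed from the IrLAP framing rules).
BOF = 0xC0         # beginning of frame
EOF = 0xC1         # end of frame
ESCAPE = 0x7D      # control escape
ESCAPE_XOR = 0x20  # escaped bytes are XORed with this value
SPECIAL = {BOF, EOF, ESCAPE}

def wrap_frame(payload):
    """Byte-stuff a payload and surround it with BOF and EOF markers."""
    out = bytearray([BOF])
    for byte in payload:
        if byte in SPECIAL:
            out += bytes([ESCAPE, byte ^ ESCAPE_XOR])
        else:
            out.append(byte)
    out.append(EOF)
    return bytes(out)

def unwrap_frame(frame):
    """Reverse wrap_frame; raise ValueError on malformed input."""
    if len(frame) == 0 or frame[0] != BOF or frame[-1] != EOF:
        raise ValueError("missing BOF or EOF marker")
    out = bytearray()
    body = iter(frame[1:-1])
    for byte in body:
        if byte == ESCAPE:
            escaped = next(body, None)
            if escaped is None:
                raise ValueError("dangling escape byte")
            byte = escaped ^ ESCAPE_XOR
        out.append(byte)
    return bytes(out)

payload = bytes([0x01, 0xC0, 0x7D])
framed = wrap_frame(payload)
print(framed.hex())
</code></pre>
<p>Everything above this layer (discovery, speed negotiation, retransmission, and the higher protocols) would still have to be built on top of it.</p>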
<div class="image caption center">
    <img src="/weblog/media/2012/12/irdaStack.png" alt="The Multifaceted IrDA Stack - From Wikipedia">
    <p>The Multifaceted IrDA Stack - From&nbsp;Wikipedia</p>
</div>

<p>I continued to email Asix, who were more than helpful, although they seemed most concerned that I would write a driver that would let the user transfer files with Windows and cell phones. After a few more emails I explained what a dive computer was and how much of a niche this issue was and Asix offered me an <span class="caps">NDA</span> to work on the driver
and attempt to implement the <code>AF_IRDA</code> stack for Mac <span class="caps">OS</span> X. If I were in
undergrad this would probably sound like a great idea. However, I&#8217;m not.
I&#8217;ve got a job that keeps me quite busy and has me flying back and forth
between New York and Washington on a weekly basis. I just don&#8217;t have the
time to acquire the knowledge needed to hack together a driver on Mac <span class="caps">OS</span> X.
Of course, there&#8217;s also the issue of me performing gratis work for a
for-profit company, which I didn&#8217;t really want to do&nbsp;either.</p>
<h2>VMs to the&nbsp;Rescue</h2>
<p>This left me with really only one simple solution, use what I know
already works for communicating with the Galileo Luna, Linux or Windows.
In an effort to keep this simple and avoid worrying about license issues
I chose to use a very minimal Linux installation under VirtualBox as my
guest environment. The next problem was the software to make use of my
data. There were a couple of different ways to handle this, either do
all of my log work inside of the virtual machine, or just download the
data in the virtual machine and copy it over to my Mac to do most of the
work on the log. Starting up a <span class="caps">VM</span> is a bit of a pain, so the choice was
made to use Mac dive log software and download the data in the <span class="caps">VM</span> then
copy it&nbsp;over.</p>
<p>There are a couple of different formats that might be able to fit the bill, <span class="caps">SDE</span>, <span class="caps">UDCF</span>, <span class="caps">UDDF</span>, and <span class="caps">ZXL</span>. <span class="caps">SDE</span> is the output format from Suunto Dive Explorer software. There doesn&#8217;t appear to be much documentation for the format, but it supposedly contains all the necessary information that a diver might want to recreate a dive log on a computer. Supposedly <a href="http://subsurface.hohndel.org/">Subsurface</a>, a dive log software package by Linus Torvalds, can import from <span class="caps">SDE</span>, so there should be some source code there that I just haven&#8217;t had a chance to dig at yet. <span class="caps">ZXL</span> is a format designed by <span class="caps">DAN</span> to collect information for scientific studies of diving related injuries. <a href="http://www.streit.cc/dive/page17/page15.html"><span class="caps">UDCF</span></a> and <a href="http://www.streit.cc/extern/uddf_v310/en/index.html"><span class="caps">UDDF</span></a> are formats developed by a group of interested divers that seem to have achieved moderate success. <span class="caps">UDCF</span> can be considered to be the little brother to the more robust <span class="caps">UDDF</span>. Many tools support <span class="caps">UDCF</span>, but it lacks official mechanisms to do things like save the pressure in a&nbsp;tank.</p>
<p>The most promising format seems to be <a href="http://www.streit.cc/extern/uddf_v310/en/index.html"><span class="caps">UDDF</span> - the Universal Dive Data Format</a>. <span class="caps">UDDF</span>, like most interchange formats, sadly uses <span class="caps">XML</span> so it is parseable by neither humans nor machines. It is able to contain information about dive profile, temperature, and air usage, which are the main things I want to track. I wasn&#8217;t able to find a tool that used libdivecomputer to produce a <span class="caps">UDDF</span> file, so I wrote my own, the cleverly named <a href="https://github.com/pridkett/dc2uddf">dc2uddf</a>.</p>
<p>dc2uddf is a simple tool that uses <a href="http://www.divesoftware.org/libdc/">libdivecomputer</a> and <a href="http://www.xmlsoft.org/">libxml2</a> to download data from a dive computer and save it as a <span class="caps">UDDF</span> file. That&#8217;s all it does. There isn&#8217;t much of a user interface, but it works, and it&#8217;s written in C, which makes me feel a little more like a programmer than I normally do. I&#8217;m certain there are some things that it is doing incorrectly. If folks discover problems, email me or file issues on GitHub and I&#8217;ll be sure to fix them. Along the way I&#8217;ve also found several defects in the <span class="caps">UDDF</span> standard, so I feel like I&#8217;m making the standard better&nbsp;too.</p>
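<p>For a feel of what such a file contains, here is a minimal Python sketch that writes a single dive profile in a <span class="caps">UDDF</span>-style layout. It is not how dc2uddf works (that tool is C on top of libdivecomputer and libxml2). The sample values are made up, the element names follow my reading of <span class="caps">UDDF</span> 3.1 and should be checked against the specification, and a real file needs further required sections such as generator information.</p>
<pre><code class="language-python">import xml.etree.ElementTree as ET

# Hypothetical samples: (seconds into the dive, depth in metres, tank pressure in bar).
samples = [(0, 0.0, 200.0), (60, 12.5, 195.0), (120, 30.0, 188.0)]

def build_uddf(samples):
    """Build a minimal UDDF-style tree holding one dive profile."""
    root = ET.Element("uddf", version="3.1.0")
    profile = ET.SubElement(root, "profiledata")
    group = ET.SubElement(profile, "repetitiongroup", id="rg1")
    dive = ET.SubElement(group, "dive", id="dive1")
    container = ET.SubElement(dive, "samples")
    for seconds, depth_m, pressure_bar in samples:
        waypoint = ET.SubElement(container, "waypoint")
        ET.SubElement(waypoint, "divetime").text = str(seconds)
        ET.SubElement(waypoint, "depth").text = f"{depth_m:.1f}"
        # Assumption: values are stored in SI units, so bar is converted to pascal.
        ET.SubElement(waypoint, "tankpressure").text = f"{pressure_bar * 100000:.0f}"
    return root

document = ET.tostring(build_uddf(samples), encoding="unicode")
print(document)
</code></pre>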
<p>Now I&#8217;m at the point where I can download the data using a Linux <span class="caps">VM</span> and then
copy the data over to my Mac where I can easily import it into the excellent
<a href="http://www.mac-dive.com/">MacDive</a> software, as you can see&nbsp;below.</p>
<div class="image caption center">
    <img src="/weblog/media/2012/12/macdive.png" alt="The Pretty-Pretty Output of MacDive">
    <p>The Pretty-Pretty Output of&nbsp;MacDive</p>
</div>

<h2>The&nbsp;Future</h2>
<p>I&#8217;ve thought about a couple of ways that I could make this a bit more streamlined. The current candidate is to get a <a href="http://www.raspberrypi.org/">Raspberry Pi</a> board and create a small dedicated device for downloading dive computer data. Basically you&#8217;d turn it on, put the dive computer within range, press a button on the case for the Raspberry Pi and your data would be automatically downloaded. You could feed it an <span class="caps">SD</span> card and it could either use the configuration file on the <span class="caps">SD</span> card to upload the data to a remote host or just store a copy on the <span class="caps">SD</span> card. However, given the long waits for Raspberry Pis at the current time and my busy schedule I&#8217;ll just have to wait on that&nbsp;idea.</p>
<p>I&#8217;ve also toyed with the idea of making a service that provides real analytics on dives. Right now there are a couple of different sites that allow you to share dive logs. <a href="http://www.diveboard.com/">DiveBoard</a> seems to be the most cross-platform of sites and they&#8217;ve even developed a browser plugin based on libdivecomputer to automatically upload your dives from your browser. Aside from their plugin they allow users to upload <span class="caps">UDCF</span>, <span class="caps">SDE</span>, and <span class="caps">ZXL</span> files. They&#8217;ve even gone so far as to extend <span class="caps">UDCF</span> to allow for pressure information &mdash; although this seems to be a clear sign to me that they should consider allowing <span class="caps">UDDF</span>&nbsp;uploads.</p>
<p>Another community is <a href="http://www.movescount.com/">Suunto Movescount</a>. This is the successor to Suunto&#8217;s Dive Explorer software and reflects the fact that they&#8217;ve moved beyond just diving metrics. The problem is that as near as I can see it&#8217;s a locked platform. There doesn&#8217;t appear to be any way to get your data out of it, or, for that matter, get data from non-Suunto devices into&nbsp;it.</p>
<p>Both of these sites are missing some of the potential for such sites, which is the ability to measure and track rather than just keeping a log. It&#8217;s something that sites like <a href="http://www.runkeeper.com/">RunKeeper</a> are just beginning to explore with efforts like their FitnessReports, but even those reports are rather cursory. There are a number of metrics that we can calculate, both for an individual and across a community, that would be highly beneficial to everyone involved - divers, dive shops, travel agents, tour operators, and gear manufacturers, to name just a few. However, the description of these analytics will have to wait for a future&nbsp;post.</p>
                        ]]></description>
    <pubDate>Tue, 04 Dec 2012 19:25:00 UTC</pubDate>
    <guid>http://patrick.wagstrom.net/weblog/2012/12/04/30-meters-underwater-with-a-dead-physical-layer-protocol/</guid>
</item>
                    <item>
    <title>On the Facebook Terms of Service</title>
    <link>http://patrick.wagstrom.net/weblog/2012/11/26/on-the-facebook-terms-of-service/</link>
    <description><![CDATA[
                            <p>Recently I&#8217;ve seen a number of friends and acquaintances post some variation of the following message to their Facebook&nbsp;walls:</p>
<blockquote>
<p>In response to the new Facebook guidelines I hereby declare that my copyright is attached to all of my personal details, illustrations, comics, paintings, professional photos and videos, etc. (as a result of the Berne Convention). For commercial use of the above my written consent is needed at all&nbsp;time.</p>
<p>By the present communiqué, I notify Facebook that it is strictly forbidden to disclose, copy, distribute, disseminate, or take any other action against me on the basis of this profile and/or its content. The aforementioned prohibited actions also apply to employees, students, agents and/or any staff under Facebook&#8217;s direction or&nbsp;control.</p>
<p>The content of this profile is private and confidential information. The violation of my privacy is punished by law (<span class="caps">UCC</span> 1 1-308-308 1-103 and the Rome&nbsp;Statute).</p>
<p>Facebook is now an open capital entity. All members are recommended to publish a notice like this, or if you prefer, you may copy and paste this version. If you do not publish a statement at least once, you will be tacitly allowing the use of elements such as your photos as well as the information contained in your profile status&nbsp;updates.</p>
</blockquote>
<p>The intent of these postings is to limit the way that Facebook is legally allowed to use or share your information. On the one hand this makes me happy because it seems as though some people are taking their privacy seriously; on the other hand, it&#8217;s very frustrating because of the ham-fisted way people are going about&nbsp;this.</p>
<p>The crux of the problem is that the <a href="https://www.facebook.com/legal/terms">Facebook Terms of Service</a> supersede any declaration or addendum you attempt to make toward Facebook. Specifically clause&nbsp;19.5:</p>
<blockquote>
<p>Any amendment to or waiver of this Statement must be made in writing and signed by&nbsp;us.</p>
</blockquote>
<p>However, you might think there is a loophole that will protect you somehow. Maybe something that Facebook forgot to expressly enumerate. Sorry, that&#8217;s covered in clause&nbsp;19.10:</p>
<blockquote>
<p>We reserve all rights not expressly granted to&nbsp;you.</p>
</blockquote>
<p>As an additional level of backup the posts typically attempt to cite various portions of the <a href="http://en.wikipedia.org/wiki/Uniform_Commercial_Code">Uniform Commercial Code</a>, most often <a href="http://www.law.cornell.edu/ucc/1/article1.htm">Article 1</a>. First, it&#8217;s important to understand what the <span class="caps">UCC</span> is. It is <span class="caps">NOT</span> some overarching set of Federal Laws. The <span class="caps">UCC</span> is an attempt to harmonize various state laws and make it easier to do business across state lines. In some ways you can think of the <span class="caps">UCC</span> a little like the Talmud: the text is important, but so are the comments that go along with it. Unfortunately, the text and comments are copyrighted, so these semi-binding documents are not accessible to the common man (that&#8217;s a whole different problem, one which <a href="http://public.resource.org/">Carl Malamud and Public.Resource.org</a> are attempting to&nbsp;remedy).</p>
<p>Anyway, we&#8217;ll ignore for a moment that the entirety of Article 1 of the <span class="caps">UCC</span> deals with definitions and ways to interpret further rules, and therefore probably isn&#8217;t the thing you&#8217;re looking for. The first reference, <span class="caps">UCC</span> 1-308 (which is often mistyped 1-308-308, which renders it null in the eyes of the law)&nbsp;reads:</p>
<blockquote>
<p><strong>§ 1-308. Performance or Acceptance Under Reservation of&nbsp;Rights.</strong></p>
<p>(a) A party that with explicit reservation of rights performs or promises performance or assents to performance in a manner demanded or offered by the other party does not thereby prejudice the rights reserved. Such words as &#8220;without prejudice,&#8221; &#8220;under protest,&#8221; or the like are&nbsp;sufficient.</p>
<p>(b) Subsection (a) does not apply to an accord and&nbsp;satisfaction.</p>
</blockquote>
<p>However, the issue with 1-308 is that your Facebook content, while being a creative work, isn&#8217;t a performance in most cases. There isn&#8217;t a transaction from Facebook unto you for performing such an action, therefore this most likely doesn&#8217;t&nbsp;apply.</p>
<p>Second is <span class="caps">UCC</span> 1-103. I have no idea how this got mixed up in&nbsp;here:</p>
<blockquote>
<p><strong>§ 1-103. Construction of [Uniform Commercial Code] to Promote its Purposes and Policies: Applicability of Supplemental Principles of&nbsp;Law.</strong></p>
<p>(a) [The Uniform Commercial Code] must be liberally construed and applied to promote its underlying purposes and policies, which are: (1) to simplify, clarify, and modernize the law governing commercial transactions; (2) to permit the continued expansion of commercial practices through custom, usage, and agreement of the parties; and (3) to make uniform the law among the various&nbsp;jurisdictions.</p>
<p>(b) Unless displaced by the particular provisions of [the Uniform Commercial Code], the principles of law and equity, including the law merchant and the law relative to capacity to contract, principal and agent, estoppel, fraud, misrepresentation, duress, coercion, mistake, bankruptcy, and other validating or invalidating cause supplement its&nbsp;provisions.</p>
</blockquote>
<p>Reading through this I can&#8217;t understand why 1-103 was even brought into this. It simply describes the purpose of the <span class="caps">UCC</span> and notes that, unless the <span class="caps">UCC</span> explicitly displaces them, the laws covering things like fraud, duress, and bankruptcy stay in&nbsp;effect.</p>
<p>Finally, let&#8217;s look at the appeal to the Rome Statute. I&#8217;m going to go out on a limb here and say this was added by someone in Europe, as the original postings I saw by Americans didn&#8217;t include this caveat. I&#8217;m assuming that the Rome Statute refers to the <a href="http://en.wikipedia.org/wiki/Rome_Statute_of_the_International_Criminal_Court">Rome Statute of the International Criminal Court</a>. This international agreement established the International Criminal Court (<span class="caps">ICC</span>) and gave it authority to investigate crimes when the host nations have chosen not to investigate. For example, the <span class="caps">ICC</span> often comes into play with state sponsored&nbsp;genocide.</p>
<p>One could easily argue that the United States has initiated investigations into privacy and Facebook (see the <a href="http://www.judiciary.senate.gov/hearings/hearing.cfm?id=daba530c0e84f5186d785e4894e78220">Senate Judiciary Committee Subcommittee on Privacy, Technology and the Law meeting on July 18, 2012</a> when Franken tore into Facebook&#8217;s manager of Privacy and Public Policy). The fact that the <span class="caps">US</span> is conducting investigations would seem to deny the <span class="caps">ICC</span> any sort of jurisdiction and put such an investigation outside the bounds of the International Criminal Court, which really has non-first-world-problems to deal with, like&nbsp;genocide.</p>
<p>In short, if you&#8217;re really concerned about your privacy, posting such a message on Facebook doesn&#8217;t do anything other than annoy your friends. If you&#8217;re really concerned about your privacy on Facebook you need to stop using it&nbsp;altogether.</p>
<p><strong><em>Important Disclaimer:</em></strong> I am not a lawyer. I&#8217;m merely someone who took the time to read the Facebook Terms of Service and look up the relevant portions of the law that people are attempting to quote. None of this should be regarded as real legal&nbsp;advice.</p>
                        ]]></description>
    <pubDate>Mon, 26 Nov 2012 13:05:00 UTC</pubDate>
    <guid>http://patrick.wagstrom.net/weblog/2012/11/26/on-the-facebook-terms-of-service/</guid>
</item>
                    <item>
    <title>Mining GitHub - Followers in Tinkerpop</title>
    <link>http://patrick.wagstrom.net/weblog/2012/05/13/mining-github-followers-in-tinkerpop/</link>
    <description><![CDATA[
                            <p>Development of any moderately complex software package is a social
process. Even if a project is developed entirely by a single person,
there is still a social component that consists of all of the people
who use the software, file bugs, and provide recommendations for
enhancements. This social aspect is one of the driving forces behind
the proliferation of social software development sites such as
<a href="http://www.github.com/">GitHub</a>, <a href="http://www.sourceforge.net/">SourceForge</a>, <a href="http://code.google.com/">Google Code</a>, and <a href="http://www.bitbucket.org/">BitBucket</a>.</p>
<p>These sites combine a variety of tools that are common for
software development such as version control, bug trackers, mailing lists,
release management, project planning, and wikis. In addition, some of
these have more social aspects that allow you to find and follow
individual developers or watch particular projects. In this post I&#8217;m
going to show you how we can use some of this information to gain insight
into a software development community, specifically the community
around the <a href="http://www.tinkerpop.com/">Tinkerpop</a> stack of tools for graph&nbsp;databases.</p>
<h1>Graph&nbsp;Databases</h1>
<p>Graph Databases are in the broad family of NoSQL databases. For about
30 years the dominant form of data storage and access has been through
relational databases (e.g. Oracle, MySQL, PostgreSQL, <span class="caps">DB2</span>, etc). These
present your data as a table with various rows. These tables can have
constraints and pointers that map a column in one table to a column in
another table through a process called a join. In this way it&#8217;s
possible to create relations between records and build rich
collections of&nbsp;data.</p>
<p>Relational databases are very nice and can scale fairly well, but they&#8217;re
not suitable for all problems. In particular, there may be cases where
atomicity can be sacrificed in exchange for higher performance or
where the schema of the data may frequently change resulting in severe
problems mapping the data to a traditional&nbsp;database.</p>
<p>This has led to a multitude of different solutions for data storage
and access. Some of the more popular solutions are Google&#8217;s <a href="http://research.google.com/archive/bigtable.html">BigTable</a>
for distributed data storage, <a href="http://www.mongodb.org/">MongoDB</a> for a schemaless document
database, and <a href="http://memcached.org/">Memcached</a> for distributed object storage and
caching. These alternative styles of database are generally lumped
into a category of NoSQL, which means either &#8220;Not <span class="caps">SQL</span>&#8221; or &#8220;Not Only
<span class="caps">SQL</span>&#8221; or perhaps something else depending on who you speak&nbsp;to.</p>
<p>A specific subclass of NoSQL databases is graph databases. A graph
database represents your data as a network of vertices and the edges that
connect them. Vertices and edges can have various properties that
define the object. As opposed to traditional databases where a query
crawls over the entire table to find the appropriate elements, queries
within a graph database are often done via traversals that walk the
graph from one node to another. Examples of graph databases are
<a href="http://www.neo4j.org/">Neo4j</a>, <a href="http://www.orientechnologies.com/orient-db.htm">OrientDB</a>, <a href="http://research.microsoft.com/en-us/projects/trinity/">Trinity</a>,
<a href="http://www.infinitegraph.com/">InfiniteGraph</a>, and <a href="http://www.sparsity-technologies.com/dex">Dex</a>. A complete description
of these databases is beyond the scope of the simple explanation here, but
Wikipedia has a decent <a href="http://en.wikipedia.org/wiki/Graph_database">primer on graph databases</a>.</p>
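<p>To make the idea of a traversal a bit more concrete, here is a minimal sketch in plain Python rather than any real graph database <span class="caps">API</span>. The vertex ids, the logins, the <code>follows</code> edge label, and the <code>out</code> helper are all made up for illustration. The only point is that the query starts at one vertex and walks its edges instead of scanning a whole table.</p>

```python
from collections import defaultdict

# Hypothetical toy property graph: vertices and edges both carry properties.
vertices = {
    1: {"type": "USER", "login": "alice"},
    2: {"type": "USER", "login": "bob"},
    3: {"type": "USER", "login": "carol"},
}

# Adjacency list keyed by source vertex id: (label, target id, edge properties).
edges = defaultdict(list)


def add_edge(source, label, target, **props):
    """Store a directed, labeled edge along with its properties."""
    edges[source].append((label, target, props))


add_edge(1, "follows", 2, since=2011)
add_edge(2, "follows", 3, since=2012)
add_edge(1, "follows", 3, since=2012)


def out(vertex_id, label):
    """Return ids of vertices reached by outgoing edges with the given label."""
    return [target for (edge_label, target, _) in edges[vertex_id] if edge_label == label]


def followed_by_followed(vertex_id):
    """Two-step traversal: who do the people this vertex follows follow?"""
    found = set()
    for friend in out(vertex_id, "follows"):
        found.update(out(friend, "follows"))
    return sorted(vertices[v]["login"] for v in found)


print(followed_by_followed(1))
```

<p>A real graph database adds indexes, persistence, and a query language on top of this, but the walk from vertex to vertex is the same basic&nbsp;shape.</p>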
<h1>Tinkerpop&nbsp;Background</h1>
<p>Tinkerpop is a loosely coupled virtual organization centered around
Marko Rodriguez that develops infrastructure libraries and interfaces
for graph&nbsp;databases.</p>
<div class="image caption center">
    <img src="/weblog/media/2012/05/tinkerpop.png">
    <p></p>
</div>

<p>Tinkerpop has six major projects that are hosted on&nbsp;Github:</p>
<ul>
<li><a href="https://github.com/tinkerpop/pipes">Pipes</a>: A general data flow and processing&nbsp;framework</li>
<li><a href="https://github.com/tinkerpop/blueprints">Blueprints</a>: A library to abstract graph database&nbsp;interfaces</li>
<li><a href="https://github.com/tinkerpop/gremlin">Gremlin</a>: A domain specific language for traversing&nbsp;graphs</li>
<li><a href="https://github.com/tinkerpop/frames">Frames</a>: An object mapper for graph&nbsp;databases</li>
<li><a href="https://github.com/tinkerpop/rexster">Rexster</a>: A general web interface for Blueprints supported&nbsp;databases</li>
<li><a href="https://github.com/tinkerpop/furnace">Furnace</a>: A library of algorithms for traversing&nbsp;graphs</li>
</ul>
<h1>The Tinkerpop&nbsp;Network</h1>
<p>As part of an ongoing research effort between <span class="caps">IBM</span> and the University
of Nebraska, Lincoln, I&#8217;ve written a tool called <a href="https://github.com/pridkett/gitminer">GitMiner</a>
that can connect to GitHub and pull down information on a set of
projects. In celebration of Gremlin hitting 600 watchers on GitHub, I
pulled the complete network for all of the Tinkerpop projects from
GitHub between May 1 and May 3, 2012. This network contains the following pieces
of&nbsp;information:</p>
<p>In a future post I&#8217;ll provide more details of how you can use GitMiner
to access data on your own projects. I&#8217;ll also provide some pointers
to other data sets people may wish to&nbsp;analyze.</p>
<h1>Getting Started with&nbsp;Analysis</h1>
<p>For this analysis we&#8217;re going to use a couple of different software
packages. First, we&#8217;ll be using Gremlin to do some queries of the
database and to create exportable networks for further
analysis. Additional analysis will be conducted using <a href="http://www.r-project.org/">R</a>. These
instructions are written for people running a Mac, Linux, or other
operating system with a <span class="caps">POSIX</span>-like command line interface. If you&#8217;re
on Windows you should be able to follow along but you&#8217;ll need to
modify the shell commands. All the tools used in this analysis are
cross-platform, open source, and freely&nbsp;available.</p>
<h2>Installing&nbsp;Gremlin</h2>
<p>I&#8217;m not going to repeat everything in the <a href="https://github.com/tinkerpop/gremlin/wiki">Gremlin docs</a>
here, but here&#8217;s a brief overview of what you&#8217;ll need to do to get
going on a Mac or&nbsp;Linux:</p>
<div class="codehilite"><pre><span class="n">cd</span> <span class="o">~</span>
<span class="n">git</span> <span class="n">clone</span> <span class="n">git</span><span class="p">:</span><span class="o">//</span><span class="n">github</span><span class="p">.</span><span class="n">com</span><span class="o">/</span><span class="n">tinkerpop</span><span class="o">/</span><span class="n">gremlin</span><span class="p">.</span><span class="n">git</span>
<span class="n">cd</span> <span class="n">gremlin</span>
<span class="n">mvn</span> <span class="n">clean</span> <span class="n">compile</span> <span class="n">package</span>
</pre></div>


<p>This assumes that you&#8217;ve already got a nice Java development
environment set up and that you have <a href="http://maven.apache.org/">Maven</a> installed. If this
is your first time using Maven to build any Java packages, this can
take a long time as it will automatically download all of the
dependencies needed to compile and run&nbsp;Gremlin.</p>
<h2>Installing&nbsp;R</h2>
<p>R is a language for statistical computing. It&#8217;s slow, uses strange
syntax, and is a memory hog. In short, it&#8217;s quite possibly one of the
worst ways to do this analysis. However, it also is the
dominant language in the field and provides a huge number of libraries
and tutorials that we&#8217;ll use for our&nbsp;analysis.</p>
<p>There are a variety of different ways to interact with R. If you&#8217;re on
Windows or a Mac the standard downloads of R have a decent graphical
interface for editing scripts and running commands. If you&#8217;re an Emacs
hacker, <a href="http://ess.r-project.org/"><span class="caps">ESS</span></a> is a great library that interfaces nicely with R. If
working inside of Eclipse is your thing, then use <a href="http://www.walware.de/goto/statet">StatEt</a>. Personally,
I use <a href="http://rstudio.org/">R-Studio</a> for most of my work. Further screenshots will be based
on R-Studio, but you should be able to follow along with other&nbsp;interfaces.</p>
<p>Installing R-Studio is straightforward. Visit the
<a href="http://rstudio.org/download/desktop">R-Studio Desktop download page</a> and download and
install the version for your&nbsp;platform.</p>
<h2>Downloading the&nbsp;Data</h2>
<p>I&#8217;ve posted the <a href="https://docs.google.com/open?id=0B43zbOfTSOoUZXQydlhGOW1kUHc">Tinkerpop Social Graph</a> as a Neo4j
database. Visit that link and download
<a href="https://docs.google.com/folder/d/0B43zbOfTSOoUZXQydlhGOW1kUHc/edit?docId=0B43zbOfTSOoUOWF6NjZNVnUtN2s">TinkerpopSocialGraph.20120501.db.tar.gz</a>. After
downloading it you should go into the directory where you downloaded
and compiled Gremlin and extract it. If you&#8217;re on a Mac or Linux, the commands
will generally be something like&nbsp;this:</p>
<div class="codehilite"><pre><span class="n">cd</span> <span class="o">~/</span><span class="n">gremlin</span>
<span class="n">tar</span> <span class="o">-</span><span class="n">zxvf</span> <span class="o">~/</span><span class="n">Downloads</span><span class="o">/</span><span class="n">TinkerpopSocialGraph</span><span class="p">.</span>20120501<span class="p">.</span><span class="n">db</span><span class="p">.</span><span class="n">tar</span><span class="p">.</span><span class="n">gz</span>
</pre></div>


<p>The dataset is fairly large, about <span class="caps">148MB</span> compressed. It&#8217;s quite a bit
of data and if you&#8217;re a lazy student taking your first <span class="caps">SNA</span> class it
should have enough data to do a really kick-ass class project. If
you&#8217;re a grad student and interested in writing a paper on this sort
of data <a href="mailto:patrick@wagstrom.net">email me</a> and we can probably&nbsp;collaborate.</p>
<h1>Exploring the&nbsp;Graph</h1>
<p>Gremlin provides an interactive interpreter that we can use to explore
the graph. You can start it up by running <code>./gremlin.sh</code>. Then run the
following commands. Lines that begin with <code>gremlin&gt;</code> are the lines you
should type into the&nbsp;interpreter.</p>
<p>To begin with, we&#8217;ll connect to the graph and get a specific node from
the database. In this case, we&#8217;ll pull up the node that represents
Marko Rodriguez, the main developer of tools from&nbsp;Tinkerpop.</p>
<div class="codehilite"><pre>         <span class="err">\</span><span class="o">,,,</span><span class="s">/</span>
<span class="s">         (o o)</span>
<span class="s">-----oOOo-(_)-oOOo-----</span>
<span class="s">gremlin&gt; g = new Neo4jGraph(&quot;tinkerpop/</span><span class="n">tinkerpop</span><span class="o">.</span><span class="na">db</span><span class="s2">&quot;)</span>
<span class="s2">==&gt;neo4jgraph[EmbeddedGraphDatabase [/Users/pwagstro/gremlin/tinkerpop/tinkerpop.db]]</span>
<span class="s2">gremlin&gt; marko = g.idx(&quot;</span><span class="n">user</span><span class="o">-</span><span class="n">idx</span><span class="s2">&quot;).get(&quot;</span><span class="n">login</span><span class="s2">&quot;,&quot;</span><span class="n">okram</span><span class="err">&quot;</span><span class="o">).</span><span class="na">next</span><span class="o">()</span>
<span class="o">==&gt;</span><span class="n">v</span><span class="o">[</span><span class="mi">8</span><span class="o">]</span>
<span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">marko</span><span class="o">.</span><span class="na">map</span><span class="o">()</span>
<span class="o">==&gt;</span><span class="n">location</span><span class="o">=</span><span class="n">Santa</span> <span class="n">Fe</span><span class="o">,</span> <span class="n">New</span> <span class="n">Mexico</span>
<span class="o">==&gt;</span><span class="n">sys_last_updated</span><span class="o">=</span><span class="mi">1335930109</span>
<span class="o">==&gt;</span><span class="n">blog</span><span class="o">=</span><span class="nl">http:</span><span class="c1">//markorodriguez.com</span>
<span class="o">==&gt;</span><span class="n">type</span><span class="o">=</span><span class="n"><span class="caps">USER</span></span>
<span class="o">==&gt;</span><span class="n">gravatarId</span><span class="o">=</span><span class="nl">https:</span><span class="c1">//secure.gravatar.com/avatar/fb12ea6a621399613aae4d692533e067?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-140.png</span>
<span class="o">==&gt;</span><span class="n">followers</span><span class="o">=</span><span class="mi">57</span>
<span class="o">==&gt;</span><span class="n">following</span><span class="o">=</span><span class="mi">12</span>
<span class="o">==&gt;</span><span class="n">createdAt</span><span class="o">=</span><span class="mi">1257359950</span>
<span class="o">==&gt;</span><span class="n">name</span><span class="o">=</span><span class="n">Marko</span> <span class="n">A</span><span class="o">.</span> <span class="n">Rodriguez</span>
<span class="o">==&gt;</span><span class="n">login</span><span class="o">=</span><span class="n">okram</span>
<span class="o">==&gt;</span><span class="n">fullname</span><span class="o">=</span><span class="n">Marko</span> <span class="n">A</span><span class="o">.</span> <span class="n">Rodriguez</span>
<span class="o">==&gt;</span><span class="n">gitHubId</span><span class="o">=</span><span class="mi">148925</span>
<span class="o">==&gt;</span><span class="n">sys_events_added</span><span class="o">=</span><span class="mi">1335918859</span>
<span class="o">==&gt;</span><span class="n">user_type</span><span class="o">=</span><span class="n">User</span>
<span class="o">==&gt;</span><span class="n">totalPrivateRepoCount</span><span class="o">=</span><span class="mi">0</span>
<span class="o">==&gt;</span><span class="n">private_gist_count</span><span class="o">=</span><span class="mi">0</span>
<span class="o">==&gt;</span><span class="n">sys_last_full_update</span><span class="o">=</span><span class="mi">1335918850</span>
<span class="o">==&gt;</span><span class="n">biography</span><span class="o">=</span><span class="n">graph</span> <span class="n">algebra</span><span class="o">,</span> <span class="n">digital</span> <span class="n">librarianship</span><span class="o">,</span> <span class="n">computational</span> <span class="n">eudaemonics</span><span class="o">,</span> <span class="n">graph</span> <span class="n">theory</span><span class="o">,</span> <span class="n">network</span> <span class="n">science</span><span class="o">,</span> <span class="n">government</span> <span class="n">architecture</span><span class="o">,</span> <span class="n">network</span> <span class="n">metrics</span><span class="o">,</span> <span class="n">decision</span> <span class="n">support</span> <span class="n">systems</span><span class="o">,</span> <span class="n">computational</span> <span class="n">social</span> <span class="n">choice</span> <span class="n">theory</span><span class="o">,</span> <span class="n">social</span> <span class="n">networks</span><span class="o">,</span> <span class="n">scientometrics</span><span class="o">,</span> <span class="n">collective</span> <span class="n">intelligence</span><span class="o">,</span> <span class="n">semantic</span> <span class="n">networks</span><span class="o">,</span> <span class="n">ontologies</span><span class="o">,</span> <span class="n">bibliometrics</span><span class="o">,</span> <span class="n">information</span> <span class="n">science</span><span class="o">,</span> <span class="n">swarm</span> <span class="n">intelligence</span><span class="o">,</span> <span class="n">information</span> <span class="n">markets</span><span class="o">,</span> <span class="n">peer</span><span class="o">-</span><span class="n">review</span> <span class="n">process</span><span class="o">,</span> <span class="n">computational</span> <span class="n">sociology</span><span class="o">,</span> <span class="n">knowledge</span> <span class="n">engineering</span><span 
class="o">,</span> <span class="n">computer</span> <span class="n">architecture</span><span class="o">,</span> <span class="n">programming</span> <span class="n">languages</span><span class="o">,</span> <span class="n">theoretical</span> <span class="n">computing</span><span class="o">,</span> <span class="n">psychometrics</span><span class="o">,</span> <span class="n">multi</span><span class="o">-</span><span class="n">relational</span> <span class="n">graphs</span><span class="o">,</span> <span class="n">knowledge</span> <span class="n">representation</span><span class="o">,</span> <span class="n">reasoning</span><span class="o">,</span> <span class="n">neural</span> <span class="n">networks</span><span class="o">,</span> <span class="n">multi</span><span class="o">-</span><span class="n">valued</span> <span class="n">logic</span><span class="o">,</span> <span class="n">neural</span> <span class="n">growth</span> <span class="n">algorithms</span><span class="o">,</span> <span class="n">recommendation</span> <span class="n">algorithms</span><span class="o">,</span> <span class="n">distributed</span> <span class="n">computing</span><span class="o">,</span> <span class="n">ethics</span><span class="o">.</span>
<span class="o">==&gt;</span><span class="n">diskUsage</span><span class="o">=</span><span class="mi">0</span>
<span class="o">==&gt;</span><span class="n">url</span><span class="o">=</span><span class="nl">https:</span><span class="c1">//api.github.com/users/okram</span>
<span class="o">==&gt;</span><span class="n">public_gist_count</span><span class="o">=</span><span class="mi">14</span>
<span class="o">==&gt;</span><span class="n">collaborators</span><span class="o">=</span><span class="mi">0</span>
<span class="o">==&gt;</span><span class="n">email</span><span class="o">=</span><span class="n">marko</span><span class="nd">@markorodriguez.com</span>
<span class="o">==&gt;</span><span class="n">sys_created_at</span><span class="o">=</span><span class="mi">1335918699</span>
<span class="o">==&gt;</span><span class="n">ownedPrivateRepoCount</span><span class="o">=</span><span class="mi">0</span>
<span class="o">==&gt;</span><span class="n">public_repo_count</span><span class="o">=</span><span class="mi">0</span>
</pre></div>


<p>The values output by <code>marko.map()</code> are the properties of the vertex
that represents Marko in the database. With the exception of the
properties that begin with <code>sys_</code>, which were added by
<a href="https://github.com/pridkett/gitminer">GitMiner</a> when the data were imported, all of the other
properties are obtained directly from the <a href="http://api.github.com/">GitHub <span class="caps">API</span></a>.</p>
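<p>Several of these properties, such as <code>createdAt</code> and the <code>sys_</code> values, look like Unix timestamps. Assuming they are seconds since the epoch in <span class="caps">UTC</span> (an assumption based on their magnitude, not something documented here), a short Python sketch can turn them into readable dates. The <code>to_utc</code> helper is just a name chosen for this&nbsp;example.</p>

```python
from datetime import datetime, timezone


def to_utc(epoch_seconds):
    """Interpret an integer as seconds since the Unix epoch, in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Values copied from the marko.map() output above.
created_at = to_utc(1257359950)
sys_created_at = to_utc(1335918699)

print(created_at.isoformat())
print(sys_created_at.date())
```

<p>If the assumption holds, the <code>sys_</code> dates should fall inside the window in which the data were&nbsp;collected.</p>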
<p>In a similar vein we can get the vertex that represents Gremlin using
the following&nbsp;commands:</p>
<div class="codehilite"><pre><span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">gremlin</span> <span class="o">=</span> <span class="n">g</span><span class="o">.</span><span class="na">idx</span><span class="o">(</span><span class="s2">&quot;repo-idx&quot;</span><span class="o">).</span><span class="na">get</span><span class="o">(</span><span class="s2">&quot;reponame&quot;</span><span class="o">,</span> <span class="s2">&quot;tinkerpop/gremlin&quot;</span><span class="o">).</span><span class="na">next</span><span class="o">()</span>
<span class="o">==&gt;</span><span class="n">v</span><span class="o">[</span><span class="mi">673</span><span class="o">]</span>
<span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">gremlin</span><span class="o">.</span><span class="na">map</span><span class="o">()</span>
<span class="o">==&gt;</span><span class="n">openIssues</span><span class="o">=</span><span class="mi">17</span>
<span class="o">==&gt;</span><span class="n">isFork</span><span class="o">=</span><span class="kc">false</span>
<span class="o">==&gt;</span><span class="n">sshUrl</span><span class="o">=</span><span class="n">git</span><span class="nd">@github.com</span><span class="o">:</span><span class="n">tinkerpop</span><span class="s">/gremlin.git</span>
<span class="s">==&gt;pushedAt=1335827022</span>
<span class="s">==&gt;sys_last_updated=1335929775</span>
<span class="s">==&gt;type=<span class="caps">REPOSITORY</span></span>
<span class="s">==&gt;masterBranch=master</span>
<span class="s">==&gt;htmlUrl=https://github.com/</span><span class="n">tinkerpop</span><span class="s">/gremlin</span>
<span class="s">==&gt;hasIssues=true</span>
<span class="s">==&gt;isPrivate=false</span>
<span class="s">==&gt;createdAt=1258695334</span>
<span class="s">==&gt;description=A Graph Traversal Language</span>
<span class="s">==&gt;name=gremlin</span>
<span class="s">==&gt;cloneUrl=https://github.com/</span><span class="n">tinkerpop</span><span class="s">/gremlin.git</span>
<span class="s">==&gt;gitUrl=git://github.com/</span><span class="n">tinkerpop</span><span class="s">/gremlin.git</span>
<span class="s">==&gt;fullname=tinkerpop/</span><span class="n">gremlin</span>
<span class="o">==&gt;</span><span class="n">watchers</span><span class="o">=</span><span class="mi">600</span>
<span class="o">==&gt;</span><span class="n">gitHubId</span><span class="o">=</span><span class="mi">379199</span>
<span class="o">==&gt;</span><span class="n">svnUrl</span><span class="o">=</span><span class="nl">https:</span><span class="c1">//github.com/tinkerpop/gremlin</span>
<span class="o">==&gt;</span><span class="n">homepage</span><span class="o">=</span><span class="nl">http:</span><span class="c1">//gremlin.tinkerpop.com</span>
<span class="o">==&gt;</span><span class="n">url</span><span class="o">=</span><span class="nl">https:</span><span class="c1">//api.github.com/repos/tinkerpop/gremlin</span>
<span class="o">==&gt;</span><span class="n">size</span><span class="o">=</span><span class="mi">341021</span>
<span class="o">==&gt;</span><span class="n">updatedAt</span><span class="o">=</span><span class="mi">1335827026</span>
<span class="o">==&gt;</span><span class="n">forks</span><span class="o">=</span><span class="mi">30</span>
<span class="o">==&gt;</span><span class="n">sys_created_at</span><span class="o">=</span><span class="mi">1335918734</span>
<span class="o">==&gt;</span><span class="n">hasDownloads</span><span class="o">=</span><span class="kc">true</span>
<span class="o">==&gt;</span><span class="n">language</span><span class="o">=</span><span class="n">Java</span>
<span class="o">==&gt;</span><span class="n">reponame</span><span class="o">=</span><span class="n">tinkerpop</span><span class="o">/</span><span class="n">gremlin</span>
<span class="o">==&gt;</span><span class="n">hasWiki</span><span class="o">=</span><span class="kc">true</span>
</pre></div>


<p>While this provides a lot of information about individual vertices in
the database, it doesn&#8217;t provide information about how projects or
people are related. We get at this information by looking at the edges
connected to a vertex. Within databases such as Neo4j and OrientDB,
edges are directed and always go from a single source node to a
single target node. This query will iterate over all of the outgoing
edges from Marko and count up their&nbsp;types.</p>
<div class="codehilite"><pre><span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">m</span> <span class="o">=</span> <span class="o">[:]</span>
<span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">marko</span><span class="o">.</span><span class="na">outE</span><span class="o">.</span><span class="na">label</span><span class="o">.</span><span class="na">groupCount</span><span class="o">(</span><span class="n">m</span><span class="o">).</span><span class="na">iterate</span><span class="o">();</span> <span class="kc">null</span>
<span class="o">==&gt;</span><span class="kc">null</span>
<span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">m</span><span class="o">.</span><span class="na">sort</span><span class="o">{</span><span class="n">a</span><span class="o">,</span><span class="n">b</span> <span class="o">-&gt;</span> <span class="n">a</span><span class="o">.</span><span class="na">value</span> <span class="o">&lt;=&gt;</span> <span class="n">b</span><span class="o">.</span><span class="na">value</span><span class="o">}</span>
<span class="o">==&gt;</span><span class="n"><span class="caps">EMAIL</span></span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">ORGANIZATION_MEMBER</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n"><span class="caps">GRAVATAR</span></span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">ISSUE_ASSIGNEE</span><span class="o">=</span><span class="mi">9</span>
<span class="o">==&gt;</span><span class="n"><span class="caps">FOLLOWING</span></span><span class="o">=</span><span class="mi">12</span>
<span class="o">==&gt;</span><span class="n">REPO_WATCHED</span><span class="o">=</span><span class="mi">34</span>
<span class="o">==&gt;</span><span class="n">PULLREQUEST_COMMENT_OWNER</span><span class="o">=</span><span class="mi">37</span>
<span class="o">==&gt;</span><span class="n"><span class="caps">FOLLOWER</span></span><span class="o">=</span><span class="mi">57</span>
<span class="o">==&gt;</span><span class="n">USER_EVENT</span><span class="o">=</span><span class="mi">300</span>
<span class="o">==&gt;</span><span class="n">ISSUE_OWNER</span><span class="o">=</span><span class="mi">404</span>
<span class="o">==&gt;</span><span class="n">ISSUE_COMMENT_OWNER</span><span class="o">=</span><span class="mi">639</span>
<span class="o">==&gt;</span><span class="n">ISSUE_EVENT_ACTOR</span><span class="o">=</span><span class="mi">700</span>
</pre></div>
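<p>If Groovy isn&#8217;t your daily language, the count-then-sort idiom used here may be easier to see in plain Python. What follows is a minimal sketch on made-up edges, not GitMiner code: <code>out_edges</code> and its contents are hypothetical, and <code>Counter</code> stands in for <code>groupCount</code>.</p>

```python
from collections import Counter

# Hypothetical outgoing edges for one vertex, written as (label, target) pairs.
out_edges = [
    ("FOLLOWING", "user-a"),
    ("FOLLOWING", "user-b"),
    ("REPO_WATCHED", "repo-x"),
    ("FOLLOWING", "user-c"),
    ("EMAIL", "email-1"),
    ("REPO_WATCHED", "repo-y"),
]

# Rough equivalent of outE.label.groupCount(m): tally how often each label occurs.
label_counts = Counter(label for label, _target in out_edges)

# Rough equivalent of the m.sort{...} call: order the tallies ascending by count.
sorted_counts = sorted(label_counts.items(), key=lambda item: item[1])

for label, count in sorted_counts:
    print(f"{label}={count}")
```

<p>Sorting ascending puts the most common edge types at the bottom of the console output, which is where your eye ends up after a long listing.</p>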


<p>There are a lot of types of edges in the database (see <code>EdgeType.java</code>
in the project source for the complete list). In this case
we&#8217;ll focus on the project social network, which is shown through the
<code>FOLLOWING</code> and <code>FOLLOWER</code> relationships. At the time of the data pull
Marko was following 12 people and had 57&nbsp;followers.</p>
<p>Likewise, we can do a similar query for incoming&nbsp;edges:</p>
<div class="codehilite"><pre><span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">m</span> <span class="o">=</span> <span class="o">[:]</span>
<span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">marko</span><span class="o">.</span><span class="na">inE</span><span class="o">.</span><span class="na">label</span><span class="o">.</span><span class="na">groupCount</span><span class="o">(</span><span class="n">m</span><span class="o">).</span><span class="na">iterate</span><span class="o">();</span> <span class="kc">null</span> 
<span class="o">==&gt;</span><span class="kc">null</span>
<span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">m</span><span class="o">.</span><span class="na">sort</span><span class="o">{</span><span class="n">a</span><span class="o">,</span><span class="n">b</span> <span class="o">-&gt;</span> <span class="n">a</span><span class="o">.</span><span class="na">value</span> <span class="o">&lt;=&gt;</span> <span class="n">b</span><span class="o">.</span><span class="na">value</span><span class="o">}</span>           
<span class="o">==&gt;</span><span class="n">REPO_CONTRIBUTOR</span><span class="o">=</span><span class="mi">6</span>
<span class="o">==&gt;</span><span class="n"><span class="caps">FOLLOWER</span></span><span class="o">=</span><span class="mi">9</span>
<span class="o">==&gt;</span><span class="n">EVENT_FOLLOW_USER</span><span class="o">=</span><span class="mi">14</span>
<span class="o">==&gt;</span><span class="n">PULLREQUEST_MERGED_BY</span><span class="o">=</span><span class="mi">27</span>
<span class="o">==&gt;</span><span class="n"><span class="caps">FOLLOWING</span></span><span class="o">=</span><span class="mi">41</span>
</pre></div>


<p>When we reverse the direction and look at incoming edges these numbers
differ, and it shows that there are only nine people that Marko is a
follower of and 41 people that are following Marko. The difference in
these values arises because the data contain only the sample of people
around the Tinkerpop projects. Thus, we can see that there are
57-41=16 people that are following Marko that don&#8217;t show up in the
data. This is because they don&#8217;t have activity, such as creating
issues, commenting on issues, or watching a repository, that would
pick them up in our sample. We know they exist, but we don&#8217;t have much
information about&nbsp;them.</p>
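<p>The arithmetic above can be sketched on a toy graph. This is only an illustration under an assumption that the counts suggest but that I haven&#8217;t restated from the importer: edges are created outward from users that were fully crawled, so a follower who was never crawled contributes a <code>FOLLOWER</code> edge from Marko but no <code>FOLLOWING</code> edge back. The edge list and the helper <code>count_edges</code> are hypothetical.</p>

```python
# Hypothetical directed edges as (source, label, target) triples.
edges = [
    ("alice", "FOLLOWING", "marko"),
    ("bob", "FOLLOWING", "marko"),
    ("marko", "FOLLOWING", "alice"),
    ("marko", "FOLLOWER", "alice"),
    ("marko", "FOLLOWER", "bob"),
    ("marko", "FOLLOWER", "carol"),  # carol was never crawled, so she has no outgoing edges
]


def count_edges(edges, vertex, label, direction):
    """Count edges with the given label leaving ("out") or entering ("in") a vertex."""
    position = 0 if direction == "out" else 2
    return sum(1 for edge in edges if edge[1] == label and edge[position] == vertex)


followers_reported = count_edges(edges, "marko", "FOLLOWER", "out")
followers_in_sample = count_edges(edges, "marko", "FOLLOWING", "in")
missing = followers_reported - followers_in_sample

print(followers_reported, followers_in_sample, missing)
```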
<h1>Your First Graph&nbsp;Traversal</h1>
<p>Now that you&#8217;ve gotten a feel for getting information about a single
vertex in the graph, it&#8217;s time to do a simple traversal. To start with,
let&#8217;s get the names of all of the contributors to&nbsp;Gremlin.</p>
<div class="codehilite"><pre><span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">gremlin</span><span class="o">.</span><span class="na">out</span><span class="o">(</span><span class="s1">&#39;REPO_CONTRIBUTOR&#39;</span><span class="o">).</span><span class="na">login</span>
<span class="o">==&gt;</span><span class="n">pauljackson</span>
<span class="o">==&gt;</span><span class="n">espeed</span>
<span class="o">==&gt;</span><span class="n">spmallette</span>
<span class="o">==&gt;</span><span class="n">invalid</span><span class="o">-</span><span class="n">email</span><span class="o">-</span><span class="n">address</span>
<span class="o">==&gt;</span><span class="n">joshsh</span>
<span class="o">==&gt;</span><span class="n">jramsdale</span>
<span class="o">==&gt;</span><span class="n">NQuinn</span>
<span class="o">==&gt;</span><span class="n">peterneubauer</span>
<span class="o">==&gt;</span><span class="n">tinkerpop</span>
<span class="o">==&gt;</span><span class="n">zcox</span>
<span class="o">==&gt;</span><span class="n">xedin</span>
<span class="o">==&gt;</span><span class="n">okram</span>
</pre></div>


<p>This query starts with the Gremlin vertex we identified before and
then follows all edges labeled <code>REPO_CONTRIBUTOR</code>, which is GitHub&#8217;s
way of saying someone has code in the project repository. Once we&#8217;ve
followed all of those edges we can fetch the login name of the&nbsp;users.</p>
<p>In a similar vein, we can get the names of all of the projects that
Marko has contributed to using the following&nbsp;query:</p>
<div class="codehilite"><pre><span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">marko</span><span class="o">.</span><span class="na">in</span><span class="o">(</span><span class="s1">&#39;REPO_CONTRIBUTOR&#39;</span><span class="o">).</span><span class="na">fullname</span>
<span class="o">==&gt;</span><span class="n">tinkerpop</span><span class="s">/rexster</span>
<span class="s">==&gt;tinkerpop/</span><span class="n">furnace</span>
<span class="o">==&gt;</span><span class="n">tinkerpop</span><span class="s">/gremlin</span>
<span class="s">==&gt;tinkerpop/</span><span class="n">pipes</span>
<span class="o">==&gt;</span><span class="n">tinkerpop</span><span class="s">/blueprints</span>
<span class="s">==&gt;tinkerpop/</span><span class="n">frames</span>
</pre></div>


<p>Now, we can put the two together. Our first query got a list of all of
the people who contributed to Gremlin. Let&#8217;s take it a step further and
get the list of all of the people who have contributed to projects
that Marko has contributed&nbsp;to.</p>
<div class="codehilite"><pre><span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">marko</span><span class="o">.</span><span class="na">in</span><span class="o">(</span><span class="s1">&#39;REPO_CONTRIBUTOR&#39;</span><span class="o">).</span><span class="na">out</span><span class="o">(</span><span class="s1">&#39;REPO_CONTRIBUTOR&#39;</span><span class="o">).</span><span class="na">login</span>
<span class="o">==&gt;</span><span class="n">joshsh</span>
<span class="o">==&gt;</span><span class="n">jordanlewis</span>
<span class="o">==&gt;</span><span class="n">okram</span>
<span class="o">==&gt;</span><span class="n">spmallette</span>
<span class="o">[</span> <span class="n"><span class="caps">OUTPUT</span></span> <span class="n"><span class="caps">TRUNCATED</span></span> <span class="n"><span class="caps">FOR</span></span> <span class="n"><span class="caps">BREVITY</span></span> <span class="o">]</span>
</pre></div>


<p>This, however, shows many people multiple times. Let&#8217;s just count how many
times each name appears and then sort the list. This will give a rough
idea of the people that Marko works most closely&nbsp;with.</p>
<div class="codehilite"><pre><span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">m</span> <span class="o">=</span> <span class="o">[:]</span>
<span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">marko</span><span class="o">.</span><span class="na">in</span><span class="o">(</span><span class="s1">&#39;REPO_CONTRIBUTOR&#39;</span><span class="o">).</span><span class="na">out</span><span class="o">(</span><span class="s1">&#39;REPO_CONTRIBUTOR&#39;</span><span class="o">).</span><span class="na">login</span><span class="o">.</span><span class="na">groupCount</span><span class="o">(</span><span class="n">m</span><span class="o">).</span><span class="na">iterate</span><span class="o">();</span> <span class="kc">null</span>
<span class="o">==&gt;</span><span class="kc">null</span>
<span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">m</span><span class="o">.</span><span class="na">sort</span><span class="o">{</span><span class="n">a</span><span class="o">,</span><span class="n">b</span> <span class="o">-&gt;</span> <span class="n">a</span><span class="o">.</span><span class="na">value</span> <span class="o">&lt;=&gt;</span> <span class="n">b</span><span class="o">.</span><span class="na">value</span><span class="o">}</span>
<span class="o">==&gt;</span><span class="n">espeed</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">invalid</span><span class="o">-</span><span class="n">email</span><span class="o">-</span><span class="n">address</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">NQuinn</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">zcox</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">xedin</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">svzdvd</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">jtakakura</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">sgomezvillamor</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">fescale</span><span class="o">-</span><span class="n"><span class="caps">AC</span></span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">hendrens</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">countvajhula</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">tor5</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">pierredewilde</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">lvca</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">alexaverbuch</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">dmitriid</span><span class="o">=</span><span class="mi">1</span>
<span class="o">==&gt;</span><span class="n">jordanlewis</span><span class="o">=</span><span class="mi">2</span>
<span class="o">==&gt;</span><span class="n">pauljackson</span><span class="o">=</span><span class="mi">2</span>
<span class="o">==&gt;</span><span class="n">peterneubauer</span><span class="o">=</span><span class="mi">2</span>
<span class="o">==&gt;</span><span class="n">tinkerpop</span><span class="o">=</span><span class="mi">2</span>
<span class="o">==&gt;</span><span class="n">spmallette</span><span class="o">=</span><span class="mi">3</span>
<span class="o">==&gt;</span><span class="n">joshsh</span><span class="o">=</span><span class="mi">4</span>
<span class="o">==&gt;</span><span class="n">jramsdale</span><span class="o">=</span><span class="mi">4</span>
<span class="o">==&gt;</span><span class="n">okram</span><span class="o">=</span><span class="mi">6</span>
</pre></div>
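<p>The two-hop walk (user to repositories, repositories back to users) followed by a tally can be sketched in Python. This is a toy version under stated assumptions: the <code>contributors</code> mapping and the function <code>co_contributor_counts</code> are made up for illustration and don&#8217;t come from the dataset.</p>

```python
from collections import Counter

# Hypothetical REPO_CONTRIBUTOR edges: repository name to contributor logins.
contributors = {
    "org/alpha": ["ann", "ben", "cat"],
    "org/beta": ["ann", "ben"],
    "org/gamma": ["ben", "dan"],
}


def co_contributor_counts(contributors, login):
    """Count, for each user, the repositories shared with the given login."""
    counts = Counter()
    for _repo, logins in contributors.items():
        if login in logins:        # hop 1: repositories the user contributed to
            counts.update(logins)  # hop 2: everyone who contributed to them
    return counts


counts = co_contributor_counts(contributors, "ben")
for name, shared in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{name}={shared}")
```

<p>As in the console output, the starting user shows up with the highest count, because the walk passes back through them once per repository.</p>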


<p>Taking this a step further, let&#8217;s look at what other projects people in
this set watch. We need to branch out another layer, but first we need
to be careful and add in a <code>dedup()</code> in the pipe to ensure that we&#8217;re
not counting some projects too&nbsp;often.</p>
<div class="codehilite"><pre><span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">m</span> <span class="o">=</span> <span class="o">[:];</span> 
<span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">marko</span><span class="o">.</span><span class="na">in</span><span class="o">(</span><span class="s1">&#39;REPO_CONTRIBUTOR&#39;</span><span class="o">).</span><span class="na">out</span><span class="o">(</span><span class="s1">&#39;REPO_CONTRIBUTOR&#39;</span><span class="o">).</span><span class="na">dedup</span><span class="o">().</span><span class="na">out</span><span class="o">(</span><span class="s1">&#39;REPO_WATCHED&#39;</span><span class="o">).</span><span class="na">fullname</span><span class="o">.</span><span class="na">groupCount</span><span class="o">(</span><span class="n">m</span><span class="o">).</span><span class="na">iterate</span><span class="o">();</span> <span class="kc">null</span>            
<span class="o">==&gt;</span><span class="kc">null</span>
<span class="n">gremlin</span><span class="o">&gt;</span> <span class="n">m</span><span class="o">.</span><span class="na">sort</span><span class="o">{</span><span class="n">a</span><span class="o">,</span><span class="n">b</span> <span class="o">-&gt;</span> <span class="n">a</span><span class="o">.</span><span class="na">value</span> <span class="o">&lt;=&gt;</span> <span class="n">b</span><span class="o">.</span><span class="na">value</span> <span class="o">}</span>
<span class="o">[</span> <span class="n"><span class="caps">OUTPUT</span></span> <span class="n"><span class="caps">TRUNCATED</span></span> <span class="n"><span class="caps">FOR</span></span> <span class="n"><span class="caps">BREVITY</span></span> <span class="o">]</span>
<span class="o">==&gt;</span><span class="n">tong</span><span class="s">/hxmpp.lop=3</span>
<span class="s">==&gt;twitter/</span><span class="n">flockdb</span><span class="o">=</span><span class="mi">3</span>
<span class="o">==&gt;</span><span class="n">twitter</span><span class="s">/gizzard=3</span>
<span class="s">==&gt;banker/</span><span class="n">mongulator</span><span class="o">=</span><span class="mi">3</span>
<span class="o">==&gt;</span><span class="n">dgreco</span><span class="s">/graphbase=3</span>
<span class="s">==&gt;espeed/</span><span class="n">bulbs</span><span class="o">=</span><span class="mi">4</span>
<span class="o">==&gt;</span><span class="n">tinkerpop</span><span class="s">/tinkubator=4</span>
<span class="s">==&gt;nerlo/</span><span class="n">nerlo</span><span class="o">=</span><span class="mi">4</span>
<span class="o">==&gt;</span><span class="n">nathanmarz</span><span class="s">/storm=4</span>
<span class="s">==&gt;neo4j/</span><span class="n">community</span><span class="o">=</span><span class="mi">5</span>
<span class="o">==&gt;</span><span class="n">tinkerpop</span><span class="s">/furnace=6</span>
<span class="s">==&gt;tinkerpop/</span><span class="n">frames</span><span class="o">=</span><span class="mi">6</span>
<span class="o">==&gt;</span><span class="n">tinkerpop</span><span class="s">/rexster=8</span>
<span class="s">==&gt;tinkerpop/</span><span class="n">pipes</span><span class="o">=</span><span class="mi">9</span>
<span class="o">==&gt;</span><span class="n">tinkerpop</span><span class="s">/gremlin=11</span>
<span class="s">==&gt;tinkerpop/</span><span class="n">blueprints</span><span class="o">=</span><span class="mi">18</span>
</pre></div>
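<p>To see why the <code>dedup()</code> matters, here is a small Python sketch on invented data. Someone who contributes to two repositories is reached twice by the walk, so without deduplication everything they watch is counted twice. The dictionaries and the function <code>watch_counts</code> are hypothetical, and the sketch starts from every repository rather than from one user&#8217;s repositories.</p>

```python
from collections import Counter

# Hypothetical data: contributors per repository and repositories each user watches.
contributors = {
    "org/alpha": ["ann", "ben"],
    "org/beta": ["ann", "ben"],
}
watched = {
    "ann": ["org/alpha", "lib/graphs"],
    "ben": ["lib/graphs"],
}


def watch_counts(contributors, watched, dedup):
    """Tally watched repositories across the contributors of every repository."""
    people = [login for logins in contributors.values() for login in logins]
    if dedup:
        # dict.fromkeys keeps the first occurrence of each login, in order.
        people = list(dict.fromkeys(people))
    return Counter(repo for login in people for repo in watched.get(login, []))


print(watch_counts(contributors, watched, dedup=False))
print(watch_counts(contributors, watched, dedup=True))
```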


<p>It&#8217;s no surprise that the projects in the Tinkerpop stack are the most
watched projects among the developers who work on Tinkerpop
projects. However, there are a few other interesting pieces of
software that seem popular. Among others, <a href="https://github.com/nathanmarz/storm#readme">Storm</a> is a realtime computation
system written in Java and Clojure that&#8217;s great for munging through
thousands of logs. <a href="https://github.com/espeed/bulbs">Bulbs</a> is a nifty Python interface to many
graph databases. <a href="https://github.com/nerlo/nerlo">Nerlo</a> is a mechanism to use Neo4j
from within Erlang. My apologies if my descriptions are wrong, as some
of these projects are new to me&nbsp;too.</p>
<p>That&#8217;s enough about traversals in the data for now. I&#8217;ll leave you to
explore the data on your own. In future articles I&#8217;ll cover more about
actually mining the&nbsp;relationships.</p>
<h1>Exporting a Graph to&nbsp;GraphML</h1>
<p>While graph databases and Gremlin are very useful for storing your
data and doing traversals on data, they&#8217;re not always well structured
for doing computation on the data and gaining insight over a wide
number of projects. In grad school I studied with <a href="http://www.casos.cs.cmu.edu/bios/carley/carley.html">one of the leaders
in the field of social network analysis</a>, and now that she&#8217;s
given me a hammer, it seems like everything looks like a nail. In this
section I describe how to get your data out of a graph database and
into a program like&nbsp;R.</p>
<p>A common interchange format for social network data is
GraphML, an <span class="caps">XML</span> specification for describing graphs. It was first used
by individuals interested in visualizing large scale graphs. As such,
it has significant drawbacks that make it less than ideal compared to
other formats such as DynetML (e.g. only a single graph, no nesting,
edges must all be directed or undirected). In any case, it&#8217;s what we
have, so we&#8217;ll use it. Fortunately, both Gremlin and the igraph
package for R, which we&#8217;ll be using later, support&nbsp;GraphML.</p>
<p>I&#8217;ve created a simple script that you can run in your current Gremlin
session. You should be able to just paste this code into your running
Gremlin session and it will save the network to a file called
<code>follower.graphml</code>.</p>
<!-- GitHub Gist: 2690667 -->

<!-- Filename: "graphmlExport.groovy" -->

<script src="https://gist.github.com/2690667.js"></script>

<p>The astute observer will notice a couple of things about this. First,
we&#8217;re using a specialized method to get all of the users associated
with the Gremlin project on GitHub. However, we&#8217;re not following all
of the ways a user can be associated. For example, we&#8217;re not looking at
issues, pull requests, commits, or other&nbsp;events.</p>
<p>Secondly, we&#8217;re skipping a lot of edges and vertices. In this case
we&#8217;re skipping every edge that doesn&#8217;t lead to a user in this set. The
reason is that if we didn&#8217;t skip these edges we&#8217;d have a
network with 30,000+ nodes as opposed to the 606 in this
network. While it&#8217;s possible to do analysis on networks of that size,
it is much slower and would prove to be a bit of a distraction&nbsp;here.</p>
<div class="image caption center">
    <img src="/weblog/media/2012/05/gremlinNetwork.png" alt="Network as visualized in Cytoscape">
    <p>Network as visualized in&nbsp;Cytoscape</p>
</div>
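<p>The idea of keeping only edges between users in the chosen set, and then writing the result as GraphML, can be sketched in Python with the standard library. This is not the gist&#8217;s code: the function names <code>induced_edges</code> and <code>to_graphml</code>, the user list, and the edges are all hypothetical, and the output is the barest GraphML (nodes and directed edges, no attributes).</p>

```python
import xml.etree.ElementTree as ET


def induced_edges(edges, keep):
    """Keep only edges whose source and target are both in the keep set."""
    return [(s, t) for s, t in edges if s in keep and t in keep]


def to_graphml(nodes, edges):
    """Serialize a directed graph as a minimal GraphML string."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for node in nodes:
        ET.SubElement(graph, "node", id=node)
    for source, target in edges:
        ET.SubElement(graph, "edge", source=source, target=target)
    return ET.tostring(root, encoding="unicode")


# Hypothetical follower edges; "zed" is outside the set of users we keep.
users = ["ann", "ben", "cat"]
follows = [("ann", "ben"), ("ben", "zed"), ("zed", "ann"), ("cat", "ann")]

kept = induced_edges(follows, set(users))
print(kept)
print(to_graphml(users, kept))
```

<p>Filtering before export is what keeps the file small: edges to users outside the set are dropped rather than dragging those users into the network.</p>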

<p>This finishes the section of the article dealing with Gremlin from the
command line. From here on out the operations are done in&nbsp;R.</p>
<p><strong> Very Important: </strong> Before exiting Gremlin run the command
<code>g.shutdown()</code> to close the graph database. If you don&#8217;t do this
then you&#8217;ll have to wait for a recovery process the next time you
look at the&nbsp;data.</p>
<h1>Examining the Data in&nbsp;R</h1>
<p>Within R the first thing to do is to make sure you have the igraph
package installed. You can do this by running the following command
and following the&nbsp;directions:</p>
<div class="codehilite"><pre>install.packages<span class="p">(</span><span class="s">&#39;igraph&#39;</span><span class="p">)</span>
</pre></div>


<p>Now that we&#8217;ve got igraph installed, it&#8217;s time to have some
fun. First, we need to tell R to use the functions inside of the
igraph library and to load our&nbsp;data.</p>
<div class="codehilite"><pre>library<span class="p">(</span>igraph<span class="p">)</span>
setwd<span class="p">(</span><span class="s">&quot;~/gremlin&quot;</span><span class="p">)</span>
graph <span class="o">&lt;-</span> read.graph<span class="p">(</span><span class="s">&quot;follower.graphml&quot;</span><span class="p">,</span> format<span class="o">=</span><span class="s">&quot;graphml&quot;</span><span class="p">)</span>
</pre></div>


<p>First let&#8217;s get some summary information. This can be done with the
<code>ecount</code> and <code>vcount</code> functions. It shows that in the current network
there are 510 edges and 606&nbsp;nodes. </p>
<div class="codehilite"><pre><span class="o">&gt;</span> ecount<span class="p">(</span>graph<span class="p">)</span>
<span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">510</span>
<span class="o">&gt;</span> vcount<span class="p">(</span>graph<span class="p">)</span>
<span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="m">606</span>
</pre></div>


<p>This network has a lot of isolates in it. That&#8217;s somewhat to be
expected as not every user utilizes the follower feature of
GitHub. The following commands will remove isolates from our data set
and result in a network of 236 vertices and 510&nbsp;edges.</p>
<div class="codehilite"><pre><span class="o">&gt;</span> isolates <span class="o">&lt;-</span> which<span class="p">(</span>degree<span class="p">(</span>graph<span class="p">,</span> mode <span class="o">=</span> <span class="s">&#39;all&#39;</span><span class="p">)</span> <span class="o">==</span> <span class="m">0</span><span class="p">)</span> <span class="o">-</span> <span class="m">1</span>
<span class="o">&gt;</span> graph <span class="o">&lt;-</span> delete.vertices<span class="p">(</span>graph<span class="p">,</span> isolates<span class="p">)</span>
<span class="o">&gt;</span> summary<span class="p">(</span>graph<span class="p">)</span>
Vertices: <span class="m">236</span> 
Edges: <span class="m">510</span> 
Directed: <span class="kc"><span class="caps">TRUE</span></span> 
No graph attributes.
Vertex attributes: location<span class="p">,</span> sys_last_updated<span class="p">,</span> type<span class="p">,</span> blog<span class="p">,</span> gravatarId<span class="p">,</span> following<span class="p">,</span> followers<span class="p">,</span> createdAt<span class="p">,</span> name<span class="p">,</span> login<span class="p">,</span> fullname<span class="p">,</span> gitHubId<span class="p">,</span> sys_events_added<span class="p">,</span> user_type<span class="p">,</span> totalPrivateRepoCount<span class="p">,</span> private_gist_count<span class="p">,</span> biography<span class="p">,</span> sys_last_full_update<span class="p">,</span> diskUsage<span class="p">,</span> url<span class="p">,</span> public_gist_count<span class="p">,</span> collaborators<span class="p">,</span> email<span class="p">,</span> sys_created_at<span class="p">,</span> company<span class="p">,</span> ownedPrivateRepoCount<span class="p">,</span> public_repo_count<span class="p">,</span> id.
Edge attributes: sys_created_at<span class="p">,</span> id.
</pre></div>
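<p>What the isolate removal does can be shown on a toy network in Python. This is a sketch of the idea rather than of igraph itself: the helper names <code>degrees</code> and <code>drop_isolates</code> and the data are hypothetical. A vertex is an isolate when its total degree, counting both incoming and outgoing edges, is zero.</p>

```python
def degrees(nodes, edges):
    """Total degree (in plus out) of every node in a directed edge list."""
    degree = {node: 0 for node in nodes}
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def drop_isolates(nodes, edges):
    """Return the nodes that take part in at least one edge."""
    degree = degrees(nodes, edges)
    return [node for node in nodes if degree[node] != 0]


# Hypothetical follower network with two users who are not connected to anyone.
nodes = ["ann", "ben", "cat", "dan", "eve"]
edges = [("ann", "ben"), ("cat", "ann")]

print(degrees(nodes, edges))
print(drop_isolates(nodes, edges))
```

<p>Removing isolates never changes the number of edges, which matches the summary above: the vertex count drops while the edge count stays at 510.</p>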


<p>First, let&#8217;s get an idea of the degree of the vertices in the
graph. This command creates a histogram that clumps vertices by the
number of edges they have. We see that only a very few have many
edges; most have fewer than 10 edges. I should stress, this does not
reflect the total number of people those accounts follow, rather it
reflects only the total number of people within Gremlin that each
account&nbsp;follows.</p>
<div class="codehilite"><pre>hist<span class="p">(</span>degree<span class="p">(</span>graph<span class="p">))</span>
</pre></div>


<div class="image caption center">
    <img src="/weblog/media/2012/05/degreeHistogram.png" alt="Histogram of vertex degree">
    <p>Histogram of vertex&nbsp;degree</p>
</div>

<p>Now, let&#8217;s look at a couple of the classic centrality
measures. Betweenness centrality calculates the proportion of all
shortest paths between vertices that a particular vertex sits on. If
communication had to go person to person and could only go along
connections that are established, these people would prove to be key
in the&nbsp;network.</p>
<div class="codehilite"><pre>results <span class="o">&lt;-</span> data.frame<span class="p">(</span>login<span class="o">=</span>get.vertex.attribute<span class="p">(</span>graph<span class="p">,</span> <span class="s">&quot;login&quot;</span><span class="p">))</span>
results<span class="p">$</span>betweenness <span class="o">&lt;-</span> betweenness<span class="p">(</span>graph<span class="p">)</span>
results<span class="p">$</span>evcent <span class="o">&lt;-</span> evcent<span class="p">(</span>graph<span class="p">)$</span>vector
</pre></div>
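<p>For a feel of what a betweenness score measures, here is a small Python sketch that computes unnormalized betweenness on a directed, unweighted toy graph using Brandes&#8217; algorithm. It is meant as an illustration, not as a description of how igraph computes its scores; the function and the five-user network are hypothetical.</p>

```python
from collections import deque


def betweenness(nodes, edges):
    """Unnormalized betweenness on a directed, unweighted graph (Brandes' algorithm)."""
    successors = {node: [] for node in nodes}
    for source, target in edges:
        successors[source].append(target)

    score = {node: 0.0 for node in nodes}
    for start in nodes:
        # Breadth-first search that counts shortest paths from start.
        distance = {start: 0}
        paths = {node: 0 for node in nodes}
        paths[start] = 1
        parents = {node: [] for node in nodes}
        order = []
        queue = deque([start])
        while queue:
            current = queue.popleft()
            order.append(current)
            for nxt in successors[current]:
                if nxt not in distance:
                    distance[nxt] = distance[current] + 1
                    queue.append(nxt)
                if distance[nxt] == distance[current] + 1:
                    paths[nxt] += paths[current]
                    parents[nxt].append(current)
        # Walk back from the farthest vertices, sharing credit among parents.
        credit = {node: 0.0 for node in nodes}
        for node in reversed(order):
            for parent in parents[node]:
                credit[parent] += paths[parent] / paths[node] * (1 + credit[node])
            if node != start:
                score[node] += credit[node]
    return score


# Hypothetical follower edges: every path between the two sides passes through "hub".
nodes = ["ann", "ben", "hub", "cat", "dan"]
edges = [("ann", "hub"), ("ben", "hub"), ("hub", "cat"), ("hub", "dan")]
print(betweenness(nodes, edges))
```

<p>In this toy network only <code>hub</code> gets a nonzero score, because it sits on all four shortest paths from the left side to the right side.</p>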


<p>Now that we&#8217;ve calculated those centralities, let&#8217;s take a look. We&#8217;ll
start with betweenness. According to these data the user that has the
most central role is <a href="https://github.com/spmallette">spmallette</a>, an active participant
in the Tinkerpop communities, followed by <a href="https://github.com/ahzf">ahzf</a>, a developer
who is working on .NET ports of many Blueprints services. In third
place is a research account from a university in Korea. This account
shows up all over the place and I generally consider it to be a spam
account. It follows tens of thousands of users and therefore creates
artificially short paths between users, boosting its score in the
process. In fourth place is <a href="https://github.com/okram">Marko</a>, the leader of&nbsp;Tinkerpop.</p>
<div class="codehilite"><pre><span class="o">&gt;</span> results<span class="p">[</span>order<span class="p">(</span><span class="o">-</span>results<span class="p">$</span>betweenness<span class="p">),</span> c<span class="p">(</span><span class="s">&quot;login&quot;</span><span class="p">,</span> <span class="s">&quot;betweenness&quot;</span><span class="p">)][</span><span class="m">1</span>:<span class="m">10</span><span class="p">,</span> <span class="p">]</span>
         login betweenness
<span class="m">165</span> spmallette    <span class="m">8415.414</span>
<span class="m">156</span>       ahzf    <span class="m">8332.422</span>
<span class="m">134</span>     hcilab    <span class="m">7963.272</span>
<span class="m">16</span>       okram    <span class="m">6492.181</span>
<span class="m">107</span>  igrigorik    <span class="m">5793.494</span>
<span class="m">235</span>  joshbuddy    <span class="m">4623.192</span>
<span class="m">87</span>      collin    <span class="m">3305.313</span>
<span class="m">172</span>   pangloss    <span class="m">2442.794</span>
<span class="m">219</span>   stonegao    <span class="m">2165.359</span>
<span class="m">60</span>        dann    <span class="m">1876.813</span>
</pre></div>


<p>In the betweenness centrality model, which is a directed model, users
who follow few additional users are penalized. As Marko only follows a
handful of users, his score is lower than you might expect, despite the
fact that many people in the community follow&nbsp;him.</p>
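<p>To make the direction effect concrete, here is a minimal sketch on a made-up four-user network. The logins <code>a</code>, <code>b</code>, <code>c</code>, and <code>hub</code> are hypothetical and not part of the Tinkerpop data. Everyone follows <code>hub</code>, but <code>hub</code> follows nobody. The sketch assumes the igraph package, which provides the <code>betweenness</code> and <code>evcent</code> calls used above; <code>graph.data.frame</code> builds a graph from an edge list, and the <code>directed</code> argument of <code>betweenness</code> controls whether edge direction is respected.</p>
<div class="codehilite"><pre>library(igraph)

# Hypothetical toy edge list: a, b, and c each follow hub; hub follows nobody
edges &lt;- data.frame(from=c(&quot;a&quot;, &quot;b&quot;, &quot;c&quot;), to=c(&quot;hub&quot;, &quot;hub&quot;, &quot;hub&quot;))
toy &lt;- graph.data.frame(edges, directed=TRUE)

# Respecting direction, no shortest path passes through hub
betweenness(toy, directed=TRUE)

# Ignoring direction, hub sits between every pair of followers
betweenness(toy, directed=FALSE)
</pre></div>
<p>With direction respected, every user should score zero, <code>hub</code> included, because no one can be reached by passing through <code>hub</code>. Treated as undirected, <code>hub</code> lies on the shortest path between each of the three pairs of followers, which is the kind of gap a heavily followed user who follows few people falls into.</p>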
<p>However, when we use eigenvector centrality, which is a more robust
centrality metric, we find a more interesting picture. <a href="https://github.com/okram">Marko</a>
and <a href="https://github.com/peterneubauer">peterneubauer</a> are the top individuals, followed by <a href="https://github.com/spmallette">spmallette</a> and
<a href="https://github.com/joshsh">joshsh</a>, additional developers of&nbsp;Tinkerpop.</p>
<div class="codehilite"><pre><span class="o">&gt;</span> results<span class="p">[</span>order<span class="p">(</span><span class="o">-</span>results<span class="p">$</span>evcent<span class="p">),</span> c<span class="p">(</span><span class="s">&quot;login&quot;</span><span class="p">,</span> <span class="s">&quot;evcent&quot;</span><span class="p">)][</span><span class="m">1</span>:<span class="m">10</span><span class="p">,</span> <span class="p">]</span>
            login    evcent
<span class="m">16</span>          okram <span class="m">1.0000000</span>
<span class="m">167</span> peterneubauer <span class="m">0.9739565</span>
<span class="m">165</span>    spmallette <span class="m">0.6607181</span>
<span class="m">13</span>         joshsh <span class="m">0.5872818</span>
<span class="m">107</span>     igrigorik <span class="m">0.5665205</span>
<span class="m">226</span>         thobe <span class="m">0.5273195</span>
<span class="m">219</span>      stonegao <span class="m">0.5220129</span>
<span class="m">14</span>   alexaverbuch <span class="m">0.5173995</span>
<span class="m">178</span>       nawroth <span class="m">0.4466164</span>
<span class="m">156</span>          ahzf <span class="m">0.4284547</span>
</pre></div>


<p>There&#8217;s always more that you can do with these tools, and in the
future I&#8217;ll discuss some more, but for now I hope this has given you a
taste for how to mine social networks from GitHub.&nbsp;Enjoy!</p>
                        ]]></description>
    <pubDate>Sun, 13 May 2012 17:09:00 UTC</pubDate>
    <guid>http://patrick.wagstrom.net/weblog/2012/05/13/mining-github-followers-in-tinkerpop/</guid>
</item>
                    <item>
    <title>Static Bloggin</title>
    <link>http://patrick.wagstrom.net/weblog/2012/05/11/static-bloggin/</link>
    <description><![CDATA[
                            <p>This is my first new post written in markdown for the static version
of
<a href="http://patrick.wagstrom.net/weblog">patrick.wagstrom.net/weblog</a>. The
only reason I was running both <span class="caps">PHP</span> and MySQL on my server was to host
WordPress, which became a pain in the ass with all of the
upgrades. Going static eliminates all of those nasty security holes and allows
me to focus a little bit more on just writing. Which is what a weblog
is supposed to&nbsp;be.</p>
<p>I&#8217;m running <a href="http://octopress.org/">Octopress</a>, which is a blogging
framework based on <a href="http://jekyllrb.com/">Jekyll</a>. The downside is
that it cannot accommodate dynamic elements,
so all comments need to be farmed out to an external
service. Fortunately, I was already using
<a href="http://www.intensedebate.com/">IntenseDebate</a>. With only a very small
amount of work I was able to migrate everything over to the new
system. Perhaps most substantial is that I had to write a patch to
Octopress to support IntenseDebate. I&#8217;ve since created a
<a href="https://github.com/imathis/octopress/pull/557">pull request for IntenseDebate support</a>
on GitHub. Hopefully the authors will see fit to pull it&nbsp;in.</p>
<p>So yeah, it&#8217;s a little more work now that I don&#8217;t have a web interface
to do things like manage images and remember my links, but I can write
posts from any text editor, which is <span class="caps">VERY</span> handy for when I&#8217;m stuck in
airplanes and too cheap to pay for&nbsp;WiFi.</p>
<p>Overall, I&#8217;m not certain if this is a good idea. In the past I&#8217;ve
laid out various reasons why
<a href="http://patrick.wagstrom.net/weblog/2008/12/10/youre-too-smart-to-do-it-yourself/">you shouldn&#8217;t try to do it yourself</a>. However,
there is also merit to doing it yourself. Up until this point I&#8217;ve
been an active Ruby Hater, and it&#8217;s becoming clear that I should at
least be peripherally aware of what Ruby can do. Although my
extensions to this point have not involved hacking Ruby, they might at
some point in the&nbsp;future.</p>
<p>So, for now, enjoy the fact that every post is showing up again in
your <span class="caps">RSS</span> reader and marvel at the beautiful new theme, with no more
worries about annoying security faults and a faster response&nbsp;time.</p>
                        ]]></description>
    <pubDate>Fri, 11 May 2012 04:36:00 UTC</pubDate>
    <guid>http://patrick.wagstrom.net/weblog/2012/05/11/static-bloggin/</guid>
</item>
                    <item>
    <title>I am not a climatologist, and neither are most of these people</title>
    <link>http://patrick.wagstrom.net/weblog/2012/01/28/i-am-not-a-climatologist-and-neither-are-most-of-these-people/</link>
    <description><![CDATA[
                            <p>In the past couple of days I have twice received an <a href="http://online.wsj.com/article/SB10001424052970204301404577171531838421366.html?mod=WSJ_hp_mostpop_read">opinion piece from the
Wall Street Journal which suggests that the models used for estimating climate
change are grossly pessimistic and that we really need not be concerned with
anthropogenic climate
change</a>.
It was signed by sixteen scientists and engineers. The problem is that almost
none of these people are climatologists - which is the field they are claiming
is producing invalid science. Anyone can call themselves a scientist - having a
Ph.D. helps - but, just because you are a scientist does not mean that you can
speak authoritatively on all issues related to science. Stephen Hawking is a
brilliant scientist, but he studies astrophysics, not climatology. I trust him
on a lot of things, but I wouldn&#8217;t trust him on climate change. Nor would I
trust Albert Einstein, Louis Pasteur, Marie Curie, or Isaac Newton on issues of
climate&nbsp;change.</p>
<p>So, who are these climate change deniers that have the right frothing at the
mouth again? Let&#8217;s take a quick&nbsp;look.</p>
<ul>
<li><strong><a href="http://en.wikipedia.org/wiki/Claude_All%C3%A8gre">Claude Allegre</a>, former director of the Institute for the Study of the Earth, University of Paris</strong> - Is a geochemist, which might make him qualified. It&#8217;s hard to tell as he has spent most of his time doing political work recently. He appears to have a strong contrarian streak, such as in 1996 when he insisted that asbestos was harmless and that anger over it was caused by mass hysteria. The last time I checked, the link between asbestos and mesothelioma was pretty&nbsp;firm.</li>
<li><strong><a href="http://en.wikipedia.org/wiki/J._Scott_Armstrong">J. Scott Armstrong</a>, cofounder of the Journal of Forecasting and the International Journal of Forecasting</strong> - This one gave me a decent chuckle. At first I thought he was a climate forecasting scientist, nope. Armstrong&#8217;s expertise is in marketing style forecasting, as in trends. His journal is also published by Elsevier. I think I threw up a little in my&nbsp;mouth.</li>
<li><strong><a href="http://www.rockefeller.edu/research/faculty/labheads/JanBreslow/">Jan Breslow</a>, head of the Laboratory of Biochemical Genetics and Metabolism, Rockefeller University</strong> - A medical doctor and not a climatologist. Breslow is perhaps most well known for his work on heart disease. This is great work he has done, but it&#8217;s not atmospheric&nbsp;science.</li>
<li><strong><a href="http://www.marshall.org/experts.php?id=252">Roger Cohen</a>, fellow, American Physical Society</strong> - It&#8217;s difficult to find information on Cohen. Prior to retirement he worked for ExxonMobil research, but that&#8217;s about all I can find. I can&#8217;t seem to find any publications on any issue. However, he does have a very common name, making him hard to google. He frequently consorts with William Happer, who appears later in the&nbsp;list.</li>
<li><strong><a href="http://en.wikipedia.org/wiki/Edward_E._David_Jr.">Edward David</a>, member, National Academy of Engineering and National Academy of Sciences</strong> - I have great respect for Dr. David as a member of the National Academy of Engineering. However, he is an electrical engineer and has been largely retired from research for more than 20 years. Did I mention he was director of research at Exxon from&nbsp;1977-1985?</li>
<li><strong><a href="http://en.wikipedia.org/wiki/William_Happer">William Happer</a>, professor of physics, Princeton</strong> - Seems to have moved away from research as he&#8217;s advanced in his career. During his prime he was a leader in the field of spectroscopy. Which, in case you didn&#8217;t know, has nothing to do with climate change. During his 2009 testimony to Congress he indicated that an increase in <span class="caps">CO2</span> is good for the planet because it&#8217;s good for plants. Yes, very much like the Competitive Enterprise Institute&#8217;s &#8220;<span class="caps">CO2</span>, We Call it Life&#8221;&nbsp;video.</li>
<li><strong><a href="http://www.sp.phy.cam.ac.uk/SPWeb/home/mjk1.html">Michael Kelly</a>, professor of technology, University of Cambridge, <span class="caps">U.K.</span></strong> - Kelly primarily works on semiconductors, specifically <span class="caps">SRAM</span>. He is not a climatologist or even a chemical engineer or&nbsp;chemist.</li>
<li><strong><a href="http://www.sourcewatch.org/index.php?title=William_Kininmonth">William Kininmonth</a>, former head of climate research at the Australian Bureau of Meteorology</strong> - Kininmonth is, perhaps, a meteorologist, although there is little information easily available about his activities. It is known that he is not a prominent researcher in any field and his &#8220;Australasian Climate Research Institute&#8221; is run out of his home and appears to be only his own&nbsp;writings.</li>
<li><strong><a href="http://en.wikipedia.org/wiki/Richard_Lindzen">Richard Lindzen</a>, professor of atmospheric sciences, <span class="caps">MIT</span></strong> - Lindzen is perhaps the most qualified individual on this list. He is well known for his skepticism of anthropogenic climate change. He stands out from the other signatories because he can speak with true scientific authority on the&nbsp;issue.</li>
<li><strong><a href="http://www.mii.vt.edu/MACR/faculty/mcgrath.html">James McGrath</a>, professor of chemistry, Virginia Technical University</strong> - McGrath studies polymers and fuel cells. He is a scientist, but not a climate&nbsp;scientist.</li>
<li><strong><a href="http://www.fas.org/about/bio/nichols.html">Rodney Nichols</a>, former president and <span class="caps">CEO</span> of the New York Academy of Sciences</strong> - This one took me a while longer to find out information about. I believe that Dr. Nichols is a physicist from Harvard, which means he could be a climatologist. However, looking at his publication record for the last 40 years you&#8217;ll find that most of his work is dealing with science and technology policy &#8211; issues that are close to my heart. However, this doesn&#8217;t qualify him as a climatologist. I&#8217;m sure he is well learned in a variety of topics, but I don&#8217;t believe he has a deep knowledge of the current research on&nbsp;climatology.</li>
<li><strong><a href="http://en.wikipedia.org/wiki/Harrison_Schmitt">Harrison H. Schmitt</a>, Apollo 17 astronaut and former <span class="caps">U.S.</span> senator</strong> - As an astronaut Harrison Schmitt was on the mission that took the famous &#8220;<a href="http://en.wikipedia.org/wiki/The_Blue_Marble">Blue Marble</a>&#8221; picture of the earth. In fact, evidence indicates that Schmitt most likely took the photo that has been credited with being a critical catalyst for the environmental movement in the 1970s. Outside of his astronaut career he was a university professor, geologist, and senator from New Mexico. None of these are related to the atmosphere or climate&nbsp;science.</li>
<li><strong><a href="http://en.wikipedia.org/wiki/Nir_Shaviv">Nir Shaviv</a>, professor of astrophysics, Hebrew University, Jerusalem</strong> - Shaviv is primarily an astrophysicist known for his work on cosmic rays and luminosity. He has his own theory of global warming which says that the cosmic rays of the sun are responsible for global warming. His theory has not been widely accepted and has faced great challenges because of the fact that the solar output has been decreasing since the mid-1980s.</li>
<li><strong><a href="http://en.wikipedia.org/wiki/Henk_Tennekes">Henk Tennekes</a>, former director, Royal Dutch Meteorological Service</strong> - Also a professor of Aeronautical Engineering at Penn State, Tennekes is most well known for his work on turbulence in airflows. In fact, he literally wrote the book on it. Unfortunately, that&#8217;s not a book on climate change. He was reportedly ousted from the Royal Dutch Meteorological Service for his denial of climate change and his occasional reliance on biblical texts for justification. Look, I&#8217;m a Christian and a scientist, but I realize that I can&#8217;t use biblical texts to justify my work, that&#8217;s not how science&nbsp;works.</li>
<li><strong><a href="http://en.wikipedia.org/wiki/Antonino_Zichichi">Antonino Zichichi</a>, president of the World Federation of Scientists, Geneva</strong> - Primarily a sub-nuclear physicist who has worked at labs like <span class="caps">CERN</span> and Fermilab. His title of President of the World Federation of Scientists is self-bestowed as he is the founder. It should not be considered to be an analog to the Federation of American Scientists. He is a highly cited researcher, and has done significant work in popularizing science in Italy, but he is not a&nbsp;climatologist.</li>
</ul>
<p>Out of the sixteen people listed I count one atmospheric scientist, Lindzen,
and a half, Allegre. In any community of scientists you&#8217;ll have dissenters. The
fact that they could round up only one and a half climate scientists for this
letter should show you just how strong the case for global warming really is.
Want more evidence? 255 scientists, all members of the National Academy of
Sciences, including 11 Nobel laureates, wrote a scathing response, <a href="http://www.sciencemag.org/content/328/5979/689.full">rejected by
the Wall Street Journal and later published in
Science</a>.</p>
                        ]]></description>
    <pubDate>Sun, 29 Jan 2012 00:16:00 UTC</pubDate>
    <guid>http://patrick.wagstrom.net/weblog/2012/01/28/i-am-not-a-climatologist-and-neither-are-most-of-these-people/</guid>
</item>
                    <item>
    <title>Looking for Summer Interns in the Software Technology Group at IBM TJ Watson Research Center</title>
    <link>http://patrick.wagstrom.net/weblog/2012/01/10/looking-for-summer-interns-in-the-software-technology-group-at-ibm-tj-watson-research-center/</link>
    <description><![CDATA[
                            <div class="image caption center">
    <img src="/weblog/media/2012/01/Watson-Research-Center.jpg" alt="IBM TJ Watson Research Center (Photo by Simon Greig)">
    <p><span class="caps">IBM</span> <span class="caps">TJ</span> Watson Research Center &mdash; <a href="http://www.flickr.com/photos/xrrr/2478142363/">Photo by Simon&nbsp;Greig</a></p>
</div>

<p>Are you one of the best software engineering students in the world? Do you dig mining software repositories? Are you a wizard at social network analysis? Interested in a great summer job looking at what makes software teams work? Even better, want to work with&nbsp;me?</p>
<p><a href="http://researcher.ibm.com/view_project.php?id=1811">The Software Technologies Group at the <span class="caps">IBM</span> <span class="caps">TJ</span> Watson Research Center in Hawthorne, <span class="caps">NY</span> is looking for summer interns!</a> I started at <span class="caps">IBM</span> back in 2007 as an intern and had a great time meeting some of the smartest students from around the world. Students are given a chance to work with the best technology in the world and often end up submitting papers to <span class="caps">ICSE</span>, <span class="caps">CHI</span>, <span class="caps">CSCW</span>, or <span class="caps">FSE</span> as a result of their work with&nbsp;us.</p>
<p>We suggest that you <a href="https://jobs3.netmedia1.com/cp/job_summary.jsp?job_id=RES-0451733">apply online</a>. If you&#8217;ve got questions you can email me directly for more information. But hurry up, as we&#8217;re going to start our selection and interview process&nbsp;soon.</p>
<p><span class="caps">PS</span>. For faculty, this is a great way to offload students for the summer if you&#8217;d like to take off to St. Barth&#8217;s for a few&nbsp;months.</p>
                        ]]></description>
    <pubDate>Wed, 11 Jan 2012 01:55:00 UTC</pubDate>
    <guid>http://patrick.wagstrom.net/weblog/2012/01/10/looking-for-summer-interns-in-the-software-technology-group-at-ibm-tj-watson-research-center/</guid>
</item>
            </channel>
</rss>