For the last year or so I’ve been quite happy with the results that SpamAssassin was giving me when filtering my email. Lately, though, I’ve been getting a little disappointed with its performance; I could upgrade to a new version, or I could switch to a new filter. I had previously shied away from Bayesian filters because I didn’t fully understand the math going on behind them.
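For anyone else scared off by the math: the core of a Bayesian filter is just combining per-token spam probabilities. A minimal sketch of one common combining rule (the one popularized by Paul Graham’s “A Plan for Spam”); the token probabilities below are made-up illustrative numbers, not from any real corpus:

```python
def spam_probability(token_probs):
    """Combine per-token spam probabilities into one message score.

    Each p in token_probs is the estimated probability that a message
    containing that token is spam. Scores near 1.0 mean "spam".
    """
    prod = 1.0       # product of p_i
    inv_prod = 1.0   # product of (1 - p_i)
    for p in token_probs:
        prod *= p
        inv_prod *= (1.0 - p)
    return prod / (prod + inv_prod)

# One spammy token, one neutral, one hammy:
score = spam_probability([0.99, 0.5, 0.2])
```

With those three tokens the combined score comes out a little above 0.96, i.e. the single very spammy token dominates the mildly hammy one.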
Odds are this article will be applicable to approximately zero readers of my blog, but hopefully Google will catch it and someone else may find it helpful in the future. Here is the situation: CMU has access to a wide variety of online information sources through the library; unfortunately, these require you to be on campus, or using a CMU IP address, in order to use them. For Windows users there is the VPN solution, but what about Linux users?
I just installed Fedora on my laptop and it wasn’t as painless as advertised. Initially I had thought that I could just do a simple upgrade from my old RedHat 9 installation, but I ran into some serious snags. The install procedure progressed just fine and took about 45 minutes or so. Then I tried to log in and GDM crashed; well, it didn’t crash in the proper way. Ordinarily when GDM crashes it will try different choosers until one works.
Today the new release of Fedora came out. Like any sensible people, rather than getting slashdotted, they put up some BitTorrent releases for people to download. While I was at school today I was getting 400k/s on both the upstream and downstream channels, but when I went home I was only getting 8k/s down and about 16k/s up. Needless to say, I was not pleased. It also made it quite difficult for me to look at anything else on the net.
I’ve hacked together code so the XSLTRenderer plugin can now use libxslt to do the transformations on the server side. This should be more efficient than shelling out to xsltproc all the time. If I hack away on mod_python stuff tomorrow, I might see about getting it to cache the already-read stylesheet in memory. That would save another disk read and make server-side rendering even faster. Yay.
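The stylesheet-caching idea is independent of the XSLT library itself; a sketch of the pattern (here `parse_stylesheet` stands in for the real libxslt parse call — this is not the plugin’s actual code):

```python
class StylesheetCache:
    """Parse each stylesheet once and reuse it across requests.

    Under mod_python the interpreter stays loaded between hits, so a
    long-lived object like this avoids re-reading the stylesheet from
    disk on every request.
    """

    def __init__(self, parse_stylesheet):
        # parse_stylesheet stands in for e.g. libxslt's stylesheet parser
        self._parse = parse_stylesheet
        self._cache = {}
        self.misses = 0  # how many times we actually hit the disk

    def get(self, path):
        if path not in self._cache:
            self._cache[path] = self._parse(path)
            self.misses += 1
        return self._cache[path]
```

The second and later requests for the same stylesheet come straight out of the dict, which is exactly the disk read this would save.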
This is a little patch that gives you the ability to get the full month name of an entry in pyBlosxom. This is especially useful if you don’t like abbreviations (like myself). It creates two new variables you can use in your templates: $month, which is the full month name, and $daord, which is the ordinal suffix after the number (like “nd”, “th”, “rd”, “st”). Sorry if it doesn’t work for non-English speakers.
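The logic behind the two variables is simple enough to sketch (this is the general algorithm, not necessarily the patch’s exact code):

```python
import time

def day_ordinal(day):
    """English ordinal suffix for a day of the month, e.g. 22 -> 'nd'."""
    if 11 <= day <= 13:  # 11th, 12th, 13th are the irregular cases
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")

def full_month(t):
    """Full month name (e.g. 'October' instead of 'Oct')."""
    return time.strftime("%B", t)
```

So an entry dated October 22 would render as “October 22nd” rather than “Oct 22”.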
In the spirit of making every damn thing on this site XML, I went through and translated the category plugin into XML-spitting-out fun. It’s not quite perfect yet, as it doesn’t nest XML elements, but it works well enough to generate a simple list. You can see it in action (post-XSLT processing, of course) on the side of this weblog. Feel free to download it from my projects page.
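To give an idea of what “a simple list” means here: the output is a flat run of elements rather than a nested tree, something along these lines (element names illustrative, not the plugin’s actual ones):

```xml
<categories>
  <category count="4">linux</category>
  <category count="7">pyblosxom</category>
  <category count="2">pyblosxom/plugins</category>
</categories>
```

Note that `pyblosxom/plugins` appears as its own flat entry instead of being nested inside `pyblosxom` — that’s the missing nesting.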
One of the more interesting things about pyBlosxom is that it’s all based off a single URL, with data passed to it as a path rather than tacked on as query parameters. This means that unless you want a nasty-looking path like http://patrick.wagstrom.net/pyblosxom/pyblosxom.cgi/2003/oct, you need to do something to hide it. There are a few hints out there that talk about hiding it, but I think I’d rather do it in a more robust way than just renaming the CGI to something like blog.
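The usual “more robust way” on Apache is mod_rewrite; a sketch, assuming mod_rewrite is enabled and the paths match my setup (yours will differ):

```apache
# Turn clean URLs like /2003/oct into pyblosxom.cgi's PATH_INFO,
# but leave requests for real files (images, stylesheets) alone.
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /pyblosxom/pyblosxom.cgi/$1 [L,PT]
```

The advantage over renaming the CGI is that the script name disappears from URLs entirely, so nothing breaks if it moves later.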
As part of my program here I’ve got an IBM ThinkPad A31. In general it’s a pretty nice little system, but it has a few issues that I’ll try to document here, along with how I’ve managed to get around them, in hopes that this information can help someone else out.
One of the most annoying things was what happened when I upgraded the kernel. I currently run RedHat 9, and otherwise it works quite well.
So now that I’ve got my blog automagically rendering XML into HTML through XSLT for browsers that can’t hack it, I’d like to extend that to my entire web server. This will take a little bit more work. First I’ll describe the general concept:
User makes a request for a given page such as index.xml or foo.php
After the page is constructed (so this will work with PHP and whatnot), the module checks the Content-Type header on the page; if it is text/xml then we proceed to the next step, otherwise the data is returned unprocessed
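The check-and-transform step above could be sketched like this in plain Python (the real version would live in an Apache/mod_python output filter, and `apply_xslt` here stands in for the actual libxslt transform):

```python
def postprocess(content_type, body, apply_xslt):
    """Run the rendered page through XSLT only when it is XML.

    Anything that is not text/xml -- PHP output, plain HTML, images --
    passes through completely untouched.
    """
    if content_type == "text/xml":
        # Transform to HTML for browsers that can't handle raw XML+XSLT.
        return "text/html", apply_xslt(body)
    return content_type, body
```

The key design point is that the filter keys off the Content-Type rather than the file extension, which is why it works for dynamically generated pages like foo.php just as well as static index.xml.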