Search Engine Showdown: Google vs AOL

May 13th, 2009 by Xerxes

I think it’s safe for me to say “shea right – if I hate AOL Search as much as I despise AOL the ISP, this article will not be favourable to AOL in any way, shape or form”.

Let the games begin.

AOL Search fails to render properly in Google Chrome


Wow off to a flying start here, boys…. </sarcasm>

The second thing to peeve me off is that AOL Search doesn’t have a search provider exposed in their meta-data. So I have to create one for myself. Fortunately, Chrome makes this pretty easy, but that’s not the point – You guys are providing a search service. Irrespective of how shit it may or may not be, FFS at least make it easy for me to *TRY* and use your product?

…all this, and I haven’t actually started using it yet. How ominous. I’m hoping that the little “powered by Google” actually means “we can’t do search anymore, and have given up. Here’s one which does it better”.

……………

One week (maybe a little more) has passed and, well, let’s just say I’m not as disappointed as I thought I would be. Mainly because AOL Search really does seem to be effectively a wrapper around Google. As an example, I searched the hottest topic going around on the tubes at the moment (the effects of socialism on post-war Germany), and most of the results were the same, except that Google also listed a result from its Book Search service. Apart from little things like that, these two are indistinguishable. Even AOL’s image search is just a face-mask over Google Images.

The design of the page leaves a little to be desired, however, as AOL shamelessly places advertisements at the top of the page in an attempt to drive click-throughs. My ad-busting eyeballs detect this easily, so the placement of the ad isn’t so much the problem. The problem is that they have sneakily stretched the clickable area of each paid link to the full width of the page, which means that clicking in what should be “blank space” triggers the link and clicks through to the paid ad. Naughty, naughty.

All said and done, I couldn’t help but notice that I just told a colleague, without even realising it, that I’m finishing up this post so that “I can go back to using Google”. I guess even subconsciously I find any experience outside Google’s to be less than engaging.

On a final note, Wolfram Alpha didn’t launch as soon as I was hoping it would, so there’ll be a week’s rest where I go back to Google, before trying out the new kid on the block on 18th May. Yes, I am aware of the broad misrepresentation of Alpha as a “Google killer”, but it would still be fun to try :)

Search Engine Showdown: Google vs Ask

May 5th, 2009 by Xerxes

In this, the 3rd installment of The (Not So) Great Search Engine Showdown, I reflect on my experience using Ask.com compared to Google.

I don’t have a great deal of time so this post is going to be brief. I really only have one _serious_ gripe about Ask – that stupid fu#@$%g Answerbar at the top of the page every time you navigate to a search result. NO, ask.com! I wanted you to give me the search result, not a pain in the ass waste of screen real estate. What also frustrated me about this “feature” was its sheer unpredictability. Most web results would display the Anusbar at the top, but others (like Wikipedia) would be displayed in full glory without being crippled. The “Close Permanently” button was never hit with such gusto, I’m sure. To demonstrate just how much, I’ve prepared the following illustration:

How to close the Ask.com Answerbar


By way of quality of results, I actually found Ask to be better than I was expecting. Certainly I felt like I wasn’t missing Google, though on a few occasions I had to drop back just to be sure I wasn’t missing anything (turns out I wasn’t). Overall the web search results were as good as Yahoo’s, though one thing that irritated me was that Ask.com mixes the paid advertising results in with the organic search results. I’m sure they’ll claim that they’re putting the top-most organic result first and then allowing the rest of the results to be shown underneath the paid section, but we all know the truth. Money grubbers.

When it was originally launched as “Ask Jeeves”, the website’s search technology was based on doing some NLP (natural language processing) against your search query, and it would try to return the best results based on the context of your question. A few years ago Jeeves was given the arse from his job, and the company took the arse to their search results, because (quite simply) their NLP wasn’t advanced enough to provide accurate results compared to Big Brother.

However, having played with Ask.com this week, I noticed they still have a Q&A section (which it claims is in beta) that allows you to phrase a question and let the NLP try to answer it for you. Not one to turn down a good opportunity to test NLP products (and get a comparative feeling for the upcoming Wolfram Alpha test I’ll hopefully be performing), I Ask’ed the following question in the name of science:

Putting Ask.com's NLP to the Public Service Announcement test.


It’s heart-warming to see that even if you speak broken English like the second guy, you can still get valuable advice on the interwebs.

This week, I throw away all credibility as I try out AOL’s search. If using this website results in me getting another fking AOL starter CD, I’ll sht the roof.

Poor-man’s Benchmarking in Ruby

April 29th, 2009 by Xerxes

As part of evaluating several libraries for specification testing in Ruby (MSpec, RSpec, Bacon), I wanted to benchmark each library against a simple suite of tests to see if one was noticeably slower than the others. It wasn’t intended to be very scientific, but to at least expose a slow framework, if any.

Each benchmark was performed by creating a suite of specifications based around Bacon’s whirlwind sample (consisting of 5 specs) and executing the suite 10,000 times. Each benchmark was run 5 times in order to weed out any statistical anomalies. NB: For this analysis, I didn’t benchmark RSpec because it’s not terribly compatible with IronRuby just yet.

The results and code can be found below. In a nutshell, it does seem as though MSpec performs slower than Bacon, but when you consider that over a 50,000 test sample (10,000 iterations of 5 specs) it was only (roughly) 10 to 12 seconds slower, the difference is negligible.

        Bacon (s)   MSpec (s)
Run 1   27.642      37.337
Run 2   25.598      37.755
Run 3   25.607      37.424
Run 4   25.317      36.439
Run 5   25.105      36.352

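
To put a number on the gap, the per-run averages can be worked out straight from the table. This is only a sketch of the arithmetic: the timings are copied from the results above, and the `mean` helper and `SPECS_PER_RUN` constant are names made up for this illustration.

```ruby
# Timings in seconds, copied from the results table.
bacon = [27.642, 25.598, 25.607, 25.317, 25.105]
mspec = [37.337, 37.755, 37.424, 36.439, 36.352]

# Arithmetic mean of a list of floats (hypothetical helper).
def mean(values)
  values.inject(0.0) { |sum, v| sum + v } / values.size
end

SPECS_PER_RUN = 10_000 * 5 # 10,000 iterations of a 5-spec suite

bacon_mean = mean(bacon)
mspec_mean = mean(mspec)
gap        = mspec_mean - bacon_mean

puts format("bacon mean: %.3f s", bacon_mean)
puts format("mspec mean: %.3f s", mspec_mean)
puts format("gap: %.3f s per run (%.2f ms per spec)", gap, gap / SPECS_PER_RUN * 1000)
```

Spread over 50,000 spec executions, the gap works out to a fraction of a millisecond per spec, which is why it hardly matters for a real test suite.
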
# This is the code for the spec test wrapper.
# To execute, save the file as "spec_runner.rb" and execute
#    ruby spec_runner.rb
#
#

ITERATIONS = 10000

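require 'rubygems'
require 'stringio'

@old_stdout = $stdout
$stdout = StringIO.new

# Temporarily restore the real stdout to report progress,
# then go back to swallowing the spec output.
def milestone(n)
	$stdout = @old_stdout
	puts "Reached milestone: ##{n}"
	$stdout = StringIO.new
end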

def time_it(&func)
	start_time = Time.now
	1.upto(ITERATIONS) do |n|
		func.call
		milestone(n) if n % 1000 == 0
	end
	end_time = Time.now
	end_time-start_time
end

bacon_time = time_it do
	load 'whirlwind_bacon.rb'
end

mspec_time = time_it do
	load 'whirlwind_mspec.rb'
end

$stdout = @old_stdout
puts "bacon time: #{bacon_time}"
puts "mspec time: #{mspec_time}"
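
One caveat with the wrapper above: Time.now is wall-clock time, so a clock adjustment in the middle of a run would skew the result. A minimal sketch of the same timing loop using the monotonic clock follows. It assumes a modern MRI (Process.clock_gettime arrived in Ruby 2.1, so it would not have run on the IronRuby of the day), and `time_it_monotonic` is a hypothetical name, not part of the original wrapper.

```ruby
# Hypothetical helper: runs the block `iterations` times and returns the
# elapsed time in seconds, measured with the monotonic clock.
def time_it_monotonic(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Stand-in workload; in the wrapper this would be `load 'whirlwind_bacon.rb'`.
elapsed = time_it_monotonic(1_000) { [3, 1, 2].sort }
puts format("elapsed: %.6f s", elapsed)
```

For a poor-man’s benchmark like this one the difference is unlikely to change the conclusion, but it removes one source of noise.
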

# This is the Bacon test file. Save it as "whirlwind_bacon.rb"
#
#

require 'bacon'

describe 'A new array' do
	before do
		@ary = Array.new
	end

	it 'should be empty' do
		@ary.should.be.empty
		@ary << 1
		@ary.should.include 1
	end

	it 'should have zero size' do
		@ary.size.should.equal 0
		@ary.size.should.be.close 0.1, 0.5
	end

	it 'should raise on trying fetch any index' do
		lambda { @ary.fetch 0 }.
			should.raise(IndexError).
			message.should.match(/out of array/)
	end

	it 'should have an object identity' do
		@ary.should.not.be.same_as Array.new
	end

	palindrome = lambda { |obj| obj == obj.reverse }
	it 'should be a palindrome' do
		@ary.should.be.a palindrome
	end
end

# This is the MSpec test file. Save it as "whirlwind_mspec.rb"
#
#
require 'mspec'

describe 'A new array' do
	before do
		@ary = Array.new
	end

	it 'should be empty' do
		@ary.should be_empty
		@ary << 1
		@ary.should include(1)
	end

	it 'should have zero size' do
		@ary.size.should == 0
		@ary.size.should be_close(0.1, 0.5)
	end

	it 'should raise on trying fetch any index' do
		d = lambda { @ary.fetch 0 }
		d.should raise_error(IndexError, /out of array/)
	end

	it 'should have an object identity' do
		@ary.should_not equal(Array.new)
	end

	palindrome = lambda { |obj| obj == obj.reverse }
	it 'should be a palindrome' do
		(palindrome.call @ary).should be_true
	end
end

Search Engine Showdown: Google vs Yahoo Search

April 27th, 2009 by Xerxes

In the second installment of my series evaluating search engines, I take a look at Yahoo’s search offering – specifically the locally-branded Yahoo7 search.

The first test – TICK. A Yahoo search on my name turns up very good results. My website first, and underneath that one of my blog posts. Closely followed by Facebook and LinkedIn. If I wanted to stalk myself, this is clearly a good place to start.

A cute little feature is that my Facebook search result contains deep links to some Facebook features like “Send Message” and “Poke”. Way to get in with the 2.0, Yahoo.

After that, it starts getting a bit weird, and the results lose a lot of meaning. Some old documentation I wrote in a previous job shows up on the first page. Given that it’s badly out of date and hasn’t been updated for at least 3 years, I didn’t think this content would rank at all.

In terms of visuals, the search results are very Google-esque…nay, identical. Yahoo results are minimalistic with Web, Image, Video, News, Maps and More at the top of the screen and a link to the cached version located conveniently in a position which makes defending a case of plagiarism from Google infinitely hard. I guess the up-side to this is that people will hit Yahoo search results and feel like they’re in familiar territory.

Which I guess leads me into Yahoo’s foray into federated search, called Alpha. Yahoo claim that “Alpha is a new beta product from Yahoo!7 that introduces the concept of Federated Search. With Alpha, you can search across many different information sources all on one place”. Holy tuna, Batman! “Search across many information sources from one place”?….Sounds like a regular search engine to me. *bored* The quality of the search results doesn’t appear to be any different to regular Yahoo, but the UI is very different. Kind of like Live Search (and we all remember how that went)…

<fast-forward one week>

I’ve been using Y7’s search now for the week and I have to admit, I was actually quite comfortable with the results it was giving. When evaluating MS Live Search, I was constantly living in this fear that I was missing quality search results and would fall back to Google just to make sure I was getting the right information when I needed it. However, with Yahoo, I felt confident enough with what it gave me to not feel like I was missing out on good results. I honestly feel like I could replace Google with Yahoo if I needed to (which I don’t).

The next engine to go under the knife – Ask.com. They don’t have any locally branded content, and I’ve just got a gut-feeling this will be a difficult week :|

Getting CruiseControl.NET To Talk To Git

April 22nd, 2009 by Xerxes

CruiseControl.NET is an automated build system ported from Java to the .NET framework. The current stable release of CCNET is v1.4.3. Unfortunately this version of CCNET does not natively support using Git as a source control provider. So if you’re making the switch from (say) SVN or VSS, at the time of writing, you will have a few bumps in the road ahead. NB: This page assumes you have a working copy of Git running on your machine.

To get Git working with CCNET, I found the excellent ccnet.git.plugin project on Github. This code is a plugin for CCNET which exposes basic functionality (and a little more) to allow CCNET to use Git as a source repository.

Firstly you need to download said source and compile the binaries. In case you’re super lazy, here’s one I prepared earlier – ccnet.git.plugin binary download

The plugin works by dropping it straight into your CCNET server’s folder with the other binaries. In most cases, this will be c:\Program Files\CruiseControl.NET\server\. Make sure you restart CCNET afterwards.

The next thing is to configure your project to use git as the source control provider. The README has an excellent example of how to configure the project. My initial project block ended up looking something like this (renamed to protect the innocent):

  <project name="FittingApp.Project" queue="FittingApp.Project">
    <sourcecontrol type="git">
      <repository>git@bumblebee:FittingProject.git</repository>
      <timeout>30000</timeout>
      <executable>c:\program files\git\bin\git.exe</executable>
      <workingDirectory>C:\build\projects\FittingApp.Project\</workingDirectory>
    </sourcecontrol>

    <triggers>
    </triggers>

    <tasks>
    </tasks>

    <publishers>
      <xmllogger />
      <statistics />
    </publishers>
  </project>

One important thing to note is that the README (at the time of writing) doesn’t mention the timeout element you can use in your configuration. The default value is quite high. I prefer to lower it and found this property by perusing the tests.

Finally, after all that, everything should be done and ready to rock, right? Turns out not so. One problem I stumbled into (and took a while to resolve) was the build timing out when it was doing a fetch. The CPU was idle and there was no traffic over the network. The process would time out and the build would fail. The funny thing was that I could open a command-prompt console myself and fetch the remote repo no problem. But when being performed by CCNET, it would time out during the fetch.

After digging further, it looked like the SSH authentication wasn’t working and that the auth process didn’t accept the default SSH credentials I created earlier. I suspect it was waiting for me to enter a password for the remote git account. Of course there’s no interaction with this process, so eventually it times out. After a long back-and-forth with the problem, I got in touch with the author of the plugin and he suggested checking that the HOME environment variable is set to %USERPROFILE%, otherwise git wouldn’t be able to find the git config settings. This solved the problem, and the build started working sweet. (big props, Kevin – thanks :) )
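
If you want to sanity-check this before blaming the plugin, a small script run under the same account as the CCNET service can compare the two variables. This is only a sketch: `home_matches_profile?` is a hypothetical helper, and the comparison (ignoring slash direction, a trailing slash and letter case) is my assumption about what counts as “the same folder” on Windows.

```ruby
# Hypothetical helper: true when HOME points at the same folder as USERPROFILE.
# Takes the environment as a parameter so it can be checked without touching ENV.
def home_matches_profile?(env)
  home    = env["HOME"]
  profile = env["USERPROFILE"]
  return false if home.nil? || profile.nil?

  # Treat back- and forward-slashes, a trailing slash and letter case as equivalent.
  normalise = lambda { |path| path.gsub("\\", "/").chomp("/").downcase }
  normalise.call(home) == normalise.call(profile)
end

if home_matches_profile?(ENV)
  puts "HOME matches USERPROFILE"
else
  puts "HOME is unset or differs from USERPROFILE (HOME=#{ENV['HOME'].inspect})"
end
```
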

With all that done, you should now be staring down the barrel of a CCNET installation successfully talking to Git. Hope this helps someone else out there.

Search Engine Showdown: Google Vs Live Search

April 20th, 2009 by Xerxes

In (what I hope will be a complete) series comparing Google to other search engines, the first engine I’m testing out is Microsoft Live Search.

I guess the first (obvious) thing to try is to search for my name (on a side note, I’m soooo tempted to use “google” as a verb, but that would be inappropriate when testing out the competition, no?). The first two results are correct, or at least relevant (ie: my website) and the rest of the results are neither here nor there in terms of relevance – there really wasn’t a whole lot it could do with my name except find literal matches in page content.

One thing which surprises me is that the results returned vary greatly depending on how many results I show on each page. With the limit set to the minimum of 10, I seem to get results all about me, most of them pretty relevant chronologically. However, if I switch it to 30 results per page the niceties pretty much drop dead as the search results spew into variations of my Facebook profile in different culture sub-domains (ja-jp being the most relevant out of about 7 others). FAIL.

Live Search (unlike Google) has a neat little equation solver (example), which would have been great about 10 years ago when I was actually doing calculus and solving quadratic equations. Relevant now? Probably not. I would expect Wolfram Alpha to drop a big steaming shizzle all over this feature given its company history. So Maths equation solving – FAIL.

One problem I’m finding is that I’m just not used to the format of search results from Live Search. I find that if what I’m searching for is not essential to what I’m doing (or should be doing) it turns me off and I want to just leave the site without getting any results. This is a very bad UX and it’s probably all in my perception of what “good” search results look like. Must fight urge to judge a book by its cover.

……<fast forward a few days>……

And so it is I come to the end of the week, and in all seriousness it couldn’t come soon enough. I tried, I really did. Microsoft has a loooong way to go before they could even begin to think about claiming that their search engine is actually a competitor to Google, and not just another smoking pile of crap. You know things are in trouble when you need to create a short-cut to Google’s search because the Live results are just plain inadequate.

Suffice to say, I’m very disappointed with Live Search and don’t think it’s ready to be considered a contender for search king of the net. I’m glad to get my browser away from it and move onto something else.

Yahoo – stand up. You’re next.

Managing Multiple Git Accounts

April 18th, 2009 by Xerxes

I’m in a situation where I want to keep different settings for several Git repositories. My work’s Git repo and settings (like email address and private key) would be different to my GitHub email address and key.

After following the setup details on GitHub for setting up a username and email, and providing your SSH keys, I was left in an awkward situation where my global configuration was set up for GitHub, but I didn’t know how to configure my work repo to authenticate properly.

It turns out that Live search does actually work for one scenario, and I found another guide on GitHub explaining everything required to configure multiple Git accounts.

  1. What’s most important is knowing that unless you’re using the same public/private key pair, you will need to generate a new key for the server, and give it a filename different to the default id_rsa
    $ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/c/Documents and Settings/Xerxes/.ssh/id_rsa): /c/Documents and Settings/Xerxes/.ssh/id_rsa_github

    This file needs to be given a name different to the default id_rsa, ideally one containing the name of the host it is for.

  2. Once the key is generated, you need to create a config file in your ~/.ssh/ directory. This file allows you to configure connection settings per host (and therefore per remote repository), overriding the default key that SSH would otherwise use.
    Host github.com
      HostName github.com
      User git
      IdentityFile ~/.ssh/id_rsa_github
    

    Save that file.

  3. One final step in the mix is to configure the repo itself to use the correct email address when committing to the git repo. This is really only to ensure that the commit history has a valid email address associated with it. For instance, I don’t want my private email address being recorded in my work commit logs, and similarly I don’t want my work email address getting recorded in my GitHub commit logs.

    I’m sure there would have to be a way to do this using the console, but the way I know to set the email address for a single repo is to use the git gui command, go to Edit -> Options and do it via the interface.


    git gui repo configuration
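
For the console route mentioned in step 3: running git config without the --global flag inside a repository writes the setting to that repository’s own .git/config, which takes precedence over the global value for that repo. A minimal sketch that builds the command follows; `repo_email_command` and the example address are placeholders invented for this illustration.

```ruby
# Hypothetical helper: builds the git command that sets the commit email for
# one repository only. Without --global, git config writes to that repo's
# own .git/config.
def repo_email_command(email)
  ["git", "config", "user.email", email]
end

command = repo_email_command("me@work.example") # placeholder address
puts command.join(" ")

# To apply it, run this from inside the repository's working directory:
#   system(*command)
```

The same pattern with user.name covers the display name recorded on commits.
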

Now you should be right to issue any commands to GitHub and have it authenticate using the key. When you push back to the origin, it will now also use the repo settings and not the global settings.

EDIT: For some reason, I omitted the “.com” in the github.com host entry. Thanks to @davetchepak for the pickup.

Specification Testing in .NET using Ruby

April 15th, 2009 by Xerxes

Not long ago, I posted a big fat blat of information from my investigation in trying to get Ruby based spec testing integrated with .NET. In this post, I make some sense of all that content and (more importantly) drop a sample of taking advantage of this. (nb: This post is essentially a direct-rip of an internal document I created for this purpose)

Overview

The purpose of this page is to run through a process which will ultimately allow the reader (that’s you) to write Ruby based specifications for your .NET code.

Why?

Why would you want to do this? The intended purpose for this practice is to gain the most benefit when doing BDD. Trying to do BDD in C# results in a lot of syntactical noise in the code which distracts from the goal of having clear, readable specifications of how the intended function should behave. Additionally, any traditional C# BDD toolset requires the specifications to be statically compiled into a test binary in order to be executed. The advantage of using Ruby is that the scripted nature of the language allows physical (as well as logical) separation of specifications from code, opening up the possibility that specifications are written by non-technical folk. Furthermore, the Ruby syntax lends itself to building DSLs perfect for the purpose of allowing clean, almost human-readable code.

Scope of investigation

The investigation work preceding this post was set 3 goals:

  • Determine the viability of using Cucumber as an automated feature verification utility
  • Determine the viability of using RSpec as an automated specification verification utility
  • Determine the viability of using IronRuby as a conduit to allow Ruby specifications to execute against compiled C# code (applicability to any other CLR-supported language is then assumed).

Investigation results

The results of the investigation showed that:

  • RSpec is currently not supported on IronRuby due to a number of bugs in the IronRuby project. (Based upon a discussion with @jschementi and this article (toward the bottom))
  • Accordingly, Cucumber is currently not supported on IronRuby due to it using RSpec internally (explained on the Ruby forums)
  • The IronRuby team have worked to incorporate support for a leaner specification testing tool, MSpec, which is very similar in syntax to RSpec, but not as functionally complete.
  • MSpec will work with IronRuby to write Ruby based specifications to verify .NET compiled applications.

What tools are we using?

Based on the results of the investigation, the best way to approach this method of testing is to use the MSpec library to write specifications against C# code and execute them using IronRuby. In future, once IronRuby is more stable, we can look at migrating over to Cucumber for feature-style verification on top of M/RSpec.

RubyGems

RubyGems is a package which allows you to download Ruby components and utilities (known as Gems). The default RubyGems package which comes with the one-click installer might be outdated when you download it, so the best thing to do here is to update RubyGems to the latest version:

gem update --system

In the event you’re behind a company firewall, or you need to use an HTTP proxy for whatever reason, you need to tell the gem command to use the HTTP proxy as it doesn’t honour your default internet options. Substitute the server and port where appropriate, set the variable and then run the gem update:

SET HTTP_PROXY=http://your.proxy.server:3128

You’re now minty fresh with the latest RubyGems package.

IronRuby (IR)

Go and download the latest release of the IronRuby project from ironruby.net. The current “official” pre-release is v0.3, and it doesn’t have any installer. To “install”, make sure you extract the contents to the location:

c:\ironruby\*

This is the standard installation location for IR. Once there, it’s recommended that you update your system path to include the path to IR’s bin folder.

Setting PATH Environment for IronRuby



Required Gems

We now get to the part where you need to install some of the gems required for specification testing. As mentioned at the start, RSpec and Cucumber aren’t 100% working with IR just yet, however it’s worthwhile installing them anyway to test things are working as expected and whatnot.

gem install mspec
gem install cucumber
gem install win32console
##gem install rspec
##gem install hoe

The last 2 should automatically be installed when you install cucumber as they’re dependencies, but if they aren’t, make sure you install them! If you really want to keep it lean you can get away with just mspec and none of the others.

Now the standard install of IR has its own repository of gems which can be managed through IR’s igem utility. The reason we don’t use IR’s gem utility to install mspec is because mspec is a pretty special script which (basically) allows us to tell it which Ruby interpreter we wish to use for running tests (explanation, thanks to @jredville). The neato thing here is that we then don’t need to install mspec specifically for IR, we can repurpose MRI’s version.

Testing it out

Now that all the major stuff is installed, let’s test it out by creating a simple app in C# and writing a specification in Ruby to verify its behaviour. Create a new folder to save the source files in.

using System;

namespace HelloWorld
{
    public class HelloClass
    {
        public string SayHello()
        {
            return "Hello from C#";
        }
    }
}

Here we have a class which returns a string when the method SayHello() is invoked. Save it as HelloWorld.cs.

require "mspec"
require "HelloWorld.dll"

describe "the hello dot net app" do
	before do
		@app = HelloWorld::HelloClass.new
	end

	it "should say hello from c#" do
		@app.say_hello.to_s.should == "Hello from C#"
	end
end

This is our specification for the behaviour of the application. Save it as sayhello_spec.rb.

To compile the C# class, open up a Visual Studio Command Prompt, CD to the source directory and type

csc /target:library /out:HelloWorld.dll HelloWorld.cs

…and now to run this puppy:

mspec -t c:\ironruby\bin\ir.exe sayhello_spec.rb

Here, we are invoking the mspec ruby script and passing two arguments. The first, -t c:\ironruby\bin\ir.exe, tells the ruby script that we wish to execute the mspec specifications using a different Ruby interpreter to MRI. The interpreter we want to use in this case is IronRuby. The second argument tells it which spec we’re running. When mspec runs, it finds the -t argument and hands off execution of the spec to another instance of mspec executing under IronRuby. This gives us the flexibility of being able to execute standard ruby specs and also calling out to IronRuby for .NET interop if needed.

The observant among you might notice that the call to @app.say_hello has a to_s chained afterward. IronRuby will return a ClrString as the object type when the interop call returns a CLR type string. The CLR’s ClrString and Ruby’s string are not interchangeable. You need to call to_s on the ClrString to treat it like a ruby string. This behaviour is at least explained, albeit I need to dig deeper to understand why they couldn’t have an implicit cast operator (or dynamic language equivalent thereof).

One thing that’s important to note is that although I’ve dropped a few source files without too much explanation, you would actually build this up iteratively using the same TDD testing style you’ve always been used to. In fact, this form of specification testing makes test-first easier to do.

Further work

This article gives a straightforward overview of how to begin testing C# code with Ruby but it doesn’t go all the way.

  • Ideally we would like to use Cucumber for automated feature acceptance test verification. Unfortunately the current build of IronRuby doesn’t work with Cucumber and RSpec but there should be ways to get the current IR implementation to work with a few tweaks.
  • Need to define and configure a standard project skeleton such that you don’t need to download and extract IR in order to get the system working. In a perfect situation, we could download only the source for the software without requiring any dependencies installed (including ruby!).

One Search Engine Per Week

April 14th, 2009 by Xerxes

I had a little time over the long weekend to reflect on things in the past, and one conversation which came to mind was a casual chat with an engineer at Yahoo7 I met at a party about a year ago. I can’t for the life of me remember his name, but I do remember our conversation.

Maybe it was bravado, maybe it was arrogance, and it certainly was alcohol induced, but I asked him point blank “You work at Yahoo7. Compared to Google, how do you personally find Y7′s search results?”. Not unlike me to put some fuel into the fire, I was kind of expecting him to defend his company, defend the search engine backing his company’s website, stomp his foot and slap me across the face with a glove………and he did (except for the glove).

The reason this moment stuck with me wasn’t because he launched into a tirade of fact vs fiction and MapReduce mumbo-jumbo, but because his answer was a brutally honest “Personally, I find the results on par and sometimes maybe a little less than Google, but the real test is in inviting you to try it”. The night went on, and I’m sure I stumbled into a taxi and got home safely, but I never really forgot his response.

Admittedly I’ve been putting it off for a while, and on occasion I’ve considered doing it but always found an excuse to stay within the comfort zone that Google provides. Well, that changes this week as I’ve finally decided to bite the bullet and drop Google for a few weeks. I’ll try using a different search engine each week in my daily routine and see how it feels, and whether all search engines really are so close that Google’s superiority of results is just perception.

Having looked at some statistics of search engine market share, the candidates up for testing in this very un-scientific assessment are:

This at least helps me weed out most of the smaller players and the engines of eras long since forgotten.

Where appropriate, I’ve tried to use locally branded variants of the website purely for my own benefit. I’ve thrown Wolfram Alpha into the list because it has generated enough interest in the blogosphere in the last 30 days to at least warrant a look once it’s released.

Starting this week, I’m going to try Live Search. It’s set as my default search engine for Chrome and I’ll be consciously trying to use it over Google.

Wish me luck.

Xerxes Future Predictions – Computing Power

April 9th, 2009 by Xerxes

I was thinking a few days ago about how things have changed in IT and software over the last number of years and how some things were overwhelmingly successful and of course others were quite underwhelming.

This got me thinking about what will probably happen in a few years, and I thought I’d put it down on paper (of sorts) to look back at in a few years’ time and wonder wtf I was thinking. NB: I don’t make these statements because I actually think I know – I’m just taking educated guesses here…

In terms of computing power, we’ve seen Moore’s Law hold true for over 40 years, albeit with a slight shift around the early part of this decade when we started reaching the physical speed limits of single-core processors. However this problem was easily circumvented as manufacturers continued to reduce transistor size and throw more cores on the same die. Inherently this has shifted the problem of maximising software performance from raw CPU throughput to parallelism of software operations.

Of course, despite all of this progress, there will be a physical limit that current design and manufacturing processes can sustain. Beyond that, we would need to look into quantum computing to continue pushing up the speed and pushing down the size of computing.

Xerxes Predicts:

  • 1-2 years:
    • Consumer computing power will remain dual-core for netbooks and laptops
    • Quad-core processors will become more common for desktop machines
    • Transistor manufacturing technology will remain largely in the 45nm range, possibly getting smaller.
    • Quantum computing will remain more research based
  • 2-5 years:
    • Power-efficient processors like Intel’s Atom used for netbooks will get faster in CPU – meeting current desktop speed limitations
    • Notebooks/netbooks will have quad-core as standard, with more power pushed to individual components (eg: video, I/O)
    • Desktop computing will move to around 8-cores up to 10 or 12 per processor, however heating considerations will become more prevalent
    • Quantum computing will make its foray into the commercialised world. Until the technology is commercialised it will have virtually nil adoption. Possible uses for commercial quantum computing would be in research facilities for genome decoding, genetic folding or weather modelling.
  • 5-10 years:
    • Consumers will discuss CPU-cores in their computers as we currently discuss CPU speed
    • Quantum computing will become more prevalent in dedicated hardware electronics like Cisco routers or hardware firewalls.
    • Consumer commercialisation of quantum computing is still a little bit away but will bolster a new generation of technological advancements

Ruby and .NET for the uninitiated

April 6th, 2009 by Xerxes 1 comment »

Coming in with no experience in either Ruby or IronRuby, this is just a big dump of my thoughts on the topic as I make my way around learning them both and bringing them into direct use within our project.

  • The most frustrating thing, which I want to say upfront and get out of the way now – almost all documentation I’ve found for Ruby (the language) is discussed in the context of using Rails. Whilst this might be true for 95% of Ruby users, it frustrates the poo-poo’s out of me. Rails is quite obviously a very advanced framework, and it really helps blur the distinction between what is “Ruby” and what is “Rails”. Particularly when it comes to shortcuts provided by the Rails framework that I haven’t found elsewhere (for example, ruby script/generate rspec).
  • Installing Ruby is as difficult as downloading the installer for Windows and running it. That’s where my degree pays for itself.

  • Of course, depending on how old the download package you’re using is, you’ll need to update your gems to the latest version with gem update --system

  • In the event you’re behind a company firewall, or you need to use an HTTP proxy for whatever reason, you need to tell the gem command to use the HTTP proxy, as it doesn’t honour your default internet options: SET HTTP_PROXY=http://your.proxy.server:3128, substituting the server and port where appropriate.

  • When it comes time to play with IronRuby, you’ll need to install a few gems which didn’t seem to come with the pre-built version of the assembly.

    Firstly, make sure you put your IronRuby bin directory into the path. If you want to use RSpec (which you do), then you’ll need to igem install rspec and igem install hoe. If you do igem install cucumber you should get both for free.

    Secondly, it seems like recent changes to the IR project have broken the implementation of the expand_path method. Although there is claimed to be a workaround available, I have not had any luck with it thus far. (Will try recompiling from source.) A quick chat with @jschementi confirms there is something broken in the current build and that it will be fixed soon.

  • mspec works, though you might have some complications getting mspec to use the right Ruby interpreter. Big props go out to @jredville for helping me sort that one out. If you run mspec by itself or try to invoke it directly through IR, you will invariably end up with an error like this:

    c:\source\ruby\calcdotnet>imspec spec
    ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32]

    1) An exception occurred during: loading c:/source/ruby/calcdotnet/spec/calculator_spec.rb ERROR
    LoadError: 127: The specified procedure could not be found. - Init_calculator
    c:/source/ruby/calcdotnet/lib/calculator.dll
    c:/source/ruby/calcdotnet/lib/calculator.dll
    c:/ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
    c:/source/ruby/calcdotnet/spec/calculator_spec.rb:1

    Finished in 0.090000 seconds

    1 file, 0 examples, 0 expectations, 0 failures, 1 error

    The trick here is that mspec is designed to let you swap one interpreter out for another. If you pass mspec the argument “-t c:\ironruby\bin\ir.exe” it will use IR instead of the standard Ruby interpreter, e.g.: mspec -t c:\ironruby\bin\ir.exe spec/

Organising GMail’s Inbox and Weirdness with Starring Items

March 19th, 2009 by Xerxes 1 comment »

If you’re like me, you’ve already thrown out the system of creating folders and meticulously sorting your email on arrival. It was a frustrating system because it was time-consuming and, let’s face it, humans aren’t designed to be organised machines. So I love the GMail model where I don’t *have* to file my email; I can just read it and forget about it.

My system for prioritising mail works like this:

  1. If it’s obviously spam, I spam it.
  2. If it’s new, I read it.
  3. After reading, if it was useful information I’ll do one of two things:
    1. If I need to follow up on the item, I’ll mark it as unread.
    2. If I need to keep it for reference because I suspect I might need it quickly in the near future, I’ll star it.
  4. All other mail stays in the inbox, or is already covered by another filter which applies a label and archives it.

This means that my inbox can usually fill pretty quickly. I like being in control of what I archive and when, but the downside is that my inbox can grow out of control before I take it back. The important point here though, is that the software works for me and not the other way around.

My inbox is generally my default view, but I’ve also used some of the GMail Labs features to set up a system to quickly find the items I’d previously flagged for one reason or another. Using GMail’s QuickLinks lab feature, I’ve set up a search shortcut to my “Unread Or Starred” emails (label:starred OR is:unread in:inbox). Direct access, and I’m happy once again.

Unfortunately, I ultimately get to the point where even I realise just how much crap I’ve accumulated in my inbox. In order to sort out my “flagged” emails from all the other crap in the inbox, I use the following process:

  1. Open my “Unread or Starred” search link, and apply a label called “TEMP MARK” to the emails. GMail allows you to have multiple labels per message so no sweat here.
  2. Perform a search (or use another Quick-Link shortcut) with the following search criteria (!label:temp-mark in:inbox). This does an inverted selection on my inbox to pick out anything that doesn’t meet my criteria.
  3. Hit “Select all” and archive.
  4. Go back to the inbox and remove the temp mark from the current inbox items, mainly for cleanup.

Now the astute among you reading this would ask: why don’t I just use an inverted query of Unread or Starred (!label:starred AND !is:unread in:inbox)? Well, I’ve tried it and GMail screws it up and somehow returns more emails than it should. I really can’t understand why….

All suggestions are welcome :)

UPDATE
I just spent 10 mins trying to work it out – it’s so baffling….
My inbox contains 10 items, of which I have specially flagged 7, so I would expect the inverted query to return 3. But it returns 5. What are the 2 extra emails? They are emails which were starred from within a conversation and not from the item list. It seems like GMail doesn’t recognise items which are starred within the conversation view as having the label “starred”, so my initial query of “Unread Or Starred” isn’t actually bringing everything back in the first place. *sigh*
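The arithmetic in that update can be modelled with a few lines of Python. This is only a sketch of the boolean logic, not of how GMail's search actually works: the Message class, the star_indexed field and the as_search_sees_it switch are all made up for illustration, and the split of the 7 flagged items (4 starred from the list, 2 starred inside a conversation, 1 unread) is an assumed breakdown that happens to match the numbers above.

```python
from dataclasses import dataclass


@dataclass
class Message:
    subject: str
    starred: bool = False       # what the user sees in the UI
    unread: bool = False
    star_indexed: bool = True   # hypothetical: does search "see" the star?


def flagged(msg, as_search_sees_it=False):
    """Equivalent of the 'Unread Or Starred' query."""
    starred = msg.starred and (msg.star_indexed or not as_search_sees_it)
    return starred or msg.unread


def inverted(msg, as_search_sees_it=False):
    """De Morgan: NOT (starred OR unread) == NOT starred AND NOT unread."""
    return not flagged(msg, as_search_sees_it)


# Assumed breakdown of a 10-item inbox with 7 flagged messages.
inbox = (
    [Message(f"starred from list {i}", starred=True) for i in range(4)]
    + [Message(f"starred in conversation {i}", starred=True, star_indexed=False)
       for i in range(2)]
    + [Message("unread follow-up", unread=True)]
    + [Message(f"plain {i}") for i in range(3)]
)

expected = [m for m in inbox if inverted(m)]
observed = [m for m in inbox if inverted(m, as_search_sees_it=True)]

print(len(inbox), len(expected), len(observed))  # 10 3 5
```

If the two conversation-starred messages are invisible to the starred label, the inverted query picks them up as "not flagged", which is exactly the 3 versus 5 mismatch.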

References are not addresses

February 19th, 2009 by Xerxes No comments »

Just read this excellent post by Eric Lippert about why it is incorrect to describe a reference in .NET as a pointer to a memory location.

In the article he goes on to explain the differences between a pointer and a reference, and why they are not mutually interchangeable.

I have to admit, I’ve been incorrectly defining a reference. After reading this article from now on my definition of a “reference type” will be a type which contains a reference to an object held internally within the .NET GC. This reference is not *necessarily* a pointer and should be thought of more like a unique handle to the GC’s object than a pointer to a memory address.

Finding out what process is listening on port 80

February 14th, 2009 by Xerxes No comments »

Just saw an excellent tweet by @chadmyers:

FYI, to find which process has an open port: netstat -o -n -a | findstr 0.0:80
Where “80” is the port in question (i.e. 80, 443, etc). Get the PID (i.e. 5688) and open Task Manager, proc tab, add the PID column, sort

Awesome hack – I didn’t know this. More specifically, I didn’t know Windows had a poor-man’s grep in findstr.
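For anyone who would rather script the lookup, here is a rough Python sketch of the same idea. It parses text in the column layout that netstat -o -n -a prints on Windows; the SAMPLE output below is invented for illustration (only the PID 5688 comes from the tweet), and pids_listening_on is a hypothetical helper name. One small difference: findstr 0.0:80 is a substring match, so it would presumably also catch a listener on 0.0.0.0:8080, whereas this sketch compares the port exactly.

```python
def pids_listening_on(netstat_output, port):
    """Return the PIDs of TCP listeners on the given local port."""
    pids = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows for TCP have exactly: Proto, Local, Foreign, State, PID.
        if len(fields) != 5 or fields[0] != "TCP":
            continue
        _proto, local, _foreign, state, pid = fields
        # rsplit copes with both "0.0.0.0:80" and "[::]:80".
        if state == "LISTENING" and local.rsplit(":", 1)[-1] == str(port):
            pids.add(int(pid))
    return pids


# Made-up sample output, in the layout netstat -o -n -a uses on Windows.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING       5688
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       912
  TCP    0.0.0.0:8080           0.0.0.0:0              LISTENING       7312
  TCP    192.168.0.10:49152     10.0.0.5:80            ESTABLISHED     4020
  UDP    0.0.0.0:500            *:*                                    1044
"""

print(pids_listening_on(SAMPLE, 80))  # {5688}
```

To run it against a live machine you could feed it the output of subprocess.run(["netstat", "-o", "-n", "-a"], capture_output=True, text=True).stdout instead of SAMPLE.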

Take that, Skype!

What’s In A Code Review?

February 5th, 2009 by Xerxes 7 comments »

My team at work has a system whereby after a task is completed and committed to source control, it is assigned to a developer unrelated to the task for a code review. The idea here is that fresh eyes on the problem may pick up something not considered by the original participant(s).

But what happens during a “code review”? There’s no single answer to this question and my experience is that it varies considerably based on a number of factors. After thinking for a little bit about what I do when code reviewing, I came up with a list of stages in a code review and figured I might just share it with everyone. The list below outlines the process I (personally) go through when reviewing a change. I don’t actively pursue these topics one by one but I noticed that I do consider them subconsciously.

  • What was the problem being solved?
  • Do I have a clear understanding of the problem?
  • Can I reproduce the problem?
  • What code was checked in?
  • Is the code change unit tested?
  • Does the code change fix the problem (functional test)?
  • Does the code change address the problem (code analysis)?
  • Does the change consider other possible related problems?
  • Does the code change introduce new bugs?
  • Are there any stylistic recommendations?

What was the problem being solved?
Coming in as a fresh set of eyes means that I don’t know the problem apart from the very high-level description I heard during our stand-up meeting. If the change is trivial, or if there is enough information in the issue system, then I won’t bug the developer; otherwise I will ask them what the change was about and get them to describe the problem being solved.

Do I have a clear understanding of the problem?
For issues I don’t immediately understand, I tend to ask questions until I get an “aha” moment. With a solid understanding of what the enhancement or fix was, I’m better armed to think both inside and outside the box for any potential problems.

NB: Thinking outside the box is great, but if you can’t even think within the box, then you’ll struggle – ask as many questions as you need to until you completely get what’s going on.

Can I reproduce the problem?
This step is optional depending on the situation. I say that very carefully, because I’m of the opinion that if you can reproduce the problem then you should, just to confirm it actually existed in the first place. When working with a junior developer once, he spent a few hours trying to fix a problem and during code review we discovered that it was an environment-specific setting on his machine, not a problem with the system. This is a good reason to reproduce the problem first. Depending on the seniority of the developer, I’ll make a judgment call as to whether to try and repro or not, where more senior developers are more likely to be trusted to have identified the real cause of a problem.

One situation where you might not reproduce the problem at the start is based on your development workflow. If you perform reviews after the change has been made, then the only way you can reproduce is before you sync to the latest version (which contains the change), or you have to roll back the change. I don’t particularly like using this as an excuse though, and if I can comment out a few “key” lines in order to repro a problem, I will.

What code was checked in?
Source control is great. What was checked in for the fix? Generated code files? Hand-made changes? Modified related binaries?

If a file was committed because a modification was necessary, then (der) it had to be checked in. But what about some files which are modified but do not necessarily need to be checked in? One example in Delphi is when you modify a module, it makes changes to both the code file (.PAS) and the visual form file (.DFM). The PAS changes consist of your code, and unless you made specific UI changes, there is no reason why the designer should have changed the .DFM. It’s silly, it’s redundant and worse yet if it gets committed it adds unnecessary noise to your change history.

Another example more .NET pertinent would be autogenerating code from a DB or XML or whatever. If the schema hasn’t actually changed, then the only difference in your files will be an autogenerated comment of the date/time the file was modified. But the interfaces haven’t changed so who cares what date the file was last generated?
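A check like this is easy to automate before committing. The Python sketch below assumes the generator stamps its output with a single-line // comment containing the word "generated"; that pattern, the sample sources and the helper names are all assumptions for illustration, so adjust them to whatever your tool really emits.

```python
import re

# Assumed shape of the generator's header comment; adjust to the real tool.
TIMESTAMP_COMMENT = re.compile(r"^\s*//.*\bgenerated\b.*$", re.IGNORECASE)


def meaningful_lines(source):
    """Source lines with the generator's timestamp comment stripped out."""
    return [line for line in source.splitlines()
            if not TIMESTAMP_COMMENT.match(line)]


def only_timestamp_changed(old_source, new_source):
    """True when the files differ, but only in the timestamp comment."""
    return (old_source != new_source
            and meaningful_lines(old_source) == meaningful_lines(new_source))


# Made-up generated files for illustration.
old = ("// Generated on 2009-02-01 10:15:00\n"
       "public interface ICustomer { string Name { get; } }\n")
regenerated = ("// Generated on 2009-02-05 09:30:12\n"
               "public interface ICustomer { string Name { get; } }\n")
schema_changed = ("// Generated on 2009-02-05 09:30:12\n"
                  "public interface ICustomer { string Name { get; } int Age { get; } }\n")

print(only_timestamp_changed(old, regenerated))     # True: safe to revert
print(only_timestamp_changed(old, schema_changed))  # False: a real change
```

Files that come back True are candidates for reverting rather than committing.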

I hate making this mistake, so I’m sure to point it out in any code I review.

Is the code change unit tested?
I won’t get into the good vs evil of unit testing so much as to say that in most cases, I would expect to see some form of automated test checked in with the code. If it’s not tested, it’s impossible to prove the problem ever existed. I say “most cases” because on some projects (particularly the very brown/black-field ones), some testing is near impossible. I expect new code which doesn’t interact a lot with old code to have a number of comprehensive tests.

Does the code change fix the problem (functional test)?
This is pretty straightforward. Usually you will have a list of steps to reproduce the problem, or a way of examining the new feature. Follow these steps carefully and ensure that the base scenario works as expected. Then start trying alternatives in a methodical approach in order to determine that something immediately around the changed area hasn’t unintentionally been affected.

For instance, say you’re testing that when a button is clicked and option A is turned on, the widget will somersault. Then check that when the button is pressed with only option B enabled, the widget does a pirouette. Once satisfied, check that with both options, the widget still somersaults and also pirouettes at the end.

The number of permutations obviously grows exponentially as you introduce more variables to the problem. You can keep this manageable by using the theory behind pair-wise testing.
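To put some rough numbers on that, here is a small Python sketch. It is a naive greedy take on the pair-wise idea, not the algorithm any particular testing tool uses, and the function names and the four on/off options are hypothetical. The goal is a set of test cases in which every pair of option values appears together at least once, which needs far fewer cases than trying every combination.

```python
from itertools import combinations, product


def pairs_in(case):
    """All ((name, value), (name, value)) pairs exercised by one test case."""
    return set(combinations(sorted(case.items()), 2))


def pairwise_cases(parameters):
    """Greedily pick cases until every pair of values has been covered."""
    names = sorted(parameters)
    all_cases = [dict(zip(names, values))
                 for values in product(*(parameters[n] for n in names))]
    uncovered = set().union(*(pairs_in(c) for c in all_cases))
    chosen = []
    while uncovered:
        # Take whichever case covers the most still-uncovered pairs.
        best = max(all_cases, key=lambda c: len(pairs_in(c) & uncovered))
        chosen.append(best)
        uncovered -= pairs_in(best)
    return chosen


# Four hypothetical on/off options: 2**4 = 16 exhaustive combinations.
options = {name: [False, True]
           for name in ("option_a", "option_b", "option_c", "option_d")}
exhaustive = 2 ** len(options)
cases = pairwise_cases(options)

print(f"exhaustive: {exhaustive}, pair-wise: {len(cases)}")
for case in cases:
    print(case)
```

The greedy choice is not guaranteed to find the smallest possible set, but the gap between exhaustive and pair-wise widens quickly as more options are added.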

Does the code change address the problem (code analysis)?
Okay so you’ve tested the change from the front-end and it appears to work. Now what was changed in the code? There might be plenty of ways to solve a problem and it’s entirely plausible that the worst one was taken.

For example assume the following bug: When you click button A, an exception dialog is shown claiming that the widget was null and the process couldn’t continue. The worst fix in this case, is to swallow the exception and do nothing about it, potentially leaving the system in an unstable state. A better solution would be to find out the cause of why the widget was null and prevent the situation from arising in the first place.
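The difference between those two fixes can be sketched in a few lines of Python (None stands in for null here). Widget, the registry dictionary and the function names are all hypothetical; the point is only the contrast between hiding the failure and dealing with the missing widget where it originates.

```python
class Widget:
    def somersault(self):
        return "somersault"


def on_click_swallow(widget):
    """The 'worst fix': hide the failure and carry on."""
    try:
        return widget.somersault()
    except AttributeError:
        # The error dialog is gone, but so is any sign that something broke.
        return None


def load_widget(registry, name):
    """Deal with the cause: never hand out a missing widget."""
    widget = registry.get(name)
    if widget is None:
        raise KeyError(f"no widget registered under {name!r}")
    return widget


def on_click(registry, name):
    """Click handler that can rely on always getting a real widget."""
    return load_widget(registry, name).somersault()


print(on_click_swallow(None))                     # None: bug silently hidden
print(on_click({"carpet": Widget()}, "carpet"))   # somersault
```

With the second version, a missing widget fails loudly at the point where it should have been created, which is where a reviewer would want the real fix to go.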

This is not a question about style, or design (anti)patterns or comments – this is about whether the implementation was appropriate for the problem.

Does the change consider other possible related areas?
Joe makes a fix to the CarpetWidget class and sends it to you for review. Has Joe considered that the same problem exists in the HardwoodWidget, or the TileWidget? Sure, you might have functionally tested all relevant combinations, but tracing through the code in your head may expose other areas or conditions which you previously may not have considered, or may have deemed irrelevant.

Does the code change introduce new bugs?
So the developer has fixed the problem and they’ve covered all possible conditions – great! Now what’s wrong in THEIR code? New bugs can be easily introduced without realising and can be quite insidious. One example of this might be using an unmanaged resource and not disposing after use. Or catching an exception and not re-throwing it the correct way. Simple mistakes which may never show up, but if they did they could be considerably problematic.
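Both of those mistakes have close analogues in most languages. The sketch below uses Python rather than .NET, so treat it as an illustration of the idea and not of the .NET APIs: a with block plays the role of disposing a resource, and a bare raise re-throws the original exception with its traceback intact. TrackedResource and the function names are made up for the example.

```python
import io


class TrackedResource(io.StringIO):
    """Stand-in for a resource that must be released (hypothetical)."""


def read_leaky(resource):
    # Bug: nothing here ever closes the resource.
    return resource.read()


def read_safely(resource):
    # 'with' guarantees close() runs, even if read() raises.
    with resource:
        return resource.read()


def parse_port(text):
    try:
        return int(text)
    except ValueError:
        # Log or clean up here, then re-raise the *same* exception so the
        # original traceback survives for whoever handles it.
        raise


leaky = TrackedResource("data")
safe = TrackedResource("data")
read_leaky(leaky)
read_safely(safe)
print(leaky.closed, safe.closed)  # False True
print(parse_port("80"))           # 80
```

Neither bug shows up in a quick functional test, which is why they are worth looking for while reading the diff.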

Are there any stylistic/design recommendations?
This is by far the most abused by most people performing a code review. In my experience most developers generally fall into three camps. The first camp are developers who have suggestions to improve style and/or design but are too afraid or shy to come forward and express their opinion. These people are not helpful to a development team. They are short-circuiting the feedback cycle which doesn’t promote knowledge sharing. Everyone has an opinion. Some people have better ones than others. If you don’t communicate, one (or both) parties will never improve.

The second camp consists of developers who have nothing but petty and small suggestions. Their approach to a code review is “My way would be different, so I’ll pick on all the small things which make me feel better”. For example, changing a “while” loop to a “do..until” loop because it might be semantically correct. Seriously, who cares? The while loop is a much more common looping paradigm than a do-until and therefore more likely to be understood first time by anyone else reading the code. In the absence of any technical reason why the loop should be changed, it’s an unnecessary waste of time and not productive to the end goal (delivering the product).

The third camp consists of developers who acknowledge the flaws in the other two camps and consciously balance what is important and what isn’t. These are people who have a fair number of years under their belt and appreciate the value of time as a commodity. Sure, changing that while loop to a do-until would be cute but the code is already written, it works, it’s tested and just because it’s not mine doesn’t mean it’s bad!

A lot of developers think that a “code review” involves only this one stage and fail to really consider many (if any) of the other points above. (“Yeah it looks good, that’ll do”). I once worked in a place where the code review process consisted of my manager scrolling up and down through the diff-comparison and checking that there looked to be enough lines modified. Needless to say some people are not cut out for the job.

And that’s how it works for me. Writing this post has helped me solidify the process a little more, even if only in my own head. I welcome your thoughts on the topic – maybe there’s something I’m missing?