Categories
Linux OS/Software Perl Programming Languages Technology Uncategorized

Verifying the Tube station “mackerel” factoid

I’ve heard the following little nugget of information before:

St John’s Wood is the only station on the London Underground network which does not contain any of the letters in the word “mackerel”.

I have never had any reason to doubt this, but neither had I checked it. It popped up again recently on a Facebook thread. Someone suggested that it was not true (or perhaps no longer true) because of Hoxton. However, strictly speaking Hoxton is on the London Overground network, not London Underground.

Anyway, I decided that I should go ahead and check the veracity of the statement. The geeky way.

  1. After a while searching for an existing plaintext list of Tube station names, I couldn’t find one, so instead I downloaded the Wikipedia page List of London Underground stations, had a quick look at the source HTML to see if it would be easy to pull the station names out of the page.

  2. It was; each line of the table starts with <th scope=”row”>, and follows a set pattern after that, as you might expect:

    <th scope="row"><a href="/wiki/Acton_Town_tube_station"
    title="Acton Town tube station">Acton Town</a></th>

    This is all one line in the original HTML, I’ve just broken it to two for display.

    So I can pull out just the lines containing station names using a simple grep.

  3. All I’m interested in is the bit between the opening <a …> and closing </a> tag. At this point I tend to resort to Perl to do anything remotely complex with regex replacement.

    $ grep '<th scope="row">' \
    list_of_london_underground_stations_from_wikipedia.html \
    | perl -pe 's/.*>([^<]+)<\/a>.*/$1/' \
    > list_of_london_underground_stations.txt

    Since I’ve broken out the Perl for this job, I could have thrown away the initial grep and incorporated it into the Perl instead, but this is just a quick hack and it already works, so why bother? I’m not aiming for elegance here.

  4. Verify the list:

    $ cat list_of_london_underground_stations.txt
    Acton Town
    Aldgate
    Aldgate East
    Alperton
    [...]
    Watford High Street
    Watford Junction
    Watford Vicarage Road
    $ wc -l  list_of_london_underground_stations.txt
         275 list_of_london_underground_stations.txt
    

    Looks good. We have just the station names, one per line, and there are 275 lines, which sounds about right. [The list includes a few planned stations at the end. I decided to keep these in.]

  5. Now I can grep for those station names for those not containing any of the letters in mackerel:
    $ egrep -vi '[mackerl]' list_of_london_underground_stations.txt
    St. John's Wood

And there you have it. Assuming of course that the Wikipedia list is correct and complete, the factoid is confirmed. And it took just a couple of minutes. Plus about 20 minutes writing it up afterwards…

Of course, now I can substitute other words for ‘mackerel’ too. For example, there are three stations that do not contain any of the letters in ‘herring’.

For your convenience and further exploration, you may download my plain text list of London Underground stations. Remember, it includes every station on the Wikipedia page as of 4th March 2015, including several stations that are at the planning stage so do not actually exist yet (or not on the current Tube network anyway); these are at the end of the list and are: Battersea, Cassiobridge, Nine Elms, Watford High Street, Watford Junction, Watford Vicarage Road.