Archive for December, 2011

Project Euler 1

Sunday, December 11th, 2011

I took a look at the first problem on the Project Euler site earlier this afternoon:

http://projecteuler.net/problem=1

This asks you to find the sum of the natural numbers that are less than 1000 and are multiples of 3 or 5.

I decided that Haskell’s filter function and a lambda would make this problem trivially easy to solve. In GHCi:

Prelude> let sumMultiplesOf3Or5 max = sum(filter(\x -> (mod x 3 == 0) || (mod x 5 == 0))[1 .. max - 1])
Prelude> sumMultiplesOf3Or5 1000
233168

As I played around with other values for max, something unexpected appeared before my eyes. A pattern of repeated digits begins to emerge in the sums of the multiples of 3 and 5 less than 10 to the power of i as i increases.

Prelude> map sumMultiplesOf3Or5 (map (\i -> 10 ^ i) [1..6])
[23,2318,233168,23331668,2333316668,233333166668]

Something similar happens with the sums of the multiples of 3 or 5 less than 20, 200,…,30,300, and so on:

Prelude> map (\j -> map sumMultiplesOf3Or5 (map (\i -> j * 10 ^ i) [1..6])) [1..9]
[[23,2318,233168,23331668,2333316668,233333166668],[78,9168,931668,93316668,9333166668,933331666668],[195,20850,2098500,209985000,20999850000,2099998500000],[368,37268,3732668,373326668,37333266668,3733332666668],[543,57918,5829168,583291668,58332916668,5833329166668],[810,83700,8397000,839970000,83999700000,8399997000000],[1133,114218,11432168,1143321668,114333216668,11433332166668],[1428,148668,14926668,1493266668,149332666668,14933326666668],[1845,188550,18895500,1889955000,188999550000,18899995500000]]

Reformatted:

[[23,2318,233168,23331668,2333316668,233333166668], -- [10,100,1000,10000,100000,1000000]
[78,9168,931668,93316668,9333166668,933331666668], -- [20,200,2000,20000,200000,2000000]
[195,20850,2098500,209985000,20999850000,2099998500000], -- and so on
[368,37268,3732668,373326668,37333266668,3733332666668],
[543,57918,5829168,583291668,58332916668,5833329166668],
[810,83700,8397000,839970000,83999700000,8399997000000],
[1133,114218,11432168,1143321668,114333216668,11433332166668],
[1428,148668,14926668,1493266668,149332666668,14933326666668],
[1845,188550,18895500,1889955000,188999550000,18899995500000]]

I’m not sure if this continues indefinitely. I wonder if similar patterns emerge with numbers in different bases. I’m curious about trying multiples of numbers other than 3 or 5. At this point, I’ve really no idea why this happens, but my curiosity is aroused.

I’m also really impressed with Haskell, a language that I barely know. The only other time I’ve used functional programming seriously is with C# and VB.Net for LINQ, of which I am a huge fan.

GHCN Monthly and MS Access

Sunday, December 4th, 2011

For some of the last few evenings, I have been learning about MS Access’s data import functionality in order to interrogate the data of the Global Historical Climatology Network-Monthly dataset. This dataset holds records of temperature readings dating back to the beginning of the eighteenth century from more than seven thousand stations around the globe.

GHCN Monthly

The data are in text files with a fixed width format. It’s very straightforward to set up the import format (although there are many columns!) and save the import specification so that future datasets can be imported as they are released with ease. The largest data files have almost half a million rows, yet Access can import the data in a few seconds. The resulting table of readings of average, minimum and maximum temperatures for each month for each station has more than one million rows. I have not experimented with adding indexes yet but, in spite of this, I can run queries that are not painfully slow.

Now that I have the data into an Access database, I hope to start analysing the data. I have produced a couple of charts for a single station but am yet to run any serious calculations. I have read of a similar project at:

A Quick and Dirty Analysis of GHCN Surface Temperature Data

The algorithm that caerbannog (the blogger) uses in his C++ program to smooth the data and calculate averages is fairly simple. Something similar should be possible (or even easy) using the MS Office tools.

Ultimately, my aim for this is to make this the basis of an ICT lesson or project. As caerbannog notes “Never in history has science been more accessible to the general public than it is now.” The quantities of data that can be accessed for free are as enormous as the power of the computers that are now cheaply available. I hope that my students will be inspired to look deeply into the issue and this will help develop their sense of empirical curiosity.