People don’t sit on Park Benches at Random

March 16th, 2014

As I was approaching Berkeley Square at lunchtime on Thursday, I noticed a pattern in the way that the people were sitting on the park benches.

2014-03-13 12.45.16-small

Even though it was a sunny day, the square was not very full, and there were about twice as many bench seats as there were people. Nobody wanted to sit too close to another stranger (this is London, after all), so they sat the maximum distance apart that they could. This resulted in evenly distributed people – Empty, Person, Empty, Person, and so on. The sequence cannot be called random because the people are actively deciding where to sit.

I decided to look at a simple random sequence in F#.

To start off with, I decided to define a type for my program. This might not sound like a promising starting point for a program, but in F# it’s so painless that it is a good way to begin thinking about the domain and what states make sense in your domain. For more on this, see Scott Wlaschin’s slides on Domain Driven Design in F#.

My domain is coin tossing. Therefore, I created the following discriminated union in F# interactive:

> type CoinToss = Tails | Heads;;
 
type CoinToss =
  | Tails
  | Heads

This allows a coin toss to be either heads or tails but not both or neither. Static typing allows the program to make definite statements about the world.

I also needed a random number generator:

> let r = System.Random();;
 
val r : System.Random

F# allows you to define infinite sequences that can be evaluated lazily. The following is a potentially infinite sequence of random coin tosses.

> let coins = Seq.initInfinite (fun _ -> if r.Next(2) = 0 then Heads else Tails);;
 
val coins : seq<CoinToss>

r.Next(2) means the next random integer less than 2 but greater than or equal to 0, that is a random 0 or 1.

If the idea of an infinite sequence of coin tosses sounds weird and unruly, here’s how we can make use of them. The take function allows us to tame infinity. Or at least take a little bit of it.

The following says “Take 10 of the randomly generated coins and pass them along (|>) to a function that iterates through them one at a time and prints them out”

> Seq.take 10 coins |> Seq.iter (printfn "%A");;
Heads
Heads
Tails
Heads
Heads
Tails
Tails
Heads
Tails
Tails
val it : unit = ()

That sequence is not very dissimilar from the lunchers on the park benches in the photo above. There are no long clusters of one thing (person, heads, etc.) or another (empty seat, tails, etc.).

However, this isn’t always the case:

> Seq.take 10 coins |> Seq.iter (printfn "%A");;
Tails
Tails
Heads
Heads
Heads
Tails
Heads
Tails
Heads
Heads
val it : unit = ()

or even:

> Seq.take 10 coins |> Seq.iter (printfn "%A");;
Tails
Tails
Tails
Tails
Tails
Heads
Heads
Heads
Heads
Heads
val it : unit = ()

Five tails followed by five heads looks completely fixed. In a casino, you might start to get worried. But it happened. Trust me.

One way to check if the coin is being tossed fairly, is to count how many times each face lands. This function does that:

> let rec countCoins coins (heads, tails) = 
    match coins with 
    | [] -> heads, tails
    | hd :: tl ->
        let newCounts =
            match hd with
            | Heads -> heads + 1, tails
            | Tails -> heads, tails + 1
        countCoins tl newCounts;;
 
val countCoins : coins:CoinToss list -> heads:int * tails:int -> int * int

The way to read this function is as follows.

If the argument coins matches an empty list, return the counts for heads and tails as they are.

Otherwise, coins must be a list with a first element (hd) and the remaining part of the part of the list (tl), which might be an empty list. We need to find the new counts of heads or tails by adding 1 to either count. Finally, we call the countCoins function again with the remainder of the list (tl). This will continue until we have stripped all the first elements from the list and tl is an empty list. The function then terminates as we have satisfied the first condition above.

For convenience, I defined a function that gets a number of coin tosses and puts them into a list to be counted:

> let getCoins numberOfCoins = Seq.take numberOfCoins coins |> Seq.toList;;
 
val getCoins : numberOfCoins:int -> CoinToss list

For a short list, the counts for each face might not be equal:

> countCoins (getCoins 10) (0, 0);;
val it : int * int = (6, 4)

Note that the counts for heads and tails start off as zeroes.

However, with a larger number of coins (in this case, pown 10 7 or 10 to the power of 7 or 10 million), the counts should both approach half the number of tosses:

> countCoins (getCoins (pown 10 7)) (0, 0);;
val it : int * int = (4997646, 5002354)

Walking on Hampstead Heath

March 16th, 2014

Living in London certainly makes one appreciate the few sunny days that we do get.

Half of London seemed to be on Hampstead Heath this afternoon.

2014-03-16 12.55.51-small

It’s unseasonably warm at the moment. It is certainly very strange to see Londoners dressed in flip flops and shorts when the leaves haven’t even come on on the trees yet.

Panorama

I just hope that the good weather lasts!

Using F# to minimise a function

August 22nd, 2013

Finding the minimum of functions is at the heart of optimisation. Mathematicians, engineers and programmers have come up with a large number of approaches to solving this problem, including differentiation, genetic algorithms and even exhaustive search.

Consider a quadratic function such that could be written in F# as

let f x = (x ** 2.0) - (2.0 * x) + 1.0

Finding the minimum means finding the input value for the function that returns to lowest value. If we plot the curve, then the minimum is the lowest point of the curve.

One way to find this is find the derivative of the function:

(2.0 * x) - 2.0

and solve the equation where the derivative is equal to 0, which is when x is 1. This probably the best way to solve this problem in practice.

However, not all functions have derivatives that are easy to find. Therefore, an alternative way to minimise a function is an exhaustive search of the inputs in some range. This is not an elegant or generally efficient solution, but it works.

In F#, we might proceed as follows.

We define the range that we want to search and the step between candidate inputs:

let min = 0.0
let max = 10.0
let step = 0.01

We know where to start searching- with the lowest candidate input:

let firstCandidate = f min, min

This creates a tuple of two floats, the first being the output and the second being the lowest candidate input.

We also want a sequence of tuples of two floats that are the remaining candidate solutions:

let remainingCandidates = seq { for c in min + step .. step .. max do yield f c, c }

Note that at this point the sequence has not been enumerated and the function has not been run with each of the candidate inputs. Because sequences are evaluated lazily, the function will not be run for each of the candidate inputs until it is asked for.

We are trying to minimise the function, therefore we need a function that can compare the first elements of two candidate solutions:

let findMin currentMinSln candidateSln =
    match fst currentMinSln < fst candidateSln with
    | true -> currentMinSln
    | false -> candidateSln

Running this code in fsi shows its type to be:

val findMin : 'a * 'b -> 'a * 'b -> 'a * 'b when 'a : comparison

The F# compiler can infer that the function takes in two tuples, each with two elements. The tuples must be of the same type and the first element of each tuple must be comparable. Way to go, type inference!

Everything has been set up at this point. We just need to run the code:

let minSln = Seq.fold findMin firstCandidate remainingCandidates

The fold function is to reduce a sequence to a single value, starting with a given accumulator. In this case, the first candidate solution that we created earlier is our starting point. The output of the function for each candidate input is compared to that accumulator.

Running the code finds the same answer (1) that we found with calculus.

This code runs pretty quickly, but this approach is generally slow. Our range is only 10 wide and the step is 0.01, so there are only 1000 candidate inputs. Increasing the size of the range or decreasing the step would increase the number of inputs. More worryingly, the size of the search space increases exponentially as the number of inputs to the function increases. So a function that took two inputs for the same range for each input would have a million candidate inputs, three inputs would take a billion and so on.

However, it is also quite straightforward to parallelise this approach. The sequence of candidate solutions could be split up into partitions and each partition could be sent off to a different computer. Each batch would find a local minimum for the range it was given. Finding the minimum for the whole of our range of inputs is simply a matter of find the minimum in the returned list.

The complete code for this can be found here:

https://github.com/robert-impey/CodingExperiments/blob/master/F%23/Loose/MinimiseQuadratic.fs

Project Euler 1 in F# using Sequences

August 21st, 2013

I have written previously about solving the first problem on the Project Euler website using F# and Haskell.

The problem is to find the sum of the natural numbers that are divisible by 3 or 5 that are less than 1000. To solve this in F#, my first solution looked like this:

[0..999]
|> List.filter (fun x -> x % 3 = 0 || x % 5 = 0)
|> List.sum

This works. However, it was pointed out to me that using lists to generate the numbers below 1000 in my F# solution was memory hungry as the list will be created and stored in memory then piped to the filter and so on. The list is evaluated eagerly. This isn’t a problem with a small list like this, but it could create issues with more elements.

I took a look at sequences in F#, which are evaluated lazily. To get an idea of what this means, consider the sequence of natural numbers:

let countingNumbers = 
    seq { 
        for i in 1 .. System.Int32.MaxValue do 
            printfn "Yielding: %d" i
            yield i 
    }

If you are unfamiliar with lazy evaluation, it might look like this will create all the natural numbers up to System.Int32.MaxValue (2147483647) and print each number as it goes. However, if you run this with F# Interactive (aka fsi), it creates a seq<int> without printing anything out. At this point, the sequence is yet to be enumerated.

The ingenious thing about sequences is that we can work with them without enumerating them. Consider these lines:

let max = 999
let firstUpToMax = Seq.take max countingNumbers

The Seq.take function allows us to take the first elements in a sequence. However, running these lines in fsi does not cause the for loop in the sequence above to be run. The print and yield statements are not run.

F#’s ability to delay enumeration is not limited to sequences. Queries also allow us to describe a sequence of numbers without enumerating them.

I define the following function that will be used to filter the sequence:

let isDivisibleBy3Or5 x = 
    x % 3 = 0 || x % 5 = 0

and then create this query:

let divisibleBy3Or5 = 
    query {
        for countingNumber in firstUpToMax do
        where (isDivisibleBy3Or5 countingNumber)
        select countingNumber
    }

Programmers who are familiar with SQL or LINQ should have no problem understanding this query. In plain English, we might say “we will go through the sequence and find the values that match the given criterion.” Note the use of the future tense. Running this code in fsi does not result in enumeration of our sequence and the printing of the “Yielding…” lines.

I could find the sum of the numbers matching our criteria using a built in function like this:

let seqSum = Seq.sum divisibleBy3Or5

This will cause the sequence to be enumerated and finds the same sum as the list method above.

However, by defining the following function:

let noisySum total next =
    printfn "Adding %d to %d" next total
    total + next

I will be able to see the order in which the sequence is enumerated and when the values are pumped down the pipeline to be folded into the sum.

let seqFoldSum = 
    divisibleBy3Or5
    |> Seq.fold noisySum 0

At this point we get the following out put at fsi:

Yielding: 1
Yielding: 2
Yielding: 3
Adding 3 to 0
Yielding: 4
Yielding: 5
Adding 5 to 3
Yielding: 6
...
Yielding: 996
Adding 996 to 231173
Yielding: 997
Yielding: 998
Yielding: 999
Adding 999 to 232169
 
val seqFoldSum : int = 233168

Which clearly demonstrates the order of execution and that only the numbers that are divisible by 3 or 5 are added to the sum.

Does this make much difference to the memory usage? For this problem, this is not the bottle neck. The integers storing the sum of the numbers overflow before the input lists become large enough to use a lot of memory. However, passing values down the pipeline as they are yielded is somehow more pleasing to this programmer. Also, lazy evaluation means that enumerating the numbers and performing the mathematics do not take place until the result is actually needed. By creating a sequence and query, the programmer can create the code and pass it around the program without doing any heavy computation or using up much memory.

The complete listing of this code can be found at GitHub:

https://github.com/robert-impey/CodingExperiments/blob/master/F%23/Loose/Euler1.fs

Finding the Nearest Tube Station with LINQ

August 20th, 2013

I recently wrote a little command line program in C# that can tell you the name of the nearest tube station. I’ve put the program on GitHub in my Coding Experiments repository:

https://github.com/robert-impey/CodingExperiments/tree/master/C%23/NearestTube

The interface is very simple- this more of an experiment than something that I intend for mass consumption! You enter a location via the command line as the latitude and longitude:

Nearest Tube CLI

The starting point of the program was to calculate the distances between points. To do this I adapted some JavaScript code that John D. Cook wrote:

http://www.johndcook.com/lat_long_distance.html

Mathematical code looks very similar in many languages, and the translation from JavaScript to C# was trivially simple as the languages are very similar in this case. This code is part of the Point class:

        /// <summary>
        /// Porting JavaScript code from
        /// http://www.johndcook.com/lat_long_distance.html
        /// </summary>
        /// <param name="that"></param>
        /// <returns></returns>
        public double Distance(Point that)
        {
            // Compute spherical coordinates
 
            double rho = 6373;
 
            // convert latitude and longitude to spherical coordinates in radians
            // phi = 90 - latitude
            double phi1 = (90.0 - this.Latitude) * Math.PI / 180.0;
            double phi2 = (90.0 - that.Latitude) * Math.PI / 180.0;
            // theta = longitude
            double theta1 = this.Longitude * Math.PI / 180.0;
            double theta2 = that.Longitude * Math.PI / 180.0;
 
            // compute spherical distance from spherical coordinates
            // arc length = \arccos(\sin\phi\sin\phi'\cos(\theta-\theta') + \cos\phi\cos\phi')
            // distance = rho times arc length
            return rho * Math.Acos(
                    Math.Sin(phi1) * Math.Sin(phi2) * Math.Cos(theta1 - theta2)
                    + Math.Cos(phi1) * Math.Cos(phi2)) * 1000;
        }

Once I was able to calculate the distances between points on the map, I needed to be able to sort the tube stations of London by distance from the given point. This is the sort of code that LINQ to objects handles very naturally. In the SequentialTubeStationFinder class, I have the following method:

        public TubeStation FindNearestTubeStation(Point point)
        {
            return (from tubeStation in tubeStations
                    orderby tubeStation.Point.Distance(point)
                    select tubeStation).First();
        }

tubeStations is a generic list of TubeStation objects. Each has a Point property that is used to sort the tube stations and find the nearest one to our location.

The declarative style of programming that LINQ uses is much easier to read (and therefore maintain) than the equivalent code written with a for loop.

Trying F#

March 18th, 2013

As part of my ongoing interest in functional programming languages, I have decided to try F#. I have been looking at the Try F# page:

http://www.tryfsharp.org/

This provides an online REPL environment for exploring the language. I have a number of students at school who are exploring JavaScript and Python as first languages through Code Academy. I thought it would be a useful experiment to try this sort of educational activity with an unfamiliar language myself.

At first glance, I quite like the learning experience of reading through explanations and running some code, although it’s not as much fun as trying to make something yourself. In order to consolidate my learning, I tried to solve the first of the problems in Project Euler, which is to find the sum of the natural numbers less than 1000 that are multiples of 3 or 4. (I have already done this in Haskell).

I came up with this:

[0..999]
|> List.filter (fun x -> x % 3 = 0 || x % 5 = 0)
|> List.sum

One liner to show logs without Rotated Backup files

May 25th, 2012

I’ve been looking at the Apache log files on a web server this morning. There are many virtual hosts on the machine and the log rotation scripts have created many numbered backup files of the logs. To make the directory listing more readable, I have been using the following one-liner:

ls -l /var/log/apache2 | grep -Ev -e '[[:digit:]]+(\.gz)?$'

This will only display the current log files, assuming that /var/log/apache2 is the directory in which you store your Apache logs and that you do not store any other files there.

I hope it helps.

Decimal to Binary

April 13th, 2012

In order to prepare for teaching computing, I’ve been reading Introduction to Computing by David Evans.

In the first chapter there is an excellent explanation of binary numbers using binary search trees. You can build a bitstring for the binary representation of a number by starting at the root (highest node on the tree) and comparing the number to the midpoints of a range of numbers that you are searching. If you answer a question with “yes” append “1″ to your bitstring, otherwise append “0″.

The number 6 would be “Yes”, “Yes”, “No” or 110.

I decided to implement the logic of this binary tree in Java.

private static String decimalToBinary(int input) {
    if (input < 0 || input > 7) {
        throw new IllegalArgumentException();
    }
 
    StringBuilder result = new StringBuilder();
 
    if (input >= 4) {
        result.append('1');
        if (input >= 6) {
            result.append('1');
            if (input == 7) {
                result.append('1');
            } else {
                result.append('0');
            }
        } else {
            result.append('0');
            if (input == 5) {
                result.append('1');
            } else {
                result.append('0');
            }
        }
    } else {
        result.append('0');
        if (input >= 2) {
            result.append('1');
            if (input == 3) {
                result.append('1');
            } else {
                result.append('0');
            }
        } else {
            result.append('0');
            if (input == 1) {
                result.append('1');
            } else {
                result.append('0');
            }
        }
    }
 
    return result.toString();
}

This works for converting decimal numbers to binary strings. I think that writing a program like this might work as an exercise for secondary school students for learning about conditional statements in programming and binary numbers.

However, the range of inputs is severely limited (0 to 7 inclusive). I wrote another method and a recursive helper method that can take arbitrary non-negative decimals in the range of Java ints and return the corresponding binary number:

private static String decimalToBinaryRecursive(int input) {
    if (input < 0) {
        throw new IllegalArgumentException();
    }
 
    if (input == 0) {
        return "0";
    }
 
    if (input == 1) {
        return "1";
    }
 
    // Find how many bits we need.
    int bits = (int)(Math.log(input) / Math.log(2)) + 1;
 
    // Find the greatest number that can be represented with this many bits.
    int max = (int)Math.pow(2, bits);
 
    // We want to know how much to change this value by to search from the mid point.
    int delta = max / -2;
 
    StringBuilder result = new StringBuilder();
 
    return decimalToBinaryRecursiveHelper(input, max, delta, result);
}
 
private static String decimalToBinaryRecursiveHelper(int input, int previousMidpoint, int delta, StringBuilder result) {
    if (delta == 0) {
        return result.toString();
    }
 
    int midpoint = previousMidpoint + delta;
 
    if (input >= midpoint) {
        result.append('1');
        return decimalToBinaryRecursiveHelper(input, midpoint, Math.abs(delta) / 2, result);
    } else {
        result.append('0');
        return decimalToBinaryRecursiveHelper(input, midpoint, Math.abs(delta) / -2, result);
    }
}

The recursive method is considerably more complex and would be much harder for a secondary school student to write. The mathematics might be unfamiliar for most secondary school students, particularly the code to do with logarithms. You could simplify the mathematics of that code with a while loop that compared increasing powers of 2 to the input. It might be a possible extension task for a gifted and talented student. It could also be a task for a lesson on recursive methods.

I can see that teaching computing will present challenges. There are an enormous number of toy programs that we can get students to create in order to learn about programming. However, in order to extend any of these tasks to make them more practical and interesting, it is very easy to set challenges that are rely on outside skills, knowledge and understanding that many secondary school students do not have.

The whole program with a main method for testing looks like this:

package decimaltobinary;
 
/**
 * @author Robert Impey
 */
public class DecimalToBinary {
    public static void main(String[] args) {
        System.out.println("Simple Approach");
        for (int i = 0; i <= 7; i++) {
            System.out.printf("%d\t%s\n", i, decimalToBinary(i));
        }
 
        System.out.println("Recursive Approach");
        int bits = 8;
        for (int i = 0; i <= Math.pow(2, bits) - 1; i++) {
            System.out.printf("%d\t%s\n", i, decimalToBinaryRecursive(i));
        }
    }
 
    private static String decimalToBinary(int input) {
        if (input < 0 || input > 7) {
            throw new IllegalArgumentException();
        }
 
        StringBuilder result = new StringBuilder();
 
        if (input >= 4) {
            result.append('1');
            if (input >= 6) {
                result.append('1');
                if (input == 7) {
                    result.append('1');
                } else {
                    result.append('0');
                }
            } else {
                result.append('0');
                if (input == 5) {
                    result.append('1');
                } else {
                    result.append('0');
                }
            }
        } else {
            result.append('0');
            if (input >= 2) {
                result.append('1');
                if (input == 3) {
                    result.append('1');
                } else {
                    result.append('0');
                }
            } else {
                result.append('0');
                if (input == 1) {
                    result.append('1');
                } else {
                    result.append('0');
                }
            }
        }
 
        return result.toString();
    }
 
    private static String decimalToBinaryRecursive(int input) {
        if (input < 0) {
            throw new IllegalArgumentException();
        }
 
        if (input == 0) {
            return "0";
        }
 
        if (input == 1) {
            return "1";
        }
 
        // Find how many bits we need.
        int bits = (int)(Math.log(input) / Math.log(2)) + 1;
 
        // Find the greatest number that can be represented with this many bits.
        int max = (int)Math.pow(2, bits);
 
        // We want to know how much to change this value by to search from the mid point.
        int delta = max / -2;
 
        StringBuilder result = new StringBuilder();
 
        return decimalToBinaryRecursiveHelper(input, max, delta, result);
    }
 
    private static String decimalToBinaryRecursiveHelper(int input, int previousMidpoint, int delta, StringBuilder result) {
        if (delta == 0) {
            return result.toString();
        }
 
        int midpoint = previousMidpoint + delta;
 
        if (input >= midpoint) {
            result.append('1');
            return decimalToBinaryRecursiveHelper(input, midpoint, Math.abs(delta) / 2, result);
        } else {
            result.append('0');
            return decimalToBinaryRecursiveHelper(input, midpoint, Math.abs(delta) / -2, result);
        }
    }
}

This post contains content that was released with the following license:
http://creativecommons.org/licenses/by-nc-sa/3.0/us/
and is published with the same license.

New Domain for the Blog

February 6th, 2012

This blog will now be hosted at

http://www.reversing-entropy.com/

to reflect the title. Any requests to the old address:

http://impey.info/blog/

will be redirected to the new address.

Project Euler 1

December 11th, 2011

I took a look at the first problem on the Project Euler site earlier this afternoon:

http://projecteuler.net/problem=1

This asks you to find the sum of the natural numbers that are less than 1000 and are multiples of 3 or 5.

I decided that Haskell’s filter function and a lambda would make this problem trivially easy to solve. In GHCi:

Prelude> let sumMultiplesOf3Or5 max = sum(filter(\x -> (mod x 3 == 0) || (mod x 5 == 0))[1 .. max - 1])
Prelude> sumMultiplesOf3Or5 1000
233168

As I played around with other values for max, something unexpected appeared before my eyes. A pattern of repeated digits begins to emerge in the sums of the multiples of 3 and 5 less than 10 to the power of i as i increases.

Prelude> map sumMultiplesOf3Or5 (map (\i -> 10 ^ i) [1..6])
[23,2318,233168,23331668,2333316668,233333166668]

Something similar happens with the sums of the multiples of 3 or 5 less than 20, 200,…,30,300, and so on:

Prelude> map (\j -> map sumMultiplesOf3Or5 (map (\i -> j * 10 ^ i) [1..6])) [1..9]
[[23,2318,233168,23331668,2333316668,233333166668],[78,9168,931668,93316668,9333166668,933331666668],[195,20850,2098500,209985000,20999850000,2099998500000],[368,37268,3732668,373326668,37333266668,3733332666668],[543,57918,5829168,583291668,58332916668,5833329166668],[810,83700,8397000,839970000,83999700000,8399997000000],[1133,114218,11432168,1143321668,114333216668,11433332166668],[1428,148668,14926668,1493266668,149332666668,14933326666668],[1845,188550,18895500,1889955000,188999550000,18899995500000]]

Reformatted:

[[23,2318,233168,23331668,2333316668,233333166668], -- [10,100,1000,10000,100000,1000000]
[78,9168,931668,93316668,9333166668,933331666668], -- [20,200,2000,20000,200000,2000000]
[195,20850,2098500,209985000,20999850000,2099998500000], -- and so on
[368,37268,3732668,373326668,37333266668,3733332666668],
[543,57918,5829168,583291668,58332916668,5833329166668],
[810,83700,8397000,839970000,83999700000,8399997000000],
[1133,114218,11432168,1143321668,114333216668,11433332166668],
[1428,148668,14926668,1493266668,149332666668,14933326666668],
[1845,188550,18895500,1889955000,188999550000,18899995500000]]

I’m not sure if this continues indefinitely. I wonder if similar patterns emerge with numbers in different bases. I’m curious about trying multiples of numbers other than 3 or 5. At this point, I’ve really no idea why this happens, but my curiosity is aroused.

I’m also really impressed with Haskell, a language that I barely know. The only other time I’ve used functional programming seriously is with C# and VB.Net for LINQ, of which I am a huge fan.