Archive for August, 2013

Using F# to minimise a function

Thursday, August 22nd, 2013

Finding the minimum of functions is at the heart of optimisation. Mathematicians, engineers and programmers have come up with a large number of approaches to solving this problem, including differentiation, genetic algorithms and even exhaustive search.

Consider a quadratic function such that could be written in F# as

let f x = (x ** 2.0) - (2.0 * x) + 1.0

Finding the minimum means finding the input value for the function that returns to lowest value. If we plot the curve, then the minimum is the lowest point of the curve.

One way to find this is find the derivative of the function:

(2.0 * x) - 2.0

and solve the equation where the derivative is equal to 0, which is when x is 1. This probably the best way to solve this problem in practice.

However, not all functions have derivatives that are easy to find. Therefore, an alternative way to minimise a function is an exhaustive search of the inputs in some range. This is not an elegant or generally efficient solution, but it works.

In F#, we might proceed as follows.

We define the range that we want to search and the step between candidate inputs:

let min = 0.0
let max = 10.0
let step = 0.01

We know where to start searching- with the lowest candidate input:

let firstCandidate = f min, min

This creates a tuple of two floats, the first being the output and the second being the lowest candidate input.

We also want a sequence of tuples of two floats that are the remaining candidate solutions:

let remainingCandidates = seq { for c in min + step .. step .. max do yield f c, c }

Note that at this point the sequence has not been enumerated and the function has not been run with each of the candidate inputs. Because sequences are evaluated lazily, the function will not be run for each of the candidate inputs until it is asked for.

We are trying to minimise the function, therefore we need a function that can compare the first elements of two candidate solutions:

let findMin currentMinSln candidateSln =
    match fst currentMinSln < fst candidateSln with
    | true -> currentMinSln
    | false -> candidateSln

Running this code in fsi shows its type to be:

val findMin : 'a * 'b -> 'a * 'b -> 'a * 'b when 'a : comparison

The F# compiler can infer that the function takes in two tuples, each with two elements. The tuples must be of the same type and the first element of each tuple must be comparable. Way to go, type inference!

Everything has been set up at this point. We just need to run the code:

let minSln = Seq.fold findMin firstCandidate remainingCandidates

The fold function is to reduce a sequence to a single value, starting with a given accumulator. In this case, the first candidate solution that we created earlier is our starting point. The output of the function for each candidate input is compared to that accumulator.

Running the code finds the same answer (1) that we found with calculus.

This code runs pretty quickly, but this approach is generally slow. Our range is only 10 wide and the step is 0.01, so there are only 1000 candidate inputs. Increasing the size of the range or decreasing the step would increase the number of inputs. More worryingly, the size of the search space increases exponentially as the number of inputs to the function increases. So a function that took two inputs for the same range for each input would have a million candidate inputs, three inputs would take a billion and so on.

However, it is also quite straightforward to parallelise this approach. The sequence of candidate solutions could be split up into partitions and each partition could be sent off to a different computer. Each batch would find a local minimum for the range it was given. Finding the minimum for the whole of our range of inputs is simply a matter of find the minimum in the returned list.

The complete code for this can be found here:

https://github.com/robert-impey/CodingExperiments/blob/master/F%23/Loose/MinimiseQuadratic.fs

Project Euler 1 in F# using Sequences

Wednesday, August 21st, 2013

I have written previously about solving the first problem on the Project Euler website using F# and Haskell.

The problem is to find the sum of the natural numbers that are divisible by 3 or 5 that are less than 1000. To solve this in F#, my first solution looked like this:

[0..999]
|> List.filter (fun x -> x % 3 = 0 || x % 5 = 0)
|> List.sum

This works. However, it was pointed out to me that using lists to generate the numbers below 1000 in my F# solution was memory hungry as the list will be created and stored in memory then piped to the filter and so on. The list is evaluated eagerly. This isn’t a problem with a small list like this, but it could create issues with more elements.

I took a look at sequences in F#, which are evaluated lazily. To get an idea of what this means, consider the sequence of natural numbers:

let countingNumbers = 
    seq { 
        for i in 1 .. System.Int32.MaxValue do 
            printfn "Yielding: %d" i
            yield i 
    }

If you are unfamiliar with lazy evaluation, it might look like this will create all the natural numbers up to System.Int32.MaxValue (2147483647) and print each number as it goes. However, if you run this with F# Interactive (aka fsi), it creates a seq<int> without printing anything out. At this point, the sequence is yet to be enumerated.

The ingenious thing about sequences is that we can work with them without enumerating them. Consider these lines:

let max = 999
let firstUpToMax = Seq.take max countingNumbers

The Seq.take function allows us to take the first elements in a sequence. However, running these lines in fsi does not cause the for loop in the sequence above to be run. The print and yield statements are not run.

F#’s ability to delay enumeration is not limited to sequences. Queries also allow us to describe a sequence of numbers without enumerating them.

I define the following function that will be used to filter the sequence:

let isDivisibleBy3Or5 x = 
    x % 3 = 0 || x % 5 = 0

and then create this query:

let divisibleBy3Or5 = 
    query {
        for countingNumber in firstUpToMax do
        where (isDivisibleBy3Or5 countingNumber)
        select countingNumber
    }

Programmers who are familiar with SQL or LINQ should have no problem understanding this query. In plain English, we might say “we will go through the sequence and find the values that match the given criterion.” Note the use of the future tense. Running this code in fsi does not result in enumeration of our sequence and the printing of the “Yielding…” lines.

I could find the sum of the numbers matching our criteria using a built in function like this:

let seqSum = Seq.sum divisibleBy3Or5

This will cause the sequence to be enumerated and finds the same sum as the list method above.

However, by defining the following function:

let noisySum total next =
    printfn "Adding %d to %d" next total
    total + next

I will be able to see the order in which the sequence is enumerated and when the values are pumped down the pipeline to be folded into the sum.

let seqFoldSum = 
    divisibleBy3Or5
    |> Seq.fold noisySum 0

At this point we get the following out put at fsi:

Yielding: 1
Yielding: 2
Yielding: 3
Adding 3 to 0
Yielding: 4
Yielding: 5
Adding 5 to 3
Yielding: 6
...
Yielding: 996
Adding 996 to 231173
Yielding: 997
Yielding: 998
Yielding: 999
Adding 999 to 232169
 
val seqFoldSum : int = 233168

Which clearly demonstrates the order of execution and that only the numbers that are divisible by 3 or 5 are added to the sum.

Does this make much difference to the memory usage? For this problem, this is not the bottle neck. The integers storing the sum of the numbers overflow before the input lists become large enough to use a lot of memory. However, passing values down the pipeline as they are yielded is somehow more pleasing to this programmer. Also, lazy evaluation means that enumerating the numbers and performing the mathematics do not take place until the result is actually needed. By creating a sequence and query, the programmer can create the code and pass it around the program without doing any heavy computation or using up much memory.

The complete listing of this code can be found at GitHub:

https://github.com/robert-impey/CodingExperiments/blob/master/F%23/Loose/Euler1.fs

Finding the Nearest Tube Station with LINQ

Tuesday, August 20th, 2013

I recently wrote a little command line program in C# that can tell you the name of the nearest tube station. I’ve put the program on GitHub in my Coding Experiments repository:

https://github.com/robert-impey/CodingExperiments/tree/master/C%23/NearestTube

The interface is very simple- this more of an experiment than something that I intend for mass consumption! You enter a location via the command line as the latitude and longitude:

Nearest Tube CLI

The starting point of the program was to calculate the distances between points. To do this I adapted some JavaScript code that John D. Cook wrote:

http://www.johndcook.com/lat_long_distance.html

Mathematical code looks very similar in many languages, and the translation from JavaScript to C# was trivially simple as the languages are very similar in this case. This code is part of the Point class:

        /// <summary>
        /// Porting JavaScript code from
        /// http://www.johndcook.com/lat_long_distance.html
        /// </summary>
        /// <param name="that"></param>
        /// <returns></returns>
        public double Distance(Point that)
        {
            // Compute spherical coordinates
 
            double rho = 6373;
 
            // convert latitude and longitude to spherical coordinates in radians
            // phi = 90 - latitude
            double phi1 = (90.0 - this.Latitude) * Math.PI / 180.0;
            double phi2 = (90.0 - that.Latitude) * Math.PI / 180.0;
            // theta = longitude
            double theta1 = this.Longitude * Math.PI / 180.0;
            double theta2 = that.Longitude * Math.PI / 180.0;
 
            // compute spherical distance from spherical coordinates
            // arc length = \arccos(\sin\phi\sin\phi'\cos(\theta-\theta') + \cos\phi\cos\phi')
            // distance = rho times arc length
            return rho * Math.Acos(
                    Math.Sin(phi1) * Math.Sin(phi2) * Math.Cos(theta1 - theta2)
                    + Math.Cos(phi1) * Math.Cos(phi2)) * 1000;
        }

Once I was able to calculate the distances between points on the map, I needed to be able to sort the tube stations of London by distance from the given point. This is the sort of code that LINQ to objects handles very naturally. In the SequentialTubeStationFinder class, I have the following method:

        public TubeStation FindNearestTubeStation(Point point)
        {
            return (from tubeStation in tubeStations
                    orderby tubeStation.Point.Distance(point)
                    select tubeStation).First();
        }

tubeStations is a generic list of TubeStation objects. Each has a Point property that is used to sort the tube stations and find the nearest one to our location.

The declarative style of programming that LINQ uses is much easier to read (and therefore maintain) than the equivalent code written with a for loop.