Archive for the ‘Web’ Category

One liner to show logs without Rotated Backup files

Friday, May 25th, 2012

I’ve been looking at the Apache log files on a web server this morning. There are many virtual hosts on the machine and the log rotation scripts have created many numbered backup files of the logs. To make the directory listing more readable, I have been using the following one-liner:

ls -l /var/log/apache2 | grep -Ev -e '[[:digit:]]+(\.gz)?$'

This will only display the current log files, assuming that /var/log/apache2 is the directory in which you store your Apache logs and that you do not store any other files there.
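
To see what the pattern filters without touching a real server, here is a minimal sketch that runs the same grep over a made-up list of file names. The function name filter_current_logs and the sample names are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: reads file names on stdin and drops any name that
# ends in digits, optionally followed by ".gz" (the same pattern as above).
filter_current_logs() {
    grep -Ev -e '[[:digit:]]+(\.gz)?$'
}

# Made-up names of the sort that log rotation tends to leave behind.
printf '%s\n' access.log access.log.1 access.log.2.gz error.log error.log.10.gz |
    filter_current_logs
```

With these sample names, only access.log and error.log are printed. Note that the pattern also hides any other entry whose name ends in a digit, as well as the "total" line that ls -l prints first.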

I hope it helps.

Script for Listing Etexts from Project Gutenberg

Monday, August 16th, 2010


I’m a big fan of Project Gutenberg and have downloaded many of their etexts over the years. However, their etexts have numeric file names, which aren’t very human-friendly. In order to keep track of the etexts that I have saved on my computer, I’ve written a little Perl script to extract the author and title from the etexts and generate an HTML file to list them.

The code is released under the GPL, so feel free to tinker with the code and share alike.
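
The Perl script itself is not reproduced here. As a rough illustration of the idea only, the following shell sketch assumes that each etext carries header lines beginning with "Title: " and "Author: ", which may not hold for every file. The function names header_field, html_escape and list_etexts are made up for this example.

```shell
#!/bin/sh
# Hypothetical sketch, not the original Perl script. It assumes each etext
# has header lines starting with "Title: " and "Author: ".

# Print the rest of the first line in file $2 that starts with "$1: ",
# with any trailing carriage return removed.
header_field() {
    sed -n "s/^$1: *//p" "$2" | head -n 1 | tr -d '\r'
}

# Escape the characters that are special in HTML text.
html_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Emit an HTML list with one item per etext given as an argument.
list_etexts() {
    printf '%s\n' '<ul>'
    for file in "$@"; do
        title=$(header_field Title "$file" | html_escape)
        author=$(header_field Author "$file" | html_escape)
        name=$(basename "$file" | html_escape)
        # Fall back to placeholders when a header line is missing.
        printf '<li>%s: %s (%s)</li>\n' \
            "${author:-Unknown author}" "${title:-Untitled}" "$name"
    done
    printf '%s\n' '</ul>'
}
```

It could be called along the lines of list_etexts etexts/*.txt > index.html, where the etexts directory is only an example path.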

Richard Feynman on Tuva

Thursday, July 16th, 2009

We can find Richard Feynman’s Messenger Lectures on physics at the intriguingly named Tuva site:

Dr. Feynman is an engaging lecturer; it is perhaps regrettable that all lectures are not so entertaining.

At one point Dr. Feynman says that “It is impossible, when picking one particular example of anything, to avoid picking one that is atypical in some sense.” Of course, this is true by definition. If we were to find an example that was typical in every sense, it would be atypical in that it was not atypical in some sense, and so it would be atypical in some sense. Oh, the joy of schoolboy pedantry!

The video is rendered with a Silverlight player, which is perhaps not available on all platforms. It also used 100% of my CPU’s clock cycles and caused the laptop to crash three times. I guess that Silverlight has a long way to go before it can seriously compete with Flash. On the one hand, it’s a good thing that Flash has some more competition (not that I am accusing the Adobe engineers of laziness, mind). On the other hand, the internet will not be as rich a place as it might be if a lot of content is only available to Microsoft’s customers. I thought that that war had been won a long time ago.

Michael Jackson Spam

Friday, June 26th, 2009


I’m already getting Michael Jackson spam in the comments for this blog. Those guys are quick. One day, we will get all our news from spam.

Bingo Card Generator

Saturday, June 6th, 2009

I’ve put together a bingo card generator:

It’s a response to the Bingo card generator at

I’ve been using the Teach-nology generator for a while for making bingo cards. My generator makes a few improvements to the way that the user works with it. In particular, the user doesn’t have to hit ‘Shuffle’ and print for each student.

My kids tend to enjoy bingo. I let them play a game as a reward after a test. It’s more suited to less experienced learners, especially ones learning to match sounds to the words that they read. With more experienced learners, one can say the definition of the word, draw a picture on the board or do a charade instead of just saying the word.

Is there an algorithm for Wikipedia?

Friday, June 5th, 2009

Google’s latest offering,

is rather fun, but I’m not convinced that I will use it very often.

Compare search results like this:


The page on Wikipedia is much more useful. It seems that humans are better at making tables of data from diverse sources of information than computers are at this point. Will it always be this way?

Wikipedia has strict guidelines on how articles are written and how propositions should be backed by reliable sources. Could these guidelines be further formalised and pave the way for an algorithm that could write something like Wikipedia from scratch? Google seem to be attempting to build a system that can produce the pages on Wikipedia with names like “List_of_*”. For all I know, Google might have looked at all the articles on Wikipedia whose names match that pattern and used them to get their tables started.

Sport is a popular subject. It’s safe to say that there are a lot of people who are willing to give up their free time to collate data on the subject. If some joker changed the Wikipedia table to say that Manchester United were relegated at the end of the previous season, this error would be corrected quickly as there is no lack of people who care deeply about the matter.

During a presentation for Wolfram Alpha, Stephen Wolfram was asked whether he had taken data from Wikipedia. He denied it and said that the problem with Wikipedia was that one user might conscientiously add accurate data for 200 or so different chemical compounds in various articles. Over the course of a couple of years, every single article would get edited by different groups, and the data would diverge. He argued that these sorts of projects needed a director, such as himself. However, he said that his team had used Wikipedia to find out what people were interested in. If the article on carbon dioxide is thousands of characters long, is edited five times a day, has an extensive talk page, is available in dozens of languages, and has 40 references, it is safe to say that carbon dioxide is a chemical compound that people are interested in. This is true regardless of the accuracy of the article’s content. It would be pretty trivial for Google (or any Perl hacker with a couple of hours to spare and a few gigs of hard disk space) to rank all of the pages on Wikipedia according to public interest using the criteria that I just listed.
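
The ranking described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the input is a hypothetical tab-separated file with one line per article (title, length in characters, edits per day, number of language editions, number of references), the talk page is left out, and the equal weighting of the four signals is an arbitrary choice. The function name rank_by_interest is made up.

```shell
#!/bin/sh
# Hypothetical sketch: rank articles by a crude "interest" score.
# Input (stdin): tab-separated lines of
#   title, length in characters, edits per day, language editions, references
# The column layout and the equal weighting are assumptions for illustration.
rank_by_interest() {
    LC_ALL=C awk -F '\t' '
        {
            title[NR] = $1
            for (i = 2; i <= 5; i++) {
                value[NR, i] = $i + 0
                # Remember the largest value seen in each column.
                if (value[NR, i] > max[i]) max[i] = value[NR, i]
            }
        }
        END {
            for (row = 1; row <= NR; row++) {
                score = 0
                # Scale each signal by its largest value so that no single
                # column dominates, then add the scaled signals together.
                for (i = 2; i <= 5; i++)
                    if (max[i] > 0) score += value[row, i] / max[i]
                printf "%.3f\t%s\n", score, title[row]
            }
        }
    ' | LC_ALL=C sort -t "$(printf '\t')" -k 1,1nr
}
```

Something like rank_by_interest < stats.tsv would then print one line per article, score first, with the highest score at the top. Here stats.tsv stands for whatever file the figures have been collected into.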

In many ways, an algorithmic encyclopaedia is to be preferred because of the notorious problems of vandalism and bias. However, tasks like condensing and summarising are not straightforward. The problem of deciding what to write about could be solved by analysing Wikipedia, as described above, and tracking visitor trends. Is there going to be a move to unseat Wikipedia in the coming years? How long before humans can be removed from the algorithm completely?

How not to fight global warming

Monday, May 4th, 2009

Above is a link to an article on the carbon footprint of the internet. In the comments, we can find the normal luddite opinions. If only people didn’t like the modern world, we could live in pre-industrial simplicity.

It seems embarrassingly obvious to me that if we have any hope of survival, it is in moving forward, rather than backwards. If we think that we can solve the world’s environmental problems by rejecting technology, then we’re sunk. Do the troglodyte commenters on the Guardian really think that the world is going to be able to implement the sort of engineering projects that are going to be necessary for a revolution in the world’s energy industry without the internet? How do they imagine engineers study and design things like solar panels, wind turbines or smart electricity grids? Using pencils, recycled paper and 30-year-old textbooks?

The One Thing Needful

Friday, May 1st, 2009

The weather has been very nice here in Guri today. I’ve just read two articles predicting that this summer will be good for barbecues in Britain.

The Met Office apparently predicted the same thing last year and got it completely wrong. My guess is that they will be wrong again this year. My thinking is not really very scientific, just a willful and immature contrariness coupled with a lifetime of disappointing summers in England.

The sun is at the bottom of its sunspot cycle as well:

Noticing correlations between seemingly unrelated data has always been a rich source of new knowledge. I understand that correlation is not causation, but the reverse does appear to be true. If the expected outcome of a hypothesis does not coincide with the recorded data, then the hypothesis should not be trusted, or at least should be questioned thoroughly.

I watched a lecture last night on Wolfram|Alpha

I imagine that when this goes live, there will be a lot of bloggers who use the data to show relationships that are very questionable. The old line about statistics being used like a drunkard uses a lamp post, for support rather than illumination, will no doubt apply. I am interested to see what all the amateur climatologists who have sprung up since climate change has obsessed the world will come up with. Like everything in life, more than 99% of it will be banal and worthless. It’s the rest that’s interesting. Luckily, the internet provides incredible filters for sorting through enormous amounts of information for finding the most interesting things to think about.

One of the problems that I can see with W|A is that it is closed and proprietary, although users will be able to access the data for free. The company may be able to run profitably. The search engines have done well being run this way. As far as I know, this is a new sort of service that has not been tried before. I hope that the likes of Google, Yahoo! and Microsoft try to build rivals to this quickly.

I hope that the diverse open source communities of the world try to come up with something to compete with it. At this stage, it is clear that a lot of human intervention is needed to get the data into the system. Wikipedia has shown that this is something that people are willing to give up their free time to do. Providing the vast computing resources for an open source version of this project is also a hurdle. I would certainly consider giving up some of my computer’s spare cycles if a distributed and open source version of this came along, and I am sure that I would not be alone. However, the popularity of Google compared to, for example, YaCy shows how difficult it is for these sorts of things to be fully open.

Another development that I hope that W|A brings about is to force universities and other publicly funded research institutions to do more to make all of their experimental data available in machine readable formats. A single open source project that can absorb all of the scientific, engineering, technical, sociological, economic and financial data in the world might not come about, but lots of smaller projects that each try to solve part of the problem might. No doubt, such projects would take pains to cooperate with the other projects.

What developments will occur in this field in the next few years is anyone’s guess. As Dickens pointed out in Hard Times,

facts alone do not make a person educated or complete. However, I imagine that everyone being able to ask lots of little questions involving data and the relationships between them will have a similar impact to that of Google and Wikipedia. In the past, if we wondered to ourselves “What is the name of the Aztec sun god?” we might not have bothered to go to a public library or even take an encyclopedia off the shelf to find out. Now, we are much more likely to find out about

because it takes such a short amount of time with our modern tools.

New TEFL Site

Sunday, February 8th, 2009

I’ve put together a few pages for a TEFL materials site:

My aim for this site is to be able to produce materials for TEFL lessons more quickly. The first page that I’ve put up generates materials for a missing information game:

I’ve been playing this game for a few weeks in the classroom, but I have grown tired of writing out the cards using MS Word.

As always, I’ve written the site using the Haddock CMS. It’s the first site to make use of the Sky theme plug-in:

The aim of theme plug-ins is to make styling a site simply a case of checking out a plug-in and then getting the HTML page class to extend a class in the theme plug-in directory.

It’s also the first site to make use of the new “Site Texts” plug-in:

This separates all texts from the code of the project. The texts are saved in files in a separate folder to the project-specific code. At the moment they need to be created and edited by hand, but a web interface in the admin section may follow.

The Sum of Human Knowledge

Thursday, January 15th, 2009

On the talk page for the Wikipedia article on the Postliterate Society there is a curious form of a logical fallacy:

Literacy is a particular interest of mine, and I have never heard of this. I would recommend deletion.

This seems to be an odd way of thinking for someone helping to write an encyclopedia: I’m an expert; I’ve never heard of this; this, therefore, cannot exist.

One truth that I am continually confronted with (especially when I visit Wikipedia) is that there are more things that exist than I have heard of. This is especially true in the areas in which I consider myself an expert.