From Fedora Project Wiki

Revision as of 07:18, 8 October 2009 by Msivak (talk | contribs) (Improve the description of Bash solution)

Watt hunting competition

This competition was part of DevCon09 event. It takes place in lab - where was 10 computers setting up for the competitors. And the aim was to enjoy programming a bit and look to methods how to optimize the code. And compete with the others.

Winners:

1. Dan Mach
2. Pavel Moravec
3. Tomas Janousek

(from 24 competitors)

Problem description:

We have three tasks - which have to be implement in 45 minutes.

Then we measure only the time of the run of program on test data set (we want to measure disk accesses too, but scomes tool was not stable enough and that there were only small differences in disk accesses in the solution we use for tests in comparison with the time differences). The input data were quite large to emphasize the differences between algorithms and implementation changes.

Then we compare the power consumption of all implementations (there were a penalisation for not implementing a task 20s).

The differences between the disc accesses were not to high in our case so we omit to measure them.


Tasks are:

1/ Implement a program which input is a positive number less then 10 000 (we will denote the number n) and the output is 1+2+3+...+(n-1) The return code have to be 0.

Example:

 	$ ./first 5
	10

2/ Implement a program, which input is a file name and this program put the file on output - but omit all white spaces (' ', '\t', '\n'). The return code have to be 0.

Example:

 	$ cat jmeno.txt
	. . . .
	..
	Ahoj svete
	... ...
	$ ./second jmeno.txt
	......Ahojsvete......

3/ Implement a program which input is file name and which return the number of occurrences of string "file" in input file. The string have to be separated with white space (' ', '\t', '\n') on the beginning and at the end. The return code have to be 0.

Example:

 	$ cat text.txt
	123 file files hula
	file pes
	pfile les
	$ ./third text.txt
	2

Solution:

1/ Because the test input number was high there were only slight differences between bash and c implementation. The winning solution use formula for the computing of the sum (File:02first.c).

2/ In c there is several methods to implement this tasks the slower was fscanf (File:02second.c), faster variants use getc (File:01second.c) and the fastest one uses fread (File:03second.c). Bash version which uses sed parser is rather slow, but the one using tr command is really fast.

3/ In c problem could use several methods either go through whole file and the fastest was using fscanf (File:03third.c), better version use some mechanism to find the occurrence of f character at first and then test whether it is part of file word (or some similar mechanism) (File:01third.c). The best solution uses mmap and order the tests based on the percent occurrence of character in text (File:02third.c).

Boyer-Moore algorithm would be also good for this task- but it is too complicated to write in time (nobody used it), Also the word was too short, so the results wouldn't be fast enough in comparison to bash.

In bash the winning strategy is to split the input into one-word-per-line form using the tr command, and count the matching lines using grep -c. Sed could be used too but it was more complex to write and slower.


In all cases there is fine to use come optimization flags. There are another methods how to speed up the solution (use fgetc_unlock and so on), nobody use it this competition (but feel free to try to test it :) ).