Watt hunting competition
This competition was part of [ https://fedoraproject.org/wiki/DeveloperConference2009 DevCon09] event. It takes place in lab - where was 10 computers setting up for the competitors. And the aim was to enjoy programming a bit and look to methods how to optimize the code. And compete with the others.
Winners:
1. Dan Mach 2. Pavel Moravec 3. Tomas Janousek
(from 24 competitors)
Problem description:
We have three tasks - which have to be implement in 45 minutes.
Then we measure only the time of the run of program on test data set (we want to measure disk accesses too, but scomes tool was not stable enough and that there were only small differences in disk accesses in the solution we use for tests in comparison with the time differences). The input data were quite large to emphasize the differences between algorithms and implementation changes.
Then we compare the power consumption of all implementations (there were a penalisation for not implementing a task 20s).
The differences between the disc accesses were not to high in our case so we omit to measure them.
Tasks are:
1/ Implement a program which input is a positive number less then 10 000 (we will denote the number n) and the output is 1+2+3+...+(n-1) The return code have to be 0.
Example:
$ ./first 5 10
2/ Implement a program, which input is a file name and this program put the file on output - but omit all white spaces (' ', '\t', '\n'). The return code have to be 0.
Example:
$ cat jmeno.txt . . . . .. Ahoj svete ... ... $ ./second jmeno.txt ......Ahojsvete......
3/ Implement a program which input is file name and which return the number of occurrences of string "file" in input file. The string have to be separated with white space (' ', '\t', '\n') on the beginning and at the end. The return code have to be 0.
Example:
$ cat text.txt 123 file files hula file pes pfile les $ ./third text.txt 2
Solution:
1/ Because the test input number was high there were only slight differences between bash and c implementation. The winning solution use formula for the computing of the sum (File:02first.c).
2/ In c there is several methods to implement this tasks the slower was fscanf (File:02second.c), faster variants use getc (File:01second.c) and the fastest one uses fread (File:03second.c). Bash version which uses sed parser is rather slow, but the one using tr command is really fast.
3/ In c problem could use several methods either go through whole file and the fastest was using fscanf (File:03third.c), better version use some mechanism to find the occurrence of f character at first and then test whether it is part of file word (or some similar mechanism) (File:01third.c). The best solution uses mmap and order the tests based on the percent occurrence of character in text (File:02third.c). In bash the winning strategy parse the output using grep command, and then use tr to find the proper number sed could be used too but it was slower again. Here could be used Boyer-Moore algorithm - but it is problem to have it in time (nobody use it), and the string is too short so the results are not so good.
In all cases there is fine to use come optimization flags. There are another methods how to speed up the solution (use fgetc_unlock and so on), nobody use it this competition (but feel free to try to test it :) ).