Published by Ostap Cherkashin on 2011 02 13.
This is a very brief update on the v1 performance. Here are the figures from exactly the same laptop as in the previous post (and again with the -Os gcc option):
$ cd bandicoot $ ctl pack $ bin/test/perf/relation write 1000 tuples in 1ms read 1000 tuples in 0ms write 10000 tuples in 2ms read 10000 tuples in 2ms write 100000 tuples in 18ms read 100000 tuples in 21ms write 1000000 tuples in 175ms read 1000000 tuples in 212ms join 1000x1000 tuples in 2ms join 10000x10000 tuples in 16ms join 100000x100000 tuples in 296ms join 1000000x1000000 tuples in 4983ms semidiff 1000x1000 tuples (res=500 tuples) in 1ms semidiff 10000x10000 tuples (res=5000 tuples) in 13ms semidiff 100000x100000 tuples (res=50000 tuples) in 263ms semidiff 1000000x1000000 tuples (res=500000 tuples) in 4743ms project 1000 tuples from 1000 in 1ms project 10000 tuples from 10000 in 13ms project 100000 tuples from 100000 in 222ms project 1000000 tuples from 1000000 in 3821ms select 1 tuples from 1000 in 0ms select 1 tuples from 10000 in 2ms select 1 tuples from 100000 in 15ms select 1 tuples from 1000000 in 154ms union 1000x1000 tuples (res=1500 tuples) in 1ms union 10000x10000 tuples (res=15000 tuples) in 17ms union 100000x100000 tuples (res=150000 tuples) in 338ms union 1000000x1000000 tuples (res=1500000 tuples) in 5835ms extend 1000 tuples in 0ms extend 10000 tuples in 4ms extend 100000 tuples in 43ms extend 1000000 tuples in 502ms sum 1000x1000 tuples in 1ms sum 10000x10000 tuples in 23ms sum 100000x100000 tuples in 362ms sum 1000000x1000000 tuples in 5710ms eq 1000x1000 tuples in 0ms eq 10000x10000 tuples in 14ms eq 100000x100000 tuples in 247ms eq 1000000x1000000 tuples in 4551ms
The test cases also got extended a little bit, and finally the Bandicoot grew up from 10000 tuples per argument ;-) Of course there are still many things which could be improved further, but it is a big step forward from where we were before. Here are the new formulas (#n stands for the number of tuples in the relation body):
|join||m * n||#m log #m + #n log #m|
|minus (semidiff)||m - n||#n log #n + #m log #n|
|union||m + n||#n log #n + #m log #n + #m|
|select||n select(x < 0)||#n|
|extend||n extend(x = 1)||#n|
|project||n project(x, y)||2 * #n log #n|
|summary||(m, n) summary(x = max(y, 0))||#m log #m + #n log #m|
(*) I tried to keep the formulas closer to the actual execution effort and hence I do not use the asymptotic notation here.