Subject: Re: machine learning for loop unrolling
Group: Gcc
From: Ken Raeburn
Date: 18 Jun 2007
> - compile with the loop unrolled 1x, 2x, 4x, 8x, 16x, 32x and
> measure the time the benchmark takes
The optimal unrolling factor may not be a power of two, depending on
icache size (11 times the loop body size?), iteration count (13*n for
some unknown n?), and whether there are actions performed inside the
loop once or twice every N passes (for N not a power of two).
The powers of two would probably hit a lot of the common cases, but
you might want to throw in some intermediate values too, if it's too
costly to check all practical values.
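
As a rough illustration of what such a candidate list could look like, here is
a minimal Python sketch. Everything in it is an assumption for the sake of the
example, not anything GCC provides: the function name, the choice of the
midpoint between neighbouring powers of two (3, 6, 12, 24) as the
"intermediate values", and the optional trip_count parameter that adds the
divisors of a known iteration count.

```python
def candidate_unroll_factors(max_factor=32, trip_count=None):
    """Return a sorted list of unroll factors worth benchmarking.

    Hypothetical helper: includes the powers of two up to max_factor,
    the midpoint between each pair of neighbouring powers, and, when the
    iteration count is known, its divisors up to max_factor.
    """
    factors = set()
    power = 1
    while power <= max_factor:
        factors.add(power)
        # Midpoint to the next power of two: 2 -> 3, 4 -> 6, 8 -> 12, ...
        # (for power == 1 this is just 1 again).
        midpoint = power + power // 2
        if midpoint <= max_factor:
            factors.add(midpoint)
        power *= 2

    if trip_count is not None:
        # A factor that divides the trip count leaves no remainder loop.
        for d in range(1, min(trip_count, max_factor) + 1):
            if trip_count % d == 0:
                factors.add(d)

    return sorted(factors)


if __name__ == "__main__":
    print(candidate_unroll_factors())
    print(candidate_unroll_factors(trip_count=26))
```

With a trip count of 26 (13*n for n = 2) the list picks up 13 and 26 in
addition to the powers of two and their midpoints, which is the kind of case
a pure power-of-two sweep would miss.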
Ken