Operations on the 64-bit Bitboard type are slow on x86 when compiled
with gcc, so optimize this case.

Profiling also shows that pop_1st_bit() is a very performance-critical
path.

Signed-off-by: Marco Costalba <mcostalba@gmail.com>
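
For illustration only, here is a minimal sketch of one way to avoid 64-bit arithmetic when popping the lowest set bit: split the Bitboard into two 32-bit halves and work on one half at a time. This is not necessarily the code in this commit. The function name pop_1st_bit_split() and the int return type are hypothetical, and the sketch assumes a gcc-compatible compiler that provides __builtin_ctz.

```cpp
#include <cstdint>

typedef uint64_t Bitboard;

// Hypothetical sketch, not the committed implementation.
// Returns the index (0..63) of the least significant set bit of *bb
// and clears that bit. *bb must be non-zero on entry, because
// __builtin_ctz is undefined for a zero argument.
inline int pop_1st_bit_split(Bitboard* bb) {
    // Split into halves so the bit operations below are 32 bits wide.
    uint32_t lo = static_cast<uint32_t>(*bb);
    uint32_t hi = static_cast<uint32_t>(*bb >> 32);
    int idx;

    if (lo) {
        idx = __builtin_ctz(lo);       // lowest set bit is in the low half
        lo &= lo - 1;                  // clear it
    } else {
        idx = 32 + __builtin_ctz(hi);  // lowest set bit is in the high half
        hi &= hi - 1;                  // clear it
    }

    // Recombine the two halves into the 64-bit board.
    *bb = (static_cast<Bitboard>(hi) << 32) | lo;
    return idx;
}
```

Whether this actually beats the plain 64-bit version depends on the compiler and target, so it should be measured with the same profiling that identified pop_1st_bit() as a hot spot.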