One thing I've been pondering... Has anyone considered doing a multidimensional benchmark? Essentially the following:
In theory, I think this could be written as one large polynomial, but in practice, since opcode processing varies, the method would need to be based on estimation, i.e. finding the best match.
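To make the "estimation" part concrete, here is a rough sketch of one way it could work. It is not a worked-out method, and it rests on a strong simplifying assumption: that each benchmark's run time is roughly a linear sum of (opcode count × per-opcode cost). The function name `fit_costs` and the sample numbers are made up for illustration. It does a plain least-squares fit via the normal equations, using only the standard library.

```python
def fit_costs(counts, times):
    """Least-squares estimate of per-opcode costs.

    counts[k][i] = how often opcode i runs in benchmark k (hypothetical input)
    times[k]     = measured run time of benchmark k
    Assumed model: times[k] ~ sum(counts[k][i] * cost[i])
    """
    n = len(counts[0])
    # Normal equations: (A^T A) x = A^T b
    ata = [[sum(row[i] * row[j] for row in counts) for j in range(n)]
           for i in range(n)]
    atb = [sum(row[i] * t for row, t in zip(counts, times)) for i in range(n)]
    aug = [ata[i] + [atb[i]] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("benchmarks do not determine every opcode cost")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Made-up example: two opcodes, four benchmarks with different opcode mixes.
counts = [[10, 0], [0, 5], [4, 4], [2, 8]]
times = [10.0, 15.0, 16.0, 26.0]
print(fit_costs(counts, times))
```

With real measurements the fit would not be exact, and the leftover error per benchmark would give a hint of how far the linear assumption is from reality. You need at least as many benchmarks with distinct opcode mixes as there are opcodes, otherwise the costs cannot be separated.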