Benchmark for "better" performance

Imagine you have 5-10 local testing algos.
Then you implement a significant change to your main strategy and want to test it.
You run it against all the local algos, and it wins every match …
So, without watching every single replay, what information can you use to determine whether the change was significantly better (or worse) than the baseline?
Number of turns and final health seem logical, but they may also just reflect a riskier game, and they need some parsing to extract.
I often use the replay file size as a first factor.
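
For anyone who wants to try the file-size idea, here is a rough sketch of how I would compare two batches of replays. It assumes you copy the replays from the baseline run and from the changed run into two separate folders, and that the files end in `.replay`; the folder names and the `replay_sizes` / `compare_sizes` helpers are made up for illustration. A shift in the median size only hints that the games got longer or shorter, it does not tell you which version is better.

```python
import statistics
from pathlib import Path


def replay_sizes(directory, pattern="*.replay"):
    """Return the size in bytes of every file in `directory` matching `pattern`.

    The ".replay" extension is an assumption; adjust the pattern to your setup.
    """
    return [p.stat().st_size for p in sorted(Path(directory).glob(pattern))]


def compare_sizes(baseline_sizes, candidate_sizes):
    """Summarise how the median replay size shifted between two runs."""
    base_med = statistics.median(baseline_sizes)
    cand_med = statistics.median(candidate_sizes)
    return {
        "baseline_median": base_med,
        "candidate_median": cand_med,
        # Negative means the candidate's replays are smaller than the baseline's.
        "relative_change": (cand_med - base_med) / base_med,
    }
```

Usage would look like `compare_sizes(replay_sizes("replays_baseline"), replay_sizes("replays_candidate"))`, where both folder names are hypothetical. I picked the median over the mean so that one unusually long match does not dominate the result.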

You could look at the endstats in the frame data. There is some very useful information in there. I'm not sure whether they are in the replay files; I don't understand those.
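
If the end stats do turn out to be in the replay files, something like this might pull them out. This is only a sketch under two unverified assumptions: that a replay file holds one JSON object per line, and that the stats sit under a key called `endStats`. The `find_end_stats` helper is hypothetical, so check an actual replay file before relying on it.

```python
import json


def find_end_stats(lines, key="endStats"):
    """Return the value stored under `key` in the last JSON line that has it.

    Returns None if no line contains the key. The key name "endStats" and the
    one-JSON-object-per-line layout are assumptions, not confirmed facts.
    """
    found = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            frame = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(frame, dict) and key in frame:
            found = frame[key]
    return found
```

Since it accepts any iterable of strings, you can pass an open file directly, for example `with open(path) as f: stats = find_end_stats(f)`.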