When Your Results Are Real But Your Sample Size Is Not
Refleqt Labs · 6 min read
A power analysis of our SES-012 benchmark comparisons found that all ten were underpowered at n=100 with a single seed. We are publishing the audit before the results it audits.