Code demonstration
aieval.oracles.score_sql(generated_sql, expected_results, db_path, tolerance_config)
Translation: Run the generated query against the database, compare its output with the expected results, and allow for small rounding differences.
Result: FAIL — result sets do not match
Different SQL producing the same correct results gets full credit.
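The internals of aieval.oracles.score_sql are not shown here, so the following is only a minimal sketch of what a result-based check of this kind might look like. It assumes a SQLite database, an absolute numeric tolerance in place of the unspecified tolerance_config, and a comparison that ignores row order. The names score_sql_sketch, result_sets_match, rows_close, and values_close are hypothetical and not part of the library.

```python
import math
import sqlite3


def values_close(a, b, abs_tol):
    """Numbers match within abs_tol; everything else must be equal."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol)
    return a == b


def rows_close(row_a, row_b, abs_tol):
    """Rows match when they have the same width and every cell matches."""
    return len(row_a) == len(row_b) and all(
        values_close(a, b, abs_tol) for a, b in zip(row_a, row_b)
    )


def result_sets_match(actual, expected, abs_tol=1e-6):
    """Order-insensitive comparison: pair each expected row with one unused actual row."""
    remaining = list(actual)
    for exp_row in expected:
        for i, act_row in enumerate(remaining):
            if rows_close(act_row, exp_row, abs_tol):
                del remaining[i]
                break
        else:
            return False  # no actual row matches this expected row
    return not remaining  # leftover actual rows also count as a mismatch


def score_sql_sketch(generated_sql, expected_results, db_path, abs_tol=1e-6):
    """Hypothetical stand-in: run the query, then compare results, not SQL text."""
    conn = sqlite3.connect(db_path)
    try:
        actual = conn.execute(generated_sql).fetchall()
    finally:
        conn.close()
    return "PASS" if result_sets_match(actual, expected_results, abs_tol) else "FAIL"


if __name__ == "__main__":
    # Two differently written queries that return the same single row.
    print(score_sql_sketch("SELECT 1 + 1", [(2,)], ":memory:"))
    print(score_sql_sketch("SELECT 2", [(2,)], ":memory:"))
```

Because only the returned rows are compared, the two queries in the usage lines are scored identically even though their SQL text differs. The pairwise matching is quadratic in the number of rows, which is acceptable for a sketch but would need a smarter strategy for large result sets.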