# Reports & Export

Generate polished HTML reports and export benchmark results in multiple formats.
## HTML Dashboard Reports

Generate interactive reports with comprehensive visualizations:
```bash
# Generate a report after running tests
praisonaibench --suite tests.yaml --report

# Generate a report from existing results
praisonaibench --report-from output/json/benchmark_results_20241211_123456.json

# Compare multiple test results
praisonaibench --compare result1.json result2.json result3.json
```
### Report Features

#### 📊 Dashboard Tab

- Summary cards with key metrics
- Interactive charts:
  - Status distribution (success/failure)
  - Execution time by model
  - Evaluation scores (radar chart)
  - Errors and warnings
#### 🏆 Leaderboard Tab

- Model rankings with multiple criteria:
  - Overall Score (default)
  - Functional Score
  - Quality Score
  - Pass Rate
  - Speed (fastest first)
- Top 3 models highlighted with medals
- Click a criterion to re-rank dynamically
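The same re-ranking can be reproduced offline. A minimal sketch, assuming a hypothetical per-model summary shape and a simple average for the overall score (the actual praisonaibench schema and weighting may differ):

```python
# Hypothetical per-model summaries; the real praisonaibench JSON schema may differ.
models = {
    "gpt-4o":      {"functional": 0.92, "quality": 0.88, "pass_rate": 0.95, "avg_time": 2.1},
    "gpt-4o-mini": {"functional": 0.85, "quality": 0.80, "pass_rate": 0.90, "avg_time": 1.2},
    "llama-3-8b":  {"functional": 0.70, "quality": 0.65, "pass_rate": 0.75, "avg_time": 0.9},
}

def overall(scores):
    """Assumed weighting: plain average of functional and quality scores."""
    return (scores["functional"] + scores["quality"]) / 2

def rank(models, criterion):
    """Rank models by one criterion; for avg_time, fastest (lowest) ranks first."""
    reverse = criterion != "avg_time"  # higher is better for every criterion except time
    return sorted(models, key=lambda m: models[m][criterion], reverse=reverse)

leaderboard = sorted(models, key=lambda m: overall(models[m]), reverse=True)
print(leaderboard)               # best overall score first
print(rank(models, "avg_time"))  # fastest first
```

Switching criteria is just a different sort key, which is why the report can re-rank instantly in the browser.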
#### ⚖️ Comparison Tab

- Side-by-side model comparison
- Comprehensive metrics table:
  - Overall, functional, and quality scores
  - Pass rate with color coding
  - Average execution time
  - Total error and warning counts
#### 📋 Results Tab

- Complete test results table
- Per-test status, scores, time, tokens, and cost
- Sortable columns
- Status indicators
### Report Benefits
| Feature | Benefit |
|---|---|
| 🎨 Modern UI | Gradient headers, smooth transitions |
| 📱 Responsive | Works on all devices |
| ⚡ Lightweight | No external dependencies |
| 📊 Interactive | Chart.js powered |
| 💾 Standalone | Works offline |
| 📧 Shareable | Single HTML file |
## CSV Export

Export results for spreadsheet analysis:

```bash
# Export to CSV format
praisonaibench --suite tests.yaml --format csv

# Results saved to: output/csv/benchmark_results_20241211_123456.csv
```
### CSV Columns
- Test names and status
- Model information
- Execution times
- Token usage (input/output/total)
- Costs per test
- Evaluation scores
- Prompts and response lengths
- Error messages (if any)
### CSV Use Cases
- Spreadsheet analysis (Excel/Google Sheets)
- Data visualization tools
- Statistical analysis
- Sharing with non-technical stakeholders
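As one example of the analysis above, the exported columns aggregate cleanly with the standard library alone. A minimal sketch using `csv`, with made-up sample rows and column names (the real export has more columns and may name them differently):

```python
import csv
import io

# Made-up sample rows in the spirit of the CSV export; real column names may differ.
# With a real file, replace the StringIO with: open("output/csv/benchmark_results.csv")
sample = io.StringIO(
    "test_name,model,status,execution_time,total_tokens,cost\n"
    "greeting,gpt-4o,success,1.20,350,0.0021\n"
    "summarize,gpt-4o,success,2.40,900,0.0054\n"
    "greeting,llama-3-8b,failure,0.80,300,0.0000\n"
)

# Aggregate pass counts and spend per model.
totals = {}
for row in csv.DictReader(sample):
    stats = totals.setdefault(row["model"], {"tests": 0, "passed": 0, "cost": 0.0})
    stats["tests"] += 1
    stats["passed"] += row["status"] == "success"
    stats["cost"] += float(row["cost"])

for model, s in sorted(totals.items()):
    print(f"{model}: {s['passed']}/{s['tests']} passed, total cost ${s['cost']:.4f}")
```

The same file opens directly in Excel or Google Sheets, so no scripting is required for simpler analyses.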
## Comparison Reports

Multi-run comparisons show:

- Side-by-side success rates
- Performance trends
- Cost and token usage evolution
- Model improvements over time
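The side-by-side success rates in a multi-run comparison reduce to one number per result file. A minimal sketch with a hypothetical results shape (the real `--compare` input is the saved JSON, whose schema may differ):

```python
# Hypothetical shape for two saved result files; the real JSON schema may differ.
runs = {
    "run_2024_12_01.json": [
        {"test": "greeting", "status": "success"},
        {"test": "summarize", "status": "failure"},
    ],
    "run_2024_12_11.json": [
        {"test": "greeting", "status": "success"},
        {"test": "summarize", "status": "success"},
    ],
}

def success_rate(results):
    """Fraction of tests whose status is 'success'."""
    return sum(r["status"] == "success" for r in results) / len(results)

rates = {name: success_rate(results) for name, results in runs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
```

Tracking this one metric across dated files is how trends such as "model improvements over time" show up in the comparison report.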