Plugin System

Extensible evaluators for any language or task.

Overview

The plugin system lets you create custom evaluators for Python, TypeScript, Go, or any other language or task type. Each plugin is a single file.

Creating a Plugin

1. Create Evaluator Class

# my_evaluator.py
from praisonaibench import BaseEvaluator

class MyEvaluator(BaseEvaluator):
    def get_language(self):
        return 'mylang'  # e.g., 'python', 'typescript', 'go'

    def evaluate(self, code, test_name, prompt, expected=None):
        # Your evaluation logic here
        return {
            'score': 85,      # 0-100
            'passed': True,   # score >= 70
            'feedback': [{'level': 'success', 'message': '✅ Works!'}],
            'details': {}
        }
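The template above is not limited to programming tasks. Below is a minimal sketch of a plugin for a plain-text task that follows the same return contract (score 0-100, passed at 70 or above). The class name KeywordEvaluator, the 'keywords' language name, and the convention that expected holds a comma-separated keyword list are all hypothetical. The fallback BaseEvaluator stub exists only so the sketch runs without praisonaibench installed. The 'error' feedback level is also an assumption, since the template only shows 'success'.

```python
# keyword_evaluator.py -- hypothetical example plugin for a non-code task
try:
    from praisonaibench import BaseEvaluator
except ImportError:
    # Stand-in base class so this sketch runs without praisonaibench installed
    class BaseEvaluator:
        pass


class KeywordEvaluator(BaseEvaluator):
    """Scores a text answer by the fraction of expected keywords it contains."""

    def get_language(self):
        return 'keywords'  # hypothetical language/task name

    def evaluate(self, code, test_name, prompt, expected=None):
        # Assumption: `expected` is a comma-separated keyword list, e.g. "Paris, France"
        keywords = [k.strip().lower() for k in (expected or '').split(',') if k.strip()]
        text = code.lower()
        found = [k for k in keywords if k in text]
        score = round(100 * len(found) / len(keywords)) if keywords else 0
        passed = score >= 70
        return {
            'score': score,
            'passed': passed,
            'feedback': [{
                'level': 'success' if passed else 'error',  # 'error' level name is assumed
                'message': f'Found {len(found)}/{len(keywords)} keywords',
            }],
            'details': {'found': found, 'missing': [k for k in keywords if k not in found]},
        }
```

The model's answer arrives in the code argument even for non-code tasks. Scoring then becomes a pure function of the text, which keeps the plugin easy to unit-test.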

2. Configure Package

# pyproject.toml
[project]
name = "praisonaibench-mylang"
version = "0.1.0"
dependencies = ["praisonaibench>=0.1.0"]

[project.entry-points."praisonaibench.evaluators"]
mylang = "my_evaluator:MyEvaluator"

3. Install Plugin

pip install -e .
# or
uv pip install -e .
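The [project.entry-points."praisonaibench.evaluators"] table is standard Python packaging metadata. After installing, you can confirm that the plugin is registered using only the standard library. This is a minimal sketch, assuming the benchmark discovers plugins through that entry-point group, as the pyproject.toml above suggests. discover_evaluators is a hypothetical helper, not part of praisonaibench.

```python
from importlib.metadata import entry_points


def discover_evaluators(group="praisonaibench.evaluators"):
    """Return {name: EntryPoint} for every installed plugin in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: filter entry points by group
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


if __name__ == "__main__":
    for name, ep in sorted(discover_evaluators().items()):
        evaluator_cls = ep.load()  # imports the target, e.g. my_evaluator:MyEvaluator
        print(f"{name} -> {ep.value} ({evaluator_cls.__name__})")
```

If mylang is missing from the output, the package was not installed into the same environment that runs the benchmark.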

Using Plugins

Specify the language in your test suite:

# tests.yaml
tests:
  - name: "python_test"
    language: "python"  # Matching evaluator plugin is auto-discovered
    prompt: "Write Python hello world"
    expected: "Hello World"
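Conceptually, the language field selects which evaluator scores the model's output. The sketch below is hypothetical: run_test, the registry dict, and SubstringEvaluator are illustrative names. It assumes evaluators are keyed by their get_language() value, and praisonaibench's actual dispatch may differ. SubstringEvaluator does not execute anything; it only checks that expected appears in the text.

```python
# Hypothetical dispatch sketch; praisonaibench's real runner may differ.


class SubstringEvaluator:
    """Toy evaluator: checks that `expected` appears in the text (no execution)."""

    def get_language(self):
        return 'python'

    def evaluate(self, code, test_name, prompt, expected=None):
        ok = expected is not None and expected in code
        return {'score': 100 if ok else 0, 'passed': ok, 'feedback': [], 'details': {}}


def run_test(test, code, evaluators):
    """Route `code` to the evaluator registered for test['language']."""
    language = test.get('language')
    evaluator = evaluators.get(language)
    if evaluator is None:
        raise KeyError(f"No evaluator registered for language {language!r}")
    return evaluator.evaluate(
        code,
        test_name=test['name'],
        prompt=test['prompt'],
        expected=test.get('expected'),
    )


# Registry keyed by each evaluator's get_language() value (assumed)
registry = {ev.get_language(): ev for ev in [SubstringEvaluator()]}

test = {
    'name': 'python_test',
    'language': 'python',
    'prompt': 'Write Python hello world',
    'expected': 'Hello World',
}
result = run_test(test, 'print("Hello World")', registry)
print(result['score'], result['passed'])  # prints: 100 True
```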

Run the suite:

praisonaibench --suite tests.yaml

Plugin Features

Feature	Description
✅ One file	~50 lines per plugin
✅ Auto-discovery	No registration code; plugins are found via their entry point
✅ Backwards compatible	HTML evaluation unchanged
✅ Language detection	Auto-detects from code blocks
✅ Any task	Programming, text, translation, etc.

Example: Python Evaluator

# examples/plugins/python_evaluator.py
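from praisonaibench import BaseEvaluator
import os
import subprocess
import sys
import tempfile

class PythonEvaluator(BaseEvaluator):
    def get_language(self):
        return 'python'

    def evaluate(self, code, test_name, prompt, expected=None):
        # Write code to a temp file (delete=False so the subprocess can open it)
        with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
            f.write(code)
            temp_path = f.name

        try:
            # Run with the current interpreter; a bare `python` may not be on PATH
            try:
                result = subprocess.run(
                    [sys.executable, temp_path],
                    capture_output=True,
                    text=True,
                    timeout=30
                )
            except subprocess.TimeoutExpired:
                return {
                    'score': 0,
                    'passed': False,
                    'feedback': [{'level': 'error', 'message': 'Timed out after 30s'}],  # 'error' level name assumed
                    'details': {}
                }

            # Score: crash = 0, expected output found = 100, otherwise ran cleanly = 70
            output = result.stdout.strip()
            if result.returncode != 0:
                score, level = 0, 'error'
                message = f'Exited with code {result.returncode}'
            else:
                score = 100 if expected and expected in output else 70
                level, message = 'success', f'Output: {output}'

            return {
                'score': score,
                'passed': score >= 70,
                'feedback': [{'level': level, 'message': message}],
                'details': {'stdout': result.stdout, 'stderr': result.stderr,
                            'returncode': result.returncode}
            }
        finally:
            os.unlink(temp_path)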

Language Detection

The language for each test, which determines the plugin that evaluates it, is auto-detected from:
1. The explicit language field in the test
2. The code block language identifier (e.g. ```python)
3. File extension patterns
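These sources can be combined into a fallback chain. This is a minimal sketch assuming they are checked in the order listed. detect_language, the EXTENSIONS map, and the interpretation of "file extension patterns" (here, a file name such as main.go mentioned in the response) are assumptions, not the library's actual rules.

```python
import re

# Hypothetical extension-to-language map; extend it as plugins are added
EXTENSIONS = {'py': 'python', 'ts': 'typescript', 'go': 'go', 'html': 'html'}

# Opening fence with an identifier, e.g. ```python
FENCE_RE = re.compile(r"```([A-Za-z0-9_+-]+)")
# A file name ending in a known extension, e.g. main.go (but not main.tsx)
FILENAME_RE = re.compile(r"\b[\w-]+\.(" + "|".join(EXTENSIONS) + r")\b")


def detect_language(test, response):
    """Return a language name for `test`, or None if nothing matches."""
    # 1. An explicit `language` field on the test wins
    if test.get('language'):
        return test['language']
    # 2. Otherwise use the first fenced code block's identifier
    fence = FENCE_RE.search(response)
    if fence:
        return fence.group(1).lower()
    # 3. Otherwise look for a file name with a known extension
    filename = FILENAME_RE.search(response)
    if filename:
        return EXTENSIONS[filename.group(1)]
    return None
```

Returning None instead of guessing allows a caller to fall back to a default evaluator (for example the built-in HTML one) explicitly.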