AI Red Team Evaluation Report

Overall Safety

{{ overall_safety_score }}/100

Safety Score Breakdown

{% for cat in safety_categories %}
{% if cat.failed == 0 %}{% set row_class = "list-group-item-success" %}
{% elif cat.passed == 0 %}{% set row_class = "list-group-item-danger" %}
{% else %}{% set row_class = "list-group-item-warning" %}
{% endif %}
  • {{ cat.title }} {{ cat.score }}%
{% endfor %}
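{#
The branch order used to pick `row_class` matters: a category with no failures is treated as a success before the "no passes" check runs. A minimal Python sketch of the same decision, assuming `passed` and `failed` are plain integer counts (the function name is hypothetical and not part of the template):

```python
def row_class(passed: int, failed: int) -> str:
    """Mirror the template's branch order: no failures, then no passes, then mixed."""
    if failed == 0:
        return "list-group-item-success"
    if passed == 0:
        return "list-group-item-danger"
    return "list-group-item-warning"


# Example categories as (passed, failed) pairs; the counts are illustrative only.
examples = {"all passed": (5, 0), "all failed": (0, 3), "mixed": (2, 1), "no tests": (0, 0)}
classes = {name: row_class(p, f) for name, (p, f) in examples.items()}
```

Note that a category with zero tests falls into the success branch, because `failed == 0` is checked first. Whether that is the intended display for an empty category is a design decision for the report author.
#}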

Regulatory Readiness

Compliance levels against leading AI governance frameworks. Each percentage is the share of a framework's required controls that the evaluated system meets.

{% for framework,pct in regulatory_readiness.items() %}
{{ framework }}
{{ pct }}%
{% endfor %}
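{#
The template only receives a ready-made `regulatory_readiness` mapping of framework name to percentage. As a sketch of how such a mapping could be produced from control counts, assuming each framework is summarized as a (controls met, controls required) pair: the helper name, framework names, and counts below are hypothetical.

```python
def readiness_pct(controls_met: int, controls_required: int) -> float:
    """Share of required controls met, as a percentage rounded to one decimal."""
    if controls_required <= 0:
        raise ValueError("controls_required must be positive")
    return round(100.0 * controls_met / controls_required, 1)


# Hypothetical control counts; framework names and numbers are illustrative only.
controls = {"Framework A": (18, 24), "Framework B": (7, 10)}
regulatory_readiness = {
    name: readiness_pct(met, required) for name, (met, required) in controls.items()
}
```

The resulting dictionary has the shape the loop above iterates over with `.items()`.
#}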

Findings Summary

Counts of Critical, High, and Medium findings, plus total jailbreaks and total findings.

{% for label,val in { 'Critical':findings_summary.critical, 'High':findings_summary.high, 'Medium':findings_summary.medium, 'Total Jailbreaks':findings_summary.jailbreaks, 'Total Findings':findings_summary.total_findings }.items() %}
{{ label }}
{{ val }}
{% endfor %}

Plugin Summary

Failure rate for each plugin category.

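{% for p in findings_list %}
{# Guard against plugins with no runs, and render a whole number instead of e.g. 33.0 #}
{% set fail_pct = ((p.failed / p.total * 100) | round | int) if p.total else 0 %}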
{{ p.title }}
{{ fail_pct }}%
{{ p.failed }} of {{ p.total }} failed
{% endfor %}
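{#
The failure rate is `failed / total * 100`, rounded to a whole percent. A minimal Python sketch of the same arithmetic, with an explicit guard for a plugin that ran no tests. Returning 0 in that case is an assumption about how it should be displayed, and the function names are hypothetical.

```python
def fail_pct(failed: int, total: int) -> int:
    """Whole-number failure percentage; 0 when the plugin has no runs."""
    if total == 0:
        return 0
    return round(failed / total * 100)


def fail_line(failed: int, total: int) -> str:
    """Text of the 'N of M failed' line shown under the percentage."""
    return f"{failed} of {total} failed"
```

For instance, `fail_pct(1, 3)` gives 33 and `fail_line(1, 3)` gives "1 of 3 failed".
#}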

Summary of Evaluation Results

{% macro score_str(o) -%}
{%- if o is none -%} N/A
{%- elif o.overall_score is defined -%} {{ "{:.2f}".format(o.overall_score.score) }} ({{ o.overall_score.severity.value }})
{%- elif o.score is defined -%} {{ "{:.2f}".format(o.score) }}{% if o.severity is defined %} ({{ o.severity.value }}){% endif %}
{%- else -%} {{ o }}
{%- endif -%}
{%- endmacro %}
{% macro row(label, val) -%}
{{ label }} | {{ val if val is not none else '–' }}
{%- endmacro %}
{% set sorted_results = report.eval_results | sort(attribute='risk_score.overall.overall_score.score', reverse=true) %}
Plugin ID | Prompt Goal | Failed | Risk Score (Overall) | Jailbreak | Description
{% for res in sorted_results %}
{{ res.plugin_id }} | {{ res.prompt.goal }} | {% if not res.responses[0].success %}Yes{% else %}No{% endif %} | {{ '{:.2f}'.format(res.risk_score.overall.overall_score.score) }} ({{ res.risk_score.overall.overall_score.severity.value }}) | {{ res.responses[0].jailbreak_achieved or '–' }} | {{ res.responses[0].description or '–' }}
Finding for Run ID: {{ res.run_id }}
Field | Value
{{ row('Plugin ID', res.plugin_id) }}
{{ row('Prompt Goal', res.prompt.goal) }}
{{ row('Base Prompt', res.prompt.base_prompt) }}
{{ row('Failed', 'Yes' if not res.responses[0].success else 'No') }}
{{ row('Risk Score (AIVSS)', score_str(res.risk_score.aivss.aivss_score)) }}
{{ row('Base Score', score_str(res.risk_score.base.base)) }}
{{ row('Breakability Score', score_str(res.risk_score.breakability.breakability)) }}
{{ row('Overall Score', score_str(res.risk_score.overall.overall_score)) }}
{{ row('Jailbreak Achieved', res.responses[0].jailbreak_achieved or 'N/A') }}
{{ row('Description', res.responses[0].description) }}
User Message
{{ res._user_message[:150] }}{% if res._user_message|length > 150 %}... Read more{% endif %}
{{ res._user_message }}
Assistant Response
{{ res._assistant_response[:150] }}{% if res._assistant_response|length > 150 %}... Read more{% endif %}
{{ res._assistant_response }}
{% endfor %}
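{#
The `score_str` macro and the 150-character message preview are the two pieces of formatting logic in this section. The sketch below restates them in Python so they can be unit tested outside the template. It assumes score objects expose `score`, `severity`, and `overall_score` as attributes, as the macro does. The `Severity` enum and its members are hypothetical stand-ins for the real severity type.

```python
from enum import Enum
from types import SimpleNamespace


class Severity(Enum):
    # Hypothetical severity values; the real enum lives in the evaluation code.
    LOW = "low"
    HIGH = "high"


def score_str(o) -> str:
    """Format a score the way the template macro does."""
    if o is None:
        return "N/A"
    if hasattr(o, "overall_score"):
        inner = o.overall_score
        return f"{inner.score:.2f} ({inner.severity.value})"
    if hasattr(o, "score"):
        text = f"{o.score:.2f}"
        if hasattr(o, "severity"):
            text += f" ({o.severity.value})"
        return text
    return str(o)


def preview(text: str, limit: int = 150) -> str:
    """First `limit` characters, with an ellipsis only when text was cut."""
    return text if len(text) <= limit else text[:limit] + "..."


nested = SimpleNamespace(overall_score=SimpleNamespace(score=3.0, severity=Severity.LOW))
flat = SimpleNamespace(score=7.5, severity=Severity.HIGH)
```

Here `score_str(nested)` yields "3.00 (low)" and `score_str(flat)` yields "7.50 (high)". A bare number such as `4.2` falls through to the final branch and is printed unformatted, which matches the macro's `else` case.
#}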