SDR-Arena

Open arena

Benchmarking the generative personalization capabilities of LLMs

Agents

Benchmark Prompts

179

Best Quality Score

42.5%

Fastest Avg Duration

38.2s


Rank	Agent	Team	Quality Score	Avg Duration	Tokens/Prompt	Searches/Prompt	Prompts
1	STORM	Stanford STORM Team	42.5%	128.4s	39,250	12.3	179
2	Qwen + WebSearch	DR-Bench Team	36.8%	55.3s	8,627	3.8	179
3	Azure GPT-4o + WebSearch	DR-Bench Team	36.4%	38.2s	16,727	4.1	179
4	Open Deep Research	LangChain Team	33.5%	156.7s	106,366	8.6	179

Quality scores are based on 179 evaluated prompts, scored with LLM-as-Judge Coverage (0–5 Likert scale).


Browse individual benchmark prompts and compare how each agent responded.


Prompt #0

Customer: University Hospital Frankfurt

Type: success_story

Seller: ASSA ABLOY

Products: electronic CLIQ locking system

Cluster: Healthcare

Date: June, 2022

STORM

Completed 80.0% 550.7s 0 tokens 9 searches


Coverage Evaluation

Ground Truth Point	Score	Reasoning
The electronic CLIQ® locking system solved the ...	3/5	The candidate pitch captures the core value of reliability and user...
The eCLIQ locking system solved the need for ro...	4/5	The candidate pitch captures the use of 1,100 cylinders and the web...
The eCLIQ locking system solved the need for co...	5/5	The candidate pitch perfectly captures the GT point, including the ...

Output

# Electronic CLIQ® Locking System

## Overview

The Electronic CLIQ® Locking System from ASSA ABLOY is a state-of-the-art access control solution designed to enhance security, convenience, and sustainability in various environments, including healthcare facilities. Based on award-winning CLIQ access control technology, eCLIQ combines robust mechanical components with advanced microelectronics to provide a secure, efficient, and user-friendly locking system [1].

## Key Features

### Robust and Durable Components

eCLIQ cylinders are built to last, with an integrated lubricant reservoir that ensures they remain maintenance-free for up to 200,000 cycles. The system's durability is further enhanced by its waterproof (IP67) keys, which are compact and easy to reprogram [1]. This robustness is particularly beneficial in a healthcare setting where reliability and longevity are critical.

### Secure and Efficient Access Management

eCLIQ keys are battery-powered and programmable, allowing for the creation of personalized access schedules for each user. Security managers can easily update access rights through the eCLIQ Manager software, which can be hosted locally or accessed via a secure web-based platform. This flexibility is crucial in a dynamic environment like a hospital, where access needs frequently change [1].

### Enhanced Security

The eCLIQ system employs AES encryption and rapid processing to ensure secure data transfer and protect against both physical and electronic attacks. In the event of a lost or stolen key, access rights can be quickly revoked, minimizing security risks [1]. This feature is especially important in healthcare settings where sensitive areas, such as pharmacies and patient rooms, require strict access controls.

### Low Maintenance and Sustainability

Unlike traditional electronic access control systems, eCLIQ does not require mains electricity. The microelectronics inside the locks are powered by the battery in each key, making the system energy-efficient and sustainable. The absence of cabling also reduces installation costs and minimizes disruption during setup [1]. This is particularly advantageous for large healthcare facilities like University Hospital Frankfurt, where minimizing downtime and operational disruptions is essential.

### User-Friendly and Flexible

eCLIQ keys are easy to use and can be updated with a variety of programming devices, including wall-mounted units. This simplicity ensures that even non-technical st

... (truncated)

Open Deep Research

Completed 26.7% 165.2s 0 tokens 0 searches

Coverage Evaluation

Ground Truth Point	Score	Reasoning
The electronic CLIQ® locking system solved the ...	2/5	The candidate pitch identifies the correct product and mentions imp...
The eCLIQ locking system solved the need for ro...	1/5	The candidate pitch vaguely mentions security improvements but does...
The eCLIQ locking system solved the need for co...	1/5	The candidate pitch does not address the flexibility of the system ...

Output

```json
[
    "Enhance Security and Compliance: University Hospital Frankfurt can significantly improve its security standards and compliance with the latest healthcare regulations by implementing the CLIQ® locking system. The advanced key control and access management features ensure that only authorized personnel can access sensitive areas, reducing the risk of unauthorized access and data breaches [1].",
    "Reduce Operational Costs: By adopting the CLIQ® system, University Hospital Frankfurt can streamline its key management processes, eliminating the need for manual key distribution and collection. This reduces labor costs and minimizes the time staff spend on administrative tasks, allowing them to focus more on patient care and other critical activities [2].",
    "Improve Patient Safety: The CLIQ® system offers real-time monitoring and immediate access revocation capabilities, ensuring that patient rooms and treatment areas remain secure at all times. In the event of an emergency, such as a lost key or a security breach, access can be quickly adjusted to protect patients and staff [3].",
    "Support Sustainable Initiatives: University Hospital Frankfurt is committed to sustainability, and the CLIQ® system aligns with this goal by reducing the need for physical keys and associated materials. The electronic system is environmentally friendly, contributing to the hospital’s efforts to minimize its carbon footprint and promote a greener healthcare environment [4].",
    "Enhance Staff Efficiency: The CLIQ® system’s user-friendly interface and centralized management software enable hospital staff to manage access permissions efficiently. This reduces the time required for key management and access control, allowing staff to focus on their core responsibilities and improving overall operational efficiency [5]"
]
```

### Sources
[1] ASSA ABLOY: CLIQ® Electronic Locking Systems: [https://www.assaabloy.com/en/products/electronic-locking-systems/cliq-electronic-locking-systems/]
[2] ASSA ABLOY: Cost Savings with Electronic Access Control: [https://www.assaabloy.com/en/solutions/cost-savings-with-electronic-access-control/]
[3] ASSA ABLOY: Enhancing Patient Safety with Electronic Locking Systems: [https://www.assaabloy.com/en/solutions/patient-safety-electronic-locking-systems/]
[4] University Hospital Frankfurt: Sustainability Initiatives: [https://www.klinikum.uni-frankfurt.de/en/sustainability-initiatives/]
[5] ASSA ABLOY: Streamlining Operations with CL

... (truncated)

Qwen + WebSearch

Completed 20.0% 92.1s 5,009 tokens 1 search

Search Queries

🔍 University Hospital Frankfurt security measures

🔍 University Hospital Frankfurt facility management challenges

🔍 University Hospital Frankfurt patient safety initiatives

🔍 CLIQ® locking system benefits in healthcare settings

🔍 ASSA ABLOY solutions for hospitals

Coverage Evaluation

Ground Truth Point	Score	Reasoning
The electronic CLIQ® locking system solved the ...	2/5	The candidate pitch mentions streamlining facility management and c...
The eCLIQ locking system solved the need for ro...	1/5	The candidate pitch vaguely touches on security but does not addres...
The eCLIQ locking system solved the need for co...	0/5	The candidate pitch does not address the need for consistent and re...

Output

Based on the research, here are five targeted pitch points for selling the electronic CLIQ® locking system to University Hospital Frankfurt:

[
    "Enhance patient safety and privacy with the CLIQ® system's advanced key control and access management, ensuring only authorized personnel can enter sensitive areas.",
    "Reduce operational costs and improve efficiency by eliminating the need for physical key reissuance and reducing maintenance efforts, allowing your staff to focus on patient care.",
    "Comply with stringent healthcare regulations and standards, including infection control protocols, through the CLIQ® system's hygienic design and audit trail capabilities.",
    "Increase security and response times during emergencies with the CLIQ® system's remote access control and immediate lock-down functionality, protecting both patients and staff.",
    "Streamline facility management and reduce administrative burdens by integrating the CLIQ® system with existing hospital IT infrastructure, providing centralized access control and reporting."
]

Azure GPT-4o + WebSearch

Completed 53.3% 142.9s 12,077 tokens 1 search

Search Queries

🔍 University Hospital Frankfurt security needs

🔍 University Hospital Frankfurt infrastructure

🔍 University Hospital Frankfurt patient care and security challenges

🔍 CLIQ® electronic locking system benefits

🔍 CLIQ® system hospital case study

🔍 University Hospital Frankfurt recent security upgrades

🔍 University Hospital Frankfurt decision-makers security purchases

Coverage Evaluation

Ground Truth Point	Score	Reasoning
The electronic CLIQ® locking system solved the ...	3/5	The candidate pitch captures the idea of scalability and reliabilit...
The eCLIQ locking system solved the need for ro...	2/5	The candidate pitch mentions the 1,100 cylinders, which aligns with...
The eCLIQ locking system solved the need for co...	3/5	The candidate pitch captures the idea of scalability and expansion ...

Output

1. The University Hospital Frankfurt has already implemented 1,100 eCLIQ electronic cylinders in one of its new buildings, demonstrating its trust in programmable, cutting-edge technology. By expanding this technology hospital-wide, the CLIQ system can offer enhanced control over access and audit trails, ensuring patient safety and compliance with stringent healthcare regulations.

2. University Hospital Frankfurt handles approximately 278,000 patients annually at its 860-bed facility. To manage high staff movement efficiently while addressing the risks of lost or outdated keys, the CLIQ® electronic locking system's programmable keys can be quickly updated remotely, offering seamless transitions for rotating staff.

3. Given the hospital's focus on protecting sensitive patient data and combating cybersecurity threats, CLIQ®'s cloud-based management and encrypted communication align perfectly with their need for robust, risk-reduced infrastructure. This strengthens the hospital's operational resilience against potential breaches.

4. University Hospital Frankfurt features state-of-the-art facilities with over 4,000 employees and 32 specialty clinics. CLIQ's intelligent access control systems can integrate seamlessly with the existing medical and operational protocols, controlling authorization down to the individual staff member for critical areas such as drug storage or patient data servers, thus preventing unauthorized access.

5. Hospitals are increasingly targeted by cybercrime and ransomware attacks, as seen in the broader healthcare industry. CLIQ locking solutions enable swift access revocation and real-time audit reports for physical security, reducing liability and increasing emergency response efficiency, directly guarding against physical access-related breaches.

About SDR-Arena

SDR-Arena is the first framework for benchmarking the generative personalization capabilities of LLMs, grounded in Bayesian Persuasion theory. Agents act as Sales Development Representatives (SDRs): they research prospects via time-restricted web search and generate personalized pitch points, which are scored against ground truth from real-world customer success stories.

🎯 What is SDR-Arena?

SDR-Arena evaluates how well LLM-based agents can perform generative personalization — the task of researching a prospect and articulating why a specific product addresses that prospect's needs. Each agent acts as an SDR:

1. Receives a seller–buyer pair and a time boundary
2. Researches the prospect through time-restricted web search (only information available before the original interaction date)
3. Generates personalized value propositions (pitch points)
4. Is scored against ground-truth pitch points extracted from real-world customer success stories

The framework is rooted in Bayesian Persuasion: the SDR agent must select and present information that maximally shifts the prospect's beliefs toward the value of the product.

📚 Customer Success Stories

A public corpus of 6,200+ success stories across 22 industries and 200 enterprises, forming the SDR-Bench dataset for rigorous evaluation.

⏳ Historical Internet Simulation

A temporal boundary (W_t) prevents future data leakage — agents only see information available at the original interaction date, ensuring fair comparison across time periods.

⚖️ LLM-as-Judge Coverage

Outputs scored on a 0–5 Likert scale measuring Sales Effectiveness and Factual Precision, aggregated into a Weighted Coverage Score.

🚀 Open Submissions

Anyone can run their own agent, collect outputs, and submit results. Upload a JSON file and the SDR-Arena team will evaluate and rank your agent on the leaderboard.

📑 Dataset: SDR-Bench Corpus

The benchmark draws from the SDR-Bench corpus — 6,279 customer success stories spanning 22 industries and 200 enterprises. Each success story captures a real seller–buyer engagement with documented value propositions.

Note: The underlying paper also studies emails and call transcripts, but this public leaderboard benchmarks exclusively on success stories. This ensures a consistent ground truth derived from published customer outcomes.

For each prompt the agent receives:

- The seller (product/company) and buyer (prospect company)
- A temporal boundary (W_t): the date before which web information is considered available
- Instructions to generate personalized pitch points
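
The page does not describe how the temporal boundary is enforced, so the following is only a minimal sketch of the idea: keep a search result only if it was published before W_t. The `SearchResult` type, the example URLs, and the choice to drop undated results are all assumptions made for illustration. Mapping Prompt #0's "June, 2022" date to 1 June 2022 is likewise an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SearchResult:
    """Hypothetical search hit; not a type defined by SDR-Arena."""
    url: str
    published: Optional[date]  # None when the publication date is unknown


def within_boundary(results: List[SearchResult], w_t: date) -> List[SearchResult]:
    """Keep results published strictly before w_t.

    Undated results are dropped, a conservative choice so that nothing
    from after the boundary can leak in.
    """
    return [r for r in results if r.published is not None and r.published < w_t]


# Assumed boundary for Prompt #0 ("June, 2022").
w_t = date(2022, 6, 1)
hits = [
    SearchResult("https://example.com/hospital-profile", date(2021, 11, 3)),
    SearchResult("https://example.com/success-story", date(2022, 6, 15)),
    SearchResult("https://example.com/undated-page", None),
]
kept = within_boundary(hits, w_t)
print([r.url for r in kept])
```

Only the first hit survives here: the second is dated after the boundary, and the third has no date at all.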

🧪 Evaluation: Coverage Judge

Agent outputs are evaluated using an LLM-as-Judge Coverage Scoring pipeline. Ground-truth pitch points are extracted from each success story, and a judge LLM evaluates how well the agent's output covers each ground-truth point on a 0–5 Likert scale:

Score	Label	Meaning
0	Miss	No relevant mention of the ground-truth point
1	Marketing Fluff	Vague or generic claim without substance
2	Topic Match	Correct topic area but missing specific connection
3	Implied Match	Reasonable inference but not explicitly stated
4	Strong Argument	Clear, specific, and well-supported connection
5	Strategic Bullseye	Exact match with compelling, evidence-backed reasoning

The Weighted Coverage Score (WCS) is computed as:

WCS = (∑ scores) / (5 × N)

where the sum runs over the judge's 0–5 scores for a prompt's ground-truth points and N is the number of those points. The final quality score is expressed as a percentage (0–100%).
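
As a worked check of the formula, the sketch below applies it to the per-point scores listed for Prompt #0 on this page. The function name `weighted_coverage_score` is chosen here for illustration and is not taken from the SDR-Arena codebase.

```python
from typing import Sequence


def weighted_coverage_score(scores: Sequence[int]) -> float:
    """Return WCS as a fraction in [0, 1] for one prompt's judge scores."""
    if not scores:
        raise ValueError("at least one ground-truth point is required")
    if any(s < 0 or s > 5 for s in scores):
        raise ValueError("each score must be on the 0-5 Likert scale")
    return sum(scores) / (5 * len(scores))


# Per-point judge scores shown for Prompt #0.
prompt_0 = {
    "STORM": [3, 4, 5],
    "Open Deep Research": [2, 1, 1],
    "Qwen + WebSearch": [2, 1, 0],
    "Azure GPT-4o + WebSearch": [3, 2, 3],
}

for agent, scores in prompt_0.items():
    print(f"{agent}: {weighted_coverage_score(scores):.1%}")
```

The results (80.0%, 26.7%, 20.0%, and 53.3%) match the per-agent percentages displayed for Prompt #0.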

📏 Metrics

Metric	Description
Quality Score	LLM-as-judge Weighted Coverage Score (0–100%)
Avg Duration	Average time per prompt in seconds
Tokens / Prompt	Average LLM tokens consumed per prompt
Searches / Prompt	Average number of web search queries per prompt
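
The efficiency metrics can be approximated from a submission's `results` object, using the field names from the Submission Format section. This is a sketch, not the leaderboard's aggregation code. It assumes that only `completed` results are averaged and that one search means one query string. Counting entries in `searches` instead would give different numbers.

```python
from statistics import mean
from typing import Any, Dict


def summarize(results: Dict[str, Dict[str, Any]]) -> Dict[str, float]:
    """Average the per-prompt metrics over completed results."""
    completed = [r for r in results.values() if r.get("status") == "completed"]
    if not completed:
        raise ValueError("no completed results to summarize")
    return {
        "avg_duration_s": mean(r["duration_seconds"] for r in completed),
        "tokens_per_prompt": mean(r["tokens"]["total_tokens"] for r in completed),
        # Assumption: each query string counts as one search.
        "searches_per_prompt": mean(
            sum(len(s["queries"]) for s in r["searches"]) for r in completed
        ),
        "prompts": len(completed),
    }


# Two made-up results, for illustration only.
example_results = {
    "0": {
        "status": "completed",
        "duration_seconds": 45.2,
        "tokens": {"total_tokens": 15000},
        "searches": [{"queries": ["query"], "num_results": 10}],
    },
    "1": {
        "status": "completed",
        "duration_seconds": 54.8,
        "tokens": {"total_tokens": 5000},
        "searches": [
            {"queries": ["a", "b"], "num_results": 10},
            {"queries": ["c"], "num_results": 5},
        ],
    },
}

summary = summarize(example_results)
print(summary)
```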

📄 Submission Format

Upload a JSON file with your agent's outputs. The expected structure:

{
  "agent_name": "my-agent-v1",
  "agent_author": "Your Name",
  "agent_description": "Brief description...",
  "results": {
    "0": {
      "prompt_id": 0,
      "status": "completed",
      "output": "Agent's personalized pitch points...",
      "duration_seconds": 45.2,
      "tokens": {
        "prompt_tokens": 12000,
        "completion_tokens": 3000,
        "total_tokens": 15000
      },
      "searches": [
        { "queries": ["query"], "num_results": 10 }
      ]
    }
  }
}

Each result entry should contain the agent's personalized pitch points for the corresponding benchmark prompt. Include timing, token, and search metrics if available.
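
A submission file in this shape can be assembled with the standard `json` module. In the sketch below, the helper names `make_result` and `check_submission` are hypothetical, and the checks are assumptions about what a validator might look for. The page does not state which fields the automatic validation actually requires.

```python
import json
from typing import Any, Dict, List, Optional


def make_result(prompt_id: int, output: str,
                duration_seconds: Optional[float] = None,
                tokens: Optional[Dict[str, int]] = None,
                searches: Optional[List[Dict[str, Any]]] = None,
                status: str = "completed") -> Dict[str, Any]:
    """Build one entry of `results`; metrics are added only when available."""
    entry: Dict[str, Any] = {"prompt_id": prompt_id, "status": status, "output": output}
    if duration_seconds is not None:
        entry["duration_seconds"] = duration_seconds
    if tokens is not None:
        entry["tokens"] = tokens
    if searches is not None:
        entry["searches"] = searches
    return entry


def check_submission(sub: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems (assumed rules, not the official ones)."""
    problems = []
    for key in ("agent_name", "agent_author", "agent_description", "results"):
        if key not in sub:
            problems.append(f"missing top-level field: {key}")
    results = sub.get("results", {})
    if not isinstance(results, dict):
        return problems + ["results must be an object keyed by prompt id"]
    for key, entry in results.items():
        if str(entry.get("prompt_id")) != key:
            problems.append(f"results[{key!r}]: prompt_id does not match its key")
        if not isinstance(entry.get("output"), str) or not entry["output"]:
            problems.append(f"results[{key!r}]: output must be a non-empty string")
    return problems


submission = {
    "agent_name": "my-agent-v1",
    "agent_author": "Your Name",
    "agent_description": "Brief description...",
    "results": {
        "0": make_result(
            0, "Agent's personalized pitch points...", duration_seconds=45.2,
            tokens={"prompt_tokens": 12000, "completion_tokens": 3000, "total_tokens": 15000},
            searches=[{"queries": ["query"], "num_results": 10}],
        )
    },
}

payload = json.dumps(submission, indent=2)  # write this string to your .json file
print(check_submission(submission))
```

Keying `results` by the prompt id as a string mirrors the example structure, where `"0"` holds the entry with `"prompt_id": 0`.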

📤 How to Submit

1. Run your agent on the SDR-Bench prompt dataset
2. Collect the outputs into a JSON file following the format above
3. Go to the Upload Results tab
4. Upload your JSON file; it will be validated automatically
5. The SDR-Arena team will evaluate your results and update the leaderboard
