Evidence data

Reader-safe data behind our AI workflow tests.

These CSVs summarize the testing database that powers our public evidence logs. They intentionally exclude internal artifact paths, email/password recovery notes, private screenshots, and any account credentials.

Tools tracked

59

Meeting assistants, AI website builders, AI presentation tools, customer-support chatbots, and admin-routine tools in the current evidence database.

Test scenarios

6

Repeatable synthetic fixtures with answer keys or expected-output requirements.

Run records

58

Public-doc checks, signup/import recon, blockers, pending evidence-pack rows, and scored output runs.

Scored quality runs

6

Only runs with complete generated-output evidence are scored; pending and setup-friction rows carry no quality score.

How to read the public data

Setup friction is not a recommendation.

Rows marked as setup friction document what blocked the test environment: signup challenges, OAuth-only login, unclear import routes, or file-picker limits. They do not measure the quality of a tool's generated transcript, summary, website, or export.

Quality scores require output artifacts.

We publish a weighted score only when a run has saved raw output, screenshot or text evidence, a completed run note, and rubric entries. A blank score is deliberate; it does not signal a missing endorsement.
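The publishing rule above can be sketched as a small gate. This is a minimal illustration, not the project's actual code: the `Run` class, its field names, and the rubric shape (each criterion mapped to a weight and a score between 0 and 1) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Run:
    # All field names are hypothetical; the internal schema is not public.
    raw_output_saved: bool = False
    evidence_saved: bool = False      # screenshots or text evidence
    run_note_complete: bool = False
    # criterion -> (weight, score in [0, 1]); an assumed rubric shape
    rubric: dict[str, tuple[float, float]] = field(default_factory=dict)


def weighted_score(run: Run) -> Optional[float]:
    """Return the weighted score, or None when a required artifact is missing."""
    complete = (
        run.raw_output_saved
        and run.evidence_saved
        and run.run_note_complete
        and bool(run.rubric)
    )
    if not complete:
        return None  # blank on purpose: nothing is published for this run
    total_weight = sum(weight for weight, _ in run.rubric.values())
    if total_weight <= 0:
        return None  # a rubric with no positive weight cannot be averaged
    weighted_sum = sum(weight * score for weight, score in run.rubric.values())
    return weighted_sum / total_weight
```

Returning `None` instead of `0` keeps an unscored run distinguishable from a run that scored poorly, which matches the intent that a blank score is deliberate.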

Private details stay private.

The downloadable CSVs are sanitized exports of the internal operator database: no credential notes, no reset URLs, no account passwords, and no private screenshot paths. Public playbook pages carry the reader-facing context.
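One way to picture the sanitizing step is a column filter applied before export. The sketch below is an assumption about how such a filter could look, not a description of the real pipeline; the column names in `PRIVATE_COLUMNS` and the function name `sanitize_csv` are hypothetical.

```python
import csv
import io

# Hypothetical column names; the internal schema is not published.
PRIVATE_COLUMNS = {
    "credential_notes",
    "reset_url",
    "account_password",
    "screenshot_path",
}


def sanitize_csv(text: str) -> str:
    """Return CSV text with every private column removed."""
    reader = csv.DictReader(io.StringIO(text))
    public_fields = [
        name for name in (reader.fieldnames or []) if name not in PRIVATE_COLUMNS
    ]
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=public_fields,
        extrasaction="ignore",  # silently drop any key that is not public
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

A deny-list like this only removes the columns it names. An allow-list of known-public columns would be the stricter design, since a newly added private column would then be excluded by default.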

Current run-status mix

Status group | Runs | Meaning
scored_output | 6 | Usable generated output exists and was scored against the scenario rubric.
setup_friction | 36 | Signup, upload, OAuth/SSO, security-filter, password-handling, or import-path friction was observed before usable fixture output existed.
availability_research | 10 | Official public docs/pricing/help pages were checked; hands-on output is still pending.
blocked_by_public_docs | 5 | Official docs made the current upload/private-fixture/no-spend test unsuitable.
pending | 1 | Waiting for a safe account/tool path or public-doc availability pass before reader-reproducible testing.
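The status counts can be cross-checked against the headline figures with a few lines of Python. The counts below are copied from the status table above; the names `STATUS_COUNTS` and `status_shares` are illustrative and do not come from the published CSVs.

```python
# Counts copied from the run-status table on this page.
STATUS_COUNTS = {
    "scored_output": 6,
    "setup_friction": 36,
    "availability_research": 10,
    "blocked_by_public_docs": 5,
    "pending": 1,
}


def status_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each status group's share of all runs as a fraction of 1."""
    total = sum(counts.values())
    return {group: runs / total for group, runs in counts.items()}


total_runs = sum(STATUS_COUNTS.values())
shares = status_shares(STATUS_COUNTS)

# The five groups should add up to the 58 run records reported above,
# and scored_output should match the 6 scored quality runs.
print(total_runs)
print(f"{shares['setup_friction']:.0%} of runs are setup friction")
```

The same check works on a downloaded CSV by counting rows per status group, assuming the file exposes that group as a column.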

Live evidence logs using this data

AI website builder evidence log

Compares a local-service website fixture across AI website builders. Hostinger currently has a scored private-draft output; Squarespace, Durable, and Wix have setup-friction evidence only.

AI meeting follow-up evidence log

Tracks meeting-assistant availability, account/import friction, and upload limits before any final ranking. No meeting-assistant quality scores are published yet.

AI presentation tools evidence log

Tracks presentation-tool availability, signup/security blockers, credit-card trial boundaries, and deck-quality checks before any final ranking. No presentation-tool quality scores are published yet.

AI customer support chatbot evidence log

Tracks chatbot free/private testing boundaries, source-ingestion options, export caveats, and the 50-question FAQ accuracy checks required before any final ranking. No chatbot quality scores are published yet.

AI admin routine evidence log

Tracks prompt-only admin-routine candidates, connector/privacy boundaries, and the messy-inbox checks required before any final ranking. ChatGPT, Perplexity, Duck.ai, Gemini, and Mistral Le Chat / Vibe now have scored prompt-only baselines; Qwen Studio, Grok, Kimi K2, Meta AI, You.com, HuggingChat, and other gated/blocked tools remain unscored until complete raw outputs are saved.

Full methodology

Explains how scenarios, answer keys, rubric weighting, caveats, and last-tested dates shape every recommendation.