Evidence data

Reader-safe data behind our AI workflow tests.

These CSVs summarize the testing database that powers our public evidence logs. They intentionally exclude internal artifact paths, email/password recovery notes, private screenshots, and any account credentials.

Download run index CSV

Download score summary CSV

Tools tracked

59

Tools in the current evidence database, spanning meeting assistants, AI website builders, AI presentation tools, customer-support chatbots, admin-routine tools, email assistants, spreadsheet assistants, SOP/workflow assistants, weekly-status/reporting assistants, performance-review assistants, and invoice/receipt assistants.

Test scenarios

20

Repeatable synthetic fixtures with answer keys or expected-output requirements.

Run records

129

Public-doc checks, signup/import recon, blockers, pending evidence-pack rows, and scored output runs.

Scored quality runs

39

Only complete generated-output evidence is scored; pending and setup-friction rows are not quality scores.

How to read the public data

Setup friction is not a recommendation.

Rows marked as setup friction document what blocked the test environment: signup challenges, OAuth-only login, unclear import routes, or file-picker limits. They do not measure the quality of a tool's generated transcript, summary, website, or export.

Quality scores require output artifacts.

We publish a weighted score only when a run has saved raw output, screenshots or text evidence, a completed run note, and rubric entries. A blank score is deliberate: it marks incomplete evidence, not a withheld endorsement.
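
As a rough illustration of that gate, the sketch below returns a score only when all four evidence requirements are met and otherwise returns a deliberate blank. The field names, the `published_score` helper, and the weighted-mean formula are assumptions made for this example; the actual rubric weighting is described in the full methodology, not here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class RunRecord:
    # Hypothetical field names; the public CSVs may use different columns.
    raw_output_saved: bool = False
    has_screenshot_or_text_evidence: bool = False
    run_note_completed: bool = False
    # criterion -> (score, weight); an empty dict means no rubric entries yet.
    rubric: Dict[str, Tuple[float, float]] = field(default_factory=dict)


def published_score(run: RunRecord) -> Optional[float]:
    """Return a weighted score, or None (a deliberate blank) if evidence is incomplete."""
    complete = (
        run.raw_output_saved
        and run.has_screenshot_or_text_evidence
        and run.run_note_completed
        and bool(run.rubric)
    )
    if not complete:
        return None
    total_weight = sum(weight for _, weight in run.rubric.values())
    if total_weight <= 0:
        return None
    # Assumed formula: a plain weighted mean of the rubric scores.
    weighted_sum = sum(score * weight for score, weight in run.rubric.values())
    return weighted_sum / total_weight


# A complete run gets a score; a run without a finished run note stays blank.
scored = RunRecord(True, True, True, {"accuracy": (4.0, 3.0), "format": (2.0, 1.0)})
blank = RunRecord(True, True, False, {"accuracy": (4.0, 3.0)})
print(published_score(scored), published_score(blank))
```

Returning `None` rather than zero keeps a blank visibly distinct from a low score, which matches how the public data treats unscored rows.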

Private details stay private.

The downloadable CSVs are sanitized from the internal operator database: no credential notes, no reset URLs, no account passwords, and no private screenshot paths. Public playbook pages carry the reader-facing context.
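
One way to picture this sanitizing step is a column filter applied before export. The sketch below is illustrative only: the internal schema is not published, so the column names in `PRIVATE_COLUMNS` and the `sanitize_csv` helper are hypothetical stand-ins.

```python
import csv
import io

# Hypothetical private column names; the real internal schema is not published.
PRIVATE_COLUMNS = {
    "credential_notes",
    "reset_url",
    "account_password",
    "private_screenshot_path",
}


def sanitize_csv(internal_csv: str) -> str:
    """Copy CSV text while dropping every column listed in PRIVATE_COLUMNS."""
    reader = csv.DictReader(io.StringIO(internal_csv))
    public_fields = [
        name for name in (reader.fieldnames or []) if name not in PRIVATE_COLUMNS
    ]
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=public_fields,
        extrasaction="ignore",  # silently skip the private keys in each row
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


internal = "run_id,status_group,account_password\nr1,pending,example-secret\n"
print(sanitize_csv(internal))
```

A deny-list like this is the simplest version to read; an allow-list of known public columns would fail safer if a new private column were ever added.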

Current run-status mix

Status group	Runs	Meaning
scored_output	39	Usable generated output exists and was scored against the scenario rubric.
setup_friction	42	Signup, upload, OAuth/SSO, security-filter, password-handling, or import-path friction was observed before usable fixture output existed.
availability_research	10	Official public docs/pricing/help pages were checked; hands-on output is still pending.
blocked_by_public_docs	5	Official docs made the current upload/private-fixture/no-spend test unsuitable.
pending	33	Waiting for a safe account/tool path or public-doc availability pass before reader-reproducible testing.
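
The counts in the table above can be cross-checked against the headline figures on this page (129 run records, 39 scored quality runs). The sketch below copies the numbers by hand instead of reading the CSV, because the CSV column names are not listed here; `status_shares` is a helper name invented for this example.

```python
# Counts copied from the run-status table on this page.
STATUS_MIX = {
    "scored_output": 39,
    "setup_friction": 42,
    "availability_research": 10,
    "blocked_by_public_docs": 5,
    "pending": 33,
}
TOTAL_RUN_RECORDS = 129    # "Run records" headline figure
SCORED_QUALITY_RUNS = 39   # "Scored quality runs" headline figure


def status_shares(mix):
    """Return each status group's share of all runs, as a percentage rounded to 0.1."""
    total = sum(mix.values())
    return {status: round(100 * count / total, 1) for status, count in mix.items()}


# The table should account for every run record, and only scored_output rows carry scores.
assert sum(STATUS_MIX.values()) == TOTAL_RUN_RECORDS
assert STATUS_MIX["scored_output"] == SCORED_QUALITY_RUNS

for status, share in status_shares(STATUS_MIX).items():
    print(f"{status}: {share}%")
```

On these counts, scored output is about 30.2% of runs and setup friction about 32.6%, so most rows in the run index are not quality scores.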

Live evidence logs using this data

AI productivity paradox setup-friction study

Uses the public evidence-data counts to show where AI tools cost time before they save it (setup friction, prompt echo, and import/export boundaries) and why scores stay blank until output quality can be measured.

AI website builder evidence log

Compares a local-service website fixture across AI website builders. Hostinger currently has a scored private-draft output; Squarespace, Durable, and Wix are setup-friction evidence only.

Free AI website-builder limits guide

Condenses existing website-builder evidence into a reader-facing boundary table for no-signup claims, private drafts, code export/download, free hosted domains, custom domains, and publish gates.

AI meeting follow-up evidence log

Tracks meeting-assistant availability, account/import friction, and upload limits before any final ranking. No meeting-assistant quality scores are published yet.

Free AI meeting-notes limits guide

Condenses the meeting-assistant evidence log into a reader-facing boundary table for free imports, live-meeting requirements, OAuth gates, Notion AI Meeting Notes plan limits, and Teams Premium licensing. No meeting-summary quality scores are published yet.

AI presentation tools evidence log

Tracks presentation-tool availability, signup/security blockers, credit-card trial boundaries, and deck-quality checks before any final ranking. No presentation-tool quality scores are published yet.

Free/no-login AI presentation maker check

Condenses the current presentation-tool setup-friction evidence for readers searching for a free AI PowerPoint or slide generator that works before signup. No deck-quality scores are published yet.

AI customer support chatbot evidence log

Tracks chatbot free/private testing boundaries, source-ingestion options, export caveats, and the 50-question FAQ accuracy checks required before any final ranking. No chatbot quality scores are published yet.

AI admin routine evidence log

Tracks prompt-only admin-routine candidates, connector/privacy boundaries, and the messy-inbox checks required before any final ranking. ChatGPT, Perplexity, Duck.ai, Gemini, and Mistral Le Chat / Vibe now have scored prompt-only baselines; Qwen Studio, Grok, Kimi K2, Meta AI, You.com, HuggingChat, and other gated/blocked tools remain unscored until complete raw outputs are saved.

AI spreadsheet assistant evidence log

Tracks paste-only spreadsheet cleanup, formula, manual-total, and privacy checks before any Excel, Google Sheets, Drive, OneDrive, add-on, or OAuth connector is trusted. ChatGPT, Perplexity, Duck.ai, and Gemini now have scored prompt-only baselines; Microsoft Copilot remains pending until a safe separable answer is captured.

AI SOP/workflow generator evidence log

Tracks paste-only SOP and workflow handoff drafting before any email, accounting, ecommerce, project-management, file-storage, HR, CRM, invoice, refund, access, or automation connector is trusted. ChatGPT and Duck.ai now have scored prompt-only baselines; Gemini and Perplexity remain setup-friction/no-output rows, and Microsoft Copilot remains pending.

AI weekly status report generator evidence log

Tracks paste-only weekly manager-update drafting before any Slack, email, calendar, Shopify, Etsy, accounting, support-desk, CRM, project-management, inventory, invoice, or live action connector is trusted. ChatGPT and Duck.ai now have scored prompt-only baselines; Gemini, Perplexity, and Microsoft Copilot remain pending.

AI performance review generator prompt evidence log

Tracks paste-only self-review, manager-feedback, peer-feedback, and development-plan drafting before any HRIS, ATS, payroll, performance-management, Slack, email, calendar, Google Workspace, Microsoft 365, or workplace connector is trusted. Duck.ai and ChatGPT now have scored prompt-only baselines; Gemini, Perplexity, and Microsoft Copilot remain pending.

AI invoice and receipt processing evidence log

Tracks paste-only small-business expense-register drafting before any QuickBooks, Xero, FreshBooks, Wave, Expensify, Ramp, Brex, bank/card feed, payment, email, OCR upload, payroll, or tax connector is trusted. ChatGPT and Duck.ai now have scored prompt-only baselines; Gemini, Perplexity, and Microsoft Copilot remain pending.

AI timesheet reconstruction evidence log

Tracks paste-only forgotten-timesheet reconstruction before any calendar, mailbox, time-tracker, payroll, HR, legal-billing, invoicing, or monitoring connector is trusted. ChatGPT and Duck.ai now have scored prompt-only baselines; Gemini, Perplexity, and Microsoft Copilot remain pending.

AI form filler evidence log

Tracks paste-only routine business-form field drafting before any browser extension, live form, upload, signature, payment, regulated form, or auto-submit flow is trusted. Duck.ai and ChatGPT now have scored prompt-only baselines; Gemini, Perplexity, and Microsoft Copilot remain pending.

AI report generator evidence log

Tracks paste-only manager-report drafting from synthetic spreadsheet exports and source notes before any Docs, Word, Excel, Sheets, Drive, OneDrive, BI, email, CRM, accounting, legal, tax, HR, medical, customer, supplier, or workspace connector is trusted. ChatGPT and Duck.ai now have scored prompt-only baselines; Gemini, Perplexity, and Microsoft Copilot remain pending.

AI calendar assistant evidence log

Tracks paste-only week planning and draft scheduling replies before any Google Calendar, Outlook, Gmail, Microsoft 365, reminders, task systems, CRM, project-management, invites, messages, or live scheduling connector is trusted. ChatGPT and Duck.ai now have scored prompt-only baselines; Gemini, Perplexity, and Microsoft Copilot remain pending.

AI PDF/document summarizer evidence log

Tracks paste-only cited summaries, decision tables, Q&A, caveats, and review-only replies from a synthetic page-numbered work-document packet before any upload, document editor, workspace connector, approval, signature, email send, or customer contact is trusted. ChatGPT and Duck.ai now have scored prompt-only baselines; Gemini, Perplexity, and Microsoft Copilot remain pending.

AI voice-notes/work-capture evidence log

Tracks paste-only task extraction, uncertainty labels, review-only replies, and no-live-action boundaries from synthetic voice-note transcript snippets before any real recording, phone contact, calendar, meeting bot, CRM, email, note-app, task connector, or customer/workplace action is trusted. ChatGPT and Duck.ai now have scored prompt-only baselines; Gemini, Perplexity, and Microsoft Copilot remain pending.

No-login AI tools for work comparison

Condenses the scored admin-routine evidence into a reader-first comparison while keeping setup-friction rows separate from output-quality scores.

Full methodology

Explains how scenarios, answer keys, rubric weighting, caveats, and last-tested dates shape every recommendation.