59 meeting assistants, AI website builders, AI presentation tools, customer-support chatbots, and admin-routine tools in the current evidence database.
These CSVs summarize the testing database that powers our public evidence logs. They intentionally exclude internal artifact paths, email/password recovery notes, private screenshots, and any account credentials.
Repeatable synthetic fixtures with answer keys or expected-output requirements.
Public-doc checks, signup/import recon, blockers, pending evidence-pack rows, and scored output runs.
Only complete generated-output evidence is scored; pending and setup-friction rows carry no quality score.
Rows marked as setup friction document what blocked the test environment: signup challenges, OAuth-only login, unclear import routes, or file-picker limits. They do not measure the quality of a tool's generated transcript, summary, website, or export.
We publish a weighted score only when a run has all four of the following: saved raw output, screenshots or text evidence, a completed run note, and rubric entries. Blank scores are deliberate, not missing endorsements.
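
The publishing rule above can be sketched as a simple gate. This is a minimal illustration, not the actual pipeline: the `Run` fields, the `(score, weight)` rubric shape, and the weighted-average formula are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Run:
    # Hypothetical field names; the real database schema is not published.
    raw_output_saved: bool = False
    has_screenshots: bool = False
    has_text_evidence: bool = False
    run_note_completed: bool = False
    # criterion name -> (score, weight); this shape is an assumption
    rubric_entries: Dict[str, Tuple[float, float]] = field(default_factory=dict)


def is_scorable(run: Run) -> bool:
    """True only when every evidence requirement is met."""
    return (
        run.raw_output_saved
        and (run.has_screenshots or run.has_text_evidence)
        and run.run_note_completed
        and bool(run.rubric_entries)
    )


def published_score(run: Run) -> Optional[float]:
    """Return a weighted score, or None (a deliberate blank) if evidence is incomplete."""
    if not is_scorable(run):
        return None
    total_weight = sum(weight for _, weight in run.rubric_entries.values())
    if total_weight <= 0:
        return None
    weighted = sum(score * weight for score, weight in run.rubric_entries.values())
    return weighted / total_weight
```

Returning `None` rather than `0` keeps a blank score distinguishable from a genuinely low one, which matches the intent that blanks are deliberate.
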
The downloadable CSVs are sanitized from the internal operator database: no credential notes, no reset URLs, no account passwords, and no private screenshot paths. Public playbook pages carry the reader-facing context.
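
One way to picture the sanitizing step is a column denylist applied before export. The sketch below is illustrative only: the column names in `SENSITIVE_COLUMNS` are hypothetical stand-ins for the excluded fields, and the real export process may work differently.

```python
import csv
import io

# Hypothetical column names standing in for the excluded internal fields.
SENSITIVE_COLUMNS = {
    "credential_notes",
    "reset_url",
    "account_password",
    "screenshot_path",
}


def sanitize_csv(source_text: str) -> str:
    """Return CSV text with every denylisted column removed."""
    reader = csv.DictReader(io.StringIO(source_text))
    kept = [name for name in (reader.fieldnames or []) if name not in SENSITIVE_COLUMNS]

    out = io.StringIO()
    # extrasaction="ignore" silently drops the keys that are not in `kept`.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

A denylist is the simplest form to show here; an allowlist of known-safe columns would be stricter, because a newly added internal column would then be excluded by default.
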
| Status group | Runs | Meaning |
|---|---|---|
| scored_output | 6 | Usable generated output exists and was scored against the scenario rubric. |
| setup_friction | 36 | Signup, upload, OAuth/SSO, security-filter, password-handling, or import-path friction was observed before usable fixture output existed. |
| availability_research | 10 | Official public docs/pricing/help pages were checked; hands-on output is still pending. |
| blocked_by_public_docs | 5 | Official docs made the current upload/private-fixture/no-spend test unsuitable. |
| pending | 1 | Waiting for a safe account/tool path or public-doc availability pass before reader-reproducible testing. |
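
To see how small the scored slice is, the run counts in the table can be tallied. The snippet below copies the counts from the table; the dictionary and function names are illustrative.

```python
# Run counts copied from the status-group table.
STATUS_RUNS = {
    "scored_output": 6,
    "setup_friction": 36,
    "availability_research": 10,
    "blocked_by_public_docs": 5,
    "pending": 1,
}


def scored_share(status_runs):
    """Return (scored runs, total runs, scored fraction of all runs)."""
    total = sum(status_runs.values())
    scored = status_runs.get("scored_output", 0)
    share = scored / total if total else 0.0
    return scored, total, share


scored, total, share = scored_share(STATUS_RUNS)
print(f"{scored} of {total} runs are scored ({share:.1%})")
```

Only the `scored_output` group feeds quality scores; the other groups record availability, friction, or blockers.
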
Compares a local-service website fixture across AI website builders. Hostinger currently has a scored private-draft output; Squarespace, Durable, and Wix are setup-friction evidence only.
Tracks meeting-assistant availability, account/import friction, and upload limits before any final ranking. No meeting-assistant quality scores are published yet.
Tracks presentation-tool availability, signup/security blockers, credit-card trial boundaries, and deck-quality checks before any final ranking. No presentation-tool quality scores are published yet.
Tracks chatbot free/private testing boundaries, source-ingestion options, export caveats, and the 50-question FAQ accuracy checks required before any final ranking. No chatbot quality scores are published yet.
Tracks prompt-only admin-routine candidates, connector/privacy boundaries, and the messy-inbox checks required before any final ranking. ChatGPT, Perplexity, Duck.ai, Gemini, and Mistral Le Chat / Vibe now have scored prompt-only baselines; Qwen Studio, Grok, Kimi K2, Meta AI, You.com, HuggingChat, and other gated/blocked tools remain unscored until complete raw outputs are saved.
Explains how scenarios, answer keys, rubric weighting, caveats, and last-tested dates shape every recommendation.