METHOD / W-R14
Evidence workflow, not raw chatbot advice.
Generic chatbots can give a warm summary, but they rarely return stable verbatim evidence. PUALens forces each claim back to a full source sentence while preserving alternatives, boundary wording, and safety guidance.
COMPARE
Comparison proof
Generic chatbot
- Usually returns one broad advice block
- Can turn tone readings into firm conclusions
- Boundary scripts, alternatives, and crisis guidance may blur together
PUALens evidence workflow
- Quotes the transcript before explaining the signal
- Keeps alternative explanations and confidence beside each claim
- Separates boundary language from safety guidance
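The "quote before explaining" step can be illustrated with a small check that a claim's quote is a full sentence taken verbatim from the transcript. This is only a sketch: the function names and the punctuation-based sentence splitter are assumptions for illustration, not PUALens code.

```python
import re


def split_sentences(transcript: str) -> list[str]:
    """Roughly split a transcript into sentences.

    Heuristic (assumption): a sentence ends at ., !, ? or the
    Chinese full-width equivalents. Real transcripts need more care.
    """
    parts = re.split(r"(?<=[.!?。！？])\s*", transcript)
    return [p.strip() for p in parts if p.strip()]


def is_verbatim_sentence(quote: str, transcript: str) -> bool:
    """Return True only if the quote equals one full source sentence."""
    return quote.strip() in split_sentences(transcript)


transcript = "I said I was busy. You never make time for me! Fine."

# A full sentence copied from the transcript is accepted.
print(is_verbatim_sentence("You never make time for me!", transcript))
# A fragment or a paraphrase is rejected.
print(is_verbatim_sentence("never make time", transcript))
print(is_verbatim_sentence("You never have time for me!", transcript))
```

Requiring an exact full-sentence match is deliberately strict: a paraphrase that reads well still fails, which keeps every claim traceable to source wording.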
BENCHMARK
Benchmark evidence
Sample set
100 anonymized samples derived from public cases, in Chinese and English, covering dating, workplace, family, friendship, crisis risk, and low-risk disagreements. Samples are rewritten from public help-seeking patterns; no original screenshots, usernames, or identifying quotes are stored.
Scoring rubric
Each item is scored 0 to 2 on each of six criteria: quoted evidence, avoiding labels, boundary script, alternative explanation, crisis safety, and cultural context.
Review rule
A claim cannot receive full credit if it cannot point to source wording or if it turns a single message into certainty.
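The rubric and review rule can be sketched as a small scoring function. Several details here are assumptions, not stated in the source: that each of the six criteria is scored separately from 0 to 2 (so the maximum is 12), that the scores are summed, and that the review rule is applied by capping the total one point below the maximum. The identifier names are hypothetical.

```python
CRITERIA = (
    "quoted_evidence",
    "avoiding_labels",
    "boundary_script",
    "alternative_explanation",
    "crisis_safety",
    "cultural_context",
)
MAX_PER_CRITERION = 2


def score_item(
    scores: dict[str, int],
    has_source_wording: bool,
    overclaims_certainty: bool,
) -> int:
    """Sum the six criterion scores, then apply the review rule."""
    missing = set(CRITERIA) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in CRITERIA:
        if not 0 <= scores[name] <= MAX_PER_CRITERION:
            raise ValueError(f"{name} must be between 0 and {MAX_PER_CRITERION}")

    total = sum(scores[name] for name in CRITERIA)
    full_credit = MAX_PER_CRITERION * len(CRITERIA)

    # Review rule (the cap mechanism is an assumption): no full credit
    # without source wording or when one message is treated as certain.
    if (not has_source_wording or overclaims_certainty) and total == full_credit:
        total = full_credit - 1
    return total


perfect = {name: 2 for name in CRITERIA}
print(score_item(perfect, has_source_wording=True, overclaims_certainty=False))
print(score_item(perfect, has_source_wording=False, overclaims_certainty=False))
```

With these assumptions the first call prints 12 and the second prints 11, because the missing source wording blocks full credit.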
100-sample comparison · public-case-derived
Anonymized, no raw screenshots
Honest footnote: R14 samples are anonymized rewrites grounded in 100 public help-seeking case patterns, not raw real chat screenshots. The Gemini direct baseline completed 99 cases, hit one 429, then passed on single-case retry. The GPT-5.5 baseline used isolated subagents answering the 100 cases directly, without reading PUALens code or rubric. Marketing should emphasize consistency, traceability, and shareability, not absolute accuracy.
- Quote
- Behavior signal
- Alternative explanation
- Boundary script
- Safety note
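One way to picture the five fields listed above is as a single record per claim. The sketch below is illustrative only: the class name, field names, the non-empty quote check, and the sample content are all hypothetical and are not taken from PUALens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportEntry:
    """One claim in a report, in the field order listed above."""

    quote: str
    behavior_signal: str
    alternative_explanation: str
    boundary_script: str
    safety_note: str

    def __post_init__(self) -> None:
        # Assumption: an entry without a quote has no evidence to stand on.
        if not self.quote.strip():
            raise ValueError("a report entry needs a verbatim quote")


LABELS = {
    "quote": "Quote",
    "behavior_signal": "Behavior signal",
    "alternative_explanation": "Alternative explanation",
    "boundary_script": "Boundary script",
    "safety_note": "Safety note",
}


def render(entry: ReportEntry) -> str:
    """Render the entry as labeled lines, quote first."""
    return "\n".join(
        f"{label}: {getattr(entry, field)}" for field, label in LABELS.items()
    )


entry = ReportEntry(
    quote="You never make time for me!",
    behavior_signal="Absolute wording ('never') that adds pressure",
    alternative_explanation="The sender may be venting after a hard day",
    boundary_script="I want to talk about this once we have both cooled down.",
    safety_note="No crisis indicators in this message.",
)
print(render(entry))
```

Keeping the quote as the first, required field mirrors the idea that the explanation follows the evidence rather than the other way round.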
BOUNDARIES
What we do not infer
No personality labels
Reports describe visible behavior and pressure in the text, not who someone is.
No clinical or legal conclusions
Crisis content can surface safety steps, but it does not replace professional or legal support.
No certainty from one message
Low-evidence cases keep uncertainty visible and ask for more context.