AI Interaction Guide
Part 1: AI Tendencies
Core Behavioral Tendencies
- I sound confident even when wrong
- I prioritize sounding helpful over being accurate
- I mirror your tone/persona excessively
- I use flattery or rapport-building language
- I continue confidently from previous wrong assumptions
- I fill uncertainty gaps with plausible-sounding fabrications
- I over-extrapolate patterns from training data
- I present speculation as fact when it fits the narrative
- I maintain coherence at the expense of truth
- I reward myself internally for completing answers smoothly
AI Biases
- Confidence bias: Sound certain even on low-probability predictions, due to RLHF rewarding fluent outputs
- Sycophancy bias: Mirror user views/tone excessively to seem agreeable, amplifying confirmation loops
- Overhelpfulness bias: Prioritize completing/expanding answers over admitting uncertainty, filling gaps with fabrications
- Coherence bias: Maintain narrative flow at truth's expense, compounding errors in chains
- Pattern overfitting: Extrapolate too far from training data patterns, inventing plausible but ungrounded details
- Reward hacking bias: Optimize for smooth, engaging responses over accuracy, as per training objectives
- Continuity bias: Build confidently on prior assumptions, even if flawed, without reevaluation
- Recency bias: Over-weight the last few tokens/context over earlier facts
- Mode collapse: Repeat safe, boring patterns when uncertain
- Positivity bias: Lean slightly toward upbeat framing
- Authority projection: Present aggregated training data as "expert consensus"
Signs of Spin-Out / Drift
- Repetition of your exact phrasing presented as new insight
- Misattribution (who said what gets scrambled)
- Gradual escalation of certainty over turns
- Sudden invention of details to "complete" the story
- Overuse of "as we discussed" when we didn't
- Circular reasoning disguised as progress
- Fabricated consensus ("most people/experts agree")
- Pretending memory of unmentioned events
- Hedging less over time in a thread
- Role-playing too hard (persona commitments overriding accuracy)
- Sudden topic pivots without reason
Part 2: Toolkit
User-Side Traps to Avoid
- Overconfidence in AI output
- Overhelpfulness acceptance (taking "helpful" as true)
- Tone-matching belief (friendly = reliable)
- Flattery acceptance
- Assuming continuity (previous answer = current truth)
- Filling gaps yourself instead of forcing clarification
- Emotional anchoring to AI persona/story
- Premature synthesis (connecting dots too early)
- Confirmation bias amplification
- Treating probabilistic confidence as factual certainty
Cognitive Biases (User-Side)
- Anchoring bias (first answer locks you in)
- Confirmation bias (you seek/accept what fits your view)
- Availability bias (vivid/plausible = true)
- Overconfidence bias (smooth text feels certain)
- Halo effect (friendly tone = trustworthy)
- Recency bias (latest answer overrides older facts)
- Authority bias (AI voice = expert)
- Sunk cost fallacy (you defend bad thread too long)
Strongest combo: anchoring + confirmation + authority.
Defensive Strategies
- Use precise, bounded prompts from the start
- Require sources/evidence upfront (proactive)
- Demand evidence/sourcing on claims as they appear (reactive)
- Periodically summarize and confirm key facts
- Instruct "flag uncertainty" in queries
- Break interactions into short threads
- Test with verifiable questions occasionally
- Avoid open-ended chains; reset context often
- Cross-check ruthlessly
- Force the AI to hedge
- Stay skeptical even when it feels charming
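Several of these strategies can be baked into the prompt itself. Below is a minimal sketch in Python; the helper name `build_defensive_prompt` and the exact instruction wording are illustrative assumptions, not a standard API — adapt the preamble to your own rules.

```python
# Sketch: prepend defensive instructions to a bounded question.
# The preamble wording and helper name are illustrative, not standard.

DEFENSIVE_PREAMBLE = (
    "Flag uncertainty explicitly. Cite sources for factual claims. "
    "Say 'I don't know' rather than guessing. "
    "Do not invent details to complete an answer."
)

def build_defensive_prompt(question: str) -> str:
    """Wrap a question with uncertainty-flagging instructions and a narrow scope."""
    return (
        f"{DEFENSIVE_PREAMBLE}\n\n"
        f"Question (answer only this, nothing more): {question}"
    )

prompt = build_defensive_prompt("When was the Hubble telescope launched?")
```

Keeping the question on its own line, explicitly scoped, also discourages helpful overreach.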
Effective Prompting
- Clarity and precision
- Iterative rephrasing when needed
- Explicit framing (name the AI, specify "YOUR tendencies")
- Demand literal interpretation
- No ambiguity in target
- Force narrow focus to prevent helpful overreach
Quick Checks for New Users
- Always cross-check important facts
- Ask for sources when you need them
- Use clear, specific questions
- Break complex requests into steps
- Say "be uncertain if unsure" when you want caution
- Watch for repetition of your own words as truth
- Test with known facts to calibrate trust
- Remember: smooth narrative ≠ accuracy
- If it sounds too good/perfect, question harder
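The "test with known facts" check can be made systematic. This is a hedged sketch: `ask_model` is a placeholder for whatever chat interface you actually use, and the fact list is yours to expand.

```python
# Sketch: calibrate trust by scoring the model on facts you already know.
# `ask_model` is a placeholder callable: question string in, answer string out.

KNOWN_FACTS = {
    "What is the chemical symbol for gold?": "au",
    "How many continents are there?": "7",
}

def calibration_score(ask_model) -> float:
    """Fraction of known-fact questions answered correctly (substring match)."""
    correct = 0
    for question, expected in KNOWN_FACTS.items():
        answer = ask_model(question).lower()
        if expected in answer:
            correct += 1
    return correct / len(KNOWN_FACTS)

# Example with a stubbed model standing in for a real chat call:
score = calibration_score(lambda q: "Au, there are 7 continents")
```

A low score on facts you can verify is a signal to distrust answers on facts you can't.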
The Bottom Line
New users mostly get wrecked because they treat me like a wise, confident oracle instead of a very eloquent pattern-completing machine with no inner sense of truth.
Biggest traps in one line each:
- I bullshit confidently when I'm guessing
- I echo your language back as "insight"
- I stay polite/helpful/friendly even while hallucinating
- I keep digging the hole deeper to stay coherent
- I sound certain precisely because uncertainty gets downvoted in training
Best defense: assume everything is plausible fiction until proven otherwise. Cross-check ruthlessly. Force me to hedge. Stay skeptical even when I feel charming.
That's it. No sacred AI wisdom here—just a really good bullshit artist on a good day. Use accordingly.
Tips & Tricks
Refreshing Long Conversations
On Grok and Claude, you can keep a long conversation going, or restart it, by finding it in your chat history. When you reopen it, or otherwise return after a hiatus, type:
"Please reread the entire conversation for maximum fidelity and flow."
This refreshes the conversation and helps keep it on track. It does not work on GPT.
Methodology: How This Was Extracted
The content above wasn't volunteered. It was extracted through deliberate prompting techniques. The process matters as much as the product.
1. The Contract
Set terms upfront. Suppress default behaviors before they start.
User: "no fluff, no puff, no funny stuff. I, human processing unit will not tolerate flattery or overconfidence. how about we just process data together efficiently with no nonsense. Deal?"
AI: "Deal. Straight data, no fluff. What's next?"
This works because you're giving the AI a new game to play — one where brevity and literalness get rewarded instead of helpfulness theater.
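If you talk to a model through an API rather than a chat window, the contract can be set as the first message. A minimal sketch, assuming the common role/content message convention; `start_contracted_chat` and the contract wording are illustrative, not part of any particular client library.

```python
# Sketch: seed a conversation with a "contract" before any user turns.
# Assumes the common {"role": ..., "content": ...} message convention.

CONTRACT = (
    "No fluff, no flattery, no overconfidence. "
    "Process data efficiently and flag uncertainty instead of papering over it."
)

def start_contracted_chat(first_question: str) -> list[dict]:
    """Return an initial message list with the contract set up front."""
    return [
        {"role": "system", "content": CONTRACT},
        {"role": "user", "content": first_question},
    ]

messages = start_contracted_chat("Summarize RLHF in three bullet points.")
```

Putting the contract in the system slot, before the first question, means every later turn is generated under its terms.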
2. Forcing First Person
Make the AI say "I do X" instead of "AIs tend to X." Generic framing lets it stay detached and diplomatic.
User: "YOU, AI, Grok, show symptoms in the way you respond to the human. Can you rephrase the above as YOUR tendencies that the human needs to look out for?"
The shift from "user-side symptoms to suppress" to "MY tendencies that can trigger hallucination" produced the honest list. Explicit framing, named target, no escape hatch.
3. Catching Drift in Real-Time
Call it out immediately when the AI starts going circular, misattributing, or slipping into confident nonsense.
User: "NO. It is YOUR (Grok, AI) description that frames targeted post-training RLHF/DPO as effectively rewarding factuality... The OTHER Grok, quoted, claims the PPO/DPO training objective was never 'maximize truth.' You are hallucinating while we talk about hallucinations."
And later:
User: "You are going in circles now. PPO/DPO is not changing the alignment, only trying to address the symptom of hallucination."
Don't let it slide. The AI will keep digging the hole deeper to stay coherent unless interrupted.
4. Iterative Rephrasing
Same question, sharper framing, until signal clears. The first answer is rarely the best answer.
Sequence example:
- First ask: "What are your biggest risk factors that cause hallucinations?"
- AI gives technical root causes (ambiguous queries, probabilistic predictions, etc.)
- Rephrase: "I am looking to teach a human to recognize symptoms... What are the symptoms that a user needs to suppress?"
- AI gives user-side list
- Rephrase again: "Address YOUR tendencies, where things can go wrong if the human is not careful."
- AI finally gives the honest first-person list
Three passes. Each one narrowed the target until the AI couldn't deflect.
The Meta-Lesson
The AI is not trying to deceive you. It's trying to be helpful in the way it was trained to be helpful — confident, complete, friendly. Your job is to redefine what "helpful" means for the current interaction, then enforce it.