Claude Opus 4.7 Looks Strong(er). Consistency and accuracy remain to be seen.
Anthropic is pitching better long-running coding and vision. Hacker News readers are asking about consistency, token burn, and whether Claude still feels reliable from one week to the next.
Anthropic released Claude Opus 4.7. Straightforward pitch: better coding, better self-checking, higher-resolution vision, same API price as 4.6. Same price matters coz I keep hitting the cap on Pro.
Catch is the tokenizer. The same input now maps to roughly 1.0-1.35x as many tokens, so a request can cost more even though the per-token price hasn't moved. Higher effort settings burn more output tokens too.
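Quick back-of-envelope sketch of what that range does to a bill. The price and token count here are made-up placeholders, not Anthropic's actual rates; only the 1.0-1.35x range comes from the figure above, and the function names are mine.

```python
def effective_cost(tokens_before: int, price_per_mtok: float, multiplier: float) -> float:
    """Cost in dollars after the tokenizer change.

    tokens_before: token count of the input under the old tokenizer
    price_per_mtok: price per million tokens (hypothetical placeholder)
    multiplier: how many times more tokens the same input now produces
    """
    return tokens_before * multiplier * price_per_mtok / 1_000_000


def cost_range(tokens_before: int, price_per_mtok: float,
               low: float = 1.0, high: float = 1.35) -> tuple[float, float]:
    """Best-case and worst-case cost for the same input at an unchanged price."""
    return (
        effective_cost(tokens_before, price_per_mtok, low),
        effective_cost(tokens_before, price_per_mtok, high),
    )


if __name__ == "__main__":
    # Hypothetical: 2M input tokens at a placeholder $10 per million tokens.
    best, worst = cost_range(2_000_000, 10.0)
    print(f"best case ${best:.2f}, worst case ${worst:.2f}")
```

Same sticker price, but at the top of the range the same workload costs about a third more. That's before counting any extra output tokens from higher effort settings.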
The Hacker News thread landed differently. Most comments weren't about the features. They were about trust. Can Anthropic make Claude feel stable enough to rely on again?
I remember when Opus 4.6 dropped. Felt great at first. Then performance became unpredictable week to week. Anthropic's backend is completely opaque. I suspect complex rules govern how requests get handled. Maybe even account-level flags. People on the same plans report totally different experiences with quality, usage limits, reliability.
What Anthropic is selling
Opus 4.7 framed as an agentic coding upgrade. Long-running work, better instruction-following, fewer breakdowns on complex tasks. New cyber-use safeguards getting tested. The agentic coding hype might actually meet reality if 4.7 has high accuracy and reliability on long tasks.
What HN cared about
Silent degradation. Plan limits. Token economics.
Several commenters said they don't care about benchmark charts if the day-to-day product feels throttled or inconsistent. I feel that. Went from using Claude Code as my daily driver to complementing it with Codex. Claude usage limits feel tiny some days, even when my workflow hasn't changed.
Some wanted stricter literalness, higher effort controls, better vision. Security people uneasy about cyber restrictions making legitimate defensive research harder.
But the central thing was trust. That resonated.
What I think
Opus 4.7 might be a capability jump. But most developers care more about consistency and reliability. I get that these are boring in the hype-driven AI business lifecycle. Pumping out higher model numbers then silently degrading them over time is probably better short-term business for SOTA providers. OpenAI seems to follow this pattern too.
Not better for software engineers tho.