In the relentless pursuit of squeezing more intelligence from large language models, researchers have often turned to increasingly complex strategies. Chain-of-thought reasoning, emotional framing, and multi-shot prompting have become standard tools in the quest for better answers. However, a recent study by Google Research suggests that sometimes, the simplest solutions are the most effective.
The research reveals that repeating a prompt—literally duplicating it before sending it to the model—can dramatically improve accuracy on tasks that don't require deep reasoning. This technique, dubbed 'prompt repetition,' exploits a blind spot created by causal attention: transformer decoders read tokens strictly left to right, so an early token can never attend to details that appear later in the prompt, and critical context can be missed in a single pass.
When a prompt is repeated, every token in the second copy can attend to the entire first copy, effectively allowing the model to 'look back' and resolve ambiguities that a single left-to-right pass would miss. This gives the prompt an effectively bidirectional view of itself, eliminating errors the model would otherwise make without full context. The result? Up to a 76% improvement in accuracy on non-reasoning benchmarks, with no regressions recorded across 47 tests.
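The mechanism described above is simple to apply at the application layer. A minimal sketch of what such a wrapper might look like (the function name and separator are illustrative assumptions, not the paper's implementation):

```python
def repeat_prompt(prompt: str, separator: str = "\n\n") -> str:
    """Duplicate a prompt so the second copy can attend to the first.

    Under causal (left-to-right) attention, tokens in the first copy
    cannot see tokens that come after them. Tokens in the second copy,
    however, can attend to the entire first copy, which is the effect
    prompt repetition relies on.
    """
    return f"{prompt}{separator}{prompt}"
```

The repeated string is then sent to the model as a single input in place of the original prompt; no change to decoding or output handling is required.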
The implications for enterprise AI systems are substantial. Unlike traditional optimizations, prompt repetition is close to a 'free lunch': it adds no output tokens and no meaningful latency, since the duplicated input is processed in parallel during prefill, leaving generation speed unchanged (input token counts do double, but prefill is cheap relative to generation). For lightweight models like Gemini 2.0 Flash Lite, this can mean jumping from 21.33% accuracy to nearly perfect retrieval (97.33%) on tasks like identifying the 25th name in a list of 50.
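To make the retrieval task concrete, here is a hedged sketch of how such a list-lookup prompt could be constructed with and without repetition. The name list and question wording are invented for illustration; the benchmark's actual data and phrasing are not public in this article:

```python
# Hypothetical reconstruction of a "find the Nth item" retrieval prompt.
names = [f"Name{i}" for i in range(1, 51)]  # placeholder list of 50 names
question = "What is the 25th name in the list?"

# Single-pass prompt: by the time the model reads the question, it has
# already consumed the list under causal attention and cannot revisit it.
prompt = "Names: " + ", ".join(names) + "\n" + question

# Repeated prompt: the second copy of the list and question can attend
# back to the full first copy, so the relevant item is in context.
repeated = prompt + "\n\n" + prompt
```

The only change between the two variants is the duplication; everything else about the request stays the same.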
However, this isn’t a universal fix. When combined with chain-of-thought reasoning, the gains vanish, suggesting that reasoning models already perform a form of internal repetition. For applications where speed and direct answers matter more than step-by-step derivation, prompt repetition could become a standard, invisible optimization in AI pipelines.
The technique also introduces new considerations for security. If repeating a prompt clarifies benign instructions, it may do the same for malicious ones. Security teams will need to adapt red-teaming protocols to account for 'repeated injection' attacks, while simultaneously exploring how repeating system prompts could reinforce safety guardrails.
For now, the takeaway is clear: when faced with a model that struggles to retrieve details or follow instructions, the solution might not be more sophisticated prompting. Sometimes, saying it again—twice—is all that’s needed.
