The recency bias example with the S&P 500 really resonates here - it perfectly illustrates why blindly adding context can actually hurt reasoning. That overconfident YES when the model should've stuck with mean-reversion logic is such a powerful failure mode to highlight. Makes me wonder if there's a way to weight historical context so it competes more evenly with fresh headlines?
What was the prompt used for these models, though? That probably played the biggest role in all of these predictions; if the models were asked to imitate an expert and be skeptical, the way a human would be after watching the news for a while, the results could be very different.
In the paper, the prompt structure was: the model had to list reasons for and against, aggregate those reasons, make a decision, output a probability, and then refine it. While this is definitely a good prompt, I'm curious whether more aggressive prompt templates, like actor-critic or debate-style setups, would improve the performance of LLMs with news.
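For concreteness, here's a rough sketch of that staged structure as a template (my paraphrase, assuming the stages run in this order; the exact wording is in the paper's appendix):

```python
# Paraphrased sketch of the staged forecasting prompt described above.
# The exact wording is in the paper's appendix; this only mirrors the stages.
FORECAST_PROMPT = """\
Question: {question}
News context: {context}

1. List the strongest reasons the answer is YES.
2. List the strongest reasons the answer is NO.
3. Weigh and aggregate the reasons above.
4. State your decision: YES or NO.
5. Output an initial probability between 0 and 1.
6. Re-examine your reasoning and output a refined final probability.

End with exactly one line: "Probability: <p>"
"""
```

An actor-critic or debate variant would split stages 1-3 across separate model calls instead of a single pass.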
I wonder if there's a compute constraint in experimenting with different prompts. There was no mention of their prompt-building methodology in the paper.
Actually, there were no compute constraints, since we were using OpenRouter for this.
Our prompts are a modification of Manifold's trading-bot prompt, with some things added/removed to fit our use case.
You can try out different prompts for sure, but running them on 200 questions, each with a high token budget, is expensive, and it also leads to the model forgetting the output formatting. We did run ablations nudging the model towards more aggressive/passive predictions via the prompt, but the results were not decisive enough to pin down what exactly was going wrong.
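If you do want to experiment, here's a minimal sketch of running a prompt variant across models via OpenRouter, which exposes an OpenAI-compatible API (the model IDs, prompt text, and token budget below are illustrative, not our exact setup):

```python
# Minimal sketch: run one prompt variant across several models via OpenRouter.
# OpenRouter is OpenAI-compatible, so the standard openai client works against it.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

# Illustrative prompt variant, not the paper's exact template.
PROMPT = (
    "List reasons for YES, then reasons for NO, aggregate them, decide, "
    "give a probability, then refine it.\n"
    "Question: {question}\nNews: {context}"
)

def forecast(model: str, question: str, context: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(question=question, context=context)}],
        max_tokens=2048,  # high per-question budgets are what make 200 questions expensive
    )
    return resp.choices[0].message.content

for model in ["openai/gpt-4o", "anthropic/claude-3.5-sonnet"]:
    print(model, forecast(model, "Will the S&P 500 close higher this month?", "..."))
```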
Would try this out for sure as my schedule clears up!
Hey, sorry for the late update: all our prompts are listed in the appendix of the paper.
Thanks