Discussion about this post

Neural Foundry

Incredible that one attempt actually got accepted at Agents4Science 2025. The six-agent pipeline architecture is smart, but the 75% failure rate feels kinda telling about where we actually are with autonomous research. I tried running similar experiments last year and hit the same issue with hypothesis generation being too derivative, which probably explains why three attempts failed here.

log x

Try Gemini 3 Pro and Opus 4.5.

