Can LLMs find bugs in large codebases?
We bet your LLM can find a bug in a snippet of code. But what about 25 pages of code? We propose a new 'needle in a haystack' analysis, 'Bug in the Code Stack', that tests how well LLMs can find bugs in large codebases.
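To make the idea concrete, here is a minimal sketch of how such a test case could be assembled. It is our illustration under stated assumptions, not the benchmark's actual implementation: the helper names (`make_function`, `build_haystack`), the generated filler functions, and the choice of a missing colon as the bug are all hypothetical. The sketch stacks many small, valid Python functions into one long source string and plants a single syntax error at a chosen relative depth.

```python
import random


def make_function(i: int, rng: random.Random) -> list[str]:
    """Return the lines of one small, valid filler function (hypothetical helper)."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return [
        f"def func_{i}(x):",
        f"    y = x * {a} + {b}",
        "    return y",
        "",
    ]


def build_haystack(n_functions: int, depth: float, seed: int = 0) -> tuple[str, int]:
    """Stack n_functions filler functions and plant one bug.

    depth is a fraction in [0, 1]: 0 puts the bug in the first function,
    1 puts it in the last. Returns the source text and the 1-indexed
    line number of the buggy line.
    """
    rng = random.Random(seed)
    target = min(int(depth * n_functions), n_functions - 1)
    lines: list[str] = []
    bug_line = -1
    for i in range(n_functions):
        block = make_function(i, rng)
        if i == target:
            # Assumed bug type for this sketch: drop the colon from the def line.
            block[0] = block[0].rstrip(":")
            bug_line = len(lines) + 1
        lines.extend(block)
    return "\n".join(lines), bug_line


# Build a 10-function stack with the bug planted halfway down.
source, bug_line = build_haystack(n_functions=10, depth=0.5)
```

An evaluation could then send `source` to a model, ask for the line number of the bug, and compare the answer against `bug_line` while varying `n_functions` and `depth`.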