What exactly would you checksum? All intermediate states that weren't committed, plus every test run's parameters and outputs? And if so, how would you use that record to detect an LLM? Current agentic LLM tools also make several edits, run tests against what they're writing, and then keep editing until the tests pass.
So the presence of test runs and intermediate states isn't really indicative of a human writing code. I'm skeptical that distinguishing the steps a human would take from the steps an LLM would take is any easier or quicker than distinguishing based on the end result.
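To make the objection concrete: even if you did checksum the whole edit history, the result is just a digest over snapshots and test logs, and a human session and an agentic-LLM session both produce that same kind of record. Here is a minimal sketch of what such a fingerprint might compute; the function name and input shapes are hypothetical, not any real tool's API.

```python
import hashlib

def session_fingerprint(edit_snapshots, test_runs):
    """Checksum a coding session's intermediate states and test runs.

    edit_snapshots: list of bytes, the working-tree contents at each
        uncommitted save point.
    test_runs: list of (command, output) string pairs for each test run.

    This is a hypothetical sketch of "checksumming intermediate states":
    nothing in the resulting digest distinguishes whether the edits and
    test runs came from a human or an LLM agent.
    """
    h = hashlib.sha256()
    for snapshot in edit_snapshots:
        h.update(hashlib.sha256(snapshot).digest())
    for command, output in test_runs:
        record = (command + "\n" + output).encode("utf-8")
        h.update(hashlib.sha256(record).digest())
    return h.hexdigest()
```

The digest proves only that a particular sequence of states and test runs occurred, not who or what produced them, which is the point of the comment above.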