Cosine AI is a lab focused on human reasoning, aiming to develop artificial intelligence that can reason like a human. Its latest product, Genie, achieved the world's highest score on SWE-Bench, a software engineering benchmark, significantly outperforming its competitors. Cosine attributes this result to embedding human reasoning into Genie's training data.
Cosine announced that Genie is the world's strongest AI programming agent. It scored 30.08% on SWE-Bench and 50.67% on SWE-Lite. It is designed to mimic the cognitive processes, logic, and workflow of a human engineer.
Genie was designed to be "autonomous" and to act logically on what it sees. To accomplish this, the training dataset has to represent that kind of logical action, including finding the prerequisite information needed to perform a task in an unfamiliar codebase.
Genie's reasoning comprises four main processes: planning, retrieving, writing, and running code. These achieve higher performance by simulating human behavior rather than the behavior of the underlying language model.
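The four processes can be pictured as a simple loop. The Python sketch below is illustrative only and is not Cosine's implementation: `solve_task`, the four callables passed to it, and the toy stand-ins are all hypothetical names introduced here. It assumes that a failed run is fed back into the next write step, which is one plausible way to connect "writing" and "running".

```python
from typing import Callable, List, Optional, Tuple


def solve_task(
    task: str,
    plan: Callable[[str], List[str]],
    retrieve: Callable[[List[str]], str],
    write: Callable[[List[str], str, Optional[str]], str],
    run: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Plan, retrieve, then alternate writing and running until a run passes."""
    trace: List[str] = []
    steps = plan(task)            # 1. planning
    trace.append("plan")
    context = retrieve(steps)     # 2. retrieving relevant code
    trace.append("retrieve")
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        patch = write(steps, context, feedback)  # 3. writing
        trace.append("write")
        passed, output = run(patch)              # 4. running
        trace.append("run")
        if passed:
            return patch, trace
        feedback = output  # assumption: the failure informs the next attempt
    return None, trace


# Toy stand-ins (hypothetical): a real agent would call a model and a test runner.
def toy_plan(task: str) -> List[str]:
    return [f"locate code for: {task}", "edit it", "run the check"]


def toy_retrieve(steps: List[str]) -> str:
    return "def add(a, b): return a - b"  # the "codebase", containing a bug


def toy_write(steps: List[str], context: str, feedback: Optional[str]) -> str:
    # The first attempt leaves the bug in place; with feedback, it is fixed.
    return context if feedback is None else context.replace("a - b", "a + b")


def toy_run(patch: str) -> Tuple[bool, str]:
    namespace: dict = {}
    exec(patch, namespace)
    ok = namespace["add"](2, 3) == 5
    return ok, ("" if ok else "add(2, 3) returned the wrong value")


patch, trace = solve_task("fix add", toy_plan, toy_retrieve, toy_write, toy_run)
```

In this toy run the first write fails its check and the second succeeds, so the trace records two write and run rounds after a single plan and retrieve step.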
Genie is also trained with a self-improvement methodology: data generated by the model itself is used for further training, which significantly improves how the model responds to errors.
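One way to picture training on self-generated data is to keep the model's own failed attempts, together with the error and a corrected version, as new training examples. The sketch below is a guess at the general shape of such a loop, not Cosine's pipeline: `RecoveryExample`, `collect_recovery_examples`, the reference solutions, and the toy generator and checker are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RecoveryExample:
    """A hypothetical training record: a mistake and how it was fixed."""
    task: str
    failed_attempt: str
    error: str
    corrected_attempt: str


def collect_recovery_examples(
    tasks: List[str],
    generate: Callable[[str], str],
    check: Callable[[str, str], Tuple[bool, str]],
    references: Dict[str, str],
) -> List[RecoveryExample]:
    """Run the model on each task and keep only the attempts that failed."""
    examples: List[RecoveryExample] = []
    for task in tasks:
        attempt = generate(task)             # model-generated data
        passed, error = check(task, attempt)
        if not passed:
            # Assumption: a known-good solution is available as the correction.
            examples.append(
                RecoveryExample(task, attempt, error, references[task])
            )
    return examples


# Toy stand-ins (hypothetical) so the sketch runs end to end.
REFERENCES = {"double": "lambda x: x * 2", "square": "lambda x: x * x"}


def toy_generate(task: str) -> str:
    # Pretend the model gets "double" right and "square" wrong.
    return {"double": "lambda x: x * 2", "square": "lambda x: x + x"}[task]


def toy_check(task: str, attempt: str) -> Tuple[bool, str]:
    candidate = eval(attempt)
    expected = eval(REFERENCES[task])
    for x in (0, 1, 3):
        if candidate(x) != expected(x):
            return False, f"wrong result for input {x}"
    return True, ""


examples = collect_recovery_examples(
    ["double", "square"], toy_generate, toy_check, REFERENCES
)
```

Only the failed "square" attempt ends up in `examples`; records like this pair an error with its fix, which is the kind of signal that could teach a model to recover from its own mistakes.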