Code Generation
Benchmarks & Datasets
Coding Ability
HumanEval
Website | HuggingFace | Chen et al. 2021
HumanEval was introduced by OpenAI in 2021, alongside the "Codex" model.
MBPP
Website | HuggingFace | Austin et al. 2021
The Mostly Basic Python Problems (MBPP) dataset "consists of around 1,000 crowd-sourced Python programming problems, designed to be solvable by entry level programmers, covering programming fundamentals, standard library functionality, and so on. Each problem consists of a task description, code solution and 3 automated test cases."
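To make the "task description, code solution and 3 automated test cases" shape concrete, here is a minimal sketch of one such record and a checker that runs its tests. The field names `text`, `code`, and `test_list` are assumed to match the published dataset and should be verified against it. The problem itself and the helper `passes_all_tests` are invented for illustration and are not part of MBPP.

```python
# Hypothetical MBPP-style record: field names are assumptions, content is invented.
problem = {
    "text": "Write a function to find the larger of two numbers.",
    "code": "def max_of_two(a, b):\n    return a if a > b else b",
    "test_list": [
        "assert max_of_two(1, 2) == 2",
        "assert max_of_two(5, 3) == 5",
        "assert max_of_two(-1, -1) == -1",
    ],
}


def passes_all_tests(solution_src: str, tests: list[str]) -> bool:
    """Define the solution, then run each assert statement in the same namespace.

    This uses exec(), so it must only be given trusted code; a real
    evaluation harness would sandbox this step and enforce timeouts.
    """
    namespace: dict = {}
    try:
        exec(solution_src, namespace)   # defines the function under test
        for test in tests:
            exec(test, namespace)       # raises AssertionError on failure
    except Exception:
        return False
    return True


# The stored solution should pass; a deliberately wrong one should not.
reference_ok = passes_all_tests(problem["code"], problem["test_list"])
buggy_ok = passes_all_tests(
    "def max_of_two(a, b):\n    return a", problem["test_list"]
)
print(reference_ok, buggy_ok)
```

When evaluating a model, the generated code would take the place of `problem["code"]`, and a problem counts as solved only if every assert passes.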
EvalPlus / HumanEval+
- Adds 81x more tests to HumanEval
- Provides tools for working with the test inputs and outputs
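A toy example can show why a larger test suite matters. The sketch below is not the EvalPlus API: every name in it (`reference`, `candidate`, `base_inputs`, `extra_inputs`, `failing_inputs`) is hypothetical. It assumes expected outputs are obtained by running a trusted reference solution on each input, and shows a buggy candidate that passes a small base suite but fails once more inputs are added.

```python
def reference(xs):
    """Trusted solution (assumed correct): sorted list of unique elements."""
    return sorted(set(xs))


def candidate(xs):
    """Buggy solution under test: sorts but forgets to remove duplicates."""
    return sorted(xs)


# A small base suite that happens to contain no duplicate values.
base_inputs = [[3, 1, 2], [], [5]]
# Hypothetical extra inputs that exercise the duplicate-handling path.
extra_inputs = [[1, 1], [2, 3, 2], [0, -1, 0]]


def failing_inputs(fn, inputs):
    """Return every input on which fn disagrees with the reference solution."""
    return [xs for xs in inputs if fn(list(xs)) != reference(list(xs))]


base_failures = failing_inputs(candidate, base_inputs)
plus_failures = failing_inputs(candidate, base_inputs + extra_inputs)
print(base_failures)  # no failures: the bug is invisible to the base suite
print(plus_failures)  # the extra inputs expose the missing deduplication
```

The candidate would be counted as correct under the base suite alone, which is the kind of false positive that additional tests are meant to catch.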
MultiPL-E
Website | HuggingFace | Cassano et al. 2022
Extends HumanEval and MBPP into 18 additional languages.