Code Generation

Benchmarks & Datasets


Coding Ability

HumanEval

Website | HuggingFace | Chen et al. 2021

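Introduced by OpenAI in 2021 alongside the "Codex" model. The benchmark contains 164 hand-written Python programming problems, each checked for functional correctness with unit tests.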


MBPP

Website | HuggingFace | Austin et al. 2021

The Mostly Basic Python Problems (MBPP) dataset

"consists of around 1,000 crowd-sourced Python programming problems, designed to be solvable by entry level programmers, covering programming fundamentals, standard library functionality, and so on. Each problem consists of a task description, code solution and 3 automated test cases."

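As a rough illustration of the "task description, code solution and 3 automated test cases" structure, the sketch below checks a candidate solution against assert-style tests. The record is hypothetical and its field names (`text`, `code`, `test_list`) are assumptions made for this example, so check the dataset card before relying on them.

```python
# Hypothetical MBPP-style record; field names and contents are illustrative assumptions.
problem = {
    "text": "Write a function to find the minimum of two numbers.",
    "code": "def min_of_two(x, y):\n    return x if x < y else y",
    "test_list": [
        "assert min_of_two(10, 20) == 10",
        "assert min_of_two(19, 15) == 15",
        "assert min_of_two(-10, -20) == -20",
    ],
}


def passes_all_tests(candidate_code: str, tests: list[str]) -> bool:
    """Define the candidate in a fresh namespace, then run each assert in it."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)
        for test in tests:
            exec(test, namespace)
    except Exception:
        # Any syntax error, runtime error, or failed assert counts as a failure.
        return False
    return True


# The reference solution should pass; a deliberately wrong candidate should not.
reference_ok = passes_all_tests(problem["code"], problem["test_list"])
buggy_ok = passes_all_tests(
    "def min_of_two(x, y):\n    return x", problem["test_list"]
)
print(reference_ok, buggy_ok)
```

Calling `exec` on model-generated code is unsafe outside an isolated environment, so a real evaluation harness would typically add sandboxing and timeouts.
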

EvalPlus / HumanEval+

Website | Liu et al. 2023

  • Adds 81x more tests to HumanEval
  • Tools for working with the test inputs and outputs
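
To see why a larger test suite matters, here is a toy sketch (not EvalPlus's own tooling or API) in which a deliberately buggy solution passes a small base suite but fails once extra inputs are added. The function, test inputs, and helper name are all hypothetical.

```python
def is_sorted_candidate(xs):
    # Buggy on purpose: only compares the first two elements.
    return len(xs) < 2 or xs[0] <= xs[1]


# (input, expected output) pairs; both suites are made up for illustration.
base_tests = [([1, 2, 3], True), ([3, 1, 2], False)]
extra_tests = [([1, 3, 2], False), ([], True), ([2, 2, 1], False)]


def failures(func, tests):
    """Return the inputs on which func disagrees with the expected output."""
    return [inp for inp, expected in tests if func(inp) != expected]


base_failures = failures(is_sorted_candidate, base_tests)
plus_failures = failures(is_sorted_candidate, base_tests + extra_tests)
print(base_failures)  # no failures on the small suite
print(plus_failures)  # the extra inputs expose the bug
```

A solution counted as correct under the base suite is rejected under the extended one, which is the kind of false positive that additional tests are meant to reduce.
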

MultiPL-E

Website | HuggingFace | Cassano et al. 2022

Translates HumanEval and MBPP into 18 additional programming languages.