GitHub Agentic PR Dataset
A large-scale, open dataset of ~1.96 million GitHub pull requests authored by AI coding agents — Claude Code, Cursor, GitHub Copilot, and Devin — and by human developers, paired with their commits and file-level diffs. Built for research on agentic AI, automated code generation, bug-fixing, and mining software repositories.
This dataset supports comparing how autonomous coding agents and humans contribute to real open-source projects. It links 1,959,649 pull requests (773,513 agent-authored, 1,186,136 human-authored) to 6.7M+ commits and 55M+ file-level change records with raw patch diffs, and flags 422,618 of those PRs as bug fixes.
Extends AIDev (Li et al., 2025) — please cite the original work too.
1.96M Pull requests
773K Agent-authored PRs
6.7M Commits
55M File-level diffs
422K Bug-fix PRs
4 Coding agents
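The headline counts can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the description above; the constant names are illustrative and not part of the dataset.

```python
# Headline counts as stated in the dataset description.
TOTAL_PRS = 1_959_649
AGENT_PRS = 773_513
HUMAN_PRS = 1_186_136
BUGFIX_PRS = 422_618

# The two authorship groups should partition the full set of PRs.
assert AGENT_PRS + HUMAN_PRS == TOTAL_PRS

# Shares are taken over all PRs, agent- and human-authored together.
agent_share = AGENT_PRS / TOTAL_PRS
bugfix_share = BUGFIX_PRS / TOTAL_PRS

print(f"agent-authored share: {agent_share:.1%}")
print(f"bug-fix share:        {bugfix_share:.1%}")
```

In other words, roughly two in five PRs are agent-authored and about one in five is flagged as a bug fix.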
Load it in seconds
Works out of the box with 🤗 Datasets, Pandas, Polars, and DuckDB.
# Hugging Face Datasets — pick any table by config name
from datasets import load_dataset
prs = load_dataset("mabujadallah/GitHub-Agentic-PR-Dataset", split="train")
agent_prs = load_dataset(
    "mabujadallah/GitHub-Agentic-PR-Dataset",
    "agent_pull_requests", split="train",
)
# Pandas (reading hf:// paths requires the huggingface_hub package)
import pandas as pd
base = "hf://datasets/mabujadallah/GitHub-Agentic-PR-Dataset/"
df = pd.read_parquet(base + "agent_pull_requests.parquet")
print(df["agent"].value_counts())
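Polars and DuckDB are listed as supported but not shown, so here is a minimal sketch of the same per-agent tally in both. It assumes the repo-root Parquet layout and the `agent` column used in the Pandas snippet, and recent Polars and DuckDB releases that understand `hf://` paths. The helper names `table_path`, `agent_counts_polars`, and `agent_counts_duckdb` are hypothetical and not part of the dataset.

```python
# Assumption: each table is a single Parquet file at the repo root,
# as in the Pandas example.
BASE = "hf://datasets/mabujadallah/GitHub-Agentic-PR-Dataset/"


def table_path(table: str) -> str:
    """Build the hf:// URL for one table (hypothetical helper)."""
    return f"{BASE}{table}.parquet"


def agent_counts_polars(path: str):
    """Count PRs per agent with a lazy Polars scan."""
    # Imported here so DuckDB-only users do not need Polars installed.
    import polars as pl

    return (
        pl.scan_parquet(path)      # lazy: reads only what the query needs
        .group_by("agent")
        .len()                     # row count per group, in a column named "len"
        .sort("len", descending=True)
        .collect()
    )


def agent_counts_duckdb(path: str):
    """Same aggregation in SQL; returns a pandas DataFrame."""
    import duckdb

    query = (
        f"SELECT agent, count(*) AS n FROM '{path}' "
        "GROUP BY agent ORDER BY n DESC"
    )
    return duckdb.sql(query).df()
```

Calling `agent_counts_polars(table_path("agent_pull_requests"))` or the DuckDB variant should reproduce the Pandas `value_counts()` tally; both need network access to the Hugging Face Hub.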
Cite this dataset
Dataset · Hugging Face · CC-BY-4.0
GitHub Agentic PR Dataset: Pull Requests from AI Coding Agents and Humans
Abujadallah, M. & Sayagh, M. (2026)
Hugging Face Datasets · huggingface.co/datasets/mabujadallah/GitHub-Agentic-PR-Dataset
@misc{abujadallah_github_agentic_pr_dataset,
  title = {GitHub Agentic PR Dataset: Pull Requests from AI Coding Agents and Humans},
  author = {Abujadallah, Mahmoud and Sayagh, Mohammed},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/mabujadallah/GitHub-Agentic-PR-Dataset}},
  note = {Hugging Face Datasets}
}
Extends · AIDev · arXiv:2507.15003
The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering
Li, H., Zhang, H., & Hassan, A. E. (2025)
This dataset extends AIDev — please cite the original work too.
@misc{li2025aiteammates,
  title = {The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering},
  author = {Li, Hao and Zhang, Haoxiang and Hassan, Ahmed E.},
  year = {2025},
  eprint = {2507.15003},
  archivePrefix = {arXiv},
  primaryClass = {cs.SE},
  howpublished = {\url{https://huggingface.co/datasets/hao-li/AIDev}}
}