Welcome to Git-Pandas Documentation

Git-Pandas is a powerful Python library that transforms Git repository data into pandas DataFrames, making it easy to analyze and visualize your codebase’s history, contributors, and development patterns.

[Figure: Cumulative blame visualization]

Quick Start

Install Git-Pandas using pip:

pip install git-pandas

Basic Usage

Analyze a single repository:

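from gitpandas import Repository
# Wrap a single local Git repository
repo = Repository('/path/to/repo')
commits_df = repo.commit_history()  # commit history as a DataFrame
blame_df = repo.blame()             # lines of code attributed to each contributor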
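Because commit_history() returns a pandas DataFrame, ordinary pandas operations apply to it. The sketch below runs on a small hand-built frame that stands in for that output. The column names `author`, `insertions` and `deletions`, and the datetime index, are assumptions for illustration, so check them against the DataFrame your version actually returns.

```python
import pandas as pd

# Hypothetical stand-in for repo.commit_history(): one row per commit,
# indexed by commit timestamp. The column names are assumptions.
commits_df = pd.DataFrame(
    {
        "author": ["alice", "bob", "alice", "carol"],
        "insertions": [120, 15, 40, 5],
        "deletions": [10, 3, 25, 0],
    },
    index=pd.to_datetime(["2023-01-02", "2023-01-09", "2023-02-01", "2023-02-14"]),
)

# Number of commits per author, most active first.
commits_per_author = commits_df["author"].value_counts()

# Net lines added per calendar month ("MS" buckets by month start).
commits_df["net"] = commits_df["insertions"] - commits_df["deletions"]
net_by_month = commits_df["net"].resample("MS").sum()

print(commits_per_author)
print(net_by_month)
```

The same pattern (group, resample, aggregate) applies to the real output once you substitute its actual column names.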

Analyze multiple repositories:

from gitpandas import ProjectDirectory
# Directory containing one or more Git repositories
project = ProjectDirectory('/path/to/project')
project_info = project.general_information()

Key Features

  • Repository Analysis: Extract commit history, file changes, and blame information

  • Project Insights: Calculate bus factor, development time, and contributor metrics

  • GitHub Integration: Analyze GitHub profiles and repository metrics

  • Visualization Tools: Built-in plotting utilities for common Git analytics

  • Performance Optimization: Optional caching support for memory-intensive operations
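To make the bus factor metric concrete, the sketch below computes one common definition: the smallest number of contributors whose blamed lines together reach a threshold (here 50%) of the codebase. It is an illustration in plain pandas, not the library's own implementation, which may define or parameterize the metric differently. The input shape (one row per author with a `loc` column) is an assumption standing in for blame output.

```python
import pandas as pd


def bus_factor(blame_df: pd.DataFrame, threshold: float = 0.5) -> int:
    """Smallest number of authors whose blamed lines reach `threshold` of the total.

    Assumes one row per author with a `loc` column of blamed lines
    (a hypothetical shape; check the columns your blame() call returns).
    """
    # Largest owners first, then each prefix's share of all lines.
    loc = blame_df["loc"].sort_values(ascending=False)
    share = loc.cumsum() / loc.sum()
    # Authors strictly below the threshold, plus the one that crosses it.
    return int((share < threshold).sum()) + 1


example = pd.DataFrame(
    {"loc": [500, 300, 150, 50]}, index=["alice", "bob", "carol", "dave"]
)
print(bus_factor(example))                  # 1 (alice alone owns 50%)
print(bus_factor(example, threshold=0.9))   # 3 (alice + bob + carol own 95%)
```

A low value means a small number of people hold most of the code, so losing them would hurt the project most.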

Core Components

The library is built around two main components:

  • Repository: A wrapper around a single Git repository

  • ProjectDirectory: A collection of Git repositories for aggregate analysis
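The idea behind aggregate analysis can be sketched with plain pandas: stack per-repository results under a repository key, then collapse across repositories. This only illustrates the concept and is not a description of how ProjectDirectory combines results internally. The per-repository frames below are hypothetical stand-ins for blame output, and the `author` index name and `loc` column are assumptions.

```python
import pandas as pd

# Hypothetical per-repository blame results: one row per author, `loc` column.
per_repo = {
    "repo-a": pd.DataFrame({"loc": [500, 120]}, index=pd.Index(["alice", "bob"], name="author")),
    "repo-b": pd.DataFrame({"loc": [80, 300]}, index=pd.Index(["alice", "carol"], name="author")),
}

# Stack the frames; the dict keys become an outer "repository" index level.
combined = pd.concat(per_repo, names=["repository", "author"])

# Collapse across repositories to get project-wide line counts per author.
project_loc = combined.groupby(level="author")["loc"].sum().sort_values(ascending=False)
print(project_loc)
```

Keeping the repository level in `combined` means you can still slice back to a single repository (for example with `combined.loc["repo-a"]`) after combining.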

For detailed information about these components, see the Repository and Project Directory documentation.


License

This project is BSD-licensed (see LICENSE.md).

Detailed Documentation

The two main sources of documentation are the Repository and Project Directory pages, which contain the Sphinx API documentation for those two classes along with instructions for creating them. For detailed examples, see the use-cases page.
