Use Cases and Examples

Git-Pandas provides a powerful interface for analyzing Git repositories using pandas DataFrames. This guide demonstrates common use cases and provides practical examples.

Basic Repository Analysis

Repository Attributes

Get basic information about a repository:

from gitpandas import Repository
repo = Repository('/path/to/repo')

# Get repository name
print(repo._repo_name())

# Check if repository is bare
print(repo.is_bare())

# Get all tags
print(repo.tags())

# Get all branches
print(repo.branches())

# Get all revisions
print(repo.revs())

# Get blame information
print(repo.blame(include_globs=['*.py']))

Commit History Analysis

Analyze commit patterns and history:

# Get commit history
commits_df = repo.commit_history()

# Get file change history
changes_df = repo.file_change_history()

# Filter by file extension
python_changes = repo.file_change_history(include_globs=['*.py'])

# Filter by directory
src_changes = repo.file_change_history(include_globs=['src/*'])

Project-Level Analysis

Multiple Repository Analysis

Analyze multiple repositories simultaneously:

from gitpandas import ProjectDirectory

# Create project from multiple repositories
project = ProjectDirectory([
    'git://github.com/user/repo1.git',
    'git://github.com/user/repo2.git'
])

# Get aggregated metrics
print(project.general_information())

# Calculate bus factor
print(project.bus_factor())

# Get file change rates
print(project.file_change_rates())

# Generate punchcard data
print(project.punchcard())

Advanced Analysis

Cumulative Blame Analysis

Track code ownership over time:

# Get cumulative blame
blame_df = repo.cumulative_blame()

# Plot cumulative blame
import matplotlib.pyplot as plt
blame_df.plot(x='date', y='loc', title='Cumulative Blame Over Time')
plt.show()

Bus Factor Analysis

Analyze project sustainability:

# Calculate bus factor
bus_factor = project.bus_factor()

# Get detailed contributor metrics
contributors_df = project.contributor_metrics()

# Analyze file ownership
ownership_df = project.file_ownership()

Performance Optimization

Using Caching

Optimize performance with caching:

# Enable in-memory caching
repo = Repository('/path/to/repo', cache=True)

# Use Redis for persistent caching
repo = Repository(
    '/path/to/repo',
    cache=True,
    cache_backend='redis',
    redis_url='redis://localhost:6379/0'
)

Visualization Examples

Commit Patterns

Visualize commit patterns:

# Generate punchcard data
punchcard_df = repo.punchcard()

# Plot commit patterns
import matplotlib.pyplot as plt
punchcard_df.plot(kind='heatmap', title='Commit Patterns')
plt.show()

File Change Analysis

Visualize file changes:

# Get file change history
changes_df = repo.file_change_history()

# Plot changes over time
changes_df.plot(x='date', y='changes', title='File Changes Over Time')
plt.show()

Best Practices

  • Use caching for expensive operations

  • Filter data early to improve performance

  • Leverage pandas operations for analysis

  • Consider memory usage with large repositories

  • Use appropriate visualization tools

For more examples and detailed API documentation, see the Repository and Project Directory pages.