Use Cases and Examples
Git-Pandas wraps Git repositories in an interface that returns pandas DataFrames, so commit, blame, and file change data can be analyzed with ordinary pandas operations. This guide walks through common use cases with practical examples.
Basic Repository Analysis
Repository Attributes
Get basic information about a repository:
from gitpandas import Repository
repo = Repository('/path/to/repo')
# Get repository name
print(repo._repo_name())
# Check if repository is bare
print(repo.is_bare())
# Get all tags
print(repo.tags())
# Get all branches
print(repo.branches())
# Get all revisions
print(repo.revs())
# Get blame information
print(repo.blame(include_globs=['*.py']))
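The include_globs filters used throughout this guide are shell-style patterns. Assuming git-pandas tests each file path with Python's fnmatch (our reading of the library, not a documented guarantee), the sketch below shows which paths a pattern would select. Note that fnmatch's `*` also matches `/`, so `*.py` selects Python files at any depth. The helper `matches_any` is hypothetical.

```python
from fnmatch import fnmatch


def matches_any(path, include_globs):
    """Hypothetical helper: True if the path matches at least one glob pattern."""
    return any(fnmatch(path, pattern) for pattern in include_globs)


paths = ['setup.py', 'src/app/main.py', 'docs/index.rst', 'src/data.json']

# '*.py' matches at any depth because fnmatch's '*' also matches '/'
print([p for p in paths if matches_any(p, ['*.py'])])
# → ['setup.py', 'src/app/main.py']

# 'src/*' selects everything under src/, including nested directories
print([p for p in paths if matches_any(p, ['src/*'])])
# → ['src/app/main.py', 'src/data.json']
```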
Commit History Analysis
Analyze commit patterns and history:
# Get commit history
commits_df = repo.commit_history()
# Get file change history
changes_df = repo.file_change_history()
# Filter by file extension
python_changes = repo.file_change_history(include_globs=['*.py'])
# Filter by directory
src_changes = repo.file_change_history(include_globs=['src/*'])
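Because commit_history() returns an ordinary DataFrame, per-author and per-period summaries are a groupby or resample away. The sketch below uses a small hand-built stand-in frame; we assume a date index and `author`, `insertions`, `deletions`, and `net` columns, so check the columns of your own result before relying on them. The values are invented.

```python
import pandas as pd

# Hand-built stand-in for repo.commit_history(); column names are assumed, values are made up
commits_df = pd.DataFrame(
    {
        'author': ['alice', 'bob', 'alice', 'carol', 'bob'],
        'insertions': [120, 15, 40, 8, 60],
        'deletions': [10, 5, 25, 0, 30],
    },
    index=pd.to_datetime(['2024-01-03', '2024-01-05', '2024-01-20', '2024-02-02', '2024-02-15']),
)
commits_df['net'] = commits_df['insertions'] - commits_df['deletions']

# Commit count and net lines per author, largest contributors first
by_author = (
    commits_df.groupby('author')
    .agg(commits=('net', 'count'), net_lines=('net', 'sum'))
    .sort_values('net_lines', ascending=False)
)
print(by_author)

# Number of commits per calendar month ('MS' = month-start bins)
monthly_commits = commits_df.resample('MS').size()
print(monthly_commits)
```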
Project-Level Analysis
Multiple Repository Analysis
Analyze multiple repositories simultaneously:
from gitpandas import ProjectDirectory
# Create a project from multiple repositories (local paths or remote URLs;
# remote repositories are cloned to a temporary directory)
project = ProjectDirectory([
    'https://github.com/user/repo1.git',
    'https://github.com/user/repo2.git'
])
# Get aggregated metrics
print(project.general_information())
# Calculate bus factor
print(project.bus_factor())
# Get file change rates
print(project.file_change_rates())
# Generate punchcard data
print(project.punchcard())
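ProjectDirectory combines results across repositories for you. If you need to merge per-repository frames yourself (for example, results produced by separate Repository objects), pd.concat with keys tags each row with its source. The two frames below are hand-built stand-ins with illustrative `committer` and `loc` columns, and the repository names are placeholders.

```python
import pandas as pd

# Hand-built stand-ins for per-repository blame results (columns are illustrative)
repo1_blame = pd.DataFrame({'committer': ['alice', 'bob'], 'loc': [900, 100]})
repo2_blame = pd.DataFrame({'committer': ['alice', 'carol'], 'loc': [200, 800]})

# Stack the frames, recording the source repository in a 'repository' column
combined = pd.concat(
    [repo1_blame, repo2_blame], keys=['repo1', 'repo2'], names=['repository', None]
).reset_index(level='repository')

# Lines of code per committer across the whole project
project_loc = combined.groupby('committer')['loc'].sum().sort_values(ascending=False)
print(project_loc)
```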
Advanced Analysis
Cumulative Blame Analysis
Track code ownership over time:
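import matplotlib.pyplot as plt
# Get cumulative blame: the result is indexed by date, with one column per
# committer holding the lines of code attributed to them at that revision
blame_df = repo.cumulative_blame()
# Plot one line per committer (there are no 'date' or 'loc' columns to select)
blame_df.plot(title='Cumulative Blame Over Time')
plt.ylabel('Lines of code')
plt.show()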
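To see relative rather than absolute ownership, divide each row of the cumulative blame frame by its total. The sketch assumes cumulative_blame() returns a date index with one column of line counts per committer; the frame here is hand-built and its values are invented.

```python
import pandas as pd

# Hand-built stand-in for repo.cumulative_blame(): date index, one column of lines per committer
blame_df = pd.DataFrame(
    {'alice': [100, 150, 150], 'bob': [0, 50, 250]},
    index=pd.to_datetime(['2024-01-01', '2024-02-01', '2024-03-01']),
)

# Fraction of the codebase attributed to each committer at each point in time
share_df = blame_df.div(blame_df.sum(axis=1), axis=0)
print(share_df.round(3))
```

Calling share_df.plot.area() then draws the ownership shares as stacked bands that always sum to one.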
Bus Factor Analysis
Analyze project sustainability:
# Calculate bus factor
bus_factor = project.bus_factor()
# Get detailed contributor metrics
contributors_df = project.contributor_metrics()
# Analyze file ownership
ownership_df = project.file_ownership()
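Bus factor is commonly estimated as the smallest number of contributors who together account for at least half of the code. The sketch below computes that heuristic from per-committer line counts. We believe git-pandas' bus_factor() follows a similar blame-based approach, but the function `bus_factor_from_loc` and its 50% threshold are our own assumptions, not the library's code.

```python
import pandas as pd


def bus_factor_from_loc(loc_by_committer: pd.Series, threshold: float = 0.5) -> int:
    """Hypothetical helper: smallest number of committers owning >= threshold of all lines."""
    total = loc_by_committer.sum()
    if total == 0:
        return 0  # no code, so no one to lose
    ranked = loc_by_committer.sort_values(ascending=False)
    covered = ranked.cumsum() / total
    # Committers before the threshold is reached, plus the one who reaches it
    return int((covered < threshold).sum()) + 1


loc = pd.Series({'alice': 400, 'bob': 300, 'carol': 200, 'dave': 100})
print(bus_factor_from_loc(loc))  # → 2
```

Raising the threshold (for example to 0.8) gives a more conservative estimate of how many people the project depends on.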
Performance Optimization
Using Caching
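Pass a cache backend from gitpandas.cache to reuse the results of expensive calls:
from gitpandas import Repository
from gitpandas.cache import EphemeralCache, RedisDFCache
# In-memory caching (results are kept only for the life of the process)
repo = Repository('/path/to/repo', cache_backend=EphemeralCache())
# Redis for persistent caching (requires a running Redis server and the redis package)
repo = Repository(
    '/path/to/repo',
    cache_backend=RedisDFCache(host='localhost', port=6379, db=0)
)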
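Conceptually, a cache backend stores results keyed by the method and arguments that produced them, so repeating an expensive call returns the stored result instead of walking the Git history again. The decorator below is a hypothetical, simplified illustration of that idea (an in-memory dict with first-in, first-out eviction), not git-pandas' cache implementation; arguments must be hashable.

```python
import functools
from collections import OrderedDict


def memoize_frames(max_keys=100):
    """Hypothetical decorator: cache results in memory, evicting the oldest entry when full."""
    def decorator(func):
        store = OrderedDict()

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = (func.__name__, args, tuple(sorted(kwargs.items())))
            if key in store:
                return store[key]  # cache hit: skip the expensive work
            result = func(*args, **kwargs)
            store[key] = result
            if len(store) > max_keys:
                store.popitem(last=False)  # evict the entry inserted first
            return result

        wrapper.cache = store  # exposed for inspection
        return wrapper
    return decorator


calls = []


@memoize_frames(max_keys=2)
def expensive_history(branch, limit=None):
    calls.append(branch)  # record each real computation
    return {'branch': branch, 'limit': limit}  # stand-in for a DataFrame


expensive_history('main', limit=10)
expensive_history('main', limit=10)  # served from the cache
print(len(calls))  # → 1
```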
Visualization Examples
Commit Patterns
Visualize commit patterns:
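import matplotlib.pyplot as plt
# Generate punchcard data (one row per hour_of_day/day_of_week pair)
punchcard_df = repo.punchcard()
# pandas has no 'heatmap' plot kind: pivot to a 7 x 24 day-by-hour grid and draw it with imshow
grid = punchcard_df.pivot_table(index='day_of_week', columns='hour_of_day', values='lines', aggfunc='sum', fill_value=0)
grid = grid.reindex(index=range(7), columns=range(24), fill_value=0)
plt.imshow(grid, aspect='auto', cmap='Blues')
plt.xlabel('Hour of day')
plt.ylabel('Day of week (0 = Monday)')
plt.colorbar(label='Lines changed')
plt.title('Commit Patterns')
plt.show()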
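A punchcard is a tally of activity by weekday and hour. The sketch below derives that grid directly from commit timestamps with pandas, using made-up timestamps and line counts; it mirrors the idea behind punchcard() but is not the library's code. Weekdays follow Python's convention (0 = Monday).

```python
import pandas as pd

# Made-up commit timestamps standing in for a commit history index
timestamps = pd.to_datetime([
    '2024-03-04 09:15',  # Monday
    '2024-03-04 09:45',  # Monday
    '2024-03-05 14:00',  # Tuesday
    '2024-03-09 22:30',  # Saturday
])
commits = pd.DataFrame({'lines': [30, 10, 5, 100]}, index=timestamps)

commits['day_of_week'] = commits.index.dayofweek
commits['hour_of_day'] = commits.index.hour

# Full 7 x 24 grid of lines changed, with zeros for empty slots
grid = (
    commits.pivot_table(index='day_of_week', columns='hour_of_day',
                        values='lines', aggfunc='sum', fill_value=0)
    .reindex(index=range(7), columns=range(24), fill_value=0)
)
print(grid.loc[0, 9], grid.loc[5, 22])
```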
File Change Analysis
Visualize file changes:
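import matplotlib.pyplot as plt
# Get file change history (indexed by commit date, one row per file per commit)
changes_df = repo.file_change_history()
# There is no 'changes' column: sum insertions and deletions per week instead
weekly = changes_df[['insertions', 'deletions']].sort_index().resample('W').sum()
weekly.plot(title='File Changes Over Time')
plt.show()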
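To find the files that change most often, group the file change history by filename. The hand-built frame below assumes `filename`, `insertions`, and `deletions` columns, which is what we expect file_change_history() to return; the file names and counts are invented.

```python
import pandas as pd

# Hand-built stand-in for repo.file_change_history(): one row per file per commit
changes_df = pd.DataFrame({
    'filename': ['src/core.py', 'README.md', 'src/core.py', 'src/util.py', 'src/core.py'],
    'insertions': [50, 5, 20, 30, 10],
    'deletions': [10, 0, 15, 2, 40],
})

# Churn = lines added plus lines removed, summed per file; keep the top two
churn = (
    changes_df.assign(churn=changes_df['insertions'] + changes_df['deletions'])
    .groupby('filename')['churn']
    .sum()
    .nlargest(2)
)
print(churn)
```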
Best Practices
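Use a cache backend for expensive, repeated operations such as blame() and cumulative_blame()
Filter data early with include_globs (or ignore_globs) and, where a method supports them, limit or days, to reduce the work done per call
Use pandas operations (groupby, resample, pivot_table) on the returned DataFrames rather than re-querying the repository
Watch memory usage with large repositories: full-history blame and file change DataFrames can grow large
Match the plot to the data: line or area plots for time series, heatmaps for punchcard grids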
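On large repositories, string columns such as author or committer names repeat heavily, and converting them to pandas' category dtype usually reduces memory use substantially. The frame below is synthetic; memory_usage(deep=True) reports the actual footprint, so you can check the saving on your own data.

```python
import pandas as pd

# Synthetic frame with a heavily repeated string column, like author names in a long history
df = pd.DataFrame({
    'author': ['alice', 'bob', 'carol'] * 10_000,
    'lines': range(30_000),
})

before = df.memory_usage(deep=True).sum()
df['author'] = df['author'].astype('category')  # store each distinct name once
after = df.memory_usage(deep=True).sum()

print(f'{before:,} bytes -> {after:,} bytes')
```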
For more examples and detailed API documentation, see the Repository and Project Directory pages.