Use Cases

At its most basic level, git-pandas provides a pandas-based interface to the data contained in git repositories. Beyond that, it can help with many specific use cases in the management and analysis of multi-repo projects and organizations. Here we outline the use cases that git-pandas handles well, and how you can apply it in your own projects or organizations.

Attributes

At the most basic level, git-pandas allows pandas-based interaction with the basic attributes of a repo.

This includes:

  • estimated repository name
  • tags
  • branches
  • revs
  • blame
  • is_bare

Example Setup

The following examples show how the ProjectDirectory and Repository objects can be used to gather their basic attributes. In this section, we use the example: attributes.py, which can be found in the examples directory. For more detailed information, please check the API reference in previous sections.

For the following examples, we will use 2 objects, defined by:

from gitpandas import Repository, ProjectDirectory
p = ProjectDirectory(working_dir=['git://github.com/wdm0006/git-pandas.git', 'git://github.com/CamDavidsonPilon/lifelines.git'])
r = Repository(working_dir='git://github.com/wdm0006/git-pandas.git')

Repository Name

To access the approximate repository name of each:

print('Project Directory Names:')
print(p._repo_name())
print('\nRepository Name:')
print(r._repo_name())

Which will yield:

Project Directory Names:
   repository
0  git-pandas
1   lifelines

Repository Name:
git-pandas

Is Bare

To find out if the repositories are bare:

print('Project Directory Is Bare:')
print(p.is_bare())
print('\nRepository Is Bare:')
print(r.is_bare())

Which will yield:

Project Directory Is Bare:
   repository is_bare
0  git-pandas   False
1   lifelines   False

Repository Is Bare:
False

Tags

To access the tags of each:

print('Project Directory Tags:')
print(p.tags())
print('\nRepository Tags:')
print(r.tags())

Which will yield:

Project Directory Tags:
    repository       tag
0   git-pandas     0.0.1
1   git-pandas     0.0.2
2   git-pandas     0.0.3
3   git-pandas     0.0.4
4   git-pandas     0.0.5
0    lifelines     0.4.3
1    lifelines     0.6.0
2    lifelines    ignore
3    lifelines      v0.4
4    lifelines    v0.4.1
5    lifelines    v0.4.2
6    lifelines    v0.4.4
7    lifelines  v0.4.4.1
8    lifelines    v0.5.0
9    lifelines    v0.5.1
10   lifelines    v0.6.0
11   lifelines    v0.7.0
12   lifelines    v0.8.0

Repository Tags:
     tag  repository
0  0.0.1  git-pandas
1  0.0.2  git-pandas
2  0.0.3  git-pandas
3  0.0.4  git-pandas
4  0.0.5  git-pandas
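
The lifelines tags above mix prefixed names (v0.4.1), unprefixed names (0.4.3), and a non-version name (ignore), and they are listed in string order. The following is a minimal sketch of sorting such tags as versions, using only the tag strings copied from the output above. The helper version_key is hypothetical and is not part of git-pandas.

```python
import re

# Tag names copied from the lifelines rows of the output above.
tags = ['0.4.3', '0.6.0', 'ignore', 'v0.4', 'v0.4.1', 'v0.4.2', 'v0.4.4',
        'v0.4.4.1', 'v0.5.0', 'v0.5.1', 'v0.6.0', 'v0.7.0', 'v0.8.0']

# An optional leading "v" followed by dot-separated integers.
VERSION_RE = re.compile(r'^v?(\d+(?:\.\d+)*)$')


def version_key(tag):
    """Return a tuple of ints for version-like tags, or None otherwise."""
    match = VERSION_RE.match(tag)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split('.'))


# Keep only version-like tags, then sort them numerically.
version_tags = sorted(
    (tag for tag in tags if version_key(tag) is not None),
    key=version_key,
)
print(version_tags)
```

Tags that do not look like versions, such as ignore, are dropped rather than guessed at.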

Branches

To access the branches of each:

print('Project Directory Branches:')
print(p.branches())
print('\nRepository Branches:')
print(r.branches())

Which will yield:

Project Directory Branches:
     branch  local  repository
0    master   True  git-pandas
1    master  False  git-pandas
2  gh-pages  False  git-pandas
0    master   True   lifelines
1     0.6.0  False   lifelines
...

Repository Branches:
     branch  local  repository
0  gh-pages   True  git-pandas
1    master   True  git-pandas
2    master  False  git-pandas
3  gh-pages  False  git-pandas

Revisions

To access the revisions of each:

print('Project Directory Revisions:')
print(p.revs())
print('\nRepository Revisions:')
print(r.revs())

Which will yield:

Project Directory Revisions:
           date  repository                                       rev
0    1451844740  git-pandas  5cbf630d723f9ebdd0e164eb58a6fe952f1cb92c
1    1451843631  git-pandas  0b72b01b2b4a0cf673f457e016cdcdde8fe82f15
2    1451842103  git-pandas  4376d9451d1ff32089d0dd1bffa3de56fe35604d
3    1451842081  git-pandas  ebfdadc6d09d613b948dadef986bd9cbea4240a2
...
0    1450720064   lifelines  e689d8d910b65cd2c2188c74e33ef2f722d361a4
1    1450719167   lifelines  773670a6261326d96556816f48e159cbceaeeb2d
2    1450718313   lifelines  d42a010cfa368975c0beaa251db8db2cacdf9be1
3    1450718269   lifelines  a1543344f91918e2f3456cf15d1895ac6448f8a5
...

Repository Revisions:
          date                                       rev
0   1451844740  5cbf630d723f9ebdd0e164eb58a6fe952f1cb92c
1   1451843631  0b72b01b2b4a0cf673f457e016cdcdde8fe82f15
2   1451842103  4376d9451d1ff32089d0dd1bffa3de56fe35604d
3   1451842081  ebfdadc6d09d613b948dadef986bd9cbea4240a2
...
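
The date values above look like Unix timestamps in seconds, which is an assumption based on their magnitude rather than something stated here. Under that assumption, a minimal sketch of turning them into readable UTC datetimes, using rows copied from the Repository Revisions output:

```python
from datetime import datetime, timezone

# (date, rev) pairs copied from the Repository Revisions output above.
revs = [
    (1451844740, '5cbf630d723f9ebdd0e164eb58a6fe952f1cb92c'),
    (1451843631, '0b72b01b2b4a0cf673f457e016cdcdde8fe82f15'),
    (1451842103, '4376d9451d1ff32089d0dd1bffa3de56fe35604d'),
    (1451842081, 'ebfdadc6d09d613b948dadef986bd9cbea4240a2'),
]


def to_utc(epoch_seconds):
    """Interpret an integer as seconds since the Unix epoch, in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


for date, rev in revs:
    # Print an ISO 8601 timestamp next to the abbreviated revision hash.
    print(to_utc(date).isoformat(), rev[:7])
```

When working with the DataFrame itself, pandas can convert the whole column in one call with pd.to_datetime(df['date'], unit='s'), where df stands for the result of revs().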

Blame

To access the current blame of each:

print('Project Directory Blame:')
print(p.blame(include_globs=['*.py']))
print('\nRepository Blame:')
print(r.blame(include_globs=['*.py']))

Which will yield:

Project Directory Blame:
                         loc
Cameron Davidson-Pilon  5537
Will McGinnis           1789
Jonas Kalderstam         434
Will Mcginnis            316
CamDavidsonPilon         236
Ben Kuhn                  94
Nick Evans                20
Andrew Gartland           14
Kyle                       9
xantares                   6
Niels Bantilan             5
Ben Rifkind                1
Nick Furlotte              1

Repository Blame:
                loc
committer
Will McGinnis  1750
Will Mcginnis   316

Commit History

One of the simplest datasets to pull from a repository or collection of repositories is the commit history. This is done via:

  • commit history
  • file change history

Example Setup

In this section, we use the example: commit_history.py, which can be found in the examples directory. For more detailed information, please check the API reference in previous sections.

For the following examples, we will use 2 objects, defined by:

from gitpandas import Repository, ProjectDirectory
p = ProjectDirectory(working_dir=['git://github.com/wdm0006/git-pandas.git', 'git://github.com/CamDavidsonPilon/lifelines.git'])
r = Repository(working_dir='git://github.com/wdm0006/git-pandas.git')

Commit History

TODO

File Change History

TODO

Bus Factor

One major block of functionality is bus factor analysis on repos and collections of repos. This is currently available at the highest level, with analysis in hierarchical terms planned for the future. This functionality is accessed by:

  • bus factor
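
As a rough illustration of the idea, a bus factor can be read as the smallest number of committers who together own more than half of the lines of code. The sketch below applies that definition to the Project Directory Blame numbers from the Attributes section. The 50% threshold and the helper bus_factor_from_blame are assumptions made for illustration, and may not match how git-pandas computes its own bus factor.

```python
# loc per committer, copied from the Project Directory Blame output
# in the Attributes section of this page.
blame = {
    'Cameron Davidson-Pilon': 5537,
    'Will McGinnis': 1789,
    'Jonas Kalderstam': 434,
    'Will Mcginnis': 316,
    'CamDavidsonPilon': 236,
    'Ben Kuhn': 94,
    'Nick Evans': 20,
    'Andrew Gartland': 14,
    'Kyle': 9,
    'xantares': 6,
    'Niels Bantilan': 5,
    'Ben Rifkind': 1,
    'Nick Furlotte': 1,
}


def bus_factor_from_blame(loc_by_committer, threshold=0.5):
    """Smallest number of top committers owning more than `threshold` of all loc."""
    total = sum(loc_by_committer.values())
    covered = 0
    ranked = sorted(loc_by_committer.values(), reverse=True)
    for count, loc in enumerate(ranked, start=1):
        covered += loc
        if covered > total * threshold:
            return count
    return len(loc_by_committer)


print(bus_factor_from_blame(blame))
```

With these numbers a single committer owns more than half of the 8462 blamed lines, so the sketch reports a bus factor of 1.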

Example Setup

In this section, we use the example: bus_factor.py, which can be found in the examples directory. For more detailed information, please check the API reference in previous sections.

For the following examples, we will use 2 objects, defined by:

from gitpandas import Repository, ProjectDirectory
p = ProjectDirectory(working_dir=['git://github.com/wdm0006/git-pandas.git', 'git://github.com/CamDavidsonPilon/lifelines.git'])
r = Repository(working_dir='git://github.com/wdm0006/git-pandas.git')

Bus Factor

TODO

Cumulative Blame

Another major block of functionality in git-pandas is the cumulative blame interface. This allows you to track and visualize the share of a project borne by individual committers or repositories over time.

It is accessed by:

  • cumulative_blame

Example Setup

In this section, we use the example: cumulative_blame.py, which can be found in the examples directory. For more detailed information, please check the API reference in previous sections.

For the following examples, we will use 2 objects, defined by:

from gitpandas import Repository, ProjectDirectory
p = ProjectDirectory(working_dir=['git://github.com/wdm0006/git-pandas.git', 'git://github.com/CamDavidsonPilon/lifelines.git'])
r = Repository(working_dir='git://github.com/wdm0006/git-pandas.git')

Cumulative Blame

TODO

Coverage

If a .coverage file is available, git-pandas has experimental support for integrating that data with the git data. This functionality is accessed by:

  • has_coverage
  • coverage

Example Setup

In this section, we use the example: coverage_data.py, which can be found in the examples directory. For more detailed information, please check the API reference in previous sections.

For the following examples, we will use 2 objects, defined by:

from gitpandas import Repository, ProjectDirectory
p = ProjectDirectory(working_dir=['git://github.com/wdm0006/git-pandas.git', 'git://github.com/CamDavidsonPilon/lifelines.git'])
r = Repository(working_dir='git://github.com/wdm0006/git-pandas.git')

Has Coverage

TODO

Coverage

TODO

File Change Rates

File change rate, or risk, is a specialized dataframe aimed at identifying files that are likely to contain bugs. If coverage data is available, it can be included in this table. This functionality is accessed by:

  • file_change_rates

Example Setup

In this section, we use the example: file_change_rates.py, which can be found in the examples directory. For more detailed information, please check the API reference in previous sections.

For the following examples, we will use 2 objects, defined by:

from gitpandas import Repository, ProjectDirectory
p = ProjectDirectory(working_dir=['git://github.com/wdm0006/git-pandas.git', 'git://github.com/CamDavidsonPilon/lifelines.git'])
r = Repository(working_dir='git://github.com/wdm0006/git-pandas.git')

File Change Rates

TODO