Dataset API

Example:

import pandas as pd
from dgitcore import api

# Load/upload profile, load plugins, etc.
api.initialize()

# Look up a dataset repo by username and reponame
repo = api.lookup('pingali', 'simple-regression-rawdata')

# Fetch a resource and read its local copy
r = repo.get_resource('demo-input.csv')
df = pd.read_csv(r['localfullpath'])
...
dgitcore.api.lookup(username, reponame)

Look up a repo based on username and reponame.

dgitcore.api.list_repos(remote=False)

List repos.

Parameters:

remote: Flag; if True, list remote repos instead of local ones
dgitcore.api.shellcmd(repo, args)

Run a shell command within the repo’s context

Parameters:

repo: Repository object

args: Shell command
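One plausible reading of "within the repo's context" is that the command runs with the repo's working directory as its current directory. The sketch below illustrates that reading with the standard library's `subprocess.run`; the helper `run_in_repo` and its `repo_path` argument are hypothetical and are not part of dgitcore.

```python
import subprocess


def run_in_repo(repo_path, args):
    """Hypothetical helper (not dgitcore): run `args` with repo_path as the working directory.

    `args` is a list of strings, e.g. ["ls", "-l"]. Returns the captured stdout
    as text and raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        args,
        cwd=repo_path,        # the "repo's context": its working directory
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output bytes to str
        check=True,           # raise if the command fails
    )
    return result.stdout
```

For example, `run_in_repo('/path/to/repo', ['ls'])` would list the repo's top-level files. Passing the command as a list, rather than as one string with `shell=True`, avoids shell-quoting problems.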

dgitcore.api.log(repo, args=[])

Show the log of the changes made so far.

Parameters:

repo: Repository object

args: Arguments to git command
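The `args` parameter of `log`, `show`, `push`, `pull`, `commit`, `stash`, `diff` and `remote` is described as the arguments to the underlying git command. A minimal sketch of that pass-through, assuming the arguments are simply appended after the git subcommand, is shown below; `build_git_command` is a hypothetical name, not a dgitcore function. It takes `args=None` rather than the mutable default `args=[]` seen in the signatures, which is the usual Python idiom for avoiding a default list shared across calls.

```python
def build_git_command(subcommand, args=None):
    """Hypothetical helper: return the argv list for `git <subcommand> <args...>`."""
    # Copy into a new list so the caller's list is never mutated
    extra = list(args) if args is not None else []
    return ["git", subcommand] + extra
```

For instance, `build_git_command("log", ["--oneline", "-n", "5"])` returns `['git', 'log', '--oneline', '-n', '5']`, a list that could then be handed to `subprocess.run`.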

dgitcore.api.show(repo, args=[])

Show commit details

Parameters:

repo: Repository object

args: Arguments to git command

dgitcore.api.push(repo, args=[])

Push changes to the backend

Parameters:

repo: Repository object

args: Arguments to git command

dgitcore.api.pull(repo, args=[])

Pull changes from the backend

Parameters:

repo: Repository object

args: Arguments to git command

dgitcore.api.commit(repo, args=[])

Commit changes to the data repository

Parameters:

repo: Repository object

args: Arguments to git command

dgitcore.api.stash(repo, args=[])

Stash the changes

Parameters:

repo: Repository object

args: Arguments to git command

dgitcore.api.drop(repo, args=[])

Drop the repository (a dgit-specific command with no git equivalent)

Parameters:

repo: Repository object

args: Arguments to git command

dgitcore.api.status(repo, args=[])

Show the status of the repo

Parameters:

repo: Repository object (result of lookup)

details: Show internal details of the repo

args: Parameters to be passed to git status command

dgitcore.api.post(repo, args=[])

Post to the metadata server

Parameters:

repo: Repository object (result of lookup)
dgitcore.api.clone(url)

Clone a URL. Examples include:

Parameters:

url: URL of the repo
dgitcore.api.init(username, reponame, setup, force=False, options=None, noinput=False)

Initialize an empty repository with datapackage.json

Parameters:

username: Name of the user

reponame: Name of the repo

setup: The configuration to use (e.g., git only, git+s3 backend, etc.)

force: Force creation of the files

options: Dictionary with content of dgit.json, if available.

noinput: Automatic operation with no human interaction
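The sketch below shows what "an empty repository with datapackage.json" and the `force` flag could amount to at the file level, assuming a minimal Data Package descriptor that contains only `name` and an empty `resources` list. The helper `write_empty_datapackage` is hypothetical; the descriptor dgit actually writes may contain more fields.

```python
import json
import os


def write_empty_datapackage(repo_dir, reponame, force=False):
    """Hypothetical helper: create repo_dir/datapackage.json with no resources.

    Refuses to overwrite an existing file unless force=True, mirroring the
    documented meaning of `force`. Returns the path that was written.
    """
    path = os.path.join(repo_dir, "datapackage.json")
    if os.path.exists(path) and not force:
        raise FileExistsError(f"{path} already exists; pass force=True to overwrite")
    descriptor = {
        "name": reponame,  # assumption: the package is named after the repo
        "resources": [],   # an empty repository has no data files yet
    }
    os.makedirs(repo_dir, exist_ok=True)
    with open(path, "w") as fh:
        json.dump(descriptor, fh, indent=4)
    return path
```

Raising instead of silently skipping makes an accidental re-initialization visible, while `force=True` gives the explicit overwrite path.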

dgitcore.api.diff(repo, args=[])

Show the diff between versions

Parameters:

repo: Repository object

args: Arguments to git command

dgitcore.api.remote(repo, args=[])

Show the remotes configured for the repo

Parameters:

repo: Repository object

args: Arguments to git command

dgitcore.api.add(repo, args, targetdir, execute=False, generator=False, includes=[], script=False, source=None)

Add files to the repository, either by specifying them explicitly or by specifying a pattern over the files accessed while an executable runs.

Parameters:

repo: Repository object

args: Files or command line

(a) If simply adding files, the list of files to be added (including any additional arguments to be passed to git). (b) If the files to be added are the output of a command line, the command line itself.

targetdir: Target directory to store the files

execute: Args are not files to be added but scripts that must be run.

includes: Patterns used to select the files to be added

script: Is this a script?

generator: Is this a generator?

source: Link to the original source of the data
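When the files to add are picked from those accessed during execution, `includes` holds the selection patterns. One plausible reading, assuming shell-style glob patterns, can be sketched with the standard library's `fnmatch`; the helper `select_included` and the glob interpretation are assumptions, not dgitcore's documented behavior.

```python
from fnmatch import fnmatch


def select_included(accessed_files, includes):
    """Hypothetical helper: keep the accessed files that match any include pattern.

    Patterns are treated as shell-style globs (an assumption). The original
    order is preserved and repeated paths are kept only once.
    """
    selected = []
    seen = set()
    for path in accessed_files:
        if path in seen:
            continue
        if any(fnmatch(path, pattern) for pattern in includes):
            selected.append(path)
            seen.add(path)
    return selected
```

Note that in `fnmatch` a `*` also matches `/`, so `*.csv` selects CSV files in any subdirectory.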

dgitcore.api.validate(repo, validator_name=None, filename=None, rulesfiles=None, args=[])

Validate the content of the files for consistency. Validators can look as deeply as needed into the files. dgit treats them all as black boxes.

Parameters:

repo: Repository object

validator_name: Name of validator, if any. If none, then all validators specified in dgit.json will be included.

filename: Pattern that specifies files that must be processed by the validators selected. If none, then the default specification in dgit.json is used.

rulesfiles: Pattern specifying the files that have rules that validators will use

show: Print the validation results on the terminal

Returns:

status: A list of dictionaries, each with target file processed, rules file applied, status of the validation and any error message.
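`validate` returns a list of dictionaries, one per target file. The key names are not given here, so the sketch below assumes hypothetical keys `target`, `rules`, `status` and `message`, and an assumed success value of `"OK"`, to show how such a result might be summarized.

```python
def summarize_validation(results):
    """Summarize validate()-style results.

    Assumes each entry is a dict with the hypothetical keys "target", "rules",
    "status" and "message", and that status == "OK" means success (also an
    assumption). Returns (number_passed, list_of_failure_lines).
    """
    passed = 0
    failures = []
    for entry in results:
        if entry.get("status") == "OK":
            passed += 1
        else:
            # One human-readable line per failed target
            failures.append(
                f"{entry.get('target')} [{entry.get('rules')}]: {entry.get('message')}"
            )
    return passed, failures
```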

dgitcore.api.auto_init(autofile, force_init=False)

Initialize a repo-specific configuration file to execute dgit

Parameters:

autofile: Repo-specific configuration file (dgit.json)

force_init: Flag to force re-initialization of the configuration file

dgitcore.api.auto_get_repo(autooptions, debug=False)

Automatically get the repo described by the dgit.json options

Parameters:

autooptions: dgit.json content
dgitcore.api.transform(repo, name=None, filename=None, force=False, args=[])

Materialize queries/other content within the repo.

Parameters:

repo: Repository object

name: Name of transformer, if any. If none, then all transformers specified in dgit.json will be included.

filename: Pattern that specifies files that must be processed by the transformers selected. If none, then the default specification in dgit.json is used.

dgitcore.api.plugins_get_mgr()

Get the global plugin manager

dgitcore.api.plugins_load()

Load plugins from various sources:

  • dgit/plugins
  • dgit_extensions package
dgitcore.api.plugins_show(what=None, name=None, version=None, details=False)

Show details of available plugins

Parameters:

what: Class of plugins, e.g., backend

name: Name of the plugin, e.g., s3

version: Version of the plugin

details: Should details be shown?
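`plugins_show` narrows the listing by class (`what`), `name` and `version`. The sketch below shows that kind of filter over an in-memory registry; the registry layout (a list of dicts with `what`, `name` and `version` keys) and the helper name `filter_plugins` are hypothetical, and a filter left as `None` is assumed to match anything.

```python
def filter_plugins(registry, what=None, name=None, version=None):
    """Hypothetical helper: return registry entries matching every non-None filter."""
    criteria = {"what": what, "name": name, "version": version}
    return [
        plugin
        for plugin in registry
        # A None criterion is a wildcard; otherwise the field must match exactly
        if all(value is None or plugin.get(key) == value for key, value in criteria.items())
    ]
```

Treating `None` as a wildcard lets a single function serve the "show everything", "show one class" and "show one plugin" cases.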