Dataset API
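Example:
import pandas as pd
from dgitcore import api

# Load/upload profile, load plugins etc.
api.initialize()

# Look up the dataset repository by username and repository name
repo = api.datasets.lookup('pingali', 'simple-regression-rawdata')

# Resolve a resource in the repo and read it from its local path
r = repo.get_resource('demo-input.csv')
df = pd.read_csv(r['localfullpath'])
...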
-
dgitcore.api.shellcmd(repo, args)
Run a shell command within the repo’s context.
Parameters:
    repo: Repository object
    args: Shell command
-
dgitcore.api.log(repo, args=[])
Log of the changes executed until now.
Parameters:
    repo: Repository object
    args: Arguments to git command
-
dgitcore.api.show(repo, args=[])
Show commit details.
Parameters:
    repo: Repository object
    args: Arguments to git command
-
dgitcore.api.push(repo, args=[])
Push changes to the backend.
Parameters:
    repo: Repository object
    args: Arguments to git command
-
dgitcore.api.pull(repo, args=[])
Pull changes from the backend.
Parameters:
    repo: Repository object
    args: Arguments to git command
-
dgitcore.api.commit(repo, args=[])
Commit changes to the data repository.
Parameters:
    repo: Repository object
    args: Arguments to git command
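
As a rough sketch of how the git-style calls compose, the helper below commits and then pushes a repository. It takes the api module as a parameter so that it can be exercised without a configured dgit profile. The helper name commit_and_push is hypothetical, and passing args as a list of strings forwarded to the underlying git command is an assumption that this reference does not confirm.

```python
def commit_and_push(api, repo, message):
    """Commit pending changes in `repo`, then push them to the backend.

    `api` is expected to expose commit(repo, args) and push(repo, args)
    as documented in this reference. Passing git flags as a list of
    strings is an assumption made for this sketch.
    """
    # Assumed to be forwarded as `git commit -a -m <message>`
    api.commit(repo, ["-a", "-m", message])
    # No extra arguments for the push
    api.push(repo, [])
```

With a real profile, this would be called as commit_and_push(api, repo, 'Update input data') after importing api from dgitcore, calling api.initialize(), and looking up the repo.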
-
dgitcore.api.stash(repo, args=[])
Stash the changes.
Parameters:
    repo: Repository object
    args: Arguments to git command
-
dgitcore.api.drop(repo, args=[])
Drop the repository (new to dgit).
Parameters:
    repo: Repository object
    args: Arguments to git command
-
dgitcore.api.status(repo, args=[])
Show status of the repo.
Parameters:
    repo: Repository object (result of lookup)
    details: Show internal details of the repo
    args: Parameters to be passed to git status command
-
dgitcore.api.post(repo, args=[])
Post to metadata server.
Parameters:
    repo: Repository object (result of lookup)
-
dgitcore.api.clone(url)
Clone a URL. Examples include:
- git@github.com:pingali/dgit.git
- https://github.com/pingali/dgit.git
- s3://mybucket/git/pingali/dgit.git
Parameters:
    url: URL of the repo
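
The three URL forms above differ in transport. The sketch below only illustrates how they can be told apart using the standard library; it is not how dgit dispatches clone requests, and the function name clone_transport is hypothetical.

```python
from urllib.parse import urlparse


def clone_transport(url):
    """Return a rough label for the transport implied by a clone URL."""
    # scp-style git URLs (git@host:path) carry no scheme, so check them first
    if url.startswith("git@"):
        return "ssh"
    scheme = urlparse(url).scheme
    if scheme in ("http", "https", "s3"):
        return scheme
    return "unknown"
```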
-
dgitcore.api.init(username, reponame, setup, force=False, options=None, noinput=False)
Initialize an empty repository with datapackage.json.
Parameters:
    username: Name of the user
    reponame: Name of the repo
    setup: Specify the ‘configuration’ (git only, git+s3 backend etc)
    force: Force creation of the files
    options: Dictionary with content of dgit.json, if available.
    noinput: Automatic operation with no human interaction
-
dgitcore.api.diff(repo, args=[])
Diff between versions.
Parameters:
    repo: Repository object
    args: Arguments to git command
-
dgitcore.api.remote(repo, args=[])
Show remote.
Parameters:
    repo: Repository object
    args: Arguments to git command
-
dgitcore.api.add(repo, args, targetdir, execute=False, generator=False, includes=[], script=False, source=None)
Add files to the repository, either by specifying them explicitly or by specifying a pattern over the files accessed during execution of an executable.
Parameters:
    repo: Repository
    args: Files or command line.
        (a) If simply adding files, the list of files that must be added (including any additional arguments to be passed to git).
        (b) If the files to be added are the output of a command line, then args is the command line.
    targetdir: Target directory to store the files
    execute: Args are not files to be added but scripts that must be run.
    includes: patterns used to select files to
    script: Is this a script?
    generator: Is this a generator?
    source: Link to the original source of the data
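
The two modes of args described above can be sketched as a pair of thin wrappers. Both take the api module as a parameter so they can be tried against a stub. The wrapper names are hypothetical, and passing the command line as a list of strings is an assumption; the reference does not state the expected type.

```python
def add_files(api, repo, files, targetdir):
    """Mode (a): add an explicit list of files to the repository."""
    return api.add(repo, files, targetdir)


def add_from_command(api, repo, command, targetdir, includes):
    """Mode (b): treat `command` as a command line to execute.

    `includes` holds the patterns used to select among the files
    accessed during execution. Passing `command` as a list of
    strings is an assumption of this sketch.
    """
    return api.add(repo, command, targetdir, execute=True, includes=includes)
```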
-
dgitcore.api.validate(repo, validator_name=None, filename=None, rulesfiles=None, args=[])
Validate the content of the files for consistency. Validators can look as deeply as needed into the files; dgit treats them all as black boxes.
Parameters:
    repo: Repository object
    validator_name: Name of the validator, if any. If None, all validators specified in dgit.json are included.
    filename: Pattern specifying the files that the selected validators must process. If None, the default specification in dgit.json is used.
    rules: Pattern specifying the files that contain the rules the validators will use
    show: Print the validation results on the terminal
Returns:
    status: A list of dictionaries, each with the target file processed, the rules file applied, the status of the validation, and any error message.
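
Because validate returns a list of dictionaries, a caller will often want a quick tally of outcomes. The sketch below counts entries by status. The key name "status" and the helper name count_by_status are assumptions, since this reference describes the fields but does not name the dictionary keys.

```python
from collections import Counter


def count_by_status(results, status_key="status"):
    """Count validation result entries by their status value.

    `status_key` is an assumed key name; adjust it to match the
    dictionaries actually returned by validate.
    """
    # Entries without the key are grouped under "unknown"
    return Counter(entry.get(status_key, "unknown") for entry in results)
```

A call such as count_by_status(api.validate(repo)) would then give a per-status count, assuming the key name matches.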
-
dgitcore.api.auto_init(autofile, force_init=False)
Initialize a repo-specific configuration file to execute dgit.
Parameters:
    autofile: Repo-specific configuration file (dgit.json)
    force_init: Flag to force re-initialization of the configuration file
-
dgitcore.api.auto_get_repo(autooptions, debug=False)
Automatically get repo.
Parameters:
    autooptions: dgit.json content
-
dgitcore.api.transform(repo, name=None, filename=None, force=False, args=[])
Materialize queries/other content within the repo.
Parameters:
    repo: Repository object
    name: Name of transformer, if any. If none, then all transformers specified in dgit.json will be included.
    filename: Pattern that specifies files that must be processed by the generators selected. If none, then the default specification in dgit.json is used.