Command line
The ds-format Python package provides a command ds for reading, writing and
modifying data files.
The command should be run in a shell such as Bash. zsh (the default shell on
macOS) is not well supported due to the fact that curly brackets ({, }) are
interpreted as special characters in this shell. zsh can be used if curly
brackets are escaped with a backslash character (\).
On Unix-like operating systems, manual pages are available for the commands
with man ds and man ds cmd. Note that you might have to add the manual
page path (usually $HOME/.local/share/man/) to the MANPATH environment
variable in order to access the manual pages.
Synopsis
ds
Tool for reading, writing and modifying dataset files.
Usage:
ds [cmd [args]] [options] [read_options] [write_options]
ds --help [cmd]
ds --version
The command line interface is based on the PST format. In all commands, variable, dimension and attribute names are interpreted as glob patterns, unless the -F option is enabled. Note that the pattern has to be enclosed in quotes in order to prevent the shell from interpreting the glob.
Arguments:
- cmd: Command to execute or show help for. If omitted,
dsis a shorthand for the commandls, with a difference that files with the same name as any available command cannot be listed. Available commands are listed below. - args: Command arguments and options.
Options:
-F: Interpret variable, dimension and attribute names as fixed strings instead of glob patterns.--help: Show this help message or help for a command if cmd is supplied.-j: Print command output as JSON instead of PST.-m: Moderate error handling mode. Report a warning on missing variables, dimensions and attributes. Overrides the DS_MODE environment variable.--noindent: Disable output indentation.-s: Strict error handling mode. Handle missing variables, dimensions and attributes as errors. Overrides the DS_MODE environment variable.-t: Soft error handling mode. Ignore missing variables, dimensions and attributes. Overrides the DS_MODE environment variable.-v: Be verbose. Print more detailed information and error messages.--version: Print the version number and exit.
Available commands:
attrs: Print attributes in a dataset.cat: Print variable data.dim: Print dimension size.dims: Print dimensions of a dataset or a variable.ls: List variables.merge: Merge files along a dimension.meta: Print dataset metadata.rename: Rename variables and attributes.rename_dim: Rename a dimension.rm: Remove variables or attributes.select: Select and subset variables.set: Set variable data, dimensions and attributes in an existing or new dataset.size: Print variable size.stats: Print variable statistics.type: Print variable type.
Supported input formats:
- CSV/TSV:
.csv,.tsv,.tab - DS:
.ds - HDF5:
.h5,.hdf5,.hdf - JSON:
.json - NetCDF4:
.nc,.nc4,.nc3,.netcdf
Supported output formats:
- CSV/TSV:
.csv,.tsv,.tab - DS:
.ds - HDF5:
.h5,.hdf5,.hdf - JSON:
.json - NetCDF4:
.nc,.nc4,.netcdf
Read options:
at:selector: At selector as{var:value …}or{var:{value…}…}, where value is the value of the variable var to select. The dimension indexes corresponding the variable are constrained so that a variable value closest to the value is selected.between:selector: Between selector as{var: {start end}…}, where start is the start value of the variable var, and end is the end value. The dimension indexes corresponding to the variable are constrained so that variable values in the range are selected. If the value isnone, the range start or end is unlimited. The range start is inclusive (closed), and the end is exclusive (open).range:selector: Range selector as{dim: {start end}…}, where start is the start index of the dimension dim, and end is the end index. If the index isnone, the range is from the start or to the end of the dimension, respectively. Negative index values are counted from the end of the dimension. The range start is inclusive (closed), and the end is exclusive (open).sel:selector: Selector as dim:idx pairs, where dim is a dimension name and idx is an index or a list of indexes as{i…}.
Write options:
calendar:value: CF-Conventions calendar to use for time variables when writing NetCDF4 and HDF5 files.time_units:value: CF-Conventions units to use for time variables when writing NetCDF4 and HDF5 files.
Environment variables:
- DS_MODE: Error handling mode. If “strict”, handle missing variables, dimensions and attributes as errors. If “moderate”, report a warning. If “soft”, ignore missing items.
Commands
| Command | Description |
|---|---|
| attrs | Print attributes in a dataset. |
| cat | Print variable data. |
| dim | Print dimension size. |
| dims | Print dimensions of a dataset or a variable. |
| ls | List variables. |
| merge | Merge files along a dimension. |
| meta | Print dataset metadata. |
| rename | Rename variables and attributes. |
| rename_dim | Rename a dimension. |
| rm | Remove variables or attributes. |
| select | Select and subset variables. |
| set | Set variable data, dimensions and attributes in an existing or new dataset. |
| size | Print a variable size. |
| stats | Print variable statistics. |
| type | Print a variable type. |
attrs
Print attributes in a dataset.
Usage: ds attrs [options] [var] [attr] [--] input
The output is formatted as PST.
Arguments:
- var: Variable name or
noneto print a dataset attribute attr. If omitted, print all dataset attributes. - attr: Attribute name.
- input: Input file.
- options: See help for ds for global options.
Examples:
Print dataset attributes in dataset.nc.
$ ds attrs dataset.nc
title: "Temperature data"
Print attributes of a variable temperature in dataset.nc.
$ ds attrs temperature dataset.nc
long_name: temperature units: celsius
Print a dataset attribute title.
$ ds attrs none title dataset.nc
"Temperature data"
Print an attribute units of a variable temperature.
$ ds attrs temperature units dataset.nc
celsius
cat
Print variable data.
Usage: ds cat [options] var… [--] input
Data are printed by the first index, one item per line, formatted as PST-formatted. If multiple variables are selected, items at a given index from all variables are printed on the same line as an array. The first line is a header containing a list of variables. Missing values are printed as empty rows (if printing one single dimensional variable) or as none.
Arguments:
- var: Variable name.
- input: Input file.
- options: See help for ds for global options.
Options:
-h: Print human-readable values (time as ISO 8601).--jd: Convert time variables to Julian date (see Aquarius Time).-n: Do not print header.
Examples:
Print temperature values in dataset.nc.
$ ds cat temperature dataset.nc
temperature
16.000000
18.000000
21.000000
Print time and temperature values in dataset.nc.
$ ds cat time temperature dataset.nc
time temperature
1 16.000000
2 18.000000
3 21.000000
dim
Print dimension size.
Usage: ds dim [options] dim [--] input
Arguments:
- dim: Dimension name.
- input: Input file.
- options: See help for ds for global options.
Examples:
Print size of the dimension time in dataset dataset.nc.
$ ds dim time dataset.nc
3
dims
Print dimensions of a dataset or a variable.
Usage: ds dims [options] [var] [--] input
Arguments:
- var: Variable to print dimensions of.
- input: Input file.
- options: See help for ds for global options.
Options:
-z,--size: If var is defined, print the size of dimensions as an object instead of an array of dimensions. The order is not guaranteed.
Examples:
Print dimensions of a dataset.
$ ds dims dataset.nc
time
Print dimensions of the variable temperature.
$ ds dims temperature dataset.nc
time
ls
List variables.
Usage: ds [ls] [options] [var]… [--] input
Lines in the output are formatted as PST.
Arguments:
- var: Variable name to list.
- input: Input file.
- options: See help for ds for global options.
Options:
-l: Print a detailed list of variables (name, type and an array of dimensions), preceded with a line with dataset dimensions.a:attrs: Print variable attributes after the variable name and dimensions. attrs can be a string or an array.
Examples:
Print a list of variables in dataset.nc.
$ ds ls dataset.nc
temperature
time
Print a detailed list of variables in dataset.nc.
$ ds ls -l dataset.nc
time: 3
temperature float64 { time }
time int64 { time }
Print a list of variables with an attribute units.
$ ds ls dataset.nc a: units
temperature celsius
time s
Print a list of variables with attributes long_name and units.
$ ds ls dataset.nc a: { long_name units }
temperature temperature celsius
time time s
Print all variables matching a glob “temp*” in dataset.nc.
$ ds ls 'temp*' dataset.nc
temperature
merge
Merge datasets along a dimension.
Usage: ds merge [options] dim [--] input… output
Merge datasets along a dimension dim. If the dimension is not defined in the dataset, merge along a new dimension dim. If new is none and dim is not new, variables without the dimension dim are set with the first occurrence of the variable. If new is not none and dim is not new, variables without the dimension dim are merged along a new dimension new. If variables is not none, only those variables are merged along a new dimension, and other variables are set to the first occurrence of the variable. Variables which are merged along a new dimension and are not present in all datasets have their subsets corresponding to the datasets where they are missing filled with missing values. Dataset and variable metadata are merged sequentially from all datasets, with metadata from later datasets overriding metadata from the former ones.
Arguments:
- dim: Name of a dimension to merge along.
- input: Input file.
- output: Output file.
- options: See help for ds for global options.
Options:
new:value: Name of a new dimension ornone.variables:{value…}|none: Variables to merge along a new dimension ornonefor all variables.jd:value: Iftrue, convert time to Julian date when merging time variables with unequal units. Iffalse, merge time variables as is. Default:true.
Examples:
Write example data to dataset1.nc.
$ ds set { time none time { 1 2 3 } long_name: time units: s } { temperature none time { 16. 18. 21. } long_name: temperature units: celsius } title: "Temperature data" none dataset1.nc
Write example data to dataset2.nc.
$ ds set { time none time { 4 5 6 } long_name: time units: s } { temperature none time { 23. 25. 28. } long_name: temperature units: celsius } title: "Temperature data" none dataset2.nc
Merge dataset1.nc and dataset2.nc and write the result to dataset.nc.
$ ds merge time dataset1.nc dataset2.nc dataset.nc
Print time and temperature variables in dataset.nc.
$ ds cat time temperature dataset.nc
time temperature
1 16.000000
2 18.000000
3 21.000000
4 23.000000
5 25.000000
6 28.000000
meta
Print dataset metadata.
Usage: ds meta [options] [var] [--] input
The output is formatted as PST.
Arguments:
- input: Input file.
- var: Variable name to print metadata for or “.” to print dataset metadata. If not specified, print metadata for the whole file.
- options: See help for ds for global options.
Examples:
Print metadata of dataset.nc.
$ ds meta dataset.nc
.: {{
title: "Temperature data"
}}
time: {{
long_name: time
units: s
.dims: { time }
.size: { 3 }
}}
temperature: {{
long_name: temperature
units: celsius
.dims: { time }
.size: { 3 }
}}
rename
Rename variables and attributes.
Usage:
ds [options] rename vars [--] input output
ds [options] rename var attrs [--] input output
ds [options] rename { var attrs }… [--] input output
Arguments:
- var: Variable name, or an array of variable names whose attributes to rename, or
noneto rename dataset attributes. - vars: Pairs of old and new variable names as oldvar
:newvar. If newvar isnone, remove the variable. - attrs: Pairs of old and new attribute names as oldattr
:newattr. If newattr isnone, remove the attribute. - input: Input file.
- output: Output file.
- options: See help for ds for global options. Note that with this command options can only be supplied before the command name or at the end of the command line.
Examples:
Rename variables time to newtime and temperature to newtemperature in dataset.nc and save the output in output.nc.
$ ds rename time: newtime temperature: newtemperature dataset.nc output.nc
Rename a dataset attribute title to newtitle in dataset.nc and save the output in output.nc.
$ ds rename none title: newtitle dataset.nc output.nc
Rename an attribute units of a variable temperature to newunits in dataset.nc and save the output in output.nc.
$ ds rename temperature units: newunits dataset.nc output.nc
rename_dim
Rename a dimension.
Usage:
ds [options] rename_dim dims [--] input output
Arguments:
- dims: Pairs of old and new dimension names as olddim
:newdim. - input: Input file.
- output: Output file.
- options: See help for ds for global options. Note that with this command options can only be supplied before the command name or at the end of the command line.
Examples:
Rename dimension time to newtime in dataset.nc and save the output in output.nc.
$ ds -l dataset.nc
time: 3
temperature
time
$ ds rename_dim time: newtime dataset.nc output.nc
$ ds -l output.nc
newtime: 3
temperature
time
rm
Remove variables or attributes.
Usage:
ds rm [options] var [--] input output
ds rm [options] var attr [--] input output
Arguments:
- var: Variable name, an array of variable names or
noneto remove a dataset attribute. - attr: Attribute name or an array of attribute names.
- input: Input file.
- output: Output file.
- options: See help for ds for global options.
Examples:
Remove a variable temperature in dataset.nc and save the output in output.nc.
$ ds rm temperature dataset.nc output.nc
Remove variables time and temperature in dataset.nc and save the output in output.nc.
$ ds rm { time temperature } dataset.nc output.nc
Remove a dataset attribute title in dataset.nc and save the output in output.nc.
$ ds rm none title dataset.nc output.nc
Remove an attribute units of a variable temperature in dataset.nc and save the output in output.nc.
$ ds rm temperature units dataset.nc output.nc
select
Select and subset variables.
Usage: ds [options] select [var…] [sel] [--] input output
select can also be used to convert between different file formats (ds select input output).
Arguments:
- var: Variable name.
- sel: The same as the
selread option (see help for ds). - input: Input file.
- output: Output file.
- options: See help for ds for global options. Note that with this command options can only be supplied before the command name or at the end of the command line.
Examples:
Write data to dataset.nc.
$ ds set { time none time { 1 2 3 } long_name: time units: s } { temperature none time { 16. 18. 21. } long_name: temperature units: celsius } title: "Temperature data" none dataset.nc
List variables in dataset.nc.
$ ds dataset.nc
temperature
time
Select variable temperature from dataset.nc and write to temperature.nc.
$ ds select temperature dataset.nc temperature.nc
List variables in temperature.nc.
$ ds temperature.nc
temperature
Subset by time index 0 and write to 0.nc.
$ ds select time: 0 dataset.nc 0.nc
Print variables time and temperature in 0.nc.
$ ds cat time temperature 0.nc
time temperature
1 16.000000
Convert dataset.nc to JSON.
$ ds select dataset.nc dataset.json
$ cat dataset.json
{"time": [1, 2, 3], "temperature": [16.0, 18.0, 21.0], ".": {".": {"title": "Temperature data"}, "time": {"long_name": "time", "units": "s", ".dims": ["time"], ".size": [3]}, "temperature": {"long_name": "temperature", "units": "celsius ".dims": ["time"], ".size": [3]}}}
set
Set variable data, dimensions and attributes in an existing or new dataset.
Usage:
ds [options] set ds_attrs [--] input output
ds [options] set var [type [dims [data]]] [attrs]… [--] input output
ds [options] set { var [type [dims [data]]] [attrs]… }… ds_attrs [--] input output
Arguments:
- var: Variable name.
- type: Variable type (
str), ornoneto keep the original type if data is not supplied or autodetect based on data if data is supplied. One of:float32andfloat64(32-bit and 64-bit floating-point number, resp.),int8,int16,int32andint64(8-bit, 16-bit, 32-bit and 64-bit integer, resp.),uint8,uint16,uint32anduint64(8-bit, 16-bit, 32-bit and 64-bit unsigned integer, resp.),bool(boolean),str(string) andunicode(Unicode). - dims: Variable dimension name (if single), an array of variable dimensions (if multiple),
noneto keep original dimension or autogenerate if a new variable, or{ }to autogenerate new dimension names. - data: Variable data. This can be a PST-formatted scalar or an array.
nonevalues are interpreted as missing values. - attrs: Variable attributes or dataset attributes if var is
noneas attr:value pairs. - ds_attrs: Dataset attributes as attr
:value pairs. - input: Input file or
nonefor a new file to be created. - output: Output file.
- options: See help for ds for global options. Note that with this command options can only be supplied before the command name or at the end of the command line.
Examples:
Write variables time and temperature to dataset.nc.
$ ds set { time none time { 1 2 3 } long_name: time units: s } { temperature none time { 16. 18. 21. } long_name: temperature units: celsius } title: "Temperature data" none dataset.nc
Set data of a variable temperature to an array of 16.0, 18.0, 21.0 in dataset.nc and save the output in output.nc.
$ ds set temperature none none { 16. 18. 21. } dataset.nc output.nc
Set a dimension of a variable temperature to time, data to an array of 16.0, 18.0, 21.0, its attribute long_name to “temperature” and units to “celsius” in dataset.nc and save the output in output.nc.
$ ds set temperature none time { 16. 18. 21. } long_name: temperature units: celsius dataset.nc output.nc
Set multiple variables in dataset.nc and save the output in output.nc.
$ ds set { time none time { 1 2 3 } long_name: time units: s } { temperature none time { 16. 18. 21. } long_name: temperature units: celsius } title: "Temperature data" dataset.nc output.nc
Set a dataset attribute newtitle to New title in dataset.nc and save the output in output.nc.
$ ds set newtitle: "New title" dataset.nc output.nc
Set an attribute newunits of a variable temperature to K in dataset.nc and save the output in output.nc.
$ ds set temperature newunits: K dataset.nc output.nc
size
Print a variable size.
Usage: ds size [options] var [–] input
Arguments:
- var: Variable to print the size of.
- input: Input file.
- options: See help for ds for global options.
Examples:
Print the size of a variable temperature in a dataset dataset.nc.
$ ds size temperature dataset.nc
3
stats
Print variable statistics.
Usage: ds stats [options] var [--] input
NaNs are ignored in all statistics except for count. The output is formatted as PST.
Arguments:
- var: Variable name.
- input: Input file.
- options: See help for ds for global options.
Output description:
count: Number of array elements.max: Maximum value.mean: Sample mean.median: Sample median.min: Minimum value.std: Standard deviation.p68: 68% confidence interval calculated using percentiles.p95: 95% confidence interval calculated using percentiles.p99: 99% confidence interval calculated using percentiles.
Examples:
Print statistics of variable temperature in dataset.nc.
$ ds stats temperature dataset.nc
count: 3 min: 16.000000 max: 21.000000 mean: 18.333333 median: 18.000000 std: 2.054805 p68: { 16.640000 20.040000 } p95: { 16.100000 20.850000 } p99: { 16.020000 20.970000 }
type
Print a variable type.
Usage: ds type [options] var [--] input
Arguments:
- var: Variable to print the type of.
- input: Input file.
- options: See help for ds for global options.
Examples:
Print the type of a variable temperature in a dataset dataset.nc.
$ ds type temperature dataset.nc
float64