split_dataset package

Submodules

split_dataset.blocks module

class split_dataset.blocks.BlockIterator(blocks, slices=True)[source]

Bases: object

class split_dataset.blocks.Blocks(shape_full: Tuple, shape_block: Optional[Tuple] = None, dim_split: Optional[int] = None, blocks_number: Optional[int] = None, padding: Union[int, Tuple] = 0, crop: Optional[Tuple] = None)[source]

Bases: object

Blocks have two indexing systems:
  • linear: a single integer identifying each block (from 0 to n_blocks - 1), in the order the blocks are enumerated.

  • cartesian: gives the position of the block in the general block tiling.
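
As a minimal sketch of the two systems (the shapes below are illustrative and chosen so that the stack tiles exactly into 2x2x3 blocks; expected values follow the method docstrings further down):

    from split_dataset.blocks import Blocks

    # Split a 20x20x30 stack into 10x10x10 blocks, i.e. a 2x2x3 tiling.
    blocks = Blocks(shape_full=(20, 20, 30), shape_block=(10, 10, 10))

    print(blocks.n_blocks)                        # expected: 12
    print(blocks.linear_to_cartesian(0))          # expected: (0, 0, 0), first block
    print(blocks.cartesian_to_linear((1, 1, 2)))  # expected: 11, last block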

block_containing_coords(coords)[source]

Find the linear index of a block containing the given coordinates

Parameters

coords – a tuple of the coordinates

Returns

linear index of the block containing the given coordinates (int)
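
For example, continuing the hypothetical 20x20x30 / 10x10x10 split from above:

    from split_dataset.blocks import Blocks

    blocks = Blocks(shape_full=(20, 20, 30), shape_block=(10, 10, 10))

    # Coordinates are passed as a tuple with one value per dimension.
    i_block = blocks.block_containing_coords((5, 12, 25))
    print(i_block)  # linear index of the block containing the point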

static block_to_slices(block)[source]
blocks_to_take(start_take, end_take)[source]

Find which blocks to take to cover the given range.

Parameters

  • start_take – starting points in the N dims (tuple)

  • end_take – ending points in the N dims (tuple)

Returns

tuple of tuples with the extremes of the blocks to take in the N dims; starting index of the data in the first block; ending index of the data in the last block.
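
A usage sketch, assuming the three values described above are returned together as a tuple:

    from split_dataset.blocks import Blocks

    blocks = Blocks(shape_full=(20, 20, 30), shape_block=(10, 10, 10))

    # Which blocks are needed to cover the region from (5, 5, 5) to (15, 15, 25)?
    block_extremes, start_in_first, end_in_last = blocks.blocks_to_take(
        (5, 5, 5), (15, 15, 25)
    )
    print(block_extremes)  # extremes of the blocks to take in each dimension
    print(start_in_first)  # where the requested data starts in the first block
    print(end_in_last)     # where the requested data ends in the last block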

cartesian_to_linear(ca_idx)[source]

Convert a block cartesian index into a linear index. Example: in a 3D stack split into 2x2x3 blocks:

bs.cartesian_to_linear((0, 0, 0)) = 0   # first block
bs.cartesian_to_linear((1, 1, 2)) = 11  # last block

Parameters

ca_idx – block cartesian index (tuple of ints)

Returns

block linear index (int)

centres()[source]
property crop
drop_dim(dim_to_drop)[source]

Return a new Blocks object with a dimension dropped, useful for getting spatial blocks from spatio-temporal blocks.

Parameters

dim_to_drop – dimension to be dropped (int)

Returns

new Blocks object
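
For instance, to obtain purely spatial blocks from a hypothetical spatio-temporal (t, z, y, x) split, one might drop the time dimension:

    from split_dataset.blocks import Blocks

    # Spatio-temporal split: (time, z, y, x); shapes are placeholders.
    st_blocks = Blocks(shape_full=(100, 20, 20, 30), shape_block=(10, 10, 10, 10))

    # Drop the time axis (dimension 0) to get a spatial-only Blocks object.
    spatial_blocks = st_blocks.drop_dim(0)
    print(spatial_blocks.n_dims)  # one dimension fewer than st_blocks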

linear_to_cartesian(lin_idx)[source]

Convert a block linear index into a cartesian index. Example: in a 3D stack split into 2x2x3 blocks:

bs.linear_to_cartesian(0) = (0, 0, 0)   # first block
bs.linear_to_cartesian(11) = (1, 1, 2)  # last block

Parameters

lin_idx – block linear index (int)

Returns

block cartesian index (tuple of ints)
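
The two conversions are inverses of each other; a short sketch with the 2x2x3 example from the docstrings, assuming cartesian indices are passed as tuples:

    from split_dataset.blocks import Blocks

    blocks = Blocks(shape_full=(20, 20, 30), shape_block=(10, 10, 10))  # 2x2x3 tiling

    assert blocks.linear_to_cartesian(0) == (0, 0, 0)   # first block
    assert blocks.cartesian_to_linear((1, 1, 2)) == 11  # last block

    # Round trip: linear -> cartesian -> linear.
    for i in range(blocks.n_blocks):
        assert blocks.cartesian_to_linear(blocks.linear_to_cartesian(i)) == i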

property n_blocks
property n_dims
neighbour_blocks(i_block, dims=None)[source]

Return the neighbouring blocks across the given dimensions.

Parameters

  • i_block – index of the block whose neighbours are searched for

  • dims – dimensions across which to look for neighbours

Returns

the neighbouring blocks
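
A usage sketch; the form in which the neighbours are returned is not specified above, so the example only prints the result, and the dims value shown is an assumption about its expected format:

    from split_dataset.blocks import Blocks

    blocks = Blocks(shape_full=(20, 20, 30), shape_block=(10, 10, 10))

    # Neighbours of block 0 across all dimensions.
    print(blocks.neighbour_blocks(0))

    # Neighbours of block 0 restricted to the last dimension (assumed format).
    print(blocks.neighbour_blocks(0, dims=(2,)))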

property padding
serialize()[source]

Return a dictionary with a complete description of the Blocks object, e.g. to save its structure as a JSON file.

Returns

dictionary describing the block structure
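
Since the description is a plain dictionary, it can be written out with the standard json module, e.g.:

    import json

    from split_dataset.blocks import Blocks

    blocks = Blocks(shape_full=(20, 20, 30), shape_block=(10, 10, 10))

    # Save the block structure so it can be inspected or reloaded later.
    with open("blocks.json", "w") as f:
        json.dump(blocks.serialize(), f, indent=2)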

property shape_block
property shape_full
slices(as_tuples=False)[source]
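
A sketch of the typical pattern: iterate over the per-block slices to process a NumPy array one chunk at a time. This assumes each element yielded with as_tuples=False can be used directly for NumPy indexing:

    import numpy as np

    from split_dataset.blocks import Blocks

    stack = np.random.rand(20, 20, 30)
    blocks = Blocks(shape_full=stack.shape, shape_block=(10, 10, 10))

    # Process the stack block by block.
    for block_slices in blocks.slices():
        chunk = stack[block_slices]
        print(chunk.shape)
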
update_block_structure()[source]

Update the Blocks structure, e.g. when the block shape or padding is changed.

update_stack_dims()[source]

Update the stack dimensions and cropping when shape_full or the crop is changed.

split_dataset.split_dataset module

class split_dataset.split_dataset.EmptySplitDataset(root, name, *args, resolution=None, **kwargs)[source]

Bases: split_dataset.blocks.Blocks

Class to initialize an empty split dataset whose metadata has to be saved after its blocks are filled.

finalize()[source]
save_block_data(n, data, verbose=False)[source]

Optional method to save data into a block. It is often not used, as data is usually saved directly in the parallelized function; it might be worth centralizing the saving here.

Parameters

  • n – index of the block to save into

  • data – data to be poured into the block

  • verbose –
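
A hedged sketch of the overall workflow, based only on the signatures above; the path and resolution are placeholders, and forwarding the Blocks constructor keywords through **kwargs is an assumption:

    import numpy as np

    from split_dataset.split_dataset import EmptySplitDataset

    # Create an empty split dataset on disk (path and shapes are placeholders;
    # shape keywords are assumed to be forwarded to the underlying Blocks).
    new_ds = EmptySplitDataset(
        "/tmp/datasets",
        "example",
        shape_full=(20, 20, 30),
        shape_block=(10, 10, 10),
        resolution=(1.0, 1.0, 1.0),
    )

    # Fill every block with (here, random) data, then write out the metadata.
    for i in range(new_ds.n_blocks):
        new_ds.save_block_data(i, np.random.rand(10, 10, 10))
    new_ds.finalize()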

class split_dataset.split_dataset.SplitDataset(root, prefix=None)[source]

Bases: split_dataset.blocks.Blocks

Manages datasets split over multiple HDF5 files across arbitrary dimensions. To do so, it uses the Blocks class functionality and defines each block as a file.

apply_crop(crop)[source]

Take out the data selected by the given crop.

as_dask()[source]

Create a Dask array from the split dataset.

Returns

Dask array
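
A usage sketch, assuming a split dataset has already been written to the folder below (the path is a placeholder):

    from split_dataset.split_dataset import SplitDataset

    ds = SplitDataset("/tmp/datasets/example")

    # Lazily assemble all blocks into a single Dask array for out-of-core work.
    darr = ds.as_dask()
    print(darr.shape)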

property data_key

Supports a smooth migration away from the stack_ND key in favour of the plain stack key.

split_dataset.split_dataset.save_to_split_dataset(data, root_name, block_size=None, crop=None, padding=0, prefix='', compression='blosc')[source]

Function to save a block of data into a split_dataset.
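
A minimal sketch based on the signature above, assuming data is the full array to be split into blocks of block_size (the path is a placeholder):

    import numpy as np

    from split_dataset.split_dataset import save_to_split_dataset

    data = np.random.rand(20, 20, 30)

    # Write the array to disk as a split dataset of 10x10x10 blocks.
    save_to_split_dataset(
        data,
        root_name="/tmp/datasets/example_saved",
        block_size=(10, 10, 10),
        padding=0,
        compression="blosc",
    )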

Module contents

Top-level package for Split Dataset.