elektronn2.training.parallelisation module
class elektronn2.training.parallelisation.BackgroundProc(target, dtypes=None, shapes=None, n_proc=1, target_args=(), target_kwargs={}, profile=False)[source]

Bases: elektronn2.training.parallelisation.SharedMem
Data structure to manage repeated background tasks by reusing a fixed number of initially created background processes, which are called with the same arguments every time (e.g. retrieving an augmented batch). Remember to call BackgroundProc.shutdown after use to avoid zombie processes and RAM clutter.

Parameters:
- dtypes – list of dtypes of the target return values
- shapes – list of shapes of the target return values
- n_proc (int) – number of background procs to use
- target (callable) – target function for background proc. Can even be a method of an object, if object data is read-only (then data will not be copied in RAM and the new process is lean). If several procs use random modules, new seeds must be created inside target because they have the same random state at the beginning.
- target_args (tuple) – Proc args (constant)
- target_kwargs (dict) – Proc kwargs (constant)
- profile (Bool) – Whether to print timing results to stdout
Examples
Use case: retrieving batches from a data structure D:

>>> data, label = D.getbatch(2, strided=False, flip=True, grey_augment_channels=[0])
>>> kwargs = {'strided': False, 'flip': True, 'grey_augment_channels': [0]}
>>> bg = BackgroundProc(D.getbatch, dtypes=[np.float32, np.int16],
...                     shapes=[data.shape, label.shape], n_proc=2,
...                     target_args=(2,), target_kwargs=kwargs, profile=False)
>>> for i in range(100):
...     data, label = bg.get()
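The seeding caveat noted above for the target parameter matters in practice: forked workers inherit the parent's random state, so each target must create a fresh seed itself. A minimal sketch of that pattern with plain multiprocessing (fetch_batch and the pid-based seed are illustrative stand-ins, not part of elektronn2):

```python
import multiprocessing as mp
import os
import random

def fetch_batch(out_q):
    # Reseed inside the target: forked workers inherit the parent's RNG
    # state, so without this every proc would produce identical "random"
    # batches. The pid-based seed here is just one simple choice.
    rng = random.Random(os.getpid())
    out_q.put(rng.randrange(10**9))

ctx = mp.get_context('fork')  # background workers are forked child processes
queue = ctx.Queue()
procs = [ctx.Process(target=fetch_batch, args=(queue,)) for _ in range(2)]
for p in procs:
    p.start()
draws = [queue.get() for _ in procs]
for p in procs:
    p.join()
```

Because each worker seeds from its own pid, the two draws differ; without the reseed, both forks would emit the same sequence.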
class elektronn2.training.parallelisation.SharedQ[source]

Bases: elektronn2.training.parallelisation.SharedMem
FIFO queue to process np.ndarrays in the background (e.g. for pre-loading data from disk).

Procs must accept a list of mp.Array and make items np.ndarray using SharedQ.shm2ndarray; for this the shapes are required as well. The target requires the signature:

>>> target(mp_arrays, shapes, *args, **kwargs)

where mp_arrays and shapes are added internally automatically.
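This contract can be sketched as follows, assuming shm2ndarray simply views the shared buffer as an array without copying (the helper and target shown here are stand-ins, not the library implementation):

```python
import multiprocessing as mp
import numpy as np

def shm2ndarray(mp_arr, shape):
    # Stand-in for SharedQ.shm2ndarray: view the shared buffer as an
    # ndarray (no copy) and reshape it; this is why shapes must be passed.
    return np.frombuffer(mp_arr.get_obj(), dtype=np.float32).reshape(shape)

def target(mp_arrays, shapes):
    # The signature SharedQ requires; mp_arrays and shapes come first.
    for mp_arr, shape in zip(mp_arrays, shapes):
        arr = shm2ndarray(mp_arr, shape)
        arr[:] = 1.0  # write results directly into shared memory

shape = (2, 3)
buf = mp.Array('f', int(np.prod(shape)))  # 'f' = 32-bit float
target([buf], [shape])
result = shm2ndarray(buf, shape)
```

Writing through the ndarray view mutates the shared buffer in place, so the parent process sees the worker's output without any serialisation.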
All parameters are optional:
Parameters: - n_proc (int) – If larger than 0, a message is printed if too few processes are running
- profile (Bool) – Whether to print timing results in the terminal
Examples
Automatic use:

>>> Q = SharedQ(n_proc=2)
>>> Q.startproc(target=..., shapes=..., args=..., kwargs=...)
>>> Q.startproc(target=..., shapes=..., args=..., kwargs=...)
>>> for i in range(5):
...     Q.startproc(target=..., shapes=..., args=..., kwargs=...)
...     item = Q.get()  # starts as many new jobs as needed to maintain n_proc
...     dosomethingelse(item)  # procs work in background to pre-fetch data for the next iteration
get()[source]

This gets the first result in the queue and blocks until the corresponding proc has finished. If an n_proc value is defined, enough new procs must be started beforehand to avoid a warning message.
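The pre-fetch discipline described here, keeping n_proc jobs in flight and starting a replacement before each blocking get, can be sketched with plain multiprocessing (load_item is a hypothetical disk-loading job, not an elektronn2 function):

```python
import multiprocessing as mp

def load_item(i, out_q):
    # Hypothetical background job (e.g. loading and decoding one file).
    out_q.put(i * i)

ctx = mp.get_context('fork')
out_q = ctx.Queue()
n_proc = 2
# Start n_proc jobs up front, then start a replacement before each
# collection so there is always work in flight while we consume results.
procs = [ctx.Process(target=load_item, args=(i, out_q)) for i in range(n_proc)]
for p in procs:
    p.start()
results = []
for i in range(n_proc, 5):
    nxt = ctx.Process(target=load_item, args=(i, out_q))
    nxt.start()
    procs.append(nxt)
    results.append(out_q.get())  # blocks until some proc has finished
for _ in range(n_proc):
    results.append(out_q.get())  # drain the final in-flight jobs
for p in procs:
    p.join()
```

While the consumer processes one item, the replacement jobs run in the background, so the next get() usually returns without waiting.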
Starts a new process.

Procs must accept a list of mp.Array and make items np.ndarray using SharedQ.shm2ndarray; for this the shapes are required as well. The target requires the signature:

target(mp_arrays, shapes, *args, **kwargs)

where mp_arrays and shapes are added internally automatically.