Parallel processing with aa
aa can run multiple parts of your analysis in parallel. It uses coarse-grained parallelism: different instances of modules execute simultaneously, but there is no attempt to subdivide single modules. In the best case, a multiple-subject analysis speeds up by a factor equal to the number of subjects, or more. The precise speed increase depends on the number of jobs you are allocated, which is determined by the memory, processor and Matlab license load on the Linux system.
Each module executes on part of the data, as specified by its “domain”. So, for example, a module with the domain “subject” is run independently and in parallel on each subject. A module with the domain “session” is run separately on each session.
In early 2013 the parallel engine was substantially rewritten. Previously, domains were restricted to “study”, “subject” and “session”. Following the rewrite, new domains can be added fairly easily with a little coding. The domain structure is defined by a tree stored in the XML file aap_parameters_defaults.xml, and the properties of each domain are defined through extensions to two files (aas_getN_bydomain, aas_getdirectory_bydomain). Extensions implemented so far support hyperalignment searchlights and split-session cross-validation folds.
aa has been used with multiple parallel engines, including Condor (native implementation) and Torque (using Fieldtrip’s Matlab qsub command). To choose one, add one of the following lines to your User Script:
    aap.options.wheretoprocess='localsingle'; % single threaded running on local machine
    aap.options.wheretoprocess='matlab_pct';  % use matlab's parallel computing toolbox
    aap.options.wheretoprocess='condor';      % use Condor cluster system
    aap.options.wheretoprocess='qsub';        % uses CBSU's code
    aap.options.wheretoprocess='aws';         % run on Amazon Web Services (enquire if you're interested)
Multiple modules are run simultaneously where possible; within a module, there is no parallel execution. Part of the aa module definition specifies whether a module is run once per study, once per subject, or once per session. This affects parallel scheduling as shown in the table.
| Domain | When run in parallel | Benefit |
|---|---|---|
| Session | Always | Any time there are multiple sessions |
| Subject | When multiple subjects are being processed | Any time there are multiple subjects |
| Study | If multiple study-level stages are marked as executing simultaneously | Not in standard recipes at present |
Most processing stages wait for the previous stage to complete before executing. However, some stages can execute before this. For example, realignment and tsdiffana can both execute together as soon as the dicom-to-nifti conversion of the EPIs is complete.
In aa version 4, the order of parallel processing, and which items may execute simultaneously, is calculated using the data streams. Where one module takes data from another, it must wait for it to complete. Otherwise, there is no interaction, and no need for it to wait.
You do not need to specify “tobecompletedfirst” fields in the XML or your user script, as each module automatically connects to the previous one that exported the relevant data type. Where you wish one set of code to execute before another, we recommend the use of branches in the XML (see “branched analyses”).
However, if you want different input streams of a module to come from different earlier modules (say, to compare the EPIs before and after realignment), you can qualify the stream names with the name of the source module. Instead of
    <inputstreams>
        <stream><name>epi</name></stream>
    </inputstreams>
use lines like this:
    <inputstreams>
        <stream><name>aamod_realign.epi</name></stream>
    </inputstreams>
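The stream-matching rule described above can be illustrated with a small stand-alone Matlab sketch. This is not aa's actual implementation: the stage list, the simplified stream lists, the module name my_compare and the variable waits_for are all assumptions made for illustration. Each input stream is connected to the most recent earlier stage that outputs it, and a qualified name such as aamod_convert_epis.epi restricts the match to the named stage.

```matlab
% Illustrative stage list. aamod_convert_epis, aamod_tsdiffana and
% aamod_realign are aa modules, but the stream lists here are simplified
% assumptions; my_compare is a hypothetical user module.
stages = struct( ...
    'name',    {'aamod_convert_epis', 'aamod_tsdiffana', 'aamod_realign', 'my_compare'}, ...
    'inputs',  {{}, {'epi'}, {'epi'}, {'aamod_convert_epis.epi', 'epi'}}, ...
    'outputs', {{'epi'}, {}, {'epi'}, {}});

% waits_for{k} will list the stages that stage k must wait for
waits_for = repmat({{}}, 1, numel(stages));

for k = 1:numel(stages)
    for s = 1:numel(stages(k).inputs)
        % Split an optional "module.stream" qualifier from the stream name
        parts = strsplit(stages(k).inputs{s}, '.');
        for p = k-1:-1:1   % search backwards: the most recent provider wins
            provides = any(strcmp(parts{end}, stages(p).outputs));
            named_ok = numel(parts) == 1 || strcmp(parts{1}, stages(p).name);
            if provides && named_ok
                waits_for{k}{end+1} = stages(p).name;
                break;
            end
        end
    end
end

for k = 1:numel(stages)
    fprintf('%s waits for: %s\n', stages(k).name, strjoin(waits_for{k}, ', '));
end
```

In this sketch aamod_tsdiffana and aamod_realign both wait only for aamod_convert_epis, so they could run side by side, while my_compare waits for both aamod_convert_epis (through the qualified name) and aamod_realign (through the unqualified one).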
Again, tasks with the same dependencies can be run in parallel. More importantly, the exact form of the dependency depends on the domain of each of the stages:
- If a stage is executed once-per-study, it will wait for all subjects/sessions from the stage it is dependent on to complete.
- If it is executed once-per-subject, each subject will be executed as soon as all of that subject's sessions are complete in the stage it is dependent on.
- If it is executed once-per-session, it will execute as soon as the session is completed from the stage it is dependent on.
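The three rules above can be written out as a minimal Matlab sketch. It assumes the upstream stage runs once per session and tracks completion in a logical matrix; the names done, study_ready, subject_ready and session_ready are hypothetical and are not part of aa.

```matlab
% done(i,j) is true once the upstream session-level job for subject i,
% session j has finished. The values are made up for illustration.
done = logical([1 1; 1 0]);   % subject 2, session 2 is still running

% One readiness test per domain of the dependent stage
study_ready   = @(done) all(done(:));                  % every subject and session
subject_ready = @(done, subj) all(done(subj, :));      % all sessions of one subject
session_ready = @(done, subj, sess) done(subj, sess);  % only the matching session

r_study = study_ready(done);          % false: one session is unfinished
r_subj1 = subject_ready(done, 1);     % true:  both sessions of subject 1 are done
r_subj2 = subject_ready(done, 2);     % false: subject 2 is missing session 2
r_sess  = session_ready(done, 2, 1);  % true:  subject 2, session 1 is done
```

The finer the domain of the dependent stage, the sooner its first jobs can start: here a session-level job for subject 2 can already run, while subject-level and study-level jobs must still wait.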
You will get the best performance if your worker jobs are distributed across machines, and if those machines have low load. If you already have many SPM jobs open and a limited number of Matlab licenses (one is needed for each machine you are using), your workers will be restricted to the machines you are already using. This makes it more likely that your workers will be allocated to the same machine, and that this machine will not be the least loaded one available. In general, you will get better performance if you clear out your old jobs with the following command before starting SPM to run a parallel job:
You may wrap up your own code as an aa module, which has a low overhead (approximately 10 additional lines). aa will then happily schedule it to run in parallel.
When writing modules, it is good practice to make them execute at the session level rather than the subject level where possible, as this allows greater parallelism. For this reason, aamod_smooth and aamod_normwrite have been modified to run once per session rather than once per subject.