WIP Alert: This is a work in progress. The current information is correct, but more content may be added in the future.
Tested under Python 3.x
The Python joblib.Parallel construct is a handy tool for spreading computation across multiple cores.
It is useful when you need to loop over a large iterable (a list, a pandas DataFrame, etc.) and the task you run on each item is CPU-intensive.
In rough terms, it spawns multiple Python processes and handles each part of the iterable in a separate process. Then it joins everything at the end.
Simplest possible example
from math import sqrt
from joblib import Parallel, delayed

# single-core code
sqroots_1 = [sqrt(i ** 2) for i in range(10)]

# parallel code
sqroots_2 = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
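The same pattern also works when the function takes more than one argument: the arguments are passed to the object returned by delayed, not to the function itself. Below is a minimal sketch; the function `power` and its `exponent` parameter are hypothetical names made up for illustration.

```python
from joblib import Parallel, delayed


def power(base, exponent=2):
    # Hypothetical helper: raise `base` to `exponent`
    return base ** exponent


# delayed(power) wraps the function; the call that follows only records
# the arguments, and Parallel runs the actual calls in the workers.
cubes = Parallel(n_jobs=2)(delayed(power)(i, exponent=3) for i in range(5))

# Results come back as a list, in the same order as the input iterable
print(cubes)  # [0, 1, 8, 27, 64]
```

Note that the results keep the order of the input, even though the individual calls may finish in a different order.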
A more complex example (process a large XML file)
The function must return a value
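Why the return value matters can be sketched as follows. Since the work is done in separate processes, changes a function makes to a global object are not expected to show up in the main process; only what the function returns is sent back. The names `collected`, `square_and_store` and `square` are hypothetical, and the sketch assumes joblib's default process-based backend.

```python
from joblib import Parallel, delayed

collected = []  # hypothetical global, used only for this illustration


def square_and_store(x):
    # Appends to the copy of `collected` that lives in the worker process
    # and returns nothing
    collected.append(x * x)


def square(x):
    # Returns the result, so joblib can send it back to the main process
    return x * x


lost = Parallel(n_jobs=2)(delayed(square_and_store)(i) for i in range(5))
kept = Parallel(n_jobs=2)(delayed(square)(i) for i in range(5))

print(lost)  # [None, None, None, None, None]
print(kept)  # [0, 1, 4, 9, 16]

# With a process-based backend, the list in the main process
# is expected to stay empty
print(collected)
```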
To adapt the simplest example to any function of your own, define the function and pass its name to delayed:
# Python XML Processing Module
import xml.etree.ElementTree as ET
from joblib import Parallel, delayed

FILE = 'path/to/your/file'

tree = ET.parse(FILE)
dataset = tree.getroot()

def process_node(xml_node):
    # extract some information from
    # the xml node
    return 'node information'

# n_jobs=-1 means: use all available cores
element_information = Parallel(n_jobs=-1)(delayed(process_node)(node) for node in dataset)
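To try the idea without a file on disk, here is a small self-contained sketch that parses an XML string instead. The XML content, the `book` elements and the body of `process_node` are all made up for illustration; a real `process_node` would extract whatever your data requires.

```python
import xml.etree.ElementTree as ET
from joblib import Parallel, delayed

# Hypothetical sample data standing in for a large XML file
XML = """<library>
  <book id="1"><title>Dune</title></book>
  <book id="2"><title>Emma</title></book>
</library>"""


def process_node(xml_node):
    # Return the id attribute and the text of the <title> child
    return xml_node.get("id"), xml_node.findtext("title")


dataset = ET.fromstring(XML)  # root element; iterating yields its children

element_information = Parallel(n_jobs=2)(
    delayed(process_node)(node) for node in dataset
)

print(element_information)  # [('1', 'Dune'), ('2', 'Emma')]
```

Each node is sent to a worker process, so for very small nodes the transfer cost may outweigh the gain from running in parallel.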