Advanced Mapping

So far we’ve built our maps using the top-level mapping functions. These functions are useful for tutorials, but don’t give us the full flexibility that we might need when working with arbitrary Python functions. They’re also sometimes inconvenient to use, especially if you don’t like typing the names of your functions over and over. The tools described in this tutorial fix those problems.

Starmap

Back in Working With Files, we noted that htmap.map was only able to handle functions that took a single argument. To work with a function that took two arguments, we needed to use htmap.build_map to build up the map inside a loop.

Sometimes, you don’t want to loop. htmap.starmap provides the flexibility to completely specify the positional and keyword arguments for every component without needing an explicit for-loop.

Unfortunately, that looks like this:

[1]:
import htmap

def power(x, p = 1):
    return x ** p
[2]:
starmap = htmap.starmap(
    func = power,
    args = [
        (1,),
        (2,),
        (3,),
    ],
    kwargs = [
        {'p': 1},
        {'p': 2},
        {'p': 3},
    ],
)

print(list(starmap))  # [1, 4, 27]
Created map proper-short-stream with 3 components
[1, 4, 27]

A slightly more pleasant but less obvious way to construct the arguments would be like this:

[3]:
starmap = htmap.starmap(
    func = power,
    args = ((x,) for x in range(1, 4)),
    kwargs = ({'p': p} for p in range(1, 4)),
)

print(list(starmap))  # [1, 4, 27]
Created map light-soggy-idea with 3 components
[1, 4, 27]

But that isn’t really a huge improvement. Sometimes you’ll need the power and compactness of starmap, but we recommend htmap.build_map for general use.
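To make the pairing of args and kwargs concrete, here is a plain-Python sketch that runs locally and does not use HTMap at all. The helper local_starmap is a hypothetical name invented for this illustration; it only mimics how the i-th positional tuple and the i-th keyword dictionary are combined into one call per component, and it is not how HTMap is implemented.

```python
def power(x, p=1):
    return x ** p


def local_starmap(func, args, kwargs):
    """Hypothetical local stand-in: one call per component.

    The i-th tuple from args supplies the positional arguments and the
    i-th dict from kwargs supplies the keyword arguments.
    """
    return [func(*a, **kw) for a, kw in zip(args, kwargs)]


results = local_starmap(
    power,
    args=((x,) for x in range(1, 4)),
    kwargs=({'p': p} for p in range(1, 4)),
)

print(results)  # [1, 4, 27]
```

Each component is just func(*args_i, **kwargs_i), which is why args must be an iterable of tuples and kwargs an iterable of dictionaries.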

Mapped Functions

If you’re tired of typing htmap.map all the time, create a htmap.MappedFunction using the htmap.mapped decorator:

[4]:
@htmap.mapped
def double(x):
    return 2 * x

print(double)
MappedFunction(func = <function double at 0x7f750c0653b0>, map_options = {})

The resulting MappedFunction has methods that correspond to all the mapping functions, but with the function already filled in.

For example:

[5]:
doubled = double.map(range(10))

print(list(doubled))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Created map coy-burst-area with 10 components
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

The real utility of mapped functions is that they can carry default map options, which are inherited by any maps created from them. For example, if we know that a certain function will always need a large amount of memory and disk space, we can specify those requests once, on the function itself:

[6]:
@htmap.mapped(
    map_options = htmap.MapOptions(
        request_memory = '200MB',
        request_disk = '1GB',
    )
)
def big_list(_):
    big = list(range(1_000_000))  # imagine this is way bigger...
    return big

Now our request_memory and request_disk will be set for each map, without needing to specify them in the MapOptions of each individual map call. We can still override either setting for a particular map by manually passing htmap.MapOptions.

See htmap.MapOptions for some notes about how these inherited map options behave.
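As a rough mental model of that inheritance, the sketch below uses plain dictionaries instead of HTMap objects. The function effective_options is a hypothetical name, and the rule it applies (a per-map value simply replaces the default) is an assumption made for illustration; HTMap's actual merging rules may differ for some options, so treat the htmap.MapOptions notes as the authority.

```python
# Defaults attached to the mapped function (mirrors the big_list example).
default_options = {
    'request_memory': '200MB',
    'request_disk': '1GB',
}


def effective_options(defaults, overrides=None):
    """Hypothetical model: per-map overrides take precedence over defaults."""
    merged = dict(defaults)          # copy, so the defaults are never mutated
    merged.update(overrides or {})   # per-map settings replace matching keys
    return merged


# A map created with no extra options inherits everything.
print(effective_options(default_options))

# A map that passes its own request_memory overrides only that setting.
print(effective_options(default_options, {'request_memory': '500MB'}))
```

In this model the second call keeps the inherited request_disk while using the larger request_memory, which is the behavior described above for a map that passes its own options.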

Non-Primitive Function Arguments

So far we’ve mostly limited our mapped function arguments to Python primitives like integers or strings. However, HTMap will work with almost any Python object. For example, we can use a custom class as a function argument. Maybe we have some data on penguins, and we want to write a Penguin class to encapsulate that data:

[7]:
class Penguin:
    def __init__(self, name, height, weight):
        self.name = name
        self.height = height
        self.weight = weight

    def analyze(self):
        return f'{self.name} is {self.height} inches tall and weighs {self.weight} pounds'

    def eat(self):
        print('mmm, yummy fish')

    def fly(self):
        raise TypeError("penguins can't fly!")
[8]:
penguins = [
    Penguin('Gwendolin', height = 73, weight = 51),
    Penguin('Gweniffer', height = 59, weight = 43),
    Penguin('Gary', height = 64, weight = 52),
]
[9]:
map = htmap.map(
    lambda p: p.analyze(),  # an anonymous function; see https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions
    penguins,
    tag = 'penguin-stats',
)

map.wait(show_progress_bar = True)
penguin-stats:   0%|          | 0/3 [00:00<?, ?component/s]
Created map penguin-stats with 3 components
penguin-stats: 100%|##########| 3/3 [00:03<00:00,  1.00s/component]
[10]:
for stats in map:
    print(stats)
Gwendolin is 73 inches tall and weighs 51 pounds
Gweniffer is 59 inches tall and weighs 43 pounds
Gary is 64 inches tall and weighs 52 pounds

Specialized data structures like numpy arrays and pandas dataframes can also be used as function arguments. When in doubt, just try it!


In the next tutorial we’ll finally address the most important part of programming: what to do when things go wrong!