Basic Mapping
Tags
In the previous tutorial, we used HTMap like this:
[1]:
import htmap
def double(x):
    return 2 * x
[2]:
mapped = htmap.map(double, range(10))
print(mapped)
doubled = list(mapped)
print(doubled)
Created map dark-puny-robe with 10 components
Map(tag = dark-puny-robe)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
In particular, we used the htmap.map function to create our map. This function creates an object that behaves a lot like the iterator returned by the built-in map function. To get our output, we iterated over it using list.
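For comparison, here is the same computation with Python's built-in map, which runs locally in the current process. This is only a sketch of the analogy; the variable names builtin_mapped and builtin_doubled are made up for this illustration, and nothing here touches HTMap.

```python
def double(x):
    return 2 * x


# The built-in map returns a lazy iterator: nothing is computed
# until we consume it, here by passing it to list.
builtin_mapped = map(double, range(10))
builtin_doubled = list(builtin_mapped)

print(builtin_doubled)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The result matches the list produced by the HTMap version above.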
You may have noticed that the map has a tag associated with it. HTMap generated this tag because we didn’t provide one, and for the same reason it marked the map as transient, as opposed to persistent. Transient maps are for quick tests where we don’t care too much about organization. Persistent maps are for longer-running work that we want to keep organized by giving things real names. If you don’t plan on using your map for more than one session, you can probably get away with a transient map. If you’re going to step away from the computer and come back, we recommend giving it a real tag.
The map we created above is transient:
[3]:
print(mapped.is_transient)
True
To create a persistent map, we need to give our map a tag:
[4]:
another_map = htmap.map(double, range(10), tag = 'dbl')
print(another_map)
print(another_map.is_transient)
Created map dbl with 10 components
Map(tag = dbl)
False
We can also “retag” a map to give it a new tag. If you tag a transient map, it becomes persistent.
[5]:
mapped.retag('a-new-tag')
print(mapped)
print(mapped.is_transient)
Map(tag = a-new-tag)
False
Working with Maps
The object returned by htmap.map is an htmap.Map. It gives us a window into the map while it is running, and lets us use the output once the map is finished.
For example, we can print the status of the map:
[6]:
stringified = htmap.map(str, range(10), tag = 'str')
print(stringified.status())
Created map str with 10 components
Map str (10 components): HELD = 0 | ERRORED = 0 | IDLE = 10 | RUNNING = 0 | COMPLETED = 0
We can wait for the map to finish:
[7]:
stringified.wait(show_progress_bar = True)
str: 100%|##########| 10/10 [00:09<00:00, 1.11component/s]
There are many ways to iterate over maps:
[8]:
print(list(stringified))
for d in stringified:
    print(d)
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
0
1
2
3
4
5
6
7
8
9
If we ever lose our reference to a map, we can get a new one using htmap.load, giving it the tag of the map we want:
[9]:
new_ref = htmap.load('str')
print(new_ref)
print(new_ref == stringified)
print(new_ref is stringified) # maps are singletons
Map(tag = str)
True
True
Maps can be recovered from an entirely different Python interpreter session as well. Suppose you close Python and go on vacation. You come back and you want to look at your map again, but you’ve forgotten what you called it. Just ask HTMap for a list of your tags:
[10]:
print(htmap.get_tags())
('dbl', 'str', 'a-new-tag')
Ok, well, technically it was a tuple, but we’ll have to live with it.
HTMap can also print a pretty table showing the status of your maps:
[11]:
htmap.map(str, range(5)) # new transient map
print(htmap.status())
Created map breezy-happy-hand with 5 components
Tag                  HELD  ERRORED  IDLE  RUNNING  COMPLETED  Local Data  Max Memory  Max Runtime  Total Runtime
a-new-tag               0        0     0        0         10     63.9 KB     41.0 MB      0:00:00        0:00:00
dbl                     0        0     0        0         10     63.9 KB     41.0 MB      0:00:00        0:00:00
str                     0        0     0        0         10     63.5 KB     41.0 MB      0:00:00        0:00:00
* breezy-happy-hand     0        0     5        0          0     19.8 KB       0.0 B      0:00:00        0:00:00
Note that transient maps have a * in front of their tags.
The status message tells us how many components of our map are in each of the five most common component states:
Idle - component is waiting to run
Running - component is currently executing remotely
Completed - component is finished executing and output is available
Held - HTCondor has noticed a problem with the component and is not letting it run
Errored - there was an error in your code, and HTMap has brought back error information
The status of each component of your map is available using the map attribute component_statuses:
[12]:
print(new_ref.component_statuses)
[<ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>]
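A list like this gets hard to read as maps grow. Below is a minimal sketch of tallying it by hand, assuming the elements behave like standard enum members, as the output above suggests. The ComponentStatus class here is a local stand-in defined only for this example (it is not imported from HTMap), and summarize is a hypothetical helper name.

```python
from collections import Counter
from enum import Enum


class ComponentStatus(Enum):
    """Local stand-in for illustration only; not HTMap's own class."""
    HELD = 'HELD'
    ERRORED = 'ERRORED'
    IDLE = 'IDLE'
    RUNNING = 'RUNNING'
    COMPLETED = 'COMPLETED'


def summarize(statuses):
    """Count how many components are in each state (hypothetical helper)."""
    counts = Counter(statuses)
    # A Counter returns 0 for missing keys, so every state appears in the result.
    return {status.name: counts[status] for status in ComponentStatus}


# Made-up status list: 7 completed, 2 running, 1 idle.
example = (
    [ComponentStatus.COMPLETED] * 7
    + [ComponentStatus.RUNNING] * 2
    + [ComponentStatus.IDLE]
)
summary = summarize(example)
print(summary)
# {'HELD': 0, 'ERRORED': 0, 'IDLE': 1, 'RUNNING': 2, 'COMPLETED': 7}
```

This produces the same kind of per-state counts that the status message shows.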
We’ll discuss what to do about held and errored components and how to interact with component statuses in the Error Handling tutorial.
Tags are unique: if we try to create another map with a tag we’ve already used, it will fail:
[13]:
new_map = htmap.map(double, range(10), tag = 'dbl')
---------------------------------------------------------------------------
TagAlreadyExists Traceback (most recent call last)
<ipython-input-13-397c48e54a47> in <module>
----> 1 new_map = htmap.map(double, range(10), tag = 'dbl')
~/htmap/htmap/mapping.py in map(func, args, map_options, tag)
86 func,
87 args_and_kwargs,
---> 88 map_options = map_options,
89 )
90
~/htmap/htmap/mapping.py in create_map(tag, func, args_and_kwargs, map_options)
276
277 tags.raise_if_tag_is_invalid(tag)
--> 278 tags.raise_if_tag_already_exists(tag)
279
280 logger.debug(f'Creating map {tag} ...')
~/htmap/htmap/tags.py in raise_if_tag_already_exists(tag)
59 """Raise a :class:`htmap.exceptions.TagAlreadyExists` if the ``tag`` already exists."""
60 if tag_file_path(tag).exists():
---> 61 raise exceptions.TagAlreadyExists(f'The requested tag "{tag}" already exists. Load the Map with htmap.load("{tag}"), or remove it using htmap.remove("{tag}").')
62
63
TagAlreadyExists: The requested tag "dbl" already exists. Load the Map with htmap.load("dbl"), or remove it using htmap.remove("dbl").
As the error message indicates, if we want to re-use the tag dbl, we need to remove the old map first:
[14]:
old_map = htmap.load('dbl')
old_map.remove()
htmap.Map.remove deletes all traces of the map. It can never be recovered. Be careful when using it!
The module-level shortcut htmap.remove lets you skip the intermediate step of getting the actual Map, if you don’t already have it.
Now we can re-use the tag:
[15]:
new_map = htmap.map(double, range(10), tag = 'dbl')
new_map.wait(show_progress_bar = True)
print(list(new_map))
dbl: 0%| | 0/10 [00:00<?, ?component/s]
Created map dbl with 10 components
dbl: 100%|##########| 10/10 [00:07<00:00, 1.42component/s]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Map Builders
So far we’ve been avoiding any functions that needed to be mapped over keyword arguments, or that had more than one positional argument. htmap.map is not really the ideal tool for working with functions that have more than one argument, and it does not support varying more than one argument at all.
A much more ergonomic way to build up a complex map is a map builder. A map builder lets you build a map via individual function calls. Call htmap.build_map as a context manager to get the builder, then call the builder as if it were the mapped function itself:
[16]:
def power(base, exponent):
    return base ** exponent

with htmap.build_map(power) as pow_builder:
    for base in range(1, 5):  # bases are 1, 2, 3, 4
        for exponent in range(1, 4):  # exponents are 1, 2, 3
            pow_builder(base, exponent)

powered = pow_builder.map
print(list(powered)) # 1^1, 1^2, 1^3, 2^1, 2^2, 2^3, 3^1 ...
Created map harsh-happy-ring with 12 components
[1, 1, 1, 2, 4, 8, 3, 9, 27, 4, 16, 64]
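The nested loops above visit every (base, exponent) pair, with the exponent varying fastest. As a local cross-check that does not use HTMap at all, the same pairs can be generated with itertools.product from the standard library; the names pairs and results are made up for this sketch.

```python
from itertools import product


def power(base, exponent):
    return base ** exponent


# product iterates its last argument fastest, matching the nested loops:
# (1, 1), (1, 2), (1, 3), (2, 1), ...
pairs = list(product(range(1, 5), range(1, 4)))
results = [power(base, exponent) for base, exponent in pairs]

print(len(pairs))  # 12
print(results)     # [1, 1, 1, 2, 4, 8, 3, 9, 27, 4, 16, 64]
```

The 12 pairs correspond to the 12 components of the map, and the results come out in the same order.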
The map builder catches the function calls and turns them into a map. The map is created when the with block ends, at which point you can grab the actual htmap.Map from the builder’s map attribute.
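To make the "catch the calls, create the map at the end of the block" idea concrete, here is a toy model of the pattern. It is a sketch only: ToyBuilder is a hypothetical class written for this example, it runs everything locally, and it is not how HTMap implements its builder.

```python
class ToyBuilder:
    """Hypothetical stand-in for a map builder. Runs locally, not on HTCondor."""

    def __init__(self, func):
        self.func = func
        self.calls = []      # recorded (args, kwargs) pairs, one per component
        self.results = None  # filled in only when the with block exits cleanly

    def __enter__(self):
        return self

    def __call__(self, *args, **kwargs):
        # Record the call instead of running the function immediately.
        self.calls.append((args, kwargs))

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self.results = [self.func(*args, **kwargs) for args, kwargs in self.calls]
        return False  # never swallow exceptions raised inside the block


def power(base, exponent):
    return base ** exponent


with ToyBuilder(power) as toy:
    toy(2, 3)                # positional arguments
    toy(base=3, exponent=2)  # keyword arguments
    pending = toy.results    # still None: nothing has run yet

print(pending)      # None
print(toy.results)  # [8, 9]
```

Deferring the work until the block exits is what lets each call vary any number of positional or keyword arguments before anything is submitted.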
In the next tutorial, we’ll see how to tell HTMap to bring a local file along to the execute node.