HTMap¶
HTMap is a library that wraps the process of mapping Python function calls out to an HTCondor pool. It provides tools for submitting, managing, and processing the output of arbitrary functions.
Our goal is to provide as transparent an interface as possible to high-throughput computing resources so that you can spend more time thinking about your own code, and less about how to get it running on a cluster.
Running a map over a Python function is as easy as
import htmap
def double(x):
    return 2 * x
doubled = list(htmap.map(double, range(10)))
print(doubled)
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
If you’re just getting started, jump into the first tutorial: First Steps.
Happy mapping!
- Installation
Installing HTMap
Note
Bug reports and feature requests should go on our GitHub issue tracker.
- Tutorials
Tutorials on using HTMap.
- Dependency Management
Information about how to manage what your code depends on (e.g., other Python packages).
- API Reference
Public API documentation.
- CLI Reference
Use of the HTMap CLI.
- Using HTCondor with HTMap
Tips on using HTMap with HTCondor
- Tips and Tricks
Useful tips & tricks on the API.
- FAQ
These questions are asked, sometimes frequently.
- Settings
Documentation for the various settings.
- Version History
New features, bug fixes, and known issues by version.
- Contributing and Developing
How to contribute to HTMap, how to set up a development environment, how HTMap works under the hood, etc.
Installation¶
On Unix/Linux systems, running
pip install htmap
from the command line should suffice. On Windows, there’s an added dependency of HTCondor (to get access to the HTCondor Python bindings). After that, use the
pip install --no-deps
.
The introductory tutorials can be run on Binder, requiring no setup on your part.
Basic usage only requires installation of HTMap “submit-side”. Anything more advanced like checkpointing or output file transfers will require installation on the execute nodes. For more information and to ensure your code will run correctly execute-side see Dependency Management.
You may need to append --user to the pip command if you do not have permission to install packages directly into the Python installation you are using. Recent versions of pip will do this automatically when necessary.
Tutorials¶
Attention
The most convenient way to go through these tutorials is through Binder, which requires no setup on your part:
- First Steps
If this is your first time using HTMap, start here!
- Basic Mapping
An introduction to the basics of HTMap.
- Working with Files
Sending additional files with your maps.
- Map Options
How to tell the pool what to do with your map.
- Advanced Mapping
More (and better) ways to create maps.
- Error Handling
What to do when something goes wrong.
First Steps¶
Setup¶
The fastest and easiest way to make sure you have a working setup (as described below) is to go through these tutorials on Binder.
The second-easiest way is to run the tutorials in a Docker container on your computer. Run
docker run -p 8888:8888 htcondor/htmap-tutorials
and follow the instructions it gives you to get into the Jupyter environment. Then go to tutorials/first-steps.ipynb in the file browser and open it to get back to this point.
Alternatively, you might want to immediately start running HTMap on your HTCondor pool. This tutorial assumes that you’ve already installed HTMap on your HTCondor pool’s submit node, or have access to HTMap through a JupyterHub server connected to an HTCondor pool or similar. See How do I install HTMap? for details!
This tutorial also assumes that you’re working in a Jupyter Notebook. It will work just as well in the Python REPL. Later, once you get the hang of things, you’ll be ready to use HTMap in scripts as well. Either way, you’ll need to be on a computer that can submit jobs to an HTCondor pool.
This tutorial assumes that you have already set up your dependency management, as described in Dependency Management. If your HTCondor pool supports Docker, you’ll be good to go with the default settings.
The tutorials in this series are written inside Jupyter Notebooks. If you click the “View page source” link in the upper right corner, you’ll be able to grab the raw .ipynb file yourself and step through it along with the tutorial.
The Problem¶
Suppose you’ve been given the task of writing a function that doubles numbers, like this:
[1]:
def double(x):
    return 2 * x
If you want to double a list of numbers, you might do something like
[2]:
doubled = [double(x) for x in range(10)]
print(doubled)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
or we can use the built-in function map(), which applies a function to each element of an iterable (like a list):
[3]:
mapped = map(double, range(10))
print(mapped)
doubled = list(mapped)
print(doubled)
<map object at 0x7f7ae8393390>
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
In both cases, doubled is the list [0, 2, 4, ...]. The reason we need the list call is that map actually returns an iterator over the results, not the results themselves. So you need to iterate over it to get the output, which is what list does: iterate over its input and put the elements in a list.
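To make the iterator behavior concrete, here is a small sketch using only the built-in map. It shows that results are produced on demand and that the iterator can only be consumed once; the variable names are just for illustration.

```python
def double(x):
    return 2 * x

mapped = map(double, range(5))

# Nothing has been computed yet; values are produced one at a time on demand.
first = next(mapped)        # the first result
rest = list(mapped)         # the remaining results

# The iterator is now exhausted, so a second pass yields nothing.
second_pass = list(mapped)

print(first, rest, second_pass)
```

This is why the examples above call list once and keep the resulting list around instead of iterating over the same map object twice.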
Now suppose that, for some reason, you want to double a lot of numbers. So many numbers that you can’t bear to do all the work on your own computer. It takes days to multiply all the numbers, and if your program crashes halfway through, you lose all of your progress and have to start over. You’re losing sleep, and your boss is breathing down your neck because they need those numbers doubled now.
Luckily, you remember that you have access to an HTCondor high-throughput computing pool. Since each of your function calls is isolated from all the others, the computers in the pool don’t need to talk to each other at all, and you can achieve a huge speedup. The pool can run your code on hundreds or thousands of computers simultaneously, storing the inputs and outputs for you and recovering from individual errors gracefully. It’s the perfect solution.
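As a purely local analogy (this is not HTMap, and it runs in threads on your own machine rather than on a pool), the standard library's concurrent.futures shows why independent function calls are easy to spread across workers. The worker count of 4 is an arbitrary choice for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def double(x):
    return 2 * x

# Each call depends only on its own input, so the calls can run in any order
# and on any worker. Executor.map still returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    doubled = list(pool.map(double, range(10)))

print(doubled)
```

A high-throughput pool applies the same idea at a much larger scale, with separate machines in place of threads.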
The problem is: how do you get your code running in the pool?
The Solution¶
With HTMap, it’s like this:
[4]:
import htmap
mapped = htmap.map(double, range(10))
print(mapped)
doubled = list(mapped)
print(doubled)
Created map super-busy-dog with 10 components
Map(tag = super-busy-dog)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
It may take some time for the second print to run. During that time, the individual components of your map are being run out on the cluster on execute nodes. Once they all finish, you’ll get the list of numbers back. As you can see, the output is identical to what you would get from running the function locally.
In the next tutorial we’ll start digging into the extra features that HTMap provides on top of this basic functionality.
Basic Mapping¶
Tags¶
In the previous tutorial, we used HTMap like this:
[1]:
import htmap
def double(x):
    return 2 * x
[2]:
mapped = htmap.map(double, range(10))
print(mapped)
doubled = list(mapped)
print(doubled)
Created map dark-puny-robe with 10 components
Map(tag = dark-puny-robe)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
In particular, we used the htmap.map function to create our map. This function creates an object that behaves a lot like the iterator returned by the built-in map function. To get our output, we iterated over it using list.
You may have noticed that the map has a tag associated with it. HTMap generated this tag for us because we didn’t provide one and, for the same reason, marked the map as transient, as opposed to persistent. Transient maps are for quick tests where we don’t care too much about organization. Persistent maps are for longer-running maps where we want to keep our work organized by giving things real names. If you don’t plan on using your map for more than one session, you can probably get away with a transient map. If you’re going to step away from the computer and come back, we recommend giving it a real tag.
The map we created above is transient:
[3]:
print(mapped.is_transient)
True
To create a persistent map, we need to give our map a tag:
[4]:
another_map = htmap.map(double, range(10), tag = 'dbl')
print(another_map)
print(another_map.is_transient)
Created map dbl with 10 components
Map(tag = dbl)
False
We can also “retag” a map to give it a new tag. If you tag a transient map, it becomes persistent.
[5]:
mapped.retag('a-new-tag')
print(mapped)
print(mapped.is_transient)
Map(tag = a-new-tag)
False
Working with Maps¶
The object that was returned by htmap.map is a htmap.Map. It gives us a window into the map as it is running, and lets us use the output once the map is finished.
For example, we can print the status of the map:
[6]:
stringified = htmap.map(str, range(10), tag = 'str')
print(stringified.status())
Created map str with 10 components
Map str (10 components): HELD = 0 | ERRORED = 0 | IDLE = 10 | RUNNING = 0 | COMPLETED = 0
We can wait for the map to finish:
[7]:
stringified.wait(show_progress_bar = True)
str: 100%|##########| 10/10 [00:09<00:00, 1.11component/s]
There are many ways to iterate over maps:
[8]:
print(list(stringified))
for d in stringified:
    print(d)
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
0
1
2
3
4
5
6
7
8
9
If we ever lose our reference to it, we can grab a new reference to it using htmap.load, giving it the tag of the map we want:
[9]:
new_ref = htmap.load('str')
print(new_ref)
print(new_ref == stringified)
print(new_ref is stringified) # maps are singletons
Map(tag = str)
True
True
Maps can be recovered from an entirely different Python interpreter session as well. Suppose you close Python and go on vacation. You come back and you want to look at your map again, but you’ve forgotten what you called it. Just ask HTMap for a list of your tags:
[10]:
print(htmap.get_tags())
('dbl', 'str', 'a-new-tag')
Ok, well, technically it was a tuple, but we’ll have to live with it.
HTMap can also print a pretty table showing the status of your maps:
[11]:
htmap.map(str, range(5)) # new transient map
print(htmap.status())
Created map breezy-happy-hand with 5 components
  Tag                HELD  ERRORED  IDLE  RUNNING  COMPLETED  Local Data  Max Memory  Max Runtime  Total Runtime
  a-new-tag             0        0     0        0         10     63.9 KB     41.0 MB      0:00:00        0:00:00
  dbl                   0        0     0        0         10     63.9 KB     41.0 MB      0:00:00        0:00:00
  str                   0        0     0        0         10     63.5 KB     41.0 MB      0:00:00        0:00:00
* breezy-happy-hand     0        0     5        0          0     19.8 KB       0.0 B      0:00:00        0:00:00
Note that transient maps have a * in front of their tags.
The status message tells us how many components of our map are in each of the five most common component states:
Idle - component is waiting to run
Running - component is currently executing remotely
Completed - component is finished executing and output is available
Held - HTCondor has noticed a problem with the component and is not letting it run
Errored - there was an error in your code, and HTMap has brought back error information
The status of each component of your map is available using the map attribute component_statuses:
[12]:
print(new_ref.component_statuses)
[<ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>, <ComponentStatus.COMPLETED: 'COMPLETED'>]
We’ll discuss what to do about held and errored components and how to interact with component statuses in the Error Handling tutorial.
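To see how a per-component status list relates to the one-line summary shown earlier, here is a local sketch that tallies statuses with collections.Counter. The Status enum and the summarize function are hypothetical stand-ins written for this example; they are not imported from HTMap, and the status data is made up.

```python
from collections import Counter
from enum import Enum

class Status(Enum):
    # Hypothetical stand-in for the five states listed above; not HTMap's own class.
    HELD = "HELD"
    ERRORED = "ERRORED"
    IDLE = "IDLE"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"

def summarize(statuses):
    """Count how many components are in each state, including states with zero."""
    counts = Counter(statuses)
    return {status.name: counts.get(status, 0) for status in Status}

# Made-up data for illustration: 10 components in various states.
statuses = [Status.COMPLETED] * 7 + [Status.RUNNING] * 2 + [Status.HELD]
summary = summarize(statuses)
print(" | ".join(f"{name} = {count}" for name, count in summary.items()))
```

With a real map you would presumably feed in the list from component_statuses instead of the made-up data.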
Tags are unique: if we try to create another map with a tag we’ve already used, it will fail:
[13]:
new_map = htmap.map(double, range(10), tag = 'dbl')
---------------------------------------------------------------------------
TagAlreadyExists Traceback (most recent call last)
<ipython-input-13-397c48e54a47> in <module>
----> 1 new_map = htmap.map(double, range(10), tag = 'dbl')
~/htmap/htmap/mapping.py in map(func, args, map_options, tag)
86 func,
87 args_and_kwargs,
---> 88 map_options = map_options,
89 )
90
~/htmap/htmap/mapping.py in create_map(tag, func, args_and_kwargs, map_options)
276
277 tags.raise_if_tag_is_invalid(tag)
--> 278 tags.raise_if_tag_already_exists(tag)
279
280 logger.debug(f'Creating map {tag} ...')
~/htmap/htmap/tags.py in raise_if_tag_already_exists(tag)
59 """Raise a :class:`htmap.exceptions.TagAlreadyExists` if the ``tag`` already exists."""
60 if tag_file_path(tag).exists():
---> 61 raise exceptions.TagAlreadyExists(f'The requested tag "{tag}" already exists. Load the Map with htmap.load("{tag}"), or remove it using htmap.remove("{tag}").')
62
63
TagAlreadyExists: The requested tag "dbl" already exists. Load the Map with htmap.load("dbl"), or remove it using htmap.remove("dbl").
As the error message indicates, if we want to re-use the tag dbl, we need to remove the old map first:
[14]:
old_map = htmap.load('dbl')
old_map.remove()
htmap.Map.remove deletes all traces of the map. It can never be recovered. Be careful when using it!
The module-level shortcut htmap.remove lets you skip the intermediate step of getting the actual Map, if you don’t already have it.
Now we can re-use the tag:
[15]:
new_map = htmap.map(double, range(10), tag = 'dbl')
new_map.wait(show_progress_bar = True)
print(list(new_map))
dbl: 0%| | 0/10 [00:00<?, ?component/s]
Created map dbl with 10 components
dbl: 100%|##########| 10/10 [00:07<00:00, 1.42component/s]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Map Builders¶
So far we’ve been avoiding any functions that needed to be mapped over keyword arguments, or that had more than one positional argument. htmap.map is not really the ideal tool for working with functions that have more than one argument, and it does not support varying more than one argument at all.
A much more ergonomic way to build up a complex map is a map builder. A map builder lets you build a map via individual function calls. Call htmap.build_map as a context manager to get the builder, then call the builder as if it were the mapped function itself:
[16]:
def power(base, exponent):
    return base ** exponent

with htmap.build_map(power) as pow_builder:
    for base in range(1, 5):  # bases are 1, 2, 3, 4
        for exponent in range(1, 4):  # exponents are 1, 2, 3
            pow_builder(base, exponent)

powered = pow_builder.map
print(list(powered))  # 1^1, 1^2, 1^3, 2^1, 2^2, 2^3, 3^1 ...
Created map harsh-happy-ring with 12 components
[1, 1, 1, 2, 4, 8, 3, 9, 27, 4, 16, 64]
The map builder catches the function calls and turns them into a map. The map is created when the with block ends, at which point you can grab the actual htmap.Map from the builder’s map attribute.
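If the "record calls now, create the map when the block ends" pattern feels magical, the toy class below reproduces its shape locally. ToyBuilder is a hypothetical illustration, not how HTMap implements its builder: it stores each call's arguments and only evaluates them in __exit__.

```python
class ToyBuilder:
    """Illustrative only: records calls, then evaluates them when the block exits."""

    def __init__(self, func):
        self.func = func
        self.calls = []      # recorded (args, kwargs) pairs, in call order
        self.results = None  # filled in when the with block ends

    def __call__(self, *args, **kwargs):
        # Record the call instead of running the function right away.
        self.calls.append((args, kwargs))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Only "submit" the recorded calls if the block finished without error.
        if exc_type is None:
            self.results = [self.func(*a, **k) for a, k in self.calls]
        return False  # never swallow exceptions

def power(base, exponent):
    return base ** exponent

with ToyBuilder(power) as builder:
    for base in range(1, 5):
        for exponent in range(1, 4):
            builder(base, exponent)

print(builder.results)
```

Deferring the work until the block ends is what lets the whole set of calls be submitted together as one unit.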
In the next tutorial, we’ll see how to tell HTMap to bring a local file along to the execute node.
Working with Files¶
High-throughput computing often involves analyzing data stored in files. For many simple cases, HTMap can automatically work with files that you specify as arguments of your function without (much) special treatment.
Let’s start with a “Hello world!” example:
[1]:
from pathlib import Path
def read_file(path: Path):
    return path.read_text()
This function takes in a pathlib.Path, reads it, and returns its contents. Let’s make a file and see how it works:
[2]:
hi_path = Path.cwd() / 'hi.txt'
print(hi_path)
hi_path.write_text('Hello world!')
/home/jovyan/tutorials/hi.txt
[2]:
12
[3]:
print(read_file(hi_path))
Hello world!
(pathlib has a steeper learning curve than os.path, but it’s well worth the effort!)
Now, let’s start mapping. In this case, the map call is barely different than the original function call, but we need to set up the inputs correctly. The trick is that, instead of a pathlib.Path, we need to use an htmap.TransferPath. htmap.TransferPath is a drop-in replacement for pathlib.Path in every way, except for HTMap’s special treatment of it.
HTMap will detect that we used an htmap.TransferPath in a map as long as it is an argument or keyword argument of the function, or stored in a primitive container (list, dict, set, tuple), and automatically transfer the named file to wherever the function executes.
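One plausible way to picture this detection is a recursive walk over the arguments that descends into primitive containers and nothing else. The sketch below is an assumption about the general technique, not HTMap's actual code; MarkedPath and find_marked are hypothetical names invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MarkedPath:
    # Hypothetical stand-in for a path type that should be transferred.
    name: str

class Holder:
    # A custom object; the walk below deliberately does not look inside it.
    def __init__(self, path):
        self.path = path

def find_marked(obj):
    """Yield every MarkedPath found in obj, descending into primitive containers."""
    if isinstance(obj, MarkedPath):
        yield obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_marked(key)
            yield from find_marked(value)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            yield from find_marked(item)
    # Anything else (strings, numbers, custom objects) is not searched.

args = (
    MarkedPath("a.txt"),
    [MarkedPath("b.txt"), 3],
    {"key": (MarkedPath("c.txt"),)},
    Holder(MarkedPath("hidden.txt")),
)
found = sorted(p.name for p in find_marked(args))
print(found)
```

In this sketch a path tucked away inside a custom object is not found, which matches the spirit of the "primitive container" restriction described above.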
[4]:
import htmap
bye_path = htmap.TransferPath.cwd() / 'bye.txt'
bye_path.write_text('Have a nice day!')
[4]:
16
[5]:
map = htmap.map(read_file, [bye_path])
print(map.get(0)) # map.get will wait until the result is ready
Created map puny-thin-echo with 1 components
Have a nice day!
Multiple Files¶
To see how we can transfer a container full of files, let’s write a simple clone of the Unix cat program, which concatenates files. It takes a single argument, which is a list of files to be concatenated, and returns the concatenated files as a string.
[6]:
def cat(files):
    file_contents = (file.read_text() for file in files)
    return ''.join(file_contents)
Let’s write some test files…
[7]:
cwd = htmap.TransferPath.cwd()
paths = [
    cwd / 'start.txt',
    cwd / 'middle.txt',
    cwd / 'end.txt',
]
parts = [
    'The quick brown ',
    'fox jumps over ',
    'the lazy dog!',
]
for path, part in zip(paths, parts):
    path.write_text(part)
… and run a map!
[8]:
m = htmap.map(cat, [paths]) # this creates a single map component with the list of paths as the argument
print(m.get(0))
Created map red-bland-tub with 1 components
The quick brown fox jumps over the lazy dog!
If the “output” of your map function needs to be a file instead of a Python object (or you produce files that you need back submit-side for whatever reason), you’ll want to look at the Output Files recipe once you’re done with the tutorials.
In the next tutorial we’ll learn how to tell HTCondor what resources our map components require, as well as any other HTCondor configuration they need.
Map Options¶
Requesting Resources¶
The most common kind of map option you’ll probably need to work with are the ones for requesting resources. HTMap makes fairly conservative default choices about the resources required by your map components. If your function needs a lot of resources, such as memory or disk space, you will need to communicate this to HTMap.
Suppose we need to transfer a huge input file that we need to read into memory, so we need a lot of memory and disk space available on the execute node. We’ll request 100 MB of RAM, 1 GB of disk space, and send our input file.
[1]:
from pathlib import Path
import htmap
def read_huge_file(file):
    contents = Path(file).read_text()
    # do stuff
    return contents  # we'll just return the contents here, but imagine this is the result of processing
[2]:
huge_file = htmap.TransferPath.cwd() / 'huge_file.txt'
huge_file.write_text('only a few words, but use your imagination')
[2]:
42
(Don’t panic! write_text() returns the number of characters written.)
And here’s our map call:
[3]:
processed = htmap.map(
    read_huge_file,
    [huge_file],
    map_options = htmap.MapOptions(
        request_memory = '100MB',
        request_disk = '1GB',
    ),
)
print(processed.get(0))
Created map breezy-thick-beak with 1 components
only a few words, but use your imagination
request_memory and request_disk were passed as single strings. Since they are single strings, they will be treated as fixed options and applied to every component. The other kind of option is variadic, which lets you specify some option for each component of the map individually. For example, if we wanted a different amount of RAM for each component, we could pass a list of strings to request_memory, one for each component:
[4]:
multiple = htmap.map(
    read_huge_file,
    [huge_file, huge_file, huge_file],
    map_options = htmap.MapOptions(
        request_memory = ['10MB', '20MB', '30MB'],
        request_disk = '1GB',
    ),
)
print(list(multiple))
Created map tall-soft-stream with 3 components
['only a few words, but use your imagination', 'only a few words, but use your imagination', 'only a few words, but use your imagination']
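The fixed-versus-variadic distinction can be pictured as expanding the options into one dictionary per component. The sketch below is a local illustration under that assumption; expand_options is a hypothetical helper, not part of HTMap, and the length check is a design choice of the sketch rather than documented HTMap behavior.

```python
def expand_options(num_components, **options):
    """Return one options dict per component.

    A string value is treated as fixed and copied to every component; a list
    or tuple is treated as per-component and must have one entry per component.
    """
    per_component = [{} for _ in range(num_components)]
    for key, value in options.items():
        if isinstance(value, str):
            values = [value] * num_components
        else:
            values = list(value)
            if len(values) != num_components:
                raise ValueError(
                    f"{key} has {len(values)} values for {num_components} components"
                )
        for component_options, v in zip(per_component, values):
            component_options[key] = v
    return per_component

expanded = expand_options(
    3,
    request_memory=["10MB", "20MB", "30MB"],
    request_disk="1GB",
)
for component_options in expanded:
    print(component_options)
```

Each component ends up with its own memory request while sharing the same disk request, mirroring the map call above.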
The Kitchen Sink¶
HTMap also supports arbitrary HTCondor submit descriptors, like you would see in a submit file. Just pass them as keyword arguments to a htmap.MapOptions, keeping in mind that you can use standard ClassAd interpolation and that the same fixed/variadic behavior applies.
If that didn’t make sense, don’t worry about it! The whole point of HTMap is to avoid needing to know too much about submit descriptors.
The next tutorial discusses more convenient and flexible ways of defining your maps.
Advanced Mapping¶
So far we’ve built our maps using the top-level mapping functions. These functions are useful for tutorials, but don’t give us the full flexibility that we might need when working with arbitrary Python functions. They’re also sometimes inconvenient to use, especially if you don’t like typing the names of your functions over and over. The tools described in this tutorial fix those problems.
Starmap¶
Back in Basic Mapping, we noted that htmap.map was only able to handle functions that took a single argument. To work with a function that took two arguments, we needed to use htmap.build_map to build up the map inside a loop.
Sometimes, you don’t want to loop. htmap.starmap provides the flexibility to completely specify the positional and keyword arguments for every component without needing an explicit for-loop.
Unfortunately, that looks like this:
[1]:
import htmap
def power(x, p = 1):
    return x ** p
[2]:
starmap = htmap.starmap(
    func = power,
    args = [
        (1,),
        (2,),
        (3,),
    ],
    kwargs = [
        {'p': 1},
        {'p': 2},
        {'p': 3},
    ],
)
print(list(starmap))  # [1, 4, 27]
Created map proper-short-stream with 3 components
[1, 4, 27]
A slightly more pleasant but less obvious way to construct the arguments would be like this:
[3]:
starmap = htmap.starmap(
    func = power,
    args = ((x,) for x in range(1, 4)),
    kwargs = ({'p': p} for p in range(1, 4)),
)
print(list(starmap))  # [1, 4, 27]
Created map light-soggy-idea with 3 components
[1, 4, 27]
But that isn’t really a huge improvement. Sometimes you’ll need the power and compactness of starmap, but we recommend htmap.build_map for general use.
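To see what the paired args and kwargs mean for each component, here is a local equivalent that simply calls the function once per pair. local_starmap is a hypothetical helper for illustration and runs everything on your own machine.

```python
def local_starmap(func, args, kwargs):
    """Call func once per (positional, keyword) pair, in order."""
    return [func(*a, **k) for a, k in zip(args, kwargs)]

def power(x, p=1):
    return x ** p

results = local_starmap(
    power,
    args=((x,) for x in range(1, 4)),
    kwargs=({"p": p} for p in range(1, 4)),
)
print(results)
```

Note that zip silently stops at the shorter input; this sketch does not try to model what HTMap does when the two sequences have different lengths.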
Mapped Functions¶
If you’re tired of typing htmap.map all the time, create a htmap.MappedFunction using the htmap.mapped decorator:
[4]:
@htmap.mapped
def double(x):
    return 2 * x

print(double)
MappedFunction(func = <function double at 0x7f750c0653b0>, map_options = {})
The resulting MappedFunction has methods that correspond to all the mapping functions, but with the function already filled in.
For example:
[5]:
doubled = double.map(range(10))
print(list(doubled)) # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Created map coy-burst-area with 10 components
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
The real utility of mapped functions is that they can carry default map options, which are inherited by any maps created from them. For example, if we know that a certain function will always need a large amount of memory and disk space, we can specify it for any map like this:
[6]:
@htmap.mapped(
    map_options = htmap.MapOptions(
        request_memory = '200MB',
        request_disk = '1GB',
    )
)
def big_list(_):
    big = list(range(1_000_000))  # imagine this is way bigger...
    return big
Now our request_memory and request_disk will be set for each map, without needing to specify it in the MapOptions of each individual map call. We can still override the setting for a certain map by manually passing htmap.MapOptions.
See htmap.MapOptions for some notes about how these inherited map options behave.
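The general pattern here, a decorator that attaches defaults which individual calls can override, can be sketched locally as follows. WithDefaults, with_defaults, and options_for are hypothetical names for this illustration; the simple "override wins" merge is an assumption and may not match every detail of how HTMap merges inherited options.

```python
class WithDefaults:
    """Toy wrapper: a function plus default options that calls can override."""

    def __init__(self, func, defaults=None):
        self.func = func
        self.defaults = dict(defaults or {})

    def __call__(self, *args, **kwargs):
        # Calling the wrapper runs the function locally, unchanged.
        return self.func(*args, **kwargs)

    def options_for(self, **overrides):
        # Per-call settings win over the defaults attached to the function.
        return {**self.defaults, **overrides}

def with_defaults(func=None, **defaults):
    """Usable both as @with_defaults and as @with_defaults(key=value)."""
    if func is None:
        return lambda f: WithDefaults(f, defaults)
    return WithDefaults(func, defaults)

@with_defaults(request_memory="200MB", request_disk="1GB")
def big_list(_):
    return list(range(10))

print(big_list.options_for())
print(big_list.options_for(request_memory="400MB"))
```

The two-form decorator (with or without arguments) mirrors how htmap.mapped is used both bare and with map_options in the examples above.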
Non-Primitive Function Arguments¶
So far we’ve mostly limited our mapped function arguments to Python primitives like integers or strings. However, HTMap will work with almost any Python object. For example, we can use a custom class as a function argument. Maybe we have some data on penguins, and we want to write a Penguin class to encapsulate that data:
[7]:
class Penguin:
    def __init__(self, name, height, weight):
        self.name = name
        self.height = height
        self.weight = weight

    def analyze(self):
        return f'{self.name} is {self.height} inches tall and weighs {self.weight} pounds'

    def eat(self):
        print('mmm, yummy fish')

    def fly(self):
        raise TypeError("penguins can't fly!")
[8]:
penguins = [
    Penguin('Gwendolin', height = 73, weight = 51),
    Penguin('Gweniffer', height = 59, weight = 43),
    Penguin('Gary', height = 64, weight = 52),
]
[9]:
map = htmap.map(
    lambda p: p.analyze(),  # an anonymous function; see https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions
    penguins,
    tag = 'penguin-stats',
)
map.wait(show_progress_bar = True)
penguin-stats: 0%| | 0/3 [00:00<?, ?component/s]
Created map penguin-stats with 3 components
penguin-stats: 100%|##########| 3/3 [00:03<00:00, 1.00s/component]
[10]:
for stats in map:
    print(stats)
Gwendolin is 73 inches tall and weighs 51 pounds
Gweniffer is 59 inches tall and weighs 43 pounds
Gary is 64 inches tall and weighs 52 pounds
Specialized data structures like numpy arrays and pandas dataframes can also be used as function arguments. When in doubt, just try it!
In the next tutorial we’ll finally address the most important part of programming: what to do when things go wrong!
Error Handling¶
Holds¶
In previous tutorials we mentioned that HTMap is able to track the status of your components and inform you about something called a “hold”. A hold occurs when HTCondor notices something wrong about your map component. Perhaps an input file is missing, or your component tried to use a file that didn’t exist.
The last one is easy to force, so let’s do it and see what happens:
[1]:
import htmap
@htmap.mapped
def foo(_):  # _ is a perfectly legal argument name, often used to mean "I don't actually use it"
    return "I didn't get held!"
[2]:
path = htmap.TransferPath('this-file-does-not-exist.txt')
will_get_held = foo.map(
    [path],
)
Created map angry-husky-law with 1 components
We know that the component will fail, but HTMap won’t know about it until we try to look at the output:
[3]:
print(will_get_held.get(0))
---------------------------------------------------------------------------
MapComponentHeld Traceback (most recent call last)
<ipython-input-3-68dfbf32680e> in <module>
----> 1 print(will_get_held.get(0))
~/htmap/htmap/maps.py in _protect(self, *args, **kwargs)
43 if not self.exists:
44 raise exceptions.MapWasRemoved(f'Cannot call {method} for map {self.tag} because it has been removed')
---> 45 return method(self, *args, **kwargs)
46
47 return _protect
~/htmap/htmap/maps.py in get(self, component, timeout)
390 If ``None``, wait forever.
391 """
--> 392 return self._load_output(component, timeout = timeout)
393
394 def __getitem__(self, item: int) -> Any:
~/htmap/htmap/maps.py in _load_output(self, component, timeout)
341 raise IndexError(f'Tried to get output for component {component}, but map {self.tag} only has {len(self)} components')
342
--> 343 self._wait_for_component(component, timeout)
344
345 status_and_result = htio.load_objects(self._output_file_path(component))
~/htmap/htmap/maps.py in _wait_for_component(self, component, timeout)
307 break
308 elif component_status is state.ComponentStatus.HELD:
--> 309 raise exceptions.MapComponentHeld(f'Component {component} of map {self.tag} is held: {self.holds[component]}')
310
311 if timeout is not None and (time.time() >= start_time + timeout):
MapComponentHeld: Component 0 of map angry-husky-law is held: [13] Error from slot1_6@1bea834c10a5: SHADOW at 172.17.0.2 failed to send file(s) to <172.17.0.2:33571>: error reading from /home/jovyan/tutorials/this-file-does-not-exist.txt: (errno 2) No such file or directory; STARTER failed to receive file(s) from <172.17.0.2:9618>
Yikes! HTMap has raised an exception to inform us that a component of our map got held. It also tells us why HTCondor held the component: error reading from /home/jovyan/tutorials/this-file-does-not-exist.txt: (errno 2) No such file or directory; STARTER failed to receive file(s) from <172.17.0.2:9618>.
This time around the hold reason is pretty clear: a local file that HTCondor expected to exist didn’t. We could fix the problem by creating the file, and then releasing the map, which tells HTCondor to try again:
[4]:
path.touch() # this creates an empty file
Now the map will run successfully. We tell HTMap to “release” the hold, allowing the map to continue running.
[5]:
will_get_held.release()
print(will_get_held.get(0))
I didn't get held!
Debugging holds¶
Unfortunately, holds will often not be so easy to resolve. Sometimes they are simply ephemeral errors that can be resolved by releasing the map without changing anything. But sometimes you’ll need to talk to your HTCondor pool administrator to figure out what’s going wrong.
Sometimes these errors are caused by additional parameters specified in your ~/.htmaprc file. Are you sure ~/.htmaprc has the intended parameters?
If you’re feeling really adventurous, look at files in the directory ~/.htmap/. The standard output and error files are contained within this directory. This might help solve your problem.
Execution Errors¶
HTMap can also detect Python exceptions that occur during component execution. To see this in action, let’s define a function where a component will have a problem:
[6]:
@htmap.mapped
def inverse(x):
    return 1 / x
When x = 0, inverse(x) will fail with a ZeroDivisionError. If we run it locally, the error will halt execution and drop a traceback into our laps:
[7]:
inverse(0)
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-7-7538d73c586c> in <module>
----> 1 inverse(0)
~/htmap/htmap/mapped.py in __call__(self, *args, **kwargs)
50 def __call__(self, *args, **kwargs):
51 """Call the function as normal, locally."""
---> 52 return self.func(*args, **kwargs)
53
54 def map(
<ipython-input-6-769ac4dfb4b6> in inverse(x)
1 @htmap.mapped
2 def inverse(x):
----> 3 return 1 / x
ZeroDivisionError: division by zero
The traceback has a lot of critically-useful information in it. In fact, it tells us exactly the line that raised the error (remember that tracebacks should be read in reverse - the last block of source code is where the error began).
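One way to picture carrying a traceback away from the place where the error happened is to catch the exception, store the formatted traceback in a record, and raise only when that record is inspected later. The sketch below does this locally with the standard library; run_component, ComponentFailed, and get are hypothetical names for this illustration, not HTMap functions.

```python
import traceback

def run_component(func, arg):
    """Run func(arg) and return a record instead of letting an exception escape."""
    try:
        return {"status": "OK", "value": func(arg)}
    except Exception:
        # Keep the formatted traceback so it can be shown later, somewhere else.
        return {"status": "ERR", "report": traceback.format_exc()}

class ComponentFailed(Exception):
    """Hypothetical exception raised when a failed record is inspected."""

def get(record):
    if record["status"] == "ERR":
        raise ComponentFailed("component failed; error report:\n" + record["report"])
    return record["value"]

def inverse(x):
    return 1 / x

records = [run_component(inverse, x) for x in (2, 0)]
print(get(records[0]))  # the successful component returns its value

try:
    get(records[1])
except ComponentFailed as error:
    print(error)  # the report includes the original ZeroDivisionError traceback
```

Creating the records never raises; the failure only surfaces when the failed record's output is requested.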
HTMap is able to transport this kind of information back from an executing component, but like the regular output of a map we won’t see it until we try to load up the output for the failed component. We’ll make a one-component map to demonstrate what happens:
[8]:
bad_map = inverse.map([0])
bad_map.get(0)
Created map fair-sly-drone with 1 components
---------------------------------------------------------------------------
MapComponentError Traceback (most recent call last)
<ipython-input-8-d23b8117e4db> in <module>
1 bad_map = inverse.map([0])
----> 2 bad_map.get(0)
~/htmap/htmap/maps.py in _protect(self, *args, **kwargs)
43 if not self.exists:
44 raise exceptions.MapWasRemoved(f'Cannot call {method} for map {self.tag} because it has been removed')
---> 45 return method(self, *args, **kwargs)
46
47 return _protect
~/htmap/htmap/maps.py in get(self, component, timeout)
390 If ``None``, wait forever.
391 """
--> 392 return self._load_output(component, timeout = timeout)
393
394 def __getitem__(self, item: int) -> Any:
~/htmap/htmap/maps.py in _load_output(self, component, timeout)
348 return next(status_and_result)
349 elif status == 'ERR':
--> 350 raise exceptions.MapComponentError(f'Component {component} of map {self.tag} encountered error while executing. Error report:\n{self._load_error(component).report()}')
351 else:
352 raise exceptions.InvalidOutputStatus(f'Output status {status} is not valid')
MapComponentError: Component 0 of map fair-sly-drone encountered error while executing. Error report:
========== Start error report for component 0 of map fair-sly-drone ==========
Landed on execute node 1bea834c10a5 (172.17.0.2) at 2020-05-21 17:45:40.954824
Python executable is /opt/conda/bin/python3 (version 3.7.6)
with installed packages
alembic==1.4.2
async-generator==1.10
attrs==19.3.0
backcall==0.1.0
bleach==3.1.4
blinker==1.4
brotlipy==0.7.0
certifi==2020.4.5.1
certipy==0.1.3
cffi==1.14.0
chardet==3.0.4
click==7.1.2
click-didyoumean==0.0.3
cloudpickle==1.4.1
colorama==0.4.3
conda==4.8.2
conda-package-handling==1.6.0
cryptography==2.9.2
cursor==1.3.4
decorator==4.4.2
defusedxml==0.6.0
entrypoints==0.3
halo==0.0.29
htchirp==1.0
htcondor==8.9.6
-e git+https://github.com/htcondor/htmap.git@e0fd6de94fcad0295ae674e5479fac51cf57f34f#egg=htmap
idna==2.9
importlib-metadata==1.6.0
ipykernel==5.2.1
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1588362967322/work
ipython-genutils==0.2.0
jedi==0.17.0
Jinja2==2.11.2
json5==0.9.0
jsonschema==3.2.0
jupyter-client==6.1.3
jupyter-core==4.6.3
jupyter-telemetry==0.0.5
jupyterhub==1.1.0
jupyterlab==2.1.1
jupyterlab-server==1.1.1
log-symbols==0.0.14
Mako==1.1.0
MarkupSafe==1.1.1
mistune==0.8.4
nbconvert==5.6.1
nbformat==5.0.6
nbstripout==0.3.7
notebook==6.0.3
oauthlib==3.0.1
pamela==1.0.0
pandocfilters==1.4.2
parso==0.7.0
pexpect==4.8.0
pickleshare==0.7.5
prometheus-client==0.7.1
prompt-toolkit==3.0.5
ptyprocess==0.6.0
pycosat==0.6.3
pycparser==2.20
pycurl==7.43.0.5
Pygments==2.6.1
PyJWT==1.7.1
pyOpenSSL==19.1.0
pyrsistent==0.16.0
PySocks==1.7.1
python-dateutil==2.8.1
python-editor==1.0.4
python-json-logger==0.1.11
pyzmq==19.0.0
requests==2.23.0
ruamel-yaml==0.15.80
ruamel.yaml.clib==0.2.0
Send2Trash==1.5.0
six==1.14.0
spinners==0.0.24
SQLAlchemy==1.3.16
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
toml==0.10.0
tornado==6.0.4
tqdm==4.46.0
traitlets==4.3.3
urllib3==1.25.9
wcwidth==0.1.9
webencodings==0.5.1
zipp==3.1.0
Scratch directory contents are
/home/jovyan/.condor/local/execute/dir_461/.chirp.config
/home/jovyan/.condor/local/execute/dir_461/_htmap_user_transfer
/home/jovyan/.condor/local/execute/dir_461/.job.ad
/home/jovyan/.condor/local/execute/dir_461/_condor_stderr
/home/jovyan/.condor/local/execute/dir_461/.machine.ad
/home/jovyan/.condor/local/execute/dir_461/func
/home/jovyan/.condor/local/execute/dir_461/_condor_stdout
/home/jovyan/.condor/local/execute/dir_461/0.in
/home/jovyan/.condor/local/execute/dir_461/_htmap_transfer
/home/jovyan/.condor/local/execute/dir_461/_htmap_do_output_transfer
/home/jovyan/.condor/local/execute/dir_461/_htmap_transfer_plugin_cache
/home/jovyan/.condor/local/execute/dir_461/condor_exec.exe
/home/jovyan/.condor/local/execute/dir_461/.update.ad
Exception and traceback (most recent call last):
File "<ipython-input-6-769ac4dfb4b6>", line 3, in inverse
return 1 / x
Local variables:
x = 0
ZeroDivisionError: division by zero
=========== End error report for component 0 of map fair-sly-drone ===========
Neat! This traceback is, unfortunately, harder to read than the other one. We need to ignore everything above MapComponentError: Component 0 of map <tag> encountered error while executing. Error report: because it's just about the internal error that HTMap is raising to propagate the error to us. The real error is the stuff below ========== Start error report for component 0 of map <tag> ==========.
Since we’re trying to debug remotely, HTMap has gathered some metadata about the HTCondor “execute node” where the component was running. First, it tells us where the component landed and when it started executing. Next, the report tells us about the Python environment that was used to execute your function, including a list of installed packages. We also get a listing of the contents of the working directory. In this example, because we didn’t add any extra input files, it’s just a bunch of files that HTCondor and HTMap are using.
The meat of the error is the last thing in the error report. We get roughly the same information that we got in the local traceback, but we also get a printout of the local variables in each stack frame.
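When a report is long, it can help to separate the environment metadata from the exception details. The following is a minimal sketch, not part of HTMap: `split_error_report` is a hypothetical helper that relies only on the "Exception and traceback" line seen in the report above, and the sample text is an abbreviated copy of that report.

```python
from typing import Tuple


def split_error_report(report: str) -> Tuple[str, str]:
    """Split a report string into (environment metadata, exception section)."""
    marker = "Exception and traceback"
    metadata, found, exception_part = report.partition(marker)
    if not found:
        # Marker missing: return everything as metadata rather than guessing.
        return report.strip(), ""
    return metadata.strip(), (marker + exception_part).strip()


# Abbreviated copy of the report shown above (package list omitted).
sample = """\
========== Start error report for component 0 of map fair-sly-drone ==========
Landed on execute node 1bea834c10a5 (172.17.0.2) at 2020-05-21 17:45:40.954824
Python executable is /opt/conda/bin/python3 (version 3.7.6)
Exception and traceback (most recent call last):
  File "<ipython-input-6-769ac4dfb4b6>", line 3, in inverse
    return 1 / x
  Local variables:
    x = 0
ZeroDivisionError: division by zero
=========== End error report for component 0 of map fair-sly-drone ===========
"""

metadata, exception_section = split_error_report(sample)
print(exception_section)
```

The exception section is usually the part worth reading first; the metadata matters mostly when the error depends on the execute node's environment.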
Since the local HTMap error is raised as soon as it finds a bad component, you may find it convenient to look at all of the error reports for your map (hopefully not too many!). htmap.Map.error_reports provides exactly this functionality:
[9]:
worse_map = inverse.map([0, 0, 0])
worse_map.wait(errors_ok = True) # wait for all of the components to hit the error
for report in worse_map.error_reports():
    print(report + '\n')
Created map firm-vast-oven with 3 components
========== Start error report for component 0 of map firm-vast-oven ==========
Landed on execute node 1bea834c10a5 (172.17.0.2) at 2020-05-21 17:45:44.454503
Python executable is /opt/conda/bin/python3 (version 3.7.6)
with installed packages
alembic==1.4.2
async-generator==1.10
attrs==19.3.0
backcall==0.1.0
bleach==3.1.4
blinker==1.4
brotlipy==0.7.0
certifi==2020.4.5.1
certipy==0.1.3
cffi==1.14.0
chardet==3.0.4
click==7.1.2
click-didyoumean==0.0.3
cloudpickle==1.4.1
colorama==0.4.3
conda==4.8.2
conda-package-handling==1.6.0
cryptography==2.9.2
cursor==1.3.4
decorator==4.4.2
defusedxml==0.6.0
entrypoints==0.3
halo==0.0.29
htchirp==1.0
htcondor==8.9.6
-e git+https://github.com/htcondor/htmap.git@e0fd6de94fcad0295ae674e5479fac51cf57f34f#egg=htmap
idna==2.9
importlib-metadata==1.6.0
ipykernel==5.2.1
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1588362967322/work
ipython-genutils==0.2.0
jedi==0.17.0
Jinja2==2.11.2
json5==0.9.0
jsonschema==3.2.0
jupyter-client==6.1.3
jupyter-core==4.6.3
jupyter-telemetry==0.0.5
jupyterhub==1.1.0
jupyterlab==2.1.1
jupyterlab-server==1.1.1
log-symbols==0.0.14
Mako==1.1.0
MarkupSafe==1.1.1
mistune==0.8.4
nbconvert==5.6.1
nbformat==5.0.6
nbstripout==0.3.7
notebook==6.0.3
oauthlib==3.0.1
pamela==1.0.0
pandocfilters==1.4.2
parso==0.7.0
pexpect==4.8.0
pickleshare==0.7.5
prometheus-client==0.7.1
prompt-toolkit==3.0.5
ptyprocess==0.6.0
pycosat==0.6.3
pycparser==2.20
pycurl==7.43.0.5
Pygments==2.6.1
PyJWT==1.7.1
pyOpenSSL==19.1.0
pyrsistent==0.16.0
PySocks==1.7.1
python-dateutil==2.8.1
python-editor==1.0.4
python-json-logger==0.1.11
pyzmq==19.0.0
requests==2.23.0
ruamel-yaml==0.15.80
ruamel.yaml.clib==0.2.0
Send2Trash==1.5.0
six==1.14.0
spinners==0.0.24
SQLAlchemy==1.3.16
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
toml==0.10.0
tornado==6.0.4
tqdm==4.46.0
traitlets==4.3.3
urllib3==1.25.9
wcwidth==0.1.9
webencodings==0.5.1
zipp==3.1.0
Scratch directory contents are
/home/jovyan/.condor/local/execute/dir_492/.chirp.config
/home/jovyan/.condor/local/execute/dir_492/_htmap_user_transfer
/home/jovyan/.condor/local/execute/dir_492/.job.ad
/home/jovyan/.condor/local/execute/dir_492/_condor_stderr
/home/jovyan/.condor/local/execute/dir_492/.machine.ad
/home/jovyan/.condor/local/execute/dir_492/func
/home/jovyan/.condor/local/execute/dir_492/_condor_stdout
/home/jovyan/.condor/local/execute/dir_492/0.in
/home/jovyan/.condor/local/execute/dir_492/_htmap_transfer
/home/jovyan/.condor/local/execute/dir_492/_htmap_do_output_transfer
/home/jovyan/.condor/local/execute/dir_492/_htmap_transfer_plugin_cache
/home/jovyan/.condor/local/execute/dir_492/condor_exec.exe
/home/jovyan/.condor/local/execute/dir_492/.update.ad
Exception and traceback (most recent call last):
File "<ipython-input-6-769ac4dfb4b6>", line 3, in inverse
return 1 / x
Local variables:
x = 0
ZeroDivisionError: division by zero
=========== End error report for component 0 of map firm-vast-oven ===========
========== Start error report for component 1 of map firm-vast-oven ==========
Landed on execute node 1bea834c10a5 (172.17.0.2) at 2020-05-21 17:45:44.216714
Python executable is /opt/conda/bin/python3 (version 3.7.6)
with installed packages
alembic==1.4.2
async-generator==1.10
attrs==19.3.0
backcall==0.1.0
bleach==3.1.4
blinker==1.4
brotlipy==0.7.0
certifi==2020.4.5.1
certipy==0.1.3
cffi==1.14.0
chardet==3.0.4
click==7.1.2
click-didyoumean==0.0.3
cloudpickle==1.4.1
colorama==0.4.3
conda==4.8.2
conda-package-handling==1.6.0
cryptography==2.9.2
cursor==1.3.4
decorator==4.4.2
defusedxml==0.6.0
entrypoints==0.3
halo==0.0.29
htchirp==1.0
htcondor==8.9.6
-e git+https://github.com/htcondor/htmap.git@e0fd6de94fcad0295ae674e5479fac51cf57f34f#egg=htmap
idna==2.9
importlib-metadata==1.6.0
ipykernel==5.2.1
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1588362967322/work
ipython-genutils==0.2.0
jedi==0.17.0
Jinja2==2.11.2
json5==0.9.0
jsonschema==3.2.0
jupyter-client==6.1.3
jupyter-core==4.6.3
jupyter-telemetry==0.0.5
jupyterhub==1.1.0
jupyterlab==2.1.1
jupyterlab-server==1.1.1
log-symbols==0.0.14
Mako==1.1.0
MarkupSafe==1.1.1
mistune==0.8.4
nbconvert==5.6.1
nbformat==5.0.6
nbstripout==0.3.7
notebook==6.0.3
oauthlib==3.0.1
pamela==1.0.0
pandocfilters==1.4.2
parso==0.7.0
pexpect==4.8.0
pickleshare==0.7.5
prometheus-client==0.7.1
prompt-toolkit==3.0.5
ptyprocess==0.6.0
pycosat==0.6.3
pycparser==2.20
pycurl==7.43.0.5
Pygments==2.6.1
PyJWT==1.7.1
pyOpenSSL==19.1.0
pyrsistent==0.16.0
PySocks==1.7.1
python-dateutil==2.8.1
python-editor==1.0.4
python-json-logger==0.1.11
pyzmq==19.0.0
requests==2.23.0
ruamel-yaml==0.15.80
ruamel.yaml.clib==0.2.0
Send2Trash==1.5.0
six==1.14.0
spinners==0.0.24
SQLAlchemy==1.3.16
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
toml==0.10.0
tornado==6.0.4
tqdm==4.46.0
traitlets==4.3.3
urllib3==1.25.9
wcwidth==0.1.9
webencodings==0.5.1
zipp==3.1.0
Scratch directory contents are
/home/jovyan/.condor/local/execute/dir_487/.chirp.config
/home/jovyan/.condor/local/execute/dir_487/_htmap_user_transfer
/home/jovyan/.condor/local/execute/dir_487/.job.ad
/home/jovyan/.condor/local/execute/dir_487/_condor_stderr
/home/jovyan/.condor/local/execute/dir_487/.machine.ad
/home/jovyan/.condor/local/execute/dir_487/func
/home/jovyan/.condor/local/execute/dir_487/_condor_stdout
/home/jovyan/.condor/local/execute/dir_487/_htmap_transfer
/home/jovyan/.condor/local/execute/dir_487/1.in
/home/jovyan/.condor/local/execute/dir_487/_htmap_do_output_transfer
/home/jovyan/.condor/local/execute/dir_487/_htmap_transfer_plugin_cache
/home/jovyan/.condor/local/execute/dir_487/condor_exec.exe
/home/jovyan/.condor/local/execute/dir_487/.update.ad
Exception and traceback (most recent call last):
File "<ipython-input-6-769ac4dfb4b6>", line 3, in inverse
return 1 / x
Local variables:
x = 0
ZeroDivisionError: division by zero
=========== End error report for component 1 of map firm-vast-oven ===========
========== Start error report for component 2 of map firm-vast-oven ==========
Landed on execute node 1bea834c10a5 (172.17.0.2) at 2020-05-21 17:45:44.383019
Python executable is /opt/conda/bin/python3 (version 3.7.6)
with installed packages
alembic==1.4.2
async-generator==1.10
attrs==19.3.0
backcall==0.1.0
bleach==3.1.4
blinker==1.4
brotlipy==0.7.0
certifi==2020.4.5.1
certipy==0.1.3
cffi==1.14.0
chardet==3.0.4
click==7.1.2
click-didyoumean==0.0.3
cloudpickle==1.4.1
colorama==0.4.3
conda==4.8.2
conda-package-handling==1.6.0
cryptography==2.9.2
cursor==1.3.4
decorator==4.4.2
defusedxml==0.6.0
entrypoints==0.3
halo==0.0.29
htchirp==1.0
htcondor==8.9.6
-e git+https://github.com/htcondor/htmap.git@e0fd6de94fcad0295ae674e5479fac51cf57f34f#egg=htmap
idna==2.9
importlib-metadata==1.6.0
ipykernel==5.2.1
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1588362967322/work
ipython-genutils==0.2.0
jedi==0.17.0
Jinja2==2.11.2
json5==0.9.0
jsonschema==3.2.0
jupyter-client==6.1.3
jupyter-core==4.6.3
jupyter-telemetry==0.0.5
jupyterhub==1.1.0
jupyterlab==2.1.1
jupyterlab-server==1.1.1
log-symbols==0.0.14
Mako==1.1.0
MarkupSafe==1.1.1
mistune==0.8.4
nbconvert==5.6.1
nbformat==5.0.6
nbstripout==0.3.7
notebook==6.0.3
oauthlib==3.0.1
pamela==1.0.0
pandocfilters==1.4.2
parso==0.7.0
pexpect==4.8.0
pickleshare==0.7.5
prometheus-client==0.7.1
prompt-toolkit==3.0.5
ptyprocess==0.6.0
pycosat==0.6.3
pycparser==2.20
pycurl==7.43.0.5
Pygments==2.6.1
PyJWT==1.7.1
pyOpenSSL==19.1.0
pyrsistent==0.16.0
PySocks==1.7.1
python-dateutil==2.8.1
python-editor==1.0.4
python-json-logger==0.1.11
pyzmq==19.0.0
requests==2.23.0
ruamel-yaml==0.15.80
ruamel.yaml.clib==0.2.0
Send2Trash==1.5.0
six==1.14.0
spinners==0.0.24
SQLAlchemy==1.3.16
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
toml==0.10.0
tornado==6.0.4
tqdm==4.46.0
traitlets==4.3.3
urllib3==1.25.9
wcwidth==0.1.9
webencodings==0.5.1
zipp==3.1.0
Scratch directory contents are
/home/jovyan/.condor/local/execute/dir_488/.chirp.config
/home/jovyan/.condor/local/execute/dir_488/_htmap_user_transfer
/home/jovyan/.condor/local/execute/dir_488/.job.ad
/home/jovyan/.condor/local/execute/dir_488/_condor_stderr
/home/jovyan/.condor/local/execute/dir_488/.machine.ad
/home/jovyan/.condor/local/execute/dir_488/func
/home/jovyan/.condor/local/execute/dir_488/_condor_stdout
/home/jovyan/.condor/local/execute/dir_488/_htmap_transfer
/home/jovyan/.condor/local/execute/dir_488/2.in
/home/jovyan/.condor/local/execute/dir_488/_htmap_do_output_transfer
/home/jovyan/.condor/local/execute/dir_488/_htmap_transfer_plugin_cache
/home/jovyan/.condor/local/execute/dir_488/condor_exec.exe
/home/jovyan/.condor/local/execute/dir_488/.update.ad
Exception and traceback (most recent call last):
File "<ipython-input-6-769ac4dfb4b6>", line 3, in inverse
return 1 / x
Local variables:
x = 0
ZeroDivisionError: division by zero
=========== End error report for component 2 of map firm-vast-oven ===========
Unlike holds, you generally won’t want to re-run components that experienced errors (they’ll just fail again). Instead, an error is usually a signal that you’ve got a bug in your own code. Remove your map, debug the error locally, then create a new map.
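One way to debug locally is to call the undecorated function on the same inputs and record which ones raise. This is a rough sketch under stated assumptions: `find_failures` is a hypothetical helper, and `inverse` here is a plain (non-mapped) copy of the function used in this tutorial.

```python
def inverse(x):
    return 1 / x


def find_failures(func, inputs):
    """Call func on each input locally and collect (input, exception) pairs."""
    failures = []
    for value in inputs:
        try:
            func(value)
        except Exception as exc:  # broad on purpose: we want to see every failure
            failures.append((value, exc))
    return failures


failures = find_failures(inverse, [1, 0, 2])
for value, exc in failures:
    print(f"input {value!r} raised {type(exc).__name__}: {exc}")
# input 0 raised ZeroDivisionError: division by zero
```

Once the function runs cleanly on every input locally, create a new map with the fixed code.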
Standard Output and Error¶
When handling trickier errors, you may need to look at the stdout and stderr from your map components. stdout and stderr are what you would see on the terminal if you executed your code locally: things like print and exceptions normally display their information there. HTMap provides access to stdout and stderr for each component through the appropriately-named attributes of your maps:
[10]:
import sys
@htmap.mapped
def stdx(_):
    print("Hi from stdout!") # stdout is the default
    print("Hi from stderr!", file = sys.stderr)
m = stdx.map([None])
Created map quick-calm-stream with 1 components
[11]:
m.stdout.get(0) # get will wait for the stdout to become available, m.stdout[0] wouldn't
[11]:
Landed on execute node 1bea834c10a5 (172.17.0.2) at 2020-05-21 17:45:47.056114 as jovyan
Scratch directory contents before run:
|- .chirp.config
|- .job.ad
|- .machine.ad
|- .update.ad
|- 0.in
|- _condor_stderr
|- _condor_stdout
|- _htmap_do_output_transfer
|- * _htmap_transfer
|- * _htmap_transfer_plugin_cache
|- * _htmap_user_transfer
| \- * 0
|- condor_exec.exe
\- func
Python executable is /opt/conda/bin/python3 (version 3.7.6)
with installed packages
alembic==1.4.2
async-generator==1.10
attrs==19.3.0
backcall==0.1.0
bleach==3.1.4
blinker==1.4
brotlipy==0.7.0
certifi==2020.4.5.1
certipy==0.1.3
cffi==1.14.0
chardet==3.0.4
click==7.1.2
click-didyoumean==0.0.3
cloudpickle==1.4.1
colorama==0.4.3
conda==4.8.2
conda-package-handling==1.6.0
cryptography==2.9.2
cursor==1.3.4
decorator==4.4.2
defusedxml==0.6.0
entrypoints==0.3
halo==0.0.29
htchirp==1.0
htcondor==8.9.6
-e git+https://github.com/htcondor/htmap.git@e0fd6de94fcad0295ae674e5479fac51cf57f34f#egg=htmap
idna==2.9
importlib-metadata==1.6.0
ipykernel==5.2.1
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1588362967322/work
ipython-genutils==0.2.0
jedi==0.17.0
Jinja2==2.11.2
json5==0.9.0
jsonschema==3.2.0
jupyter-client==6.1.3
jupyter-core==4.6.3
jupyter-telemetry==0.0.5
jupyterhub==1.1.0
jupyterlab==2.1.1
jupyterlab-server==1.1.1
log-symbols==0.0.14
Mako==1.1.0
MarkupSafe==1.1.1
mistune==0.8.4
nbconvert==5.6.1
nbformat==5.0.6
nbstripout==0.3.7
notebook==6.0.3
oauthlib==3.0.1
pamela==1.0.0
pandocfilters==1.4.2
parso==0.7.0
pexpect==4.8.0
pickleshare==0.7.5
prometheus-client==0.7.1
prompt-toolkit==3.0.5
ptyprocess==0.6.0
pycosat==0.6.3
pycparser==2.20
pycurl==7.43.0.5
Pygments==2.6.1
PyJWT==1.7.1
pyOpenSSL==19.1.0
pyrsistent==0.16.0
PySocks==1.7.1
python-dateutil==2.8.1
python-editor==1.0.4
python-json-logger==0.1.11
pyzmq==19.0.0
requests==2.23.0
ruamel-yaml==0.15.80
ruamel.yaml.clib==0.2.0
Send2Trash==1.5.0
six==1.14.0
spinners==0.0.24
SQLAlchemy==1.3.16
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
toml==0.10.0
tornado==6.0.4
tqdm==4.46.0
traitlets==4.3.3
urllib3==1.25.9
wcwidth==0.1.9
webencodings==0.5.1
zipp==3.1.0
Running component 0
<function stdx at 0x146c42004680>
with args
(None,)
and kwargs
{}
----- MAP COMPONENT OUTPUT START -----
Hi from stdout!
----- MAP COMPONENT OUTPUT END -----
Finished executing component at 2020-05-21 17:45:47.256167
Scratch directory contents after run:
|- .chirp.config
|- .job.ad
|- .machine.ad
|- .update.ad
|- 0.in
|- _condor_stderr
|- _condor_stdout
|- * _htmap_current_checkpoint
|- _htmap_do_output_transfer
|- * _htmap_transfer
| \- 0.out
|- * _htmap_transfer_plugin_cache
|- * _htmap_user_transfer
| \- * 0
|- condor_exec.exe
\- func
Note that much of the same information from the error report is included in the component stdout for convenience.
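If you only care about what your own function printed, you can cut the text between the two marker lines that appear in the output above. This is a small sketch, not an HTMap feature: `component_output` is a hypothetical helper, and it assumes the marker lines look exactly as they do in this example.

```python
START = "----- MAP COMPONENT OUTPUT START -----"
END = "----- MAP COMPONENT OUTPUT END -----"


def component_output(stdout_text: str) -> str:
    """Return only the text that the mapped function itself wrote to stdout."""
    _, found_start, rest = stdout_text.partition(START)
    if not found_start:
        return ""  # markers not present, nothing to extract
    body, _, _ = rest.partition(END)
    return body.strip("\n")


# Abbreviated stand-in for the string returned by m.stdout.get(0)
sample_stdout = "\n".join([
    "Running component 0",
    START,
    "Hi from stdout!",
    END,
    "Finished executing component at 2020-05-21 17:45:47.256167",
])
print(component_output(sample_stdout))
# Hi from stdout!
```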
[12]:
m.stderr.get(0)
[12]:
Hi from stderr!
These attributes are both iterable sequences, which means that you can do something like this:
[13]:
@htmap.mapped
def err(x):
    print(f"Hi from stderr! {x}", file = sys.stderr)
err_map = err.map(range(5))
err_map.wait(show_progress_bar = True)
for e in err_map.stderr:
    print(e)
green-happy-year: 0%| | 0/5 [00:00<?, ?component/s]
Created map green-happy-year with 5 components
green-happy-year: 100%|##########| 5/5 [00:04<00:00, 1.25component/s]
Hi from stderr! 0
Hi from stderr! 1
Hi from stderr! 2
Hi from stderr! 3
Hi from stderr! 4
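Before submitting a map, you can check locally which stream each message goes to by capturing both streams with the standard library. The sketch below uses a plain (non-mapped) copy of `stdx`; it only illustrates the stdout/stderr split and does not involve HTMap.

```python
import contextlib
import io
import sys


def stdx(_):
    print("Hi from stdout!")  # stdout is the default
    print("Hi from stderr!", file=sys.stderr)


out_buffer, err_buffer = io.StringIO(), io.StringIO()
# Temporarily replace sys.stdout and sys.stderr with in-memory buffers.
with contextlib.redirect_stdout(out_buffer), contextlib.redirect_stderr(err_buffer):
    stdx(None)

captured_out = out_buffer.getvalue()
captured_err = err_buffer.getvalue()
print("stdout:", captured_out.strip())
print("stderr:", captured_err.strip())
```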
Advanced Tutorials¶
Note: these tutorials cannot be run on Binder.
- Docker Image Cookbook
How to build HTMap-compatible Docker images.
- Output Files
How to move arbitrary files back to the submit machine, or to other locations.
- Wrapping External Programs
How to send input and output to an external (i.e., non-Python) program from inside a mapped function.
- Checkpointing Maps
How to write a function that can continue from partial progress after being evicted.
- Using HTMap on the Open Science Grid
How to use HTMap on the Open Science Grid.
Docker Image Cookbook¶
Docker is, essentially, a way to send a self-contained computer called a container to another person. You define the software that goes into the container, and then anyone with Docker installed on their own computer (the “host”) can run your container and access the software inside without that software being installed on the host. This is an enormous advantage in distributed computing, where it can be difficult to ensure that the software your own code depends on (its “dependencies”) is installed on the computers your code actually runs on.
To use Docker, you write a Dockerfile which tells Docker how to generate an image, which is a blueprint to construct a container. The Dockerfile is a list of instructions, such as shell commands or instructions for Docker to copy files from the build environment into the image. You then tell Docker to “build” the image from the Dockerfile.
For use with HTMap, you then upload this image to Docker Hub, where it can then be downloaded to execute nodes in an HTCondor pool. When your HTMap component lands on an execute node, HTCondor will download your image from Docker Hub and run your code inside it using HTMap.
The following sections describe, roughly in order of increasing complexity, different ways to build Docker images for use with HTMap. Each level of complexity is introduced to solve a more advanced dependency management problem. We recommend reading them in order until you reach one that works for your dependencies (each section assumes knowledge of the previous sections).
More detailed information on how Dockerfiles work can be found in the Docker documentation itself. This page only covers the bare minimum to get started with HTMap and Docker.
Attention
This recipe only covers using Docker for execute-side dependency management. You still need to install dependencies submit-side to launch your map in the first place!
Can I use HTMap’s default image?¶
HTMap’s default Docker image is htcondor/htmap-exec, which is itself based on continuumio/anaconda3 (https://hub.docker.com/r/continuumio/anaconda3/).
It is based on Python 3 and has many useful packages pre-installed, such as numpy, scipy, and pandas.
If your software only depends on packages included in the Anaconda distribution, you can use HTMap’s default image and won’t need to create your own.
I depend on Python packages that aren’t in the Anaconda distribution¶
Attention
Before proceeding, install Docker on your computer and make an account on Docker Hub.
Let’s pretend that there’s a package called foobar that your Python function depends on, but isn’t part of the Anaconda distribution.
You will need to write your own Dockerfile to include this package in your Docker image.
Docker images are built in layers.
You always start a Dockerfile by stating which existing Docker image you’d like to use as your base layer.
A good choice is the same Anaconda image that HTMap uses as the default, which comes with both the conda package manager and the standard pip.
Create a file named Dockerfile and write this into it:
# Dockerfile
FROM continuumio/anaconda3:latest
ENV PATH=/opt/conda/bin/:${PATH}
RUN pip install --no-cache-dir htmap
ARG USER=htmap
RUN groupadd ${USER} \
&& useradd -m -g ${USER} ${USER}
USER ${USER}
Each line in the Dockerfile starts with a short, capitalized word which tells Docker what kind of build instruction it is.
- FROM means “start with this base image”.
- RUN means “execute these shell commands in the container”.
- ARG means “set build argument”; it acts like an environment variable that’s only set during the image build.
Lines that begin with a # are comments in a Dockerfile.
The above lines say that we want to inherit from the image continuumio/anaconda3:latest and build on top of it.
To be compatible with HTMap, we install htmap via pip.
We also set up a non-root user to do the execution, which is important for security.
Naming that user htmap is arbitrary and has nothing to do with the htmap package itself.
Now we need to tell Docker to run a shell command during the build to install foobar by adding one more line to the bottom of the Dockerfile.
# Dockerfile
FROM continuumio/anaconda3:latest
ENV PATH=/opt/conda/bin/:${PATH}
RUN pip install --no-cache-dir htmap
ARG USER=htmap
RUN groupadd ${USER} \
&& useradd -m -g ${USER} ${USER}
USER ${USER}
# if foobar can be installed via conda, use these lines
RUN conda install -y foobar \
&& conda clean -y --all
# if foobar can be installed via pip, use these lines
RUN pip install --no-cache-dir foobar
Some notes on the above:
- If you need to install some packages via conda and some via pip, you may need to use both types of lines.
- The conda clean and --no-cache-dir instructions for conda and pip respectively just help keep the final Docker image as small as possible.
- The -y options for the conda commands are the equivalent of answering “yes” to questions that conda asks on the command line, since the Docker build is non-interactive.
- A trailing \ is a line continuation, so that first command is equivalent to running conda install -y foobar && conda clean -y --all, which is just bash shorthand for “run the first command and, if it succeeds, run the second”.
If you need to install many packages, we recommend writing a requirements.txt file (see the Python packaging docs) and using
# Dockerfile
FROM continuumio/anaconda3:latest
ENV PATH=/opt/conda/bin/:${PATH}
RUN pip install --no-cache-dir htmap
ARG USER=htmap
RUN groupadd ${USER} \
&& useradd -m -g ${USER} ${USER}
USER ${USER}
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
The COPY build instruction tells Docker to copy the file requirements.txt (path relative to the build directory, explained below) to the path requirements.txt inside the image.
Relative paths inside the container work the same way they do in the shell; the image has a “working directory” that you can set using the WORKDIR instruction.
Now that we have a Dockerfile, we can tell Docker to use it to build an image.
You’ll need to choose a descriptive name for the image, ideally something easy to type that’s related to your project (like qubits or gene-analysis).
Wherever you see <image> below, insert that name.
You’ll also want to version your images by adding a “tag” after a :, like <image>:v1, <image>:v2, <image>:v3, etc.
You can use any string you’d like for the tag.
You’ll also need to know your Docker Hub username.
Wherever you see <username> below, insert your username, and wherever you see <tag>, insert your chosen version tag.
At the command line, in the directory that contains Dockerfile, run
$ docker build -t <username>/<image>:<tag> .
You should see the output of the build process, hopefully ending with
Successfully tagged <username>/<image>:<tag>
<username>/<image>:<tag> is the universal identifier for your image.
Now we need to push the image up to Docker Hub. Run
$ docker push <username>/<image>:<tag>
You’ll be asked for your credentials, and then all of the data for your image will be pushed up to Docker Hub. Once this is done, you should be able to use the image with HTMap. Change your HTMap settings (see DOCKER) to point to your new image, and launch your maps!
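Since the same `<username>/<image>:<tag>` string is used for building, pushing, and configuring HTMap, it can be convenient to assemble it in one place. The sketch below is illustrative only: `image_reference` is a hypothetical helper, its validation patterns are a simplified assumption (Docker's real naming rules are broader), and the commented-out settings line assumes the image setting is addressed as `DOCKER.IMAGE`, which you should confirm on the Settings page.

```python
import re

# Simplified patterns, chosen for this sketch only.
_NAME = re.compile(r"^[a-z0-9]+(?:[._-][a-z0-9]+)*$")
_TAG = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$")


def image_reference(username: str, image: str, tag: str) -> str:
    """Build the '<username>/<image>:<tag>' identifier, rejecting obvious mistakes."""
    for label, value in (("username", username), ("image", image)):
        if not _NAME.match(value):
            raise ValueError(f"{label} {value!r} contains unsupported characters")
    if not _TAG.match(tag):
        raise ValueError(f"tag {tag!r} contains unsupported characters")
    return f"{username}/{image}:{tag}"


ref = image_reference("myname", "gene-analysis", "v1")
print(ref)
# myname/gene-analysis:v1

# With HTMap installed, pointing it at the image might then look like this
# (assumed setting name, see the Settings page):
# import htmap
# htmap.settings["DOCKER.IMAGE"] = ref
```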
I don’t need most of the Anaconda distribution and want to use a lighter-weight base image¶
Instead of using the full Anaconda distribution, use a base Docker image that only includes the conda package manager:
# Dockerfile
FROM continuumio/miniconda3:latest
ENV PATH=/opt/conda/bin/:${PATH}
RUN pip install --no-cache-dir htmap
ARG USER=htmap
RUN groupadd ${USER} \
&& useradd -m -g ${USER} ${USER}
USER ${USER}
From here, install your particular dependencies as above.
If you prefer to not use conda, an even-barer-bones image could be produced from
# Dockerfile
FROM python:latest
RUN pip install --no-cache-dir htmap
ARG USER=htmap
RUN groupadd ${USER} \
&& useradd -m -g ${USER} ${USER}
USER ${USER}
We use python:latest as our base image, so we don’t have conda anymore.
I want to use a Python package that’s not on PyPI or Anaconda¶
Perhaps you’ve written a package yourself, or you want to use a package that is only available as source code on GitHub or a similar website.
There are multiple ways to approach this, most of them roughly equivalent.
The first step for all of them is to write a setup.py file for your package.
Some instructions for writing a setup.py can be found here.
Once you have a working setup.py, there are various ways to proceed, in reverse order of complexity:
- Upload your package to PyPI and pip install <package> as in previous sections. This is the least flexible because you’ll need to upload to PyPI every time you update your package. If you don’t own the package, you shouldn’t do this!
- Upload your package to a publicly-accessible version control repository and use pip’s VCS support to install it (for example, if your package is on GitHub, something like pip install git+https://github.com/<UserName>/<package>.git).
- Use the COPY build instruction to copy your package directly into the Docker image, then pip install <path/to/dir/containing/setup.py> as a RUN instruction. Note that your package will need to be in the Docker build context (see the docs for details).
I want to use a base image that doesn’t come with Python pre-installed¶
Say you have an existing Docker image that you need to use (maybe it includes non-Python dependencies that you aren’t sure how to install yourself).
You need to add Python to this image so that you can run your own code in it.
We recommend adding miniconda to the image by adding these lines to your Dockerfile:
# Dockerfile
# see discussion below
FROM ubuntu:latest
RUN apt-get -y update \
&& apt-get install -y wget
# Docker build arguments
# use the Python version you need
# default to latest version of miniconda (which can then install any version of Python)
ARG PYTHON_VERSION=3.6
ARG MINICONDA_VERSION=latest
# set install location, and add the Python in that location to the PATH
ENV CONDA_DIR=/opt/conda
ENV PATH=${CONDA_DIR}/bin:${PATH}
# install miniconda and Python version specified in config
# (and ipython, which is nice for debugging inside the container)
RUN cd /tmp \
&& wget --quiet https://repo.continuum.io/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh \
&& bash Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -f -b -p $CONDA_DIR \
&& rm Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh \
&& conda install -y python=${PYTHON_VERSION} \
&& conda clean -y --all
After this, you can install HTMap and any other Python packages you need as in the preceding sections.
Note that in this example we based the image on Ubuntu’s base image and installed wget, which we used to download the miniconda installer.
Depending on your base image, you may need to use a different package manager (for example, yum) or a different command-line file download tool (for example, curl).
I want to build an image for use on the Open Science Grid¶
First, read through OSG’s Singularity documentation.
Based on that, our goal will be to build a Docker image and have OSG convert
it to a Singularity image that can be served by OSG.
The tricky part of this is that Docker’s ENV instruction, which is the usual method of getting python3 on the PATH inside the container, won’t carry over to Singularity.
To remedy this, we will create a special directory structure that Singularity
recognizes and uses to execute instructions with specified environments.
This is not a Singularity tutorial, so the simplest thing to do is copy the entire singularity.d directory that htmap-exec uses: https://github.com/htcondor/htmap/tree/master/htmap-exec/singularity.d
Anything you need to specify for your environment should be done in singularity.d/env/90-environment.sh.
This file will be “sourced” (run) when the image starts, before HTMap executes.
In your Dockerfile, you must copy this directory to the correct location inside the image:
# Dockerfile snippet
COPY <path/to/singularity.d> /.singularity.d
Note the path on the right: a hidden directory at the root of the filesystem.
This is just a Singularity convention.
The left path is just the location of the singularity.d directory you made.
Note that if you FROM an htmap-exec image, this setup will already be embedded in the image for you.
Output Files¶
If the “output” of your map function is a file, HTMap’s basic functionality will not be sufficient for you. As a toy example, consider a function which takes a string and a number, and writes out a file containing that string repeated that number of times, with a space between each repetition. The file itself will be the output of our function.
import htmap
import itertools
from pathlib import Path
@htmap.mapped
def repeat(string, number):
    output_path = Path("repeated.txt")
    with output_path.open(mode="w") as f:
        f.write(" ".join(itertools.repeat(string, number)))
This would work great locally, producing a file named repeated.txt in the directory we ran the code from.
If this same code runs execute-side, the file will still be produced, but HTMap won’t know that we care about the file.
In fact, the map will appear to be spectacularly useless:
with repeat.build_map() as mb:
    mb("foo", 5)
    mb("wiz", 3)
    mb("bam", 2)
repeated = mb.map
print(list(repeated))
# [None, None, None]
There’s no sign of our output file!
(A function with no return statement implicitly returns None.)
We need to tell HTMap that we are producing an output file. We can do this by adding a call to an HTMap hook function in our mapped function after we create the file:
import htmap
import itertools
from pathlib import Path
@htmap.mapped
def repeat(string, number):
    output_path = Path("repeated.txt")
    with output_path.open(mode="w") as f:
        f.write(" ".join(itertools.repeat(string, number)))
    htmap.transfer_output_files(output_path) # identical, except for this line
The htmap.transfer_output_files() function tells HTMap to move the files at the given paths back to the submit machine for us.
We can then access those files using the Map.output_files attribute, which behaves like a sequence indexed by component numbers.
The elements of the sequence are pathlib.Path objects pointing to the directories containing the output files from each component, like so:
with repeat.build_map() as mb:
mb("foo", 5)
mb("wiz", 3)
mb("bam", 2)
repeated = mb.map
for component, base in enumerate(repeated.output_files):
path = base / "repeated.txt"
print(component, path.read_text())
# 0 foo foo foo foo foo
# 1 wiz wiz wiz
# 2 bam bam
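Before submitting a map like this, it can help to check the file-writing logic on its own. The following is a minimal local sketch, not part of HTMap: the helper name repeat_to_file and its directory parameter are hypothetical, and a temporary directory stands in for the component's working directory.

```python
import itertools
import tempfile
from pathlib import Path


def repeat_to_file(string, number, directory):
    """Write `string` repeated `number` times, space-separated, into `directory`."""
    output_path = Path(directory) / "repeated.txt"
    output_path.write_text(" ".join(itertools.repeat(string, number)))
    return output_path


# A temporary directory stands in for the component's working directory.
with tempfile.TemporaryDirectory() as scratch:
    path = repeat_to_file("foo", 5, scratch)
    content = path.read_text()

print(content)  # foo foo foo foo foo
```

Once the local output looks right, the same body can go into the mapped function, followed by the call to htmap.transfer_output_files().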
Transferring Output to Other Places¶
You may need to transfer output to places that are not the submit machine. HTMap can arrange this for you using the output_remaps feature of MapOptions in combination with TransferPath to specify the destination of the output files.
In the below example, we have a function move_file that just tells HTMap to transfer whatever input it is given. We give the path to an input file stored in an S3 bucket named my-bucket on some S3 server we can access, with some file (created and placed in the bucket ahead of time) named in.txt. Our goal is to get that file back into the bucket, but renamed out.txt. To do so, we also create an output_file destination, and tell HTMap to “remap” the output transfer via the output_remaps argument of MapOptions.
import htmap
def move_file(input_path):
    htmap.transfer_output_files(input_path)
bucket = htmap.TransferPath(
    "my-bucket", protocol="s3", location="s3-server.example.com"
)
input_file = bucket / "in.txt"
output_file = bucket / "out.txt"
print(input_file)
# TransferPath(path='my-bucket/in.txt', protocol='s3', location='s3-server.example.com')
print(output_file)
# TransferPath(path='my-bucket/out.txt', protocol='s3', location='s3-server.example.com')
map = htmap.map(
    move_file,
    [input_file],
    map_options=htmap.MapOptions(
        request_memory="128MB",
        request_disk="1GB",
        output_remaps=[{input_file.name: output_file}],
    ),
)
After letting the map run, the output file will be in the bucket, and no output file will have been sent back to the submit node (i.e., map.output_files[0] will be an empty directory).
Wrapping External Programs¶
HTMap can only map Python functions, but you might need to call an external program on the execute node. For example, you may need to use a particular Bash utility script, or run a piece of pre-compiled analysis software. In cases like this, the Python standard library’s subprocess module can be used to communicate with those programs.
For example, suppose you need to call the Dubious Barology Lyricon (dbl) program, a pre-compiled C program that you have stored in your home directory at ~/dbl. It takes a single integer argument, and “returns” a single integer by printing it to standard output. So a call to dbl on the command line looks like
$ dbl 4
8
To use HTMap with dbl, you could write a mapped function that looks something like
import subprocess
import htmap
@htmap.mapped(
    map_options=htmap.MapOptions(
        fixed_input_files="dbl",
    )
)
def dbl(x):
    process = subprocess.run(
        ["dbl", str(x)],
        stdout=subprocess.PIPE,  # use capture_output=True in Python 3.7+
    )
    if process.returncode != 0:
        raise Exception("call to dbl failed!")
    return_value = int(process.stdout)
    return return_value
You’ll need to be careful with functions like this - check for failures in the programs you call, because HTMap will happily return nonsense if the call fails in some strange way. If we do a map, we’ll end up with the expected result:
result = dbl.map(range(10))
print(list(result)) # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
If you want to test this yourself, here’s the Dubious Barology Lyricon (really a simple bash program):
#!/usr/bin/env bash
echo $((2 * $1))
If your external program outputs files, you may find the Output Files recipe helpful.
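The advice above about checking for failures can be made a little stricter. The following is a sketch under stated assumptions, not HTMap API: call_doubler and DOUBLER_COMMAND are hypothetical names, and a Python one-liner stands in for the dbl executable so that the example runs anywhere. It checks the exit code, keeps stderr for the error message, and refuses output that is not an integer.

```python
import subprocess
import sys

# Stand-in for the external program: a Python one-liner that doubles its argument.
# In a real map this would be the command for the transferred executable.
DOUBLER_COMMAND = [sys.executable, "-c", "import sys; print(2 * int(sys.argv[1]))"]


def call_doubler(x, command=DOUBLER_COMMAND):
    """Run the external program on `x` and return its integer result."""
    process = subprocess.run(
        [*command, str(x)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    if process.returncode != 0:
        raise RuntimeError(
            f"command exited with code {process.returncode}: {process.stderr.strip()}"
        )
    try:
        return int(process.stdout.strip())
    except ValueError:
        raise RuntimeError(f"unexpected output: {process.stdout!r}")


print(call_doubler(4))  # 8
```

Raising an exception inside the mapped function means the failure surfaces as a component error instead of a silently wrong value.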
Checkpointing Maps¶
When running on opportunistic resources, HTCondor might “evict” your map components from the execute locations. Evicted components return to the queue and, without your intervention, restart from scratch. However, HTMap can preserve files across an eviction and make them available in the next run. You can use this to write a function that can resume from partial progress when it restarts.
The important thing for you to think about is that HTMap will always run your function from the start. That means that the general structure of a checkpointing function should look like this:
def function(inputs):
    try:
        ...
        # attempt to reload from a checkpoint file
    except (
        FileNotFoundError,
        ...,
    ):  # catch any errors that indicate that the checkpoint doesn't exist, is corrupt, etc.
        # initialize from input data
        ...
    # do work
Your work must be written such that it doesn’t care where it starts. Generally that means you’ll need to replace for loops with while loops.
For example, a simulation that proceeds in 100 steps like this:
import htmap
@htmap.mapped
def function(initial_state):
    current_state = initial_state
    for step in range(100):
        current_state = evolve(current_state)
    return current_state
would need to become something like
import htmap
@htmap.mapped
def function(initial_state):
    try:
        current_step, current_state = load_from_checkpoint(checkpoint_file)
    except FileNotFoundError:
        current_step, current_state = 0, initial_state
    while current_step < 100:
        current_state = evolve(current_state)
        current_step += 1
        if should_write_checkpoint:
            write_checkpoint(current_step, current_state)
            htmap.checkpoint(checkpoint_file)  # important!
    return current_state
Note the call to htmap.checkpoint().
This function takes the paths to the checkpoint file(s) that you’ve written and does the necessary behind-the-scenes handling to make them available if the component needs to restart.
If you don’t call this function, the files will not be available, and your checkpoint won’t work!
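The helpers load_from_checkpoint and write_checkpoint are left undefined in the outline above. One possible shape for them is sketched here, assuming the state is JSON-serializable; the JSON layout, the temporary-file rename, and the stop_after parameter (which simulates an eviction) are all illustrative choices, not anything HTMap requires. The sketch runs locally without HTMap; inside a mapped function, the call to htmap.checkpoint() would follow each write.

```python
import json
import os
import tempfile
from pathlib import Path


def write_checkpoint(path, step, state):
    """Write the checkpoint to a temporary file, then rename it into place."""
    path = Path(path)
    tmp_path = path.with_name(path.name + ".tmp")
    tmp_path.write_text(json.dumps({"step": step, "state": state}))
    os.replace(tmp_path, path)  # avoids leaving a half-written checkpoint behind


def load_from_checkpoint(path):
    """Return (step, state); raises FileNotFoundError if no checkpoint exists."""
    data = json.loads(Path(path).read_text())
    return data["step"], data["state"]


def run(initial_state, checkpoint_path, total_steps=100, stop_after=None):
    """Add 1 to the state per step; `stop_after` simulates an eviction."""
    try:
        step, state = load_from_checkpoint(checkpoint_path)
    except (FileNotFoundError, json.JSONDecodeError, KeyError):
        step, state = 0, initial_state

    while step < total_steps:
        state += 1  # stand-in for evolve(state)
        step += 1
        write_checkpoint(checkpoint_path, step, state)
        # inside a mapped function: htmap.checkpoint(checkpoint_path)
        if stop_after is not None and step >= stop_after:
            return None  # pretend the component was evicted here
    return state


with tempfile.TemporaryDirectory() as scratch:
    checkpoint = Path(scratch) / "checkpoint.json"
    first = run(0, checkpoint, stop_after=40)  # interrupted at step 40
    resumed_from, _ = load_from_checkpoint(checkpoint)
    final = run(0, checkpoint)  # resumes from step 40 and finishes

print(first, resumed_from, final)  # None 40 100
```

Catching JSONDecodeError and KeyError alongside FileNotFoundError treats a corrupt checkpoint the same as a missing one, so the function falls back to starting from the input data.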
Concrete Example¶
Let’s work with a more concrete example. Here’s the function, along with some code to run it and prove that it checkpointed:
from pathlib import Path
import time
import htmap
@htmap.mapped
def counter(num_steps):
    checkpoint_path = Path("checkpoint")
    try:
        step = int(checkpoint_path.read_text())
        print("loaded checkpoint!")
    except FileNotFoundError:
        step = 0
        print("starting from scratch")
    while True:
        time.sleep(1)
        step += 1
        print(f"completed step {step}")
        if step >= num_steps:
            break
        checkpoint_path.write_text(str(step))
        htmap.checkpoint(checkpoint_path)
    return True
map = counter.map([30])
# wait for the component to start
while map.component_statuses[0] is not htmap.ComponentStatus.RUNNING:
    print(map.component_statuses[0])
    time.sleep(1)
# let it run for 10 seconds
print("component has started, letting it run...")
time.sleep(10)
# vacate it (force it off current execute resource)
map.vacate()
print("vacated map")
# wait until it starts up again and finishes
while map.component_statuses[0] is not htmap.ComponentStatus.COMPLETED:
    print(map.component_statuses[0])
    time.sleep(1)
# look at the function output and the stdout from execution
print(map[0])
print(map.stdout(0))
The function itself just sleeps for the given amount of time, but it does it in incremental steps so that we can checkpoint its progress.
We write checkpoints to a file named checkpoint in the current working directory of the script when it executes.
We try to load the current step number (stored as text, so we need to convert it to an integer) from that file when we start, and if that fails we start from the beginning.
We write a checkpoint after each step, which is overkill (see the next section), but easy to implement for this short example.
The rest of the code (after the function definition) is just there to prove that the example works. If we run this script, we should see something like this:
IDLE
# many IDLE messages
IDLE
component has started, letting it run...
vacated map
RUNNING
IDLE
# more IDLE messages
IDLE
RUNNING
# many RUNNING messages
RUNNING
True # this is map[0]: it's True, not None, so the function finished successfully
# a bunch of debug information from the stdout of the component
----- MAP COMPONENT OUTPUT START -----
loaded checkpoint! # we did it!
completed step 10
completed step 11
completed step 12
completed step 13
completed step 14
completed step 15
completed step 16
completed step 17
completed step 18
completed step 19
completed step 20
completed step 21
completed step 22
completed step 23
completed step 24
completed step 25
completed step 26
completed step 27
completed step 28
completed step 29
completed step 30
----- MAP COMPONENT OUTPUT END -----
Finished executing component at 2019-01-20 08:34:31.130818
We successfully started from step 10! For a long-running computation, this could represent a significant amount of work. Long-running components on opportunistic resources might be evicted several times during their life, and without checkpointing, may never finish.
Checkpointing Strategy¶
You generally don’t need to write checkpoints very often.
We recommend writing a new checkpoint if a certain amount of time has elapsed, perhaps an hour.
For example, using the datetime library:
import datetime
import htmap
def now():
    return datetime.datetime.utcnow()
@htmap.mapped
def function(inputs):
    latest_checkpoint_at = now()
    # load from checkpoint or initialize
    while not_done:
        # do a unit of work
        if now() > latest_checkpoint_at + datetime.timedelta(hours=1):
            # write checkpoint
            latest_checkpoint_at = now()
    return result
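The same idea can be packaged into a small helper. This is only a sketch: CheckpointTimer is a hypothetical class, not part of HTMap, and it uses time.monotonic() so that adjustments to the system clock cannot skip or delay a checkpoint. A fake clock is injected here so that the example runs instantly.

```python
import time


class CheckpointTimer:
    """Tracks whether enough time has passed to justify a new checkpoint."""

    def __init__(self, interval_seconds, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock
        self.latest = clock()

    def due(self):
        """True once at least `interval_seconds` have passed since the last mark."""
        return self.clock() - self.latest >= self.interval

    def mark(self):
        """Record that a checkpoint was just written."""
        self.latest = self.clock()


# Demonstration with a fake clock so the example runs instantly.
fake_now = [0.0]
timer = CheckpointTimer(interval_seconds=3600, clock=lambda: fake_now[0])

checkpoints_written = 0
for _ in range(10):  # ten units of work...
    fake_now[0] += 900  # ...each taking 15 simulated minutes
    if timer.due():
        checkpoints_written += 1  # write the checkpoint and call htmap.checkpoint(...) here
        timer.mark()

print(checkpoints_written)  # 2
```

In a real mapped function the default clock would be used, and the body of the if block would write the checkpoint file before calling htmap.checkpoint().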
Caveats¶
Checkpointing does introduce some complications with HTMap’s metadata tracking system. In particular, HTMap only tracks the runtime, stdout, and stderr of the last execution of each component. If your components are vacated and start again from a checkpoint, you’ll only see the execution time, standard output, and standard error from the most recent run. If you need that information from earlier runs, you should track it yourself inside your checkpoint files.
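As one way of tracking such information yourself, the checkpoint file can carry the runtime accumulated by earlier executions. The sketch below is an assumption-laden illustration: load_elapsed, save_elapsed, and the elapsed_seconds key are hypothetical, and a real checkpoint would store this value next to your actual state. Fake clocks are used so the example is deterministic.

```python
import json
import tempfile
import time
from pathlib import Path


def load_elapsed(path):
    """Return the runtime (seconds) accumulated by earlier executions, or 0.0."""
    try:
        return json.loads(Path(path).read_text())["elapsed_seconds"]
    except (FileNotFoundError, json.JSONDecodeError, KeyError):
        return 0.0


def save_elapsed(path, previous_elapsed, started_at, clock=time.monotonic):
    """Store the previous runtime plus the time spent in the current execution."""
    total = previous_elapsed + (clock() - started_at)
    Path(path).write_text(json.dumps({"elapsed_seconds": total}))
    return total


with tempfile.TemporaryDirectory() as scratch:
    path = Path(scratch) / "checkpoint.json"
    # First execution: 120 simulated seconds pass before the checkpoint.
    total_first = save_elapsed(path, load_elapsed(path), started_at=0.0, clock=lambda: 120.0)
    # Second execution (after an eviction): 45 more simulated seconds.
    total_second = save_elapsed(path, load_elapsed(path), started_at=0.0, clock=lambda: 45.0)

print(total_first, total_second)  # 120.0 165.0
```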
Using HTMap on the Open Science Grid¶
Running HTMap with the Open Science Grid (OSG) requires some special configuration. The OSG does not support Docker, and is also not amenable to HTMap’s own Singularity delivery mechanism. However, the OSG does still allow you to run your code inside a Singularity container. The .htmaprc file snippet below sets up HTMap to use this support.
# .htmaprc
DELIVERY_METHOD = "assume"
[MAP_OPTIONS]
requirements = "HAS_SINGULARITY == TRUE"
"+ProjectName" = "\"<your project name>\""
"+SingularityImage" = "\"/cvmfs/singularity.opensciencegrid.org/<repo/tag:version>\""
The extra " on the left are to escape the +, which is not normally legal syntax, and the extra \" on the right are to ensure that the actual value is a string.
Note the two places inside < >, where you must supply some information. You must specify your OSG project name, and you must specify which OSG-supplied Singularity image to use. For more information on what images are available, see the OSG Singularity documentation.
HTMap’s own default image, htmap-exec, is always available on the OSG. For example, to use htmap-exec:v0.4.3, you would set
"+SingularityImage" = "\"/cvmfs/singularity.opensciencegrid.org/htcondor/htmap-exec:v0.4.3\""
For advice on building your own image for the OSG, see I want to build an image for use on the Open Science Grid.
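The double layer of quoting in these settings is easy to get wrong by hand. As an illustration only, this hypothetical helper (htmaprc_classad_string is not part of HTMap) builds such a line from a key and a plain value; it assumes the value itself contains no quotes or backslashes.

```python
def htmaprc_classad_string(key, value):
    """Format one .htmaprc line whose value must reach HTCondor as a quoted string."""
    # Outer quotes delimit the setting's value; the inner \" pairs are literal
    # quote characters that remain part of the value itself.
    return f'"{key}" = "\\"{value}\\""'


# "my-project" is a placeholder, not a real OSG project name.
line = htmaprc_classad_string("+ProjectName", "my-project")
print(line)  # "+ProjectName" = "\"my-project\""
```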
Using HTCondor with HTMap¶
HTMap is a Python wrapper over the underlying HTCondor API. That means the vast majority of the HTCondor functionality is available. This page is a brief overview of how HTMap uses HTCondor to run your maps. It may be helpful for debugging, or for cross-referencing your HTMap and HTCondor knowledge.
Component and Job States¶
Each HTMap map component is represented by an HTCondor job. Map components will usually be in one of four HTCondor job states:
Idle: the job/component has not started running yet; it is waiting to be assigned resources to execute on.
Running: the job/component is running on an execute machine.
Held: HTCondor has decided that it can’t run the job/component, but that you (the user) might be able to fix the problem. The job will try to run again if it is released.
Completed: the job/component has finished running, and HTMap has collected its output. These jobs will likely leave the HTCondor queue soon.
For more detail, see the relevant HTCondor documentation:
Requesting Resources¶
The default resources provisioned for your map component can be limiting – what if your job requires more memory or more disk space? HTCondor jobs can request resources, and HTMap supports those requests via MapOptions.
MapOptions accepts many of the same keys that condor_submit accepts. Some of the more commonly requested resources are:
request_memory. Possible values are like "1MB" for 1MB, or "2GB" for 2GB of memory.
request_cpus. Possible values are like "1" for 1 CPU, or "2" for 2 CPUs.
request_disk to request an amount of disk space. Possible values are like "10GB" for 10GB, or "1TB" for 1 terabyte.
If any of the resource requests are not set, the default values set by your HTCondor cluster administrator will be used.
These would be set with MapOptions. For example, this code might be used:
options = htmap.MapOptions(
    request_cpus="1",
    request_disk="10GB",
    request_memory="4GB",
)
htmap.map(..., map_options=options)
When it’s mentioned that “the option foo needs to be set” in a submit file, this corresponds to adding the option in the appropriate place in MapOptions.
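When a site guide gives its advice as submit-file lines, a small translation step can help. The sketch below is a simplification, not a submit-file parser: submit_lines_to_options is a hypothetical helper that only handles plain key = value lines (no macros, no queue statements), and the snippet it parses is made up for illustration.

```python
def submit_lines_to_options(text):
    """Parse simple `key = value` submit-file lines into a dict of strings."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


# Hypothetical lines, as they might appear in a site's submit-file guide.
snippet = """
# resources recommended by a site guide
request_cpus = 1
request_memory = 4GB
request_gpus = 1
"""

options = submit_lines_to_options(snippet)
print(options)
# {'request_cpus': '1', 'request_memory': '4GB', 'request_gpus': '1'}
```

With HTMap installed, the resulting dictionary could then be passed along as keyword arguments, for example htmap.MapOptions(**options), assuming every key is one that MapOptions accepts.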
GPUs¶
For any GPU job, the option request_gpus needs to be set.
Many GPU jobs are machine learning jobs. CHTC has a guide on “Run Machine Learning Jobs on the HTC system”.
There are some site-specific options. For example, CHTC’s guide “Jobs that use GPUs” covers some of these options for running jobs on their GPU Lab. Check your site’s documentation to see whether it has any GPU-specific guidance.
Command Line Tools¶
HTMap tries to expose a complete interface for submitting and managing jobs, but not for examining the state of your HTCondor pool itself. Here are some HTCondor commands that you may find useful:
condor_q: seeing the jobs submitted to the scheduler (similar to htmap.status()).
condor_status: seeing resources the different machines have.
The links go to an HTML version of the man pages; they are also visible with man (e.g., man condor_q). Here’s a list of possibly useful commands:
## See the jobs user foobar has submitted, and their status
condor_q --submitter foobar
## See how many machines have GPUs, and how many are available
condor_status --constraint "CUDADriverVersion>=10.1" -total
## See the stats on GPU machines (including GPU name)
condor_status -compact -constraint 'TotalGpus > 0' -af Machine TotalGpus CUDADeviceName CUDACapability
## See how much CUDA memory on each machine (and how many are available)
condor_status --constraint "CUDADriverVersion>=10.1" -attributes CUDAGlobalMemoryMb -json
# See which machines have that much memory
# Also write JSON file so readable by Pandas read_json
condor_status --constraint "CUDADriverVersion>=10.1" -attributes CUDAGlobalMemoryMb -attribute Machine -json >> stats.json
## See how many GPUs are available
condor_status --constraint "CUDADriverVersion>=10.1" -total
CUDAGlobalMemoryMb is not the only attribute that can be displayed; a more complete list is at https://htcondor.readthedocs.io/en/latest/classad-attributes/machine-classad-attributes.html.
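Once the JSON output has been saved, it can be filtered with a few lines of Python. The sketch below uses only the standard library; the sample records (machine names and memory sizes) are invented for illustration, and machines_with_memory is a hypothetical helper. It assumes the file holds a single JSON array of records with Machine and CUDAGlobalMemoryMb keys.

```python
import json

# Invented sample of what stats.json might contain; real output will differ.
sample = """
[
  {"Machine": "gpu-node-1.example.com", "CUDAGlobalMemoryMb": 16160},
  {"Machine": "gpu-node-2.example.com", "CUDAGlobalMemoryMb": 11019},
  {"Machine": "gpu-node-3.example.com", "CUDAGlobalMemoryMb": 40536}
]
"""


def machines_with_memory(records, minimum_mb):
    """Return sorted names of machines reporting at least `minimum_mb` of GPU memory."""
    return sorted(
        r["Machine"] for r in records if r.get("CUDAGlobalMemoryMb", 0) >= minimum_mb
    )


records = json.loads(sample)
big_enough = machines_with_memory(records, minimum_mb=16000)
print(big_enough)  # ['gpu-node-1.example.com', 'gpu-node-3.example.com']
```

Note that the >> redirection in the command above appends to stats.json, so running it twice leaves two JSON documents in one file, which json.loads cannot read; redirecting with > overwrites the file instead.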
Tips and Tricks¶
Separate Job Submission/Monitoring/Collection¶
Separating submission, monitoring, and collection is recommended because it is more interactive and more flexible: it doesn’t rely on the script being free of bugs at submission time, and you can adapt to unexpected problems (such as hung jobs) as they arise. This is most appropriate for medium- or long-running jobs.
The CLI is useful to monitor and modify ongoing jobs. Generally, in simple use cases we recommend writing two or three scripts:
A script for job submission (which is run once).
The CLI or a script for monitoring jobs (which is run many times).
A script to collect results (which is run a few times).
Each step uses these tools:
Submission: HTMap’s Python API is primarily used here, possibly through map().
Monitoring: CLI usage is heavy here. htmap status is a good way to view a summary. If any of the jobs fail, diagnose why with commands like htmap reasons or htmap errors.
Collection: the completed jobs are collected (as mentioned in How do I only process completed jobs?) and the results are written to disk/etc.
The CLI is useful for debugging when dealing with component holds and execution errors.
It can be used to quickly view the same kind of information as the Map API (though we recommend loading up the map in Python once you need to do anything more complex than read text).
Use the CLI¶
Use of the CLI is recommended to go alongside separation of submission/monitoring/collection as mentioned above. This section will provide some useful commands.
This command shows the status of each job for various tags:
htmap status --live # See live display of info on each job (and their tags)
This might indicate that 4 jobs in tag foo are completed and 2 are idle (or waiting to be run).
This command completely deletes the map with tag foo, including removing any jobs that are in any state (running, idle, held, whatever). Use this if you want to completely resubmit the map from scratch, without any previous state.
htmap remove foo
This command keeps the jobs in the queue, but prevents them from running. This lets you edit them live, while they remain in the queue.
htmap hold foo
These commands will show more information about individual maps and map components:
htmap logs # get path to log file; info here is useful for debugging
htmap components foo # view the status of each component for tag "foo"
htmap errors foo # view all errors for tag "foo"
htmap stdout foo 0 # view stdout for first component of tag "foo"
htmap stderr foo 0 # view stderr for first component of tag "foo"
htmap reasons foo # get reasons for holding map "foo"
Some of the longer output is useful to pipe into less so it’s easily navigable and searchable. For example,
htmap errors foo | less
To get help on less, use the command man less or press h while in less.
Full CLI documentation is at CLI Reference.
Conditional Execution on Cluster vs. Submit¶
The environment variable HTMAP_ON_EXECUTE is set to '1' while map components are executing out on the cluster. This can be useful if you need to switch certain behavior on or off depending on whether you’re running your function locally or not.
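A minimal sketch of such a switch follows. Only the variable name HTMAP_ON_EXECUTE and its value '1' come from the text above; the helper names on_execute and scratch_dir, and the directory policy itself, are hypothetical. The example sets and clears the variable by hand to simulate both environments.

```python
import os


def on_execute():
    """True when running as a map component out on the cluster."""
    return os.environ.get("HTMAP_ON_EXECUTE") == "1"


def scratch_dir():
    # Hypothetical policy: use the job's working directory on the cluster,
    # and a dedicated folder when testing locally.
    return "." if on_execute() else "./local-scratch"


os.environ.pop("HTMAP_ON_EXECUTE", None)  # simulate running locally
local_choice = scratch_dir()

os.environ["HTMAP_ON_EXECUTE"] = "1"  # simulate the execute-side environment
execute_choice = scratch_dir()
os.environ.pop("HTMAP_ON_EXECUTE")

print(local_choice, execute_choice)  # ./local-scratch .
```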
Functional programming¶
Filter¶
In the parlance of higher-order functions, HTMap only provides map.
Another higher-order function, filter, is easy to implement once you have a map.
To mimic it we create a map with a boolean output, and use htmap.Map.iter_with_inputs() inside a list comprehension to filter the inputs using the outputs.
Here’s a brief example: checking whether integers are even.
import htmap
@htmap.mapped
def is_even(x: int) -> bool:
    return x % 2 == 0
result = is_even.map(range(10))
filtered = [input for input, output in result.iter_with_inputs() if output]
print(filtered)  # [((0,), {}), ((2,), {}), ((4,), {}), ((6,), {}), ((8,), {})]
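As the printed result above shows, each input comes back as an (args, kwargs) pair instead of the bare value. For functions of one positional argument, a small helper can unwrap them. This sketch runs without HTMap: unpack_single_inputs is a hypothetical helper, and the pairs list is hand-written to have the same shape as the output of iter_with_inputs().

```python
def unpack_single_inputs(pairs):
    """Turn ((args, kwargs), output) pairs into (value, output) for one-argument maps."""
    for (args, kwargs), output in pairs:
        yield args[0], output


# Hand-written pairs shaped like the printed result above, for is_even over range(4).
pairs = [
    (((0,), {}), True),
    (((1,), {}), False),
    (((2,), {}), True),
    (((3,), {}), False),
]

evens = [value for value, keep in unpack_single_inputs(pairs) if keep]
print(evens)  # [0, 2]
```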
Groupby¶
In the parlance of higher-order functions, HTMap only provides map.
Another higher-order function, groupby, is easy to implement once you have a map.
To mimic it we’ll write a helper function that uses a collections.defaultdict to construct a dictionary that collects inputs that have the same output, using the output as the key.
Here’s a brief example: grouping integers by whether they are even or not.
import collections
import htmap
@htmap.mapped
def is_even(x: int) -> bool:
    return x % 2 == 0
def groupby(result):
    groups = collections.defaultdict(list)
    for input, output in result.iter_with_inputs():
        groups[output].append(input)
    return groups
result = is_even.map(range(10))
for group, elements in groupby(result).items():
    print(group, elements)
# True [((0,), {}), ((2,), {}), ((4,), {}), ((6,), {}), ((8,), {})]
# False [((1,), {}), ((3,), {}), ((5,), {}), ((7,), {}), ((9,), {})]
FAQ¶
How do I abort a job?¶
For example, say you mistakenly launched a map tagged foo, but now want to abort/cancel it, fix some input parameters, then relaunch it.
The right CLI command is htmap remove foo, or the HTMap function remove(). This mirrors the HTCondor API and will remove the job from the job scheduler regardless of state (running, idle, held, etc).
How do I only process completed jobs?¶
Let’s say you submitted 10,000 long-running jobs, and 99.9% of these jobs have finished successfully. You’d like to get the results from the successful jobs, and save the results to disk without having to wait for the 10 remaining slow jobs.
The right function to use is components_by_status(). It lets you pick out the successful jobs so that you can process just those. See the components_by_status() documentation for an example usage.
Is it possible to use Dask with HTCondor? How does it compare with HTMap?¶
HTMap provides a transparent interface to the underlying HTCondor behavior, allowing for features like using HTCondor file transfer and taking advantage of the rich HTCondor job model. HTMap does not need to be running through the entire duration of your computation.
Dask can spawn its distributed workers on an HTCondor pool. By doing this you get access to Dask’s features, but not HTCondor’s. Dask will need to be running through the entire duration of your computation.
You should choose the appropriate option for your use case.
Dask Distributed is a lightweight library for distributed Python computation. Dask Distributed has familiar APIs, is declarative and supports more complex scheduling than map/filter/reduce.
Dask-Jobqueue presents a wrapper for HTCondor clusters through its HTCondorCluster. Once an HTCondorCluster is set up, Dask can be used as normal, just as on your own machine. This is common with other cluster managers too: Dask-Jobqueue also wraps SLURM, SGE, PBS and LSF clusters, and Dask Distributed can wrap Kubernetes and Hadoop clusters.
I’m getting a weird error from cloudpickle.load?¶
You probably have a version mismatch between the submit and execute locations. See the “Attention” box near the top of Dependency Management.
If you are using custom libraries, always import them before trying to load any output from maps that use them.
I’m getting an error about a job being held. What should I do?¶
Your code likely encountered an error during remote execution. Briefly, try viewing the standard error (stderr) with HTMap, either via the CLI or API.
Details can be found in Tutorials and Error Handling.
API Reference¶
Mapping Functions¶
- htmap.map(func, args, map_options=None, tag=None, quiet=False)[source]¶
Map a function call over a one-dimensional iterable of arguments. The function must take exactly one positional argument and no keyword arguments.
- Parameters:
func (Callable) – The function to map the arguments over.
args (Iterable[Any]) – An iterable of arguments to pass to the mapped function.
map_options (Optional[MapOptions]) – An instance of htmap.MapOptions.
quiet (bool) – Do not print the map name in an interactive shell.
- Return type:
- Returns:
map – A htmap.Map representing the map.
- htmap.starmap(func, args=None, kwargs=None, map_options=None, tag=None, quiet=False)[source]¶
Map a function call over aligned iterables of arguments and keyword arguments. Each element of args and kwargs is unpacked into the signature of the function, so their elements should be tuples and dictionaries corresponding to positional and keyword arguments of the mapped function.
- Parameters:
func (Callable) – The function to map the arguments over.
args (Optional[Iterable[Tuple[Any, ...]]]) – An iterable of tuples of positional arguments to unpack into the mapped function.
kwargs (Optional[Iterable[Dict[str, Any]]]) – An iterable of dictionaries of keyword arguments to unpack into the mapped function.
map_options (Optional[MapOptions]) – An instance of htmap.MapOptions.
quiet (bool) – Do not print the map name in an interactive shell.
- Return type:
- Returns:
map – A htmap.Map representing the map.
- htmap.build_map(func, map_options=None, tag=None)[source]¶
Return a MapBuilder for the given function.
- Parameters:
func (Callable) – The function to map over.
map_options (Optional[MapOptions]) – An instance of htmap.MapOptions.
- Return type:
- Returns:
map_builder – A MapBuilder for the given function.
Map Builder¶
- class htmap.MapBuilder(func, map_options=None, tag=None)[source]¶
The htmap.MapBuilder provides an alternate way to create maps. Once created via htmap.build_map() or similar as a context manager, the map builder can be called as if it were the function you’re mapping over. When the with block exits, the inputs are collected and submitted as a single map.
with htmap.build_map(tag="pow", func=lambda x, p: x ** p) as builder:
    for x in range(1, 4):
        builder(x, x)
map = builder.map
print(list(map))  # [1, 4, 27]
- __len__()[source]¶
The length of a MapBuilder is the number of inputs it has been sent.
- Return type:
- property map: Map¶
The Map associated with this MapBuilder. Will raise htmap.exceptions.NoMapYet when accessed until the with block for this MapBuilder completes.
MappedFunction¶
A more convenient and flexible way to work with HTMap is to use the htmap.mapped() decorator to build a MappedFunction.
- htmap.mapped(map_options=None)[source]¶
A decorator that wraps a function in a MappedFunction, which provides an interface for mapping function calls out to an HTCondor cluster.
- Parameters:
map_options (Optional[MapOptions]) – An instance of htmap.MapOptions. Any map calls from the MappedFunction produced by this decorator will inherit from this.
- Return type:
- Returns:
mapped_function – A MappedFunction that wraps the function (or a wrapper function that does the wrapping).
- class htmap.MappedFunction(func, map_options=None)[source]¶
- Parameters:
func (Callable) – A function to wrap in a MappedFunction.
map_options (Optional[MapOptions]) – An instance of htmap.MapOptions. Any map calls from the MappedFunction produced by this decorator will inherit from this.
- map(args, tag=None, map_options=None)[source]¶
As htmap.map(), but the func argument is the mapped function.
- Return type:
- starmap(args=None, kwargs=None, tag=None, map_options=None)[source]¶
As htmap.starmap(), but the func argument is the mapped function.
- Return type:
- build_map(tag=None, map_options=None)[source]¶
As htmap.build_map(), but the func argument is the mapped function.
- Return type:
Map¶
The Map is your window into the status and output of your map. Once you get a map result back from a map call, you can use its methods to get the status of jobs, change the properties of the map while it’s running, pause, restart, or cancel the map, and finally retrieve the output once the map is done.
The various methods that allow you to get and iterate over components will raise exceptions if something has gone wrong with your map:
htmap.exceptions.MapComponentError if a component experienced an error while executing.
htmap.exceptions.MapComponentHeld if a component was held by HTCondor, likely because an input file did not exist or the component used too much memory or disk.
The exception message will contain information about what caused the error. See Error Handling for more details on error handling.
- class htmap.Map(*, tag, map_dir)[source]¶
Represents the results from a map call.
Warning
You should never instantiate a Map directly! Instead, you’ll get your Map by calling a top-level mapping function like htmap.map(), a MappedFunction mapping method, or by using htmap.load(). We are not responsible for whatever vile contraption you build if you bypass the correct methods!
- __getitem__(item)[source]¶
Return the output associated with the input index. Does not block.
- Return type:
- classmethod load(tag)[source]¶
Load a Map by looking up its tag.
Raises htmap.exceptions.TagNotFound if the tag does not exist.
- property components: Tuple[int, ...]¶
Return a tuple containing the component indices for the
htmap.Map
.
- wait(timeout=None, show_progress_bar=False, holds_ok=False, errors_ok=False)[source]¶
Wait until all output associated with this Map is available.
If any components in the map are held or experience an execution error, this method will raise an exception (htmap.exceptions.MapComponentHeld or htmap.exceptions.MapComponentError, respectively).
- Parameters:
timeout (Union[int, float, timedelta, None]) – How long to wait for the map to complete before raising a htmap.exceptions.TimeoutError. If None, wait forever.
show_progress_bar (bool) – If True, a progress bar will be displayed.
holds_ok (bool) – If True, will not raise exceptions if components are held.
errors_ok (bool) – If True, will not raise exceptions if components experience execution errors.
- Return type:
- get(component, timeout=None)[source]¶
Return the output associated with the input component index. If the component experienced an execution error, this will raise htmap.exceptions.MapComponentError. Use get_err(), errors(), error_reports() to see what went wrong!
- get_err(component, timeout=None)[source]¶
Return the error associated with the input component index. If the component actually succeeded, this will raise htmap.exceptions.ExpectedError.
- Parameters:
- Return type:
- iter(timeout=None)[source]¶
Returns an iterator over the output of the htmap.Map in the same order as the inputs, waiting on each individual output to become available.
- iter_with_inputs(timeout=None)[source]¶
Returns an iterator over the inputs and output of the htmap.Map in the same order as the inputs, waiting on each individual output to become available.
- iter_as_available(timeout=None)[source]¶
Returns an iterator over the output of the htmap.Map, yielding individual outputs as they become available.
The iteration order is initially random, but is consistent within a single interpreter session once the map is completed.
- iter_as_available_with_inputs(timeout=None)[source]¶
Returns an iterator over the inputs and output of the htmap.Map, yielding individual (input, output) pairs as they become available.
The iteration order is initially random, but is consistent within a single interpreter session once the map is completed.
- property component_statuses: List[ComponentStatus]¶
Return the current state.ComponentStatus of each component in the map.
- components_by_status()[source]¶
Return the component indices grouped by their states.
- Return type:
Examples
This example finds the completed jobs for a submitted map, and processes those results:
from time import sleep
import htmap
def job(x):
    sleep(x)
    return 1 / x
m = htmap.map(job, [0, 2, 4, 6, 8], tag="foo")
# Give the components some time to run.
# Alternatively, use `m = htmap.load("foo")` in a different process
sleep(10)
completed = m.components_by_status()[htmap.ComponentStatus.COMPLETED]
for component in completed:
    result = m.get(component)
    # Whatever processing needs to be done
    print(result)  # prints 1 / x for each completed component (x = 0 raises ZeroDivisionError)
- property holds: Dict[int, ComponentHold]¶
A dictionary of component indices to their Hold (if they are held).
- hold_report()[source]¶
Return a string containing a formatted table describing any held components.
- Return type:
- property errors: Dict[int, ComponentError]¶
A dictionary of component indices to their ExecutionError (if that component experienced an error).
- error_reports()[source]¶
Yields the error reports for any components that experienced an error during execution.
- property memory_usage: List[int]¶
Return the latest peak memory usage of each map component, measured in MB. A component that hasn’t reported yet will show a 0.
Warning
Due to current limitations in HTCondor, memory use for very short-lived components (<5 seconds) will not be accurate.
- remove(force=False)[source]¶
This command removes a map from the Condor queue. Functionally, this command aborts a job.
This function will completely remove a map from the Condor queue regardless of job state (running, executing, waiting, etc). All data associated with a removed map is permanently deleted.
- property exists: bool¶
True
if and only if the map has not been successfully removed. Otherwise,False
.
- hold()[source]¶
This command holds a map. The components of the map will not be allowed to run until released (see Map.release()).
HTCondor may itself hold your map components if it detects that something has gone wrong with them. Resolve the underlying problem, then use the Map.release() command to allow the components to run again.
- Return type:
- release()[source]¶
This command releases a map, undoing holds (see Map.hold()). The held components of a released map will become idle again.
HTCondor may itself hold your map components if it detects that something has gone wrong with them. Resolve the underlying problem, then use this command to allow the components to run again.
- Return type:
- pause()[source]¶
This command pauses a map. The running components of a paused map will keep their resource claims, but will stop actively executing. The map can be un-paused by resuming it (see the Map.resume() command).
- Return type:
- resume()[source]¶
This command resumes a map (reverses the Map.pause() command). The running components of a resumed map will resume execution on their claimed resources.
- Return type:
- vacate()[source]¶
This command vacates a map. The running components of a vacated map will give up their claimed resources and become idle again.
Checkpointing maps will still have access to their last checkpoint, and will resume from it as if execution was interrupted for any other reason.
- Return type:
- set_memory(memory)[source]¶
Change the amount of memory (RAM) each map component needs.
Warning
Edits do not affect components that are currently running. To “restart” components so that they see the new attribute value, consider vacating their map (see the vacate command).
- set_disk(disk)[source]¶
Change the amount of disk space each map component needs.
Warning
Edits do not affect components that are currently running. To “restart” components so that they see the new attribute value, consider vacating their map (see the vacate command).
- rerun(components=None)[source]¶
Re-run (part of) the map from scratch. The selected components must be completed or errored.
Any existing output of re-run components is removed; they are re-submitted to the HTCondor queue with their original map options (i.e., without any subsequent edits).
- retag(tag)[source]¶
Give this map a new tag. The old tag will be available for re-use immediately.
Retagging a map makes it not transient. Maps that have never had an explicit tag given to them are transient and can be easily cleaned up via the clean command.
- property stdout: MapStdOut¶
A sequence containing the stdout for each map component. You can index into it (with a component index) to get the stdout for that component, or iterate over the sequence to get all of the stdout from the map.
- property stderr: MapStdErr¶
A sequence containing the stderr for each map component. You can index into it (with a component index) to get the stderr for that component, or iterate over the sequence to get all of the stderr from the map.
- property output_files: MapOutputFiles¶
A sequence containing the path to the directory containing the output files for each map component. You can index into it (with a component index) to get the path for that component, or iterate over the sequence to get all of the paths from the map.
- count(value) integer -- return number of occurrences of value ¶
- index(value[, start[, stop]]) integer -- return first index of value. ¶
Raises ValueError if the value is not present.
Supporting start and stop arguments is optional, but recommended.
- class htmap.ComponentStatus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
An enumeration of the possible statuses that a map component can be in. These are mostly identical to the HTCondor job statuses of the same name.
- UNKNOWN = 'UNKNOWN'¶
- UNMATERIALIZED = 'UNMATERIALIZED'¶
- IDLE = 'IDLE'¶
- RUNNING = 'RUNNING'¶
- REMOVED = 'REMOVED'¶
- COMPLETED = 'COMPLETED'¶
- HELD = 'HELD'¶
- SUSPENDED = 'SUSPENDED'¶
- ERRORED = 'ERRORED'¶
- class htmap.MapStdOut(map)[source]¶
An object that helps implement a map’s sequence over its stdout. Don’t bother instantiating one yourself: use the Map.stdout attribute instead.
- get(component, timeout=None)¶
Return a string containing the stdout/stderr from a single map component.
- class htmap.MapStdErr(map)[source]¶
An object that helps implement a map’s sequence over its stderr. Don’t bother instantiating one yourself: use the Map.stderr attribute instead.
- get(component, timeout=None)¶
Return a string containing the stdout/stderr from a single map component.
- class htmap.MapOutputFiles(map)[source]¶
An object that helps implement a map’s sequence over its output file directories. Don’t bother instantiating one yourself: use the Map.output_files attribute instead.
- get(component, timeout=None)[source]¶
Return the pathlib.Path to the directory containing the output files for the given component.
- Parameters:
- Return type:
- Returns:
path – The path to the directory containing the output files for the given component.
Error Handling¶
Map components can generally encounter two kinds of errors:
An exception occurred inside your function on the execute node.
HTCondor was unable to run the map component for some reason.
The first kind will result in HTMap transporting a htmap.ComponentError back to you, which you can access via htmap.Map.get_err().
The htmap.ComponentError.report() method returns a formatted error report for your perusal.
htmap.Map.error_reports() is a shortcut that returns all of the error reports for all of the components of your map.
If you want to access the error programmatically, you can grab it using htmap.Map.get_err().
The second kind of error doesn’t provide as much information.
The htmap.Map.holds property will give you a dictionary mapping components to their htmap.ComponentHold, if they have one.
htmap.Map.hold_report() will return a formatted table showing any holds in your map.
The hold’s reason attribute will tell you a lot about what HTCondor doesn’t like about your component.
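The two failure kinds can be collected in one pass. The sketch below is a minimal illustration that relies only on what this reference documents: errors and holds are dictionaries keyed by component index, errors have a report() method, and holds have a reason attribute. The function name summarize_failures and the stand-in objects are hypothetical; with a real map you would pass the htmap.Map itself.

```python
from types import SimpleNamespace


def summarize_failures(m):
    """Collect error reports and hold reasons from a map-like object.

    Assumes `m.errors` and `m.holds` are dictionaries keyed by component
    index, that errors expose `report()`, and that holds expose `reason`.
    """
    lines = []
    for component, error in sorted(m.errors.items()):
        lines.append(f"component {component} errored:\n{error.report()}")
    for component, hold in sorted(m.holds.items()):
        lines.append(f"component {component} held: {hold.reason}")
    return lines


# Stand-in objects so the sketch runs without an HTCondor pool.
fake_map = SimpleNamespace(
    errors={0: SimpleNamespace(report=lambda: "ZeroDivisionError: division by zero")},
    holds={3: SimpleNamespace(reason="placeholder hold reason")},
)
for line in summarize_failures(fake_map):
    print(line)
```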
- class htmap.ComponentError(*, map, component, exception_msg, node_info, python_info, scratch_dir_contents, stack_summary)[source]¶
Represents an error experienced by a map component during remote execution.
- node_info¶
A tuple containing information about the HTCondor execute node the component ran on.
- Type:
- python_info¶
A tuple containing information about the Python installation on the execute node.
- Type:
- scratch_dir_contents¶
A list of paths in the scratch directory on the execute node.
- Type:
List[pathlib.Path]
- stack_summary¶
The Python stack frames at the time of execution, excluding HTMap’s own stack frame.
- Type:
MapOptions¶
Map options are the equivalent of HTCondor’s submit descriptors. All HTCondor submit descriptors are valid map options except those reserved by HTMap for internal use (see below).
Fixed options are the most basic option. The entire map will use the fixed option. If you pass a single string as the value of a map option, it will become a fixed option.
Variadic options are options that are given individually to each component of a map.
For example, each component of a map might need a different amount of memory.
In that case you could pass a list to request_memory, with the same number of elements as the number of inputs to the map.
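To make the fixed/variadic distinction concrete, here is a minimal sketch of how one option dictionary could be expanded into per-component options. It is not HTMap's implementation: expand_options is a hypothetical helper, and it raises a plain ValueError on a length mismatch, whereas HTMap documents its own htmap.exceptions.MisalignedInputData for that situation.

```python
from typing import Dict, List, Sequence, Union

OptionValue = Union[str, Sequence[str]]


def expand_options(options: Dict[str, OptionValue], num_components: int) -> List[Dict[str, str]]:
    """Build one option dictionary per component.

    A single string is treated as a fixed option (shared by every
    component); any other sequence is treated as variadic and must have
    exactly one entry per component.
    """
    per_component: List[Dict[str, str]] = [{} for _ in range(num_components)]
    for key, value in options.items():
        if isinstance(value, str):
            values = [value] * num_components
        else:
            values = list(value)
            if len(values) != num_components:
                raise ValueError(
                    f"{key!r} has {len(values)} values for {num_components} components"
                )
        for target, item in zip(per_component, values):
            target[key] = item
    return per_component


expanded = expand_options(
    {"request_disk": "1GB", "request_memory": ["100MB", "200MB", "4GB"]},
    num_components=3,
)
print(expanded[2])  # {'request_disk': '1GB', 'request_memory': '4GB'}
```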
Inherited options are given to a htmap.MappedFunction when it is created.
Any maps made using that function can inherit these options.
Options that are passed in the actual map call override inherited options (excepting fixed_input_files, see the note).
For example, if you know that a certain function always takes a large amount of memory, you could give it a large request_memory at the htmap.MappedFunction level so that you don’t have to do it for every individual map.
Additionally, default map options can be set globally via settings['MAP_OPTIONS.<option_name>'] = <option_value>.
Warning
Only certain options make sense as inherited options. For example, they shouldn’t be variadic options.
fixed_input_files has special behavior as an inherited option: the file lists are merged together instead of overridden.
Note
When looking at examples of raw HTCondor submit files, you may see submit descriptors that are prefixed with + or MY.
Those options should be passed to htmap.MapOptions via the custom_options keyword argument.
- class htmap.MapOptions(*, fixed_input_files=None, input_files=None, output_remaps=None, custom_options=None, **kwargs)[source]¶
- Parameters:
fixed_input_files (Union[PathLike, TransferPath, Iterable[Union[PathLike, TransferPath]], None]) – A single file, or an iterable of files, to send to all components of the map.
input_files (Union[Iterable[Union[PathLike, TransferPath]], Iterable[Iterable[Union[PathLike, TransferPath]]], None]) – An iterable of single files or iterables of files to map over. This may be useful if you want additional files to be sent to each map component, but don’t want them in your mapped function’s arguments.
output_remaps (Union[Mapping[str, TransferPath], Iterable[Mapping[str, TransferPath]], None]) – A dictionary, or an iterable of dictionaries, specifying output transfer remaps. A remapped output file is sent to a specified destination instead of back to the submit machine. If a single dictionary is passed, it will be applied to every map component (in this case, you may want to use the $(component) submit macro to differentiate them). Each dictionary should be a “mapping” between the names (last path component, as a string) of output files and their destinations, given as a TransferPath. You must still call transfer_output_files() on the files for them to be transferred at all; listing them here only sets up the remapping.
custom_options (Optional[Dict[str, str]]) – A dictionary of submit descriptors that are not built-in HTCondor descriptors. These are the descriptors that, if you were writing a submit file, would have a leading + or MY. prefix. The leading characters are unnecessary here, but can be included if you’d like.
kwargs (Union[str, Iterable[str]]) – Additional keyword arguments are interpreted as HTCondor submit descriptors. Values that are single strings are used for all components of the map. Providing an iterable for the value will map that option over the components. Certain keywords are reserved for internal use (see the RESERVED_KEYS class attribute).
Notes
Warning
The representation of the values in fixed_input_files, input_files, custom_options and kwargs should exactly match the characters in the submit file after the =.
For example, let’s say your job requires this submit file:
# file: job.submit
foo = "bar"
aaa = xyz
bbb = false
ccc = 1
The MapOptions that express the same submit options would be:
>>> options = {"foo": '"bar"', "aaa": "xyz", "bbb": "false", "ccc": "1"}
>>> print(options["foo"])  # exactly matches the value in the submit file
"bar"
>>> options["foo"] = "\"bar\""  # alternative value
>>> MapOptions(**options)
Submit file values with quotes must keep those quotes inside the Python string, either by escaping them or by wrapping the string in the other quote style.
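One way to check that option strings match the submit file character for character is to render them back into key = value lines and compare. The sketch below does that with a hypothetical helper, render_submit_lines; it is only an illustration, since HTMap builds its submit description internally.

```python
def render_submit_lines(options):
    """Render a dict of option strings as `key = value` submit-file lines.

    Hypothetical helper for illustration only.
    """
    return [f"{key} = {value}" for key, value in options.items()]


options = {"foo": '"bar"', "aaa": "xyz", "bbb": "false", "ccc": "1"}

# Both spellings produce the same five-character string: "bar" with quotes.
assert '"bar"' == "\"bar\""

for line in render_submit_lines(options):
    print(line)
```

The printed lines reproduce the body of the job.submit file in the example.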
- RESERVED_KEYS = {'+IsHTMapJob', '+component', 'IsHTMapJob', 'MY.IsHTMapJob', 'MY.component', 'arguments', 'component', 'executable', 'jobbatchname', 'log', 'should_transfer_files', 'stderr', 'stdout', 'submit_event_notes', 'transfer_executable', 'transfer_input_files', 'transfer_output_files', 'transfer_output_remaps', 'universe', 'when_to_transfer_output'}¶
- classmethod merge(*others)[source]¶
Merge any number of MapOptions together, like a collections.ChainMap. Options closer to the left take priority.
- Return type:
MapOptions
Note
fixed_input_files is a special case, and is merged up the chain instead of being overwritten. requirements are also combined, in a way where all requirements must be satisfied.
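The merge rule can be pictured with plain dictionaries and collections.ChainMap. This is a sketch under stated assumptions, not the library's code: merge_options is a hypothetical stand-in for MapOptions.merge, the order in which the file lists are concatenated is an assumption, and the combination of requirements is left out.

```python
from collections import ChainMap


def merge_options(*option_dicts):
    """Merge option dictionaries so that the leftmost one wins.

    `fixed_input_files` is the exception: the lists from every level are
    concatenated instead of the leftmost one hiding the rest.
    """
    merged = dict(ChainMap(*option_dicts))
    files = []
    for options in option_dicts:
        files.extend(options.get("fixed_input_files", []))
    if files:
        merged["fixed_input_files"] = files
    return merged


# Made-up option values for a map call and for its mapped function.
call_level = {"request_memory": "4GB", "fixed_input_files": ["params.json"]}
function_level = {
    "request_memory": "1GB",
    "request_disk": "10GB",
    "fixed_input_files": ["model.pkl"],
}

merged = merge_options(call_level, function_level)
print(merged["request_memory"])     # 4GB
print(merged["fixed_input_files"])  # ['params.json', 'model.pkl']
```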
File Transfer¶
- class htmap.TransferPath(path, protocol=None, location=None)[source]¶
A TransferPath describes the location of a file or directory. If the protocol and location are both None, it describes a location on the local filesystem. If either is given, it describes a remote location.
When used as an argument to a mapped function, a TransferPath tells HTMap to arrange for the specified files/directories to be transferred to the execute machine from some location, which may be the local filesystem on the submit machine or some remote location like an HTTP address or an S3 server.
Transfer paths are recognized in mapped function inputs as long as they are either:
Arguments or keyword arguments of the mapped function.
Stored inside a primitive container (tuple, list, set, dictionary value) that is an argument or keyword argument of the mapped function. Nested containers are inspected recursively.
When the mapped function runs execute-side, it will receive (instead of this object) a normal pathlib.Path object pointing to the execute-side path of the file/directory.
TransferPath is also used to specify the locations for output files to be sent, if they are not to be returned to the submit machine. For example, output files could be sent to an S3 server. See the output_remaps argument of MapOptions for more details on “remapped” output file transfer.
Where appropriate, TransferPath has the same interface as a pathlib.Path. See the examples for some ways to leverage this API to efficiently construct transfer paths.
Attention
You may need to pass additional submit descriptors to your map to actually be able to use input/output transfers for certain protocols. For example, to transfer to and from an S3 server, you also need to pass aws_access_key_id_file and aws_secret_access_key_file. See the condor_submit documentation for more details.
Examples
Transfer a file in the current working directory using HTCondor file transfer:
transfer_path = htmap.TransferPath.cwd() / 'file.txt'
Transfer a local file at an absolute path using HTCondor file transfer:
transfer_path = htmap.TransferPath("/foo/bar/baz.txt")
Get a file from an HTTP server, located at http://htmap.readthedocs.io/en/latest/_static/htmap-logo.svg:
transfer_path = htmap.TransferPath(
    path="en/latest/_static/htmap-logo.svg",
    protocol="http",
    location="htmap.readthedocs.io",
)
or
base_path = htmap.TransferPath(
    path="/",
    protocol="http",
    location="htmap.readthedocs.io",
)
transfer_path = base_path / 'en' / 'latest' / '_static' / 'htmap-logo.svg'
- Parameters:
path (Union[TransferPath, PathLike]) – The path to the file or directory to transfer.
protocol (Optional[str]) – The protocol to use for the transfer. If set to None (the default), use HTCondor local file transfer.
location (Optional[str]) – The location to find a remote file when using a protocol transfer. This could be the address of a server, for example.
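If you start from a full URL, the three arguments can be derived with the standard library. The sketch below uses a hypothetical helper, split_url, that mirrors the decomposition used in the HTTP example above; the resulting pieces could then be passed as path, protocol, and location. Whether every protocol maps onto URLs this cleanly is an assumption.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into (path, protocol, location) pieces.

    Hypothetical helper: the tuple mirrors the `path`, `protocol`, and
    `location` arguments of the documented HTTP example.
    """
    parts = urlsplit(url)
    return parts.path.lstrip("/"), parts.scheme, parts.netloc


path, protocol, location = split_url(
    "http://htmap.readthedocs.io/en/latest/_static/htmap-logo.svg"
)
print(path)      # en/latest/_static/htmap-logo.svg
print(protocol)  # http
print(location)  # htmap.readthedocs.io
```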
- htmap.transfer_output_files(*paths)[source]¶
Informs HTMap about the existence of output files.
Attention
This function is a no-op when executing locally, so if you’re testing your function locally it won’t do anything.
Attention
The files will be moved by this function, so they will not be available in their original locations.
Checkpointing¶
- htmap.checkpoint(*paths)[source]¶
Informs HTMap about the existence of checkpoint files. This function should be called every time the checkpoint files are changed, even if they have the same names as before.
Attention
This function is a no-op when executing locally (i.e., not execute-side), so if you’re testing your function locally it won’t do anything.
Attention
The files will be copied by this function, so try not to make the checkpoint files too large.
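A checkpointing function usually follows a "load state if present, work, save state, report" loop. The sketch below shows that loop with a stand-in for htmap.checkpoint passed as a parameter, so it runs anywhere; the function count_to and the file name are hypothetical. Inside a real mapped function you would call htmap.checkpoint with the checkpoint file path instead.

```python
from pathlib import Path
import tempfile


def count_to(limit, checkpoint_path, checkpoint=lambda *paths: None):
    """Count up to `limit`, resuming from `checkpoint_path` if it exists.

    `checkpoint` stands in for `htmap.checkpoint`; the default does
    nothing, mirroring the documented no-op behavior when running locally.
    """
    checkpoint_path = Path(checkpoint_path)
    current = int(checkpoint_path.read_text()) if checkpoint_path.exists() else 0
    while current < limit:
        current += 1
        checkpoint_path.write_text(str(current))
        # Report the file after every change, even though its name is the same.
        checkpoint(checkpoint_path)
    return current


with tempfile.TemporaryDirectory() as scratch:
    state = Path(scratch) / "count.txt"
    state.write_text("7")  # pretend an earlier run was interrupted at 7
    print(count_to(10, state))  # 10
```

Keeping the state in a single small file fits the note above that checkpoint files are copied on every call.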
Management¶
These functions help you manage your maps.
- htmap.status(maps=None, include_state=True, include_meta=True)[source]¶
Return a formatted table containing information on the given maps.
- Parameters:
maps (Optional[Iterable[Map]]) – The maps to display information on. If None, displays information on all existing maps.
include_state (bool) – If True, include information on the state of the map’s components.
include_meta (bool) – If True, include information about the map’s memory usage, disk usage, and runtime.
- Return type:
- Returns:
table – A text table containing information on the given maps.
- htmap.get_tags(pattern=None)[source]¶
Return a tuple containing the tag for all existing maps, with optional filtering based on a glob-style pattern.
- htmap.load_maps(pattern=None)[source]¶
Return a tuple containing the Map for all existing maps, with optional filtering based on a glob-style pattern.
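For readers unfamiliar with glob-style patterns, the sketch below shows the kind of filtering they describe, using the standard library's fnmatch module on made-up tags. That HTMap's own matching behaves exactly like fnmatch is an assumption, and filter_tags is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def filter_tags(tags, pattern=None):
    """Return the tags that match a glob-style pattern (all tags if None)."""
    if pattern is None:
        return tuple(tags)
    return tuple(tag for tag in tags if fnmatchcase(tag, pattern))


# Made-up tags for illustration.
tags = ("sim-2021-a", "sim-2021-b", "plots", "sim-2022-a")
print(filter_tags(tags, "sim-2021-*"))  # ('sim-2021-a', 'sim-2021-b')
```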
- htmap.clean(*, all=False)[source]¶
Clean up transient maps by removing them.
Maps that have never had a tag explicitly set are assigned randomized tags and marked as “transient”. This function removes maps marked transient (and can also remove all maps, not just transient ones, if all=True is passed).
Programmatic Status Messages¶
These functions are useful for generating machine-readable status information.
- htmap.status_json(maps=None, include_state=True, include_meta=True, compact=False)[source]¶
Return a JSON-formatted string containing information on the given maps.
Disk and memory usage are reported in bytes. Runtimes are reported in seconds.
- Parameters:
maps (Optional[Iterable[Map]]) – The maps to display information on. If None, displays information on all existing maps.
include_state (bool) – If True, include information on the state of the map’s components.
include_meta (bool) – If True, include information about the map’s memory usage, disk usage, and runtime.
compact (bool) – If True, the JSON will be formatted in the most compact possible representation.
- Return type:
- Returns:
json – A JSON-formatted dictionary containing information on the given maps.
- htmap.status_csv(maps=None, include_state=True, include_meta=True)[source]¶
Return a CSV-formatted string containing information on the given maps.
Disk and memory usage are reported in bytes. Runtimes are reported in seconds.
- Parameters:
maps (Optional[Iterable[Map]]) – The maps to display information on. If None, displays information on all existing maps.
include_state (bool) – If True, include information on the state of the map’s components.
include_meta (bool) – If True, include information about the map’s memory usage, disk usage, and runtime.
- Return type:
- Returns:
csv – A CSV-formatted table containing information on the given maps.
Delivery Methods¶
- htmap.register_delivery_method(name, descriptors_func, setup_func=None)[source]¶
Register a new delivery method with HTMap.
- Parameters:
name (str) – The name of the delivery method; this is what DELIVERY_METHOD should be set to in order to use this delivery method.
descriptors_func (Callable[[str, Path], dict]) – The function that provides the HTCondor submit descriptors for this delivery method.
setup_func (Optional[Callable[[str, Path], None]]) – The function that does any setup necessary to running the map.
- Return type:
Transplant Installs¶
These functions help you manage your transplant installs.
- class htmap.Transplant(hash: str, path: Path, created: datetime, size: int, packages: Tuple[str, ...])[source]¶
An object that represents metadata information about a transplant install.
Create new instance of Transplant(hash, path, created, size, packages)
- classmethod load(path)[source]¶
- Parameters:
path (Path) – The path to the transplant install.
- Return type:
- Returns:
transplant – The Transplant that represents the transplant install.
Settings¶
HTMap exposes configurable settings through htmap.settings, which is an instance of the class htmap.settings.Settings.
This settings object manages a lookup chain of dictionaries.
The settings object created during startup contains two dictionaries.
The lowest level contains HTMap’s default settings, and the next level up is constructed from a settings file at ~/.htmaprc.
If that file does not exist, an empty dictionary is used instead.
The file should be formatted in TOML.
Alternate settings could be stored in other files or constructed at runtime. HTMap provides tools for saving, loading, merging, prepending, and appending settings to each other. Each map is searched in order, so earlier settings can flexibly override later settings.
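The lookup chain can be pictured as a list of nested dictionaries searched front to back. The toy class below sketches that idea, including dotted keys such as MAP_OPTIONS.request_memory. It is not HTMap's Settings implementation: the class name LayeredSettings, the lookup details, and the sample values are all assumptions made for illustration.

```python
class LayeredSettings:
    """A toy lookup chain of nested dictionaries (not HTMap's implementation)."""

    def __init__(self, *maps):
        self.maps = list(maps)

    def prepend(self, other):
        # Searched first, so it overrides everything after it.
        self.maps.insert(0, other)

    def append(self, other):
        # Searched last, so anything before it takes priority.
        self.maps.append(other)

    def __getitem__(self, dotted_key):
        path = dotted_key.split(".")
        for layer in self.maps:
            value = layer
            try:
                for part in path:
                    value = value[part]
            except (KeyError, TypeError):
                continue  # this layer does not define the key; try the next
            return value
        raise KeyError(dotted_key)


# Made-up values: a user layer in front of a defaults layer.
defaults = {"DELIVERY_METHOD": "placeholder", "MAP_OPTIONS": {"request_memory": "128MB"}}
user = {"MAP_OPTIONS": {"request_memory": "1GB"}}

settings = LayeredSettings(user, defaults)
print(settings["MAP_OPTIONS.request_memory"])  # 1GB
print(settings["DELIVERY_METHOD"])             # placeholder
```

Because each layer is only consulted when the earlier ones miss, a user file can override a single nested key without restating the rest of the defaults.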
Warning
To entirely replace your settings, do not do htmap.settings = <new settings object>.
Instead, use the htmap.settings.Settings.replace() method.
Replacing the settings by assignment breaks the internal linking between the settings object and its dependencies.
Hint
These may be helpful when constructing fresh settings:
HTMap’s base settings are available as htmap.BASE_SETTINGS.
The settings loaded from ~/.htmaprc are available as htmap.USER_SETTINGS.
- class htmap.settings.Settings(*settings)[source]¶
-
- to_dict()[source]¶
Return a single dictionary with all of the settings in this Settings, merged according to the lookup rules.
- Return type:
- replace(other)[source]¶
Change the settings of this Settings to be the settings from another Settings.
- Return type:
- append(other)[source]¶
Add a map to the end of the search (i.e., it will be searched last, and be overridden by anything before it).
- prepend(other)[source]¶
Add a map to the beginning of the search (i.e., it will be searched first, and override anything after it).
- classmethod from_settings(*settings)[source]¶
Construct a new Settings from another Settings.
- Return type:
Logging¶
HTMap exposes a standard Python logging hierarchy under the logger named 'htmap'.
Logging configuration can be done by any of the methods described in the Python logging documentation.
Here’s an example of how to set up basic console logging:
import logging
import sys

logger = logging.getLogger("htmap")
logger.setLevel(logging.DEBUG)

handler = logging.StreamHandler(stream=sys.stdout)
handler.setLevel(logging.DEBUG)
handler.setFormatter(
    logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
)

logger.addHandler(handler)
After executing this code, you should be able to see HTMap log messages as you tell it to do things.
Warning
The HTMap logger is not available in the context of the executing map function. Trying to use it will probably raise exceptions.
Exceptions¶
- exception htmap.exceptions.TimeoutError[source]¶
An operation has timed out because it took too long.
- exception htmap.exceptions.NoMapYet[source]¶
The htmap.MapBuilder does not have an associated htmap.Map yet because it is still inside the with block.
- exception htmap.exceptions.TagAlreadyExists[source]¶
The requested tag already exists (recover the Map, then either use or delete it).
- exception htmap.exceptions.ReservedOptionKeyword[source]¶
The map option keyword you tried to use is reserved by HTMap for internal use.
- exception htmap.exceptions.MisalignedInputData[source]¶
There is some kind of mismatch between the lengths of the function arguments and the variadic map options.
- exception htmap.exceptions.UnknownPythonDeliveryMethod[source]¶
The specified Python delivery method has not been registered.
- exception htmap.exceptions.MapWasRemoved[source]¶
This map has been removed, and can no longer be interacted with.
- exception htmap.exceptions.InvalidOutputStatus[source]¶
The output status of the map component was not recognized.
- exception htmap.exceptions.MapComponentError[source]¶
A map component experienced an error during remote execution.
- exception htmap.exceptions.ExpectedError[source]¶
A map component that contained an OK result was unpacked as if it contained an error.
- exception htmap.exceptions.CannotTransplantPython[source]¶
The Python interpreter you are using cannot be transplanted.
- exception htmap.exceptions.CannotRerunComponents[source]¶
The given components cannot be rerun because they are currently active.
Version¶
CLI Reference¶
HTMap provides a command line tool called htmap that exposes a subset of functionality focused around monitoring long-running maps without needing to run Python yourself.
View the available sub-commands by running:
htmap --help # View available commands
Some useful commands are highlighted in the Tips and Tricks section at Separate Job Submission/Monitoring/Collection.
Here’s the full documentation on all of the available commands:
htmap¶
HTMap command line tools.
htmap [OPTIONS] COMMAND [ARGS]...
Options
- -v, --verbose¶
Show log messages as the CLI runs.
- --version¶
Show the version and exit.
autocompletion¶
Enable autocompletion for HTMap CLI commands and tags in your shell.
This command should only need to be run once.
Note that your Python environment must be available (i.e., running “htmap” must work) by the time the autocompletion-enabling command runs in your shell configuration file.
htmap autocompletion [OPTIONS]
Options
- --shell <shell>¶
Required Which shell to enable autocompletion for.
- Options:
bash | zsh | fish
- --force¶
Append the autocompletion activation command even if it already exists.
- --destination <destination>¶
Append the autocompletion activation command to this file instead of the shell default.
clean¶
Clean up transient maps by removing them.
Maps that have never had a tag explicitly set are assigned randomized tags and marked as “transient”. This command removes maps marked transient (and can also remove all maps, not just transient ones, if the --all option is passed).
htmap clean [OPTIONS]
Options
- --all¶
Remove non-transient maps as well.
components¶
Print out the status of the individual components of a map.
htmap components [OPTIONS] TAG
Options
- --status <status>¶
Print out only components that have this status. Case-insensitive. If not passed, print out the statuses of all components (the default).
- Options:
UNKNOWN | UNMATERIALIZED | IDLE | RUNNING | REMOVED | COMPLETED | HELD | SUSPENDED | ERRORED
- --color, --no-color¶
Toggle colorized output (defaults to colorized).
Arguments
- TAG¶
Required argument
edit¶
Edit a map’s attributes (e.g., its memory request).
Edits do not affect components that are currently running. To “restart” components so that they see the new attribute value, consider vacating their map (see the vacate command).
htmap edit [OPTIONS] COMMAND [ARGS]...
disk¶
Set a map’s requested disk.
Edits do not affect components that are currently running. To “restart” components so that they see the new attribute value, consider vacating their map (see the vacate command).
htmap edit disk [OPTIONS] TAG DISK
Options
- --unit <unit>¶
- Options:
KB | MB | GB
Arguments
- TAG¶
Required argument
- DISK¶
Required argument
memory¶
Set a map’s requested memory.
Edits do not affect components that are currently running. To “restart” components so that they see the new attribute value, consider vacating their map (see the vacate command).
htmap edit memory [OPTIONS] TAG MEMORY
Options
- --unit <unit>¶
- Options:
MB | GB
Arguments
- TAG¶
Required argument
- MEMORY¶
Required argument
errors¶
Show execution error reports for map components.
htmap errors [OPTIONS] [TAGS]...
Options
- -p, --pattern <pattern>¶
Act on maps whose tags match glob-style patterns. Pass -p multiple times for multiple patterns.
- --all¶
Act on all maps.
- --limit <limit>¶
The maximum number of error reports to show (0, the default, for no limit).
Arguments
- TAGS¶
Optional argument(s)
hold¶
This command holds a map. The components of the map will not be allowed to run until released (see the release command).
HTCondor may itself hold your map components if it detects that something has gone wrong with them. Resolve the underlying problem, then use the release command to allow the components to run again.
htmap hold [OPTIONS] [TAGS]...
Options
- -p, --pattern <pattern>¶
Act on maps whose tags match glob-style patterns. Pass -p multiple times for multiple patterns.
- --all¶
Act on all maps.
Arguments
- TAGS¶
Optional argument(s)
logs¶
Print the path to HTMap’s current log file.
The log file rotates, so if you need to go further back in time, look at the rotated log files (stored next to the current log file).
htmap logs [OPTIONS]
Options
- --view, --no-view¶
If enabled, display the contents of the current log file instead of its path (defaults to disabled).
path¶
Get paths to parts of HTMap’s data storage for a map.
This command is mostly useful for debugging or interfacing with other tools. The tag argument is a map tag, optionally followed by a colon (:) and a target.
If you have a map tagged “foo”, these commands would give the following paths (command -> path):
htmap path [OPTIONS] TAG
Arguments
- TAG¶
Required argument
pause¶
This command pauses a map. The running components of a paused map will keep their resource claims, but will stop actively executing. The map can be un-paused by resuming it (see the resume command).
htmap pause [OPTIONS] [TAGS]...
Options
- -p, --pattern <pattern>¶
Act on maps whose tags match glob-style patterns. Pass -p multiple times for multiple patterns.
- --all¶
Act on all maps.
Arguments
- TAGS¶
Optional argument(s)
reasons¶
Print the hold reasons for map components.
HTCondor may hold your map components if it detects that something has gone wrong with them. Resolve the underlying problem, then use the release command to allow the components to run again.
htmap reasons [OPTIONS] [TAGS]...
Options
- -p, --pattern <pattern>¶
Act on maps whose tags match glob-style patterns. Pass -p multiple times for multiple patterns.
- --all¶
Act on all maps.
Arguments
- TAGS¶
Optional argument(s)
release¶
This command releases a map, undoing holds. The held components of a released map will become idle again.
HTCondor may itself hold your map components if it detects that something has gone wrong with them. Resolve the underlying problem, then use this command to allow the components to run again.
htmap release [OPTIONS] [TAGS]...
Options
- -p, --pattern <pattern>¶
Act on maps whose tags match glob-style patterns. Pass -p multiple times for multiple patterns.
- --all¶
Act on all maps.
Arguments
- TAGS¶
Optional argument(s)
remove¶
This command removes a map from the Condor queue. Functionally, this command aborts a job.
This function will completely remove a map from the Condor queue regardless of job state (running, executing, waiting, etc). All data associated with a removed map is permanently deleted.
htmap remove [OPTIONS] [TAGS]...
Options
- -p, --pattern <pattern>¶
Act on maps whose tags match glob-style patterns. Pass -p multiple times for multiple patterns.
- --all¶
Act on all maps.
- --force¶
Do not wait for HTCondor to remove the map components before removing local data.
Arguments
- TAGS¶
Optional argument(s)
rerun¶
Rerun (part of) a map from scratch.
The selected components must be completed or errored. See the subcommands of this command group for different ways to specify which components to rerun.
Any existing output of rerun components is removed; they are re-submitted to the HTCondor queue with their original map options (i.e., without any subsequent edits).
htmap rerun [OPTIONS] COMMAND [ARGS]...
components¶
Rerun selected components from a single map.
Any existing output of re-run components is removed; they are re-submitted to the HTCondor queue with their original map options (i.e., without any subsequent edits).
htmap rerun components [OPTIONS] TAG [COMPONENTS]...
Arguments
- TAG¶
Required argument
- COMPONENTS¶
Optional argument(s)
map¶
Rerun all of the components of any number of maps.
Any existing output of re-run components is removed; they are re-submitted to the HTCondor queue with their original map options (i.e., without any subsequent edits).
htmap rerun map [OPTIONS] [TAGS]...
Options
- -p, --pattern <pattern>¶
Act on maps whose tags match glob-style patterns. Pass -p multiple times for multiple patterns.
- --all¶
Act on all maps.
Arguments
- TAGS¶
Optional argument(s)
resume¶
This command resumes a map (reverses the pause command). The running components of a resumed map will resume execution on their claimed resources.
htmap resume [OPTIONS] [TAGS]...
Options
- -p, --pattern <pattern>¶
Act on maps whose tags match glob-style patterns. Pass -p multiple times for multiple patterns.
- --all¶
Act on all maps.
Arguments
- TAGS¶
Optional argument(s)
retag¶
Change the tag of an existing map.
Retagging a map makes it not transient. Maps that have never had an explicit tag given to them are transient and can be easily cleaned up via the clean command.
htmap retag [OPTIONS] TAG NEW
Arguments
- TAG¶
Required argument
- NEW¶
Required argument
set¶
Change a setting in your ~/.htmaprc file.
htmap set [OPTIONS] SETTING VALUE
Arguments
- SETTING¶
Required argument
- VALUE¶
Required argument
settings¶
Print HTMap’s current settings.
By default, this command shows the merger of your user settings from ~/.htmaprc and HTMap’s own default settings. To show only your user settings, pass the --user option.
htmap settings [OPTIONS]
Options
- --user¶
Display only user settings (the contents of ~/.htmaprc).
status¶
Print a status table for all of your maps.
Transient maps are prefixed with a leading “*”.
htmap status [OPTIONS]
Options
- --state, --no-state¶
Toggle display of component states (defaults to enabled).
- --meta, --no-meta¶
Toggle display of map metadata like memory, runtime, etc. (defaults to enabled).
- --format <format>¶
Select output format: plain text, JSON, compact JSON, or CSV (defaults to plain text)
- Options:
text | json | json_compact | csv
- --live, --no-live¶
Toggle live reloading of the status table (defaults to not live).
- --color, --no-color¶
Toggle colorized output (defaults to colorized).
stderr¶
Look at the stderr for a map component.
htmap stderr [OPTIONS] TAG COMPONENT
Options
- --timeout <timeout>¶
How long to wait (in seconds) for the file to be available. If not set (the default), wait forever.
Arguments
- TAG¶
Required argument
- COMPONENT¶
Required argument
stdout¶
Look at the stdout for a map component.
htmap stdout [OPTIONS] TAG COMPONENT
Options
- --timeout <timeout>¶
How long to wait (in seconds) for the file to be available. If not set (the default), wait forever.
Arguments
- TAG¶
Required argument
- COMPONENT¶
Required argument
transplants¶
Manage transplant installs.
htmap transplants [OPTIONS] COMMAND [ARGS]...
info¶
Display information on available transplant installs.
htmap transplants info [OPTIONS]
remove¶
Remove a transplant install by index.
htmap transplants remove [OPTIONS] INDEX
Arguments
- INDEX¶
Required argument
vacate¶
This command vacates a map. The running components of a vacated map will give up their claimed resources and become idle again.
Checkpointing maps will still have access to their last checkpoint, and will resume from it as if execution was interrupted for any other reason.
htmap vacate [OPTIONS] [TAGS]...
Options
- -p, --pattern <pattern>¶
Act on maps whose tags match glob-style patterns. Pass -p multiple times for multiple patterns.
- --all¶
Act on all maps.
Arguments
- TAGS¶
Optional argument(s)
version¶
Print HTMap and HTCondor Python bindings version information.
htmap version [OPTIONS]
wait¶
Wait for maps to complete.
htmap wait [OPTIONS] [TAGS]...
Options
- -p, --pattern <pattern>¶
Act on maps whose tags match glob-style patterns. Pass -p multiple times for multiple patterns.
- --all¶
Act on all maps.
Arguments
- TAGS¶
Optional argument(s)
Settings¶
HTMap’s settings are controlled by a global object which you can access as htmap.settings.
For more information on how this works, see htmap.settings.Settings.
Users can provide custom default settings by putting them in a file in their home directory named .htmaprc.
The file is in TOML format.
HTMap can also read certain settings from the environment. When this is possible, it is noted in the description of the setting.
The precedence order is that runtime settings override .htmaprc settings, which override environment settings, which override built-in defaults.
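The precedence order can be pictured as a stack of lookups, searched from the highest-priority layer down. The sketch below models that idea with collections.ChainMap; it is only an illustration under that assumption, not a description of how htmap.settings is implemented, and the values in each layer are made up.

```python
from collections import ChainMap

# Lowest priority first: built-in defaults, then environment, then ~/.htmaprc.
defaults = {"DELIVERY_METHOD": "docker", "WAIT_TIME": 1}
environment = {"DELIVERY_METHOD": "singularity"}  # e.g. read from HTMAP_DELIVERY
user_file = {"DELIVERY_METHOD": "assume"}         # e.g. read from ~/.htmaprc
runtime = {}                                      # changes made in this session

# ChainMap searches its mappings left to right, so the leftmost layer wins.
settings = ChainMap(runtime, user_file, environment, defaults)

before = settings["DELIVERY_METHOD"]  # the user file beats the environment
runtime["DELIVERY_METHOD"] = "transplant"
after = settings["DELIVERY_METHOD"]   # the runtime change beats everything
print(before, after, settings["WAIT_TIME"])
```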
HTMap’s settings are organized into groupings based on TOML headers. The settings inside each group are discussed in the following sections.
At runtime, settings can be found via dotted paths that correspond to the section headers. The sections below list the dotted paths; in the .htmaprc file, each dot-separated prefix is written as a TOML header instead.
Here is an example .htmaprc file:
DELIVERY_METHOD = "docker"
[MAP_OPTIONS]
REQUEST_MEMORY = "250MB"
[DOCKER]
IMAGE = "python:latest"
The equivalent runtime Python commands to set those settings would be
import htmap
htmap.settings["DELIVERY_METHOD"] = "docker"
htmap.settings["MAP_OPTIONS.REQUEST_MEMORY"] = "250MB"
htmap.settings["DOCKER.IMAGE"] = "python:latest"
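To make the correspondence between dotted paths and TOML headers concrete, here is a small sketch that stores dotted-path assignments in a nested dictionary shaped like the parsed file. The helpers set_dotted and get_dotted are hypothetical and exist only for this illustration; they are not part of HTMap.

```python
def set_dotted(settings, dotted_path, value):
    """Store value under a dotted path, creating one nested dict per header."""
    *headers, key = dotted_path.split(".")
    section = settings
    for header in headers:
        section = section.setdefault(header, {})
    section[key] = value


def get_dotted(settings, dotted_path):
    """Look up a dotted path by descending one level per path segment."""
    value = settings
    for part in dotted_path.split("."):
        value = value[part]
    return value


nested = {}
set_dotted(nested, "DELIVERY_METHOD", "docker")
set_dotted(nested, "MAP_OPTIONS.REQUEST_MEMORY", "250MB")
set_dotted(nested, "DOCKER.IMAGE", "python:latest")
print(nested)
print(get_dotted(nested, "DOCKER.IMAGE"))
```

The resulting dictionary has the same shape as the example file: one top-level key and two sections named after the headers.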
Settings¶
These are the top-level settings. They do not belong to any header.
HTMAP_DIR
- the path to the directory to use as the HTMap directory.
If not given, defaults to ~/.htmap.
DELIVERY_METHOD
- the name of the delivery method to use.
The different delivery methods are discussed in Dependency Management.
Defaults to docker.
Inherits the environment variable HTMAP_DELIVERY.
WAIT_TIME
- how long to wait between polling for component statuses, files existing, etc.
Measured in seconds.
Defaults to 1 (1 second).
CLI
- set to True automatically when HTMap is being used from the CLI.
Defaults to False.
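The role of a polling interval such as WAIT_TIME, together with an optional timeout like the one on the stdout and stderr CLI commands, can be sketched as a simple loop. This is a generic illustration, not HTMap's code; the function name wait_for_path and its parameters are assumptions made for the example.

```python
import tempfile
import time
from pathlib import Path


def wait_for_path(path, wait_time=1.0, timeout=None):
    """Poll until path exists.

    Returns True once the path exists, or False if timeout (in seconds)
    elapses first. With timeout=None the loop waits forever.
    """
    start = time.monotonic()
    while not Path(path).exists():
        if timeout is not None and time.monotonic() - start >= timeout:
            return False
        time.sleep(wait_time)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "output.txt"
    found_before = wait_for_path(target, wait_time=0.01, timeout=0.05)
    target.write_text("done")
    found_after = wait_for_path(target, wait_time=0.01, timeout=0.05)

print(found_before, found_after)
```

A shorter interval notices changes sooner at the cost of more filesystem checks; a longer one does the opposite.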
MAP_OPTIONS¶
Any settings in this section are passed to every MapOptions as keyword arguments.
HTCONDOR¶
SCHEDULER
- the address of the HTCondor scheduler (see htcondor.Schedd).
If set to None, HTMap discovers the local scheduler automatically.
Defaults to None.
Inherits the environment variable HTMAP_CONDOR_SCHEDULER.
COLLECTOR
- the address of the HTCondor collector (see htcondor.Collector).
If set to None, HTMap discovers the local collector automatically.
Defaults to None.
Inherits the environment variable HTMAP_CONDOR_COLLECTOR.
DOCKER¶
These settings control how the docker delivery method works.
IMAGE
- the path to the Docker image to run components with.
Defaults to 'continuumio/anaconda3:latest'.
If the environment variable HTMAP_DOCKER_IMAGE is set, that will be used as the default instead.
SINGULARITY¶
These settings control how the singularity delivery method works.
IMAGE
- the path to the Singularity image to run components with.
Defaults to 'docker://continuumio/anaconda3:latest'.
If the environment variable HTMAP_SINGULARITY_IMAGE is set, that will be used as the default instead.
TRANSPLANT¶
These settings control how the transplant delivery method works.
DIR
- the path to the directory where the zipped Python install will be cached.
Defaults to a subdirectory of HTMAP_DIR named transplants.
ALTERNATE_INPUT_PATH
- a string that will be used in the HTCondor transfer_input_files option instead of the local file path.
If set to None, the local path will be used (the default).
This can be used to override the default file transfer mechanism.
ASSUME_EXISTS
- if set to True, assume that the zipped Python install already exists.
Most likely, you will need to set ALTERNATE_INPUT_PATH to an existing zipped install.
Defaults to False.
Dependency Management¶
Dependency management for Python programs is a thorny issue in general, and
running code on computers that you don’t own is even thornier.
HTMap provides several methods for ensuring that the software that your code
depends on is available for your map components.
This could include other Python packages like numpy or tensorflow, or
external software like gcc.
There are two halves of the dependency management game.
The first is on “your” computer, which we call submit-side.
This could be your laptop running a personal HTCondor pool,
or an HTCondor “submit node” that you ssh to,
or whatever other way you access your HTCondor pool.
The other side is execute-side, which isn’t really a single place:
it is all of the execute nodes in the pool that your map components might run on.
Submit-side dependency management can be handled using standard Python package
management tools.
We recommend using miniconda as your package manager
(https://docs.conda.io/en/latest/miniconda.html).
HTMap itself requires that execute-side can run a Python script using a Python
install that also has htmap installed.
That Python installation also needs whatever other packages your code needs to
run.
For example, if you import numpy in your code, you need to have numpy
installed execute-side.
As mentioned above, HTMap provides several “delivery methods” for getting that Python installation to the execute location. The built-in delivery methods are
docker
- runs in a (possibly user-supplied) Docker container.
singularity
- runs in a (possibly user-supplied) Singularity container.
shared
- runs with the same Python installation used submit-side.
assume
- assumes that the dependencies have already been installed at the execute location.
transplant
- copies the submit-side Python installation to the execute location.
More details on each of these methods can be found below.
The default delivery method is docker, with the default image htcondor/htmap-exec:<version>,
where <version> matches the version of HTMap you are using submit-side.
If your pool can run Docker jobs and your Python code does not depend on any
custom packages
(i.e., you never import any modules that you wrote yourself),
this default behavior will likely work for you without requiring any changes.
See the section below on Docker if this isn’t the case!
Attention
HTMap can transfer inputs and outputs between different minor versions of Python 3, but it can’t magically make features from later Python versions available. For example, if you run Python 3.6 submit-side you can use f-strings in your code. But if you use Python 3.5 execute-side, your code will hit syntax errors because f-strings were not added until Python 3.6. We don’t actually test cross-version transfers though, and we recommend running exactly the same version of Python on submit and execute.
HTMap cannot transfer inputs and outputs between different versions of cloudpickle.
Ensure that you have the same version of cloudpickle installed everywhere.
If you see an exception on a component related to cloudpickle.load, this is the most likely culprit.
Note that you may need to manually upgrade/downgrade your submit-side or execute-side cloudpickle.
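One way to compare the two sides is to collect the same version report in both places and look for differences. The sketch below uses only the standard library; the function name environment_report is hypothetical, and how you run it execute-side (for example, as a mapped function) is up to you.

```python
import sys
from importlib import metadata


def environment_report(packages=("cloudpickle", "htmap")):
    """Return the Python version and the installed version of each package.

    Packages that are not installed are reported as None.
    """
    report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


report = environment_report()
print(report)
```

If the cloudpickle entries differ between the submit-side and execute-side reports, that mismatch is the first thing to fix.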
Run Inside a Docker Container¶
In your ~/.htmaprc file:
DELIVERY_METHOD = "docker"
[DOCKER]
IMAGE = "<repository>/<image>:<tag>"
At runtime:
htmap.settings["DELIVERY_METHOD"] = "docker"
htmap.settings["DOCKER.IMAGE"] = "<repository>/<image>:<tag>"
In this mode, HTMap will run inside a Docker image that you provide.
Remember that this Docker image needs to have the htmap module installed.
The default Docker image is htcondor/htmap-exec, which is based on Python 3 and has many useful packages pre-installed.
If you want to use your own Docker image, just change the 'DOCKER.IMAGE' setting.
Your Docker image needs to be pushed to Docker Hub
(or some other Docker image registry that your HTCondor pool can access)
to be usable.
For example, a very simple Dockerfile that can be used with HTMap is
FROM python:3
RUN pip install --no-cache-dir htmap
This would create a Docker image with the latest versions of Python 3 and htmap installed.
From here you could install more Python dependencies, or add more layers to
account for other dependencies.
Attention
More information on building Docker images for use with HTMap can be found in the Docker Image Cookbook.
Warning
This delivery mechanism will only work if your HTCondor pool supports Docker jobs! If it doesn’t, you’ll need to talk to your pool administrators or use a different delivery mechanism.
Run Inside a Singularity Container¶
In your ~/.htmaprc file:
DELIVERY_METHOD = "singularity"
[SINGULARITY]
IMAGE = "<image>"
At runtime:
htmap.settings["DELIVERY_METHOD"] = "singularity"
htmap.settings["SINGULARITY.IMAGE"] = "<image>"
In this mode, HTMap will run inside a Singularity image that you provide.
Remember that this Singularity image needs to have the cloudpickle module installed.
Singularity can also use Docker images.
Specify a Docker Hub image as
htmap.settings['SINGULARITY.IMAGE'] = "docker://<repository>/<image>:<tag>"
to download a Docker image from Docker Hub and automatically use it as a
Singularity image.
For consistency with Docker delivery, the default Singularity image is docker://continuumio/anaconda3:latest, which has many useful packages pre-installed.
If you want to use your own Singularity image, just change the 'SINGULARITY.IMAGE' setting.
Warning
This delivery mechanism will only work if your HTCondor pool supports Singularity jobs! If it doesn’t, you’ll need to talk to your pool administrators or use a different delivery mechanism.
Note
When using this delivery method, HTMap will discover python3 on the
system PATH and use that to run your code.
Warning
This delivery method relies on the directory /htmap/scratch either
existing in the Singularity image, or Singularity being able to run
with overlayfs.
If you get a stderr message from Singularity about a bind mount
directory not existing, that’s the problem.
Assume Dependencies are Present¶
In your ~/.htmaprc file:
DELIVERY_METHOD = "assume"
At runtime:
htmap.settings["DELIVERY_METHOD"] = "assume"
In this mode, HTMap assumes that a Python installation with all Python dependencies is already present. This will almost surely require some additional setup by your HTCondor pool’s administrators.
Transplant Existing Python Install¶
In your ~/.htmaprc file:
DELIVERY_METHOD = "transplant"
At runtime:
htmap.settings["DELIVERY_METHOD"] = "transplant"
If you are running HTMap from a standalone Python install
(like an Anaconda installation),
you can use this delivery mechanism to transfer a copy of your entire Python
install.
All locally-installed packages (including pip -e “editable” installs) will
be available.
For advanced transplant functionality, see TRANSPLANT.
Note
The first time you run a map after installing/removing packages, you will need to wait while HTMap re-zips your installation. Subsequent maps will use the cached version.
HTMap uses pip to check whether the cached Python is current, so make
sure that pip is installed in your Python.
Warning
This mechanism does not work with system Python installations (which you shouldn’t be using anyway!).
Note
When using the transplant method the transplanted Python installation will be used to run the component, regardless of any other Python installations that might exist execute-side.
Version History¶
v0.6.1¶
This version is a drop-in replacement for v0.6.0, except that it relaxes the version requirements for several dependencies to accommodate upcoming changes to the pip dependency resolver.
Known Issues¶
HTMap does not currently allow “directory content transfers”, which is an HTCondor feature where naming a directory in transfer_input_files with a trailing slash transfers the contents of the directory, not the directory itself. (If you try it, the directory itself will be transferred, as if you had not used a trailing slash). Issue: #215
Execution errors that result in the job being terminated but no output being produced are still not handled entirely gracefully. Right now, the component state will just show as ERRORED, but there won’t be an actual error report.
Map component state may become corrupted when a map is manually vacated. Force-removal may be needed to clean up maps if HTCondor and HTMap disagree about the state of their components. Issue: #129
v0.6.0¶
The big new features in this release are:
Improved support for input and output file transfer (inputs/outputs can come from/be sent to remote locations, i.e., not the submit machine).
A new delivery method, shared, where HTMap will use the same Python executable detected submit-side when executing (this supports HTCondor pools that use shared filesystems to make a Python installation universally available).
New Features/Improvements¶
Add the shared delivery method, which supports HTCondor pools that use shared filesystems to make Python installations available universally. Suggested by Duncan Macleod. Issues/PRs: #195, #198, #200
HTMap now supports getting input files from remote destinations (i.e., not from the submit machine) via existing input file auto-discovery. Just use the revised TransferPath in your mapped function arguments, and HTMap will arrange for the file to be transferred to your map component! PR: #216
HTMap now supports sending output files to destinations that are not the submit machine via HTCondor’s transfer_output_remaps mechanism. Output files can be sent to various locations, such as an S3 service. See the new output_remaps argument of MapOptions and the revised TransferPath, as well as the new tutorial Transferring Output to Other Places for more details on how to use this feature. PR: #216
Massive documentation upgrades courtesy of Scott Sievert! Issues/PRs: #208, #191, #202, #221
The HTMap CLI (normally accessed by running htmap) can now also be accessed by running python -m htmap. Issue: #190
The HTMap CLI now supports autocompletion on commands and tags. Run htmap autocompletion from the command line to add the necessary setup to your shell startup script.
The HTMap CLI logs command now has a --view option which, instead of just printing the path to the HTMap log file, displays its contents.
Changed/Deprecated Features¶
htmap.Map.exists has replaced htmap.Map.is_removed. It has exactly the opposite semantics (it is only True if the map has not been successfully removed). PR: #221
htmap.ComponentStatus is now a subclass of str, so (for example) "COMPLETED" can be used in place of htmap.ComponentStatus.COMPLETED.
Item access ([]) on Map.stdout, Map.stderr, and Map.output_files is now non-blocking and will raise FileNotFound exceptions if accessed before available. The blocking API (with a timeout) is still available via the get method.
The HTMap CLI version command now also prints HTCondor Python bindings version information. Added htmap --version that only prints HTMap version information.
Several HTMap CLI commands now support explicit enable/disable flags instead of just one or the other. The default behaviors were not changed.
The name of the function used to register delivery methods changed to register_delivery_method() (from register_delivery_mechanism).
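To show what being a subclass of str buys you, here is a stand-alone sketch of a string-valued enumeration. It is not HTMap's actual class definition; the class below is a hypothetical stand-in that lists only a few status names mentioned in these docs.

```python
import enum


class ComponentStatus(str, enum.Enum):
    """Illustrative stand-in: each member is also a plain string."""

    IDLE = "IDLE"
    COMPLETED = "COMPLETED"
    ERRORED = "ERRORED"


status = ComponentStatus.COMPLETED

# Because members are str instances, they compare equal to their values,
# so a bare string can be used wherever a member is expected.
matches_plain_string = status == "COMPLETED"
is_string = isinstance(status, str)
print(matches_plain_string, is_string)
```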
Bug Fixes¶
HTMap is now less sensitive to job event logs becoming corrupted.
Type hints are now more correct on more functions (but not fully correct on all functions, bear with us!).
Known Issues¶
HTMap does not currently allow “directory content transfers”, which is an HTCondor feature where naming a directory in transfer_input_files with a trailing slash transfers the contents of the directory, not the directory itself. (If you try it, the directory itself will be transferred, as if you had not used a trailing slash). Issue: #215
Execution errors that result in the job being terminated but no output being produced are still not handled entirely gracefully. Right now, the component state will just show as ERRORED, but there won’t be an actual error report.
Map component state may become corrupted when a map is manually vacated. Force-removal may be needed to clean up maps if HTCondor and HTMap disagree about the state of their components. Issue: #129
v0.5.1¶
New Features¶
Deprecated Features¶
Bug Fixes¶
Maps can now be force-removed even if the schedd cannot be contacted. Graceful removal still requires contact with the schedd. Issue: https://github.com/htcondor/htmap/issues/186
Known Issues¶
Execution errors that result in the job being terminated but no output being produced are still not handled entirely gracefully. Right now, the component state will just show as ERRORED, but there won’t be an actual error report.
Map component state may become corrupted when a map is manually vacated. Force-removal may be needed to clean up maps if HTCondor and HTMap disagree about the state of their components. Issue: https://github.com/htcondor/htmap/issues/129
v0.5.0¶
New Features¶
HTMap CLI commands that operate on tags can now pattern-match for tags using glob syntax. Try adding -p "<pattern>" to commands like htmap remove or htmap release! Issue: https://github.com/htcondor/htmap/issues/159
Component status tracking is now preserved between sessions, so it won’t be performed from scratch every time. This will only work if the HTCondor Python bindings version is 8.9.3 or greater. You can upgrade your bindings version roughly-independently of HTMap by running pip install --upgrade htcondor. Issue: https://github.com/htcondor/htmap/issues/166
htmap.Map, htmap.MapStdOut, htmap.MapStdErr, and htmap.MapOutputFiles now all support the in operator to check if a component index is in the map.
Deprecated Features¶
The various iteration methods on htmap.Map no longer have a callback argument.
Bug Fixes¶
It should now be much harder to accidentally get a dangling, inaccessible map due to an interrupted remove. Issue: https://github.com/htcondor/htmap/issues/127
When an execution error occurs, the exception and traceback will be printed to stderr execute-side (in addition to being brought back submit-side). This should make some debugging patterns work as expected. Issue: https://github.com/htcondor/htmap/issues/178
The CLI command htmap status --live now has much better behavior when the table width is nearly the width of the terminal. It should now never wrap unless the table is actually wider than the terminal, instead of a few characters before the actual width.
HTMap now handles late materialized jobs much more smoothly: maps with unmaterialized components can be removed, and various CLI commands that output color won’t fail when acting on maps with unmaterialized components. However, unmaterialized components do not show as IDLE, which mirrors the behavior of condor_q. This does make it hard to know how many components are in a late-materialized map at a glance; we are thinking about how to address this. Issue: https://github.com/htcondor/htmap/issues/158
Known Issues¶
Execution errors that result in the job being terminated but no output being produced are still not handled entirely gracefully. Right now, the component state will just show as ERRORED, but there won’t be an actual error report.
Map component state may become corrupted when a map is manually vacated. Force-removal may be needed to clean up maps if HTCondor and HTMap disagree about the state of their components. Issue: https://github.com/htcondor/htmap/issues/129
v0.4.4¶
New Features¶
Bug Fixes¶
In execution error reports, local variables with very long string forms are now cut down to a smaller size.
Known Issues¶
Execution errors that result in the job being terminated but no output being produced are still not handled entirely gracefully. Right now, the component state will just show as ERRORED, but there won’t be an actual error report.
Map component state may become corrupted when a map is manually vacated. Force-removal may be needed to clean up maps if HTCondor and HTMap disagree about the state of their components. Issue: https://github.com/htcondor/htmap/issues/129
v0.4.3¶
New Features¶
Bug Fixes¶
CLI stdout and stderr commands were broken, but are now fixed.
Add the missing parts of the /.singularity.d directory that will make v0.4.2 Singularity support actually work.
Known Issues¶
Execution errors that result in the job being terminated but no output being produced are still not handled entirely gracefully. Right now, the component state will just show as ERRORED, but there won’t be an actual error report.
Map component state may become corrupted when a map is manually vacated. Force-removal may be needed to clean up maps if HTCondor and HTMap disagree about the state of their components. Issue: https://github.com/htcondor/htmap/issues/129
v0.4.2¶
New Features¶
Bug Fixes¶
Map.errors and Map.error_reports() now work when there is a mix of holds and errors in the map. Previously, held components would cause both of these to raise MapComponentHeld when trying to access them in that situation. Issue: https://github.com/htcondor/htmap/issues/165
Requirements statement merging was broken when any of the three sources of requirements (settings, function-level map options, and individual-map map options) were not given. Requirements from all sources are now properly merged, regardless of whether any of them actually exist. Issue: https://github.com/htcondor/htmap/issues/168
Top-level settings that were dictionaries (like MAP_OPTIONS) did not behave correctly when elements of them were set; they did not inherit the old settings. These kinds of settings are now properly inherited, but expect breaking changes in the Settings API next release to resolve the underlying issues. Issue: https://github.com/htcondor/htmap/issues/169
The htmap-exec Docker image should now cleanly export to Singularity. Issue: https://github.com/htcondor/htmap/issues/173
Known Issues¶
Execution errors that result in the job being terminated but no output being produced are still not handled entirely gracefully. Right now, the component state will just show as ERRORED, but there won’t be an actual error report.
Map component state may become corrupted when a map is manually vacated. Force-removal may be needed to clean up maps if HTCondor and HTMap disagree about the state of their components. Issue: https://github.com/htcondor/htmap/issues/129
v0.4.1¶
New Features¶
Bug Fixes¶
Fixed a bug where maps submitted with late materialization would choke on the “cluster submit” event when reading their event log. Band-aided for now.
Known Issues¶
Execution errors that result in the job being terminated but no output being produced are still not handled entirely gracefully. Right now, the component state will just show as ERRORED, but there won’t be an actual error report.
Map component state may become corrupted when a map is manually vacated. Force-removal may be needed to clean up maps if HTCondor and HTMap disagree about the state of their components. Issue: https://github.com/htcondor/htmap/issues/129
v0.4.0¶
New Features¶
HTMap can now transfer output files! See the new recipe: Output Files and the new htmap.transfer_output_files() function.
HTMap’s default Docker image is now htcondor/htmap-exec, which is produced from a Dockerfile in the HTMap git repository. It is based on continuumio/anaconda3, with htmap itself installed as well. Issue: https://github.com/htcondor/htmap/issues/152
Redid htmap.Map stdout and stderr. They are now attributes that represent sequences over the stdout and stderr from the map components, as strings, respectively.
Acts and Edits on Maps that are not “active” (i.e., that have no components in the HTCondor queue) are now no-ops. Includes a new htmap.Map.is_active property, which is True if any components are still in the queue. Issue: https://github.com/htcondor/htmap/issues/145
Bug Fixes¶
Known Issues¶
Execution errors that result in the job being terminated but no output being produced are still not handled entirely gracefully. Right now, the component state will just show as ERRORED, but there won’t be an actual error report.
Map component state may become corrupted when a map is manually vacated. Force-removal may be needed to clean up maps if HTCondor and HTMap disagree about the state of their components. Issue: https://github.com/htcondor/htmap/issues/129
v0.3.2¶
New Features¶
Bug Fixes¶
Hopefully finally resolved a recurring issue with checkpoint directories being returned to the submit node after execution errors. Issue: https://github.com/htcondor/htmap/issues/128
htmap.Map.error_reports() can now get error reports while part of a map is still running.
Known Issues¶
Execution errors that result in the job being terminated but no output being produced are still not handled entirely gracefully. Right now, the component state will just show as ERRORED, but there won’t be an actual error report.
Map component state may become corrupted when a map is manually vacated. Force-removal may be needed to clean up maps if HTCondor and HTMap disagree about the state of their components. Issue: https://github.com/htcondor/htmap/issues/129
v0.3.1¶
New Features¶
Bug Fixes¶
Live status display will no longer explode if you remove a map out from under it. Issue: https://github.com/htcondor/htmap/issues/144
Fix new htmap rerun command.
Known Issues¶
Execution errors that result in the job being terminated but no output being produced are still not handled entirely gracefully. Right now, the component state will just show as ERRORED, but there won’t be an actual error report.
Map component state may become corrupted when a map is manually vacated. Force-removal may be needed to clean up maps if HTCondor and HTMap disagree about the state of their components. Issue: https://github.com/htcondor/htmap/issues/129
v0.3.0¶
New Features¶
Revised internals on how error information is returned from execute nodes. HTMap now detects runtime errors during component status checks (without too much overhead).
Add singularity delivery method. More revisions needed to use best practices, but it works. Expect major changes in the future…
Add htmap components CLI command, which can print out individual component statuses for a map. For example, htmap components <tag> will print out all of the components for a map and their statuses. htmap components --status ERRORED <tag> will print out only the components whose status is ERRORED.
Some execution errors (especially the kind that result in output not being produced) are now turned into holds by using the submit descriptor ON_EXIT_HOLD.
Reworked CLI rerun command. It now has separate sub-commands for rerunning entire maps or only certain components.
Bug Fixes¶
Known Issues¶
Execution errors that result in the job being terminated but no output being produced are still not handled entirely gracefully. Right now, the component state will just show as ERRORED, but there won’t be an actual error report.
Map component state may become corrupted when a map is manually vacated. Force-removal may be needed to clean up maps if HTCondor and HTMap disagree about the state of their components. Issue: https://github.com/htcondor/htmap/issues/129
Contributing and Developing¶
HTMap is open to contributions!
Please feel free to submit a Pull Request on GitHub.
If you contribute to HTMap, please add your name to the CONTRIBUTORS
file in the repository root (if you want to be listed).
- Development Environment
How to set up an environment for development and testing.
- HTMap Innards
How HTMap does what it does.
- How to Release a New HTMap Version
How to release a new version of HTMap.
HTMap Innards¶
Overview¶
HTMap turns Python functions into HTCondor jobs. There are two levels of wrapping that it does: the function call, with its inputs and outputs (including file transfer) and any possible errors, are implicitly wrapped, but most other HTCondor features, like resource requests and custom submit descriptors, are presented more directly (though still with a Python-oriented interface).
The distinction between these two levels was chosen to provide the maximum amount of “do the expected thing” with the Python parts of running the job while allowing maximum flexibility for the HTCondor parts of the running job. There is no hard line, and different parts of this have moved back and forth over the line during development (namely file transfer, but at one point resource requests were also treated specially).
Guiding Principles¶
The only identifying piece of information about a map a user should ever need is a tag.
Users should never have to directly interact with the filesystem to look at any information about their map.
We should store as little state as possible in memory. Recalculating state of anything but the very largest maps is very fast.
Any state we do store should be duplicated on disk immediately. It should be possible to resubmit (any part of) a map based only on information stored on disk.
Moving Things Around¶
HTMap relies on cloudpickle to move data back and forth between the submit node and the execute nodes. It pickles the Python function that the user provides as well as all of the input, then turns around and submits an HTCondor job cluster using HTCondor’s Python bindings. Instead of directly running user scripts, HTMap uses a script that it controls as the HTCondor executable. It hands the user back an object that can be used to look at the output of the function as well as control the execution of the underlying cluster jobs.
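The round trip described above can be sketched in a single process. This is a simplified illustration, not HTMap's code: it uses the standard library's pickle as a stand-in for cloudpickle so that it runs without third-party packages, and the helper run_component is hypothetical. Standard pickle stores functions by reference, which is one reason a tool like cloudpickle is needed for arbitrary user-defined functions.

```python
import functools
import operator
import pickle  # stand-in for cloudpickle in this sketch

# A picklable "double" function built from standard-library pieces.
double = functools.partial(operator.mul, 2)

# "Submit-side": serialize the function once and each input separately.
func_payload = pickle.dumps(double)
input_payloads = [pickle.dumps(x) for x in range(3)]


def run_component(func_bytes, input_bytes):
    """'Execute-side': load the function and one input, call it, and
    serialize the result for the trip back."""
    func = pickle.loads(func_bytes)
    arg = pickle.loads(input_bytes)
    return pickle.dumps(func(arg))


# Back "submit-side": deserialize the output of each component.
outputs = [pickle.loads(run_component(func_payload, p)) for p in input_payloads]
print(outputs)
```

In a real map the byte payloads would travel as files transferred by HTCondor rather than as in-memory values.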
The run Subdirectory¶
For basic functionality, HTMap itself does not need to be installed on the
execute nodes where the jobs it creates run.
This offers the advantage of being able to use Docker images that only contain
cloudpickle (of which there are many, because it’s installed as part of the
Anaconda distribution) without modification.
Currently, if you want to use checkpointing or output file transfer,
you must also install HTMap execute-side.
In practice, we expect people to install HTMap in their execute image, and all
of the instructions in the docs say to do so.
To accomplish this decoupling, HTMap uses a Python script as its HTCondor executable that
has no dependencies except the Python standard library and cloudpickle.
This script is stored inside the library at htmap/run/run.py.
The transplant delivery method wraps this script with
htmap/run/run_with_transplant.sh, a bash script that handles
unpacking the transplanted install.
A similar script exists for Singularity.
It is critical that the run.py script make all possible efforts to exit
without an error. If the script itself generates an error, it tends to become
very difficult for users to understand what went wrong. For example,
we used to import cloudpickle in the bag of imports at the top of the script.
If cloudpickle wasn’t present in the execute image, the script would
immediately bail out and HTMap wouldn’t understand why; the user would have
to inspect the stderr of the map component (which also wasn’t directly
supported at the time) to understand what went wrong.
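One way to picture the remedy is to defer the import into a code path that can report failures. The sketch below is not the contents of run.py; the function name import_or_report and the shape of the returned report are hypothetical, chosen only to show an import failure being turned into data instead of an uncaught exception.

```python
import importlib
import traceback


def import_or_report(module_name):
    """Try to import a module by name, returning (module, error_report).

    Exactly one element of the pair is None. Because the import happens
    inside the function, a missing dependency is caught and described
    rather than crashing the script before it can say anything useful.
    """
    try:
        return importlib.import_module(module_name), None
    except ImportError:
        report = {
            "missing_module": module_name,
            "traceback": traceback.format_exc(),
        }
        return None, report


# A module that exists (standard library) and one that should not.
module, report = import_or_report("json")
missing, missing_report = import_or_report("surely_not_installed_pkg")
```

A report like this could then be sent back alongside normal outputs, so the user sees the cause without digging through stderr.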
Data Model¶
Each map is tied to a map directory, which is named by a UUID.
The map directories are stored in a subdirectory of the HTMap directory.
The HTMap directory is located according to settings['HTMAP_DIR'] (default ~/.htmap).
The human-readable name of each map is its tag. Tags are stored in a different subdirectory of the HTMap directory, which acts as a file-based map between tags and the names of the map directories. Each tag file’s name is that map’s tag, and the file’s contents are the name of the map directory.
All input, output, and HTCondor metadata (event logs, for example) for a map is stored in its map directory. A single input/output pair is a component, and the components of a map are just referred to by their index in the input iterable.
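The tag-to-directory mapping described above can be sketched with nothing but the standard library. This is an illustration of the layout, not HTMap’s implementation: the subdirectory names maps and tags, and the helper names create_map_dir and map_dir_for_tag, are assumptions made for the example, and a temporary directory stands in for settings['HTMAP_DIR'].

```python
import tempfile
import uuid
from pathlib import Path


def create_map_dir(htmap_dir, tag):
    """Create a UUID-named map directory and a tag file that points to it."""
    maps_dir = htmap_dir / "maps"  # hypothetical subdirectory name
    tags_dir = htmap_dir / "tags"  # hypothetical subdirectory name
    maps_dir.mkdir(parents=True, exist_ok=True)
    tags_dir.mkdir(parents=True, exist_ok=True)

    map_dir = maps_dir / str(uuid.uuid4())
    map_dir.mkdir()
    # The file's name is the tag; its contents are the map directory's name.
    (tags_dir / tag).write_text(map_dir.name)
    return map_dir


def map_dir_for_tag(htmap_dir, tag):
    """Resolve a tag to its map directory by reading the tag file."""
    uid = (htmap_dir / "tags" / tag).read_text()
    return htmap_dir / "maps" / uid


with tempfile.TemporaryDirectory() as tmp:
    htmap_dir = Path(tmp)  # stands in for settings['HTMAP_DIR']
    created = create_map_dir(htmap_dir, "my-map")
    resolved = map_dir_for_tag(htmap_dir, "my-map")
    tag_names = sorted(p.name for p in (htmap_dir / "tags").iterdir())
```

With this layout, renaming a map only requires renaming its tag file; the map directory and everything in it stay where they are.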
Serializing and Deserializing Data¶
HTMap uses a wide variety of data serialization formats, depending on what
needs to be stored.
The names of the directories and files can be found in htmap/names.py.
They are all stored inside the map’s directory.
The itemdata for each map is stored as a JSON-formatted list of dictionaries.
The itemdata is used to call htcondor.Submit.queue_with_itemdata() during
map creation.
The submit object for each map is stored as a JSON-formatted dictionary.
The number of components is stored as a single stringified integer in a file.
The cluster IDs of each HTCondor cluster job associated with the map are stored as newline-separated plain-text strings.
The event log for each HTCondor cluster job is routed to a file inside the map directory.
For generic data, like the inputs and outputs of mapped functions,
HTMap uses cloudpickle.
The individual inputs and outputs for each component are stored in files named
by the component index.
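To make the formats listed above concrete, here is a minimal sketch that writes and reads each kind of file. It is not the htmap.htio implementation: every file and directory name below is hypothetical (the real ones live in htmap/names.py), the helper save_map_files is invented for this example, and the standard-library pickle stands in for cloudpickle so the snippet has no third-party dependencies.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_map_files(map_dir, itemdata, num_components, cluster_ids, inputs):
    """Write each kind of map data in the format described above."""
    # Itemdata: a JSON-formatted list of dictionaries.
    (map_dir / "itemdata").write_text(json.dumps(itemdata))
    # Number of components: a single stringified integer.
    (map_dir / "num_components").write_text(str(num_components))
    # Cluster IDs: newline-separated plain text.
    (map_dir / "cluster_ids").write_text("\n".join(str(c) for c in cluster_ids))
    # Generic data: one pickled file per component, named by its index.
    inputs_dir = map_dir / "inputs"
    inputs_dir.mkdir()
    for index, value in enumerate(inputs):
        (inputs_dir / str(index)).write_bytes(pickle.dumps(value))


with tempfile.TemporaryDirectory() as tmp:
    map_dir = Path(tmp)
    save_map_files(
        map_dir,
        itemdata=[{"component": "0"}, {"component": "1"}],
        num_components=2,
        cluster_ids=[101, 102],
        inputs=[(1,), (2,)],
    )
    # Read everything back while the temporary directory still exists.
    loaded_itemdata = json.loads((map_dir / "itemdata").read_text())
    loaded_num = int((map_dir / "num_components").read_text())
    loaded_ids = (map_dir / "cluster_ids").read_text().split("\n")
    loaded_input_1 = pickle.loads((map_dir / "inputs" / "1").read_bytes())
```

Keeping the small metadata files in plain text or JSON means they can be inspected without Python, while pickling is reserved for arbitrary user data.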
The functions that handle storing and loading these various formats are in
the htmap.htio submodule. All IO should go through methods defined in that
submodule, with the idea that if it becomes necessary to swap out some of the
internal implementations of those methods, the changes will be isolated to
that module.
- htmap.htio.save_object(obj, path)[source]¶
Serialize a Python object (including “objects”, like functions) to a file at the given path.
- Return type:
- htmap.htio.load_object(path)[source]¶
Deserialize an object from the file at the given path.
- Return type:
- htmap.htio.load_objects(path)[source]¶
Deserialize a stream of objects from the file at the given path.
- htmap.htio.save_func(map_dir, func)[source]¶
Save the mapped function to the map directory.
- Return type:
- htmap.htio.save_inputs(map_dir, args_and_kwargs)[source]¶
Save the arguments to the mapped function to the map’s input directory.
- Return type:
- htmap.htio.save_num_components(map_dir, num_components)[source]¶
Save the number of components in a map.
- Return type:
- htmap.htio.load_num_components(map_dir)[source]¶
Load the number of components in a map.
- Return type:
- htmap.htio.save_submit(map_dir, submit)[source]¶
Save a dictionary that represents the map’s htcondor.Submit object.
- Return type:
- htmap.htio.load_submit(map_dir)[source]¶
Load an htcondor.Submit object that was saved using save_submit().
- Return type:
Submit
- htmap.htio.save_itemdata(map_dir, itemdata)[source]¶
Save the map’s itemdata as a list of JSON dictionaries.
- Return type:
- htmap.htio.load_itemdata(map_dir)[source]¶
Load itemdata that was saved using save_itemdata().
Development Environment¶
Repository Setup¶
You can get HTMap’s source code by cloning the git repository:
git clone https://github.com/htcondor/htmap.
If you are planning on submitting a pull request, you should instead
clone your own fork of the repository.
After cloning the repository,
install the development dependencies using your Python package manager.
If you are using pip, you would run pip install -e .[tests,docs]
from the repository root.
The dependencies (development and otherwise) are listed in setup.cfg.
Warning
The HTCondor Python bindings are currently only available via PyPI on Linux.
On Windows you must install HTCondor itself to get them.
On Mac, you’re out of luck.
Install pre-commit manually, then use the development container to run
the test suite/build the documentation.
One of the dependencies you just installed is pre-commit. pre-commit
runs a series of checks whenever you try to commit. You should “install” the
pre-commit hooks by running pre-commit install in the repository root.
You can run the checks manually at any time by running pre-commit.
Do not commit to the repository before running pre-commit install!
Development Container¶
HTMap’s test suite relies on a properly set-up environment.
The simplest way to get that environment is to use the Dockerfile in
docker/Dockerfile to produce a development container.
The repository includes a bash script named dr (docker run)
in the repository root that will let you quickly build and execute commands
in a development container.
Attention
The dr script bind-mounts your local copy of the repository into the
container. Any edits you make outside the container will be reflected
inside (and vice versa).
Anything you pass to dr will be executed inside the container.
By default (i.e., if you pass nothing) you will get a bash shell.
The initial working directory is the htmap repository inside the container.
Running the Test Suite¶
The best way to run the test suite is to run pytest inside the
development container:
$ ./dr
# ...
mapper@161b6af91d72:~/htmap$ pytest
The test suite can be executed in parallel by passing the -n option.
pytest -n 4 seems to work well on laptops, while desktops can
probably handle -n 10.
See pytest-xdist for more details on parallel execution.
The test suite is very slow when run serially; we highly recommend running
with a large number of workers.
See the pytest docs or run pytest --help for more information on
pytest itself.
Building the Docs¶
HTMap’s documentation is served by Read the Docs, which builds the docs as well. The docs are deployed automatically on each commit to master, so they can be updated independently of a version release for minor adjustments.
It can be helpful to build the docs locally during development.
We recommend using sphinx-autobuild to serve the documentation via a local webserver
and automatically rebuild the documentation when changes are made to the
package source code or the documentation itself.
We have written a small wrapper script around sphinx-autobuild.
To run it, from inside or outside the development container, run
$ ./dr
# ...
mapper@161b6af91d72:~/htmap$ docs/autobuild.sh
NOTE: CONNECT TO http://127.0.0.1:8000 NOT WHAT SPHINX-AUTOBUILD TELLS YOU
# trimmed; visit URL above
Note the startup message: ignore the link that sphinx-autobuild gives you,
and instead go to http://127.0.0.1:8000 to see the built documentation.
Binder Integration¶
HTMap’s tutorials can be served via Binder.
The tutorials are run inside a specialized Docker container
(not the development container).
To test whether the Binder container is working properly, run the
binder/run.sh script from the repository root
(i.e., not from inside the development container):
$ ./binder/run.sh
It will give you a link to enter into your web browser that will land you in the same Jupyter environment you would get on Binder.
The binder/edit.sh script will do the same, but also bind-mount the
tutorials into the container so that they can be edited in the Jupyter environment.
When preparing a release, run binder/exec.sh and commit the results into
the repository.
How to Release a New HTMap Version¶
To release a new version of HTMap:
Run binder/exec.sh, check that the tutorials executed correctly by loading them up in a Jupyter session, and commit the resulting executed tutorial notebooks into the repository.
Make sure that the version PR actually bumps the version in setup.cfg.
Merge the version PR into master via GitHub.
Make a GitHub release from https://github.com/htcondor/htmap/releases, based on master. Name it exactly vX.Y.Z, and link to the release notes for that version (like https://htmap.readthedocs.io/en/latest/versions/vX_Y_Z.html ) in the description (the page will not actually exist yet).
Delete anything in the dist/ directory in your copy of the repository.
On your machine, make sure master is up-to-date, then run python3 setup.py sdist bdist_wheel to create the source distribution and the wheel.
Install Twine: pip install twine.
Upload to PyPI: python3 -m twine upload dist/*. You will be prompted for your PyPI login.
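The check that the version in setup.cfg matches the vX.Y.Z release name could be automated along these lines. This is a sketch under stated assumptions, not part of HTMap’s release tooling: it assumes the version is declared as the version key of a [metadata] section (the usual setuptools layout), the helper name release_tag_from_setup_cfg is hypothetical, and the version 1.2.3 is a made-up example.

```python
import configparser
import re


def release_tag_from_setup_cfg(setup_cfg_text):
    """Return the vX.Y.Z release name for the version in setup.cfg text."""
    parser = configparser.ConfigParser()
    parser.read_string(setup_cfg_text)
    # Assumption: the version is declared under [metadata] as "version".
    version = parser["metadata"]["version"]
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"unexpected version string: {version!r}")
    return f"v{version}"


# Made-up file contents for illustration only.
example_cfg = """
[metadata]
name = htmap
version = 1.2.3
"""
tag = release_tag_from_setup_cfg(example_cfg)
print(tag)
```

In practice the text would come from reading setup.cfg in the repository root, and the result would be compared against the name chosen for the GitHub release.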
HTMap’s default Docker image is defined by the docker/ directory in this
repository. It is built automatically by Docker Hub; see the builds page.
The Binder-served tutorials also use an image built by Docker Hub (see here),
which is defined by the binder/ directory in this repository.