The second-easiest way is to run the tutorials in a Docker container on your computer. Run
docker run -p 8888:8888 htcondor/htmap-tutorials
and follow the instructions it gives you to get into the Jupyter environment. Then go to tutorials/first-steps.ipynb in the file browser and open it to get back to this point.
Alternatively, you might want to start running HTMap on your HTCondor pool right away. This tutorial assumes that you’ve already installed HTMap on your HTCondor pool’s submit node, or that you have access to HTMap through a JupyterHub server connected to an HTCondor pool (or something similar). See “How do I install HTMap?” for details.
This tutorial also assumes that you’re working in a Jupyter Notebook, but it works just as well in the Python REPL. Later, once you get the hang of things, you’ll be ready to use HTMap in scripts as well. Either way, you’ll need to be on a computer that can submit jobs to an HTCondor pool.
This tutorial assumes that you have already set up your dependency management, as described in Dependency Management. If your HTCondor pool supports Docker, you’ll be good to go with the default settings.
The tutorials in this series are written as Jupyter Notebooks. If you click the “View page source” link in the upper right corner, you can grab the raw .ipynb file yourself and step through it along with the tutorial.
Suppose you’ve been given the task of writing a function that doubles numbers, like this:
def double(x):
    return 2 * x
If you want to double a list of numbers, you might do something like
doubled = [double(x) for x in range(10)]
print(doubled)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Alternatively, you can use the built-in function map(), which applies a function to each element of an iterable (like a list):
mapped = map(double, range(10))
print(mapped)
doubled = list(mapped)
print(doubled)
<map object at 0x7f7ae8393390>
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
In both cases, doubled is the list [0, 2, 4, ...]. We need the list call because map returns a lazy iterator over the results, not the results themselves, so you have to iterate over it to get the output. That is exactly what list does: it iterates over its input and collects the elements into a list.
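To make the "iterator, not results" point concrete, here is a small sketch using only the standard library. The helper double_and_log and the calls list are hypothetical names introduced for illustration; they simply record when each call to double actually happens.

```python
def double(x):
    return 2 * x

calls = []  # hypothetical log of which inputs have been processed so far

def double_and_log(x):
    # Record each call so we can see *when* map actually does the work.
    calls.append(x)
    return double(x)

mapped = map(double_and_log, range(5))
print(calls)           # [] (nothing has been computed yet)

first = next(mapped)   # pull exactly one result out of the iterator
print(first, calls)    # 0 [0]

rest = list(mapped)    # list() consumes everything that is left
print(rest, calls)     # [2, 4, 6, 8] [0, 1, 2, 3, 4]

print(list(mapped))    # [] (the iterator is exhausted after one pass)
```

Because the work happens only as the iterator is consumed, a map object can be traversed just once; if you need the results again, keep the list that list() gave you.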
Now suppose that, for some reason, you want to double a lot of numbers. So many numbers that you can’t bear to do all the work on your own computer. It takes days to multiply all the numbers, and if your program crashes halfway through, you lose all of your progress and have to start over. You’re losing sleep, and your boss is breathing down your neck because they need those numbers doubled now.
Luckily, you remember that you have access to an HTCondor high-throughput computing pool. Since each of your function calls is isolated from all the others, the computers in the pool don’t need to talk to each other at all, and you can achieve a huge speedup. The pool can run your code on hundreds or thousands of computers simultaneously, storing the inputs and outputs for you and recovering from individual errors gracefully. It’s the perfect solution.
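The reason isolated calls parallelize so easily can be sketched locally, without HTCondor or HTMap at all. This is a stand-in, not what HTMap does internally: it uses the standard library's concurrent.futures thread pool to spread independent calls to double across a few workers on one machine and checks that the answer matches the serial version. The worker count of 4 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor

def double(x):
    return 2 * x

# Each call to double depends only on its own argument, so the calls can be
# handed to independent workers in any order without changing the answer.
with ThreadPoolExecutor(max_workers=4) as pool:
    # Executor.map yields results in input order, even if workers
    # finish out of order.
    doubled_parallel = list(pool.map(double, range(10)))

doubled_serial = [double(x) for x in range(10)]
print(doubled_parallel == doubled_serial)  # True
```

Threads on a single machine won’t actually speed up CPU-bound Python code, and they give you none of the pool’s bookkeeping or error recovery; the sketch only shows that a map over independent calls has the same shape whether it runs serially or spread out across workers.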
The problem is: how do you get your code running in the pool?
With HTMap, it’s like this:
import htmap

mapped = htmap.map(double, range(10))
print(mapped)
doubled = list(mapped)
print(doubled)
Created map super-busy-dog with 10 components
Map(tag = super-busy-dog)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
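It may take some time for the second print to produce any output, because list(mapped) has to wait for the map’s jobs to finish running in the pool and send back their results.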
In the next tutorial we’ll start digging into the extra features that HTMap provides on top of this basic functionality.