This tutorial is a first step towards deploying GroIMP on a compute cluster. At the moment it is more a proof of concept than an established way of using GroIMP.
The idea is to have several pods running the GroIMP API server GroLink, plus one pod or job that schedules tasks to the GroLink pods. This pod or job could later be a web service or a deployment job, but for this tutorial it is just a small Python terminal pod that we connect to and run our script on by hand. The script then uses Python multiprocessing to send API calls to the different GroLink pods in parallel and collects the results in a CSV file.
The Python script will create a small data set for a sensitivity analysis of a very simple model and execute the model.
To run a model we need a model, but since this tutorial is not about modelling, statistics or analysis, the model can be very simple. It grows a tree with first-order branches and prints the crown radius after every step. The radius depends on the branching angle and the length factor (lenV), which are the values we are going to test in our analysis.
The idea of the sensitivity analysis using GroLink is explained in this tutorial.
//model.rgg
import parameters.*;

module Bud(int order,float len) extends Sphere(0.1);

protected void init ()
[
    Axiom ==> Bud(0,1);
]

public void run ()
[
    Bud(0,len) ==> F (len)
        [RL(parameters.angle) F(len*parameters.lenV) Bud(1,len*parameters.lenV/2)]
        RH(90)[RL(parameters.angle) F(len*parameters.lenV) Bud(1,len*parameters.lenV/2)]
        RH(90)[RL(parameters.angle) F(len*parameters.lenV) Bud(1,len*parameters.lenV/2)]
        RH(90)[RL(parameters.angle) F(len*parameters.lenV) Bud(1,len*parameters.lenV/2)]
        Bud(0,len*parameters.lenV);
    Bud(1,len) ==> F (len) Bud(1,len*parameters.lenV);
    {getRadius();}
]

public void getRadius(){
    println(max(location((*Bud*)).x)+0.2);
}

// param/parameters.rgg
static float lenV=0.9;
static float angle=45;
To follow along, a Kubernetes cluster and an installed version of kubectl are required. For a start I would suggest using a local virtual cluster such as minikube, so that nothing important breaks if something goes wrong.
We start by creating the namespace “grolinktutorial” on the cluster with the following command:
kubectl create namespace grolinktutorial
The namespace groups the resources of this tutorial, and roles define what the pods in it are allowed to do. In our case the only additional permission we need is that a pod can see other pods (so our terminal pod can find the GroLink pods). For this we can simply use the built-in role named “system:node”.
Therefore we need a ClusterRoleBinding that defines who has this role. In our case we just bind it to the default service account of the namespace, because this is the simplest option. The following should be stored in rolebinding.yaml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: podreaderbinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node
subjects:
- kind: ServiceAccount
  name: default
  namespace: grolinktutorial
Now we can apply the binding using kubectl on the command line. Because system:node is a built-in ClusterRole, no separate role file has to be applied:
kubectl apply -f rolebinding.yaml
For the GroLink pods we create a deployment that uses the default GroIMP Docker image from GitLab and starts the GroLink API on it with -a api. For now we only create 3 replicas, since we are working with a simulated cluster. We also define a matchLabels selector so we can later figure out which pods are actually running the API.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: groimp-grolink
  namespace: grolinktutorial
  labels:
    app: grolink
spec:
  replicas: 3
  selector:
    matchLabels:
      app: grolink
  template:
    metadata:
      labels:
        app: grolink
    spec:
      containers:
      - name: grolink
        image: registry.gitlab.com/grogra/groimp:latest
        args: ["-a", "api"]
Store this in grolinkDeploy.yaml and apply it in the same way as the role binding:
kubectl apply -f grolinkDeploy.yaml
For our terminal pod we are not very picky: we just need a pod that runs forever and can execute Python code. We can therefore use a Python image with some dependencies installed and let it run Python's built-in web server (we don't need this server at all, but without a long-running process the pod would exit).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: terminal
  namespace: grolinktutorial
  labels:
    app: terminal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: terminal
  template:
    metadata:
      labels:
        app: terminal
    spec:
      containers:
      - name: terminal
        image: registry.gitlab.com/grogra/groimp-models/sensi1/gropysensbase:latest
        args: ["python","-m", "http.server"]
We also need to deploy this:
kubectl apply -f terminalDeploy.yaml
To test whether all pods are running as intended, we can list all pods in our namespace:
kubectl get pods --namespace grolinktutorial
Now we can run code on our little setup for the first time. To test whether everything is working, let's open a workbench on each pod, list the files in it and close it again. To find all the pods we use the Python kr8s library on our namespace, with a label selector that checks whether the app label of the pod is grolink. Then we can use the IP addresses of the pods with the GroPy library.
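from GroPy import GroPy
import kr8s

podIPs = []
selector = {'app': 'grolink'}
# collect the IP addresses of all pods labelled app=grolink
for podS in kr8s.get("pods", namespace="grolinktutorial", label_selector=selector):
    print(podS.status.podIP)
    podIPs.append(podS.status.podIP)

for i in podIPs:
    link = GroPy.GroLink("http://"+i+":58081/api/")
    # open a new workbench, list its files and close it again
    wb = link.createWB().run().read()
    print(wb.listFiles().run().read())
    # like the other calls, close() has to be sent with run()
    wb.close().run()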
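A pod that is still starting may not have an IP address yet, in which case building the URL by string concatenation would fail. The following is a minimal sketch of how the address list could be filtered first. The helper name grolink_urls and the SimpleNamespace stand-ins for the pod objects are assumptions made for illustration; only the port 58081 and the /api/ path come from the script above.

```python
from types import SimpleNamespace

GROLINK_PORT = 58081  # port used by the GroLink API in this tutorial


def grolink_urls(pods, port=GROLINK_PORT):
    """Return one API base URL per pod that already has an IP address."""
    urls = []
    for pod in pods:
        ip = getattr(pod.status, "podIP", None)
        if ip:  # skip pods without an address (for example still pending)
            urls.append(f"http://{ip}:{port}/api/")
    return urls


# Hypothetical stand-ins for the objects returned by kr8s.get("pods", ...)
fake_pods = [
    SimpleNamespace(status=SimpleNamespace(podIP="10.244.0.5")),
    SimpleNamespace(status=SimpleNamespace(podIP=None)),
    SimpleNamespace(status=SimpleNamespace()),
]
print(grolink_urls(fake_pods))
```

In the real script the list returned by kr8s.get would take the place of fake_pods.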
To run this on the terminal pod we have to first get the name of the terminal pod using the command
kubectl get pods --namespace grolinktutorial
In this list one name should start with “terminal-”, this is the one we want.
Now we can use kubectl to copy the python file with the code from above on this pod:
kubectl -n grolinktutorial cp run.py terminal-<second-part-of-the-name>:/app
and run it with
kubectl -n grolinktutorial exec terminal-<second-part-of-the-name> -- python run.py
Now, with the connection working, we can run our model for the first time. To do so we first need to copy our model to the terminal pod:
kubectl -n grolinktutorial cp model.gsz terminal-XXX:/app
Then we can open this project using the GroPy library. It is important to open it with the content of the gsz file and not with a path to it, because the API server runs on another system and cannot see the files of the terminal pod.
Then we can update and execute our file as we know it from other API examples:
from GroPy import GroPy
import kr8s

podIPs = []
selector = {'app': 'grolink'}
for podS in kr8s.get("pods", namespace="grolinktutorial", label_selector=selector):
    podIPs.append(podS.status.podIP)

# create link to the first pod
link = GroPy.GroLink("http://"+podIPs[0]+":58081/api/")
# open the workbench with a POST request
wb = link.openWB(content=open("model.gsz",'rb').read()).run().read()
# change the parameters of the simulation
wb.updateFile("param/parameters.rgg",bytes("""
static float lenV=4;
static float angle=12;
""",'utf-8')).run()
wb.compile().run()
# execute the run function
data = wb.runRGGFunction("run").run().read()
print(data)
# close the workbench
wb.close().run()
If we then update our run.py file on the terminal pod and execute it, we get a result like: {'console': ['0.2'], 'logs': []}
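The value we are interested in is the first console line of this response. Below is a small sketch of how it could be extracted defensively, assuming the response is a dict shaped like the example output above. The helper name parse_radius is hypothetical and not part of GroPy.

```python
def parse_radius(response):
    """Return the crown radius printed by the model as a float.

    Expects a dict with a 'console' list, as in the example output above.
    """
    console = response.get("console", [])
    if not console:
        raise ValueError("the model printed nothing to the console")
    return float(console[0])


print(parse_radius({"console": ["0.2"], "logs": []}))
```

Raising an error on an empty console makes a failed compile or run visible immediately instead of surfacing later as an IndexError.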
To now use the potential of the cluster, we need to send requests to all the pods in parallel. This can be done from our one terminal pod using the python multiprocessing library.
Using this library we can initialize a pool of “workers” that are each linked to one API server. Depending on the number of workers, several of them can be linked to the same server, since the API server is multi-threaded.
With this pool of workers we can then work through a list of parameter sets, push each set into a simulation and collect the results in a file.
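The link between workers and servers can be pictured as a round-robin assignment. The sketch below uses placeholder strings instead of real GroLink links, and the function name assign_round_robin is made up for this example. It only illustrates how 9 workers would be spread over 3 servers.

```python
from collections import Counter


def assign_round_robin(servers, worker_count):
    """Assign one server to each worker, cycling through the server list."""
    return [servers[i % len(servers)] for i in range(worker_count)]


servers = ["pod-a", "pod-b", "pod-c"]  # placeholders for the GroLink links
assignment = assign_round_robin(servers, 9)
print(Counter(assignment))
```

If the number of workers is not a multiple of the number of servers, the first servers in the list simply receive one worker more.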
To generate the input data we can use the saltelli.sample function from SALib. It creates a set of input parameter combinations within the defined ranges:
problem = {
    'num_vars': 2,
    'names': ['lenV', 'angle'],
    'bounds': [[0.1, 1],[30, 70]]
}
param_values = saltelli.sample(problem, 2**4)  # create parameter set
Here 2**4 is the base sample size N = 16. With two parameters and SALib's default second-order setting, saltelli.sample returns N*(2*2+2) = 96 parameter sets. We keep N low for now because we work on a simulated cluster and don't want to cause trouble.
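How many model runs a given sample size leads to can be estimated in advance. The sketch below assumes the row counts documented for SALib's Saltelli sampler, N*(2D+2) with second-order terms (the default) and N*(D+2) without. The function name saltelli_rows is made up for this example.

```python
def saltelli_rows(n_base, num_vars, second_order=True):
    """Number of parameter sets produced by Saltelli sampling."""
    factor = 2 * num_vars + 2 if second_order else num_vars + 2
    return n_base * factor


print(saltelli_rows(2**4, 2))  # base sample size 16, two parameters
print(saltelli_rows(2**2, 2))  # smaller sample for a quick test
```

Since every parameter set is later executed for 10 growth steps, the number of calls to the run function is ten times the row count.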
First we create a list with links to all the API servers:
links = []
selector = {'app': 'grolink'}
for podS in kr8s.get("pods", namespace="grolinktutorial", label_selector=selector):
    links.append(GroPy.GroLink("http://"+podS.status.podIP+":58081/api/"))
Then we can use this list to create a queue long enough to “feed” all “workers” with the links.
WORKERCOUNT = 9
pods = multiprocessing.Queue()
n = len(links)
for i in range(0,WORKERCOUNT):
    pods.put(links[i%n])
This queue is required so that the workers can be initialized in parallel using the following function:
# initialize each worker
def init_worker(function,queue):
    function.cursor = queue.get().openWB(content=open("model.gsz",'rb').read()).run().read()
init_worker runs once in every worker process: it takes one link from the queue, opens the model on that server and stores the resulting workbench as function.cursor. Because each worker takes a different entry from the queue, each one ends up with its own workbench.
The actual growth function is not much different from the one we used above to test our model for the first time. The only difference is that we can use grow.cursor as the workbench, because we know it has already been initialized by init_worker. Since each parameter set from SALib arrives as a single row, we unpack it in the first line:
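The pattern of storing per-worker state as an attribute of the task function can be tried out without a cluster. The following sketch replaces the GroLink links by plain strings and the model run by a square. The names work, run_demo and the server names are assumptions made for illustration; multiprocessing.Pool with its initializer and initargs parameters is standard library.

```python
import multiprocessing


def init_worker(function, queue):
    # Runs once in every worker process: take one entry from the queue
    # and remember it as an attribute of the task function.
    function.server = queue.get()


def work(x):
    # work.server was set by init_worker in the process running this call
    return (work.server, x * x)


def run_demo(worker_count=3):
    servers = ["server-a", "server-b", "server-c"]  # placeholder names
    queue = multiprocessing.Queue()
    for i in range(worker_count):
        queue.put(servers[i % len(servers)])
    with multiprocessing.Pool(processes=worker_count,
                              initializer=init_worker,
                              initargs=(work, queue)) as pool:
        return pool.map(work, range(6))
```

Calling print(run_demo()) from a script, inside an if __name__ == "__main__": guard, returns the squares in input order, each paired with the server of the worker that handled it. Which worker handles which item is not fixed and can differ between runs.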
# the actual execution
def grow(val):
    lenV, angle = val
    results = []
    # overwrite the parameters in the file
    grow.cursor.updateFile("param/parameters.rgg",bytes("""
static float lenV="""+str(lenV)+""";
static float angle="""+str(angle)+""";
""",'utf-8')).run()
    grow.cursor.compile().run()
    for x in range(0,10):  # execute 10 times
        data = grow.cursor.runRGGFunction("run").run().read()
        results.append(float(data['console'][0]))
    return results
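Building the parameter file by string concatenation works, but it can be easier to read and test when moved into a helper. This is a minimal sketch; the function name render_parameters is hypothetical, and converting through float() is an assumption that the values arrive as numbers (plain or NumPy).

```python
def render_parameters(lenV, angle):
    """Build the content of param/parameters.rgg as UTF-8 bytes."""
    text = (
        f"static float lenV={float(lenV)};\n"
        f"static float angle={float(angle)};\n"
    )
    return text.encode("utf-8")


print(render_parameters(0.9, 45).decode("utf-8"))
```

The returned bytes could be passed to updateFile in place of the inline bytes(...) expression.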
In the final step we initialize a multiprocessing pool using the init_worker function, with the grow function and pods queue as parameters and map this pool on the generated input values.
Finally we can transform our result into an array and save it in a CSV file.
# Multi processing
pool = multiprocessing.Pool(processes=WORKERCOUNT,initializer=init_worker, initargs=(grow,pods,))
results = pool.map(grow,param_values)
pool.close()
y = np.array(results)
# save result
np.savetxt("result.csv", y, delimiter=",")
After we put this all together and run it as we did above, we can read our csv file through our terminal pod:
kubectl -n grolinktutorial exec terminal-XXXX -- cat result.csv
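Each row of result.csv belongs to one parameter set and each column to one growth step. As a small sketch of how the output could be processed further, the following reads such text with the standard csv module and keeps the radius after the last step. The helper name final_radii and the sample values are made up for illustration.

```python
import csv
import io


def final_radii(csv_text):
    """Return the last value of every row (radius after the final step)."""
    rows = csv.reader(io.StringIO(csv_text))
    return [float(row[-1]) for row in rows if row]


sample = "0.2,1.1,1.9\n0.2,0.8,1.3\n"  # made-up values, three steps per row
print(final_radii(sample))
```

The resulting list has one entry per parameter set, in the same order as the rows of param_values.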
For simplicity you can find the last python code here in one file:
import numpy as np
from SALib.sample import saltelli
from GroPy import GroPy
import multiprocessing
import kr8s
from kr8s.objects import Pod

WORKERCOUNT = 9

# defining the problem
problem = {
    'num_vars': 2,
    'names': ['lenV', 'angle'],
    'bounds': [[0.1, 1],[30, 70]]
}
param_values = saltelli.sample(problem, 2**2)  # create parameter set

# creating a link for each pod
links = []
selector = {'app': 'grolink'}
for podS in kr8s.get("pods", namespace="grolinktutorial", label_selector=selector):
    print("x"+podS.status.podIP)
    links.append(GroPy.GroLink("http://"+podS.status.podIP+":58081/api/"))

# create a queue to assign pods to workers
pods = multiprocessing.Queue()
n = len(links)
for i in range(0,WORKERCOUNT):
    pods.put(links[i%n])

# initialize each worker
def init_worker(function,pods):
    function.cursor = pods.get().openWB(content=open("model.gsz",'rb').read()).run().read()

# the actual execution
def grow(val):
    lenV, angle = val
    results = []
    # overwrite the parameters in the file
    grow.cursor.updateFile("param/parameters.rgg",bytes("""
static float lenV="""+str(lenV)+""";
static float angle="""+str(angle)+""";
""",'utf-8')).run()
    grow.cursor.compile().run()
    for x in range(0,10):  # execute 10 times
        data = grow.cursor.runRGGFunction("run").run().read()
        results.append(float(data['console'][0]))
    return results

# Multi processing
pool = multiprocessing.Pool(processes=WORKERCOUNT,initializer=init_worker, initargs=(grow,pods,))
results = pool.map(grow,param_values)
pool.close()
y = np.array(results)
# save result
np.savetxt("result.csv", y, delimiter=",")