
Batching task execution with a custom compiler


#1

A client of mine would like to ingest device-generated event data into our platform. I'd like to use Extend to normalize the data I receive into an output that can be processed by my post-task-execution scripts. Since this device-generated data has fairly high throughput (up to 10 req/s), I want to batch the normalization process so I can use one webtask execution to normalize an array of events.

Ideally, my end user would not have to worry about iterating over the array and could just focus on writing a task that implements a normalization function for a single event. My ingestion service would queue events, execute the webtask once with an array of events after a certain time or event count, and receive an array of outputs as the response.
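The queue-and-flush idea on the ingestion side could be sketched like this (the class and parameter names here are my own illustration, not part of any Extend API):

```javascript
// Hypothetical ingestion-side batcher: queues incoming events and flushes
// them to the webtask when the batch reaches maxSize, or when maxWaitMs
// elapses, whichever comes first.
class EventBatcher {
  constructor(flush, { maxSize = 50, maxWaitMs = 1000 } = {}) {
    this.flush = flush;       // e.g. async (events) => POST to the webtask
    this.maxSize = maxSize;
    this.maxWaitMs = maxWaitMs;
    this.queue = [];
    this.timer = null;
  }

  add(event) {
    this.queue.push(event);
    // Flush immediately once the batch is full.
    if (this.queue.length >= this.maxSize) return this.drain();
    // Otherwise start (at most one) timer to flush a partial batch.
    if (!this.timer) this.timer = setTimeout(() => this.drain(), this.maxWaitMs);
  }

  drain() {
    clearTimeout(this.timer);
    this.timer = null;
    const batch = this.queue.splice(0);
    if (batch.length) return this.flush(batch);
  }
}

// Usage sketch: flush every 2 events (or after 1s of inactivity).
const batches = [];
const batcher = new EventBatcher(events => batches.push(events), {
  maxSize: 2,
  maxWaitMs: 1000,
});
batcher.add({ id: 1 });
batcher.add({ id: 2 }); // triggers an immediate flush of [{id:1},{id:2}]
```

In a real service the `flush` callback would POST the array to the webtask URL and fan the response array back out to callers.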

My initial thinking is that this is a problem best solved by a custom compiler; however, I'm struggling with how best to execute the webtask script inside the compiler. Should I be using eval?


#2

@alexjmathews Yes, compilers are ideally suited to this problem. When implementing a compiler, you can compile the user script into a function by using the options.nodejsCompiler function provided in the options passed to the compiler itself. For an example, check out https://github.com/tjanczuk/wtc/blob/master/class_compiler.js#L6. This helper function provides an easy way for a custom compiler to interpret the user script as a Node.js module that exports a single function. In fact, this is the default way in which Extend compiles JavaScript code when no custom compiler is used.

Using a custom compiler, you can assume the user-specified function provides the processing logic for a single entry in the array, and implement the logic that iterates over the array and consolidates results in the compiler itself. That logic would invoke the user function for every element of the array.
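A minimal sketch of that shape, assuming the usual webtask compiler contract (a module exporting `(options, cb)`, where `options.script` is the user code and `options.nodejsCompiler` compiles it); the local `fakeNodejsCompiler` harness at the bottom is purely my own stand-in for testing outside Extend:

```javascript
// Sketch of a batch compiler: compile the user's single-event function,
// then hand back a task that maps it over an array of events.
const batchCompiler = (options, cb) => {
  options.nodejsCompiler(options.script, (error, userFunc) => {
    if (error) return cb(error);
    // The compiled task receives an array and returns an array of results.
    cb(null, (events, done) => {
      Promise.all(events.map(e => userFunc(e)))
        .then(results => done(null, results))
        .catch(done);
    });
  });
};

// Local harness (NOT part of the Extend API): evaluates the script as a
// CommonJS-style module so the compiler can be exercised without Extend.
const fakeNodejsCompiler = (script, cb) => {
  const module = { exports: undefined };
  new Function('module', script)(module);
  cb(null, module.exports);
};

// A user script that normalizes one event at a time.
const userScript = `module.exports = async function (event) {
  return { id: event.id, normalized: true };
};`;

batchCompiler({ script: userScript, nodejsCompiler: fakeNodejsCompiler }, (err, task) => {
  if (err) throw err;
  task([{ id: 1 }, { id: 2 }], (e, out) => {
    if (e) throw e;
    console.log(JSON.stringify(out));
    // → [{"id":1,"normalized":true},{"id":2,"normalized":true}]
  });
});
```

The end user only ever writes the single-event function; the batching lives entirely in the compiler.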


#3

Great! I’ve got an example working with a custom compiler that will run a batch process if it detects a particular key in a JSON request body.

I'm statically serving a JavaScript file with my compiler and specifying it in meta. If I want to use an npm package in my compiler, would I be forced to bundle my compiler and its dependencies into a single file? I'm unsure how to guarantee that the correct package and version are available to my compiler if I plan on preinitializing webtasks with this compiler.


#4

Hey Alex!

We have a new way to write compilers as middleware, which are composable. Docs on this are in progress, but you can see an example here. The advantage is that this type of compiler can be plugged in alongside different middleware.


#5

Regarding module dependencies: you can package up a compiler as a Node module published to npm and specify its dependencies in that module's package.json. When a task references the compiler module, all of its dependencies will be pulled in.
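As a sketch, the compiler module's package.json would look something like this (the package name and dependency are illustrative, not a real published compiler):

```json
{
  "name": "my-batch-compiler",
  "version": "1.0.0",
  "main": "index.js",
  "dependencies": {
    "raw-body": "^2.3.0"
  }
}
```

With the module published to npm, the task can reference it by name and version, so you get a reproducible dependency set instead of a hand-bundled file.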


#6

Regarding the example, I think I may be a little confused as to what’s going on in this file, specifically regarding the events variable; however, I’ll stay tuned for the middleware documentation for the next version of our compiler.


#7

I tried bundling, which worked successfully for my left-pad demo, but when I tried to bundle with async I hit the compiler size limit! Haha. I'll stick with your recommendation to use an npm dependency as a compiler, but I'm probably going to run through a bunch of versions before I get a working iteration. I'll try to come up with some local tests to develop the compiler against.


#8

For developing a compiler locally, one easy path is to use Webtask itself. Bobby's post shows how: https://goextend.io/blog/securing-webtasks-part-2-using-middleware. You can also just use a gist. Both are possible because a compiler can be specified as a URL rather than a module.


#9

Regarding the confusion / docs, understood. @jedimaster is working on updating the docs as we speak.


#10

Hey @alexjmathews, one non-obvious cause of the size-limit error is the bundler attempting to bundle your dependencies. This happens when the package.json containing your dependencies is in a different directory than your webtask.js code.

For example, if you have a project folder with package.json in its root and all of your code in, say, a src folder, then when you run wt create ./src/webtask.js --bundle, the package.json file is ignored by the CLI and all your dependencies are bundled into a single file.

If you relocate the package.json into the same directory as the webtask, the CLI will actually read the dependencies from it and add them as metadata instead of attempting to bundle them.

The intention of the --bundle flag is to bundle only user code, not dependencies.

This is an example of the typical project structure that I use: https://github.com/NotMyself/webtask-slackin

Note that the package.json and task.js are in the root directory together.


#11

Hello Alex, I’ve got a sample for you. It is rough - and brittle - but it seems to work well. My plan is to polish this up a bit later in the month for a blog post, but I wanted to get something helpful to you now.

In my example, the webtask processes a lead. This is the logic I wrote in my webtask:

module.exports = async function(lead) {
  if (lead.value && lead.value > 100) lead.vip = true;
  return lead;
};

Notice I’ve changed the form here and expect just one object (lead) passed in. I then wrote middleware such that it would expect an array of values passed in, execute the task, collect the results, and return the array. Again, this is brittle.

const rawBody = require('raw-body');

/*
This middleware is passed an array of leads and expects the webtask
to handle each one.
*/

module.exports = () => {

  return (req, res, next) => {
    let handler = req.webtaskContext.compiler.script;
    req.webtaskContext.compiler.nodejsCompiler(handler, (e, func) => {
      if (e) return next(e);

      rawBody(req, { encoding: 'utf-8' }, (err, body) => {
        if (err) return next(err);

        let data;
        try {
          data = JSON.parse(body);
        } catch (parseErr) {
          return next(parseErr);
        }

        // Invoke the user's single-lead function for every element.
        let promises = data.map(d => func(d));

        Promise.all(promises)
          .then(results => {
            res.end(JSON.stringify(results));
          })
          .catch(next);

      });

    });
  };

};

In my testing, I was able to POST an array of values like so:

[
  {"name":"ray","value":0},
  {"name":"ray2","value":900},
  {"name":"cam","value":9000}
]

and got back:

[
  {
    "name": "ray",
    "value": 0
  },
  {
    "name": "ray2",
    "value": 900,
    "vip": true
  },
  {
    "name": "cam",
    "value": 9000,
    "vip": true
  }
]

I used Surge to publish my middleware. If you want to recreate this test, you can use this URL, but I can’t promise it will be there forever: http://miniature-writing.surge.sh/test.js

Let me know what you think. As I said, I plan on cleaning this up and making a nicer blog post out of it, as well as updating our core docs on the topic. I hope this helps!


#12

I forgot to share, but here is the blog post I wrote on this: https://goextend.io/blog/batch-processing-with-compilers