I previously wrote a post about Reactively talking to Cloud Foundry with Groovy. In this post, I want to discuss something of keen interest: tuning reactor flows.
When you use Project Reactor to build an application, is the style a bit new? Just trying to keep your head above water? Perhaps you haven’t even thought about performance. Well at some point you will. Because something big will happen. Like a 20,000 req/hour rate limit getting dropped on your head.
Yup. My development system mysteriously stopped working two weeks ago. I spotted some message about “rate limit exceeded” and rang up one of my friends in the Ops department to discover my app was making 43,000 req/hour. Yikes!
As I poured over the code (big thanks to the Ops team giving me a spreadsheet showing the biggest-to-smallest calls), I started to spot patterns that seemed like things I had seen before.
Reactor tuning a lot like SQL tuning
Long long ago, I learned SQL. As the saying goes, SQL isn’t rocket science. But understanding what is REALLY happening is the difference between a query taking twenty minutes vs. sub-second time to run.
So let’s back up and refresh things. In SQL, when you join two tables, it produces a cartesian product. Essentially, a table with n rows + a table with m rows, will produce a table with n x m rows, combining every possible pair. From there, you slim it down based on either relationships or based on filtering the data. What DBMS engines have had decades is learning is how to read your query and figure out the BEST order to do all these operations. For example, many queries will apply filtering BEFORE building the cartesian product.
In Reactor, when you generate a flux of data and then flatmap it to another flux, you’re doing the same thing. My reactor flow, meant to cache up a list of apps for Spinnaker, would scan a list of eighty existing apps and then perform a domain lookup…eighty times! Funny thing is, they were looking up the same domain EIGHTY TIMES! (SQL engines have caching…Reactor doesn’t…yet).
So ringing up my most experienced Reactor geek, he told me that it’s more performant to simply fetch all the domains in one call, first, and THEN do the flatmap against this in memory data structure.
Indexing vs. full table scans
When I learned how to do EXPLAIN PLANs in SQL, I was ecstatic. That tool showed me exactly what was happening in what order. And I would be SHOCKED at how many of my queries performed full table scans. FYI: they’re expensive. Sometimes it’s the right thing to do, but often it isn’t. Usually, searching every book in the library is NOT as effective as looking in the card catalog.
So I yanked the code that did a flatmap way at the end of my flow. Instead, I looked up ALL domains in a CF space up front and passed along this little nugget of data hop-to-hop. Then when it came time to deploy this knowledge, I just flatmapped against this collection of in memory of data. Gone were all those individual calls to find each domain.
.map(function((org, app, environments) -> Mono.when(
This code block, done right after fetching application details, pauses to getAllDomains(). Since it should only be done once, we only need one instance from our passed along data structure. The collection is gathered, wrapped up in a nice Mono, and passed along with the original apps. Optionally, if there are no domains, an empty is passed along.
(NOTE: Pay it no mind that after all this tweaking, the Ops guy pointed out that routes were ALREADY included in the original application details call, hence eliminating the need for this. The lesson on fetching a whole collection up front can be useful.)
To filter or not to filter, that is the question
Filtering is an art form. Simply put, a filter is a function to reduce rows. Being a part of both Java 8’s Stream API as well as Reactor’s Flux API, it’s pretty well known.
The thing is to watch out for if the filter operation is expensive and if it’s inside a tight loop.
Loop? Reactor flows don’t use loops, right? Actually, that’s what flatmaps really are. When you flatmap something, you are embedding a loop to go over every incoming entry and possibly generating a totally different collection. If this internal operation inside the flapmap involves a filter that makes an expensive call, you might be repeating that call too many times.
I used to gather application details and THEN apply a filter to find out whether or not this was a Spinnaker application vs. someone else’s non-Spinnaker app in the same space. Turns out, finding all those details was expensive. So I moved the filter inward so that it would be applied BEFORE looking up the expensive details.
Look at the following code from getApplications(client, space, apps):
return requestApplications(cloudFoundryClient, apps, spaceId)
applicationResource.getEntity().getEnvironmentJsons() != null &&
.map(resource -> Tuples.of(cloudFoundryClient, resource))
.switchIfEmpty(t -> ExceptionUtils.illegalArgument("Applications %s do not exist", apps));
The code above is right AFTER fetching application information, but BEFORE going to related tables to find things such as usage, statistics, etc. That way, we only go for the ones we need.
Sometimes it’s better to fetch all the data, fetch all the potential filter criteria, and merge the two together. It requires a little more handling to gather this together, but again this is what we must do to tailor such flows.
Individual vs. collective fetching
Something I discovered was that several of the Cloud Foundry APIs have an “IN” clause. This means you can feed it a collection of values to look up. Up until that point, I was flatmapping my way into these queries, meaning that for each application name in my flux, it was making a separate REST call for one.
Peeking at the lower level APIs, I spotted where I could give it a list of application ids vs. a single one. To do that, I had to write my flow. Again. By putting together a collection of ids, by NOT flatmapping against them (which would unpack them), but instead using collectList, I was able to fetch the next hop of data in one REST call (not eight), shown below:
.requestClientV2Resources(page -> client.spaces()
cf-java-client has an handy utility to wrap paged result sets, iterating and gathering the results…reactively. Wrapped inside is the gold: client.spaces().listApplications(). There is a higher level API, the operations API, but it’s focus is replicating the CF CLI experience. The CF CLI isn’t built to do bulk operations, but instead operate on one application at a time.
While nice, it doesn’t scale. At some point, it can a be a jump to move to the lower level APIs, but the payoff is HUGE. Anyhoo, by altering this invocation to pass in a list of application names, and following all the mods up the stack, I was able to collapse eighty calls into one. (Well, two, since the page size is fifty).
You reap what you sow
By spending about two weeks working on this, I was able to replace a polling cycle that perform over seven hundred REST calls with less than fifty. That’s basically a 95% reduction in network traffic, and nicely put my app in the safe zone for the newly imposed rate limit.
I remember the Ops guy peeking at the new state of things and commenting, “I’m having a hard time spotting a polling cycle” to which the lead for Cloud Foundry Java Client replied, “sounds like a good thing.”
Yes it was. A VERY good thing.