Since we are somewhat sensitive to long latencies, we'll set a timeout of 100ms. Based on data we've collected about each task, we'll model each task's latency as a log-normal distribution with σ = 1 and μ = 0. That gives us about 10ms @ 99%.
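As a quick sanity check on that figure: the 99th percentile of a log-normal with μ = 0 and σ = 1 is exp(Φ⁻¹(0.99)) = exp(2.326) ≈ 10.2ms, which a one-liner in R confirms:

```r
# 99th percentile of LogNormal(meanlog = 0, sdlog = 1)
exp(qnorm(0.99))                        # 10.24
qlnorm(0.99, meanlog = 0, sdlog = 1)    # same thing
```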
The question here: what does the latency distribution of the whole system look like? The function d, sketched below, samples from a single task's latency distribution.
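The definition of d isn't reproduced here; given the model above, a minimal R sketch would look like this (assuming the 100ms timeout simply caps each observed latency):

```r
# Draw n task latencies from LogNormal(mu = 0, sigma = 1), capped at the 100ms timeout
d <- function(n) pmin(rlnorm(n, meanlog = 0, sdlog = 1), 100)
```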
After some simulation, we get our answer: the system has 24ms @ 99% latency, 2.4x the latency of any one of its local tasks.
How does concurrency affect the latency distribution? With R, the simulation (sketched below) is a little slow. Here are the results for 1-10 concurrent calls.
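A sketch of what that R simulation might look like, assuming n "concurrent calls" means fanning out n requests and waiting for all of them, so that the observed system latency is the maximum of n samples from d:

```r
# d as sketched above: capped log-normal task latency
d <- function(n) pmin(rlnorm(n, meanlog = 0, sdlog = 1), 100)

# 99th-percentile system latency when waiting on n concurrent calls
p99 <- function(n, trials = 1e5) {
  quantile(replicate(trials, max(d(n))), 0.99)
}

sapply(1:10, p99)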
The work is quadratic in the maximum concurrency level (sweeping 1 through N concurrent calls draws on the order of N² samples in total), so we'd like to go much faster. Let's try out Julia.
It took just a few minutes to implement the simulation in Julia.
Our Julia-based simulation runs at least 10x faster. That'll allow us to look at a lot more data. Here's a plot in R of data simulated in Julia.
The 99th-percentile latency grows roughly logarithmically with the number of concurrent calls; this is the extreme-value behavior of the maximum of many samples, not a consequence of the Central Limit Theorem. A log-linear regression of the simulated data bears this out.
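The summary below is standard R lm output; the call that produced it was presumably along these lines (only the predictor log(d0$X) is visible in the output, so the name of the latency column is a guess):

```r
# d0: the simulated data, with concurrency in X and (assumed) p99 latency in Y
fit <- lm(d0$Y ~ log(d0$X))
summary(fit)
```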
```
Residuals:
    Min      1Q  Median      3Q     Max
-3.2829 -1.0077 -0.1158  0.7845  6.1596

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.8672     0.5922   8.219 8.63e-13 ***
log(d0$X)     7.5954     0.1578  48.133  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.457 on 98 degrees of freedom
Multiple R-squared:  0.9594,    Adjusted R-squared:  0.959
F-statistic:  2317 on 1 and 98 DF,  p-value: < 2.2e-16
```
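In other words, the fitted relationship is roughly 4.87 + 7.60 · ln(X) ms @ 99%, where X is the number of concurrent calls, and the fit explains about 96% of the variance.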
With Julia's performance, our simulations can draw more than 10 times as much data in the same time, which allows us to get verifiable results for even moderately complex systems.
This just in: the Julia simulation for up to 500 concurrent tasks.
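Extrapolating the fit above, we'd expect roughly 4.87 + 7.60 · ln(500) ≈ 52ms @ 99% at 500 concurrent tasks, still well under the 100ms timeout.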
Julia is very fast!