
Load simulation & latency percentiles

Skeema models how your architecture behaves as traffic grows — from 100 to 100,000 concurrent users — and reports the numbers engineers actually use to reason about performance. This page explains those numbers from first principles.

The two questions performance answers

Every performance discussion comes down to two measurements:

Latency
How long a single request takes, end to end. Measured in milliseconds (ms). Lower is better.
Throughput
How many requests the system handles per unit time — requests per second (RPS) or queries per second (QPS). Higher is better.

They’re related but distinct: a system can have low latency at low load and still collapse under high throughput. Skeema simulates both — it ramps throughput (the user tier) and reports the resulting latency.

Why averages lie — and percentiles don’t

The average (mean) latency hides the experience of your slowest users. If 99 requests take 50 ms and one takes 5,000 ms, the average is ~100 ms — which no single user experienced. Percentiles describe the distribution instead.
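The example above can be checked directly. The following is a minimal sketch, not part of Skeema: the `percentile` helper is a hypothetical name and uses the nearest-rank definition, which is one of several common ways to compute a percentile.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# The distribution from the text: 99 fast requests and one very slow one.
latencies_ms = [50] * 99 + [5000]

mean_ms = statistics.mean(latencies_ms)   # 99.5
p50_ms = percentile(latencies_ms, 50)     # 50
p99_ms = percentile(latencies_ms, 99)     # 50
max_ms = percentile(latencies_ms, 100)    # 5000

print(f"mean={mean_ms} ms  P50={p50_ms} ms  P99={p99_ms} ms  max={max_ms} ms")
```

The mean (99.5 ms) matches neither group: 99 requests took 50 ms and one took 5,000 ms. The percentiles report what actually happened: 99% of requests finished in 50 ms, and the slowest 1% took far longer.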

| Percentile | Reads as | Meaning |
| --- | --- | --- |
| P50 | Median | Half of requests are faster than this. The “typical” experience. |
| P90 | 90th percentile | 9 in 10 requests are faster than this. |
| P95 | 95th percentile | Common SLO target; only 1 in 20 requests is slower. |
| P99 | 99th percentile | Tail latency. 1 in 100 requests is slower: your worst real experiences. |
Read P99 as “P-ninety-nine”
“P99 = 1.7s” means 99% of requests finish in under 1.7 seconds, and the slowest 1% take longer. At scale, that 1% is a lot of people: at 10,000 requests/second, 100 requests every second are slower than P99.

Why the tail (P99) matters more than you’d think

A single user action often fans out into many backend calls. If one page makes 20 service calls and each independently has a 1% chance of being slow, the probability that at least one is slow is 1 − 0.99²⁰ ≈ 18%. So a “1% tail” at the service level becomes a roughly 1-in-5 slow page at the user level. This is why teams set SLOs on P95/P99, not averages.
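The fan-out arithmetic can be reproduced in a few lines. This sketch assumes the calls are slow independently of one another; `p_any_slow` is a hypothetical helper name, not a Skeema function.

```python
def p_any_slow(calls: int, p_slow: float) -> float:
    """Probability that at least one of `calls` independent calls is slow."""
    return 1 - (1 - p_slow) ** calls


# How a 1% per-call tail grows with the number of backend calls per page.
for calls in (1, 5, 20, 50):
    print(f"{calls:>2} calls -> {p_any_slow(calls, 0.01):.1%} of pages hit the tail")
```

With 20 calls the result is about 18%, matching the figure above. If slow calls are correlated in practice (for example, all hitting the same overloaded database), the real number can differ.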

How a bottleneck forms

Every component has a capacity — a ceiling on requests per second. As load climbs toward that ceiling, requests start to queue, and queueing time dominates latency (this is the practical lesson of queueing theory: wait time rises sharply as utilization approaches 100%). The first component to saturate is the bottleneck — and the weakest link sets the throughput of the whole path. Skeema names it explicitly.
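One way to see how sharply wait time rises is the textbook M/M/1 queue, where mean time in the system is service time divided by (1 − utilization). This is an illustrative sketch under that assumption (a single server with random arrivals and service times); it is not necessarily the formula Skeema uses, and `mm1_latency_ms` is a hypothetical name.

```python
def mm1_latency_ms(service_ms: float, utilization: float) -> float:
    """Mean time in an M/M/1 system: service time / (1 - utilization)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1); at 1.0 the queue grows without bound")
    return service_ms / (1 - utilization)


# A component that needs 10 ms per request when idle.
for utilization in (0.10, 0.50, 0.80, 0.90, 0.99):
    latency = mm1_latency_ms(10, utilization)
    print(f"utilization {utilization:.0%}: about {latency:.0f} ms")
```

Under this model, going from 50% to 90% utilization multiplies latency by five (roughly 20 ms to 100 ms), and the last few percent before saturation cost another order of magnitude.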

Synchronous chains add up
Latency along a synchronous path is the sum of each hop. One slow dependency in a chain of blocking calls drags the entire request down — which is why moving non-critical work to an async queue is such a common fix.
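A small sketch of that sum, using hypothetical hop names and latencies. It ignores the (usually small) cost of enqueueing the deferred work, which is a simplifying assumption.

```python
def path_latency_ms(hops, deferred=frozenset()):
    """Sum the latency of every blocking hop; deferred hops run off the request path."""
    return sum(ms for name, ms in hops if name not in deferred)


# Hypothetical request: read the cache, write to the database, send an email.
hops = [("cache", 2), ("postgres", 18), ("email_api", 250)]

fully_sync_ms = path_latency_ms(hops)                            # 270
with_email_queued_ms = path_latency_ms(hops, {"email_api"})      # 20

print(f"all blocking: {fully_sync_ms} ms, email moved to a queue: {with_email_queued_ms} ms")
```

The email still takes 250 ms to send, but the user no longer waits for it.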

How Skeema simulates

The simulation is a transparent heuristic, not a load test:

  • Each node has a base latency by type (e.g. cache ≈ 2 ms, Postgres ≈ 18 ms, external API ≈ 250 ms).
  • A load multiplier scales latency as the user tier rises and utilization climbs.
  • Skeema sums latency along each critical (synchronous) path and adds P95/P99 variance.
  • The node carrying the largest share of load is flagged as the bottleneck with a root cause.
Directional, not absolute
These figures are for comparing architectures and finding weak links — not production SLAs. Real numbers depend on hardware, code, and data. Use the simulation to decide “is adding a read replica worth it?”, then validate with a real load test.
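The four steps above can be sketched roughly as follows. This is an illustration of the general shape only, not Skeema's implementation: the base latencies come from the examples in the text, but the `Node` type, the `capacity_rps` field, the multiplier curve, the P95/P99 factors, and the reading of "largest share of load" as highest utilization are all assumptions made for this sketch.

```python
from dataclasses import dataclass

# Base latencies by node type, taken from the examples in the text.
BASE_LATENCY_MS = {"cache": 2.0, "postgres": 18.0, "external_api": 250.0}


@dataclass
class Node:
    name: str
    kind: str            # key into BASE_LATENCY_MS
    capacity_rps: float  # hypothetical per-node capacity


def load_multiplier(utilization: float) -> float:
    """Assumed curve: latency grows as 1 / (1 - utilization), capped at 95%."""
    return 1.0 / (1.0 - min(utilization, 0.95))


def simulate(path, rps, p95_factor=1.5, p99_factor=2.5):
    """Sum scaled latency along one synchronous path and flag the busiest node."""
    utilization = {n.name: rps / n.capacity_rps for n in path}
    p50 = sum(BASE_LATENCY_MS[n.kind] * load_multiplier(utilization[n.name]) for n in path)
    return {
        "p50_ms": p50,
        "p95_ms": p50 * p95_factor,  # assumed variance factors
        "p99_ms": p50 * p99_factor,
        "bottleneck": max(utilization, key=utilization.get),
    }


path = [
    Node("session-cache", "cache", capacity_rps=50_000),
    Node("orders-db", "postgres", capacity_rps=2_000),
    Node("payments", "external_api", capacity_rps=5_000),
]
result = simulate(path, rps=1_000)
print(result["bottleneck"], round(result["p50_ms"]), round(result["p99_ms"]))
```

In this hypothetical path the database is flagged even though the external API contributes more raw latency, because the database is closest to its capacity and will saturate first as traffic grows.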

The architecture score (A–F)

Alongside latency, Skeema grades the design across four dimensions and rolls them into a single A–F score:

| Dimension | What it rewards |
| --- | --- |
| Reliability | Redundancy, replication, no single points of failure |
| Scalability | Load balancing, caching, async decoupling, horizontal scaling |
| Observability | Monitoring, logging, and tracing components |
| Security | Auth, gateways, and isolation boundaries |

For each issue, Skeema proposes a concrete fix — “add a load balancer”, “add a read replica”, “move email to a queue” — and can apply it to the diagram, anchoring the new nodes next to the ones they relate to.

Key takeaways
  • Latency = how long one request takes; throughput = how many you handle per second.
  • Use percentiles, not averages. P95/P99 (the tail) is what users actually feel at scale.
  • Bottlenecks form when load nears a component’s capacity and requests queue — the weakest link caps the path.
  • Skeema’s simulation is directional: great for comparing designs and finding weak links, not a substitute for a real load test.