The most common pushback I get on the Throughput Reliability Index is fair: “How do I know this isn’t just another black-box index dressed up in math?”
That skepticism is healthy. Plants have been burned by indices, scorecards, and “maturity scores” that look rigorous on a slide and fall apart the second a supervisor asks how the number was made. If TRI is going to sit between MES data and operating decisions, it needs to be legible. You should be able to read this post and walk away knowing — in plain English — what TRI measures, why it’s built that way, and what it can and can’t tell you.
This is the methodology, stepped out.
The shape of the formula
TRI keeps the same three-factor structure plant people already know from OEE:
OEE = Performance × Availability × Quality
Three factors, multiplied. If any one collapses, the score collapses. That structure is the thing OEE got right. TRI inherits it.
TRI = TF × RF × DF
Three factors, multiplied. Each one answers a different operating question. Each one is computed from the shift-level OEE history your MES already records — no new sensors, no new software, no instrumented plant.
Here is what each factor is doing.
TF — the Throughput Factor
The question: How much are you producing versus how much you said you would?
What it does: TF is the actual output divided by the target output, averaged over a recent trailing window of shifts. It is capped at 1.0.
The cap matters. Without it, a line that overproduces on a few good shifts could mask weeks of underperformance and inflate the score. TRI is not a score for hero shifts. It is a score for whether you can plan around the line. If you ran 110% of target on Tuesday and 60% Wednesday through Sunday, your throughput factor reflects the shortfall, not the spike.
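To make the cap concrete, here is a minimal Python sketch of a capped throughput factor. It assumes the cap is applied to each shift's ratio before averaging, which is what keeps a hero shift from offsetting weak ones. The function name, the `window` parameter, and the sample numbers are illustrative, not part of TRI's published definition.

```python
from statistics import fmean

def throughput_factor(actual, target, window=None):
    """Mean of per-shift actual/target ratios, each capped at 1.0.

    Assumptions (not from the post): the cap is applied per shift, and
    `window` is the number of most recent shifts to include.
    """
    pairs = list(zip(actual, target))
    if window:
        pairs = pairs[-window:]
    if not pairs:
        raise ValueError("need at least one shift")
    return fmean(min(a / t, 1.0) for a, t in pairs)

# One hero shift at 110% of target, then five shifts at 60%.
actual = [110, 60, 60, 60, 60, 60]
target = [100] * 6

tf_capped = throughput_factor(actual, target)
tf_uncapped = fmean(a / t for a, t in zip(actual, target))
print(round(tf_capped, 3), round(tf_uncapped, 3))
```

With these numbers the capped factor is about 0.667, while the uncapped average would be about 0.683: the hero shift earns no extra credit.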
TF is the foundation. It is the dimension OEE already captured. The next two factors are what OEE was never designed to measure.
RF — the Reliability Factor
The question: Can you trust that throughput number shift after shift?
This is the heart of TRI. It is also the factor that justifies the skepticism whenever someone says “the average looks fine.”
RF is built from the coefficient of variation of shift-level OEE — the standard deviation divided by the mean. CV is dimensionless. A line averaging 31% with CV 0.12 is far more consistent than a line averaging 31% with CV 0.25, even though both look the same on a weekly summary.
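As a quick illustration of why CV separates lines that a weekly average cannot, the sketch below computes it for two made-up shift histories with the same mean. Using the population standard deviation is an assumption here; the post does not say which estimator TRI uses.

```python
from statistics import fmean, pstdev

def coefficient_of_variation(shift_oee):
    """Standard deviation divided by mean (dimensionless).

    Assumption: population standard deviation (pstdev); a sample
    estimator would give slightly larger values on short windows.
    """
    mean = fmean(shift_oee)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(shift_oee) / mean

# Two hypothetical lines, both averaging 31% OEE.
steady = [0.30, 0.32, 0.31, 0.29, 0.33]
swingy = [0.15, 0.45, 0.31, 0.20, 0.44]

cv_steady = coefficient_of_variation(steady)
cv_swingy = coefficient_of_variation(swingy)
print(round(cv_steady, 3), round(cv_swingy, 3))
```

Both lists average 0.31, yet the second has a CV several times larger than the first.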
TRI takes that CV and runs it through a smooth decay curve. Low variance maps to a high reliability score. High variance maps to a low one. The relationship is monotonic — more variance always reduces RF, never increases it — and continuous, so a line trending from CV 0.18 to CV 0.22 sees its RF drift gradually rather than tripping a threshold.
What that translates to operationally:
| CV | Approximate RF | What it means |
|---|---|---|
| 0.10 | ~0.82 | Best-in-class consistency. Safe to plan around. |
| 0.20 | ~0.67 | Moderate noise. Planning buffers needed. |
| 0.30 | ~0.55 | High variance. Investigation warranted. |
| 0.50 | ~0.37 | Near-random output at shift level. |
| 0.70+ | <0.25 | Chaotic. Immediate intervention required. |
The decay constant that controls this curve is calibrated against real plant data, not picked from a textbook. I am not publishing the exact value because publishing it invites people to game the curve. But the shape — smooth, monotonic, dimensionless — is the part that matters for trusting the number.
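For readers who want to see what a curve with those properties looks like, here is a minimal sketch using an exponential decay. Both the exponential form and the decay value are assumptions made for illustration: the post publishes neither, and the placeholder constant below is not the calibrated one.

```python
import math

def reliability_factor(cv, decay):
    """Map CV to a score in (0, 1] with a smooth, monotonic decay.

    Assumption: exp(-decay * cv) is just one curve with the stated
    shape; it is not claimed to be TRI's actual function.
    """
    if cv < 0 or decay <= 0:
        raise ValueError("cv must be >= 0 and decay must be > 0")
    return math.exp(-decay * cv)

ILLUSTRATIVE_DECAY = 1.5  # placeholder only, NOT the calibrated constant

for cv in (0.10, 0.18, 0.22, 0.50):
    print(cv, round(reliability_factor(cv, ILLUSTRATIVE_DECAY), 3))
```

Whatever constant is used, a curve of this kind returns 1.0 at zero variance, falls as CV rises, and moves gradually between nearby CV values instead of jumping at a threshold.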
DF — the Direction Factor
The question: Is the line getting better, holding, or getting worse?
A snapshot is not a story. Two lines can both report 65% OEE today. One has been climbing from 55% over six weeks. The other has been falling from 75%. They are not the same line. They do not deserve the same intervention.
DF is built from the slope of the trend over a longer trailing window of shifts — longer than RF’s window, because direction needs more history to stabilize. The slope gets normalized by the line’s mean OEE, which makes the factor comparable across lines running at very different absolute levels.
One important asymmetry: declines are penalized harder than equivalent improvements are rewarded.
That is not arbitrary. It reflects something every plant manager already knows: decline tends to be self-reinforcing. A line that drops 3% one week usually drops more the next unless someone intervenes. Improvement tends to revert. A 3% gain often retraces. The asymmetric weighting bakes that operational reality into the score so DF is harder to fool with one good week.
DF is also bounded. It can pull the score down meaningfully, but it cannot lift the score far above what the throughput and reliability layers support. You cannot trend your way out of a fundamentally underperforming line.
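The pieces described above (a fitted slope, normalization by mean OEE, asymmetric weighting, and bounds) can be sketched as follows. The gains, floor, and ceiling are hypothetical values chosen only to show the mechanics; the real parameters are not published.

```python
from statistics import fmean

def ols_slope(values):
    """Least-squares slope of values against shift index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two shifts to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = fmean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def direction_factor(shift_oee, penalty=8.0, reward=4.0,
                     floor=0.50, ceiling=1.10):
    """Bounded, asymmetric trend factor centered on 1.0.

    All four keyword values are hypothetical. penalty > reward makes a
    decline cost more than an equal improvement earns.
    """
    mean = fmean(shift_oee)
    if mean <= 0:
        raise ValueError("mean OEE must be positive")
    normalized = ols_slope(shift_oee) / mean  # slope relative to the line's level
    gain = reward if normalized >= 0 else penalty
    return max(floor, min(ceiling, 1.0 + gain * normalized))

rising = [0.55 + 0.01 * i for i in range(11)]   # 55% climbing to 65%
falling = [0.65 - 0.01 * i for i in range(11)]  # 65% falling to 55%
flat = [0.60] * 11

df_up, df_down, df_flat = (direction_factor(s) for s in (rising, falling, flat))
print(round(df_up, 3), round(df_down, 3), round(df_flat, 3))
```

The rising and falling series have the same mean and mirror-image slopes, yet in this sketch the decline moves the factor twice as far from 1.0 as the improvement does.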
Why multiplicative, not additive
This is the question I get from analytics people most often.
Add the three factors and you let strengths compensate for weaknesses. A line with great throughput, terrible reliability, and a flat trend would average out to “okay.” That is exactly the line that destroys schedules and erodes trust in the system.
Multiply them and any one collapsing collapses the whole score. A line cannot score well by being reliable and improving while producing nothing. It cannot be high-throughput and deteriorating and still score as “fine.” The multiplicative form forces the score to behave the way an operating leader actually thinks.
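A small numeric comparison shows the difference. The factor values below are invented for a hypothetical line with great throughput, terrible reliability, and a flat trend; the averaging function stands in for any additive scheme.

```python
from math import prod
from statistics import fmean

def additive_score(factors):
    """Equal-weight average: strengths can offset weaknesses."""
    return fmean(factors)

def multiplicative_score(factors):
    """Product: one weak factor drags the whole score down."""
    return prod(factors)

# Hypothetical line: TF = 0.95, RF = 0.30, DF = 1.00
factors = (0.95, 0.30, 1.00)

avg = additive_score(factors)
mult = multiplicative_score(factors)
print(round(avg, 3), round(mult, 3))
```

The average comes out at 0.75, which reads as "okay", while the product is 0.285, which reads as a line you cannot plan around.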
What the score routes to: four operating states
Once you have TRI, every line falls into one of four states. The state — not the score — drives the action.
| State | What it means | What you do |
|---|---|---|
| Strong & Dependable | Near target, low variance, flat or improving. | Plan around it. Document what keeps it here. |
| Strong but Unstable | Good average, high shift-to-shift swings. | Decompose the variance. Stabilize before scaling. |
| Weak but Improving | Below target, but trend is positive and consistent. | Support the trajectory. Reinforce what changed. |
| Weak & Deteriorating | Below target, high or rising variance, downward trend. | Escalate. Root cause analysis. Containment. |
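
One way to picture the routing is a small classifier over the three inputs. This is a simplified sketch: every threshold below is a hypothetical placeholder, the weak states are decided on trend alone, and combinations the table does not describe fall through to "Unclassified" here.

```python
def operating_state(tf, cv, df, tf_strong=0.85, cv_stable=0.15, flat_band=0.01):
    """Route a line to one of the four states in the table above.

    All thresholds are hypothetical: tf_strong marks "near target",
    cv_stable marks "low variance", and flat_band is the tolerance
    around DF = 1.0 that counts as a flat trend.
    """
    strong = tf >= tf_strong
    stable = cv <= cv_stable
    improving = df > 1.0 + flat_band
    declining = df < 1.0 - flat_band

    if strong and stable and not declining:
        return "Strong & Dependable"
    if strong and not stable:
        return "Strong but Unstable"
    if not strong and improving:
        return "Weak but Improving"
    if not strong and declining:
        return "Weak & Deteriorating"
    return "Unclassified"  # combination not covered by the table

print(operating_state(tf=0.95, cv=0.10, df=1.00))
print(operating_state(tf=0.95, cv=0.30, df=1.00))
print(operating_state(tf=0.50, cv=0.12, df=1.05))
print(operating_state(tf=0.50, cv=0.30, df=0.90))
```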
A worked example: same OEE, different lines
This is from a real four-line, 83-day analysis at a food manufacturing facility: hourly OEE data, aggregated to shift level. Line names are anonymized; the numbers are real. Two of the four lines are shown here.
| Line | OEE | CV | TF | RF | DF | TRI |
|---|---|---|---|---|---|---|
| Bravo | 31.7% | 0.248 | 0.488 | 0.609 | 1.007 | 0.299 |
| Charlie | 31.1% | 0.118 | 0.479 | 0.790 | 1.005 | 0.380 |
Read the OEE column on its own and these lines are interchangeable. Both reporting around 31%. Standard Monday meeting: “Bravo and Charlie are running about the same.”
Now read the rest of the row.
Charlie has a CV of 0.118. That is best-in-class consistency. When Charlie says 31%, it means 31% — tight band, predictable, plannable. Its RF lands at 0.790.
Bravo has a CV of 0.248. More than double. Its 31.7% average hides weeks where it ran 43.9% and weeks where it ran 11.3%. Its RF lands at 0.609 — about 23% lower than Charlie’s on the reliability dimension alone.
Both have a flat direction (DF near 1.0), so the trend isn’t the story here. The throughput factors are nearly identical. The only meaningful difference between these two lines is reliability — and it is the difference between a line a planner can commit orders against and a line that quietly generates overtime, expediting, and missed promises.
TRI: 0.299 versus 0.380. Charlie scores roughly 27% higher. Same OEE, completely different operating decisions.
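The arithmetic in this example can be checked directly from the table. The sketch below takes the published TF, RF, and DF values as given (it does not recompute them from shift data) and multiplies them out.

```python
def tri(tf, rf, df):
    """TRI = TF x RF x DF, as defined earlier in the post."""
    return tf * rf * df

# Factor values copied from the table above.
lines = {
    "Bravo": {"tf": 0.488, "rf": 0.609, "df": 1.007},
    "Charlie": {"tf": 0.479, "rf": 0.790, "df": 1.005},
}

scores = {name: round(tri(**f), 3) for name, f in lines.items()}

# How much lower Bravo's RF is, and how much higher Charlie's TRI is.
rf_gap = 1 - lines["Bravo"]["rf"] / lines["Charlie"]["rf"]
tri_gap = scores["Charlie"] / scores["Bravo"] - 1

print(scores)
print(f"RF gap: {rf_gap:.0%}, TRI gap: {tri_gap:.0%}")
```

The products round to 0.299 and 0.380, Bravo's RF is about 23% below Charlie's, and Charlie's TRI is about 27% above Bravo's.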
Every plant manager I have shown this to says the same thing: “I’d rather plan around Charlie.” They already knew it. TRI just gave them a number that matched the intuition — and a way to explain it to a CFO who isn’t in the Monday meeting.
What TRI doesn’t tell you
This is the part most indices skip. Worth saying clearly.
- TRI does not tell you the cause. A low RF says variance lives in this line. It doesn’t tell you whether that variance is in the equipment, the crew, the schedule, or the material. The variance decomposition layer handles that — and that’s a separate computation against the same dataset.
- TRI does not replace OEE, Lean, TPM, or your CMMS. It is a routing layer. It tells those systems where to point first.
- TRI does not work on insufficient history. If you don’t have at least a few weeks of consistent shift-level data, the score reports as provisional and does not escalate. A short window can’t produce a trustworthy reliability or direction read, and pretending otherwise would defeat the point.
- TRI is not an operator scorecard. The factors are line-level. Pushing them down to individual operators is a category error and will get you bad data fast.
Why this is hard to game
The other reasonable skepticism: “If supervisors know how this is calculated, won’t they game it?”
Three reasons that’s harder than it sounds.
First, TRI runs against the same data your MES has always recorded. If a reason code distribution suddenly shifts without a corresponding operational change, or a line’s variance collapses overnight without an obvious process improvement, those patterns are detectable. The system flags them and suspends interpretation rather than reporting a clean score. Anti-gaming checks like this run continuously in the background. I’m deliberately not publishing the specific patterns the system watches for, since that would be the recipe for evading them.
Second, the multiplicative structure punishes narrow optimization. You cannot push one factor up without watching the others. A supervisor who games throughput by skipping changeovers will see reliability and direction collapse in the trailing windows. The score, by design, is harder to lie to than any one underlying number.
Third, every output carries a state label — valid, provisional, suspect, or context-flagged — and downstream actions are gated by that label. A score from a line under “data suspect” cannot be used to justify a decision until the underlying data is reconciled.
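The gating idea can be sketched as a label plus a check that downstream actions must pass. The four label names come from the text above; everything else (the precedence order, the `min_shifts` value, and treating context-flagged scores as gated) is an assumption for illustration.

```python
from enum import Enum

class ScoreLabel(Enum):
    VALID = "valid"
    PROVISIONAL = "provisional"
    SUSPECT = "suspect"
    CONTEXT_FLAGGED = "context-flagged"

def label_score(n_shifts, data_suspect=False, context_flagged=False,
                min_shifts=60):
    """Attach a state label to a score.

    Hypothetical rules: suspect data wins over everything, then short
    history, then a context flag. min_shifts=60 is a placeholder for
    "a few weeks" of shift-level data.
    """
    if data_suspect:
        return ScoreLabel.SUSPECT
    if n_shifts < min_shifts:
        return ScoreLabel.PROVISIONAL
    if context_flagged:
        return ScoreLabel.CONTEXT_FLAGGED
    return ScoreLabel.VALID

def may_drive_decision(label):
    """Assumption: only VALID scores can justify a downstream action."""
    return label is ScoreLabel.VALID

label = label_score(n_shifts=240, data_suspect=True)
print(label.value, may_drive_decision(label))
```

Putting the check in one place means a report or escalation workflow cannot quietly consume a score whose data is still under question.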
What this means for trusting the number
The point of stepping the methodology out like this is not that you should be able to recompute TRI on your own from this post. You shouldn’t, and that’s on purpose: the calibrated parameters, the variance decomposition layer, the burden adjustment, and the change-point and confidence layers underneath are not in this post.
The point is that you should be able to read the methodology and decide whether it’s the kind of system you’re willing to put between your MES data and a Monday morning decision. Three factors. A specific question for each one. A multiplicative form that doesn’t let strengths cover for weaknesses. A worked example with real numbers where the score did the thing every plant manager already wanted it to do.
If that case is plausible to you, the next step is short.
Send 90 days of shift-level OEE data from one line. The 10-day assessment returns TRI baselines, variance decomposition, financial exposure, and a prioritized intervention plan. If it doesn’t surface at least one decision your current reporting missed, you pay nothing.