What a Single Confirmed-Malicious Threat Intel Hit Should Do to Your Risk Score

A common design pattern in risk scoring is purely additive: every signal contributes some number of points, capped, and the points add up to a final score. It’s simple to reason about and easy to tune — until a single very strong signal shows up, and the additive model quietly under-reacts to it.

Here’s a concrete version of the problem we ran into: an IP address confirmed by ThreatFox as active Cobalt Strike command-and-control infrastructure, at 100% confidence, contacting an API endpoint. Under a purely additive model, that ThreatFox match contributed a capped +30 points. Combined with a low-risk user-agent signal, the event landed at 33 out of 100 — solidly in “LOW” severity, comfortably under the threshold for deeper automated review.
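
To make the arithmetic concrete, here is a minimal sketch of that additive model. Only the ThreatFox cap of 30 and the final score of 33 come from the example above. The signal names, the user-agent cap, the raw point values, and the 3-point user-agent contribution (inferred as 33 minus 30) are assumptions made for illustration.

```python
# Hypothetical per-signal caps. Only the ThreatFox cap of 30 comes from the
# article; the user-agent cap is an illustrative assumption.
SIGNAL_CAPS = {
    "threatfox_match": 30,
    "user_agent": 10,
}


def additive_score(raw_points: dict) -> int:
    """Sum each signal's points after applying its cap, then clamp to 0..100."""
    total = 0
    for name, points in raw_points.items():
        cap = SIGNAL_CAPS.get(name, 0)  # signals without a configured cap add nothing
        total += min(points, cap)
    return max(0, min(total, 100))


# Assumed raw inputs: a 100%-confidence ThreatFox match and a low-risk
# user-agent signal worth 3 points.
event = {"threatfox_match": 100, "user_agent": 3}
print(additive_score(event))
```

However strong the ThreatFox evidence is, the cap means it can never move the total by more than 30 points on its own.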

That’s the wrong outcome. A single, independent, high-confidence confirmation that an IP is known malware infrastructure shouldn’t compete on equal footing with a dozen soft probabilistic signals that each nudge the score up a little. It should dominate the score, because the evidentiary weight isn’t comparable — “this user-agent is unusual” and “this exact IP is a confirmed Cobalt Strike C2 node” are not the same category of evidence, even though an additive model treats them as fungible points.

The fix was a floor rule, not a bigger additive weight: any single source independently confirming “this is malicious” — a ThreatFox IOC match, an OTX malicious flag, a non-RIOT GreyNoise malicious classification, AbuseIPDB confidence at or above 75%, or a malicious sandbox verdict — floors the event’s score to 75, regardless of what the additive components alone produced. The same Cobalt Strike event that scored 33 before now scores 75, clears the threshold for full AI-assisted analysis, and gets treated with the urgency a confirmed C2 contact actually deserves.
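
The floor rule can be sketched as follows. This is an illustration under stated assumptions, not the production implementation: the `IntelVerdicts` type, its field names, and the string values used for verdicts are hypothetical. The five confirmation conditions and the floor of 75 come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

CONFIRMED_MALICIOUS_FLOOR = 75   # floor value from the article
ABUSEIPDB_MIN_CONFIDENCE = 75    # "at or above 75%" from the article


@dataclass
class IntelVerdicts:
    """Hypothetical container for per-source results; field names are assumptions."""
    threatfox_ioc_match: bool = False
    otx_malicious: bool = False
    greynoise_classification: Optional[str] = None  # assumed values: "malicious", "benign", ...
    greynoise_riot: bool = False
    abuseipdb_confidence: int = 0                   # 0..100
    sandbox_verdict: Optional[str] = None           # assumed values: "malicious", "clean", ...


def is_confirmed_malicious(v: IntelVerdicts) -> bool:
    """True if any single source independently confirms the indicator as malicious."""
    return any([
        v.threatfox_ioc_match,
        v.otx_malicious,
        v.greynoise_classification == "malicious" and not v.greynoise_riot,
        v.abuseipdb_confidence >= ABUSEIPDB_MIN_CONFIDENCE,
        v.sandbox_verdict == "malicious",
    ])


def final_score(additive: int, v: IntelVerdicts) -> int:
    """Clamp the additive score to 0..100, then raise it to the floor if confirmed."""
    score = max(0, min(additive, 100))
    if is_confirmed_malicious(v):
        score = max(score, CONFIRMED_MALICIOUS_FLOOR)
    return score


# The Cobalt Strike event: additive score of 33 plus a ThreatFox IOC match.
print(final_score(33, IntelVerdicts(threatfox_ioc_match=True)))
```

Using `max` instead of assigning 75 directly means the floor only ever raises a score: an event whose additive components already exceed 75 keeps its higher value.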

The lesson extends beyond this one rule: additive scoring is a reasonable default for combining many weak-to-moderate signals, but it systematically under-weights rare, high-confidence, independent confirmations. If your scoring model has any signal that’s supposed to mean “we are sure,” it needs a floor, not just a point value.

