Why many CISOs are handling the signal-to-noise ratio incorrectly [12:26]

New podcast hosted by Edgile expert Brad Smith, discussing the customer-focused aspects of modern threat protection and how the Microsoft suite of security tools—such as Sentinel SIEM—can be used in different ways to lower enterprise risks.

Cybersecurity has always been, and likely always will be, an incredibly fast-moving arena, where the behaviors of the attackers and the best defense tactics of enterprises are constantly changing. Sometimes, defense mechanisms that were absolutely appropriate just a year ago are suddenly undermining defenses rather than strengthening them. To a certain extent, the signal-to-noise-ratio approach is a good example of where change is needed.

The issue is that much of what appears to be noise today might actually become high-quality signal tomorrow. Not only are SOCs not retaining enough information today, but they are also not retaining the data that they do choose to save nearly long enough. Fortunately, the need to discard seemingly irrelevant data is much lower today, thanks mostly to the cloud.

Historically, data storage has been costly. That drove security and IT operations to limit their spend to only the data that was seen as critically important. As a result, many enterprises have systems in place that purge telemetry they didn’t realize they needed. With cloud having driven data storage costs way down, there is much less of a need to quickly delete data. And today, there is a need to be able to analyze signals in new ways.

The systems themselves today are producing more data and that trend is only going to continue. What enterprises need to do is learn the lessons of big data, machine learning and overall AI development. If you’re talking about solving this problem with your on-prem infrastructure, you’re solving the wrong problem. The cost of storage has gotten down to the point where it’s almost irrelevant.

Another dated concern is accessibility. The old fear is that if the enterprise has too much data, it’s not usable because the enterprise can’t effectively query it. But we’ve moved away from relational databases and we’ve moved away from the schema-based individual transaction states because of the scalability of global cloud compute platforms. And we know that this is true because that’s how we’re building machine learning models around things like language detection, biological data processing and scientific development. We’ve already solved those problems from a data science perspective.

What we’re not doing as a market for cybersecurity and IT operations is learning from that set of lessons and saying that we need to do the exact same thing at scale and leave the assumptions about data storage behind. The new mechanism for signaturing emergent threats is a data science exercise. Data science is only accurate when you have enough data to feed into the models that you’re building. That means that we need to identify that which has not yet been encountered: the emergent threat. 

The threats that we’re encountering are no longer typically representative of Zero Day Layer 1 through Layer 4 threat detections. The adversaries aren’t penetrating our systems at scale to drive that three-trillion-dollar global dark market around ransomware and identity theft by coming up with clever ways of decrypting packet traffic. They’re not doing that by finding individual ways through firewalls. Those individual vulnerabilities that come up represent only minute steps in what the actual emergent threat is.

Attackers now adapt quickly and seek vulnerabilities between governance boundaries. And they are doing that at Layer 7 and Layer 8. They are manipulating human behavior, system behavior and application behavior, and exploiting those behaviors by looking at them in aggregate. In order to be able to defend against such a thing, we have to be able to look at it in aggregate as well. That is a machine learning function. It’s not about reducing the telemetry until a human can analyze it. It’s about increasing telemetry so that you can train a machine to detect it.

A data lake is a critical part of the threat analysis process, but CISOs sometimes do not appreciate its role. A lot of the emergent threats, like supply chain attacks and human behavior manipulation or compromise, are exploiting unmonitored activities or activities that, in all other historical contexts, seem totally normal and follow expected behavioral patterns. In order to be able to understand how those interact, you need to store the data in its raw original source schema. The purpose of a data lake is to be able to store the original state of the information, with the original properties and the original metadata that were part of the transaction that was recorded. You are then able to derive its lifecycle, such as “this thing changed this number of times because of X.” You have to retain that data raw, in its original schemas, so that you can start building inference models and data models across those different schemas to understand the differences and the changes that are happening. This helps identify patterns that you weren’t seeing in the information before.
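
To make the idea of retaining data in its original schema concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how Sentinel or any particular product stores telemetry: the source names ("idp", "fileserver"), the record layouts and the helper functions are all hypothetical. Each record is kept exactly as received, wrapped only with ingestion metadata, and a per-source interpretation is applied later, at read time, to count how often each entity shows up across sources.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def wrap_raw(source, payload, received_at=None):
    """Wrap a record for storage without parsing or altering the payload."""
    ts = received_at or datetime.now(timezone.utc)
    return {
        "source": source,                # which system produced the record
        "received_at": ts.isoformat(),   # ingestion metadata, added by us
        "raw": payload,                  # original text, stored unmodified
    }


# Hypothetical read-time extractors, one per source schema. They are applied
# only when the data is queried, so new ones can be added later and run
# against records that were stored long before the extractor existed.
EXTRACTORS = {
    "idp": lambda raw: json.loads(raw)["user"],      # JSON identity logs
    "fileserver": lambda raw: raw.split(",")[1],     # CSV: time,user,op,path
}


def events_per_entity(records):
    """Count records per entity across every source we know how to read."""
    counts = Counter()
    for rec in records:
        extractor = EXTRACTORS.get(rec["source"])
        if extractor is None:
            continue  # not interpretable yet, but still retained in the lake
        counts[extractor(rec["raw"])] += 1
    return counts


# Example usage with made-up records from three sources.
lake = [
    wrap_raw("idp", '{"user": "alice", "action": "role_change"}'),
    wrap_raw("fileserver", "2024-01-01T00:00:00Z,alice,chmod,/srv/a"),
    wrap_raw("idp", '{"user": "bob", "action": "login"}'),
    wrap_raw("badge_reader", "opaque vendor-specific blob"),
]
print(dict(events_per_entity(lake)))
```

The design choice worth noting is that the "badge_reader" record is stored even though nothing can interpret it yet. If that source turns out to matter later, an extractor can be added and applied to the full history, which is not possible once the data has been purged or normalized into a narrower schema.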