
software supply chain security - what’s next?

2021 is shaping up to be the year of software supply chain security (SSCS, because yes, we need a new buzzwordy acronym). From a backdoor in widely-used 3rd party software (you know who) to name squatting for internal libraries (“dependency confusion”), compromises keep coming from inside the house.

It’s a real arms race. On the Red side, we have well-intentioned security researchers, national intelligence agencies and a whole army of altcoin miners. On the Blue side, we have companies of all sizes, from Snyk to GitHub to Google, and industry-wide groups like OpenSSF. And so far, Red seems to be winning.

In this post I’ll try to list the current approaches to SSCS and then speculate on how else we could poke at the problem.

Disclaimer: I’m by no means an expert in this area. Just someone observing it from the sidelines and discussing SSCS risks on my previous team at Google.

let’s scope it down

When it comes to SSCS, there seem to be a few primary sources of risk, with 3rd party libraries chief among them.

All of these are huge beasts to tackle and require different solutions. For this post I’ll only cover libraries - it’s the area I’m most familiar with, and the one that needs the most attention.

current approaches

There are a few approaches in use today, both standard practices and commercial products. They are all useful but not sufficient to address the risks of 3rd party libraries.

detection

Detection tooling tells you when some of your dependencies have known vulnerabilities - for example, CVE scanners for dependency lockfiles. They are often accompanied by automatic patch suggestions, generating a version bump for you to pick up the fix.
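The core of such a scanner is a simple lookup: locked dependency versions matched against an advisory database. Here’s a minimal sketch - the advisory data and lockfile contents are made up for illustration, and a real scanner would match version *ranges* and pull advisories from sources like NVD or OSV:

```python
# Sketch of a lockfile vulnerability scanner. The advisory "database" and
# lockfile below are hypothetical, for illustration only.

# package -> list of (vulnerable_version, cve_id)
ADVISORIES = {
    "leftpad": [("1.0.0", "CVE-2021-0001")],
    "fastjson": [("2.3.1", "CVE-2021-0002")],
}

def scan_lockfile(locked):
    """Return (package, version, cve_id) for every locked dependency
    that matches a known advisory."""
    findings = []
    for pkg, version in locked.items():
        for bad_version, cve in ADVISORIES.get(pkg, []):
            if version == bad_version:
                findings.append((pkg, version, cve))
    return findings

# Two dependencies pinned in a lockfile, one of them vulnerable:
lockfile = {"leftpad": "1.0.0", "requests-like": "2.0.0"}
print(scan_lockfile(lockfile))  # [('leftpad', '1.0.0', 'CVE-2021-0001')]
```

Real scanners layer a lot on top of this (semver range matching, transitive dependency resolution, fix suggestions), but the fundamental operation is this join between your lockfile and a vulnerability feed.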

Detection tools are essential. If you don’t know about publicly disclosed vulnerabilities in your dependencies and don’t patch them - you’re an easy target.

There are a few tactical limitations - package manager coverage and vulnerability data sources. Most scanners will focus only on a handful of the most popular languages and ingest data mostly from NVD. These are fixable, given enough time.

But the real problems are higher-level. First, you’re always a step behind the attackers - there’s a window between CVE disclosure and your patched release actually rolling out (unless you’re a major tech company that gets warned in advance). And during that window you’re vulnerable.

Second, you can only detect and fix CVEs that have been reported. Plenty of vulnerabilities are either still undiscovered or discovered but never reported (due to process burden or intentional vuln hoarding).

runtime security

Runtime security is another mandatory tool in the SSCS arsenal. You should sandbox untrusted code, give it the absolute minimal set of permissions and networking access, and generally contain the blast radius when it gets compromised. And since running purely 1st party trusted code (without 3rd party libraries) is rare these days, you may as well treat everything as untrusted software.
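To make the least-privilege idea concrete, here’s a toy sketch of running an untrusted helper with its CPU time and memory capped before it even starts. This is POSIX-only and nowhere near a real sandbox - production setups add seccomp filters, namespaces, read-only filesystems and no network access - but it shows the shape of “contain the blast radius”:

```python
# Toy illustration of least privilege: run untrusted code in a child
# process with hard resource limits applied before it executes.
# POSIX-only; a real sandbox (gVisor, Firecracker, seccomp...) goes much further.
import resource
import subprocess
import sys

def limit_resources():
    # Runs in the child just before exec: 2s of CPU, 256 MiB of memory.
    resource.setrlimit(resource.RLIMIT_CPU, (2, 2))
    resource.setrlimit(resource.RLIMIT_AS, (256 * 1024**2, 256 * 1024**2))

def run_untrusted(code):
    proc = subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=limit_resources,  # apply limits in the child
        capture_output=True,
        text=True,
        timeout=5,  # wall-clock backstop on top of the CPU limit
    )
    return proc.stdout.strip()

print(run_untrusted("print(2 + 2)"))
```

Even this toy version demonstrates the principle: the limits are enforced by the kernel, not by trusting the untrusted code to behave.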

With runtime security, you’re always at a disadvantage. As a defender, you need to get everything right: handle every exploit vector and not miss a single piece of software. As an attacker, you just have to find the one thing the defenders missed - a much easier job.

And even if you’re the most diligent defender, the risk of unknown unknowns is always present. A novel attack can appear overnight, with none of the existing runtime security tools prepared to deal with it. Just like the CPU vulns of the past few years.

pentesting

Pentesting is another corporate tool for finding vulns, both in your code and in dependencies. Pentesting sells well and always finds holes in software.

However, it is opportunistic. The list of findings is never complete, and it’s not even guaranteed to contain your worst issues. While having a second pair of eyes is undeniably useful, it’s not a complete solution.

other options

All of the above approaches appear to follow the same philosophy:

Your supply chain will get owned, repeatedly, forever. Optimize for mitigation!

And it’s true, but we’re just throwing our hands up in the air and taking all the punches. What’s missing is prevention. I’m not talking about preventing active exploits, I’m talking about preventing being vulnerable in the first place. Mitigation would be so much easier if you didn’t have to patch dozens of high-severity CVEs every week. You might even circle back to the hundreds of medium/low severity ones hanging around.

Here are a couple of random ideas for preventing getting owned via a library.

scanning/linting

Mature programming languages have tons of linting/scanning tools. They can catch everything from stylistic nits (making your code more readable for reviewers) to memory races and SQL injections. Using an extensive set of linters in my projects has prevented more potential vulns than I can remember.

Linting usually stops at your own code (or even just the changed part of code in your commit). What if we extended this to 3rd party libraries? After all, most of the dependencies of modern software are open-source. For example, every new or updated library version has to pass the linter to pass your CI (or have an explicit signoff to ignore findings). If issues are found, they are either fixed upstream (benefiting every consumer) or in your own fork.
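A CI gate for this could be quite simple: lint the vendored dependency sources and block the update unless the linter is clean or someone explicitly signed off. The sketch below uses a made-up toy “linter” and a hypothetical sign-off allowlist; a real setup would shell out to something like bandit, gosec or eslint over the dependency tree:

```python
# Sketch of a CI gate that lints 3rd party sources, not just your own code.
# run_linter is a stand-in for a real linter; the sign-off allowlist is a
# made-up convention for this illustration.

def run_linter(source):
    # Toy "linter": flags one obviously dangerous pattern.
    findings = []
    for i, line in enumerate(source.splitlines(), 1):
        if "eval(" in line:
            findings.append(f"line {i}: use of eval()")
    return findings

def gate_dependency(name, source, signoffs):
    """Return True if the dependency may enter the build: either the
    linter is clean, or someone explicitly signed off on this package."""
    findings = run_linter(source)
    return not findings or name in signoffs

# A new dependency version arrives with a suspicious construct:
ok = gate_dependency("tinylib", "x = eval(user_input)\n", signoffs=set())
print(ok)  # False - CI blocks the update until reviewed
```

The sign-off escape hatch matters: linters have false positives, and the point is to force a human decision, not to block all updates forever.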

This may sound like redundant work - shouldn’t library authors deal with linting and keeping their code up to standards? Unfortunately, no: you can’t force OSS maintainers to do anything. At best, you can suggest it or propose a contribution.

distributed review

Mandatory code review has proven to be a great tool in catching bugs and vulnerabilities. But a sole maintainer of a small library might not have a second reviewer for their own changes.

What if additional reviews happened on the consumer side? You can make any dependency addition or update conditional on the same code review you do for your own code. Of course, this doesn’t scale if every project and organization tries to review every dependency it has - at least not thoroughly.

But what if these reviews were shared across org boundaries? For example, a public database where anyone can attest to reviewing a certain version of a library. Given a sufficient number of existing reviews, you can just trust them and skip your own review.
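The consumer-side check could look something like this: given a public log of “reviewer X attests that package Y at version Z looks fine” records, accept a dependency once enough reviewers you trust have vouched for that exact version. The record format and threshold here are invented for illustration; a real system would use signed attestations (in the spirit of in-toto or sigstore) rather than bare names:

```python
# Sketch of a shared review-attestation check. The Attestation format,
# trust set, and threshold are all hypothetical.
from collections import namedtuple

Attestation = namedtuple("Attestation", ["reviewer", "package", "version"])

def is_trusted(package, version, log, trusted_reviewers, threshold=2):
    """True if at least `threshold` trusted reviewers attested to this
    exact package version in the public log."""
    vouchers = {
        a.reviewer
        for a in log
        if a.package == package
        and a.version == version
        and a.reviewer in trusted_reviewers
    }
    return len(vouchers) >= threshold

log = [
    Attestation("alice", "leftpad", "1.2.0"),
    Attestation("bob", "leftpad", "1.2.0"),
    Attestation("mallory", "leftpad", "1.3.0"),  # single, untrusted reviewer
]
trusted = {"alice", "bob"}
print(is_trusted("leftpad", "1.2.0", log, trusted))  # True - skip your own review
print(is_trusted("leftpad", "1.3.0", log, trusted))  # False - review it yourself
```

Pinning attestations to exact versions is the key design choice: a review of 1.2.0 says nothing about whatever lands in 1.3.0.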

There may be enough network effect for popular libraries, but not for smaller ones. This is an opportunity to make a business! I can imagine a consultancy that sells code reviews, especially for niche libraries or specialized domains like cryptography. You could even make it a subscription - send us your dependency list and we’ll review the latest versions once a week and send you recommendations on what’s safe to update.

reduce attack surface

This could make me sound like an old grumpy man, but I think as an industry we’ve become way too liberal with dependency usage. JavaScript is the poster child for this, of course, with a blank React project importing thousands of libraries from authors unknown to you. But JS is not the only offender, pretty much every modern language with a package manager and a public repo of libraries suffers from the same problem. A somewhat disciplined Go project like Kubernetes has hundreds of dependencies, and it’s easy to find other Go projects with over a thousand dependencies.

Moving away from this is a cultural change. Teaching (and incentivising) engineers at all levels to be concerned with dependency creep seems like an uphill battle. It would take a long time - a decade or more. The current startup-obsessed MVP-everything ship-ship-ship culture popularized in the US is the exact opposite of the disciplined and careful engineering utopia I imagine.

But I refuse to give up on this ideal. Languages like Go, with batteries-included standard libraries, make me hopeful that the dependency creep is avoidable. We just have to show engineers that a better way exists.

Or maybe I’m totally wrong? Maybe software evolution is similar to CPU evolution. Just like the number of transistors going up makes everything faster, the number of dependencies going up makes software more capable or robust. Anecdotal experiences suggest otherwise, but maybe I’ve just not seen many good examples of solid dependency-heavy software.

what’s next?

Clearly, there’s no silver bullet to make all 3rd party libraries safe to use. Protecting our supply chains will take a patchwork of overlapping prevention and mitigation work. But we need research investment in the area. Especially to discover new approaches, and not just incremental improvements on existing ones.

If you have clever new ideas or want to shout how wrong I am, please email or tweet at me. I’m especially curious about existing projects that don’t fall into the above buckets!