SCA products typically work as follows:
• An engine scans the software source code and the associated artifacts used to compile a software application.
• The engine identifies the OSS components and their versions and usually stores this information in a database, creating a catalog of the OSS in use in the scanned application.
• This catalog is then compared to databases referencing known security vulnerabilities for each component, the licensing requirements for using the component, and the historical versions of the component. For security vulnerability detection, this comparison is typically made against known security vulnerabilities (CVEs) that are tracked in the
National Vulnerability Database (NVD). Some products use an additional proprietary database of vulnerabilities. For
IP and legal compliance, SCA products extract and evaluate the type of license under which each OSS component is distributed. Versions of components are extracted from popular open source repositories such as
GitHub,
Maven,
PyPI,
NuGet, and many others.
• Modern SCA systems have incorporated advanced analysis techniques to improve accuracy and reduce false positives. Notable contributions include
vulnerable method analysis, which determines whether vulnerable methods identified in dependencies are actually reachable from the application code. This approach, pioneered by
Asankhaya Sharma and colleagues, uses call graph analysis to trace execution paths from application entry points to vulnerability-specific sinks in third-party libraries.
• Hybrid static-dynamic analysis techniques combine statically constructed call graphs with dynamic instrumentation to improve the performance of false positive elimination. This modular approach addresses limitations of purely static analysis, which can introduce both
false positives and false negatives on real-world projects.
• Machine learning-based vulnerability curation automates the process of building and maintaining vulnerability databases by predicting the vulnerability-relatedness of data items from various sources such as bug tracking systems, commits, and mailing lists. These systems use self-training techniques to iteratively improve model quality and include deployment stability metrics to evaluate new models before production deployment.
• Natural language processing techniques for automated vulnerability identification analyze commit messages and bug reports to identify security-related issues that may not have been publicly disclosed. This approach uses machine learning classifiers trained on textual features extracted from development artifacts to discover previously unknown vulnerabilities in open-source libraries.
• The results are then made available to end users in various digital formats. The content and format depend on the SCA product and may include guidance for evaluating and interpreting the risk, as well as recommendations, especially concerning the legal requirements of open source components such as
strong or weak copyleft licensing. The output may also contain a
Software Bill of Materials (SBOM) detailing all the open source components and associated attributes used in a software application.

==Advanced techniques==
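The catalog comparison and the vulnerable method analysis described above can be illustrated with a minimal Python sketch. It is not taken from any SCA product: the `Advisory` record, the component names, the advisory identifiers, and the call graph are all hypothetical, and versions are matched by exact string equality, whereas real tools typically evaluate version ranges. The sketch first matches a component catalog against advisories, then keeps only those advisories whose vulnerable function is reachable from an application entry point in a static call graph.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """Hypothetical advisory record; real vulnerability databases use richer schemas."""
    advisory_id: str
    component: str
    affected_versions: frozenset
    vulnerable_function: str


def match_catalog(catalog, advisories):
    """Return advisories whose component and exact version appear in the catalog."""
    return [
        adv for adv in advisories
        if catalog.get(adv.component) in adv.affected_versions
    ]


def is_reachable(call_graph, entry_points, target):
    """Breadth-first search over a call graph, starting from the entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        if caller == target:
            return True
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


# Made-up data for illustration only.
catalog = {"libparse": "1.2.0", "libimage": "3.4.1"}
advisories = [
    Advisory("EXAMPLE-0001", "libparse", frozenset({"1.1.0", "1.2.0"}), "libparse.load_unsafe"),
    Advisory("EXAMPLE-0002", "libimage", frozenset({"3.4.1"}), "libimage.decode_tiff"),
    Advisory("EXAMPLE-0003", "libimage", frozenset({"2.0.0"}), "libimage.resize"),
]
call_graph = {
    "app.main": ["app.read_config", "libimage.open"],
    "app.read_config": ["libparse.load_unsafe"],
    "libimage.open": ["libimage.decode_png"],
}

# Step 1: version-based matching flags two advisories.
matched = match_catalog(catalog, advisories)
# Step 2: reachability filtering keeps only the one whose function is actually called.
reachable = [
    adv.advisory_id for adv in matched
    if is_reachable(call_graph, ["app.main"], adv.vulnerable_function)
]
print([adv.advisory_id for adv in matched])
print(reachable)
```

In this toy data set, version matching alone reports two advisories, while the reachability filter retains only the one whose vulnerable function lies on a call path from `app.main`. A purely static call graph like this one can miss calls made through reflection or dynamic dispatch, which is one of the limitations that hybrid static-dynamic approaches aim to address.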