SCANOSS: A simplified framework for OSS Assets Management

Introduction

OSS component identification is one of the key steps in OSS Compliance processes. At SCANOSS we have implemented a modern OSS scanner to help with Open Source component identification. Our knowledge base, stored on our very own database technology, is capable of holding all the open source code in the community, and our scanner returns results in microseconds. We have also implemented a scanning engine that uses intelligent algorithms to weed out false positives, focusing on what is important. We expose all this functionality through a standards-based API. Our open source code fingerprinting algorithm increases security and shields customers' intellectual property. Our scanning solution is fast, secure and easy to integrate.

Open source identification is one step in the wider OSS compliance chain. The ultimate purpose of an OSS compliance process is to manage the OSS components used in an organisation, so that the organisation can comply with its regulatory obligations and address security concerns. There is also a governance aspect: protecting the organisation's intellectual property and making sure that all Open Source usage is in accordance with corporate Open Source policies.

At SCANOSS we have implemented a lightweight approach to OSS Asset Management. When OSS components are declared in a simple JSON file, our scanning engine takes this information into account and returns only the results for components not listed in the file. This helps companies stay in control of their Open Source usage in an automated way, in line with modern software development processes, reducing the need for manual processing and cumbersome user interfaces. It is a simple process that brings visibility and favours extensibility and integration with other tools. OSS Asset Management, license compliance, and open source dependency vulnerability management are currently addressed and implemented in the SCANOSS solution. The information in these files can be aggregated to provide an overall view of the compliance status of a business unit or an entire organisation, or fed into internal compliance tools.
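The filtering idea can be sketched in a few lines of Python. This is a minimal illustration, not the SCANOSS engine: it assumes a hypothetical file layout with a top-level `components` list whose entries carry `vendor`, `component` and `version` fields, and scan matches reduced to the same fields. Matching on vendor and component name while ignoring the version is also an assumption made for simplicity.

```python
from __future__ import annotations

import json

# Hypothetical declaration file content; the field names are illustrative
# assumptions, not the published SCANOSS specification.
OSS_ASSETS = """
{
  "components": [
    {"vendor": "madler", "component": "zlib", "version": "1.2.11"},
    {"vendor": "openssl", "component": "openssl", "version": "1.1.1k"}
  ]
}
"""


def declared_keys(assets_text: str) -> set[tuple[str, str]]:
    """Return the (vendor, component) pairs declared as approved."""
    assets = json.loads(assets_text)
    return {
        (entry["vendor"].lower(), entry["component"].lower())
        for entry in assets.get("components", [])
    }


def unapproved_matches(matches: list[dict], approved: set[tuple[str, str]]) -> list[dict]:
    """Keep only scan matches whose (vendor, component) pair is not declared."""
    return [
        m for m in matches
        if (m["vendor"].lower(), m["component"].lower()) not in approved
    ]


if __name__ == "__main__":
    # Hypothetical scan matches, reduced to the fields used for filtering.
    scan_matches = [
        {"file": "src/inflate.c", "vendor": "madler", "component": "zlib"},
        {"file": "src/json.c", "vendor": "DaveGamble", "component": "cJSON"},
    ]
    approved = declared_keys(OSS_ASSETS)
    for match in unapproved_matches(scan_matches, approved):
        print(f"unapproved: {match['component']} in {match['file']}")
```

Only the cJSON match is reported; the zlib match disappears because it is declared. Whether a version mismatch should count as "unapproved" is a policy decision this sketch leaves out.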

The continuous nature of software development adds a layer of complexity to OSS Compliance. For every project in a company, OSS Compliance must be reviewed and assessed continuously in order to stay up to date with regulations. It is therefore important that the OSS Compliance process adds only a lightweight overhead to the software development process; otherwise there will be a measurable impact on project delivery, affecting the overall performance of the development teams. Integration of OSS compliance with Continuous Integration and Continuous Delivery (CI/CD) must be taken into account, and automation must be a central element in every good OSS Compliance process.

Identification

Identification of OSS components is an important step in a good OSS Compliance process. Use of these OSS components needs to be in agreement with the corporate Open Source policies. As a key step towards compliance with license obligations, a detailed Software Bill of Materials (SBOM) must be produced. This document must include all the licenses of the OSS components distributed with the product. A side benefit of identifying OSS components is that they can then be scanned for security vulnerabilities.

Identification of Open Source components can be done manually, or semi-automatically with the aid of source code scanning tools. However, given the volume of Open Source code in the community, scanning tools will always return several likely candidates that need to be reviewed manually to identify the exact match.

A project can have several types of open source dependencies. Some have regulatory impact and must therefore be accounted for in the OSS compliance process, whereas others have no regulatory impact.

Open source dependencies that have an OSS compliance impact:

  • Embedded sources: Source code from external OSS components included, in whole or in part, in the source code of the project (identified through file and snippet matching).
  • Build-time dependencies: OSS components downloaded at build time via a dependency management tool such as Maven, npm, pip, NuGet, Godep, cargo, bundler… A special case is the use of Cloud Native component technologies, such as Docker. A Dockerfile may download many different components from the internet on the fly, using wget for instance. Because these components are included in the shipped artefact, they must be included in the OSS compliance process.
  • Embedded binaries: Binaries that are shipped with the distributable artefact.

Open source dependencies without OSS compliance impact:

  • Libraries that are dynamically linked on the host machine and not shipped with the distributable artefact. Examples: OS libraries (e.g. Linux kernel), external libraries (e.g. Python libraries that are expected to be present on the machine).
  • Executables that form part of the OS of the machine where the distributed code will run (e.g. curl, python, perl, bash).
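The rule behind both lists is whether a component ends up inside the shipped artefact. A minimal Python sketch of that rule follows; the enum and function names are our own labels for the categories above, not part of any SCANOSS tool, and the sketch deliberately ignores edge cases such as build tools that never reach the artefact.

```python
from enum import Enum


class DependencyKind(Enum):
    """Dependency types described in the text (labels are illustrative)."""
    EMBEDDED_SOURCE = "embedded source"     # code copied in, whole files or snippets
    BUILD_TIME = "build-time dependency"    # fetched by Maven, npm, pip, a Dockerfile...
    EMBEDDED_BINARY = "embedded binary"     # binaries shipped with the artefact
    HOST_LIBRARY = "host library"           # dynamically linked, already on the host
    HOST_EXECUTABLE = "host executable"     # e.g. curl, python, perl, bash


# Kinds that end up inside the shipped artefact.
SHIPPED_KINDS = {
    DependencyKind.EMBEDDED_SOURCE,
    DependencyKind.BUILD_TIME,
    DependencyKind.EMBEDDED_BINARY,
}


def has_compliance_impact(kind: DependencyKind) -> bool:
    """A dependency matters for compliance when it is shipped with the artefact."""
    return kind in SHIPPED_KINDS


if __name__ == "__main__":
    for kind in DependencyKind:
        scope = "in scope" if has_compliance_impact(kind) else "out of scope"
        print(f"{kind.value}: {scope}")
```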

Any decent source code scanning tool, such as the SCANOSS OSS Scanner, helps identify embedded sources, that is, source code from external OSS components included in the source code of the project.

Scanning of build-time dependencies is currently imperfect. The main reason is that the Open Source community is very good at writing new dependency and package managers from scratch, without common elements or standards. This lack of standardisation makes it very difficult to provide a complete solution to the problem of resolving build-time dependencies.
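To illustrate why build-time dependencies are hard to cover, the Python sketch below needs one parser per manifest format. Only two formats are handled, and both parsers are deliberately naive: unpinned pip requirements are skipped, and npm version ranges are returned as written rather than resolved (resolved versions live in lock files such as package-lock.json). The function names and the `PARSERS` table are illustrative.

```python
from __future__ import annotations

import json
from typing import Callable

# Each package manager uses its own manifest format, so each needs its own parser.


def parse_requirements_txt(text: str) -> list[tuple[str, str]]:
    """Parse pinned 'name==version' lines from a pip requirements file."""
    deps = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            deps.append((name.strip(), version.strip()))
    return deps


def parse_package_json(text: str) -> list[tuple[str, str]]:
    """Read the 'dependencies' object of an npm package.json (ranges, not resolved versions)."""
    manifest = json.loads(text)
    return sorted(manifest.get("dependencies", {}).items())


PARSERS: dict[str, Callable[[str], list[tuple[str, str]]]] = {
    "requirements.txt": parse_requirements_txt,
    "package.json": parse_package_json,
    # pom.xml, *.csproj, Cargo.toml, Gemfile, ... would each need another parser.
}


def declared_dependencies(filename: str, text: str) -> list[tuple[str, str]]:
    """Dispatch on the manifest file name; unknown formats are reported, not guessed."""
    parser = PARSERS.get(filename)
    if parser is None:
        raise ValueError(f"no parser for {filename}")
    return parser(text)
```

Every new package manager adds another entry to the table, and even the supported ones only yield what the manifest declares, not what the build actually downloaded.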

License compliance

A lot has been written about OSS license compliance. There are many OSS licenses, which impose different obligations on the party that distributes software built with OSS components. The Linux Foundation's SPDX project offers a standardised nomenclature for the different families of licenses.

There are several initiatives aiming at the automatic identification of licenses. This is a complex subject and, in our opinion, such efforts are likely to have limited success. While these initiatives might eventually help identify the license family, fulfilling license obligations requires the full original license text of the OSS component to be used in the SBOM. One reason for this is that licenses might include copyright notices and/or special conditions or information added by the author that are not part of the original license template, but must be made available in the final product. Therefore, a generic copy of the license is not enough to fulfil copyright obligations.

All the licenses used in the components contained in a distributable piece of software must be analysed and validated against the Corporate Open Source Policy.
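A policy check of this kind can be sketched in Python, assuming the corporate policy is expressed as sets of allowed and denied SPDX identifiers and that each component record carries hypothetical `component` and `license` fields. Anything the policy does not mention, including compound expressions such as "MIT OR Apache-2.0", is routed to manual review rather than guessed.

```python
from __future__ import annotations

# Hypothetical corporate policy expressed with SPDX license identifiers.
POLICY = {
    "allowed": {"MIT", "BSD-3-Clause", "Apache-2.0", "Zlib"},
    "denied": {"AGPL-3.0-only"},
}


def check_license(spdx_id: str, policy: dict[str, set[str]]) -> str:
    """Classify one license identifier as 'allowed', 'denied' or 'review'."""
    if spdx_id in policy["denied"]:
        return "denied"
    if spdx_id in policy["allowed"]:
        return "allowed"
    return "review"  # not covered by the policy: a human has to decide


def check_components(components: list[dict], policy: dict[str, set[str]]) -> dict[str, list[str]]:
    """Group component names by policy outcome."""
    outcome: dict[str, list[str]] = {"allowed": [], "denied": [], "review": []}
    for comp in components:
        outcome[check_license(comp["license"], policy)].append(comp["component"])
    return outcome
```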

Managing Assets with oss_assets.json

It is important to preserve the list of approved and identified OSS components in a persistent form, so that they can be checked and re-validated against OSS policies in the future. This enables the team to focus on policy violations instead of repeating the same OSS compliance validation again and again without building on previous experience, which is a wasteful and expensive exercise. This OSS identification artefact should ideally be placed in a visible part of each software project, using a format that is simple enough for both inspection and automation.

At SCANOSS, we have designed an open file format for this task. A JSON file called oss_assets.json is placed at the root of every project. The specification of the format can be found here.
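Reading the file from the project root is straightforward. The Python sketch below is only illustrative: since the specification is not reproduced here, the top-level `components` key is an assumption, and the function name `load_oss_assets` is hypothetical. Unknown top-level attributes are simply ignored, which is how extra company-specific fields can coexist with the ones a tool needs.

```python
from __future__ import annotations

import json
from pathlib import Path

ASSETS_FILE = "oss_assets.json"


def load_oss_assets(project_root: str | Path) -> list[dict]:
    """Load declared components from <project_root>/oss_assets.json.

    Returns an empty list if the file is missing. The "components" key is an
    assumption about the format, not taken from the specification.
    """
    path = Path(project_root) / ASSETS_FILE
    if not path.is_file():
        return []
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    components = data.get("components", [])
    if not isinstance(components, list):
        raise ValueError(f"{path}: 'components' must be a list")
    return components
```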

Use of oss_assets.json brings several benefits:

  • Machine readable format, excellent for automation: Based on an open JSON specification.
  • Simple, easy to modify and inspect: Using any text editor, a developer can easily edit it.
  • Extensible: SCANOSS tools only parse the attributes that they need. Companies can add more attributes to suit policies and processes, promoting flexible automation.
  • Identifies all approved OSS assets for a project: When used in conjunction with an OSS scanning tool, it can be used to check whether unapproved assets have been included in the project. The SCANOSS source code scanning tool uses the contents of oss_assets.json to report only the unapproved assets found in the project. This eliminates a lot of noise and simplifies the identification process.
  • Input for vulnerability scanners: The list of approved OSS components in a project provides the perfect input for a security vulnerability scan, ensuring that only the components actually used in the project are scanned. The SCANOSS Vulnerability Scanner uses the CPE declaration to find vulnerabilities, checking against sources such as the National Vulnerability Database.
  • Input for license compliance: oss_assets.json supports license attributes. These attributes simplify the creation of the SBOM document. Also, they can be used to perform a validation of licenses against corporate policies.
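The last two benefits amount to reading two attributes out of the declared components. The Python sketch below assumes hypothetical `cpe` and `license` fields and uses illustrative CPE 2.3 strings; it collects the CPE names a vulnerability scanner could query and groups components by license as a starting point for the SBOM. As discussed above, the SBOM still needs the full original license text of each component, which this sketch does not handle.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical declared components; "cpe" and "license" are assumed field names.
COMPONENTS = [
    {"component": "zlib", "version": "1.2.11", "license": "Zlib",
     "cpe": "cpe:2.3:a:zlib:zlib:1.2.11:*:*:*:*:*:*:*"},
    {"component": "openssl", "version": "1.1.1k", "license": "OpenSSL",
     "cpe": "cpe:2.3:a:openssl:openssl:1.1.1k:*:*:*:*:*:*:*"},
    {"component": "in-house-fork", "version": "0.1", "license": "MIT"},
]


def cpes_for_vulnerability_scan(components: list[dict]) -> list[str]:
    """Collect CPE names; components without one cannot be looked up by CPE."""
    return [c["cpe"] for c in components if c.get("cpe")]


def licenses_for_sbom(components: list[dict]) -> dict[str, list[str]]:
    """Group 'name version' strings by declared license identifier."""
    by_license: dict[str, list[str]] = defaultdict(list)
    for c in components:
        by_license[c.get("license", "UNKNOWN")].append(f"{c['component']} {c['version']}")
    return dict(by_license)
```

Components without a CPE, such as the in-house fork above, are silently left out of the vulnerability input, so a real pipeline would probably want to flag them.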

oss_assets.json is a simple way to stay on top of OSS compliance, bringing visibility and helping with automation.