Identifying components by SHA values

The public VersionEye API exposes a lot of information about open source projects. If you know the coordinates of a component (software library) in a certain package manager, you can use the VersionEye API to fetch all kinds of meta information about it, such as available versions, licenses, security vulnerabilities and much more.

But what if you don’t know the coordinates of a component? What if you have a “beanutils.jar” somewhere on your disk and you don’t know which version it is or which Maven GroupId it belongs to? You don’t know which license it has, and you don’t know whether it is vulnerable! The VersionEye API now has a solution for this problem. Simply compute the SHA1 hash of the “beanutils.jar” file. In Ruby it’s a one-liner:

require 'digest'
sha1 = Digest::SHA1.file("beanutils.jar").hexdigest

Now take the SHA1 value and query the new SHA endpoint of the VersionEye API.


With curl, the request looks like this:

curl -X GET --header 'Accept: application/json' 'https://www.versioneye.com/api/v2/products/sha/427662b038bd8f52097f783f6ea163e45851b2a1?api_key=YOUR_API_KEY'
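The same lookup can be scripted in Ruby using only the standard library. This is a minimal sketch, not an official client: the helper names `file_sha1`, `sha_lookup_uri` and `lookup_sha` are hypothetical, `YOUR_API_KEY` is a placeholder, and the JSON body is returned as parsed Ruby objects because this post does not document the response fields.

```ruby
require 'digest'
require 'json'
require 'net/http'
require 'uri'

# Hex-encoded SHA1 of a file (the format used for Maven Central artifacts).
def file_sha1(path)
  Digest::SHA1.file(path).hexdigest
end

# URL of the SHA endpoint, matching the curl command above.
def sha_lookup_uri(sha, api_key)
  uri = URI("https://www.versioneye.com/api/v2/products/sha/#{sha}")
  uri.query = URI.encode_www_form(api_key: api_key)
  uri
end

# GET the SHA endpoint and return the parsed JSON (no field names assumed).
def lookup_sha(sha, api_key)
  uri = sha_lookup_uri(sha, api_key)
  request = Net::HTTP::Get.new(uri)
  request['Accept'] = 'application/json'
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  raise "SHA lookup failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
end
```

Calling `lookup_sha(file_sha1('beanutils.jar'), 'YOUR_API_KEY')` sends the same request as the curl command.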

The SHA endpoint responds with the coordinates of the component. Now you know the Maven GroupId, ArtifactId and version of your JAR file. You can use that information to gather even more details about the component from the VersionEye API. Simply use the products endpoint:


With curl, that request looks like this:

curl -X GET --header 'Accept: application/json' 'https://www.versioneye.com/api/v2/products/Java/commons-beanutils%3Acommons-beanutils?prod_version=1.9.0&api_key=YOUR_API_KEY'
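Once you have the coordinates, the products URL can be assembled in Ruby as well. A minimal sketch, assuming the product key is `groupId:artifactId` as in the example above; `product_uri` is a hypothetical helper that takes the coordinates as plain arguments rather than guessing at the field names of the SHA response. The colon is percent-encoded to `%3A`, just like in the curl command.

```ruby
require 'uri'

# Build the products-endpoint URL for one version of a Maven component.
# language: the language path segment, e.g. "Java"
# The product key "groupId:artifactId" is percent-encoded (":" becomes "%3A").
def product_uri(language, group_id, artifact_id, version, api_key)
  prod_key = URI.encode_www_form_component("#{group_id}:#{artifact_id}")
  uri = URI("https://www.versioneye.com/api/v2/products/#{language}/#{prod_key}")
  uri.query = URI.encode_www_form(prod_version: version, api_key: api_key)
  uri
end
```

The resulting `URI` object can be passed to `Net::HTTP` with an `Accept: application/json` header, mirroring the curl command.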

The products endpoint returns the dependencies, licenses and security vulnerabilities of the component. Now you know that your “beanutils.jar” is licensed under the Apache-2.0 license and has at least one known security vulnerability.

Since you know the Maven coordinates of your “beanutils.jar”, you could also simply visit the corresponding VersionEye page in your browser.


On that page you can also see the license and the security vulnerability. In the upper right corner you can see that a newer version of the component, 1.9.3, is available. Clicking on that version number shows that the newest version of the component is not vulnerable.

The VersionEye database has SHA values for more than 90K .NET projects, more than 170K Java projects, almost 400K Node.js projects and more than 88K Python projects. Altogether we have more than 9 million SHA values.


Every version of a component has at least one SHA value. In Maven, a version of a component consists of multiple files, for example “*.jar”, “*-javadoc.jar”, “*-sources.jar” and “*.pom”. In this case every file has its own SHA value, which means that a single artifact has multiple SHA values in our database.
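To see those per-file hashes locally, you can hash every file of one artifact version. A minimal sketch: `artifact_sha1s` is a hypothetical helper, and it assumes the standard Maven file naming (`artifactId-version.jar`, `artifactId-version-classifier.jar`, `artifactId-version.pom`) inside a single directory.

```ruby
require 'digest'

# Hex SHA1 of every file belonging to one Maven artifact version in `dir`.
# The brace pattern matches the main jar, the .pom and classifier jars such
# as -sources.jar or -javadoc.jar; each hash could be looked up on its own.
def artifact_sha1s(dir, artifact_id, version)
  pattern = File.join(dir, "#{artifact_id}-#{version}{.jar,.pom,-*.jar}")
  Dir.glob(pattern).sort.map do |path|
    [File.basename(path), Digest::SHA1.file(path).hexdigest]
  end.to_h
end
```

For example, `artifact_sha1s('.', 'commons-beanutils', '1.9.0')` returns a hash from file name to SHA1 for every matching file in the current directory.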

The hash algorithm for Maven components and Node.js modules is always SHA1. For Python projects we store the MD5 hash value. For .NET components we have either a base64-encoded SHA256 or SHA512 value. If you have a .NET NuGet package, simply compute a base64-encoded SHA256 value for it and send it to our API. If the result is empty, repeat the process with a SHA512 value. Here is how that looks in Ruby for the SHA512 case:

require 'digest'; require 'base64'
Base64.strict_encode64 Digest::SHA512.file('Newtonsoft.Json.9.0.1.nupkg').digest
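The per-ecosystem rules and the SHA256-then-SHA512 fallback can be combined into one routine. This is a sketch under assumptions: the ecosystem symbols and the helpers `candidate_hashes` and `identify` are hypothetical, `lookup` stands in for whatever function calls the SHA endpoint, and percent-encoding the base64 value (which may contain `/`, `+` and `=`) before it goes into the URL path is our assumption about what the API expects. It uses `Base64.strict_encode64` because `Base64.encode64` inserts line breaks into its output.

```ruby
require 'base64'
require 'digest'
require 'uri'

# Candidate hashes in lookup order, following the rules in this post:
# hex SHA1 for Maven and Node.js, hex MD5 for Python,
# base64 SHA256 and then base64 SHA512 for NuGet.
def candidate_hashes(path, ecosystem)
  case ecosystem
  when :maven, :node then [Digest::SHA1.file(path).hexdigest]
  when :python then [Digest::MD5.file(path).hexdigest]
  when :nuget
    [Digest::SHA256, Digest::SHA512].map do |algo|
      Base64.strict_encode64(algo.file(path).digest)
    end
  else
    raise ArgumentError, "unknown ecosystem: #{ecosystem}"
  end
end

# Try each candidate until `lookup` returns a non-empty result (nil otherwise).
# Hashes are percent-encoded so base64 characters are safe in a URL path.
def identify(path, ecosystem, lookup)
  candidate_hashes(path, ecosystem).each do |hash|
    result = lookup.call(URI.encode_www_form_component(hash))
    return result unless result.nil? || result.empty?
  end
  nil
end
```

A `lookup` lambda could, for instance, wrap an HTTP GET against the SHA endpoint and return the parsed JSON array.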

We have a SHA value for more than 99% of all NuGet packages.

Currently we only have SHA values for Java components from Maven Central, the official central repository for Java components. VersionEye crawls a couple of other Maven repositories as well, but we don’t have SHA values for those yet. We are working on it.

Right now we do NOT have SHA values for Ruby, but we are working on it!

Try out the new API and let us know how you like it.

3 thoughts on “Identifying components by SHA values”

  1. SHA-1? Seriously? In the same week that Google announced it will release an algorithm and a tool within 90 days to generate hash collisions?

    Yes, I am aware that this attack does not affect VersionEye’s use case. But please, even if we cannot get rid of SHA-1 in all existing software, could we at the very least not create new software that depends on broken hash algorithms?

    1. Yes! Seriously! I know that article from Google. Not sure if you finished reading this article or if you know how Maven works. We have SHA-256 and SHA-512 values for C# NuGet packages. The SHA1 values are mostly provided by Maven Central, so we did NOT create NEW SHA1 values at all! We are reusing what is already out there. Besides that, this is the first shot at SHA support. We are planning to create new SHA-512 values for Java packages in the future, and we are even planning to create SHA values for files inside JAR files to be able to identify unpacked Maven packages.

  2. Great idea. It would be awesome if you could index the outdated artifacts Google published for Android in the past. There are a couple of libraries, such as Google Play Services and Google Analytics, which were served as .jar files. I am pretty sure they can still be found on GitHub. I cannot tell whether there is a central repository.
