Adding Python Support

VersionEye 2.0 started with only 3 languages. Ruby, NodeJS and PHP. Now Python was added as well!

VersionEye supports the package manger PIP. You can simply upload your requirements.txt file to get a BOM (Bill of Materials). By default the ‘requirements.txt’ file contains only the direct dependencies. I would recommend to use Pips freeze feature to lock down all transitive dependencies. That can be achieved with this command:

python -m pip freeze > frozen-requirements.txt

That command will take all direct dependencies from the requirements.txt file, resolve all transitive dependencies and create a new file frozen-requirements.txt which contains ALL (direct & transitive) dependencies with fixed versions. That’s much better then the pure requirements.txt file because the frozen version contains all dependencies which are shipped to production. That contains ALL dependencies which are necessary to run the application.

Here is an example for a Python VersionEye scan:

Try it out and let me know how you like it! Would love to hear your feedback.

If you don’t have an account yet, signup with this promotion code “Ve2Python“. That code is valid until 4th of June and gives you a 50% discount on the monthly subscription!

VersionEye 2.0

More than 3 years ago I discontinued VersionEye. A SaaS tool for Open Source License Compliance & Security. VersionEye 1.0 could monitor your package manager files on GitHub, notify you about outdated dependencies and show you which of your dependencies are affected by security vulnerabilities and copyleft licenses (AGPL, MPL-2.0 and so on).

I was operating it for almost 7 years and at the end I was pretty much burned out! The reason why I discontinued it are described in this blog post. Working on it was not a joy anymore. My mailbox was flooded with support emails and after 6 years pretty much everything else looked more interesting to me than OS license compliance and security. That’s why I joined LivePerson at the End of 2017. I got a new challenge, met a lot of smart people and helped to grew the LivePerson Mannheim Office from 15 to 60 people in less than 3 years.

In 2018 Alex Spinelli, former head of Alexa OS at Amazon, joined LivePerson as the new CTO. He made a lot of organisational changes and did teach us a lot about Amazon management principles. Things like writing PR/FAQs, single threaded ownership, 2 Pizza teams, mission statements, leadership principles and many more things. That was very valuable and I enjoyed learning a lot of new things.

Of course, License compliance was a topic at LivePerson as well. Software teams had to ensure that they are compliant with a company license whitelist. The tools which we had to use for that at LivePerson have been far away from user friendly and good. With a crying eye, I looked back at VersionEye. I had no interest in resurrecting VersionEye 1.0 because I still remembered all the pain points of that solution. But somehow the topic was keeping circling in my head. So I started to think about how VersionEye 2.0 should look like. Not just on a technical level, but also on a lifestyle level. How would a version 2.0 look like which I would enjoy working on and wouldn’t burn me out again? So I thought about all the points which stressed me out at version 1.0:

  • Customer Support: With VersionEye 1.0 I had a lot of non paying users. Less than 1% of all signed up users (52K) paid a monthly subscription. More than 99% of all signed up users did not pay a single dime but most of them caused server costs and some of them caused support.
  • Too much data: In 2017 VersionEye supported something like 20 package managers and crawled all repository servers for them to collect all kind of information about software libraries. VersionEye had meta information for more than 2 Million software libraries/components. Every component can have multiple versions. A specific version of a component is an “artefact”. VersionEye had meta information for more than 50 Million artefacts. That’s of course a nice vanity KPI. Kind of cool if you can tell investors that your database contains meta information for more than 2 Million software components and 50 Million artefacts! Right? But the truth is that 99% of these software components are never referenced in customer projects! 99% of them are garbage! Even my biggest customer at that time, a big software giant from Germany, did not use more than 6K unique open source components for their active Java/NodeJS software development.
    Keeping the database with 2 Million components up-to-date requires a lot of crawling activities and that causes again a lot of costs.
  • Not focused data: VersionEye 1.0 collected even information which was not relevant for open source compliance & security. For example, the crawlers collected information about the authors of the software components and about the relationships between the components. That’s all kind of nice to know but not really necessary for the core business. Customers of VersionEye have only 3 questions:
    • Are my dependencies compliant with a license whitelist?
    • Are my dependencies vulnerable?
    • Are my dependencies outdated?
  • Open sourcing: VersionEye 1.0 was fully open source. I open sourced it because I assumed it would offer more transparency and therefor bring more new customers. And I assumed I would get a lot of contributions from open source developers. Both assumptions have been wrong! I did not really get more paying customers after I open sourced everything. My existing paying customers did not really care because they did pay for the service, implementation of new features and SLAs.
    Open sourcing the whole thing actually caused even more support. Because all suddenly more people opened GitHub tickets. Just very few people actually implemented an improvement and send a pull request. Most people just opened a ticket and waited for an reaction. All together I had to spend more time on GitHub tickets. I needed to read them, think about it, respond to them, doing more code reviews and so on. All activities which are very time consuming.
  • GitHub Integration: One feature of VersionEye 1.0 was that it could monitor your package manager files directly in your GitHub repositories. For that, users had to grant VersionEye access to their GitHub accounts/organisations. VersionEye then received a token per user/orga, which could be used to fetch files from the corresponding GitHub repositories. That of course, caused a lot of background processing. But the real pain point was that these tokens expired at some point in time and then VersionEye had to reach out to the users to ask for permission again. And that caused even more background processing.
  • Competition: I did focus too much on competition. At that time GitHub was announcing to offer security alerts on their platform and I thought that’s the End of the game. But it wasn’t! GitHubs security database is good but not complete. And not everybody wants to host their code on GitHub to get security notifications. Of course there are more competitors in the field. But they are all kind of expensive and difficult to use. There is still much space for improvements!

So I thought about these things and how I would do it differently for VersionEye 2.0:

  • Customer Support: VersionEye 2.0 would be for paying customers only. People could still sign up for free and test the solution. They would maybe get 5 scans for free and then they would have to convert to paying customers. I would be happy to do support for paying customers!
  • Datensparsamkeit: VersionEye 2.0 would only crawl data which is relevant for customer projects. There is no reason to fill the database with meta information about 2 Million OS components if customers only use a fraction of them. That’s why VersionEye 2.0 would do crawling on demand!
  • Focused data: VersionEye 2.0 would collect as few data as possible and as much as necessary. Version information, license information and information about security vulnerabilities are necessary for the core business. Information about authors, transitive dependencies, meta tags/labels, description in Markdown files and so on are not necessary.
  • Closed source: Open sourcing the whole solution is not necessary! For some parts of the solution it makes sense to be open source. For example for API clients and so on. But the parsers, crawlers, license normalisation engine, API and WebApplication can be closed source.
  • GitHub Integration: A GitHub integration like in version 1.0 would not exist anymore! There might be an integration for GitHub Actions, which is a push mechanism to the VersionEye API.
  • Competition: I would focus more on the product and less on the competition.

VersionEye 1.0 started as a fun project but over the years it mutated to a huge software monster which sucked me in and have eaten up all my time. VersionEye 2.0 should be fun again and that means it must be lean again. In April 2020 I started to work on VersionEye 2.0! I started completely from 0. Did not reuse the old code base. I thought about the data structure and started with a new database schema. Much more focused and lightweight than the first version.

The main idea of VersionEye 2.0 is that there is one single API Endpoint where you can send your package manager files to and you get back the most important KPIs as JSON, in seconds! You can send your files via Curl, like in this example:

curl --http1.1 -X POST https://api.versioneye.com/v1/scans \
     -H "Authorization: Bearer 9PAtYY4VzBfcYwGDvFTTtt910CCzlooAh0YZZ1oQ" \
     -F "pm_file[]=@package-lock.json" \
     -F "pm_file[]=@package.json" \
     -F "project_name=game-ui"

And you get back a result like this:

{
  "dependencies": 1317,
  "outdated": 541,
  "license_violations": 2,
  "unknown_licenses": 2,
  "security_vulnerabilities": 1,
  "report": https://www.versioneye.com/en/organisations/ve-demo-orga/scans/5f9f2024a38b31d9e082600c"
}

Curl is already installed on most Linux/Unix based machines. That means you don’t need to download an additional client. With Curl you can already integrate a VersionEye scan into your CI/CD pipeline! You just need an API Key from VersionEye.

Now VersionEye 2.0 is online since a couple months and it has the first paying customers again. It supports less package managers than the old version and has a much smaller database. It’s much more focused and much faster than the old version. And I enjoy working on it again.

If a paying customer is requesting support for a new package manager, I will take some time and implement it. I’m also fixing bugs very quickly and I’m constantly improving the license normalisation engine. But it’s all not that time & brain consuming like at version 1.0. Much less emails now and the few emails I’m getting are mostly from paying customers or prospects.

Check it out and give it a try. If you sign up, use the Promotion code “VersionEye20“. With that you get a 50% discount on the monthly subscription. Would love to hear your feedback!

Rust Support

Rust is a blazingly fast new language which guarantees thread safety! Our latest project, a command line tool to identify random components by their SHA values, is implemented in Rust. As we use Rust for our own purposes we want to track Rust dependencies of course 🙂 That’s why added support for Rust now! VersionEye is tracking more than 9K Rust packages from crates.io now and they are all available through our public search.

Screen Shot 2017-05-04 at 08.59.58

Beside that VersionEye can parse cargo.toml and cargo.lock files. If you are using cargo to manage your Rust dependencies then you can let VersionEye monitore those files to receive notifications about new releases of your dependencies.

Screen Shot 2017-05-04 at 09.09.45

The above screenshot shows a Rust project which is monitored directly on GitHub.

This integration is still very fresh. Please try it out and give us feedback!

VersionEye Maven Plugin 3.11.1

Version 3.11.1 of the VersionEye Maven Plugin is out and here are the change logs. This release brings a couple improvements and BugFix!

BugFix

Assume you have a Maven multi module project with 3 modules. Now you add the VersionEye Maven Plugin to the parent pom and run mvn versioneye:create. That will create a new project with subprojects in VersionEye. Now on your CI system on each build you run maybe this goal: mvn versioneye:securityAndLicenseCheck. That will always update the VersionEye project with the current dependencies from your Maven project and check them for security vulnerabilities and license violations. Everything works so far.

Now you are adding a new Maven module to your Maven reactor build and the next execution of mvn versioneye:securityAndLicenseCheck will fail, because it will try to update an non existing module on the VersionEye server.

This bug is fixed now! Version 3.11.1 of the VersionEye Maven Plugin can handle newly added Maven modules. If a new module is added it will automatically create a new subproject for that on the VersionEye server.

Improved Merging

For Maven multi module projects one project is created for the parent pom and for each module a project is created on the VersionEye server and merged into the project for the parent pom. This merging process was working through the GAV coordinates of the parent pom. That was leading to problems if the same project was monitored twice on a VersionEye server. The new merge mechanism is independent from the GAV coordinates, it only relies on the project ID from the VersionEye API.

Support for Project License

The VersionEye Maven Plugin is resolving all dependencies locally and only sends a pom.json file to the VersionEye API, which contains only the needed information. Now this pom.json file also contains the license from the pom.xml file and the project license is displayed in the web interface.

Screen Shot 2017-01-12 at 13.31.55.png

 

Open Source Component Inventory

If you are working in a big company with many software projects it’s very likely that you are using many open source components as dependency. But how do you keep track of it? How many components are you using in different version of multiple projects? You don’t know? This are questions VersionEye can answer you in seconds.

If VersionEye is monitoring your projects you can take advantage of the “inventory” feature. Simply click on the “inventory” link in your organisation and VersionEye will show you ALL the open source components your organisation is using over all the projects which are monitored by VersionEye. That can look like this.

Screen Shot 2016-10-19 at 22.21.53.png

It’s long list of all the components and below each component you can see which of you projects are using it in which version. If a project is using an out-dated version of the component the line is marked yellow.

Here is another screenshot with some comments to the meaning.

Screen Shot 2016-10-19 at 22.21.53 2.png

That’s already pretty cool because this open source inventory list is generated in real time, it always shows you the current status of your projects. But the really cool thing is that you can filter the inventory by different criteria like:

  •  Teams
  • Language
  • Project Version
  • Duplicates

Let’s make an example. Assume you don’t wanna see the full list of components, you are only interested in open source components which are used in 2 or more different versions over different projects. In that case you would select “All teams”, “All languages”, “All versions”, “only duplicates” and hit the “Filter” button. The result would like similar to this.

screen-shot-2016-10-19-at-22-22-53

In the example above you can see that the component “versioneye-core-j” is used in 2 different versions over 4 different projects.

Now let’s say you only want to see duplicate open source components in the “developers” team. For that you would select the team “developers” in the first field and hit the “Filter” button. Here is an example.

screen-shot-2016-10-19-at-22-23-59

In the example above not ALL projects in the organisation have been analysed. In this example only the projects which belong to the “developers” team have been analysed and now we can see that testng is used in 2 different version over 2 Java projects and capybara is used in 2 different versions over 3 projects.

Now let’s say you want to see the open source dependencies of a team over all projects, but if a dependency is used by another team as well you want to see that as well. For that you would select for example “Team: developers”, “All languages”, “All versions”, “Show duplicates” and hit the “Filter” button.

screen-shot-2016-10-19-at-22-24-49

Now all the dependencies from the “developers” team have been analysed and in this example we can see that the component “versioneye-maven-plugin” is used by 2 projects in the “developers” team AND in addition to that in the last 2 rows we can see that this dependency is used by 2 other projects as well which are in a complete different team.

I hope this new filter options are helpful for your daily work. Leave a comment here if you have feedback.

Projects with multi file support

In modern software development it is not uncommon that you use more than 1 package manager for a software project. Maybe you are using Ruby for the backend and JavaScript for the front end. In that case you would use 2 package managers. Bundler/Rubygems for the backend and maybe bower for the front end. You would have at last 3 files for describing your dependencies. A Gemfile and a Gemfile.lock for your backend and a bower.json file for your front end.

Up to now a project on VersionEye was always a representation of 1 single file. For example the Gemfile. So if you wanted to monitor all 3 files you had to have 3 projects on VersionEye. Now a project on VersionEye can have multiple files!

Screen Shot 2014-12-23 at 11.21.47
If you click on a Gemfile in our GitHub integration, VersionEye will check if there is a corresponding Gemfile.lock in the same directory. If so, it will parse that Gemfile.lock as well and create a new project with 2 files.

VersionEye-Multi
In the dependency tab, the corresponding files are displayed above the dependency table. In this example the Gemfile is selected and we see the dependencies of the Gemfile. By clicking on the Gemfile.lock we can switch to the dependencies of the Gemfile.lock. It’s all in the same project view.

The overall numbers in the head of the page are summed up from all files in the project. That means over the 2 files there are 58 unique dependencies and 33 of them are out-dated. Also the dependency badge is summed up over all files. The dependency badge only turns green if ALL dependencies in ALL files are up-to-date.

But that’s not all. How do we get the bower.json file into this project? Let’s assume the bower.json file is already a seperate project on VersionEye. Now we can merge projects into each other. This is how it works. Let’s navigate to the desired project we would like to merge into our Ruby project. Navigate to the “settings” tab. Here there is a new input field which looks like this:

Screen Shot 2014-12-23 at 12.13.53
Here we can pick a “parent” project to merge in. By clicking on the merge button the current project will be merged as subproject into the selected project.

VersionEye-Multi-3Now we have a project with 3 files. Gemfile, Gemfile.lock and a bower.json file. The overall numbers are updated. Overall 3 files we have 59 dependencies and 33 are out-dated.

Under the file names there is a link “unmerge bower.json”. By clicking on that link the bower.json can be unmerged from this project. That means it will be removed from this project and be again a seperate project.

Pretty cool! Right? This is just the first step. In the next update we will bring this feature to the API and to the plugins as well 😉

Let us know what you think about this feature, either here in the comments or on Twitter.

New Project Notification Emails

Up to now VersionEye sent out one notification email for each of your projects. That was not the best solution, specially for power users with 100+ projects because they got A LOT of emails. The new version only sends out 1 single email with a summary.

Assume you have 100 projects in VersionEye. 50 of them are monitored daily, and the other 50 are monitored weekly. In that case you receive now every day just 1 single email with a summary about your daily monitored projects. That looks similar like this one here.

Screen Shot 2014-11-25 at 18.03.19
The Dependencies column shows how many dependencies in the project are out-dated and how many are unknown. The Licenses column shows how many dependencies violet the license whitelist and for how many dependencies VersionEye doesn’t know the license.

By clicking on the project name you come to the project detail page on VersionEye, where you can see all dependencies of the project with more details.

Now you get less emails and a better oveview.

New license crawler finds licenses in git repositories

VersionEye is crawling a whole bunch of package managers + some GitHub Repositories. Many open source libraries provide their license directly within the package manager. On search.maven.org the license information is even mandatory. If somebody submits a new package and the license information is not included in the pom.xml file the package will be declined and not published. That’s a good thing! Unfortunately at most other package managers the license is not mandatory and so it comes that we have many open source libraries in our database without a license 😦

opensource_logo

But many projects on GitHub provide a LICENSE file. Other projects copy & paste the license text directly into their README page. That’s why I developed a new crawler which is looking for a LICENSE file in the root of a git repository. If some kind of LICENSE file was found the crawler tries to recognize the license text. Is it an MIT text, a GPL 3 text or some other license text? If the crawler recognizes the license text it saves the information into the VersionEye database. Otherwise not.
If there is no LICENSE file the crawler tries to find some license text directly in the REAMDE file. If the crawler couldn’t find anything it skips the repo and goes on to the next one.

The license crawler doesn’t crawl whole GitHub. It crawls only git repositories of open source packages which are already in the VersionEye database and have a corresponding link to GitHub. It basically completes meta data to already existing open source packages in the database. And right now there are 463.873 open source libraries in the VersionEye database.

Results

In the past couple days the license crawler was able to find license information for 22.694 open source libraries. 5.684 of this open source libraries are JavaScript libraries registered on Bower, without any license information. One popular example is angular-mocks. The repository provides a bower.json file without a license! The license text of angular-mocks is directly embedded into the README.md file. VersionEye found & recognized the license as MIT and displays it now on the corresponding VersionEye page. There are many more examples like this one.

Next Step

These works already pretty good. But the next logical step is to write a crawler which recognizes licenses in source code. That requires a bit more computing power and a bit more time to write it. Nothing for 2014. But a good goal for 2015 😉

VersionEye Maven Plugin 3.1.0

The VersionEye Maven Plugin is build for big multi module projects. If you have a big Maven project with a parent pom.xml and some modules, the VersionEye Maven Plugin is the tool of the choice to keep them up-to-date. We just released version 3.1.0 with many small improvements and some new features.

Multi-Module Projects

If you wanna see how to use the VersionEye Maven Plugin to setup a multi module project on VersionEye, check out this video on YouTube.

Proxy

Now version 3.1.0 has support for Proxies. If your run your maven goals behind a proxy server, you can set “ProxyHost”, “ProxyPort”, “ProxyUser” and “ProxyPassword” now. Here is the code snippet for the pom.xml file.

<build>
  <plugins>
    <plugin>
      <groupId>com.versioneye</groupId>
      <artifactId>versioneye-maven-plugin</artifactId>
      <version>3.1.0</version>
      <configuration>
        <proxyHost>127.0.0.1</proxyHost>
        <proxyPort>8888</proxyPort>
        <proxyUser>proxy_hopsi</proxyUser>
        <proxyPassword>dont_tell_anybody</proxyPassword>
      </configuration>
    </plugin>
  </plugins>
</build>

MongoDB Aggregation slows down server

Currently we are using MongoDB 2.6.3 at VersionEye as primary database. Almost 3 years ago I picked MongoDB because of this reasons:

  • Easy setup
  • Easy replication
  • Very fast
  • Very good support from the Ruby Community
  • Schemaless

So far it did a good job in the past years. Unfortunately we are facing some issues with it now. Currently it’s running on a 3 node replica set.

At VersionEye we had a very cool reference feature. That showed you on each page how many references a software package has. Mean how many other software packages are using the selected one as a dependency.

Screen Shot 2014-07-16 at 20.57.27

And you could even click on it and see the packages.

Screen Shot 2014-07-16 at 20.58.19

This feature was implemented with MongoDB Aggregation Framework. Which is a simple version of Map & Reduce. In MongoDB we have a collection “dependencies” with more than 8 Million entries. This collection describes the dependency relationship between software packages. To get all references to a software package we run this aggregation code.

deps = Dependency.collection.aggregate(
  { '$match' => { :language => language, :dep_prod_key => prod_key } },
  { '$group' => { :_id => '$prod_key' } },
  { '$skip' => skip },
  { '$limit' => per_page }
)

At first we match all the dependencies which link to the selected software package and then we group them by prod_key, because in the result list we want to have each prod_key only once. In SQL that would be a “distinct prod_key”.

So far so good. At the time we launched that feature we had something like 4 or 5 Million entries in the collection and the aggregation worked fast enough for the web app. But right now with 8 Million entries the aggregation queries take quiet some time. Sometimes several minutes. Far to slow to be part of a HTTP request – response roundtrip. And it slowed down each node in the replica set. The nodes have been running permanently on ~ 60% CPU.

Oh. And yes. There are indexes for the collection 😉 These are the indexes for the collection.

index({ language: 1, prod_key: 1, prod_version: 1, name: 1, version: 1, dep_prod_key: 1}, { name: "very_long_index", background: true })
index({ language: 1, prod_key: 1, prod_version: 1, scope: 1 }, { name: "prod_key_lang_ver_scope_index", background: true })
index({ language: 1, prod_key: 1, prod_version: 1 }, { name: "prod_key_lang_ver_index", background: true })
index({ language: 1, prod_key: 1 }, { name: "language_prod_key_index" , background: true })
index({ language: 1, dep_prod_key: 1 }, { name: "language_dep_prod_key_index" , background: true })
index({ dep_prod_key: 1 }, { name: "dep_prod_key_index" , background: true })
index({ prod_key: 1 }, { name: "prod_key_index" , background: true })

Sometimes it was so slow that the whole web app was not reacting and I had to restart the MongoDB nodes.

Finally I turned off the reference feature. And look what happened to the replica set.

Screen Shot 2014-07-16 at 20.06.54

The load went down to ~ 5% CPU. Wow. VersionEye is fast again 🙂

Now we need another solution for the reference feature. Calculating the references for 400K software libraries in the background would be very intensive. I would like to avoid that.

My first thought was to implement that feature with ElasticSearch, with their facet feature. That would make sense because we use ES already for the search. I wrote a mapping for the dependency collection and started to index the dependencies. That was this morning German time, 12 hours ago. The indexing process is still running :-/

Another solution could be Neo4J.

If you have any suggestions, please let me know. Any input is welcome.