NoSQL Geek2Geek at Wikimedia

Last week we held another Geek2Geek event in Berlin. This time the event was hosted by Wikimedia Deutschland and the topic was “NoSQL Ring Architecture”.

Ring Architecture

Ring Architecture is designed for high scalability and BIG DATA! The idea is that your data store is organized in a ring and your write operations can be sent to multiple nodes in the ring. The nodes of the ring can be servers distributed over the whole planet.

Typically your data is stored redundantly on multiple nodes (not necessarily on all of them). That makes the ring very fault tolerant by design.
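To make the idea more concrete, here is a minimal sketch of how a key could be mapped to several nodes on a ring. It assumes consistent hashing, which is the usual technique behind this kind of architecture. The class name, the node names and the replica count are made up for illustration, and real systems such as Cassandra and Riak are considerably more sophisticated than this.

```ruby
require 'digest'

# Minimal consistent-hash ring. Node names and replica count are illustrative.
class Ring
  def initialize(nodes, replicas: 2)
    @replicas = replicas
    # Every node gets a position on the ring, derived from the hash of its name.
    @ring = nodes.map { |node| [position(node), node] }.sort
  end

  # Returns the nodes that store the key: the first node at or after the
  # key's position, followed by its successors on the ring.
  def nodes_for(key)
    pos = position(key)
    start = @ring.index { |node_pos, _| node_pos >= pos } || 0 # wrap around
    Array.new(@replicas) { |i| @ring[(start + i) % @ring.size].last }
  end

  private

  def position(value)
    Digest::MD5.hexdigest(value.to_s).to_i(16)
  end
end

ring = Ring.new(%w[berlin london tokyo sydney], replicas: 2)
owners = ring.nodes_for('meter-4711')
puts "meter-4711 is stored on: #{owners.join(', ')}"
```

Because the position of a key depends only on its hash, every client computes the same owners without asking a central coordinator, and a write can go to any of them.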

Ring Architecture is optimized for writes! There are a bunch of use cases for this kind of architecture. Smart meters, for example. If you have a couple of million houses with smart meters, each sending its state to a centralized data store every minute, you have A LOT of write load! Good luck if you try to handle that load with a multi-master MySQL cluster 😀

It is also very interesting for geolocation use cases. If you run a worldwide service for local retail, for example, you will send most of your write operations from Europe to nodes in Europe. That means that read operations from Europe are super fast. Reads of European retail data from other continents are still possible of course, but they are a bit slower. That’s OK, because you know that these kinds of reads are the exception.

Geek2Geek

The Geek2Geek event happens every couple of weeks, always at a different location and always with a different topic. We always invite at least 2 speakers per topic. We want to see different solutions for a problem and get people with different backgrounds into one room.

Two very popular implementations of Ring Architecture are Apache Cassandra and Riak. We had one talk on each of them. Around 50 people attended the event.

Check out more pictures on the MeetUp page.

Cassandra

The Apache Cassandra project is completely open source and was originally started by engineers at Facebook. No wonder, I guess they have a lot of writes and reads on their backend 🙂

Christian Johannsen gave a talk on Cassandra. The talk was recorded and we even live streamed it on YouTube. Watch the talk here.

Christian did a great job. He gave a really good intro to Cassandra and explained very well how it works. We had surprisingly many questions about Cassandra from the audience, many more than usual.

Riak

Riak is a younger implementation of the Ring Architecture. Guillermo Alvarez from Wooga gave the second talk, about Riak. He gave a short intro and showed us how they use Riak at Wooga to develop some amazing games. Watch his talk here:

Guillermo did a great job as well. It was very interesting to listen to the Wooga use cases, and we had a lot of questions about Riak from the audience as well. This time the audience was highly engaged 🙂

Sponsors

Geek2Geek is always FREE and most of the time we even have free drinks and pizza! That wouldn’t be possible without our awesome sponsors!

Wikimedia Deutschland

Many thanks to Wikimedia Deutschland for sponsoring the great room and for making it possible to record and even live stream the talks! Wikimedia in Berlin is working on several Open Source projects. The best known is of course Wikipedia, but most of the engineers at Wikimedia in Berlin actually work on the Wikidata project. That is the biggest fact database in the world.

DataStax

Many thanks to DataStax for sponsoring the food. DataStax is the company behind Apache Cassandra. If you need professional help and support for your Cassandra installation, you should talk to DataStax. They have more than 300 employees all around the world, ready to help you.

Basho Technologies

Many thanks to Basho Technologies for sponsoring the food. Basho Technologies is the company behind Riak. If you need professional support for your Riak installation, you should contact Basho Technologies. Maybe you are even interested in their new cloud hosting service for Riak.

VersionEye

And many thanks to VersionEye for organizing the event and for bringing together the right people, the right speakers and the right sponsors. VersionEye is a notification system for software libraries. It helps developers to stay up-to-date.

NoSQL Geek2Geek at Wikimedia

Tomorrow we are hosting another Geek2Geek event in Berlin, at Wikimedia Deutschland. The topic this time is “NoSQL Ring Architecture”. Famous implementations are Riak and Cassandra. We will have one talk on each of them. Check out the agenda at MeetUp.com.

You can sign up for the event for free at MeetUp.com. Thanks to our sponsors we will have free beer and pizza 🙂

How do you imagine the perfect Crowd Software Testing Tool?

The startup testCloud, based in Berlin, offers crowd software testing as a service.

Now they are working on a new crowd software testing tool, which is completely self-service. They prepared a 3 minute survey to get to know your needs for the perfect tool:

https://www.umfrageonline.com/s/d3d0f6b

Everybody who completes the survey can use the new tool for free for up to 5 bugs after the official launch.

I completed the survey myself and I’m very curious to see the results.

How popular is VersionEye

VersionEye belongs to the top 100K Domains in the world! Alexa measures the traffic a Domain gets. The more traffic a Domain gets, the lower its Alexa Rank. Google has the Alexa Rank 1, because it is the most visited Domain on this planet! Facebook has Rank 2 and Twitter Rank 7. For the top 100K Domains Alexa draws a chart. 100K sounds like a lot, but not really if you realize that there are 215 Million Domains online! Just to visualize it, this chart shows the top 100K Domains in red. The other Domains are blue. Yes, there is a thin little red line in the blue cookie 🙂 That’s the top 100K Domains 🙂

Screen Shot 2014-08-11 at 10.57.44

VersionEye’s Global Rank is 96,535 and our Rank in Germany is 10,247! And it’s getting better 🙂 Check out the VersionEye Alexa Page.

Screen Shot 2014-08-11 at 10.37.59

We are entering the Alexa Chart. Yahooo! And these are the countries we get most of our traffic from.

Screen Shot 2014-08-11 at 10.38.45

According to Alexa there are 118 Domains linking to VersionEye. That’s not correct. According to Google there are 336 Domains linking to VersionEye!

Screen Shot 2014-08-11 at 11.06.48

More than 4000 links come from GitHub alone, because of our awesome dependency and reference badges 🙂 Currently we are processing up to 140K HTTP Requests per day.

Many thanks to our awesome community for all the support and the passion. This would not be possible without you! We will keep up the good work. I promise!

VersionEye Enterprise updates

There are 2 new docker containers out for VersionEye Enterprise.

  • veye/rails_app:2.3.5
  • veye/rails_api:1.0.2
  • veye/

The new rails_app contains the new Single Page App for GitHub and Bitbucket, which is much more straightforward and much faster.

Screen Shot 2014-07-31 at 10.12.36

The new rails_api contains updates for the new reference endpoints.

Unified search across your favorite web apps with StartHQ

StartHQ is one of the few tools I’m using every day! It is my default Tab Launcher for Chrome. Every time I open a new tab in Chrome I see my StartHQ dashboard.

01_StartHQ

This works through the StartHQ Chrome extension. The cool thing here is that I can customize my Tab Launcher with the apps I like and everything is stored in the cloud. That way I can use the same Tab Launcher across all my devices.

But the coolest feature is the unified search over all my favorite web apps. Let’s say I trigger a “MongoID” search on my StartHQ dashboard. This is the result list.

03_StartHQ

I get results from GMail, StackOverflow, DuckDuckGo, Twitter, GitHub and VersionEye. All in one place. It works the same for document searches. If you don’t know where you stored a document, simply search through all your web apps with StartHQ.

The unified search works through the extension, which is also available for Firefox, Safari and Opera. If you are logged in to your favorite apps, the StartHQ Chrome extension can search through them and offer you deep links to your cloud content. The company StartHQ itself doesn’t actually have access to your cloud services, so privacy-wise it’s all safe. That’s awesome! Isn’t it?

Another really cool feature is the email newsletter. No! I’m not kidding you. The StartHQ Newsletter is one of the few newsletters I actually read, because it shows me the newest blog posts from my favorite web apps. Here is an example:

04_StartHQ

I really like the format because it’s short. In each section there are never more than 3 links! And they are always important to me. That’s perfect. It’s not another email newsletter with a never ending link collection.

Check it out today and give it a try. And let us know what you think on Twitter and Google+.

VersionEye is already integrated into the StartHQ search. In the future we will work on a deeper integration with StartHQ.

New Single Page App for GitHub/Bitbucket integration

Yesterday we launched the new Single Page App for the VersionEye GitHub/Bitbucket integration. You surely know the old one. Here it is.

OldSPA_Github

It had a lot of disadvantages. The UI was confusing and cluttered with a lot of information that is not important for the VersionEye use case. Especially if you had a lot of branches and project files in a repository, you had to scroll in the little fixed div area. That was not very comfortable.

The new SPA (Single Page App) is a complete rewrite with ReactJS. The UI is much simpler and there is nothing you can do wrong. This is the initial page.

Screen Shot 2014-07-31 at 10.10.40

You get ALL your repositories listed, without any other useless information. With the text input field you can filter the list and quickly get to the repository you want. By clicking on a repository name you get to the next page, where we import ALL the branches and the supported project files.

Screen Shot 2014-07-31 at 10.12.36

Here you can take advantage of the full size of your screen. If you want to monitor a file, simply flip the switch on. After the file is parsed, the file name turns into a link which leads to the VersionEye project page, where you can see all the dependencies of the file.

This SPA is much simpler than the old one. There is nothing you can do wrong and it works much faster. Besides the SPA we also refactored the backend services, which now take more advantage of RabbitMQ. You can try the new SPA in the login area here for GitHub:

https://www.versioneye.com/user/projects/github_repositories

And here for Bitbucket:

https://www.versioneye.com/user/projects/bitbucket_repositories

Let me know how you like it. Either here in the comments or on Twitter.

VersionEye Enterprise Update

Another update for all the Enterprise clients is out.

rails_app 2.1.5

This Docker container ships the new web app for VersionEye Enterprise. This includes many small improvements. Here are the 2 biggest ones:

  • References are turned on again.
  • BugFix for too many open database connections

There was a bug which left open database connections unclosed. The connections added up and the web app became slower and slower. This bug is fixed in this release. If you update this container, please stop and start the MongoDB Docker container first. That will close all open connections on MongoDB.

Intro to Firegento

Firegento e.V. is an association around the PHP e-commerce framework Magento, dedicated to Open Source and the community around it, and to supporting the apprenticeship and education of developers. It started as a group of developers from different companies who wanted to create open source projects under a neutral name. Firegento regularly organizes Magento-centred hackathons in Europe and supports a good set of Magento-related projects.
I want to tell you about the two which are now related to VersionEye.

The Magento Composer Installer

Every bigger PHP project seems to have its own installer to integrate packages from Composer into the current project, so we thought we should write one for Magento, too. That was nearly 2 years ago at a hackathon. Today it is a standard tool many Magento developers know and it’s even part of the regular Magento trainings (at least people told me so). In other news, it’s currently the 3rd most referenced PHP project on VersionEye, and even the Magento project itself uses it as the base for its own Composer installer.

With the increased interest in Composer, there also grew a need to have access to the whole world of Magento modules, which led to the next project I want to introduce.

The Firegento Repository

It uses the Composer Satis project as its base and was initially created to offer easy access to a few Magento modules hosted on GitHub. At one of the hackathons two members of the community felt the need to expand this and created a converter for all the free Magento modules which are published on the Magento marketplace. Thanks to both projects, Firegento plays a major role today in spreading the use of Composer among Magento developers.

Useful Links related to this article:

firegento page: http://firegento.com/
packagist page: http://packages.firegento.com/
repository github: https://github.com/magento-hackathon/composer-repository
installer github: https://github.com/magento-hackathon/magento-composer-installer

Daniel Fahlke

This article is written by Daniel Fahlke.

I am Daniel Fahlke, also known on the Internet as Flyingmana, and I work as a Magento Developer at http://www.melovely.de/. I am also an active member of the Open Source community and in my free time I am currently mostly active in Magento and Composer related projects. Other things I participate in are PHP-Mentoring, various other PHP projects and some regular user groups and conferences.

MongoDB lessons learned

MongoDB is currently the primary database for VersionEye. In the last couple of weeks we had some performance and scaling issues. Unfortunately that caused some downtime. Here are the lessons from the last 3 weeks.

MongoID

The Ruby code at VersionEye is using the MongoID driver to access MongoDB. All in all, MongoID is a great piece of open source software. There is a very active community which offers great support.

In our case MongoID somehow didn’t close the opened connections. With each HTTP Request a new connection to MongoDB is created. Once the HTTP Response is generated, the connection can be closed. Unfortunately this didn’t happen automatically. So the open connections added up on the MongoDB Replica Set and the application became slower and slower over time. After a restart of the Replica Set the count started at 0 again and the application was fast again. At least for a couple of hours, until the open connections added up into the hundreds again.

For right now that’s fixed with this filter in the ApplicationController.

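  # Close the MongoDB connection once the response has been generated.
  after_filter :disconnect_from_mongo

  def disconnect_from_mongo
    Mongoid.default_session.disconnect
  rescue => e
    # A failed disconnect is only logged; it must not break the response.
    p e.message
    Rails.logger.error e.message
    Rails.logger.error e.backtrace.join("\n")
  end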

Still not sure if this is a bug in MongoID or a misconfiguration on our side.
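The same idea can also be expressed as a Rack-style middleware with an ensure block, so that the disconnect also happens when a request fails with an exception (an after_filter, as far as I know, is skipped in that case). This is only a sketch under assumptions: the class names are made up, and FakeSession is a stand-in so that the example runs without a database. In the real app the session would presumably be Mongoid.default_session.

```ruby
# Rack-style middleware that disconnects a database session after every
# request. `session` is anything that responds to #disconnect.
class DisconnectAfterRequest
  def initialize(app, session)
    @app = app
    @session = session
  end

  def call(env)
    @app.call(env)
  ensure
    # Runs for successful and for failed requests alike.
    begin
      @session.disconnect
    rescue StandardError => e
      warn "disconnect failed: #{e.message}"
    end
  end
end

# Hypothetical stand-in for a real session; it only counts the calls.
class FakeSession
  attr_reader :disconnects

  def initialize
    @disconnects = 0
  end

  def disconnect
    @disconnects += 1
  end
end

session = FakeSession.new
app = ->(_env) { [200, { 'Content-Type' => 'text/plain' }, ['ok']] }
stack = DisconnectAfterRequest.new(app, session)
status, _headers, _body = stack.call({})
puts "status #{status}, disconnects so far: #{session.disconnects}"
```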

MongoDB Aggregation Framework

We have a cool feature at VersionEye which shows the references for software packages. These are the references for the Rails framework, for example.

Screen Shot 2014-07-16 at 20.58.19

This feature shows you which other software libraries are using the selected software library as a dependency. Usually many references are a good sign for quality software.

In the beginning this feature was implemented with the Aggregation Framework of MongoDB and it was fast enough. This is the aggregation code snippet we used for this feature.

deps = Dependency.collection.aggregate(
  { '$match' => { :language => language, :dep_prod_key => prod_key } },
  { '$group' => { :_id => '$prod_key' } },
  { '$skip' => skip },
  { '$limit' => per_page }
)
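To make clear what this pipeline computes, here is a plain Ruby sketch of the same four steps on an in-memory array. The sample records and the method name are made up for illustration. Note that MongoDB does not guarantee the order of the groups, while the Ruby version simply keeps the order of first appearance.

```ruby
# Hypothetical sample records; the real collection holds millions of them.
DEPENDENCIES = [
  { language: 'Ruby', prod_key: 'devise',      dep_prod_key: 'rails' },
  { language: 'Ruby', prod_key: 'activeadmin', dep_prod_key: 'rails' },
  { language: 'Ruby', prod_key: 'devise',      dep_prod_key: 'rails' },
  { language: 'Ruby', prod_key: 'rails',       dep_prod_key: 'rack' },
  { language: 'PHP',  prod_key: 'some/pkg',    dep_prod_key: 'rails' }
].freeze

def references(records, language, prod_key, skip, per_page)
  # $match: keep only the dependencies pointing at the selected library
  matched = records.select do |dep|
    dep[:language] == language && dep[:dep_prod_key] == prod_key
  end
  # $group: one entry per referencing library, however many versions use it
  grouped = matched.map { |dep| dep[:prod_key] }.uniq
  # $skip and $limit: cut out the requested page
  grouped.drop(skip).first(per_page)
end

page = references(DEPENDENCIES, 'Ruby', 'rails', 0, 10)
puts page.inspect
```

The grouping step has to look at every matched record before the first page can be returned, which is why the cost grows with the size of the collection.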

At the time this was implemented we had less than 4 Million dependency records in the collection. Over time the collection grew. Right now there are more than 9 Million records in the collection and the aggregation code snippet above is just terribly slow. And it slows down everything else too. If multiple HTTP Requests trigger this code, the whole database gets super slow! I already wrote a blog post about that here.

One thing I learned is that the Aggregation Framework takes only limited advantage of Indexes. A $match at the start of the pipeline can use one, but the $group stage still has to work through every matched document. Map & Reduce in MongoDB has a similar problem. Originally Map & Reduce was created to crunch data in parallel, super fast. In MongoDB, Map & Reduce runs on a single thread :-/

Wrong Indexes

Instead of calculating the references in real time with MongoDB’s Aggregation Framework, we wanted to pre-calculate the references with a simple query. This one:

prod_keys = Dependency.where(:language => product.language, :dep_prod_key => product.prod_key).distinct(:prod_key)

The advantage of this distinct query over the Aggregation Framework is that it can be answered with the help of an Index. And there is an index specifically for that query!

index({ language: 1, dep_prod_key: 1 }, { name: "language_dep_prod_key_index" , background: true })

On localhost the query was running quite fast. Still too slow for real time, but fast enough to pre-calculate all values overnight. On production it was running super slow! Each query needed 17 seconds. Calculating the references for all 400K software libraries in our database would take 78 days.
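The 78 days follow directly from the two numbers above, assuming the queries run one after another:

```ruby
libraries         = 400_000 # software libraries in the database
seconds_per_query = 17      # measured on production
seconds_per_day   = 24 * 60 * 60

total_seconds = libraries * seconds_per_query
days = total_seconds / seconds_per_day.to_f

puts "#{total_seconds} seconds are about #{days.floor} days"
```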

Finally Asya gave the right hint: double check the query in the mongo console with “.explain()” to see which indexes are used. And indeed MongoDB was using the wrong index on production! Only God and the core committers know why. For me that’s a bug!
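Such a check can also be automated. The sketch below assumes the explain format of the MongoDB 2.x series, where the “cursor” field reads “BtreeCursor” followed by the index name when an index is used, and “BasicCursor” for a full collection scan. Newer MongoDB versions report the plan differently. The helper name, the sample documents and the index name prod_key_1 are made up for illustration. In a real app the explain document would come from the driver.

```ruby
# Extracts the index name from a MongoDB 2.x style explain document.
# Returns nil when no index was used (for example "BasicCursor").
def index_used(explain_doc)
  match = explain_doc['cursor'].to_s.match(/\ABtreeCursor (\S+)/)
  match ? match[1] : nil
end

EXPECTED_INDEX = 'language_dep_prod_key_index'

# Hypothetical explain outputs; all numbers are invented.
good_plan = { 'cursor' => "BtreeCursor #{EXPECTED_INDEX}", 'nscanned' => 120 }
bad_plan  = { 'cursor' => 'BtreeCursor prod_key_1', 'nscanned' => 9_000_000 }
scan_plan = { 'cursor' => 'BasicCursor' }

[good_plan, bad_plan, scan_plan].each do |plan|
  used = index_used(plan)
  verdict = used == EXPECTED_INDEX ? 'ok' : 'WRONG INDEX'
  puts "#{used || 'no index'}: #{verdict}"
end
```

Running a check like this against production from time to time would have revealed the wrong index long before the slow queries did.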

This is what happens if you run a couple of distinct queries which use the wrong index.

Screen Shot 2014-07-19 at 22.11.34

I deleted 5 indexes on the collection until MongoDB had no other choice than to use the damn right index! And now it’s running fast enough. Finally!

Conclusion

Here are the conclusions for working with MongoDB:

  • Regularly check the logs on the MongoDB Replica Set to spot odd things.
  • Close open connections.
  • Avoid the Aggregation Framework if you can do the same with a simple query.
  • Ensure that MongoDB is using the right Index for your query.

So far so good.