Wednesday, March 26, 2008

Visited Hadoop Conference

NOTE: Any comments in this blog entry are based on my personal thoughts after visiting the Hadoop conference and don't represent any current plans within MySQL.

I visited the Hadoop conference today, which was a very interesting event. The room was filled to its limit, and people were even standing for lack of chairs. There were probably around 300 people or so.

It was interesting to see the wide scope of web-scale problems that could be attacked using Hadoop. The major disruptive feature of Hadoop is MapReduce, its approach to solving parallel data analysis problems.
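To make the MapReduce idea concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API; the function names and the "<path> <status>" log format are assumptions made up for illustration. The map step emits key/value pairs, a shuffle groups them by key, and the reduce step folds each group into one result. Hadoop runs the same phases spread over many machines.

```python
from collections import defaultdict


def map_phase(records, mapper):
    # Apply the mapper to every input record; each call may emit
    # zero or more (key, value) pairs.
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    # Group all emitted values by key, as the framework would do
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    # Collapse each key's list of values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def count_status(line):
    # Hypothetical log format: "<path> <status>".
    _path, status = line.split()
    yield status, 1


def total(_key, values):
    return sum(values)


def run_job(records):
    return reduce_phase(shuffle(map_phase(records, count_status)), total)


logs = ["/index 200", "/missing 404", "/index 200"]
result = run_job(logs)
print(result)
```

Because the mapper looks at one record at a time and the reducer at one key at a time, both phases can in principle be split across nodes without changing the functions themselves.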

One thing I started thinking about was how one could introduce MapReduce into SQL. A presentation on Hive showed an interesting approach to this problem. I thought a bit about how one could integrate a MapReduce solution into MySQL. There are certainly a lot of problems to solve, but I got a few interesting ideas.

Being able to query both business data stored in a database and web-based logs and other types of massive data sets is certainly an interesting problem to consider.

In principle, what one can add by introducing MapReduce into MySQL is the ability to handle streaming queries (queries that use dataflows as input tables and a dataflow as the output table).
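One way to picture such a streaming query is as a function that reads rows from an input dataflow and lazily produces rows of an output dataflow. The sketch below is only an illustration in Python, not a MySQL or Hadoop feature; the function name, the (key, value) row shape, and the fixed-size window are all assumptions. It behaves roughly like a GROUP BY with SUM applied to each window of incoming rows.

```python
from collections import Counter
from itertools import islice


def streaming_group_sum(rows, window_size):
    # Consume a possibly unbounded iterator of (key, value) rows and
    # emit one (key, sum) row per key for every window of input rows.
    rows = iter(rows)
    while True:
        window = list(islice(rows, window_size))
        if not window:
            return  # input dataflow is exhausted
        totals = Counter()
        for key, value in window:
            totals[key] += value
        # Emit the aggregated rows for this window in key order.
        for key in sorted(totals):
            yield key, totals[key]


# Hypothetical input dataflow: (user, bytes) rows arriving over time.
clicks = iter([("a", 1), ("b", 2), ("a", 3), ("b", 4), ("b", 5)])
output = list(streaming_group_sum(clicks, window_size=3))
print(output)
```

Since the output is itself an iterator of rows, it could be fed into another query of the same kind, which is the sense in which both input and output are dataflows.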

However, the actual implementations of Hadoop and HBase were still very much in their infancy, so availability and reliability were far from always-on, and performance wasn't yet a focus.

1 comment:

Jeff Hammerbacher said...

Hey Mikael,

My name's Jeff Hammerbacher, and I manage the Data team at Facebook, where Hive was developed. Please drop me a line at hammer at facebook dot com to discuss further! We're heavy MySQL users as well (second largest in the world, I've heard) so would love to hear your thoughts.

Regards,
Jeff