Yesterday, I attended a meet-up session on big data and machine learning. Hadoop Summit 2013 is kicking off in San Jose, and the event organizers used it as an opportunity to catch hold of some big names and vendors in the field.
The first speaker for the night was Ted Dunning, who, as everyone knows, is a guru in this field. He started off with an introduction to Apache Mahout, pointing out areas where Mahout is strong and comparable to the best-performing implementations on other platforms. He spoke about the different packages Mahout provides and how best to utilise them. For example, the recommendation package has a plethora of good online algorithms, but it performs poorly on classification tasks. He also spoke about Mahout's math library in Java, which can be used to do vector/matrix manipulations much like Python or Matlab. He also mentioned that these algorithms have both in-memory and distributed implementations, which will be something cool to check out. Link to his slides.
The second talk was from Alpine Data Labs, and it sounded almost like a sales pitch to me. They showed their parallel implementation of SVM, where the key was to apply an approximation technique to the computation of the Lagrange multiplier coefficients. It was a good, descriptive talk and got many people thinking about the inner details of the algorithm.
0xdata started off with the theme of how they want to bring data science to the masses and spare them a direct confrontation with the mathematics. Their product can interface with disparate sources like Excel, R, and SAS, and extend their in-memory implementations onto a distributed platform. They worked through an interesting proof of concept using the on-time airline dataset (http://stat-computing.org/dataexpo/2009/).