Finding related searches with Spark

Unsupervised learning from user behaviour

When a user navigates a site they leave a valuable trail of information: what their first search was, what they followed this search with, and so on. Using this data we can learn related searches automatically by co-occurrence counting. This post takes you through the steps to get from raw search logs to results using the Spark cluster computing framework. Spark provides a natural language for processing flows of data, and can be scaled up to clusters when data growth dictates. [Read More]
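To give a flavour of what co-occurrence counting means here, the following is a minimal sketch in plain Python rather than Spark, and it is not the code from the full post. It assumes the logs have already been parsed into (session, query) pairs and that two queries count as co-occurring when they appear in the same session; the sample data and the `related_searches` helper are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical input: (session_id, query) pairs standing in for parsed search logs.
logs = [
    ("s1", "spark"),
    ("s1", "spark tutorial"),
    ("s1", "scala"),
    ("s2", "spark"),
    ("s2", "spark tutorial"),
    ("s3", "scala"),
    ("s3", "spark"),
]


def related_searches(logs, top_n=2):
    # Collect the distinct queries seen in each session (a "group by key" step).
    sessions = defaultdict(set)
    for session_id, query in logs:
        sessions[session_id].add(query)

    # Count each unordered pair of queries once per session (a "reduce by key" step).
    pair_counts = Counter()
    for queries in sessions.values():
        for a, b in combinations(sorted(queries), 2):
            pair_counts[(a, b)] += 1

    # Turn the pair counts into a per-query ranking of co-occurring queries.
    related = defaultdict(Counter)
    for (a, b), n in pair_counts.items():
        related[a][b] = n
        related[b][a] = n
    return {q: [r for r, _ in c.most_common(top_n)] for q, c in related.items()}


related = related_searches(logs)
```

In this toy data "spark" co-occurs with "scala" in two sessions, so it ranks first among the related searches for "scala". The grouping and counting steps correspond loosely to the group-by-key and reduce-by-key operations a cluster framework would distribute across machines.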

Spark and Elasticsearch

Elastic Sparkle

If you work in the Hadoop world and have not yet heard of Spark, drop everything and go check it out. It’s a really powerful, intuitive and fast map/reduce system (and then some). Where it beats Hadoop/Pig/Hive hands down is that it’s not a massive stack of quirky DSLs built on top of layers of clunky Java abstractions; it’s a simple, pure Scala functional DSL with all the flexibility and succinctness of Scala. [Read More]