Finding related searches with Spark

Unsupervised learning from user behaviour When a user navigates a site they leave a valuable trail of information - what their first search was, what they followed this search with, and so on. Using this data we can learn related searches automatically by co-occurrence counting. This post takes you through the steps to get from raw search logs to results using the Spark cluster computing framework. Spark provides a natural processing language for flows of data, and can be scaled up to clusters when data growth dictates. [Read More]

Scala Future and the Elasticsearch Java API

Most examples of the Elasticsearch Java API you’ll have seen follow the prepare/execute/actionGet pattern (in Scala): val response = client.prepareIndex("twitter", "tweet", "1") .setSource(jsonBuilder() .startObject() .field("user", "kimchy") .field("postDate", new Date()) .field("message", "trying out Elastic Search") .endObject() ) .execute() .actionGet() This blocks in actionGet(). If you’re developing in a reactive environment (eg. Akka or Play), blocking is a no no. And even if you’re not, it’s a win to be able to avoid blocking so you can parallelize indexing to eek out the best performance. [Read More]

Spark and Elasticsearch

Elastic Sparkle If you work in the Hadoop world and have not yet heard of Spark, drop everything and go check it out. It’s a really powerful, intuitive and fast map/reduce system (and some). Where it beats Hadoop/Pig/Hive hands down is it’s not a massive stack of quirky DSLs built on top of layers of clunky Java abstractions - it’s a simple, pure Scala functional DSL with all the flexibility and succinctness of Scala. [Read More]