Just open-sourced a StarCluster plugin for provisioning elasticsearch clusters automatically in the cloud.
What is StarCluster? StarCluster is great for quickly spinning up clusters of EC2 nodes for some analytics / data chomping.
If you’re not familiar with it, take a look at this very cool screencast:
Starcluster and …
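The plugin's own code isn't shown here, so the following is only a minimal sketch of one job such a plugin has to do. It renders an `elasticsearch.yml` fragment that lists every node for unicast discovery, because multicast generally isn't available on EC2. The helper name `render_es_config` and its parameters are hypothetical. The setting names are from pre-5.x elasticsearch (`discovery.zen.*`). In StarCluster's plugin system, a plugin subclasses `ClusterSetup` and implements `run()`, which is where a helper like this could be called with the cluster's node IPs.

```python
# Hypothetical helper, not taken from the plugin itself: render an
# elasticsearch.yml fragment so each node finds its peers via unicast.
# Setting names are from pre-5.x elasticsearch (discovery.zen.*).


def render_es_config(cluster_name, node_name, node_ips, minimum_master_nodes=None):
    """Return elasticsearch.yml text for one node of the cluster."""
    if not node_ips:
        raise ValueError("need at least one node IP")
    if minimum_master_nodes is None:
        # A majority of master-eligible nodes helps avoid split-brain clusters.
        minimum_master_nodes = len(node_ips) // 2 + 1
    hosts = ", ".join('"%s"' % ip for ip in node_ips)
    lines = [
        "cluster.name: %s" % cluster_name,
        "node.name: %s" % node_name,
        # Multicast isn't generally available on EC2, so list every node explicitly.
        "discovery.zen.ping.multicast.enabled: false",
        "discovery.zen.ping.unicast.hosts: [%s]" % hosts,
        "discovery.zen.minimum_master_nodes: %d" % minimum_master_nodes,
    ]
    return "\n".join(lines) + "\n"
```

For example, `render_es_config("mycluster", "node001", ["10.0.0.1", "10.0.0.2", "10.0.0.3"])` produces a config with `minimum_master_nodes: 2`. The same text would be written to each node, changing only `node.name`.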
Due to the recent outage at GoDaddy, a lot of people are reconsidering their DNS options. Amazon Route 53 is a great option: cheap, flexible and well proven.
To migrate, you first need to export a zone file for the domain from GoDaddy.
It’s been pointed out that the exported zone files are slightly broken in their CNAME records, so …
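The excerpt stops before saying exactly what is wrong with the CNAME records. This sketch therefore does not try to fix anything. It only pulls the CNAME records out of an exported zone file so they can be checked by eye before importing. It assumes simple one-record-per-line BIND-style entries and is not a full zone-file parser: multi-line records, implicit owner names, and relative-name expansion are ignored. The function name `cname_records` is hypothetical.

```python
import re

# One-line record: owner [ttl] [IN] TYPE rdata
RECORD_RE = re.compile(
    r"^(?P<name>\S+)\s+(?:(?P<ttl>\d+)\s+)?(?:IN\s+)?(?P<type>[A-Z]+)\s+(?P<rdata>.+)$"
)


def cname_records(zone_text):
    """Yield (name, ttl, target) for each simple CNAME line in a zone file."""
    for raw in zone_text.splitlines():
        if raw[:1] in (" ", "\t"):
            continue  # implicit-owner or continuation line: not handled here
        line = raw.split(";", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("$"):
            continue  # blank line or directive such as $ORIGIN / $TTL
        m = RECORD_RE.match(line)
        if m and m.group("type") == "CNAME":
            ttl = int(m.group("ttl")) if m.group("ttl") else None
            yield m.group("name"), ttl, m.group("rdata")
```

Running it over the exported file and printing each tuple gives a quick list of the CNAME targets to compare against what the records should point to.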
In search-based big data projects, content duplication is frequently an obstacle to having a clean data source. Here’s an approach to improving data quality by training a classifier to spot duplicates.
The Problem
The data set has about 470,000 non-unique hotel descriptions (e.g. name, …
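The excerpt doesn't show which features or model the post uses. The following is therefore a swapped-in, minimal sketch of the general idea. Each *pair* of records gets a few similarity features: a `difflib` ratio on the names and the Jaccard overlap of description words. A small logistic regression, trained with plain gradient descent on hand-labelled pairs, then decides whether two records describe the same hotel. The record fields `name` and `description`, the helper names, and the learning-rate and epoch settings are all assumptions.

```python
import math
from difflib import SequenceMatcher


def features(a, b):
    """Similarity features for two hotel records (dicts with 'name' and 'description')."""
    name_ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    ta, tb = set(a["description"].lower().split()), set(b["description"].lower().split())
    jaccard = len(ta & tb) / len(ta | tb) if ta | tb else 0.0
    return [1.0, name_ratio, jaccard]  # leading 1.0 is the bias term


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(pairs, labels, lr=0.5, epochs=3000):
    """Fit logistic-regression weights by batch gradient descent."""
    xs = [features(a, b) for a, b in pairs]
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def is_duplicate(w, a, b, threshold=0.5):
    """Classify a pair as duplicate if the predicted probability reaches threshold."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(a, b)))) >= threshold
```

On a data set of around 470,000 records, comparing every pair is impractical, so a real pipeline would first group candidate pairs (for example by city or name prefix) and only score pairs within each group.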
Made a screencast (my first!) of iboto to give a demo of the Amazon EC2 multi-account coolness:
Incidentally, making screencasts on Linux was a bit of a slog until I found the right tools and workarounds, so that might be the subject of a blog post of its own.
My house tweets. It speaks. It sends me jabber messages. This seems to be a constant source of amusement for guests. This ‘house of the future’ was built with a small amount of DIY electronics and some software hacking, on a shoestring budget.
Over time I’ve added more systems to the house and it is …
Amazon have launched a neat new Route 53 feature: latency-based routing. The idea is that when someone hits www.yoursite.com, it resolves to the server closest to them in network terms, cutting latency.
This DNS cleverness has been used by the big boys for some time, but hasn’t been available to us mortals without shelling out big …
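In Route 53, latency-based records are ordinary record sets that share a name and type but each carry a `Region` and a unique `SetIdentifier`. Route 53 answers each query with the record for the region that has the lowest measured latency from the querying side. This minimal sketch only builds the `ChangeBatch` structure accepted by boto3's `change_resource_record_sets`; it doesn't call AWS. The function name, the identifier scheme, and the example IPs are hypothetical.

```python
def latency_change_batch(name, endpoints, ttl=60):
    """Build a Route 53 ChangeBatch with one latency-based A record per region.

    endpoints maps an AWS region (e.g. "us-east-1") to the IP of the server there.
    """
    changes = []
    for region, ip in sorted(endpoints.items()):
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                # Must be unique among record sets sharing this name and type.
                "SetIdentifier": "%s-%s" % (name.rstrip("."), region),
                "Region": region,  # this is what makes the record latency-based
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip}],
            },
        })
    return {"Comment": "latency-based routing for %s" % name, "Changes": changes}
```

The result would be passed as `boto3.client("route53").change_resource_record_sets(HostedZoneId=zone_id, ChangeBatch=batch)`, with `zone_id` being your hosted zone's ID.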
If you’ve used Amazon Web Services much at all, you’ll probably have come across their DNS service, Route 53. It offers very competitively priced DNS hosting on the Amazon cloud.
$ pip install cli53
The first step everyone migrating commonly goes through is getting their existing zones into the …
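cli53 takes care of the import for you. Purely to illustrate what getting a zone into Route 53 involves, the minimal sketch below (not cli53's code) turns already-parsed records into the `ChangeBatch` structure accepted by boto3's `change_resource_record_sets`. The key detail is that Route 53 keeps all values for one name and type in a single record set. For example, two MX lines for the apex become one change with two values. It assumes records arrive as `(name, ttl, type, value)` tuples whose names are relative to `origin`, `@`, or fully qualified. It also assumes the old zone's SOA and NS records have been left out, since Route 53 creates its own for a new hosted zone. The function name is hypothetical.

```python
def to_change_batch(records, origin):
    """Group (name, ttl, type, value) tuples into Route 53 CREATE changes.

    Route 53 keeps every value for one name + type in a single record set,
    so e.g. two MX lines for the apex become one change with two values.
    """
    groups = {}  # insertion-ordered on Python 3.7+
    for name, ttl, rtype, value in records:
        if name == "@":
            fqdn = origin
        elif name.endswith("."):
            fqdn = name  # already fully qualified
        else:
            fqdn = "%s.%s" % (name, origin)
        entry = groups.setdefault((fqdn, rtype), {"TTL": ttl, "values": []})
        entry["TTL"] = min(entry["TTL"], ttl)  # a record set has a single TTL
        entry["values"].append(value)
    changes = []
    for (fqdn, rtype), entry in groups.items():
        changes.append({
            "Action": "CREATE",
            "ResourceRecordSet": {
                "Name": fqdn,
                "Type": rtype,
                "TTL": entry["TTL"],
                "ResourceRecords": [{"Value": v} for v in entry["values"]],
            },
        })
    return {"Changes": changes}
```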
** Note: I’ve now replaced Jekyll with the equally adept Pelican. **
This article describes how to host your own static blog/site on S3. It follows the evolution this site has gone through.
First off, I started using GitHub’s public site feature. Dead neat, a nice set of features and so quick to get running. …
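The details of the S3 setup are cut off here, so this is only a minimal sketch of the two pieces a static site on S3 typically needs. The first is a bucket website configuration naming the index and error documents, in the shape boto3's `put_bucket_website` expects. The second is an upload plan that gives every generated file a key mirroring its path and a correct `Content-Type`, so browsers render pages instead of downloading them. The `404.html` error page and the function name `upload_plan` are assumptions. Pelican writes to `output/` by default.

```python
import mimetypes
import os

# Website configuration in the shape boto3's put_bucket_website expects.
WEBSITE_CONFIG = {
    "IndexDocument": {"Suffix": "index.html"},
    "ErrorDocument": {"Key": "404.html"},  # assumption: the generator emits a 404 page
}


def upload_plan(output_dir):
    """List (s3_key, local_path, content_type) for every file under output_dir."""
    plan = []
    for root, _dirs, files in os.walk(output_dir):
        for fname in files:
            path = os.path.join(root, fname)
            # S3 keys always use forward slashes, whatever the local OS uses.
            key = os.path.relpath(path, output_dir).replace(os.sep, "/")
            ctype = mimetypes.guess_type(fname)[0] or "application/octet-stream"
            plan.append((key, path, ctype))
    return sorted(plan)
```

With boto3, the plan would be applied roughly as `s3.put_bucket_website(Bucket=bucket, WebsiteConfiguration=WEBSITE_CONFIG)`, followed by `s3.upload_file(path, bucket, key, ExtraArgs={"ContentType": ctype})` for each entry.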
This is the first in a series of posts introducing some of the tools I’ve developed.
The first is s3grep: parallelized grep for Amazon S3.
The need for this one arose because a recent project processes large (text) log files and stores them on S3. Often, to diagnose problems, it’s really handy to check directly in the …
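s3grep's own implementation isn't shown here. As a minimal sketch of the idea, the code below searches many objects concurrently with a thread pool, which suits the I/O-bound work of downloading S3 objects. To keep it self-contained, fetching is left to a caller-supplied `fetch(key) -> str` function (a hypothetical interface). Against real S3 this could be `lambda key: s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")` with a boto3 client.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def grep_object(key, fetch, regex):
    """Return (key, line_no, line) for each matching line in one object."""
    text = fetch(key)
    return [(key, n, line) for n, line in enumerate(text.splitlines(), 1) if regex.search(line)]


def parallel_grep(keys, fetch, pattern, workers=8):
    """Grep many objects concurrently; results keep the order of keys."""
    regex = re.compile(pattern)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so output is deterministic.
        results = pool.map(lambda k: grep_object(k, fetch, regex), keys)
        return [hit for hits in results for hit in hits]
```

Because each object is downloaded whole before searching, very large log files would be better streamed line by line. That is a memory trade-off this sketch doesn't address.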
cli53: command-line script to administer the Amazon Route 53 DNS service
iboto: Amazon EC2 shell for easily managing multiple accounts and regions
elasticsearch*: an elasticsearch StarCluster plugin
s3grep: parallelized grep for Amazon S3
cloudily: automatically visualize your EC2 infrastructure