Introducing s3grep

This is the first in a series of posts introducing some of the tools I’ve developed.

The first is s3grep, a parallelized grep for Amazon S3.

The need for this one arose from a recent project that processes large text log files and stores them on S3. When diagnosing problems, it's often really handy to search the log files directly. Some tricks with s3cmd and xargs only get you so far: that approach is hard to parallelize and trickier than it should be.

Hence s3grep was born.
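
To give a flavour of the idea, here's a minimal sketch of a parallel grep over a set of stored objects. It is not s3grep's actual implementation: grep_objects, fetch and fake_bucket are hypothetical names made up for illustration, and an in-memory dict stands in for S3 so the sketch runs without credentials or network access.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def grep_objects(fetch, keys, pattern, workers=8):
    """Search each object's text for pattern, fetching objects in parallel.

    fetch is a hypothetical callable that maps a key to that object's text.
    Returns a list of (key, line_number, line) tuples, in key order.
    """
    regex = re.compile(pattern)

    def search(key):
        # Download one object and keep only the lines that match.
        text = fetch(key)
        return [
            (key, number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if regex.search(line)
        ]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so output is grouped by key.
        per_key = pool.map(search, keys)
        return [match for matches in per_key for match in matches]


# In-memory stand-in for a bucket (assumption: real code would read from S3).
fake_bucket = {
    "prefix/app.log": "needle in hay\nfound straw here\n",
    "prefix/db.log": "nothing to see\n",
    "prefix/web.log": "straw again\nab\na--b\n",
}

if __name__ == "__main__":
    keys = sorted(fake_bucket)
    for key, number, line in grep_objects(fake_bucket.__getitem__, keys, "straw"):
        print(f"{key}:{number}:{line}")
```

Threads suit this kind of job because the time goes on waiting for downloads rather than on matching. In real use, fetch would wrap whatever S3 client call downloads an object's contents.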

Some quick examples (lifted verbatim from the docs):

Search for straw in bucket ‘mybucket’, in keys starting with ‘prefix’:

$ s3grep s3://mybucket/prefix straw

Regular expressions:

$ s3grep s3://mybucket/prefix a.+b

The project is available on PyPI now:

$ pip install s3grep

And on GitHub: http://github.com/barnybug/s3grep/

Enjoy!