Entries in Cascalog (4)

Saturday, May 8, 2010

News Feed in 38 lines of code using Cascalog

In this Cascalog tutorial, we are going to build part of the back end for a simplified version of a Facebook-like news feed. Along the way, we will walk through an end-to-end example of running Cascalog on a production cluster. If you're new to Cascalog, you should first read the two introductory tutorials.

The code and sample data for the example presented in this tutorial can be found on GitHub.

Friday, May 7, 2010

Cascalog Presentation at Bay Area Clojure User Group

Here are the slides from my presentation about Cascalog at the Bay Area Clojure User Group last night:

Tuesday, April 27, 2010

New Cascalog features: outer joins, combiners, sorting, and more

In the first tutorial for Cascalog, I showed off many of Cascalog's powerful features: joins, aggregates, subqueries, custom operations, and more. Since Cascalog's release a couple of weeks ago, I've added a number of new features that significantly increase the expressiveness and performance of the language without compromising its simplicity or flexibility.

Wednesday, April 14, 2010

Introducing Cascalog: a Clojure-based query language for Hadoop

I'm very excited to be releasing Cascalog as open source today. Cascalog is a Clojure-based query language for Hadoop inspired by Datalog.

Highlights

  • Simple - Functions, filters, and aggregators all use the same syntax. Joins are implicit and natural.
  • Expressive - Logical composition is very powerful, and you can run arbitrary Clojure code in your query with little effort.
  • Interactive - Run queries from the Clojure REPL.
  • Scalable - Cascalog queries run as a series of MapReduce jobs.
  • Query anything - Query HDFS data, database data, and/or local data by making use of Cascading's "Tap" abstraction.
  • Careful handling of null values - Null values can make life difficult. Cascalog has a feature called "non-nullable variables" that makes dealing with nulls painless.
  • First-class interoperability with Cascading - Operations defined for Cascalog can be used in a Cascading flow and vice versa.
  • First-class interoperability with Clojure - You can use regular Clojure functions as operations or filters, and since Cascalog is a Clojure DSL, you can use it in other Clojure code.
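
To make "joins are implicit" a bit more concrete, here is a rough sketch in plain Python rather than Cascalog. It only models what a Datalog-style query computes when the same variable appears in two relations. The relation names (`AGE`, `FOLLOWS`), the rows, and the function `young_people_followed_by` are all made up for illustration, and nothing here reflects how Cascalog actually executes a query on Hadoop.

```python
# Toy in-memory relations. The names and rows are hypothetical sample data.
AGE = [("alice", 28), ("bob", 33), ("emily", 25), ("david", 25)]
FOLLOWS = [("emily", "alice"), ("emily", "bob"), ("alice", "david")]


def young_people_followed_by(follower, max_age):
    """Return sorted (person, age) pairs that `follower` follows, with age < max_age.

    Datalog-style reading of the query:
        age(person, age), follows(follower, person), age < max_age
    The variable `person` appears in both relations, so the two relations
    are joined on it without any explicit join clause.
    """
    results = []
    for person, age in AGE:
        for src, dst in FOLLOWS:
            # The shared variable must bind to the same value in both relations,
            # and the filter (age < max_age) is just one more condition.
            if src == follower and dst == person and age < max_age:
                results.append((person, age))
    return sorted(results)
```

With this sample data, calling `young_people_followed_by("emily", 30)` keeps alice (28) and drops bob (33). In a Datalog-style language the nested loops disappear and only the three conditions are written down, which is roughly the sense in which the join is "implicit".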