Tuesday, Aug 31, 2010

Clojure or: How I Learned to Stop Worrying and Love the Parentheses

I'm a longtime Java, Ruby, and Python programmer. Yet Clojure is the first language I've used that I truly enjoy using on a daily basis.

Clojure is a special language. There have been many attempts to articulate the benefits of Lisp-based languages before, but most of these attempts seem to end in futility. Until you use the language, it's hard to understand why functional programming, macros, and immutability are such a big deal.

So I'm going to take a different approach in explaining the virtues of Clojure. I'm going to start off somewhat unusually by talking about SQL, show how querying is done fundamentally differently in Clojure, and transition from there into a broader discussion about domain specific languages, accidental complexity, and how Clojure solves problems that have plagued programmers throughout programming's history.

The problem with SQL

SQL is a language for querying relational databases. SQL is one of the most successful technologies ever, but very few technologies can claim to have led to as much unnecessary complexity as SQL.

SQL solves the problem of querying a relational database in a concise, expressive manner. In that regard, SQL does a very good job.

The problem with SQL is that it's a custom language. Using SQL from other languages causes a host of other problems - problems that are orthogonal to querying a database. These problems are examples of accidental complexity: complexity in an application caused by the tool used to solve a problem rather than by the problem itself.

The prime example of accidental complexity caused by SQL being a custom language is the SQL injection attack.

SQL injection has nothing to do with querying databases

SQL injection results from using one language from within another by doing string manipulation. This has nothing to do with querying databases; it's an integration problem. As we'll see later, it's a problem we can avoid in Clojure.

I know what you're thinking. "There are X, Y, and Z libraries for parameterizing SQL queries and avoiding SQL injection attacks!" This raises the question: then why are SQL injection attacks so pervasive?

The obvious answer is that string manipulation is the most straightforward way to use one language within another. There's something wrong with your tools when the obvious, straightforward way to do something causes major security problems.
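To make the point concrete, here is a minimal sketch in plain Clojure of the "obvious" string-building approach and how it goes wrong. The `users` table and both helper functions are hypothetical, and the vector-of-SQL-plus-parameters shape is only an assumption modeled on what JDBC-style wrappers typically accept.

```clojure
;; Hypothetical helper: splices user input straight into the SQL text.
(defn unsafe-query [user-name]
  (str "SELECT * FROM users WHERE name = '" user-name "'"))

(unsafe-query "alice")
;; => "SELECT * FROM users WHERE name = 'alice'"

;; Input containing a quote changes the structure of the query itself.
(unsafe-query "x' OR '1'='1")
;; => "SELECT * FROM users WHERE name = 'x' OR '1'='1'"

;; Hypothetical alternative: keep the query text and the values apart,
;; so the input stays a value and is never parsed as SQL.
(defn param-query [user-name]
  ["SELECT * FROM users WHERE name = ?" user-name])

(param-query "x' OR '1'='1")
;; => ["SELECT * FROM users WHERE name = ?" "x' OR '1'='1"]
```

The second form is safer, but notice that it takes more thought than the first. That is exactly the problem.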

There are other problems that arise from using one language within another. The embedded language is second class. The usual techniques programmers use to reduce program complexity, modularization and composition, can't be fully applied to the embedded language.

Similar problems arise when generating HTML - cross site scripting attacks are pervasive for the same reason: HTML is built by string manipulation from within another language.

Clojure lets you create integrated languages

There are some serious issues that arise when using distinct languages together. Yet the languages are distinct for a reason - they're intended for different problems and operate with completely different mental models.

What if you could fully integrate the query language into your general purpose programming language? What if queries were first class and could be manipulated as such?

Say hello to Clojure (and Lisps in general).

In Clojure, you can extend the language within the language to create domain specific languages. These mini-languages are fully integrated into Clojure and can be manipulated like anything else in Clojure. Most importantly, you get the benefits of a custom language - conciseness and expressiveness - without the accidental complexities.

There are a number of reasons why building mini-languages is possible in Clojure. These include the "code as data" philosophy of the language, macros, closures, and the emphasis on functional programming.
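If those terms are unfamiliar, here is a small sketch of each in plain Clojure. The names `form`, `unless`, and `older-than` are made up for illustration and are not taken from any library.

```clojure
;; Code as data: a quoted form is an ordinary list that can be inspected.
(def form '(+ 1 2 3))
(first form)  ;; => + (a symbol)
(rest form)   ;; => (1 2 3)
(eval form)   ;; => 6

;; Macro: receives unevaluated code and returns new code to run in its place.
(defmacro unless [test & body]
  `(if ~test nil (do ~@body)))

(unless (> 1 2) :ran)                   ;; => :ran
(macroexpand-1 '(unless (> 1 2) :ran))  ;; => (if (> 1 2) nil (do :ran))

;; Closure: the returned function remembers min-age.
(defn older-than [min-age]
  (fn [person] (> (:age person) min-age)))

(filter (older-than 30)
        [{:name "alice" :age 28} {:name "bob" :age 33}])
;; => ({:name "bob", :age 33})
```

A macro like `unless` adds a new control structure without touching the compiler, which is the basic move behind building a mini-language inside Clojure.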

An example of an integrated query language for Clojure

I wrote an integrated query language for Clojure called Cascalog. Cascalog is a query language for Hadoop clusters, but a very similar library could be built for querying relational databases.

Cascalog forgoes the syntax-heavy design of SQL in favor of the syntax-light design of Datalog. Here are some examples of what Cascalog looks like, compared against the equivalent SQL queries. Remember, Cascalog is a library for Clojure that has the look and feel of an embedded language:

Teaching Cascalog is not the goal of this article, so don't worry if you don't fully understand the Cascalog queries. I just wanted to show that Cascalog is just as concise and declarative as SQL. To learn more about Cascalog, see the introductory tutorial.

The key difference between Cascalog and SQL, of course, is that Cascalog is an embedded language within Clojure. The first class integration between Clojure and Cascalog avoids accidental complexity and lets us use techniques that are otherwise restricted. Queries written with Cascalog, unlike SQL, can be modularized and composed in all sorts of useful and interesting ways.

For example, you can make functions that return subqueries:
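As a rough stand-in for the idea (this is plain Clojure over an in-memory collection, not Cascalog syntax), a function that returns a subquery might look like the sketch below. The `follows` dataset and the `followers-of` function are hypothetical.

```clojure
;; Hypothetical dataset: [follower followed] pairs.
(def follows
  [["alice" "bob"] ["carol" "bob"] ["bob" "alice"]])

;; An ordinary function that returns a lazy query over the dataset.
(defn followers-of [person]
  (for [[follower followed] follows
        :when (= followed person)]
    follower))

(followers-of "bob")    ;; => ("alice" "carol")
(followers-of "alice")  ;; => ("bob")
```

Because the query is the return value of a regular function, it can be named, reused, and tested like any other piece of code.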

You can parameterize your queries without needing to explicitly say that you're doing so. You just use variables in your query like you were passing them to any other function:
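Here is a sketch of what that means, again using plain Clojure over an in-memory collection rather than Cascalog itself. The `ages` dataset and both functions are hypothetical.

```clojure
;; Hypothetical dataset: [person age] pairs.
(def ages [["alice" 28] ["bob" 33] ["carol" 26]])

;; max-age is just a function argument used inside the query.
(defn people-younger-than [max-age]
  (for [[person age] ages
        :when (< age max-age)]
    person))

(people-younger-than 30)  ;; => ("alice" "carol")

(defn age-of [person]
  (for [[p age] ages
        :when (= p person)]
    age))

(age-of "bob")            ;; => (33)

;; A hostile string is only ever compared as a value, never parsed as code.
(age-of "x' OR '1'='1")   ;; => ()
```

There is no separate "escaping" step to forget, because the parameter never passes through a string that gets re-parsed.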

You can pass a subquery to a function to use in another query:
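The same idea in a plain-Clojure sketch (not Cascalog syntax): one function produces a subquery and another accepts any subquery of people as its input. The datasets and the names `women` and `with-ages` are hypothetical.

```clojure
;; Hypothetical datasets.
(def genders [["alice" :f] ["bob" :m] ["carol" :f]])
(def ages {"alice" 28 "bob" 33 "carol" 26})

;; A subquery: the people whose gender is :f.
(defn women []
  (for [[person gender] genders
        :when (= gender :f)]
    person))

;; A query that takes any subquery of people and joins it against ages.
(defn with-ages [people]
  (for [person people]
    [person (ages person)]))

(with-ages (women))  ;; => (["alice" 28] ["carol" 26])
(with-ages ["bob"])  ;; => (["bob" 33])
```

`with-ages` neither knows nor cares where its input came from, which is what lets queries be composed freely.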

You can compose operations together to create new operations. Here's how to define the "average" aggregator in terms of count, sum, and division:

"average" can then be used like any other operation, as in the following query which determines the average age of people in the dataset:
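The two steps above, defining "average" from smaller operations and then using it in a query, can be sketched in plain Clojure as follows. This is an in-memory illustration rather than a Cascalog aggregator, and the `ages` dataset is hypothetical.

```clojure
;; Build "average" out of sum, count, and division.
(defn sum [xs]
  (reduce + xs))

(defn average [xs]
  ;; Note: throws ArithmeticException on an empty collection.
  (/ (sum xs) (count xs)))

;; Hypothetical dataset: [person age] pairs.
(def ages [["alice" 28] ["bob" 33] ["carol" 26]])

;; Use it like any other operation: the average age of everyone in the dataset.
(average (map second ages))  ;; => 29
```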

Mold Clojure to your problem

Linq is an integrated query system for C#. It exists to solve the integration problems I discussed when using a query language from within a general purpose language.

There's one huge difference between Cascalog and Linq: Linq is part of C#. You can't define Linq in terms of regular C#; it needed to be added by the language designers. Cascalog, on the other hand, needs no special support from Clojure. Cascalog is a regular Clojure library.

This means that you can define DSLs in Clojure yourself, without waiting for the language designers to add support, as C# programmers had to for Linq. I've created lots of mini-languages, each optimized for one of my problem domains.

Clojure has a relentless focus on minimizing accidental complexity

The ability to make embedded languages from within Clojure is just one example of Clojure's relentless focus on minimizing accidental complexity.

Clojure has a very opinionated approach to mutable state, another big source of accidental complexity. Looking back on my Java programming days, I'm amazed at how much of my programming time involved controlling when and how the states of objects were modified.

Clojure prefers immutable data and forces the programmer to be explicit about manipulating state. Clojure makes explicit the difference between a value (an immutable piece of data) and an identity (an entity whose value changes over time).
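A short sketch of the distinction, using a map as the value and an atom as the identity. The names are made up for illustration.

```clojure
;; A value: an immutable map. "Changing" it produces a new value.
(def alice-at-28 {:name "alice" :age 28})
(def alice-at-29 (assoc alice-at-28 :age 29))

alice-at-28  ;; => {:name "alice", :age 28} (unchanged)
alice-at-29  ;; => {:name "alice", :age 29}

;; An identity: an atom that refers to different values over time.
(def alice (atom alice-at-28))

;; State changes are explicit: swap! moves the identity to a new value.
(swap! alice update-in [:age] inc)

@alice       ;; => {:name "alice", :age 29}
alice-at-28  ;; => {:name "alice", :age 28} (the old value still exists)
```

Anyone holding `alice-at-28` can rely on it never changing, so only code that touches the atom has to think about time.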

Concurrency can also be a huge source of accidental complexity. Locks and semaphores are not the right abstraction for a large number of concurrency problems. Clojure has a number of concurrency primitives baked in such as software transactional memory, futures, and promises. These primitives are higher level than locks and more appropriate for many problems (although it's worth saying there are some problems where locks are appropriate). Clojure's concurrency features are fully integrated with how it handles mutable state.
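Here is a brief sketch of the three primitives mentioned above, using only `clojure.core`. The account names and the `transfer!` function are hypothetical.

```clojure
;; Software transactional memory: refs are only modified inside a transaction.
(def checking (ref 100))
(def savings  (ref 0))

(defn transfer! [from to amount]
  ;; Both alters commit together or not at all; no explicit locks.
  (dosync
    (alter from - amount)
    (alter to + amount)))

(transfer! checking savings 30)
[@checking @savings]  ;; => [70 30]

;; Future: run a computation on another thread; deref blocks until it's done.
(def total (future (reduce + (range 1000))))
@total                ;; => 499500

;; Promise: a placeholder that another thread delivers a value to later.
(def answer (promise))
(future (deliver answer :done))
@answer               ;; => :done
```

In a standalone script, call `(shutdown-agents)` before exiting so the thread pool used by futures doesn't delay JVM shutdown.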

You should watch this excellent talk by Rich Hickey where he talks in depth about Clojure's philosophy on state, value, and identity.

Conclusion

A lot of people talk about how wonderfully expressive Clojure is. However, expressiveness is not the goal of Clojure. Clojure aims to minimize accidental complexity, and its expressiveness is a means to that end.

