Interview with "Programmer Magazine"
I was recently interviewed for "Programmer Magazine", a Chinese magazine. The interview was published in Chinese, but a lot of people told me they'd like to see the English version of the interview. Due to the Google translation being, ahem, a little iffy, I decided to just publish the original English version on my blog. Hope you enjoy!
What drew you to programming and what was the first interesting program you wrote?
I started programming when I was 10 years old on my TI-82 graphing calculator. Initially I started programming because I wanted to make games on my calculator – and also because I was bored in math class :D. The first interesting game I made on my calculator was an archery game where you'd shoot arrows at moving targets. You'd get points for hitting more targets or completing all the targets faster. A couple years later I graduated to programming the TI-89 which was a huge upgrade in power. I remember how the TI-82 only let you have 26 variables (for the characters 'a' through 'z') and thinking how incredible it was that the TI-89 let you have as many variables as you want.
What do you do to improve your skills as a programmer?
I get better by doing a lot of programming and trying new things. One of the best ways to become a better programmer is to learn new programming languages. By "learn" I mean more than just learning the syntax of the language; I mean understanding the language's idioms and writing something substantial in it. For me, learning Clojure made me a much better programmer in all languages.
Could you talk about your experience before joining BackType?
I got my bachelor's and master's in Computer Science at Stanford University with a focus on software theory. So I did a lot of algorithms and proofs and so on. Probably the best thing I did at Stanford was choose classes not so much by the subject material but by the professor. When I found a professor who was a great teacher I would take as many classes with that professor as possible. For example, one of the greatest teachers I've ever had is Professor Tim Roughgarden. I took a bunch of "algorithmic game theory" classes with him – algorithmic game theory is basically the intersection of economics and computer science. I took the classes not so much for the material but to improve my problem solving skills. Professor Roughgarden had an incredibly coherent and disciplined way of breaking down extremely difficult problems and making them easy to understand. Learning those skills has made me a much better problem solver in all scenarios, as well as being a much better communicator of difficult concepts.
You once said that leaving Twitter was a tough decision. Could you tell us why you decided to start your own company? What do you want to achieve?
I had a pretty great situation at Twitter, having my own team and working full-time on a project I started. But when I thought of the idea for my company, it was so compelling I just couldn't stop thinking about it. So I felt that if I didn't start this company, I would regret it for the rest of my life.
What are the main lessons you learned in the last few years of your professional career?
Feedback is everything. Most of the time you're wrong, and feedback is the only way to realize your mistakes and help you become less wrong. This applies to everything. In product development, get your product out there as soon as possible so you can get feedback and see what works and what doesn't. In many cases you don't even need to build anything – a link to a "feature" that actually goes to a survey page can give you the feedback you need to test your idea.
In managing a team, it's really important to have feedback on all the processes you do. At BackType we'd have a once-a-month meeting to discuss our processes and whether they were effective or too restrictive. This caused us to introduce standups, and then remove standups when we didn't feel they were that useful to us. We used that process to go from monthly meetings to biweekly meetings to weekly meetings, then back to biweekly meetings.
In your blog you said "I'm always happy to give advice or connect with people doing interesting work". What interesting projects have you seen, and what advice did you give?
The founder of Insight Data Science approached me when he was starting it, and I think it's an absolutely terrific program. They provide a 6-week bootcamp to help math/science/physics PhDs learn programming skills so they can start a career in data science. Basically the program recognizes that there is a surplus of very smart people who don't necessarily have the most interesting job prospects, while there is a booming tech industry with a huge shortage of data scientists. So they bridge that gap. I was able to help them out with a couple of things and I think their execution has been very impressive.
What prompted you to write the book Big Data, and what problems do you want to solve? Writing a book is a long process. What have you learned along the way?
I had developed a lot of theory and best practices about architecting big data systems that no one else was talking about. People were focused on very specific use cases, whereas I had developed rigorous, holistic approaches. A lot of the things I talk about, like being resilient to human error (something I consider to be absolutely non-negotiable) are ignored by the vast majority of industry. I think the industry will be much better off by building these systems more rigorously and less haphazardly, and I felt that this book was the right way to effect that change.
I knew that writing a book would be a lot of work, but it turned out to be significantly more work than I expected. I think my book is especially challenging because it's such a huge subject. At one point I had half the book written, but I realized I was taking the wrong approach to communicating the material so I scrapped everything and started over. It was definitely worth it though, because based on the feedback I get from readers, they love the material and really get what I'm trying to communicate.
My editors have been absolutely invaluable in the writing process and have helped me become a much better writer. I've learned that the way I was taught in school to write is actually the complete opposite of effective communication. I was taught to make your general "thesis" statement up front, and then drill down into that general statement with supporting points and eventually specific details. It turns out that this forces the reader to do a lot of work to synthesize everything you're saying. They won't grasp the thesis up front – because they haven't read the supporting points yet. So after drilling down the reader now has to drill back "up" to connect everything. It's a convoluted way to achieve an understanding of something. A much better way to communicate is to tell a story – start with a situation the reader already understands, and then connect step by step to the ultimate general statement you want your reader to understand. Specific to general is always better than general to specific.
You have contributed to a lot of open source projects. What makes you believe in open source?
Open source benefits so many people in so many ways. When you're a startup, you're highly resource-constrained, so being able to take advantage of the work other people have done is a godsend. Lowering the cost of doing startups, of course, is highly beneficial to society. When you benefit that much from open source, you do feel obligated to give back as well. On top of that, when you open source software as a company, you benefit from other people trying it out, finding issues, and improving your software "for free".
On a personal level, open source has given me an opportunity to interact with an entire world of developers, rather than just those in whatever company I happened to be at. This has been hugely beneficial to my career, allowing me to get to know tons of awesome people and travel the world to speak at conferences.
Which person has influenced you the most?
Philosophically I'd have to say the most influential person to me is Carl Sagan. I've read most of his books and find them hugely inspirational. I think he was one of the greatest communicators of all time, and what impresses me most about him is his extreme empathy towards his audience. For example, he has quite a bit of writing about science vs. religion – but as a scientist he is not hostile towards religion or anything like that. He understands why people are religious and the value they get out of it. So when he communicates the value of science and skepticism to religious people he starts with religion = valuable as a starting point. That degree of empathy is really rare, and it's something that I'm continuously trying to improve at. He taught me that empathy is the basis of good communication.
What were the key points John McCarthy told you about his life and perspective? How have his words affected you, and what is your own perspective?
I talked with John McCarthy for two hours when I was a sophomore in college. The most striking thing he told me was when I asked about the history of Lisp. He told me he needed a better programming language for doing AI research, so he invented Lisp for that purpose. He really didn't seem to care that much about programming languages – his real passion was AI. It struck me as exactly like how Isaac Newton invented calculus because he needed it for his physics work. The pebbles of giants really are big boulders.
When designing a software system, what process do you use? (The first step, the second step...)
I think designing a software system is entirely about learning what to build as you go. I use a technique which I call "suffering-oriented programming" in order to maximize learning and minimize wasted work. I detailed this approach on my blog. The general idea is to avoid making "general" or "extensible" solutions until you have a very deep understanding of the problem domain. Instead you should hack things out very directly to get a working prototype as fast as possible. Then you iterate and evolve and learn more about the problem domain. Once you have a good understanding of the intricacies of the problem domain, then you can redesign your solution to make it more general, extensible, etc. Finally, at the end, you wrap things up by tightening up the code and making performance optimizations. The sequence is "First make it possible. Then make it beautiful. Then make it fast."
Do you have any principles when you program?
I believe strongly in immutability and referentially transparent functions as ways to vastly simplify software. Mutability creates a web of dependencies in your code – things that can change other things which then change other things – which gets hard to wrap your head around. Code is all about being able to understand what's going on, so anything you do to make that easier is a good thing. Immutability is one such technique to reduce what you need to understand about a particular piece of code to grasp it. Additionally, referentially transparent functions only depend on their arguments (and not on any other state), so they are also easier to understand.
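As a rough illustration of these two ideas, here is a minimal Python sketch (Python is used only for illustration; it is not taken from my own code). The `Account` type and the `deposit` function are hypothetical names invented for the example. The value cannot be changed after it is created, and the function depends only on its arguments, so calling it twice with the same inputs always gives the same result.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Account:
    """Hypothetical immutable value: fields cannot be reassigned after creation."""
    owner: str
    balance: int


def deposit(account: Account, amount: int) -> Account:
    """Referentially transparent: the result depends only on the arguments.

    Instead of mutating the account, return a new value with the updated balance.
    """
    return replace(account, balance=account.balance + amount)


before = Account(owner="alice", balance=100)
after = deposit(before, 50)

# The original value is untouched, so nothing else holding `before` is affected.
try:
    before.balance = 0  # attempted in-place mutation
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

To understand `deposit` you only need to read its body; there is no hidden state elsewhere in the program that can change what it returns.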
Another important principle I live by is "my code is wrong". I think it's pretty clear that we don't know how to make perfect software – all the code I've ever used or written has had bugs in it. So I assume my code is wrong and design it to work anyway (with higher probability, at least). I've detailed techniques to accomplish this on my blog and in my conference talks this year.
Compared with your early years, what is the biggest change in how you program today?
Since I started off programming graphing calculators, I'd say the biggest change is using full-fledged keyboards to program instead of those tiny keypads :)
Storm was developed in a very short time, by few developers, under a limited budget and urgent requirements. What is your secret to being so efficient?
Storm is the result of following that "suffering-oriented programming" methodology. We didn't jump into Storm out of the blue – we had been doing realtime computation at BackType for a long time by stringing workers together manually with queues. So we had a really solid understanding of our needs for realtime processing. Storm at its core is just a simple set of abstractions with a clever algorithm behind the scenes to guarantee data processing. When I thought long and hard about the realtime problems we were dealing with at the time, the design of Storm was obvious. Additionally, I had a ton of experience with Hadoop and knew some of the mistakes made in the design of that system, so I applied those lessons to Storm to make it more robust.
You have interviewed a lot of programmers. What do you think the best programmers have in common?
The best programmers are obsessed with improving as programmers. They love exploring new programming languages and new ideas. Another key trait of great programmers is a "getting stuff done" mentality. It's far more important to get something working than to make the perfect design. Plus a great programmer recognizes that you can't make a perfect design without first having something working that you can learn from.
Are there any myths (things laymen believe to be right but experts do not) and traps in data systems and Big Data?
Probably the biggest misconception I see is people placing the relational database, and associated concepts like CRUD, on a pedestal. People treat the RDBMS as if it's the ultimate in database technology, and everyone seems to be trying to recreate the RDBMS to work in Big Data land. But this ignores massive problems that have always existed with the RDBMS: they're based on mutability so are extremely susceptible to corruption whenever there's a bug or a human error, and they force you into a horrible situation of needing to either normalize your schema and take performance hits, or denormalize your schema and create a maintenance nightmare (among other problems). When you actually look at data systems from first principles, as I do in my book, you see that there are different ways of architecting data systems that have none of these complexities.
What problem did you want to solve at BackType that led you to start designing Storm?
There were two problems. The first was how to keep our databases containing social media analytics stats up to date in realtime in a reliable way. The second was the "reach problem" – how to compute the "reach" of a URL on Twitter very quickly. The "reach" is the unique count of all the followers of all the people who tweeted a URL. It's very computationally intensive and hard to precompute. Storm turned out to be a simple abstraction which unified these seemingly unrelated use cases.
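To make the definition of "reach" concrete, here is a minimal single-machine sketch in Python. It is not how Storm computes reach; it only illustrates the definition. The function name, the two lookup tables, and the sample data are all hypothetical and assumed to fit in memory, which is exactly what does not hold at Twitter scale.

```python
from typing import Dict, List, Set


def reach(url: str,
          tweeters_by_url: Dict[str, List[str]],
          followers_by_user: Dict[str, Set[str]]) -> int:
    """Count the distinct followers of everyone who tweeted `url`."""
    unique_followers: Set[str] = set()
    for user in tweeters_by_url.get(url, []):
        # A set union removes followers shared between several tweeters.
        unique_followers.update(followers_by_user.get(user, set()))
    return len(unique_followers)


# Hypothetical sample data: two people tweeted the URL and share one follower.
tweeters_by_url = {"http://example.com/post": ["alice", "bob"]}
followers_by_user = {
    "alice": {"carol", "dave"},
    "bob": {"dave", "erin"},
}

example_reach = reach("http://example.com/post", tweeters_by_url, followers_by_user)
```

Here `dave` follows both tweeters but is counted once, so the reach is 3 rather than 4. The cost comes from the lookups and the deduplication: a popular URL can have many tweeters, each with a large follower list.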
What reason or experience made you sure that you would successfully build Storm?
The key was that we had tons of experience with realtime computation, so we knew the problem domain very well. There was really no question in my mind that Storm would be successful because I had already learned most of the little gotchas.
Why did you choose Clojure as the development language for Storm? Could you talk about your long practical experience with the language (such as its advantages and disadvantages)? Which features would not have appeared in Storm if you had not used Clojure?
Clojure is the best language I've ever used, by far. I use it because it makes me vastly more productive by allowing me to easily use techniques like immutability and functional programming. Because it is Lisp-based, it is dynamic enough that I can always mold Clojure as necessary to formulate the best possible abstractions. Storm would not be any different if I hadn't used Clojure; it just would have been far more painful to build.
From your blog, I saw that you advocate writing a lot. Could you share what you do to improve your writing skills?
The only way to improve at writing is to write a lot. When other people read my writing and give me feedback, like by commenting on my blog, I carefully think about where that comment came from. If they misunderstood something then that means I'm not communicating correctly – either I'm not clear or I'm not properly anticipating reader objections (whether or not those objections are fallacious is irrelevant). By understanding why my message doesn't get through, I'm able to do a better job the next time.
I also read a lot and try to learn from great writers. As I've mentioned, Carl Sagan is one of my favorite writers and I've learned tons from reading him – and I continue to learn tons from him every time I read his work.
You started using Emacs in recent years. Could you talk about the programming tools you choose and how they affect you?
I started using Emacs because I found it to be the best environment for programming Clojure (due to its Lisp background). I've been really impressed with how powerful a tool it is and how much it can be customized to my needs. On top of that, since it was originally written so long ago, it has an incredibly small resource footprint. That's something I really enjoy because modern IDEs tend to be such resource hogs.
Other than that, I think my setup is pretty simplistic. I use a live REPL in my Emacs for exploratory development and interactive testing. I also have tons of text files on my computer with design notes and ideas. For my todo list I literally just use a text file.