Sunday, August 31, 2014

Sudoku and research

I got addicted to Sudoku... again. When I have a chance to rest, I enjoy the challenge of solving Sudoku puzzles (*cough* at the expert level) on my iPhone. I think this practice gives my mind sharpness and clarity (my wife would roll her eyes so hard if she heard me say this :-).

I was recently thinking of how the process of solving a Sudoku puzzle resembles doing research.

1) Sequencing is important. In Sudoku, you take things step by step. You fill in the obvious cells first. Having filled these, you now have more constraints/clues with which you can fill in other cells. You have to solve Sudoku step by step, moving from what is obvious now to what becomes obvious once that step is done.

This is also the case in research. You can't rush things; you should start with the simple steps. First you have to attack/complete what you can do currently, so that more things become visible for you to see. You have to climb the stairs step by step to see more and do more.

2) Writing is important. You take notes on the Sudoku cells to give you clues, e.g., 4 can go into this cell or this cell. These clues eventually lead to constraints and to solutions. Without taking notes, you wouldn't be able to make any progress on hard Sudoku puzzles.

You are all computer scientists.
You know what FINITE AUTOMATA can do.
You know what TURING MACHINES can do.
For example, Finite Automata can add but not multiply.
Turing Machines can compute any computable function.
Turing machines are incredibly more powerful than Finite Automata.
Yet the only difference between a FA and a TM is that
the TM, unlike the FA, has paper and pencil.
Think about it.
It tells you something about the power of writing.
Without writing, you are reduced to a finite automaton.
With writing you have the extraordinary power of a Turing machine.
(From Manuel Blum's advice to graduate students)

Similarly, writing is very important for research. It leads the way for you. You start writing as you start the research work, and before you do the work/experiments. I think I said this many times before, so I will leave this at that. (How I write, How to write your research paper, My advice to graduate students)

3) Perspective is important. In Sudoku, when you are stuck, you change your perspective and look for alternatives, because there is always another, easier way to look at the situation and get unstuck.
"A change in perspective is worth 80 IQ points." -- Alan Kay

(Again from Manuel Blum's advice to graduate students)
CLAUDE SHANNON once told me that as a kid, he remembered being stuck on a jigsaw puzzle.
His brother, who was passing by, said to him:
"You know: I could tell you something."
That's all his brother said.
Yet that was enough hint to help Claude solve the puzzle.
The great thing about this hint... is that you can always give it to yourself !!!
I advise you, when you're stuck on a hard problem,
to imagine a little birdie or an older version of yourself whispering
"... I could tell you something..." 
I once asked UMESH VAZIRANI how he was able,
as an undergraduate at MIT,
to take 6 courses each and every semester.
He said that he knew he didn't have the time to work out his answers the hard way.
He had to find a shortcut.
You see, Umesh understood that problems often have short clever solutions.

In research, ... yup, you need to learn to change your perspective and try different points of view.

4) Finally, perseverance is important. In Sudoku, you learn patience and perseverance as you try different things to make some progress. As you put in more time practicing Sudoku puzzles, you start noticing patterns, and you learn to solve those cases faster. You also develop intuition, which makes you better at sequencing. In research, patience, perseverance, and practice are also essential.
Whatever you do, you got to like doing it....
You got to like it so much that you're willing to think about it, work on it, long after everyone else has moved on.
(Again from Manuel Blum's advice to graduate students)

Heeding my own warning on reverse scooping, I googled for "Sudoku and research" and found this nice post, which has made similar connections.
After doing a few [Sudoku puzzles], it struck me that these puzzles are a good analogy for the way science research is done. Thomas Kuhn in his classic book The Structure of Scientific Revolutions points out that normal scientific research within a paradigm is largely a puzzle solving exercise in which there is an assurance that a solution exists to the problem and that it is only the ingenuity of the scientist that stands between her and a solution. The sudoku problem is like that. We know that a solution of a particular form exists and it is this belief that makes people persevere until they arrive at a solution.

Tuesday, August 12, 2014

Using TLA+ for teaching distributed systems

I am teaching CSE 4/586 Distributed Systems again this Fall (Fall 2014). This is the course I have the most fun teaching. (I would like to think my students also feel that way :-) I teach the course with an emphasis on reasoning about the correctness of distributed algorithms. Here are the topics I cover, in sequence:

  1. Introduction, Syntax and semantics for distributed programs, predicate calculus
  2. Safety and progress properties
  3. Proof of program properties
  4. Time: logical clocks, State: distributed snapshots
  5. Mutual exclusion, Dining philosophers
  6. Consensus, Paxos
  7. Fault-tolerance, replication, rollback recovery, self-stabilization
  8. Programming support for distributed systems
  9. Data center computing and cloud computing 
  10. CAP theorem and NOSQL systems
  11. Distributed/WAN storage systems

I put emphasis on reasoning about distributed algorithms because concurrency is very tricky; it truly humbles the human brain. Add more than 3 actions to a distributed program and your intuitions will fail: you won't be able to hand-wave and apply operational reasoning to the program. You may think you could, but you would be very wrong (I know from first-hand experience).

I use invariant-based reasoning about program properties exclusively for the first 4 weeks. But this becomes less applicable when we move into more involved protocols in weeks 5 and beyond. This is where I give up being rigorous and tell the class: "We could push things down to the most rigorous invariant-based reasoning and predicate calculus level, but we don't. Instead we give arguments in English, with an appreciation of how these arguments correspond to the proof rules in previous chapters." Yes, this is not very satisfying, but I didn't have much choice.
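To make this concrete, here is the kind of invariant we prove in those first weeks, written in TLA+-style notation (the names Procs and inCS are just for illustration, not from my course notes). For a mutual exclusion algorithm, the key safety invariant says that no two distinct processes are in the critical section at the same time:

```
MutexInv == \A p, q \in Procs : (p # q) => ~(inCS[p] /\ inCS[q])
```

The invariant-based proof obligation is then the usual one: show that the initial states satisfy MutexInv, and that every action of the program preserves it.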


So for these reasons, the AWS TLA+ article caught my attention recently. The article talked about how AWS successfully used invariant-based reasoning and formal methods (in particular TLA+) for building robust distributed systems. TLA+ is a tool for specifying distributed algorithms/protocols and model checking them. AWS used TLA+ in many key projects: S3, DynamoDB, EBS, and a distributed lock manager. Here is the technical report by AWS. It is a very good read.

TLA+ is Leslie Lamport's brainchild. Of course you know Lamport if you work on distributed systems. Lamport received the Turing Award in 2013; he is famous for logical clocks, Paxos, and several other influential results in distributed systems. As a side project, he wrote a macro package on top of Knuth's TeX, called LaTeX ("La" for Lamport?), which is still the typesetting tool for almost all math/CS academic papers. Lamport has always been a firm proponent of invariant-based reasoning for distributed algorithms, and it seems he has been dedicating most of his effort to proselytizing TLA+ in recent years.

There are other successful model checkers (Spin, which uses the Promela language, and SMV), but TLA+ is more focused on supporting reasoning about distributed algorithms. In addition, the PlusCal language (in the TLA+ toolkit) provides a high-level pseudocode-like language for writing distributed algorithms easily.
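To give a flavor of PlusCal (this is my own toy sketch, not an example from the toolkit; the module name and labels are mine): two processes each increment a shared counter, but the read and the write are separate atomic steps.

```
---------------- MODULE IncExample ----------------
EXTENDS Naturals
(* --algorithm IncExample
variable x = 0;
process Proc \in {1, 2}
variable t = 0;
begin
  Read:  t := x;       \* read the shared counter
  Write: x := t + 1;   \* write it back in a separate atomic step
end process
end algorithm *)
===================================================
```

Asking TLC to check that x = 2 holds at termination exposes the lost-update interleaving (both processes read 0, then both write 1) -- exactly the kind of subtle bug that operational hand-waving misses.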

How I went about learning TLA

This was a straightforward and easy process. This is the main page for TLA+, from which the other pages can be reached. To download the toolkit, I first went to this page, which forwards to this download page.

Then I downloaded the Hyperbook and started following it. The chapters were all straightforward for me, because the material is very similar to what I teach in my 4/586 class for invariant-based reasoning about distributed algorithms. The Hyperbook has a lot of examples and is the best place to start learning TLA+.

For the PlusCal language reference I downloaded this.

After I got the hang of it, I decided to get my hands dirty with my own toy programs. I wrote TLA+ specifications for some simple coffee bean problems. Then, using PlusCal, I wrote specifications for Dijkstra's stabilizing token ring algorithm: first without using the process abstraction, then with the process abstraction after I finished Chapter 7 of the Hyperbook. Finally I wrote specifications for Dijkstra's 3-state and 4-state token ring algorithms, which progressed very smoothly. Next, I will use it on Paxos (here is a TLA+ specification of epaxos) and on my own work.
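For reference, here is a rough PlusCal sketch of Dijkstra's K-state stabilizing token ring, in the spirit of (but not identical to) the specification I wrote; the module name and labels are mine, and K is assumed to be at least N:

```
---------------- MODULE TokenRing ----------------
EXTENDS Naturals
CONSTANTS N, K    \* N machines, counters range over 0..K-1 (assume K >= N)
(* --algorithm TokenRing
variables x \in [0..N-1 -> 0..K-1];   \* ARBITRARY initial state: stabilization!
define
  HasToken(i) == IF i = 0 THEN x[0] = x[N-1] ELSE x[i] # x[i-1]
end define;
process node \in 0..N-1
begin
  Move:
    while TRUE do
      if self = 0 then
        await x[0] = x[N-1];          \* machine 0 is privileged
        x[0] := (x[0] + 1) % K;
      else
        await x[self] # x[self - 1];  \* machine i > 0 is privileged
        x[self] := x[self - 1];       \* copy the neighbor's value
      end if;
    end while;
end process
end algorithm *)
==================================================
```

Because the initial state is arbitrary, TLC can check the stabilization property directly: from every state, the system converges to (and stays in) states where exactly one process has the token.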


The guarded-command language I use for teaching 4/586 translates very easily to PlusCal, so TLA+ is a good fit for my course. I will start using it in my 4/586 class this coming semester. I think the students will enjoy having hands-on experience with reasoning about non-toy distributed protocols.
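To illustrate how direct the translation is (a hypothetical example, not from my course notes): a guarded command's guard becomes an await, and the assignment carries over unchanged.

```
\* Guarded command in the 4/586 notation:   x < y  -->  x := x + 1
\* The same action as a PlusCal step:
Inc: while TRUE do
       await x < y;    \* the guard becomes an await
       x := x + 1;     \* the assignment is unchanged
     end while;
```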


My experience with using TLA+ in distributed systems class