Friday, February 27, 2015

Salt: Combining ACID and BASE in a Distributed Database

This paper appeared in OSDI'14. The authors are Chao Xie, Chunzhi Su, Manos Kapritsos, Yang Wang, Navid Yaghmazadeh, Lorenzo Alvisi, and Prince Mahajan, all from The University of Texas at Austin. Here you can watch a video of the OSDI presentation of the paper, and also find the presentation slides and the paper available as open access.  USENIX knows how to do conferences right.

Dropping ACID

The ACID (Atomicity, Consistency, Isolation, Durability) approach provides ease of programming through its simple transaction abstraction, but loses on performance. The BASE (Basically Available, Soft state, Eventually consistent) approach, popularized by NoSQL systems, provides good performance (low latency, high throughput, and scalability), but loses on ease of programming due to the increased complexity of concurrent execution in distributed systems.

This paper introduces Salt, which aims to find a best-of-both-worlds middle ground: to provide high performance like BASE with modest programming effort like ACID.

Less is more

The Salt approach is motivated by the following observation (which is an instance of the Pareto principle). I am quoting from the paper:
"When an application outgrows the performance of an ACID implementation, it is often because of the needs of only a handful of transactions: most transactions never test the limits of what ACID can offer. Numerous applications [2, 4, 5, 10, 11] demonstrate this familiar lopsided pattern: few transactions are performance-critical, while many others are either lightweight or infrequent." 
Of course, a better quantification than saying "numerous" or "many" applications would make the argument for the Salt approach stronger. However, such an extensive study could take several months, and it would be unfair to ask for one in a conference paper submission.

It is tempting to increase the concurrency of those critical ACID transactions by splitting them into smaller mini ACID transactions. However, doing so may not be safe, as you would lose Atomicity (which frees you from worrying about intermediate states during failures) and Isolation (which regulates which states can be accessed when transactions execute concurrently). The paper calls this a stark choice and links to a YouTube clip from Butch Cassidy to further drive this point home (I am not kidding).

The paper's proposal is this: instead of worrying about all possible Atomicity & Isolation complications among all ACID transactions, Salt categorizes transactions into ACID transactions and a small number of BASE transactions, and lets you worry only about Atomicity & Isolation complications among the BASE transactions.

Of course there is an unmentioned complication here. You also have to ensure that these BASE transactions leave the database in a consistent state. This is not an easy task either, and it opens up Salt's ease-of-programming claim to debate. On the other hand, you would have to worry about this for all operations if you implemented a full-BASE solution.

The Salt approach

Salt's BASE transaction abstraction actually takes its inspiration from ACID nested transactions. The paper says:
"Nested transactions is an abstraction originally introduced to offer, for long-running transactions, atomicity at a finer granularity than isolation. Our purpose in introducing BASE transactions is similar in spirit to that of traditional nested transactions: both abstractions aim at gently loosening the coupling between atomicity and isolation. The issue that BASE transactions address, however, is the flip side of the one tackled by nested transactions: this time, the challenge is to provide isolation at a finer granularity, without either drastically escalating the complexity of reasoning about the application, or shattering atomicity."

By providing isolation at a finer granularity, Salt allows BASE transactions to achieve high concurrency by observing each other's internal states, without affecting the isolation guarantees of ACID transactions. A BASE transaction is made up of multiple mini-transactions, called alkaline subtransactions. (Yes, the paper makes several Chemistry puns.) The committed state of an alkaline subtransaction is observable by other BASE or alkaline subtransactions. On the other hand, the committed state of an alkaline subtransaction is not observable by other ACID transactions until the parent BASE transaction commits.  As such, Salt guarantees that, when BASE and ACID transactions execute concurrently, ACID transactions retain, with respect to all other transactions (whether BASE, alkaline, or ACID), the same isolation guarantees they used to enjoy in a purely ACID environment.
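
To make the visibility rule concrete, here is a minimal sketch in Python (my own restatement of the rule above, not code from the paper or from Salt's implementation):

from dataclasses import dataclass

@dataclass
class AlkalineSubtxn:
    parent_base_committed: bool   # has the enclosing BASE transaction committed?

def can_observe(reader_kind: str, writer: AlkalineSubtxn) -> bool:
    """Can a transaction of kind 'acid', 'base', or 'alkaline' observe the
    committed state of alkaline subtransaction `writer`?"""
    if writer.parent_base_committed:
        return True   # once the parent BASE transaction commits, everyone sees it
    # Until then, only BASE and alkaline transactions may observe it; ACID
    # transactions keep the isolation guarantees of a purely ACID environment.
    return reader_kind in ("base", "alkaline")

print(can_observe("acid", AlkalineSubtxn(parent_base_committed=False)))   # False
print(can_observe("base", AlkalineSubtxn(parent_base_committed=False)))   # True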

The proverbial Bank example

When you talk about ACID versus BASE, the classical example is the bank money transfer. The paper gives a pure ACID, a pure BASE, and a Salt implementation of this example.


[Figures 1 and 2 in the paper show the ACID, BASE, and Salt implementations of the bank transfer.]

I wonder which implementation is closest to the way money transfers are actually handled in real banks. If someone familiar with how banks handle money transfers can weigh in, that may shed light on this debate. My money is on the pure BASE approach.

Something about this bank example is odd, though. The total-balance operation is an unrealistically big/monolithic operation. Why did that have to be a transaction? Couldn't the total-balance operation be done with a consistent/timed snapshot? Why did it have to be a transaction spanning/freezing the entire system?

Now let's check the three implementations more closely. The implementation in Figure 1.a (ACID) uses only one type of lock, ACID locks. Figure 1.b (BASE) also uses only one type of lock, local locks. BASE programming looks like classical distributed systems programming: it tries to do as much as possible with local asynchronous actions. Figure 2 (Salt) uses alkaline, saline, and ACID locks.

Why does Salt need three different types of locks? Because the first alkaline subtransaction is special: it determines whether the entire BASE transaction it belongs to will succeed or not. If that first alkaline subtransaction succeeds, saline locks then make the intermediate states in the rest of the BASE transaction available to other BASE and alkaline transactions. But there is something I am still not clear on. Salt attributes significance to the first alkaline subtransaction, so its design should also be special. What should go in that first subtransaction? Are there guidelines for designing it so that it captures the successful completion of the rest of the BASE transaction?
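
My best guess, going by the paper's bank example, is that the first alkaline subtransaction should contain every step that can legitimately fail (here, the insufficient-funds check and the withdrawal), so that once it commits, the remaining subtransactions can always be completed. Below is a hedged sketch of that reading; the base_transaction/alkaline context-manager API is hypothetical, not Salt's actual interface:

def transfer_salt(db, src, dst, amount):
    # Hypothetical API sketch; `db` is assumed to hand out BASE transactions.
    with db.base_transaction() as base:
        with base.alkaline() as sub1:              # the decisive first subtransaction
            bal = sub1.read(src)
            if bal < amount:
                base.abort()                       # nothing observable has happened yet
                return False
            sub1.write(src, bal - amount)          # withdraw
        # sub1 has committed: saline locks make its state visible to other
        # BASE/alkaline transactions, while ACID transactions still see the
        # pre-transfer balances until the whole BASE transaction commits.
        with base.alkaline() as sub2:
            sub2.write(dst, sub2.read(dst) + amount)   # deposit; cannot fail
    return True                                    # parent BASE transaction commits here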

Evaluation

Like any OSDI/SOSP paper worth its salt (see, I can also make "Salt" puns :-), the paper includes a strong evaluation section. The authors implemented a Salt prototype by modifying MySQL Cluster, and evaluated it using the TPC-C benchmark, which consists of 5 types of transactions: new-order (43.5%), payment (43.5%), stock-level (4.35%), order-status (4.35%), and delivery (4.35%). The results show that by BASE-ifying 2 of these transactions, Salt achieves 80% of the maximum throughput of a full BASE implementation.


Notice the left side of Figure 7, which shows that the ACID version is faster than the Salt version under low load. The paper doesn't discuss this, but if we had both ACID and BASE versions of the critical transactions, it may be possible to hot-swap between the two versions to optimize the throughput of the system depending on the current load.
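
For instance, here is a hedged sketch of that hot-swap idea (my speculation, not something the paper implements; transfer_acid and transfer_salt stand for the two hypothetical implementations of the same performance-critical transaction):

LOAD_THRESHOLD = 0.7   # hypothetical utilization cutoff, tuned from measurements like Figure 7

def run_transfer(db, src, dst, amount, current_load):
    if current_load < LOAD_THRESHOLD:
        return transfer_acid(db, src, dst, amount)   # low load: the ACID version is faster
    return transfer_salt(db, src, dst, amount)       # high load: the Salt version scales better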

Related work

The paper fails to cite several papers that proposed orthogonal approaches to the same problem: improving the performance of ACID transactions while keeping ease of programming.

The Red-Blue Actions paper (OSDI'12) considered the problem of reducing the granularity of ACID transactions by splitting each operation into a generator operation and a shadow operation, and categorizing the shadow operations as red (consistency-critical) or blue (fast and eventually consistent). That paper also uses a bank example.

Then there is work from the Berkeley group on this problem. I had summarized the invariant-based coordination avoidance paper earlier. There is also the Highly Available Transactions paper (VLDB'13), which looks at relaxing ACID isolation levels to improve availability.

Discussion

The paper proposes the Salt approach assuming an ACID implementation of the application is already present: if you already have an ACID system, you can Salt it to fine-tune its performance. Another approach would be to start with a BASE implementation and increase atomicity/isolation for a couple of operations. Maybe for some applications it will be easier to go from BASE to Salt: starting with BASE and then adding alkaline subtransactions and ACID transactions (if necessary).

The Salt approach does require a learning curve and can be tricky. But how do we compare the complexity of the programming effort needed for one approach (say Salt) versus another (say pure BASE) in an objective manner? One can objectively show performance improvements with evaluations, but it is harder to argue about "ease of programming" in an objective/quantitative manner.

There can also be some complications with early committing (after the first alkaline subtransaction) in BASE transactions. This question came up in the OSDI presentation. If a deadlock occurs, the MySQL Cluster approach is to use timeouts and roll the transactions back. In this case, Salt throws exceptions, which the developer needs to handle in order to re-establish consistency and atomicity.

Tuesday, February 17, 2015

Book report: "Good Prose" and "How to write a lot"

Notes from "Good Prose"

This book is written by a journalist turned nonfiction author and his editor.
This is a nice book, and a good read.

"Quiet beginnings: You cannot make the reader love/trust you in the first sentence, but you can lose the reader with a grand proposition in the first sentence."

"To write is to talk to strangers. Prepare the reader, tell everything reader needs to know to read on, but no more."

"For a story to live, it is essential only that there be something important at stake, a problem that confronts the characters or reader. The unfolding of the problem and its resolutions are the real payoff."

"Revelation transforms events into a story."

"Don't mess with chronology unless you have a good reason."

"On the topic of essays:  Essayists tend to argue with themselves. Who am I to write this? Who cares to read this? If I knew my own mind, I would not make essays, I would make decisions. --Montaigne."

A main takeaway from the book is the respect these writers show for their craft. They spend countless hours drafting, revising, and editing with patience. They spend days discussing improvements to a draft. The important thing is to get it right, to do the right thing. At one point, over a dispute about one word with the editor-in-chief of The Atlantic, they consider quitting their dream jobs there on principle.

Notes from "How to write a lot"

This book aims to help academics write a lot. It is a short book with a single, simple message: "schedule 2 hours of writing time daily in your calendar, and sit down to write at those times".

"Writing is a skill, not an innate gift or special talent. Like any advanced skill, writing must be developed through systematic instruction and practice. Learn the rules and do deliberate practice."

"Schedule time for writing and stick to it."

"Any action that is instrumental in completing a writing project counts as writing."

"No internet. The best kind of self control is to avoid situations that require self control."

Some books recommended by this book are "The Writer's Book of Hope" and "Professors as Writers". I hope to check these out later.

Tuesday, February 10, 2015

Paper summary: Perspectives on the CAP theorem

This is a short 2012 paper by Seth Gilbert and Nancy Lynch that appeared in a special issue commemorating the 12th anniversary of the CAP theorem. Gilbert and Lynch get to write in this special issue because they were the first to publish a proof of the CAP conjecture, which Eric Brewer put forward in his PODC 2000 keynote.

In this paper, Gilbert and Lynch aim to situate the CAP theorem in the broader context of a family of results in distributed computing theory that show the impossibility of guaranteeing both safety and liveness in an unreliable distributed system. A quick refresher on specifications is in order. Safety says "nothing bad happens". (You can satisfy safety by doing nothing.) Liveness says "eventually something good happens". Unreliability (aka the failure model) says "such and such faults happen".

The impossibility results surveyed in relation to CAP concern slightly different problems and slightly different fault models. While it is easy to confuse CAP with those results at a superficial glance, on closer inspection we see that the results are all distinct and none subsumes CAP.

The CAP problem 

The CAP theorem does NOT consider the consensus problem; it considers an easier problem: the atomic read/write register (aka atomic storage) problem. Atomic means that the system provides linearizability, a strong type of single-copy consistency that guarantees a read returns the most recent version of the data. The specifications of this problem are as follows. (The first is the safety property, the second the liveness property.)
Consistency: The system provides its clients with a single register (emulated by multiple storage nodes), and each client can read or write from that register.
Availability: Each request for a read or write eventually receives a response.

The FLP (Fischer-Lynch-Paterson) and the attacking generals impossibility results consider the consensus problem. The specifications for consensus are as follows. (The first two are safety properties, the last one a liveness property.)
Agreement: No two processes can commit different decisions.
Validity (Non-triviality): If all initial values are the same, nodes must commit that value.
Termination: Nodes commit eventually.

So here is the difference between consensus and atomic storage. Consensus is supposed to dutifully remember a value that is anchored (stored by a majority of nodes); consensus is loyal to making that value persist as the committed decision. Atomic storage doesn't have that responsibility: the nodes don't need to commit to a decision value, so the system doesn't need to keep track of and remember whether a value is anchored. The atomic storage system as a whole accepts new writes as long as reads don't return results that betray the single-register (i.e., single-copy) abstraction.
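
Here is a minimal sketch of the two interfaces (my own framing, not from the Gilbert-Lynch paper), just to make the contrast concrete:

class AtomicRegister:
    """CAP's problem: emulate a single read/write register over multiple nodes.
    Safety: reads/writes are linearizable.  Liveness: every request returns."""
    def write(self, value):
        ...   # a later write may legitimately overwrite this value
    def read(self):
        ...   # must return the value of the latest write

class Consensus:
    """FLP's (and the attacking generals') problem.
    Safety: agreement and validity.  Liveness: termination."""
    def propose(self, value):
        ...
    def decide(self):
        ...   # once any node decides v, v must be remembered forever;
        # no node may ever decide a different value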

And what is the implication of this difference? The FLP result declares that, even under the reliable channels assumption, consensus is impossible to solve in an asynchronous system with node crash failures. For example, Paxos loses liveness because it cannot converge to a single leader in an asynchronous model. Did the current leader crash? The failure detector cannot be accurate. If the failure detector incorrectly says that the leader (who is supposed to ensure and remember that a value is anchored) has not crashed, liveness is violated since nodes keep waiting on a failed leader. If the failure detector incorrectly says that the leader has crashed, then you have multiple leaders, and liveness is violated because the dueling leaders keep escalating their ballot numbers forever, trying to get the majority to accept their proposals.

On the other hand, since the atomic storage problem doesn't care about remembering whether a value is anchored, it is oblivious to such dueling between leaders or clients, and as such it is solvable, for crashes of up to a minority of the nodes, in the FLP model (i.e., with reliable channels in an asynchronous system). I had blogged about the Attiya, Bar-Noy, Dolev (ABD) algorithm that achieves this feat.
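
Here is a hedged sketch of the quorum logic behind an ABD-style atomic register (an in-memory toy in Python, my own simplification; shown single-writer for brevity, whereas full multi-writer ABD also tags timestamps with writer ids):

from dataclasses import dataclass

@dataclass
class Tagged:
    ts: int            # logical timestamp
    val: object = None

class Replica:
    """In-memory stand-in for a storage node; real replicas are RPC endpoints."""
    def __init__(self):
        self.latest = Tagged(0)
    def get(self):
        return self.latest
    def store(self, tagged):
        if tagged.ts > self.latest.ts:   # keep only the newest version
            self.latest = tagged

class ABDClient:
    def __init__(self, replicas):
        self.replicas = replicas
        self.majority = len(replicas) // 2 + 1

    def _query(self):
        # Simplification: contact a fixed majority. A real implementation sends
        # to all replicas and proceeds once ANY majority responds, which is how
        # a crashed or slow minority is tolerated.
        return [r.get() for r in self.replicas[:self.majority]]

    def _store(self, tagged):
        for r in self.replicas[:self.majority]:
            r.store(tagged)

    def write(self, value):
        high = max(t.ts for t in self._query())   # phase 1: learn the highest timestamp
        self._store(Tagged(high + 1, value))      # phase 2: store at a majority

    def read(self):
        latest = max(self._query(), key=lambda t: t.ts)   # phase 1: pick the latest version
        self._store(latest)   # phase 2 (write-back): later reads can't return older values,
        return latest.val     # which is what makes the register atomic/linearizable

replicas = [Replica() for _ in range(5)]   # five nodes tolerate two crashes
client = ABDClient(replicas)
client.write("hello")
print(client.read())                       # -> hello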

Now that we know the atomic storage problem is solvable with reliable channels and up to a minority of crashes, what can we say about atomic storage in the presence of unreliable channels? That is covered by the CAP theorem's fault model, which we discuss next.

The CAP fault model 

We discussed the specifications of the problems considered by CAP, FLP, and the attacking generals, but we have not yet talked about another important part of the system specification: the unreliability/fault model.

Above I introduced the FLP fault model when discussing the solvability of consensus versus atomic storage. The FLP fault model assumes reliable channels, an asynchronous system, and crash failures. Of course, by assuming reliable channels you don't actually get reliable channels in your deployment; that is just wishful thinking. But since the attacking generals impossibility result proved that consensus is not achievable in the presence of unreliable channels, FLP had to consider reliable channels. Even then we are disappointed: consensus is impossible in the FLP model as well.

CAP does something courageous and considers unreliable channels again (as in the attacking generals fault model) in its fault model. Since CAP is concerned with the atomic storage problem, which is a slightly easier problem than consensus, the attacking generals impossibility result does not subsume the CAP result.

The CAP result says that the atomic storage problem is also impossible to solve under unreliable channels.

Recall that ABD solves the atomic storage problem in the FLP model. If we move to the CAP fault model and allow partitions, we observe that the ABD algorithm blocks (loses availability) for a read or write request that arrives at a node in a minority partition. Just as CAP says, either consistency or availability has to give.
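
To make this concrete with the ABD sketch above (again my own illustration, not the Gilbert-Lynch proof): any two majority quorums intersect, which is what preserves consistency, but a node that can reach only a minority of replicas can never assemble a quorum, so it must either block or give up consistency.

def partition_behavior(reachable_replicas, total_replicas):
    majority = total_replicas // 2 + 1
    if len(reachable_replicas) < majority:
        # Minority side of the partition: the node can either block/refuse
        # (preserving consistency but losing availability) ...
        raise TimeoutError("cannot assemble a majority quorum")
        # ... or answer from local state (preserving availability but possibly
        # returning stale data, i.e., losing linearizability). It cannot do both.
    # Majority side: any two majorities intersect, so the quorum is guaranteed
    # to contain the latest write, and both consistency and availability hold.
    return "run the normal ABD quorum read/write"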

Here is the proof sketch, quoted verbatim from the Gilbert-Lynch paper:
[Proof sketch figure from the Gilbert-Lynch paper.]

Similar to the attacking generals result, the CAP result is oblivious to whether the system is synchronous or asynchronous, and holds in both cases.

What is remaining?

Observe from the CAP proof sketch that the CAP fault model is very rough. When it says unreliable channels, it allows you to assume the worst case (i.e., no message makes it through at all), and prove the impossibility result for that worst case.

What if we quantify and limit the unreliability of the channels to more realistic scenarios? Can we prove more refined versions of CAP? What consistency level can a system provide if the system model allows eventual message arrival? A recent technical report from the University of Texas at Austin, the "Consistency, availability, convergence" paper, looks at that problem. We will discuss that paper next in our distributed systems seminar.

More about CAP tradeoffs

The Gilbert-Lynch paper discusses some of the practical implications of the CAP theorem and says that Consistency versus Availability should not be seen as an absolute and binary tradeoff. Instead you can consider shades of Consistency versus Availability. Also you can make different Consistency versus Availability tradeoffs at the data level, operation level, and subsystem level. These observations are very similar to the suggestions made in Eric Brewer's article in the same special issue: "CAP 12 years later, how the rules have changed".

The Gilbert-Lynch paper also mentions the scalability problems caused by trying to enforce consistency, but leaves that discussion as future work. The PACELC model by Daniel Abadi provides a more detailed treatment of the latency versus consistency tradeoff in the absence of partitions.

Saturday, February 7, 2015

How to present your work

Presentation is a very important skill. Determining how to present and communicate your results in an effective manner is as important as doing the research and getting those results. From the same material you can get a killer job talk or a bummer of a dissertation talk. Presentation skills can make the difference.

Presentation is not a soft skill, despite the common misconception among many technical people. It takes a lot of brains, analysis, and synthesis to produce a good presentation. You have to understand and internalize your content very well in order to present it clearly in the most engaging and compelling way. So if you fail to present your work clearly, that reflects poorly on you and your work. The audience will think "the work doesn't look significant or promising", or even worse, "the presenter doesn't truly understand/internalize his work".

The most essential requirement for a successful presentation is practice. I observe that our graduate students are not good at presenting because they don't get enough practice and rehearsal. As with any skill, you learn presenting best by doing it. So don't waste any opportunity to talk about your work. Talk about your work to your relatives, who won't understand. (This is very useful, because you get to think about how to explain your contributions in the context of the real world out there.) Tell it to your friends not in the same field. Tell it to your lab mates. Give talks in department seminars. You need to learn how to get feedback and gauge your presentation. Eventually, you will get to present your work at conferences.

The "Talk like TED" book 

I recently picked up this book from a library. (Yes, a physical library.) The book was a very easy, lightweight read. The book's outline consists of the 9 tips it offers for giving successful TED-like talks. It argues that good talks need to be emotional, novel, and memorable, and categorizes the 9 tips under those 3 headings.
Emotional:
+ unleash your passion
+ tell a story
+ have a conversation
Novel:
+ teach something new
+ deliver WOW moments
+ use humor
Memorable:
+ keep it short
+ paint a mental picture
+ be authentic

This may be my own failing, but I didn't learn/benefit much from the book. (Shall I say, the book was not very emotional, novel, or memorable?) If you have done some reading about public speaking before, this book does not offer many new insights or much new content. It could have been condensed into a long blog post. For example, read this article by Chris Anderson instead.

Maybe the most interesting takeaway from the book is how much practice the presenters put into their talks. I knew TED speakers rehearsed a lot, but I was still surprised to learn how much. Six months before the talk, the presentation draft is ready. Then it is all rehearsals and tweaking to refine and fine-tune. One month before the talk, the final form of the presentation is ready.

Keep your eyes on the message

I think the best presentation advice I received was from another TED, Ted Herman. After hearing me speak as a graduate student about one of my projects, he told me: "Focus on the message. Give your presentation with the sole goal to teach, and not to impress."

If you worry about impressing the audience in a presentation, then your ego gets in the way and you become self-conscious. You start questioning yourself: "Did I say that right? Was I too slow? Am I perceived as confident enough?" This will make you nervous and self-doubting. If you focus on the message, you will get your point across, even if it takes flapping your arms wildly. And if your content is good, it will impress people after all. As Anderson says, "Presentations rise or fall on the quality of the idea, the narrative, and the passion of the speaker. It's about substance, not speaking style or multimedia pyrotechnics."

Focusing on the message is hard to do using Powerpoint/Keynote. Powerpoint makes it easy to lose focus on the message. By writing slide after slide and getting things stylistically right (which Powerpoint facilitates immensely), you get a false sense of security that you are communicating the content. To avoid this and to force yourself to focus on the message/content, you should prepare the story and outline of the talk first in your mind, before sitting down to prepare the slides. How would you give this talk if you were forced not to use any slides? Thinking this way and purposefully omitting the slides as crutches will help you learn, discover, and hone your message. (More on this below, where I talk about framing the talk.)

To reiterate what I said at the beginning of the post, presentation is not a soft skill and requires a lot of brains. In order to produce a clear and condensed message, you need to learn how to abstract the most important lessons from all the work you did. This requires you to first process and internalize your work really well. You should omit a lot of incidental details and provide a condensed message in a conference presentation or job talk. These talks are there to whet people's appetites and get them to read your papers. This does not mean that the talks should omit technical content. On the contrary, they should include technical content that communicates the essence of your technical contribution in a clear and accessible manner.

Frame your talk well

And from my PhD advisor, Anish Arora, I learned how to frame a talk well. Framing means finding the right place to begin and developing the right story and context to pitch the ideas.

When we met to discuss how to present our work, Anish would declare: "let's try to understand what we did in this work". Wow! He was totally comfortable accepting that we could come to understand our work better, and that the context in which we started and performed the work is not necessarily the best context in which to present it. From Anish, I learned that it is OK to search for a new perspective and a better context to frame and present the work.

In fact, your best presentations are the ones in which you also learn something and gain a new appreciation of your work. If you come away with that new understanding of your work, that is a good indication that you did enough to frame your work and achieved a good understanding and focusing of your message.

Related links

How to package your ideas using the Winston star

Presentation skills are somewhat related to writing skills, so most of the advice here also applies to writing. Similarly, some of the advice on writing also applies to preparing a presentation.
How to write your research paper
How I write

What is the biggest rock?
You can get the style right, and give a content-free and superficially interesting talk. But as Lincoln said: "You can fool all the people some of the time and some of the people all of the time, but you cannot fool all the people all the time."