Velocity 2010: Cassandra Workshop

My second session is a workshop about the NoSQL data store Cassandra.  Bret Piatt from Rackspace gave the talk.  As always, my comments are in italics. Peco and Robert joined me for this one.

It’s not going to cover “why Cassandra/why NoSQL?”  Go to SlideShare if you want ACID-vs-distributed-data-stores handwringing.  It’ll just be about Cassandra.

He has a helpful spreadsheet on Google Docs that assists with various tasks detailed in this presentation.

We’ll start with the theory of distributed system availability and then move on to model validation.

Distributed System Theory

Questions to ask when designing your Cassandra cluster.

  1. What’s your average record size?
  2. What’s your service level objective for read/write latency?
  3. What are your traffic patterns?
  4. How fast do we need to scale?
  5. What % is read vs write?
  6. How much data is “hot”?
  7. How big will the data set be?
  8. What is my tolerance for data loss?
  9. How many 9’s do you need?

Record Size

Don’t get sloppy.  Keep your record sizes down; nothing’s “cheap” at scale.  Something as simple as your encoding scheme (UTF-32 instead of UTF-8 for ASCII-range text) can increase your data size by a factor of 4.
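A quick way to see the blow-up for yourself (a minimal Java sketch; the record contents are made up):

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    public class EncodingSize {
        public static void main(String[] args) {
            String record = "user:12345|Austin, TX|2010-06-22";
            // ASCII-range text is 1 byte per character in UTF-8...
            System.out.println("UTF-8:  "
                    + record.getBytes(StandardCharsets.UTF_8).length + " bytes");
            // ...but a fixed 4 bytes per character in UTF-32.
            System.out.println("UTF-32: "
                    + record.getBytes(Charset.forName("UTF-32")).length + " bytes");
        }
    }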

Response Time

People expect instant response nowadays.  Cassandra’s read and write times are a fraction of, for example, MySQL’s.  A read is 15 ms, compared to about 300 ms for MySQL on a 50 GB database.  And writing is 10x faster than reading!!!

As in the previous session, he recommends you have a model showing what’s required for each part of a Web page (or whatever) load so you know where all the time’s going.
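Something like this, for instance (a toy model of my own; every number in it is hypothetical):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class PageLoadBudget {
        public static void main(String[] args) {
            // Hypothetical per-component latencies for one page load, in ms.
            Map<String, Integer> budget = new LinkedHashMap<>();
            budget.put("load balancer", 2);
            budget.put("app server render", 40);
            budget.put("Cassandra reads (3 x 15 ms)", 45);
            budget.put("memcache hits", 5);
            int total = 0;
            for (Map.Entry<String, Integer> e : budget.entrySet()) {
                System.out.printf("%-28s %4d ms%n", e.getKey(), e.getValue());
                total += e.getValue();
            }
            System.out.printf("%-28s %4d ms%n", "TOTAL", total);
        }
    }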

He notes that the goal is to eliminate the need for a memcache tier by making Cassandra faster.

Traffic Patterns

Flat traffic patterns are kinda bad; you want a lull to run maintenance tasks.

Scaling Speed

Instant provisioning isn’t really instant.  Moving data takes a long time.  On a 1 Gbps network, it takes about 7 minutes to bring up a 50 GB node and about 42 for a 300 GB node.
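The arithmetic behind those numbers, assuming you can actually saturate the link (a sketch; real streaming adds overhead on top of this, which is presumably where the 42 comes from):

    public class TransferTime {
        // Time to stream a node's worth of data over the network.
        static double minutesToMove(double gigabytes, double linkGbps) {
            double gigabits = gigabytes * 8;       // 1 byte = 8 bits
            double seconds = gigabits / linkGbps;  // Gb divided by Gb/s
            return seconds / 60;
        }

        public static void main(String[] args) {
            System.out.printf("50 GB:  ~%.0f minutes%n", minutesToMove(50, 1.0));
            System.out.printf("300 GB: ~%.0f minutes%n", minutesToMove(300, 1.0));
        }
    }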

Read vs Write

Why are people here looking at Cassandra?  Write performance?  Size and sharding?  No clear response from the crowd.

Since writing is so much faster than reading, it drives new behaviors.  Write your data as you want to read it, so you don’t have to filter, sort, etc. at read time.
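For example, a fan-out-on-write timeline (a minimal sketch; an in-memory map stands in for Cassandra rows, it’s not a real client).  You pay at write time so each read is a single pre-filtered, pre-sorted slice:

    import java.util.*;

    public class FanOutOnWrite {
        // Stand-in for Cassandra: row key -> columns sorted by name (a timestamp here).
        static Map<String, TreeMap<Long, String>> rows = new HashMap<>();

        // Write the message once per follower, so each follower's timeline row
        // is already filtered (only people they follow) and sorted (by time).
        static void postMessage(String author, long ts, String text, List<String> followers) {
            for (String follower : followers) {
                rows.computeIfAbsent("timeline:" + follower, k -> new TreeMap<>())
                    .put(ts, author + ": " + text);
            }
        }

        // Reading a timeline is now one row slice -- no joins, filters, or sorts.
        static Collection<String> readTimeline(String user) {
            return rows.getOrDefault("timeline:" + user, new TreeMap<>()).values();
        }

        public static void main(String[] args) {
            postMessage("alice", 1, "hello", Arrays.asList("bob", "carol"));
            postMessage("dave", 2, "world", Arrays.asList("bob"));
            System.out.println(readTimeline("bob")); // [alice: hello, dave: world]
        }
    }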

Hot Data

Hot data!  Which nodes are being hit the most?  You need to know, because then you can do something about it – rebalance, cache, that sort of thing.

Data Loss

Before using Cassandra, you probably want an 8-node cluster, so 200 GB minimum.  And you want to double when it fills up!  If you’re doing smaller stuff, use a relational database; it’s easier.  You can run Cassandra on one node, which is great for devs, but you need 8 or more to minimize the impact of adding nodes, node failures, and such.

And nodes will fail from bit rot.  You need backups too; don’t rely on redundancy.  Hard drives still fail A LOT, and unless you’re writing at W=3 (three replicas acknowledging each write) you’re not safe.
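For context on the W notation (my summary, not from the talk): W is how many replicas must acknowledge a write before it succeeds, so it’s both a durability knob (how many disks hold the data when the client gets an OK) and half of the classic consistency rule – a read consulting R replicas overlaps a write acknowledged by W replicas whenever R + W > N, the replication factor.

    public class QuorumMath {
        public static void main(String[] args) {
            int n = 3; // replication factor
            for (int w = 1; w <= n; w++) {
                for (int r = 1; r <= n; r++) {
                    // Overlapping read/write replica sets => strong consistency.
                    boolean consistent = r + w > n;
                    System.out.printf("N=%d W=%d R=%d -> %s%n", n, w, r,
                            consistent ? "consistent" : "may read stale data");
                }
            }
        }
    }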

Uptime

Loads of 9’s, like anything else, require other techniques beyond what’s integral to Cassandra.

He mentions a cool sounding site called Availability Digest, I’ll have to check it out.

Model Validation

Now, you’re in production.  You have to overprovision until you get some real-world data to work with; you’re not going to estimate right.  Use cloud-based load testing and the like, too.  Spending a little money on that will save you a lot of money later.  Load test throughout the dev cycle, not just at the end.

P.S.  Again, have backups.  Redundancy doesn’t protect against accidental “delete it all!” commands.

There aren’t a lot of tools yet, but it’s Java.  Use jconsole (included in the JDK) for monitoring, troubleshooting, config validation, etc.  It connects to the JVM remotely and displays exposed JMX metrics.  We’ve been doing this recently with OpenDS.  It does depend on them exposing all the right metrics…  JMX doesn’t have auth here, so you should rely on network security.  I’ll note that since JMX opens up a random high port, that makes this all a pain in the ass to do remotely (and impossible out of the Amazon cloud).
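If you want those metrics programmatically rather than in the jconsole GUI, plain JMX works.  A sketch; the host, the port, and the exact MBean/attribute names here are my assumptions – browse the MBeans tree in jconsole to find the real ones for your version:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class CassandraJmxPeek {
        public static void main(String[] args) throws Exception {
            // No auth on this endpoint -- which is exactly why you need
            // network-level security in front of it. Host and port are
            // placeholders; the default JMX port varies by Cassandra version.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://cassandra-host:8080/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbeans = connector.getMBeanServerConnection();
                // Hypothetical attribute; check your version's MBeans tree.
                ObjectName compaction = new ObjectName(
                        "org.apache.cassandra.db:type=CompactionManager");
                System.out.println("Pending compactions: "
                        + mbeans.getAttribute(compaction, "PendingTasks"));
            }
        }
    }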

He did a good amount of drilling in through jconsole to see specific things about a running Cassandra – under MBeans/org.apache.cassandra.db is a lot of the good stuff: commit logs, compaction jobs, etc.  You definitely need to run compactions, and overprovision your cluster so you can run them to completion.  And he let us connect to it, which was nice!  Here’s my jconsole view of his Cassandra installation:

Peco asked if there was a better JVM to use; he says, “Not really, I’m using some 1.6 or something.  Oh, OpenJDK 1.4 64-bit.”

“Just because there’s no monitoring tools doesn’t mean you shouldn’t monitor it!”

You should know ahead of time when you’ll need to add a new node, and how long adding one will take.
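Back-of-the-envelope version (a sketch; all inputs are hypothetical, pull the real ones from your monitoring):

    public class CapacityForecast {
        public static void main(String[] args) {
            double perNodeCapacityGb = 300; // usable space per node
            double usedGb = 180;            // usage on the fullest node
            double growthGbPerDay = 3;      // observed growth rate
            double leadTimeDays = 14;       // provisioning plus rebalancing time

            double daysUntilFull = (perNodeCapacityGb - usedGb) / growthGbPerDay;
            System.out.printf("Node full in ~%.0f days%n", daysUntilFull);
            if (daysUntilFull < leadTimeDays) {
                System.out.println("Start adding a node now.");
            }
        }
    }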

More tools!

  • Chiton is a graphical data browser for Cassandra
  • Lazyboy – python wrapper/client
  • Fauna – ruby wrapper/client
  • Lucandra – use Cassandra as a storage engine for Lucene/Solr

Questions

Any third-party companies with good tools for Cassandra?  Answer – yes; Riptano, a company started by a former Rackspace employee, does support.

Do you want to run this in the cloud?  Answer – yes if you’re small, no if you’re Facebook.

What about backups?  Answer – I encourage you to write tools and open source them.  It’s hard right now.

Strangely, this session seemed to be not “about” Cassandra but “around” Cassandra.  As someone not using Cassandra yet but curious about it, I got some good ops insights, but I don’t feel like I have a great understanding…  It could perhaps have used a bit more of an intro to Cassandra.  He mentioned stuff in passing – I hear it has “nodes,” but I never saw an architecture diagram, and I heard that you want to “compact,” but not what that is or why.


Comments

  1. Pingback: Where did those numbers come from? | FewBar.com – Make it good

    • Note that there has been clarification of some of the Cassandra vs MySQL performance claims in this presentation – see this pingback for more!

  2. rp

    Don’t agree on data loss and backups: SANs are widespread and ensure data redundancy, so there is no need to do backups.  On top of that, backing up hundreds of terabytes isn’t easy!
