Tag Archives: mysql

Velocity 2013 Day 1 Liveblog – Using Amazon Web Services for MySQL at Scale

Next up is Using Amazon Web Services for MySQL at Scale. I missed the first bit, on RDS vs EC2, because I tried to get into Choose Your Weapon: A Survey For Different Visualizations Of Performance Data but it was packed.

AWS Scaling Options

Aside: use boto

vertical scaling – tune, add hw.

table level partitioning – smaller indexes, etc. and can drop partitions instead of deleting

functional partitioning (move apps out)

need more reads? add replicas, cache tier, tune ORM

replication lag? see above, plus multiple schemas for parallel rep (5.6/tungsten). take some stuff out of db (timestamp updates, queues, nintrans reads), pre-warm caches, relax durability

writes? above plus sharding

sharding by row range requires frequent rebalancing

hash/modulus based- better distro but harder to rebalance; prebuilt shards

lookup table based

Availability

In EC2 you have regions and AZs. AZs are supposed to be “separate” but have some history of going down with each other.

A given region is about 99.2% up historically.

RDS has multi-AZ replica failover

Pure EC2 options:

  • master/replicas – async replication. but, data drift, fragile (need rapid rebuild). MySQL MHA for failover. haproxy (see palomino blog)
  • tungsten – replaced replication and cluster manager. good stuff.
  • galera – galera/xtradb/mariadb synchronous replication

Performance

io/storage: provisioned IOPS.  Also, SSD for ephemeral power replicas

rds has better net perf, the block replication affects speed

instance types – gp, cpu op, memory op, storage op.  Tend to use memory op, EBS op.  cluster and dedicated also available.

EC2 storage – ephemeral, epehemeral SSD (superfast!), EBS slightly slower, EBS PIOPS faster/consistent/expensive/lower fail

Mitigating Failures

Local failures should not be a problem.  AZs, run books, game days, monitoring.

Regional failures – if you have good replication and fast DNS flipping…

You may do master/master but active/active is a myth.

Backups – snap frequently, put some to S3/glacier for long term. Maybe copy them out of Amazon time to time to make your auditors happy.

Cost

Remember, you spend money every minute.  There’s some tools out there to help with this (and Netflix released Ice today to this end).

Leave a comment

Filed under Cloud, Conferences, DevOps

Velocity 2010 – Drizzle

Monty Taylor from Rackspace talked about Drizzle, a MySQL variant “built for operations“. My thoughts will be in italics so you can be enraged at the right party.

Drizzle is “a database for the cloud”.  What does that even mean?  It’s “the next Web 2.0”, which is another away of saying “it’s the new hotness, beeyotch” (my translation).

mySQL scaling to multiple machines brings you sadness.  And mySQL deploy is crufty as hell.  So step 1 to Drizzle recovery is that they realized “Hey, we’re not the end all be all of the infrastructure – we’re just one piece people will be putting into their own structure.”  If only other software folks would figure that out…

Oracle style vertical scaling is lovely using a different and lesser definition of scaling.  Cloud scaling is extreme!  <Play early 1990s music>  It requires multiple machines.

They shard.  People complain about sharding, but that’s how the Internet works – the Internet is a bunch of sites sharded by functionality.  QED.

“Those who don’t know UNIX are doomed to repeat it.”  The goal (read about the previous session on toolchains) is to compose stuff easily, string them together like pipes in UNIX.  But most of the databases still think of themselves as a big black box in the corner, whose jealous priests guard it from the unwashed heathen.

So what makes Drizzle different? In summary:

  • Less features
  • Ops driven
  • Sane config
  • Plugins

Less features means less ways for developers to kill you.  Oracle’s “run Java within the database” is an example of totally retarded functionality whose main job is to ruin your life. No stored procedures, no triggers, no prepared statements.  This avoids developer sloppiness.  “Insert a bunch of stuff, then do a select and the database will sort it!” is not appropriate thinking for scale.

Ops driven means not marketing driven, which means driven by lies.  For example, there’s no marketdroids that want them to add a mySQL event scheduler when cron exists.  Or “we could sell more if we had ANSI compliant stored procedures!”  They don’t have a company to let the nasty money affect their priorities.

They don’t do competitive benchmarks, as they are all lies.  That’s for impartial third parties to do.  They do publish their regression tests vs themselves for transparency.

You get Drizzle via distros.  There are no magic “gold” binaries and people that do that are evil.  But distros sometimes get behind.  pandora-build

They have sane defaults.  If most people are doing to set something (like FRICKING INNODB) they install by default that way.  To install Drizzle, the only mandatory thing to say is the data directory.

install from apt/yum works.  Or configure/make/make install and run drizzled.  No bootstrap, system tables, whatever.

They use plugins.  mySQL plugins are pain, more of a patch really.  You can just add them at startup time, no SQL from a sysadmin.  And no loading during runtime – see “less features” above.  This is still in progress especially config file sniblets.  But plugins are the new black.

They have pluggable protocols.  It ships with mySQL and Drizzle, but you can plug console, HTTP/REST or whatever.  Maybe dbus…  Their in progress Drizzle protocol removes the potential for SQL injection by only delivering one query, has a sharding key in the packet header, supports HTTP-like redirects…

libdrizzle has both client and server ends, and talks mySQL and Drizzle.

So what about my app that always auths to the database with its one embedded common username/password?  Well, you can do none, PAM, LDAP (and well), HTTP. You just say authenticate(user, pass) and it does it.   It has pluggable authorization too, none, LDAP, or hard-coded.

There is a pluggable query filter that can detect and stop dumb queries- without requiring a proxy.

It has pluggable logging – none, syslog, gearman, etc. – and errors too.

Pluggable replication!  A new scheme based on Google protocol buffers, readable in Java, Python, and C++.  It’s logical change (not quite row) based.  Combined with protocol redirects, it’s db migration made easy!

Boots!  A new command line client, on launchpad.net/boots.  It’s pluggable, scriptable, pipes SQL queries, etc.

P.S. mySQL can lick me! (I’m paraphrasing, but only a little.)

1 Comment

Filed under Conferences, DevOps