Tag Archives: scalability

CloudAustin June Meeting – Best Practices for Scalability

The Agile Admins also organize the CloudAustin user group, and we wanted to let everyone know about our upcoming June meeting. It’s 6-8 PM on Tuesday June 16 at Rackspace. RSVP on the meetup page!

Talk: Best Practices for Scalability (Scale to more than a Billion hits/day)

In this talk Chander Dhall will share his real-world experiences in scaling web apps, and some key insights and best practices. You’ll learn how to architect and develop applications on any Web stack so that they are easy to scale. If time permits Chander will go deep into performance too.

Chander is a Microsoft MVP, ASP.NET Insider, Web API Advisor, INETA speaker and open source contributor, with years of experience in enterprise software development. He started coding when he was 6, and created his first successful software product at the age of 14. He is the dev chair of DevConnections, and he works in a goal-oriented, technologically-driven, fast-paced Agile (SCRUM) environment. He has a master’s degree in computer science with a specialization in algorithms, principles and patterns, and is focused on building high-performing modular software. Chander leads the HTML5/Node.js group in Los Angeles and the .NET user group at UT Dallas, co-organizes the AngularJS meetup in Austin, and has spoken at numerous conferences and code camps all over the world. http://chanderdhall.com/, Twitter @csdhall

Sponsor: Box.com

Come on out!  And if you want to speak or sponsor in the future, just email austin-cug-admin@googlegroups.com.

Velocity 2010 – Memcached Scalability

After lunch, we start off with Hidden Scalability Gotchas in Memcached and Friends by Neil Gunther (Performance Dynamics Company), Shanti Subramanyam (Oracle Corporation), and Stefan Parvu (Oracle Finland).

Scaling up versus scaling out.  Bigger or more.  There is no “best approach” – you need to be quantitative, with controlled measurements and numbers to see the cost-benefit.

Data isn’t information, you need to transform it.  Capacity planning  has “planning” in it.  Like with finance, you need a model.  Metrics + models = information.

Controlled Measurements

You want to take measurements in a known environment with a specific workload.  Using production time series data is like predicting the stock market using Google finance graphs.

You need throughput measured in steady state – not a load vs. virtual users curve where throughput is still varying…

So they did some controlled tests.

Memcached scaling is thread limited.  Past about 4-6 threads, throughput levels off.

With a multicore-friendly hash map patch on SPARC, it did scale up to maybe 30 threads.

Quantifying Scalability

1.  Equal bang for the buck.  Ideal parallelism is a linear load vs. capacity graph, but in reality it plateaus and then degrades at some point. There is an early part of the graph, though, that is close to linear.

2.  Cost of sharing resources – when the curve falls away from linear.

3.  Resource limitation – where the curve tops out (Amdahl’s law)

4.  Degradation/negative return – more capacity makes things worse after a point.

Formula: C(N) = N / (1 + a(N − 1) + bN(N − 1))

N is the number of threads.  The 1 represents (ideal) concurrency, a is contention, and b is coherency.

Run it through the Excel USL analysis and calculate a and b.
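Here’s a rough sketch of the same USL fit done in Python instead of Excel, assuming you have (threads, steady-state throughput) pairs from controlled runs; the sample numbers below are invented purely for illustration.

    # Fit Gunther's Universal Scalability Law, C(N) = N / (1 + a(N-1) + bN(N-1)),
    # to controlled throughput measurements. Sample data points are made up.
    import numpy as np
    from scipy.optimize import curve_fit

    def usl(n, a, b):
        """Relative capacity at N threads: a = contention, b = coherency."""
        return n / (1 + a * (n - 1) + b * n * (n - 1))

    threads = np.array([1, 2, 4, 8, 16, 32])                     # N from each controlled run
    throughput = np.array([950, 1800, 3300, 5100, 5900, 5600])   # ops/sec at steady state

    relative = throughput / throughput[0]                        # normalize to the single-thread run
    (a, b), _ = curve_fit(usl, threads, relative, p0=(0.01, 0.001), bounds=(0, 1))
    print(f"contention a = {a:.4f}, coherency b = {b:.4f}")
    print(f"predicted peak at N = {np.sqrt((1 - a) / b):.0f} threads")   # USL peak, valid when b > 0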

As memcached versions came out, the concurrency was improved, but N didn’t budge.  People can say they make improvements, but if it doesn’t affect the data, then bah.

Anyway, the model is semi predictive but not perfect.  If you know whether your problem is a contention (like queuing) or coherence (like point to point transfers) issue you know what to look for in your code.

Memcached Gotchas

Throw more hardware at it!  Well, current strategies are around old cheap hardware with single CPUs.  As multicore arrives, if you can’t use all the cores, you won’t utilize your hardware fully.

As memcached is thread limited it’ll be a problem on multicore.

Take controlled measurements with steady state throughput to analyze data.

Quantify scalability using a model.  Reduce contention and coherency.

Follow them at:

There’s a lot of discussion about the model predictability because they had a case where the model predicted one thing until there were higher order data points and then it changed.  The more data, the more the model works – but he stresses you need to trust the model with what data you have.  You’re not predicting, you’re explaining with the model.  It’s not going to tell you exactly what is wrong…  Lots of questions, people are mildly confused.

Velocity 2010 – Dueling Cloud Management Suppliers

Two cloud systems management suppliers talk about their bidness!  My comments in italics.

Cloud Autoscaling in Enterprise Computing by George Reese (enStratus Networks LLC)

How the Top Social Games Scale on the Cloud by Michael Crandell (RightScale, Inc)

I am more familiar with RightScale, but just read Reese’s great Cloud Application Architectures book on the plane here.  Whose cuisine will reign supreme?

enStratus

Reese starts talking about “naive autoscaling” being a problem.  The cloud isn’t magic; you have to be careful.  He defines “enterprise” autoscaling as scaling that is cognizant of financial constraints and not this hippy VC-funded twitter type nonsense.

Reactive autoscaling is done when the system’s resource requirements exceed what is currently provisioned.  Proactive autoscaling is done in response to capacity planning – “run more during the day.”

Proactive requires planning.  And automation needs strict governors in place.

In our PIE autoscaling, we have built limits like that into the model – kinda like any connection pool.  Min, max, rate of increase, etc.

He says your controls shouldn’t be all “number of servers,” but be “budget” based.  Hmmm.  That’s ideal but is it too ideal?  And so what do you do, shut down all your servers if you get to the 28th of the month and you run out of cash?

CPU is not a scaling metric. Have better metrics tied to things that matter like TPS/response time.  Completely agree there; scaling just based on CPU/memory/disk is primitive in the extreme.
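As a concrete illustration, here’s a minimal sketch of what a governed, metric-driven scaling decision could look like – keying off response time rather than CPU, with pool-style min/max/step limits like the ones mentioned above. Every threshold and constant here is hypothetical.

    # Hypothetical autoscaling governor: scale on what users feel (p95 response time),
    # bounded by min/max instance counts and a max step per decision.
    MIN_INSTANCES = 2       # never drop below safe redundancy
    MAX_INSTANCES = 20      # hard budget ceiling
    STEP = 2                # max instances added or removed per decision
    TARGET_P95_MS = 400     # target 95th-percentile response time

    def decide(current_instances: int, p95_ms: float) -> int:
        if p95_ms > TARGET_P95_MS * 1.2:
            desired = current_instances + STEP        # users are hurting: add capacity
        elif p95_ms < TARGET_P95_MS * 0.6:
            desired = current_instances - STEP        # plenty of headroom: give some back
        else:
            desired = current_instances
        return max(MIN_INSTANCES, min(MAX_INSTANCES, desired))

    print(decide(current_instances=4, p95_ms=650))    # -> 6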

Efficiency is a key cloud metric.  Get your utilization high.

Here’s where I kinda disagree – it can often be penny wise and pound foolish.  In the name of “efficiency” I’ve seen people put a bunch of unrelated apps on one server and cause severe availability problems.  Screw utilization.  Or use a cloud provider that uses a different charging model – I forget which one it was, but we had a conf call with one cloud provider that only charged on CPU used, not “servers provisioned.”

Of course you don’t have to take it to an extreme, just roll down to your minimum safe redundancy number on a given tier when you can.

Security – well, you tend not to do some centralized management things (like add to Active Directory) in the cloud.  It makes user management hard.  Or just makes you use LDAP, like God intended.

Cloud bursting – scaling from on premise into the cloud.

Case study – a diaper company.  Had a loyalty program.  It exceeded capacity within an hour of launch.  Humans made a scaling decision to scale at the load balancing tier, and enStratus executed the auto-scale change.  They checked it was valid traffic and all first.

But is this too fiddly for many cases?  If you are working with a “larger than 5 boxes” kind of scale don’t you really want some more active automation?

RightScale

The RightScale blog is full of good info!

They run 1.2 million cloud servers!  They see things like 600k concurrent users, 100x scaling in 4 days, 15k instances, 1:2000 management ratio…

Now about gaming and social apps.  They power the top 10 Facebook apps.  They are an open management environment that lives atop the cloud suppliers’ APIs.

Games have a natural lifecycle where they start small, maybe take off, get big, eventually taper off.  It’s not a flat demand curve, so flat supply is ‘tarded.

During the early phase, game publishers need a cheap, fast solution that can scale.  They use Chef and other stuff in server templates for dynamic boot-time configuration.

Typically, game server side tech looks like normal Web stuff!  Apache+HAproxy LB, app servers, db cache (memcached), db (sharded mySQL master/slave pairs).  Plus search, queues, admin, logs.

Instance types – you start to see a lot of larger instances – large and extra large.  Is this because of legacy comfort issues?  Is it RAM needs?

CentOS 5 dominates!  Generic images, configured at boot.  One company rebundles for faster autoscale.  Not much Ubuntu or Windows.  To be agile you need to do that realtime config.

A lot of the boxes are used for databases.  Web/app and load balancing significant too.  There’s a RightScale paper showing a 100k packets per second LB limit with Amazon.

People use autoscaling a lot, but mainly for web app tier.  Not LBs because the DNS changing is a pain.  And people don’t autoscale their DBs.

They claim a lot lower human need on average for management on RightScale vs using the APIs “or the consoles.”  That’s a big or.  One of our biggest gripes with RightScale is that they consume all those lovely cloud APIs and then just give you a GUI and not an API.  That’s lame.  It does a lot of good stuff but then it “terminates” the programmatic relationship. [Edit: Apparently they have a beta API now, added since we looked at them.]

He disagrees with Reese – the problem isn’t that there is too much autoscaling, it’s that it has never existed.  I tend to agree. Dynamic elasticity is key to these kind of business models.

If your whole DB fits into memcache, what is mySQL for?  Writes sometimes?  NoSQL sounds cool but in the meantime use memcache!!!

The cloud has enabled things to exist that wouldn’t have been able to before.  Higher agility, lower cost, improved performance with control, new levels of resiliency and automation, and full lifecycle support.

Velocity 2010: Scalable Internet Architectures

My first workshop is  Scalable Internet Architectures by Theo Schlossnagle, CEO of OmniTI.  He gave a nearly identical talk last year but I missed some of it, and it was really good, so I went!  (Robert from our Web Admin team attended as well.)

There aren’t many good books on scalability.  Mainly there are three – The Art of Scalability, Cal Henderson’s Building Scalable Web Sites, and his own Scalable Internet Architectures.  So any tips you can get a hold of are welcome.

Following are my notes from the talk; my own thoughts are in italics.

Architecture

What is architecture?  It encompasses everything from power up to the client touchpoint and everything in between.

Of necessity, people are specialized into specific disciplines but you have to overcome that to make a whole system make sense.

The new push towards devops (development/operations collaboration) tries to address this kind of problem.

Operations

Operations is a serious part of this, and it takes knowledge, tools, experience, and discipline.

Knowledge – easy to get: the Internet, conferences (Velocity, Structure, Surge), user groups.

Tools – All tools are good; understand the tools you have.  Some of operations encourages hackiness because when there is a disruption, the goal is “make it stop as fast as possible.”

You have to know how to use tools like truss, strace, dtrace through previous practice before the outage comes.  Tools (and automation) can help you maintain discipline.

Experience comes from messing up and owning up.

Discipline is hardest.  It’s the single most lacking thing in our field. You have to become a craftsman – learn discipline through experience, and through practice achieve excellence. You can’t be too timid and never take risks, but you also can’t take risks you don’t understand.

It’s like my old “Web Admin Standing Orders” that tried to delineate this approach for  my ops guys – “1.  Make it happen.  2.  Don’t f*ck it up.  3.  There’s the right way, the wrong way, and the standard way.”  Take risks, but not dumb risks, and have discipline and tools.

He recommends the classic Zen and the Art of Motorcycle Maintenance for operations folks.  Cowboys and heroes burn out.  Embrace a Zen attitude.

Best Practices

  1. Version Control everything.  All tools are fine, but mainly it’s about knowing how to use it and using it correctly, whether it’s CVS or Subversion or git.
  2. Know Your Systems – Know what things look like normally so you have a point of comparison.  “Hey, there’s 100 database connections open!  That must be the problem!”  Maybe that’s normal.  Have a baseline (also helps you practice using the tools).  Your brain is the best pattern matcher.
    Don’t say “I don’t know” twice.  They wrote an open source tool called Reconnoiter that looks at data, graphs regressions, and alerts on them (instead of Cacti, Nagios, and other time-consuming stuff).  Now available as SaaS!
  3. Management – Package rollout, machine management, provisioning. “You should use puppet or chef!  Get with the times and use declarative definition!”  Use the tools you like.  He uses kickstart and cfengine and he likes it just fine.

Dynamic Content

Our job is all about the dynamic content.  Static content – bah, use Akamai or CacheFly or Panther or whatever.  It’s a solved problem.

Premature optimization is the root of all evil – well, 97% of it.  It’s the other 3% that’s a bitch.  And you’re not smart enough to know where that 3% is.

Optimization means “don’t do work you don’t have to.”  Computational reuse and caching,  but don’t do it in the first place when possible.
He puts comments on things he decides not to optimize, explaining the assumptions and why not.

Sometimes naive business decisions force insane implementations down the line; you need to re-check them.

Your content is not as dynamic as you think it is.  Use caching.

Technique – Static Element Caching

Applied YSlow optimizations – it’s all about the JavaScript, CSS, images.  Consolidate and optimize.  Make it all publicly cacheable with 10 year expiry.

RewriteRule (.*)\.([0-9]+)\.css $1.css maps /s/app.23412.css to /s/app.css – you get unique names, and thus fresh cached copies, for each version.  Bump up the number in the template.  Use “cat” to consolidate files, freaks!

For images, put each new version at a new URI.  You can’t trust caches to really refresh otherwise.
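Here’s a small sketch of the template-side half of that trick – generating the versioned asset names that the RewriteRule above maps back to the real file. The path and version number are just examples.

    # Emit versioned asset URLs like /s/app.23412.css; the Apache RewriteRule above
    # strips the number back off, so the same file is served under a fresh, cacheable name.
    ASSET_VERSION = 23412   # bump this in the template on each deploy

    def versioned(path: str) -> str:
        """Turn '/s/app.css' into '/s/app.23412.css'."""
        base, _, ext = path.rpartition(".")
        return f"{base}.{ASSET_VERSION}.{ext}"

    print(versioned("/s/app.css"))   # /s/app.23412.css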

Technique – Cookie Caching

Announcing a distributed database cache that always is near the user and is totally resilient!  It’s called cookies.  Sign it if you don’t want tampering.  Encrypt if you don’t want them to see its contents.  Done.  Put user preferences there and quit with the database lookups.
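A minimal sketch of the signed-cookie idea – the secret and the preference payload below are placeholders, and you’d add encryption on top if the contents shouldn’t be readable.

    # Store user prefs in a cookie, HMAC-signed so the client can't tamper with it.
    import base64, hashlib, hmac, json

    SECRET = b"change-me"   # placeholder signing key

    def write_pref_cookie(prefs: dict) -> str:
        payload = base64.urlsafe_b64encode(json.dumps(prefs).encode()).decode()
        sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
        return payload + "." + sig

    def read_pref_cookie(cookie: str):
        payload, _, sig = cookie.rpartition(".")
        expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            return None                    # tampered or corrupt: fall back to the database
        return json.loads(base64.urlsafe_b64decode(payload))

    cookie = write_pref_cookie({"theme": "dark", "tz": "America/Chicago"})
    print(read_pref_cookie(cookie))        # {'theme': 'dark', 'tz': 'America/Chicago'}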

Technique – Data Caching

Data caching.  Caching happens at a lot of layers.  Cache if you don’t have to be accurate, use a materialized view if you do.    Figuring out the state breakdown of your users?  Put it in a separate table at signup or state change time, don’t query all the time.  Do it from the app layer if you have to.
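For instance, the state-breakdown example might look something like this – maintain a tiny summary table when a user signs up or changes state instead of grouping over the whole users table at report time. Table and column names are made up; sqlite is used only so the sketch is self-contained.

    # Keep a per-state counter up to date at write time; reports read the tiny table.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, state TEXT)")
    db.execute("CREATE TABLE users_by_state (state TEXT PRIMARY KEY, n INTEGER)")

    def signup(state: str) -> None:
        db.execute("INSERT INTO users (state) VALUES (?)", (state,))
        db.execute(
            "INSERT INTO users_by_state (state, n) VALUES (?, 1) "
            "ON CONFLICT(state) DO UPDATE SET n = n + 1",
            (state,),
        )

    for s in ("TX", "TX", "CA"):
        signup(s)
    print(db.execute("SELECT * FROM users_by_state ORDER BY state").fetchall())
    # [('CA', 1), ('TX', 2)] -- no full-table GROUP BY needed at query time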

Technique – Choosing Technologies

Understand how you’ll be writing and retrieving data – and how everyone else in the business will be too!  (Reports, BI, etc.)  You have to be technology agnostic and find the best fit for all the needs – business requirements as well as consistency, availability, recoverability, performance, stability.  That’s a place where NoSQL falls down.

Technique – Database

Shard your database then shoot yourself.  Horizontal scaling isn’t always better.  It will make your life hell, so scale vertically first.  If you have to, do it, and try not to have regrets.

Do try “files,” NoSQL, cookies, and other non-ACID alternatives because they scale more easily.  Keep stuff out of the DB where you can.

When you do shard, partition to where you don’t need more than one shard per OLTP question.  Example – private messaging system.  You can partition by recipient and then you can see your messages easily.  But once someone looks for messages they sent, you’re borked.  But you can just keep two copies!  Twice the storage but problem solved.  Searching cross-user messages, however, borks you.
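A toy sketch of that messaging partitioning scheme: route by recipient so “show my inbox” is a one-shard question, and write a second copy keyed by sender so “messages I sent” is too. The shard count is arbitrary and in-memory lists stand in for real databases.

    # Recipient-sharded messages with a duplicate copy keyed by sender.
    import hashlib

    N_SHARDS = 8
    shards = [[] for _ in range(N_SHARDS)]   # pretend each list is a separate database

    def shard_for(user: str) -> int:
        return int(hashlib.md5(user.encode()).hexdigest(), 16) % N_SHARDS

    def send_message(sender: str, recipient: str, body: str) -> None:
        shards[shard_for(recipient)].append(("inbox", recipient, sender, body))
        shards[shard_for(sender)].append(("sent", sender, recipient, body))   # the second copy

    def inbox(user: str):
        return [m for m in shards[shard_for(user)] if m[0] == "inbox" and m[1] == user]

    send_message("alice", "bob", "hi")
    print(inbox("bob"))   # answered by one shard; searching across all users still hits every shard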

Don’t use multimaster replication.  It sucks – it’s not ready for prime time.  Outside ACID there are key-value stores, document databases, etc.  Eventual consistency helps.  MongoDB, Cassandra, Voldemort, Redis,  CouchDB – you will have some data loss with all of them.

NoSQL isn’t a cure-all; these stores aren’t PCI compliant, for example.  Shiny is not necessarily good.  Break up the problem and apply the KISS principle.  Of course you can’t get to the finish line with pure relational for large problems either – you have to use a mix; there is NO one-size-fits-all for data management.

Keep in mind your restore-time and restore-point needs as well as ACID requirements of your data set.

Technique – Service Decoupling

One of the most fundamental techniques to scaling.  The theory is: do it asynchronously.  Why do it now if you can postpone it?  Break down the user transaction and determine what parts can be asynchronous.  Queue the info required to complete the task and process it behind the scenes.

It is hard, though, and is more about service isolation than postponing work.  The more you break down the problem into small parts, the more you have in terms of problem simplification, fault isolation, simplified design, decoupling approach, strategy, and tactics, simpler capacity planning, and more accurate performance modeling.  (Like SOA, but you know, that really works.)

One of my new mantras while building our cloud systems is “Sharing is the devil,” which is another way of stating “decouple heavily.”

Message queueing is an important part of this – you can use ActiveMQ, OpenAMQ, RabbitMQ (winner!).  STOMP sucks but is a universal protocol most everyone uses to talk to message queues.
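For the flavor of it, here’s a rough sketch of queueing the deferrable half of a transaction with RabbitMQ via the pika client. The queue name and payload are made up, and a separate worker process would consume “thumbnails” and do the slow work out of band.

    # Enqueue deferred work at request time; a background worker drains the queue.
    import json
    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="thumbnails", durable=True)

    def handle_upload(user_id: int, image_path: str) -> None:
        # Do only what the user has to see right now; postpone the rest.
        channel.basic_publish(
            exchange="",
            routing_key="thumbnails",
            body=json.dumps({"user_id": user_id, "path": image_path}),
            properties=pika.BasicProperties(delivery_mode=2),   # persist the message
        )

    handle_upload(42, "/uploads/42/avatar.png")
    conn.close()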

Don’t decouple something small and simple, though.

Design & Implementation Techniques

Architecture and implementation are intrinsically tied, you can’t wholly separate them.  You can’t label a box “Database” and then just choose Voldemort or something.

Value accuracy over precision.

Make sure the “gods aren’t angry.”  The dtrace guy was running mpstat one day, and the columns didn’t line up.  The gods intended them to, so that’s your new problem instead of the original one!  OK, that’s a confusing anecdote.  A better one is “your Web servers are only handling 25 requests per second.”  It should be obvious the gods are angry.   There has to be something fundamentally wrong with the universe to make that true. That’s not a provisioning problem, that’s an engineering problem.

Develop a model.  A complete model is nearly impossible, but a good queue theory model is easy to understand and provides good insight on dependencies.

Draw it out, rationalize it.  When a user comes in to the site, what all happens?  You end up doing a lot of I/O ops.  Given expected traffic you should then know roughly what each tier will have to bear.
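A back-of-envelope version of that kind of model might be nothing more than this – every number below is invented just to show the arithmetic of turning page views into per-tier load.

    # How much work lands on each tier per second, and roughly how many servers that implies.
    import math

    page_views_per_sec = 500
    fan_out = {"app requests": 1, "memcached gets": 10, "db queries": 2, "disk IOs": 4}
    per_server_capacity = {"app requests": 150, "memcached gets": 80000, "db queries": 1500, "disk IOs": 500}

    for tier, ops in fan_out.items():
        load = page_views_per_sec * ops
        servers = math.ceil(load / per_server_capacity[tier])
        print(f"{tier:>15}: {load:>6}/sec -> ~{servers} server(s)")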

Complexity is a problem – decoupling helps with it.

In the end…

Don’t be an idiot.  A lot of scalability problems are from being stupid somewhere.  High performance systems don’t have to scale as much.  Here’s one example of idiocy in three acts.

Act 1 – Amusing Error By Marketing Boneheads – sending a huge mailing with an URL that redirects. You just doubled your load, good deal.

Act 2 – Faulty Capacity Planning – you have 100k users now.  You try to plan for 10 million.  Don’t bother, plan only to 10x up, because you just don’t understand the problems you’ll have at that scale – a small margin of error will get multiplied.

Someone into agile operations might point out here that this is a way of stating the agile principle of “iterative development.”

Act 3 – The Traffic Spike – I plan on having a spike that gives me 3000 more visitors/second to a page with various CSS/JS/images.  I do loads of math and think that’s 5 machines worth.  Oh whoops I forgot to do every part of the math – the redirect issue from the amusing error above!  Suddenly there’s a huge amount more traffic and my pipe is saturated (Remember the Internet really works on packets and not bytes…) .

This shows a lot of trust in engineering math…  But isn’t this why testing was invented?  Whenever anyone shows me math and hasn’t tested it I tend to assume they’re full of it.

Come see him at Surge 2010!  It’s a new scalability and performance conference in Baltimore in late Sep/early Oct.

A new conference, interesting!  Is that code for “server side performance, ” where Velocity kinda focuses on client side/front end a lot?

Velocity 2009 – Scalable Internet Architectures

OK, I’ll be honest.  I started out attending “Metrics that Matter – Approaches to Managing High Performance Web Sites” (presentation available!) by Ben Rushlo, Keynote proserv.  I bailed after a half hour to the other one, not because the info in that one was bad but because I knew what he was covering and wanted to get the less familiar information from the other workshop.  Here’s my brief notes from his session:

  • Online apps are complex systems
  • A siloed approach of deciding to improve midtier vs CDN vs front end engineering results in suboptimal experience to the end user – have to take holistic view.  I totally agree with this, in our own caching project we took special care to do an analysis project first where we evaluated impact and benefit of each of these items not only in isolation but together so we’d know where we should expend effort.
  • Use top level/end user metrics, not system metrics, to measure performance.
  • There are other metrics that correlate to your performance – “key indicators.”
  • It’s hard to take low level metrics and take them “up” into a meaningful picture of user experience.

He’s covering good stuff but it’s nothing I don’t know.  We see the differences and benefits in point in time tools, Passive RUM, tagging RUM, synthetic monitoring, end user/last mile synthetic monitoring…  If you don’t, read the presentation, it’s good.  As for me, it’s off to the scaling session.

I hopped into this session a half hour late.  It’s Scalable Internet Architectures (again, go get the presentation) by Theo Schlossnagle, CEO of OmniTI and author of the similarly named book.

I like his talk, it starts by getting to the heart of what Web Operations – what we call “Web Admin” hereabouts – is.  It kinda confuses architecture and operations initially but maybe that’s because I came in late.

He talks about knowledge, tools, experience, and discipline, and mentions that discipline is the most lacking element in the field. Like him, I’m a “real engineer” who went into IT so I agree vigorously.

What specifically should you do?

  • Use version control
  • Monitor
  • Serve static content using a CDN, and behind that a reverse proxy and behind that peer based HA.  Distribute DNS for global distribution.
  • Dynamic content – now it’s time for optimization.

Optimizing Dynamic Content

Don’t pay to generate the same content twice – use caching.  Generate content only when things change and break the system into components so you can cache appropriately.

Example: a PHP news site – articles are in Oracle, personalization on each page, top new forum posts in a sidebar.

Why abuse Oracle by hitting it on every page view?  Updates are controlled.  The page should pull user prefs from a cookie.  (P.S. Rewrite your query strings.)
But it’s still slow to pull from the db vs. hardcoding it.
All blog software does this, for example.
Check for a hardcoded PHP page – if it’s not there, run something that puts it there.  It still dynamically puts in user personalization from the cookie.  In the preso he provides details on how to do this.
Do cache invalidation on content change; use a message queuing system like OpenAMQ for async writes.
Apache is now the bottleneck – use APC (Alternative PHP Cache),
or use memcached – he says no timeouts!  Or… be careful about them!  Or something.
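The talk’s example is PHP, but the generate-on-miss pattern is language-agnostic; here’s a rough Python sketch of it, with hypothetical paths and a stubbed-out render function.

    # Serve a pre-rendered page if it exists; otherwise render once, write it out,
    # and delete it when the underlying article changes (invalidate on write, not on a timer).
    import os

    CACHE_DIR = "/tmp/news-cache"   # hypothetical cache location

    def render_article(article_id: int) -> str:
        # stand-in for the expensive Oracle queries + templating
        return f"<html><body>article {article_id} rendered from the database</body></html>"

    def get_article_html(article_id: int) -> str:
        path = os.path.join(CACHE_DIR, f"article-{article_id}.html")
        try:
            with open(path) as f:                    # cache hit: the database is never touched
                return f.read()
        except FileNotFoundError:
            html = render_article(article_id)        # cache miss: hit the DB once...
            os.makedirs(CACHE_DIR, exist_ok=True)
            with open(path, "w") as f:               # ...and hard-code the result to disk
                f.write(html)
            return html

    def invalidate(article_id: int) -> None:
        try:                                         # called from the publish/update path
            os.remove(os.path.join(CACHE_DIR, f"article-{article_id}.html"))
        except FileNotFoundError:
            pass

User personalization would still come from the cookie at request time, layered on top of the cached page.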

Scaling Databases

1. shard them
2. shoot yourself

Sharding, or breaking your data up by range across many databases, means you throw away relational constraints and that’s sad.  Get over it.

You may not need relations – use files, fool!  Or other options like CouchDB, etc.  Or Hadoop, from the previous workshop!

Vertically scale first by:

  • not hitting the damn db!
  • running a good db.  Postgres!  Not mySQL, boo-yah!

When you have to go horizontal, partition right – more than one shard shouldn’t have to answer a single OLTP question.   If that’s not possible, consider duplication.

IM example.  Store messages sharded by recipient.  But then the sender wants to see them too and that’s an expensive operation – so just store them twice!!!

But if it’s not that simple, partitioning can hose you.

Do math and simulate it before you do it fool!   Be an engineer!

Multi-master replication doesn’t work right.  But it’s getting closer.

Networking

The network’s part of it, can’t forget it.

Of course if you’re using Ruby on Rails the network will never make your app suck more.  Heh, the random drive-by disses rile the crowd up.

A single machine can push a gig.  More isn’t hard with aggregated ports.  Apache too, serving static files.  Load balancers too.  How to get to 10 or 20 Gbps though?  All the drivers and firmware suck.  Buy an expensive LB?

Use routing.  It supports naive LB’ing.  Or routing protocol on front end cache/LBs talking to your edge router.  Use hashed routes upstream.  User caches use same IP.  Fault tolerant, distributed load, free.

Use isolation for floods.  Set up a surge net.  Route out based on MAC.  Used against DDoSes.

Service Decoupling

One of the most overlooked techniques for scalable systems.  Why do now what you can postpone till later?

Break the transaction into parts.  Queue info.  Process queues behind the scenes.  Messaging!  There are different options – AMQP, Spread, JMS.  Specifically good message queuing options are:

Most common – STOMP, sucks but universal.

Combine a queue and a job dispatcher to make this happen.  Side note – Gearman, while cool, doesn’t do this – it dispatches work but doesn’t decouple action from outcome; it should be used to scale work that can’t be decoupled.  (Yes it does, says a dude in the crowd.)

Scalability Problems

It often boils down to “don’t be an idiot.”  His words not mine.  I like this guy. Performance is easier than scaling.  Extremely high perf systems tend to be easier to scale because they don’t have to scale as much.

e.g. An email marketing campaign with an URL not ending in a trailing slash.  Guess what, you just doubled your hits.  Use the damn trailing slash to avoid 302s.

How do you stop everyone from being an idiot though?  Every person who sends a mass email from your company?  That’s our problem  – with more than fifty programmers and business people generating apps and content for our Web site, there is always a weakest link.

Caching should be controlled not prevented in nearly any circumstance.

Understand the problem.  Going from 100k to 10MM users – don’t just bucketize in small chunks and assume it will scale.  Allow a margin for error.  Designing for 100x or 1000x requires a profound understanding of the problem.

Example – I plan for a traffic spike of 3000 new visitors/sec.  My page is about 300 KB.  CPU bound.  8 ms service time.  Calculate servers needed.  If I put Varnish in front of the static assets, the calculation says I need 3-4 machines.  But do the math and it’s 8 GB/sec of throughput.  No way.  At 1.5MM packets/sec – the firewall dies.  You have to keep the whole system in mind.
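Roughly reconstructing that math with assumed sizes (so treat the output as order-of-magnitude only, not the speaker’s exact figures):

    # Back-of-envelope: spike traffic times page weight, in bytes, bits, and packets.
    visitors_per_sec = 3000
    bytes_per_page = 300 * 1024      # ~300 KB of HTML + CSS + JS + images (assumed)
    packet_payload = 1460            # typical Ethernet MTU minus TCP/IP headers (assumed)

    bytes_per_sec = visitors_per_sec * bytes_per_page
    print(f"{bytes_per_sec / 1e9:.1f} GB/sec, ~{bytes_per_sec * 8 / 1e9:.0f} Gbit/sec")
    print(f"~{bytes_per_sec / packet_payload / 1e6:.2f} million data packets/sec, before ACKs and the 302 redirects")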

So spread out static resources across multiple datacenters, agg’d pipes.
The rest is only 350 Mbps, 75k packets per second, doable – except the 302 adds 50% overage in packets per sec.

Last bonus thought – use ZFS/DTrace for dbs, so run them on Solaris!
