Author Archives: karthequian

About karthequian

I love creating products, and full stack dev comfortable creating things from scratch, and know a bunch about containers, kubernetes, auth and agile development. I live in Austin and organize @devopsdays, @container_days and @cloud_austin. Follow me on

ReInvent 2013: Day 2 Keynote

I didn’t cover the day 1 keynote, but fortunately it can be found here. The day 2 keynote was a lot more technical and interesting though. Here are my notes from it:

First, we began by talking about how aws plans its projects.

Lots of updates every year!

Before any project is started, and teams are in the brainstorming phase. A few key things are always done.

  • Meeting minutes
  • FAQ
  • Figure out the ux
  • Before any code is written

“2 Pizza Teams”: Small autonomous teams that had roadmap ownership with decoupled lauch schedules.

Customer collaboration

Get the functionality in the hands of customers as soon as possible. It may be feature limited, but it’s in the hands of customers so that they can get feedback as soon as possible. Iterate iterate iterate based on feedback. Different from the old guard where everything is engineering driven and it is unnecessarily complex.

Netflix platform….

Netflix is on stage and we’re taking about the Netflix cloud prizes and talking about the enhancements to the different tools…looks pretty cool, and will need to check them out. There are 14 chaos monkey “tests” to run now instead of just 1 from before.

Cloud prize winners

Werner is back is breaking down the different facets that AWS focuses on:

  • Performance- measure everything; put performance data in log files that can be mined.
  • Security
  • Reliability
  • Cost
  • Scalability

Illya sukhar CEO from Parse is on stage now (platform for mobile apps)
-parse data: store data; it’s 5 lines of code instead of a bunch of code.
-push notification

Parse started with 1 aws instance
From 0-180,000 apps

180,000 collections in mongodb; shows differences between pre and post piops


IAM and IAM roles to set boundaries on who can access what.
How to do this from a db perspective?
Apparently you can have fine grained access controls on dynamodb instead of writing your own code.
Each data block is encrypted in redshift
Talking about how customers are using the spot instances to save $.

We transfer usecase, who take care of transferring large files.

Airbnb on stage with mike curtis, VP of engineering
-350k hosts around the world
-4 millions guests (jan 2013)
-9 million guests today.

Host of aws services
1k ec2 instances
Million RDS rows
50tb for photos in s3

“The ops team at Airbnb is with a 5 person ops team.”

Helps devote resources to the real problem.

AirBnB in 2011

AirBnB in 2012

Dropcam came on stage after that to talk about how they use the AWS platform. Nothing too crazy, but interestingly more inbound videos are sent to dropcam than YouTube!


They keynote ended with an Amazon Kinesis demo (and a deadmau5 announcement for the replay party), which on the outside looks like a streaming API and different ways to process data on the backend. A prototype of streaming data from twitter and performing analytics was shown to demonstrate the service.


  • RDS for PostgreSQL
  • New instance types-i2 for much better io performance
  • Dynamo db- global secondary indexes!!
  • Federation with saml 2.0 for IAM
  • Amazon RDS- cross region read replicas!
  • G2 instances for media and video intensive application
  • C3 instances are new with fastest processors- 2.8 gig intel e5 v2
  • Amazon kinesis- real time processing, fully managed. It looks like this will help you solve issues of scalability when you’re trying to build realtime streaming applications. It integrates with storage and processing services.


Incase you want to watch it, the day 2 keynote is here:

And also, the day 1 keynote:


Filed under Cloud, Conferences

ReInvent 2013- Scaling on AWS for the First 10 Million Users

This was the first talk by @simon_elisha I went to at ReInvent, and was a packed room. It was targeted towards developers going from inception of an app to growing it to 10 million users. Following are the notes I took…

– We will need a bigger box is the first issue, when you start seeing traffic to an application. Single box is an anti pattern because of no failover etc. move out your db from the web server etc…you could use RDS or something too.

– SQL or NoSQL?
Not a binary decision; maybe use both? A blended approach can reduce technical debt. Maybe just start with SQL because it’s familiar and there are clear patterns for scalability. Nosql is great for super low latency apps, metadata data sets, fast lookups and rapid ingesting data.

So for 100 users…
You can get by using route53, ELB, multiple web instances.

For 10000 users…
– Use cloud front to cache any static assets.
– Get your session state out of the webservers. Session state could be stored in dynamo db because it’s just unrelated data.
– Also might be time for elastic cache now which is just hosted redis or memcached.

Auto scaling…
Min, max servers running in multiple az zones. AWS makes this really simple.

If you end up at the 500k users situation you probably really want:
– metrics and alarms
– automated builds and deploys
– centralized logging

must haves for log metrics to collect:
– host level metrics
– aggregate level metrics
– log analysis
– external site performance

Use a product for this, because there are plenty available, and you can focus on what you’re really trying to accomplish.

Create tools to automate so you save your time especially to manage your time. Some of the ones that you can use are: elastic beanstalk, aws opsworks more for developers and cloud formation and raw ec2 for ops. The key is to be able to repeat those deploys quickly. You probably will need to use puppet and chef to manage the actual ec2 instances..

Now you probably need to redesign your app when you’re at the million user mark. Think about using a service oriented architecture. Loose coupling for the win instead of tight coupling. You can probably put a queue between 2 pieces

Key tip: don’t reinvent the wheel.

Example of what to do when you have a user uploading a picture to a site.

Simple workflow service
– workers and deciders: provides orchestration for your code.

When your data tier starts to break down 5-10 mill users
– federation
Split by function or purpose
Gotcha- You will have issues with join queries
– sharding
This  works well for one table with billions of rows.
Gotcha- operationally confusing to manage
– shift to nosql
Sorta similar to federation
Gotcha- crazy architecture change. Use dynamo db.

Final Tips

Leave a comment

Filed under Cloud, Conferences

The AWS ReInvent Conference Recap

Last week I attended AWS ReInvent in Las Vegas. It was the largest conference I’ve been to with 9000 people, and a crazy number of sessions. When I was trying to decide what sessions to go to, I realized I had multiple conflicts at every slot (a good problem to have). It was also one of the funnest conferences I’ve been to, and I’ll be back next year (a bit more prepared next time around).

I’ll post about the sessions I went to, but the following are my favorite highlights from the conference:

  • Day 2 keynote with Werner Vogels: After a more marketing and “C” centric keynote on day 1, the day 2 keynote was tuned more to the large developer crowd in the audience, and I left inspired. Check out all my notes here.
  • Expo Hall: Holy cow! I got tired after walking around just 1/2 this hall. According to the booklet, there were maybe over 170 sponsors, and it took a while to walk through and check out what everyone was doing. The expo hall was also packed the first couple of days, so I went on the 3rd day when things were a lot quieter (pro tip: If you want the best swag, go the first day), but it also gave me a chance to talk to more of the folks in a leisurely manner! My 2 favorite highlights about the expo hall were Datadog (Most enthusiastic even on day 3), and Cloudability (who knew that I was a customer of theirs even though I didn’t realize another team at Mentor used the product; I thought that was pretty awesome!).
  • Crazy number of sessions: I’m glad these are all on YouTube now. I hear the slides are also going to be online in a bit. This will give me a way to catch up on the sessions that I missed out on.
  • AWS Hands on labs: This was pretty cool! You could skip a session or two and do a hands on lab on an AWS technology. I spent some time doing a hands on learning AWS beanstalk, and it was totally worthwhile.
  • Day 1 (Tuesday): I only got in on Tuesday, but next time I’ll need to register in time to be a part of the hackathon or gameday. I talked to a bunch of folks who attended these, and they ended up having a great time at both of these. The Gameday was pretty cool as well and was targeted at DevOps folks. You ended up forming a team with a bunch of other folks and had to build an application infrastructure that was resilient to any kind of breakage. Then, you’d swap your credentials with another team, and they would try to break your infrastructure; you can imagine how this would end up being entertaining!
  • Meeting up with folks, and catching up with people I hadn’t seen in a while.
  • VEGAS! It was good to not lose at the roulette tables this time around 🙂

A lot of developer friends commented that the talks were light on technical side of things, which I thought was true; the way I got more out of these was actually talking to the product managers and the customer at the end of the talk to ask and understand some of the more technical concepts. This is true for most conferences, but was especially true for this one.

Stay tuned for a bunch of post conference session updates!

Leave a comment

Filed under DevOps

Velocity 2013 Day 3: benchmarking the new front end

By Emily Nakashima and Rachel Myers

Talking about their experiences at mod cloth…..

Better performance is more user engagement, page views etc…

Basically, we’re trying to improve performance because it improves user experience.

A quick timeline on standards and js mvc frameworks from 2008 till present.

NewRelic was instrumented to get an overview of performance and performance metrics; the execs asked for a dashboard!! Execs love dashboard 🙂

Step 1: add a cdn; it’s an easy win!
Step 2: The initial idea was to render the easy part of the site first- 90% render.
Step 3: changed this to a single page app

BackboneJS was used to redesign the app to a single page app from the way the app was structured before.

There aren’t great tools for Ajax enabled sites to figure out perf issues. Some of the ones that they used were:
– LogNormal: rebranded as Soasta mpulse
– newrelic
– yslow
– webpagetest
– google analytics (use for front end monitoring, check out user timings in ga)- good 1st step!
– circonus (which is the favorite tool of the presenters)

Asynchronous world yo! Track:
– featurename
– pagename
– unresponsiveness

Velocity buzzwords bingo! “Front end ops”

Leave a comment

Filed under Cloud, Conferences

Velocity 2013 Day 3: DevOps and metrics

We’re talking about the devops survey with Gene Kim, James Turnbull and Jez Humble

4039 survey responses!

Lessons learned
– Don’t change questions midway in a survey
– Get a data analyst for survey analysis

Key findings
– devops teams are more agile; 30x more deployments, 8000x shorter lead times
– devops teams are more reliable;
– most teams use version control- 89%
– most teams use automated code deployments- 82%
– the longer you do devops, the better you get!

Hilarious John Vincent quote on devops:


Measuring culture
Trust and verify

26% of folks who responded to the survey were from the enterprise, and 16% were from 10k and plus.
The biggest barriers to devops was culture because people didn’t get it- whether it was your manager, team or outside the group. Tell people more, and wear more devops shirts!!!

DevOps continues to be a culture issue versus an issue in terms of tools and processes. James Turnbull wants us to go out there and talk to people and figure out people skills!

And join the devops google+ community!


Filed under DevOps

Velocity 2013 Day 2 Liveblog: mobile performance and engagement

Guilin Chen from Facebook is the presenter…

UX is important and mobile users are less tolerant than desktop developers.

What does performance mean for the Facebook study
– page load times
– scroll performance (how smooth)
– prefetch delay (infinite scrolling)

– page load times showed a strong correlation between slowness and user drop off.
– consistent scrolling experience matters more; slower scrolling is better than jittery scrolling.
– prefetch delay studies weren’t as conclusive and thus, didn’t matter as much..

Leave a comment

Filed under Conferences, DevOps

Velocity 2013 Day 2 Liveblog: a baseline for web performance with phantomjs

The talk I’m most excited about for today is next! I made sure to get here early…

@wesleyhales from apigee and @ryanbridges from CNN

Quick overview on load testing tools- firebug, charles, har viewers and whatnot; but its super manual.
Better- selenium, but it’s old yo and not hip anymore.
There are services out there:, harviewer etc that you can use too. is pimped again, but apparently caused an internal argument in CNN.

Performance basics
– caching
– gzip: don’t gzip that’s already compressed (like jpegs)
– know when to pull from cdn’s

Ah! New term bingo! “Front end ops”- aka sucks to code something and then realize there need to be a ton of things to do to make things perform even more. Continued definition:
– keep an eye on perf
– manager of builds and dependencies (grunt etc)
– expert on delivering content from server to browser
– critical of new http requests/file sizes and load times

I’m realizing that building front ends is a lot more like building server side code….

Wes recommends having a front end performance ops position and better analytics.

A chart of CNN’s web page load times is shown.

So basically, every time is built by bamboo, the page load time is analyzed, saved and analyzed. They use phantomjs for this which became Loadreport.js. is the URL for it.

Filmstrip is a cool idea that stores filmstrips of all pages loaded.
Speed reports is another visualization that was written.

Hard parts
– performance issues needs more thought; figure out your baseline early
– advertisers use document.write
– server location
– browser types: DIY options are harder
– CPU activity: use a consistent environment

All in all, this has many of the same concerns when you’re doing server side performance

CI setup
– bamboo
– Jenkins
– barebones Linux without x11
– vagrant

Demo was shown that used Travis ci as the ci system.

All in all, everyone uses phantomjs for testing; check it out; look at fluxui on github for more!

Leave a comment

Filed under Conferences, DevOps

Velocity 2013 Day 2 Liveblog: CSS and gpu cheat sheet

I was headed to the CSS and gpu talk by Colt McAnlis (#perfmatters on twitter)

CSS properties and their paint times aren’t free. Depending on what properties you use, you could end up with slow rendering speeds. Box shadows and border radius strokes are the slowest (1.09ms) per render. That is pretty crazy, and I didn’t realize that it could be that slow.

We’re mostly taking about CSS optimizations that can be used by using the gpu, CPU on chrome.

Kinds of Layering controls
– load time layer promotion: some elements get their own layer by default. (Ex canvas, plugins, video, I frame)
– assign time layer promotion: (translate z, rotatex/y/z)
– animations
– stacking context and relative scrolling

– Too many layers uses additional memory; and you fill up the gpu tile cache.
– chrome prepaints tiles that are visible and not yet visible.

Side note: Colt loves ducks, and is sad about losing his hair 😦

– large images resized take forever. The resized images aren’t cached in the gpu. Think more about this for mobile devices.

– turn on show layer borders in devtools in chrome. It’ll help with translate z issues etc.
– use continuous paint mode to continuously paint the page to see

– gpu and layers helps with faster rendering
– too many layers is a bad idea
– CSS tags impact page loads and rendering

Leave a comment

Filed under Conferences, DevOps

Velocity 2013 Day 1 liveblog: Avoiding web performance regression

Avoiding web performance regression

By Marcel Duran (@marcelduran)

Works on the #web-core team at twitter.
Check out #flight

Problem: After a new release, apps get slower sometimes…

Monitoring is a reactive solution to solve performance issues.

Tools used: http archive (har) for yslow, yslow, cuzillion, fiddler, showslow

Har’s can be generated by- fiddler, phantomjs, yslow,
Install yslow locally (needs nodejs)

Ci and cd at yahoo
Crazy amounts of tests but no performance tests…

Phantomjs is a simple repeatable way to test web page performance times amongst other things.

Make performance tests a part of your ci process….

Next up: instead of just having perf tests in your ci process, graduate to a new level by measuring custom metrics on each performance run…

Peregrine is a tool used in twitter based on webpagetest
Peregrine takes code and deploys to performance boxes and integrates with webpagetest to run perf tests.

Peregrin will likely be open sourced soon..

Leave a comment

Filed under DevOps