Evolution of Bazaarvoice’s Architecture to 500M Unique Users Per Month

Check out this article by @victortrac on High Scalability about how we have scaled our infrastructure at Bazaarvoice to serve a billion product reviews a day!


December 2, 2013

A DevOps Thanksgiving

This last week at the Agile Austin DevOps SIG, our topic was simple – “A DevOps Thanksgiving.” We all shared what we’re thankful for from the DevOps world this year – things that have made our lives better.

It was a nice and refreshing discussion! Group members expressed their thanks for such diverse things as DevOps Weekly, rspec-puppet, The Phoenix Project, Vagrant, Docker, test-kitchen with serverspec and bats, provisioned IOPS in AWS, DevOps Cafe, The Ship Show, increasing cross-platform support in DevOps tools and thinking, DevOps tracks springing up at conferences like Agile 2013 and AppSec, DevOpsDays… Thanks to all the people who put in lots of hard work to make them all possible!

In retrospect, we have a lot to be thankful for. Even though the techno-hipsters don’t even want to say the word “DevOps” any more, it’s a very real change bringing better things to our tools, products, and even lives. I’ve seen a lot of change in the teams I’ve worked with that have implemented it – fewer “all hands overnight releases,” less psychotic on-call, less inter-group hatefulness. DevOps has brought us all a lot of good things, and it’s just starting to take hold out there in the industry.

How about you? What DevOps thing were you thankful for this year? Add yours in the comments here, blog it up yourself, tweet it (I suggest #devopsthanksgiving as the hashtag)… Spread the thanks!


Filed under DevOps

ReInvent – Fireside Chat: Part 1

One of the interesting sessions at ReInvent was a fireside chat with Werner Vogels, where CEOs and CTOs of different companies and startups that use AWS talked about their applications/platforms and what they liked and wanted from AWS. It was a 3-part series with different folks; I was able to attend the first one, but I’m guessing videos of the others are available online. It was an interesting session, giving the audience a window into the way C-level people think about problems and solutions…

First up, the CTO of MongoDB…

Lots of people use Mongo to store things like user profiles for their applications. Mongo performance has gotten a lot better because of SSDs.

MongoDB recently raised $150 million in funding, and they want to build out a lot of tools to administer Mongo better.

Apparently being a MongoDB DBA is a really high-paying job these days!

User roles may be available in Mongo next year to add more security.

Werner and Eliot want to work together to bring out a hosted version of Mongo, like RDS.

Next up: Twilio’s Jeff Lawson

Jeff is ex-Amazon.


Software people want building blocks and not some crazy monolithic thing to solve a problem. Telecom had this issue, and that is why I started Twilio.

Everyone is agile! We don’t have answers up front, but we figure out these answers as we go.

Started with voice, then moved to SMS, followed by a global presence. Most of our customers didn’t want boundaries; they just wanted an API to communicate with their customers.

Werner: It’s hard to run an API business. Tell us more…
Lawson: It is really hard. APIs are kind of like web apps when it comes to scaling, and REST helps a lot from this perspective. Multi-tenancy issues get amplified when you have an API business.

Twilio apparently deploys 20 times a day. AWS really helps with deployment because you can bring up brand-new environments that look exactly like prod and then tear them down when they aren’t needed.

When it comes to APIs, we write the documentation first and show it to our customers before actually implementing the API. Then iterate, iterate, iterate on the development.

Jeff’s ask of AWS: make it easier to get a VPC up and running.

Next up: Valentino of AdRoll (real-time bidding)


There’s a data collection pipe which gets around 20 TB of data every day.

Latency is king: typically latency is between 50ms and 100ms. This is still a lot for us. I wish we had more transparency when it comes to latency, inside AWS and otherwise…

Why DynamoDB? We didn’t find anything simpler at the time, and it was nice to be able to scale something without having to worry about it. We had zero ops people to work on scaling at the time.

Read/write rates: 80k reads per second (eventually consistent), 40k writes per second.

Werner: Why Erlang? You’re a Python god.
Valentino: I started working in Python with the Twisted framework, but I realized Python didn’t fit our use case well; the Twisted system worked just as well, but it would have been complicated to manage and needed a few hacks.

Today it would be hard to pick between Erlang and Go…


Filed under Cloud, Conferences

ReInvent 2013: Day 2 Keynote

I didn’t cover the day 1 keynote, but fortunately it can be found here. The day 2 keynote was a lot more technical and interesting, though. Here are my notes from it:

First, we began by talking about how AWS plans its projects.

Lots of updates every year!

Before any project is started, while teams are still in the brainstorming phase, a few key things are always done before any code is written:

  • Meeting minutes
  • An FAQ
  • Figuring out the UX

“2-Pizza Teams”: small autonomous teams that have roadmap ownership with decoupled launch schedules.

Customer collaboration

Get the functionality into the hands of customers as soon as possible. It may be feature-limited, but it’s in the hands of customers so that they can give feedback as soon as possible. Iterate, iterate, iterate based on feedback. This is different from the old guard, where everything is engineering-driven and unnecessarily complex.

Netflix platform….

Netflix is on stage, and we’re talking about the Netflix Cloud Prize and the enhancements to the different tools… they look pretty cool, and I will need to check them out. There are 14 chaos monkey “tests” to run now instead of just 1 before.


Werner is back and breaks down the different facets that AWS focuses on:

  • Performance – measure everything; put performance data in log files that can be mined.
  • Security
  • Reliability
  • Cost
  • Scalability

Ilya Sukhar, CEO of Parse, is on stage now (Parse is a platform for mobile apps):
– Parse Data: store data; it’s 5 lines of code instead of a bunch of code.
– Push notifications

Parse started with 1 AWS instance and has grown from 0 to 180,000 apps.

They have 180,000 collections in MongoDB; he showed the differences between pre- and post-PIOPS performance.

Security

IAM and IAM roles let you set boundaries on who can access what. How do you do this from a DB perspective? Apparently you can have fine-grained access controls on DynamoDB instead of writing your own code. In Redshift, each data block is encrypted.

Cost

Customers are using spot instances to save money.

Scalability

The WeTransfer use case: they take care of transferring large files.
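The fine-grained DynamoDB access control mentioned above can be expressed as an IAM policy. Here’s a sketch of my own (not from the keynote) using the `dynamodb:LeadingKeys` condition key, which limits each caller to items whose hash key matches their own identity; the table ARN and web-identity variable are illustrative:

```python
import json

def user_scoped_policy(table_arn):
    # Allow basic item operations, but only on items whose partition key
    # equals the caller's federated user ID (dynamodb:LeadingKeys).
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
            "Resource": [table_arn],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"]
                }
            }
        }]
    }

policy_json = json.dumps(
    user_scoped_policy("arn:aws:dynamodb:us-east-1:123456789012:table/UserData"),
    indent=2,
)
```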

Airbnb is on stage with Mike Curtis, VP of Engineering:
– 350k hosts around the world
– 4 million guests (Jan 2013)
– 9 million guests today

They use a host of AWS services: 1,000 EC2 instances, millions of RDS rows, and 50 TB of photos in S3.

All of this is run by a 5-person ops team, which helps Airbnb devote resources to the real problems.


Dropcam came on stage after that to talk about how they use the AWS platform. Nothing too crazy, but interestingly, more inbound video is uploaded to Dropcam than to YouTube!


The keynote ended with an Amazon Kinesis demo (and a deadmau5 announcement for the replay party). From the outside, Kinesis looks like a streaming API with different ways to process data on the backend. A prototype that streamed data from Twitter and performed analytics on it was shown to demonstrate the service.

Announcements

  • RDS for PostgreSQL
  • New I2 instance types for much better IO performance
  • DynamoDB: global secondary indexes!!
  • Federation with SAML 2.0 for IAM
  • Amazon RDS: cross-region read replicas!
  • G2 instances for media- and video-intensive applications
  • New C3 instances with the fastest processors: 2.8 GHz Intel E5 v2
  • Amazon Kinesis: real-time processing, fully managed. It looks like this will help you solve scalability issues when you’re trying to build real-time streaming applications. It integrates with storage and processing services.


In case you want to watch it, the day 2 keynote is here: http://www.youtube.com/watch?v=Waq8Y6s1Cjs

And also, the day 1 keynote: http://www.youtube.com/watch?v=8ISQbdZ7WWc


Filed under Cloud, Conferences

ReInvent 2013- Scaling on AWS for the First 10 Million Users

This was the first talk I went to at ReInvent, by @simon_elisha, and the room was packed. It was targeted at developers taking an app from inception to 10 million users. Following are the notes I took…

– “We will need a bigger box” is the first issue when you start seeing traffic to an application. A single box is an anti-pattern because there’s no failover, so move your DB off the web server; you could use RDS or something similar, too.

– SQL or NoSQL?
Not a binary decision; maybe use both? A blended approach can reduce technical debt. Maybe just start with SQL because it’s familiar and there are clear patterns for scalability. NoSQL is great for super-low-latency apps, metadata data sets, fast lookups, and rapidly ingesting data.

So for 100 users…
You can get by using Route 53, ELB, and multiple web instances.

For 10,000 users…
– Use CloudFront to cache any static assets.
– Get your session state out of the web servers. Session state could be stored in DynamoDB because it’s just non-relational data.
– It also might be time for ElastiCache now, which is just hosted Redis or Memcached.
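Storing session state in DynamoDB amounts to writing one small item per session, keyed by an opaque session ID. A minimal sketch of my own (table and field names are hypothetical), with the actual boto3 write shown as a comment:

```python
import time
import uuid

def new_session_item(user_id, ttl_seconds=3600):
    # One item per session: opaque key, owner, and an expiry timestamp
    # that a TTL policy or a periodic sweep can use to reap old sessions.
    return {
        "session_id": str(uuid.uuid4()),
        "user_id": user_id,
        "expires_at": int(time.time()) + ttl_seconds,
    }

item = new_session_item("user-42")
# The write itself is one call (hypothetical table name):
# boto3.resource("dynamodb").Table("sessions").put_item(Item=item)
```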

Auto scaling…
Set min and max numbers of servers running across multiple availability zones. AWS makes this really simple.
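In boto3 terms (a sketch with hypothetical names, not from the talk), that min/max-across-AZs setup is a single API call:

```python
def asg_request(name, min_size, max_size, zones):
    # The whole "auto scaling" story from the talk in one parameter set:
    # a floor, a ceiling, and a spread across availability zones.
    assert min_size <= max_size
    return {
        "AutoScalingGroupName": name,
        "MinSize": min_size,
        "MaxSize": max_size,
        "AvailabilityZones": zones,
        "LaunchConfigurationName": "web-lc",  # hypothetical launch config
    }

params = asg_request("web-asg", 2, 12, ["us-east-1a", "us-east-1b", "us-east-1c"])
# boto3.client("autoscaling").create_auto_scaling_group(**params)
```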

If you end up at the 500k users situation you probably really want:
– metrics and alarms
– automated builds and deploys
– centralized logging

Must-have log metrics to collect:
– host level metrics
– aggregate level metrics
– log analysis
– external site performance

Use a product for this, because there are plenty available, and you can focus on what you’re really trying to accomplish.

Create tools to automate things so that you save your time. Some of the ones you can use: Elastic Beanstalk and AWS OpsWorks, which are more for developers, and CloudFormation and raw EC2 for ops. The key is to be able to repeat those deploys quickly. You will probably also need Puppet or Chef to manage the actual EC2 instances.

You’ll probably need to redesign your app when you’re at the million-user mark. Think about using a service-oriented architecture: loose coupling for the win instead of tight coupling. You can probably put a queue between two pieces.
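The “queue between two pieces” idea looks like this in practice: the web tier enqueues a small message describing the work instead of calling the worker directly. A sketch (names are mine, not the speaker’s), with the SQS calls shown as comments:

```python
import json
import time

def thumbnail_task(bucket, key):
    # The message carries just enough for a worker to do the job later;
    # producer and consumer never talk to each other directly.
    return json.dumps({
        "task": "thumbnail",
        "bucket": bucket,
        "key": key,
        "enqueued_at": int(time.time()),
    })

body = thumbnail_task("photos-bucket", "uploads/cat.jpg")
# With SQS via boto3 (queue URL hypothetical):
# sqs = boto3.client("sqs")
# sqs.send_message(QueueUrl=url, MessageBody=body)   # web tier
# msg = sqs.receive_message(QueueUrl=url)            # worker tier
```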

Key tip: don’t reinvent the wheel.

Example of what to do when you have a user uploading a picture to a site.

Simple workflow service
– workers and deciders: provides orchestration for your code.

When your data tier starts to break down (5-10 million users):
– Federation: split databases by function or purpose. Gotcha: you will have issues with join queries.
– Sharding: works well for one table with billions of rows. Gotcha: operationally confusing to manage.
– Shift to NoSQL: sorta similar to federation. Gotcha: a crazy architecture change. Use DynamoDB.
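The core of sharding is just a stable routing function from key to shard, which is also where the “operationally confusing” part starts (resharding moves keys). A minimal sketch, with hypothetical shard names:

```python
import hashlib

SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]  # hypothetical

def shard_for(user_id):
    # Hash the key so a given user always lands on the same shard.
    # md5 rather than hash() because Python's hash() is salted per process.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```

Note that growing SHARDS remaps most keys, which is one reason schemes like consistent hashing exist.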



Filed under Cloud, Conferences

The AWS ReInvent Conference Recap

Last week I attended AWS ReInvent in Las Vegas. It was the largest conference I’ve been to, with 9,000 people and a crazy number of sessions. When I was trying to decide which sessions to go to, I realized I had multiple conflicts in every slot (a good problem to have). It was also one of the most fun conferences I’ve been to, and I’ll be back next year (a bit more prepared next time around).

I’ll post about the sessions I went to, but the following are my favorite highlights from the conference:

  • Day 2 keynote with Werner Vogels: After a more marketing and “C” centric keynote on day 1, the day 2 keynote was tuned more to the large developer crowd in the audience, and I left inspired. Check out all my notes here.
  • Expo Hall: Holy cow! I got tired after walking around just half this hall. According to the booklet, there were over 170 sponsors, and it took a while to walk through and check out what everyone was doing. The expo hall was also packed the first couple of days, so I went on the 3rd day when things were a lot quieter (pro tip: if you want the best swag, go the first day), which gave me a chance to talk to more of the folks in a leisurely manner! My two favorite highlights of the expo hall were Datadog (most enthusiastic, even on day 3) and Cloudability (who knew that I was a customer of theirs even though I didn’t realize another team at Mentor used the product; I thought that was pretty awesome!).
  • Crazy number of sessions: I’m glad these are all on YouTube now. I hear the slides are also going to be online in a bit. This will give me a way to catch up on the sessions that I missed out on.
  • AWS Hands-on Labs: This was pretty cool! You could skip a session or two and do a hands-on lab on an AWS technology. I spent some time getting hands-on with AWS Elastic Beanstalk, and it was totally worthwhile.
  • Day 1 (Tuesday): I only got in on Tuesday, but next time I’ll need to register in time to be a part of the hackathon or GameDay. I talked to a bunch of folks who attended these, and they had a great time at both. The GameDay, targeted at DevOps folks, was pretty cool: you formed a team with a bunch of other folks and had to build an application infrastructure that was resilient to any kind of breakage. Then you’d swap your credentials with another team, and they would try to break your infrastructure; you can imagine how this would end up being entertaining!
  • Meeting up with folks, and catching up with people I hadn’t seen in a while.
  • VEGAS! It was good to not lose at the roulette tables this time around 🙂

A lot of developer friends commented that the talks were light on the technical side of things, which I thought was true; the way I got more out of them was by talking to the product managers and customers at the end of each talk to ask about and understand some of the more technical concepts. This is true for most conferences, but was especially true for this one.

Stay tuned for a bunch of post conference session updates!


Filed under DevOps

LASCON Interview: Jason Chan

Jason Chan (@chanjbs) is an Engineering Director of the Cloud Security team at Netflix.

Tell me about your current gig!

I work on the Cloud Security team at Netflix; we’re responsible for the security of the streaming service. We work with some other teams on platform and mobile security.

What are the biggest threats/challenges you face there?

Protecting the personal data of our members, of course. Also, we have content we want to protect – on the client side via DRM, but mainly the pipeline of how we receive the content from our studio partners. Also, due to the size of the infrastructure, its integrity – we don’t want to be a botnet or have things injected into our content that can harm our clients.

How does your team’s approach differ from other security teams out there?

We embody the corporate culture more, perhaps, than other security teams do. Our culture is a big differentiator between us and other companies, so it’s very important that the people we hire match the culture. Some folks are more comfortable with strong processes and policies with black-and-white decisions, but here we can’t just say no; we have to help the business get things done safely.

You build a security team and you have certain expertise on it.  It’s up to the company how you use that expertise. They don’t necessarily know where all the risk is, so we have to provide objective guidance and then mutually come to the right decision of what to do in a given situation.

Tell us about how you foster your focus on creating tools over process mandates?

We start with recruiting, to find people who understand that policy and process aren’t the solution. Adrian [Cockcroft] says process is usually organizational scar tissue. Doing it with tools and automation makes it more objective and less threatening to people. Turning things into metrics makes it less of an argument. There’s a weird dynamic in the culture that’s a form of peer pressure, where everyone’s trying to do the right thing and no one wants to be the one to negatively impact that. As a result, people are willing to say “Yes we will” – like, you can opt out of Chaos Monkey, but people don’t because they don’t want to be “that guy.”

We’re starting to look at availability in a much more refined way. It’s not just “how long were you down.” We’re establishing metrics over real impact – how many streams did we miss? How many start clicks went unfulfilled? We can then assign rough values to each operation (it’s not perfect, but based on shared understanding), establish real impact, and make tradeoffs. (It’s more story-point-ish than hard ROI.) But you can tell what you need to do now versus what can wait.

Your work – how much is reactive versus roadmapped tool development?

It’s probably 50/50 on our team. We have some big work going on now that’s complex and has been roadmapped for a while. We need to have bandwidth as things pop up, though, so we can’t commit everyone 100%. We have a roadmap we’ve committed to build, and we keep some resources free so that we can use our agile board to manage incoming work. I try to build a culture of “let’s solve a problem once” and share knowledge, so when an issue recurs we can handle it faster and better. I feel like we can be pretty responsive with the agile model; our two-week sprints and quarterly planning give us flexibility. We get more cross-training too, through the mid-sprint statuses and sprint meetings. We use our JIRA board to manage our work, and it’s been very successful for us.

What’s it like working at Netflix?

It’s great, I love it. It’s different because you’re given freedom to do the right thing, use your expertise, and be responsible for your decisions. Each individual engineer gets to have a lot of impact at a pretty large company. You get to work on challenging problems with good colleagues.

How do you conduct collaboration within your team and with other teams?

Inside the team, we instituted weekly or biweekly “deep dive” lunch-and-learn presentations of what you’re working on for other team members. Cross-team collaboration is a challenge; we have so many tools internally that no one knows what they all are!

You are blazing trails with your approach – where do you think the rest of the security field is going?

I don’t know if our approach will catch on, but I’ve spent a lot of my last year recruiting, and I see that the professionalization of the industry in general is improving. It’s being taught in school; there’s greater awareness of it. It’s going to be seen as less of a black-magic, “I must be a hacker in my basement first” kind of job.

Development skills are mandatory for security here, and I see a move away from pure operators to people with CS degrees and developers, and an acceleration in innovation. We’ve filed three patents on the things we’ve built. Security isn’t a solved problem, and there’s a lot left to be done!

We’re working right now on a distributed scanning system that’s very AWS-friendly, code-named Monterey. We hope to open source it next year. How do you inventory and assess an environment that’s always changing? It’s a very asynchronous problem. We thought about it for a while, and we’re very happy with the result – it’s really not much code; once you think the problem through properly, your solution can be elegant.


Filed under Cloud, Conferences, Security

LASCON Interview: Nick Galbreath

Nick Galbreath (@ngalbreath) is VP of Engineering with client9, LLC.

What are you doing nowadays since leaving Etsy?

I am managing a small DevOps team from Tokyo, Japan, for a company whose engineering team is based in Moscow. Some of our other executives and our biggest customer are from there. And, I love Japan!

I know you from Velocity and the other DevOps conferences. Why are you here at a security conference?

I’ve been active at Black Hat, DEFCON, etc., as well as DevOps conferences. I’ve found that if your company is in operational chaos, you’re not ready for security. Once you have a good operational component and it’s not in chaos – standardized infrastructure, automation – you get up to the level where you can be effective at security. I used the same approach at Etsy – I started there working on security, stopped, worked on infrastructure until that was basically squared away, and only then started working on security again. You have to work your way up Maslow’s hierarchy.

It’s the same with development. My background is originally development, and when you’re programming in C/C++ your main effort is stability, but all those NPEs and other bugs are also security issues. I don’t know any company doing well at security and not well at development; I’m not sure you can do it. Nail the basics, and then the advanced topics are achievable.

What’s your opinion on how much the security space has left developers behind?

Look at the real core issues behind security. Dev teams have trouble writing secure code, ops folks have problems with patching – and at security conferences you don’t see anything for solving those problems. Working on offense/breaking and on blocking tools is lucrative but inhibits us from going after the root causes.

For many security pros, working in a team instead of solo is a different skill set. “We don’t want to bother the developers with this” – siloed approaches are killing us.

What do you see as the most interesting thing going on in the security landscape right now?

What has happened in the last 3-4 months, as much as I hate to say it, with all the leaking of documents – we’ve been lazy about encryption and privacy and other foundational elements and assumed they worked; now we’re doing some healthy review to build a next generation of those. It brought that discussion to the forefront. The certificate authority problems and the NSA stuff – we need to spend some time and think about this. The next generation of SSL and certificate transparency are very interesting.

In terms of pure language work… improvement of cryptography. Also, we’re making more business-level APIs for common problems, like PHP 5.5’s password hashing API. If you’re building a web app and need auth, you’re starting from zero most of the time, and now you’re starting to see things put into the languages that solve these problems.

Out in the larger DevOpsey world, what are the things to watch, what is your team excited about?

The stuff we’re excited about is traditional DevOps stuff, like really treating our infrastructure as code. No button clicking: infrastructure completely specified in config files in source control, code reviews, and then the file pushed to production to allocate/deallocate hardware and deploy software. That’s a big change.

How do we disseminate best practices/prevent worst practices through those who aren’t the technical “1%?”

Well, best practices are harder…

People went into server programming because they don’t like doing user interface stuff. But the joke’s on us: there is still a user interface – configuration files, installers, etc. – and it’s nontrivial. We should be bundling either audit software or server-side config healthchecks to provide warnings. “Why do you have SSL v2 enabled?” “Why are your .htaccess files visible by default?” [Ed: Where the hell did apache chkconfig go?]

People in ops can write these, but retroactively folks won’t use them… future versions can include them, though. If you at least get warned that your Apache config is using suboptimal security settings, it’s deliberate negligence on your part not to do it right.

Maybe take the module approach (Apache wouldn’t want it in their core I’m sure) – if you want to work on it give me a call!

What message do you want to send to other security folks?

For security people, the message is: “It’s really important you start bringing your non-security friends to these security conferences.” Devs and ops and business and QA. They’ll find it interesting and get involved. It’s really important.

Last year, we had a dozen people from my company come out to AppSec. But except for me and our security team, they’re not back this year. There just wasn’t enough content to hold the interest of the devs. What can we do about that?

Really!  Interesting.  Maybe we need more of a proper dev track, with more things like Karthik’s talk.

A project I’ve wanted to do for a very long time: most people in business and development don’t have a real idea of how much damage can be done – it’s why we have Red Teams. If someone’s really good at SQLi, etc., have them do a talk showing how much damage can be done.

Also – if you work at any company, you depend on an immense set of open source software and they don’t have a security person or anything.  Get involved in their process, try to help them and make it better and it’ll improve quality of everyone’s systems. We could do a hackathon during the convention to improve some existing projects.


Filed under Conferences, DevOps, Security

LASCON 2013 Report – Second Afternoon

I’m afraid I only got to one session in the afternoon, but I have some good interviews coming your way in exchange!

User Authentication For Winners!

I didn’t get to attend, but I know that Karthik’s talk on writing a user auth system was good; here are the slides. When we were at NI, he had to write the login/password/reset system for our product, and we were aghast that there was no project out there to use – you just had to roll your own in an area where there are so many lurking security flaws. He talks about his journey, and you should read it!

AWS CloudHSM And Why It Can Revolutionize Cloud

Oleg Gryb (@oleggryb), security architect at Intuit, and Todd Cignetti, Sr. Product Manager with AWS Security.

Oleg says: There are commonly held concerns about cloud security – key management, legal liability, data sovereignty and access, unknown security policies and processes…

CloudHSM makes objects in partitions inaccessible to the cloud provider. It provides multiple layers of security.

[Ed. What is HSM?  I didn’t know and he didn’t say.  Here’s what Wikipedia says.]

Luckily, Todd gets up and tells us about the HSM, or Hardware Security Module. It’s a purpose-built appliance designed to protect key material and perform secure cryptographic operations. The SafeNet Luna SA HSM has different roles – appliance administrator, security officer. It’s all super-certified, and if tampered with, it blows up the keys.

AWS is providing dedicated access to SafeNet Luna SA HSM appliances. They are physically in AWS datacenters and in your VPC. You control the keys; they manage the hardware but they can’t see your goodies. And you do your crypto operations there. Here’s the AWS page on CloudHSM.

They are already integrated with various software and APIs like Java JCA/JCE.

It’s being used to encrypt digital content, DRM, securing financial transactions (root of trust for PKI), db encryption, digital signatures for real estate transactions, mobile payments.

Back to Oleg. With the HSM, there are some manual steps you need to do: initialize the HSM, configure a server and generate server-side certs, generate a client cert on each client, and scp the public portion to the server to register it.

Normal client cert generation requires an IP, which in the cloud is lame. You can instead use a generic client name and use the same one on all systems.

You put their LunaProvider.jar in your Java CLASSPATH, add the provider to java.security, and you’re good to go.

Making a Luna HA array is very important, of course. If you get two, you can group them up.

Suggested architecture: they have to run in a VPC. “You want to put on Internet? Is crazy idea! Never!”

Crypto doesn’t solve your problem; it just moves it to another place. How do you get the secrets onto your instances? When your instance starts, you don’t want those creds in S3 or the AMI…

So at instance bootstrap, send a request to a server in an internal DC with the IP, instance ID, public and local hostnames, reservation ID, instance type… Validate using the API, including instance start time, validate the role, etc., and then pass the secrets back. Check for dupes. This isn’t perfect, but what are ya gonna do? You can assign a policy to a role and have an instance profile it uses.
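The server-side validation he describes might look roughly like this (my own sketch; in real life the `described` dict would come from an EC2 DescribeInstances call made by the secrets server, and the field names are illustrative):

```python
import time

SEEN_RESERVATIONS = set()  # reservation IDs already served, to catch dupes

def validate_claim(claim, described, max_age_s=300):
    # claim: what the booting instance sent us.
    # described: what the EC2 API says about that instance ID.
    if claim["instance_id"] != described["instance_id"]:
        return False
    if claim["private_ip"] != described["private_ip"]:
        return False
    if time.time() - described["launch_time"] > max_age_s:
        return False  # instance launched too long ago; likely a replay
    if claim["reservation_id"] in SEEN_RESERVATIONS:
        return False  # each reservation gets its secrets exactly once
    SEEN_RESERVATIONS.add(claim["reservation_id"])
    return True

claim = {"instance_id": "i-abc123", "private_ip": "10.0.0.5", "reservation_id": "r-1"}
described = {"instance_id": "i-abc123", "private_ip": "10.0.0.5",
             "launch_time": time.time() - 60}
first = validate_claim(claim, described)
second = validate_claim(claim, described)  # same reservation: rejected
```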

He has written a Python tool to help automate this, you can get it at http://sf.net/p/lunamech.


Filed under Conferences, Security

LASCON 2013 Report – Second Morning

Everyone shuffles in slowly on the second morning of the con. I spent the pre-keynote hour with other attendees, sitting around looking tired and comparing notes on gout symptoms. (PSA: if the ball of your foot starts hurting really badly one day, it’s gout; take a handful of Advil and go to your doctor immediately.)

  • Impact Security
  • NetIQ
  • SWAMP

You can also see a bunch of great pictures from the event, courtesy of Catherine Clark!

Blindspots

The keynote this morning is from Robert “RSnake” Hansen, now of WhiteHat. It’s about blind spots we all have in security. Don’t take this as an attack; be self-reflective.

Blindspot #1 – Network & Host Security

Internetworked computers are a very complex system, and few of us 100% understand every step and part of it.

How many people do network segregation, have their firewall on an admin network, use something more secure than a default Linux install for their web servers, harden their kernel, log off-host, and log beyond standard logs? These are all cheap and useful.

Take STS: it was only considered very narrowly, and the privacy considerations weren’t identified.

Blindspot #2 – Travel and OPSEC

Security used to be more of a game. Now the internet has become militarized. Don’t travel with your laptop. Because – secret reasons I’ll tell you if you ask. (?)

[Ed. Apparently I’m not security 3l33t enough to know what this is about, he really didn’t say.]

Blindspot #3 – Adversaries

You need to be able to see things from “both sides” and know your adversary (personally, ideally). Some of them want to talk! Don’t send them to jail; talk and learn. Yes, you can.

Blindspot #4 – Target Fixation

Vulnerabilities aren’t created equal. Severities vary. DREAD calculations vary widely. Don’t trust a scanner’s DREAD score. Gut-check it, but then do it on paper, because your gut is often not correct. Often we obsess about “really bad!” vulnerabilities that aren’t actually that severe.

Download Fierce to do DNS enumeration, do a Bing IP search, and use nmap/masscan/unicornscan to find open ports.

Blindspot #5 – Compliance vs Security

These aren’t very closely related.  Compliance gets you little badges and placated customers. Security actually protects your systems and data. Some people exercise willful negligence when they choose compliance over security. Compliance also pulls spend to areas that don’t help security. Compliance doesn’t care about what hackers do and it doesn’t evolve quickly.

Blindspot #6 – The Consumer

Consumers don’t really understand the most rudimentary basics of how the Internet works and really don’t understand the security risks of anything they do. They’re not bad or stupid but they can’t be expected to make well informed decisions. So don’t make security opt in.

We the security industry are not pro-consumer – we’re pro-business. Therefore we may be the first ones against the wall when the revolution comes. Give them their privacy now.

So pick one, work on it, we’ll be less blind!

Big Data, Little Security?

By Manoj Tripathi from PROS in Houston.

Big Data is still emerging and doesn’t have the mature security controls that older data platforms have.

Big data is a solution to needs for high volume, high velocity, and/or rich variety of data.  Often distributed, resilient, and not hardware constrained (but sometimes is).

Hadoop is really a framework, with HDFS, ZooKeeper, MapReduce, Pig/Hive, and HBase (or Cassandra?). He’ll talk a lot about this framework because it’s so ubiquitous.

NoSQL – Cassandra (eventually consistent, highly available, partition tolerant), MongoDB (consistent, partition tolerant).

Security is an afterthought in Big Data. It can be hard to identify sensitive data (schemaless). He says there are provenance issues and enhanced insider attacks, but I don’t know… Well, if you consider “Big Data” as just large mineable data, separate from the actual technology, then sure, aggregate data insights are more valuable to steal… His provenance concern is that data is coming from less-secured items like phones and sensors, but that’s a bit of a strawman; the data sources for random smaller RDBMSes aren’t all high-security either…

Due to the distributed architecture of Hadoop and friends, there’s a large attack surface. Plus Hadoop has multiple communication protocols, auth mechanisms, and endpoint types… Most default settings in Hadoop for all of these are “no security,” and you can easily bypass most security mechanisms, spoof, or accidentally delete data: anonymous access, username in the URL, no permission checking, service-level auth disabled, etc.

Hadoop added Kerberos support, which helps a lot. You can encrypt data in transit and use SSL on the admin dashboards.
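For reference, the switch away from Hadoop’s insecure defaults lives in core-site.xml; the two key properties (shown here as an illustrative fragment) default to “simple” auth and no authorization:

```xml
<!-- core-site.xml fragment (illustrative) -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value> <!-- default is "simple", i.e. no real auth -->
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value> <!-- enable service-level authorization checks -->
</property>
```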

But it’s hard to configure, and enterprises might not like “another” auth infrastructure. It also has preconditions, like no root access to some machines and no communication over untrusted networks. And it has a lot of insecure-by-default choices itself (symmetric keys, HTTP SPNEGO has to be turned on in browsers, the Oozie user is a super-user with auth disabled by default). There’s no encryption at rest, and Kerberos RPC is unencrypted. Etc., etc., etc.

On to Cassandra. Same deal. The CLI has no auth by default. Insecure protocols.

NoSQL vulns – injection, just like with SQL. Sensitive data gets copied to various places, and you can add new attributes to column families.
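The injection point is usually a filter document built straight from user input. A toy illustration of my own (using a minimal stand-in for MongoDB’s matching semantics rather than a real driver call): if the password field arrives as parsed JSON, an attacker can send an operator like {"$ne": ""} instead of a string and match every document.

```python
def matches(doc, flt):
    # Minimal stand-in for MongoDB filter semantics: plain values must be
    # equal; a {"$ne": x} condition matches anything not equal to x.
    for key, cond in flt.items():
        if isinstance(cond, dict) and "$ne" in cond:
            if doc.get(key) == cond["$ne"]:
                return False
        elif doc.get(key) != cond:
            return False
    return True

user = {"username": "alice", "password": "s3cret"}
legit = matches(user, {"username": "alice", "password": "wrong"})         # False
injected = matches(user, {"username": "alice", "password": {"$ne": ""}})  # True
```

The fix is the same as with SQL: validate types before building the filter, so a password can only ever be a string.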

Practical Steps To Secure It

Cassandra – write your own authorization/authentication plugin. [Ed. Really?] But this has keyspace and column-family granularity only. 1.2 has internal auth. Enable node-to-node and client-to-node encryption; if you do this, at least it’s not naively vulnerable. Also, use disk-level encryption support.
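The node-to-node and client-to-node encryption he recommends are cassandra.yaml settings; an illustrative fragment (keystore path and password are placeholders):

```yaml
# cassandra.yaml fragment (illustrative)
server_encryption_options:
    internode_encryption: all      # TLS for node-to-node traffic
    keystore: conf/.keystore
    keystore_password: changeme
client_encryption_options:
    enabled: true                  # TLS for client-to-node traffic
```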

Hadoop – basically, wait for Project Rhino: encryption, key management, token-based unified auth, cell-level auth in HBase. Do threat modeling. Eliminate sensitive data, use field-level encryption for sensitive fields, and use OS- or file-level encryption mechanisms. Basically, run it in a secured environment or you’re in trouble. Apache Knox can enforce a single point of access for auth to Hadoop services, but it has scalability/reliability issues. You can turn on the Kerberos stuff if you have to…

Also, commercial Hadoop/Cassandra distributions have more options.


Filed under Conferences, Security