Jason Chan (@chanjbs) is an Engineering Director of the Cloud Security team at Netflix.
Tell me about your current gig!
I work on the Cloud Security team at Netflix. We’re responsible for the security of the streaming service, and we work with some other teams on platform and mobile security.
What are the biggest threats/challenges you face there?
Protecting the personal data of our members, of course. Also, we have content we want to protect – on the client side via DRM, but mainly the pipeline through which we receive the content from our studio partners. Also, due to the size of the infrastructure, its integrity – we don’t want to be a botnet or have things injected into our content that can harm our clients.
How does your team’s approach differ from other security teams out there?
We embody the corporate culture more, perhaps, than other security teams do. Our culture is a big differentiator between us and other companies, so it’s very important that the people we hire match the culture. Some folks are more comfortable with strong processes and policies with black and white decisions, but here we can’t just say no; we have to help the business get things done safely.
You build a security team and you have certain expertise on it. It’s up to the company how you use that expertise. They don’t necessarily know where all the risk is, so we have to provide objective guidance and then mutually come to the right decision of what to do in a given situation.
Tell us about how you foster your focus on creating tools over process mandates?
We start with recruiting: we look for people who understand that policy and process aren’t the solution. Adrian [Cockcroft] says process is usually organizational scar tissue. Doing it with tools and automation makes it more objective and less threatening to people. Turning things into metrics makes it less of an argument. There’s a weird dynamic in the culture that’s a form of peer pressure, where everyone’s trying to do the right thing and no one wants to be the one to negatively impact that. As a result people are willing to say “Yes we will” – like, you can opt out of Chaos Monkey, but people don’t because they don’t want to be “that guy.”
We’re starting to look at availability in a much more refined way. It’s not just “how long were you down.” We’re establishing metrics over real impact – how many streams did we miss? How many start clicks went unfulfilled? We can then assign rough values to each operation (it’s not perfect, but it’s based on shared understanding), establish real impact, and make tradeoffs. (It’s more story point-ish than hard ROI.) But you can work out what you need to do now versus what can wait.
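To make the idea of rough per-operation values concrete, here is a minimal sketch. It is not Netflix’s actual system: the operation names, the weights, and the `Incident` and `impact_score` helpers are all hypothetical, chosen only to show how unfulfilled operations could be turned into a comparable score.

```python
from dataclasses import dataclass

# Hypothetical weights, for illustration only. In practice these would be the
# rough, mutually agreed values described above, not precise ROI figures.
OPERATION_WEIGHTS = {
    "stream_start": 5.0,     # a start click that went unfulfilled
    "playback_minute": 1.0,  # a minute of streaming that was missed
}


@dataclass
class Incident:
    name: str
    failed_operations: dict  # operation name -> count of unfulfilled requests


def impact_score(incident, weights=OPERATION_WEIGHTS):
    """Sum each failed-operation count times its rough weight.

    Raises KeyError for an operation that has no agreed weight.
    """
    return sum(weights[op] * count
               for op, count in incident.failed_operations.items())


def prioritize(incidents):
    """Order incidents from highest to lowest estimated impact."""
    return sorted(incidents, key=impact_score, reverse=True)


if __name__ == "__main__":
    incidents = [
        Incident("short full outage", {"stream_start": 100}),
        Incident("long partial degradation",
                 {"stream_start": 10, "playback_minute": 300}),
    ]
    for incident in prioritize(incidents):
        print(incident.name, impact_score(incident))
```

The scores are only as good as the weights, so a sketch like this supports the tradeoff conversation rather than replacing it.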
Your work – how much is reactive versus roadmapped tool development?
It’s probably 50/50 on our team. We have some big work going on now that’s complex and has been roadmapped for a while. We need to have bandwidth as things pop up though, so we can’t commit everyone 100%. We have a roadmap we’ve committed to that we need to build, and we keep some resources free for reactive work, which we manage on our agile board. I try to build the culture of “let’s solve a problem once,” and share knowledge, so when it recurs we can handle it faster and better. I feel like we can be pretty responsive with the agile model; our two-week sprints and quarterly planning give us flexibility. We get more cross-training too, when we do the mid-sprint statuses and sprint meetings. We use our JIRA board to manage our work and it’s been very successful for us.
What’s it like working at Netflix?
It’s great, I love it. It’s different because you’re given freedom to do the right thing, use your expertise, and be responsible for your decisions. Each individual engineer gets to have a lot of impact on a pretty large company. You get to work on challenging problems and work with good colleagues.
How do you conduct collaboration within your team and with other teams?
Inside the team, we instituted “deep dives” once a week or every other week: lunch-and-learn presentations of what you’re working on for other team members. Cross-team collaboration is a challenge; we have so many tools internally that no one knows what they all are!
You are blazing trails with your approach – where do you think the rest of the security field is going?
I don’t know if our approach will catch on, but I’ve spent a lot of my last year recruiting, and I see that the professionalization of the industry in general is improving. It’s being taught in school, there’s greater awareness of it. It’s going to be seen as less black magic, “I must be a hacker in my basement first” kind of job.
Development skills are mandatory for security here, and I see a move away from pure operators to people with CS degrees and developers, and an acceleration in innovation. We’ve filed three patents on the things we’ve built. Security isn’t a solved problem and there’s a lot left to be done!
We’re working right now on a distributed scanning system that’s very AWS friendly, code-named Monterey. We hope to be open sourcing it next year. How do you inventory and assess an environment that’s always changing? It’s a very asynchronous problem. We thought about it for a while and we’re very happy with the result – it’s really not much code. Once you think the problem through properly, your solution can be elegant.