LASCON Interview: Jason Chan

Jason Chan (@chanjbs) is an Engineering Director of the Cloud Security team at Netflix.

Tell me about your current gig!

I work on the Cloud Security team at Netflix; we’re responsible for the security of the streaming service. We work with some other teams on platform and mobile security.

What are the biggest threats/challenges you face there?

Protecting the personal data of our members, of course. We also have content we want to protect – on the client side via DRM, but mainly the pipeline through which we receive the content from our studio partners. Also, due to the size of the infrastructure, its integrity – we don’t want to be a botnet or have things injected into our content that can harm our clients.

How does your team’s approach differ from other security teams out there?

We embody the corporate culture more, perhaps, than other security teams do. Our culture is a big differentiator between us and other companies, so it’s very important that the people we hire match the culture. Some folks are more comfortable with strong processes and policies with black and white decisions, but here we can’t just say no; we have to help the business get things done safely.

You build a security team and you have certain expertise on it.  It’s up to the company how you use that expertise. They don’t necessarily know where all the risk is, so we have to provide objective guidance and then mutually come to the right decision of what to do in a given situation.

Tell us about how you foster your focus on creating tools over process mandates?

We start with recruiting, to find people who understand that policy and process aren’t the solution.  Adrian [Cockcroft] says process is usually organizational scar tissue. Doing it with tools and automation makes it more objective and less threatening to people. Turning things into metrics makes it less of an argument. There’s a weird dynamic in the culture that’s a form of peer pressure, where everyone’s trying to do the right thing and no one wants to be the one to negatively impact that.  As a result people are willing to say “Yes we will” – like, you can opt out of Chaos Monkey, but people don’t because they don’t want to be “that guy.”

We’re starting to look at availability in a much more refined way.  It’s not just “how long were you down.”  We’re establishing metrics over real impact – how many streams did we miss?  How many start clicks went unfulfilled?  We can then assign rough values to each operation (it’s not perfect, but based on shared understanding) and then we can establish real impact and make tradeoffs. (It’s more story point-ish than hard ROI.) But it lets you work out what you need to do now versus what can wait.

Your work  – how much is reactive versus roadmapped tool development?

It’s probably 50/50 on our team.  We have some big work going on now that’s complex and has been roadmapped for a while.  We need to have bandwidth as things pop up though, so we can’t commit everyone 100%. We have a roadmap we’ve committed to that we need to build, and we keep some resource free so that we can use our agile board to manage it. I try to build the culture of “let’s solve a problem once,” and share knowledge, so when it recurs we can handle it faster/better.  I feel like we can be pretty responsive with the agile model, our two week sprints and quarterly planning give us flexibility. We get more cross-training too, when we do the mid-sprint statuses and sprint meetings. We use our JIRA board to manage our work and it’s been very successful for us.

What’s it like working at Netflix?

It’s great, I love it.  It’s different because you’re given freedom to do the right thing, use your expertise, and be responsible for your decisions. Each individual engineer gets to have a lot of impact on a pretty large company.  You get to work on challenging problems and work with good colleagues.

How do you conduct collaboration within your team and with other teams?

Inside the team, we instituted “deep dives” once a week or every other week: lunch and learn presentations of what you’re working on for the other team members. Cross-team collaboration is a challenge; we have so many tools internally no one knows what they all are!

You are blazing trails with your approach – where do you think the rest of the security field is going?

I don’t know if our approach will catch on, but I’ve spent a lot of my last year recruiting, and I see that the professionalization of the industry in general is improving.  It’s being taught in school, there’s greater awareness of it. It’s going to be seen as less of a black magic, “I must be a hacker in my basement first” kind of job.

Development skills are mandatory for security here, and I see a move away from pure operators to people with CS degrees and developers and an acceleration in innovation. We’ve filed three patents on the things we’ve built. Security isn’t a solved problem and there’s a lot left to be done!

We’re working right now on a distributed scanning system that’s very AWS friendly, code named Monterey. We hope to be open sourcing it next year.  How do you inventory and assess an environment that’s always changing? It’s a very asynchronous problem. We thought about it for a while and we’re very happy with the result – it’s really not much code, once you think the problem through properly your solution can be elegant.


LASCON Interview: Nick Galbreath

Nick Galbreath (@ngalbreath) is VP of Engineering with client9, LLC.

What are you doing nowadays since leaving Etsy?

I am managing a small DevOps team from Tokyo, Japan, for a company whose engineering team is based in Moscow. Some of the other executives and our biggest customer are in Japan. And, I love Japan!

I know you from Velocity and the other DevOps conferences. Why are you here at a security conference?

I’ve been active at Black Hat, DEFCON, etc. as well as DevOps conferences. I’ve found that if your company is in operational chaos you don’t need security.  Once you have a good operational component and it’s not in chaos – standardized infrastructure, automation – you get up to the level where you can be effective at security.  I used the same approach at Etsy – I started there working on security, stopped, worked in infrastructure until that was basically squared away, and only then started working on security again. You have to work your way up Maslow’s hierarchy.

It’s the same with development. My background is originally development and when you’re programming in C/C++ your main effort is stability, but all those NPEs and other bugs are also security issues.  I don’t know any company doing well at security and not well at development; I’m not sure you can do it. Nail the basics and then the advanced topics are achievable.

What’s your opinion on how much the security space has left developers behind?

Look at the real core issues behind security. Dev teams have trouble with writing secure code, ops folks have problems with patching – at security conferences you don’t see anything for solving those problems.  Working on offense/breaking and blocking tools is lucrative but inhibits us from going after the root causes.

For many security pros, working in a team instead of solo is a different skill set. “We don’t want to bother the developers with this” – siloed approaches are killing us.

What do you see as the most interesting thing going on in the security landscape right now?

What has happened in the last 3-4 months, as much as I hate to say it, with all the leaking of documents. We’ve been lazy about encryption and privacy and other foundational elements and assumed they worked; now we’re doing some healthy review to build the next generation of those. It brought that discussion to the forefront. The certificate authority problems, and the NSA stuff – we need to spend some time and think about this.  The next generation of SSL and certificate transparency are very interesting.

In terms of pure language work… Improvement of cryptography. Also, we’re making more business level APIs for common problems, like PHP5’s password hashing APIs.  If you’re building a Web app and need auth you’re starting from zero most of the time, and now you’re starting to see things put into the languages that solve these problems.

Out in the larger DevOpsey world, what are the things to watch, what is your team excited about?

Stuff that we’re excited about is traditional devops stuff like really treating our infrastructure like code.  No button clicking, infrastructure completely specified in config files in source control, code reviews, and then the file pushed to production to allocate/deallocate hardware and deploy software.  That’s a big change.

How do we disseminate best practices/prevent worst practices through those who aren’t the technical “1%?”

Well, best practices are harder

People went into server programming because they don’t like doing user interface stuff. But the joke’s on us, there is still a user interface, it’s configuration files, installers, etc. which are nontrivial. We should either be bundling audit software or server-side config healthchecks to provide warnings. “Why do you have SSL v2 enabled?” “Why are your .htaccess files visible by default?” [Ed: Where the hell did apache chkconfig go?]

People in ops can write these but retroactively folks won’t use them… But the future can have them.  If you at least get warned that your Apache config is using suboptimal security configs it’s your deliberate negligence to not do it right.
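A config healthcheck of the kind Nick describes does not have to be big. Here is a minimal sketch in Python, assuming mod_ssl's `SSLProtocol` directive syntax (`all`, `+SSLv2`, `-SSLv2`); the function name and the warning text are made up for illustration, and the check deliberately ignores includes and per-vhost overrides.

```python
def sslv2_warnings(config_text):
    """Warn about SSLProtocol lines that appear to leave SSLv2 enabled."""
    warnings = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        # Drop trailing comments, then split the directive into tokens.
        tokens = line.split("#", 1)[0].split()
        if not tokens or tokens[0].lower() != "sslprotocol":
            continue
        options = [t.lower() for t in tokens[1:]]
        enabled = "sslv2" in options or "+sslv2" in options
        # Simplification: "all" counts as enabling SSLv2 unless it is removed.
        enabled = enabled or ("all" in options and "-sslv2" not in options)
        if enabled:
            warnings.append(f"line {number}: why do you have SSL v2 enabled?")
    return warnings


sample = "SSLEngine on\nSSLProtocol all\nSSLProtocol all -SSLv2\n"
for warning in sslv2_warnings(sample):
    print(warning)
```

Something like this could run at install time or on service restart, so the admin at least gets the warning.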

Maybe take the module approach (Apache wouldn’t want it in their core I’m sure) – if you want to work on it give me a call!

What message do you want to send to other security folks?

For security people, the message is, “It’s really important you start bringing your non-security friends to these security conferences.” Devs and ops and business and QA. They’ll find it interesting and get involved. It’s really important.

Last year, we had a dozen people from my company come out to AppSec. But except for me and our security team, they’re not back this year. There just wasn’t enough content to hold the interest of the devs. What can we do about that?

Really!  Interesting.  Maybe we need more of a proper dev track, with more things like Karthik’s talk.

A project I’ve wanted to do for a very long time – most people in business and development don’t have a real idea of how much damage can be done; it’s why we have Red Teams. If someone’s really good at SQLi, etc., they could do a talk showing how much damage can be done.

Also – if you work at any company, you depend on an immense set of open source software and they don’t have a security person or anything.  Get involved in their process, try to help them and make it better and it’ll improve quality of everyone’s systems. We could do a hackathon during the convention to improve some existing projects.


LASCON 2013 Report – Second Afternoon

I’m afraid I only got to one session in the afternoon, but I have some good interviews coming your way in exchange!

User Authentication For Winners!

I didn’t get to attend but I know that Karthik’s talk on writing a user auth system was good, here are the slides. When we were at NI he had to write the login/password/reset system for our product and we were aghast that there was no project out there to use, you just had to roll your own in an area where there are so many lurking security flaws.  He talks about his journey and you should read it!

AWS CloudHSM And Why It Can Revolutionize Cloud

Oleg Gryb (@oleggryb), security architect at Intuit, and Todd Cignettei, Sr. Product Manager with AWS Security.

Oleg says: There are commonly held concerns about cloud security – key management, legal liability, data sovereignty and access, unknown security policies and processes…

CloudHSM makes objects in partitions not accessible by the cloud provider. It provides multiple layers of security.

[Ed. What is HSM?  I didn’t know and he didn’t say.  Here’s what Wikipedia says.]

Luckily, Todd gets up and tells us about the HSM, or Hardware Security Module. It’s a purpose built appliance designed to protect key material and perform secure cryptographic operations. The SafeNet Luna SA HSM has different roles – appliance administrator, security officer. It’s all super certified and if tampered with blows up the keys.

AWS is providing dedicated access to SafeNet Luna SA HSM appliances. They are physically in AWS datacenters and in your VPC. You control the keys; they manage the hardware but they can’t see your goodies. And you do your crypto operations there. Here’s the AWS page on CloudHSM.

They are already integrated with various software and APIs like Java JCA/JCE.

It’s being used to encrypt digital content, DRM, securing financial transactions (root of trust for PKI), db encryption, digital signatures for real estate transactions, mobile payments.

Back to Oleg. With the HSM, there are some manual steps you need to do: initialize the HSM, configure a server and generate server-side certs, generate a client cert on each client, and scp the public portion to the server to register it.

Normal client cert generation requires an IP, which in the cloud is lame. You can instead use a generic client name and use the same one on all systems.

You put their LunaProvider.jar in your Java CLASSPATH and add the provider to the java.security file and you’re good to go.

Making a Luna HA array is very important of course. If you get two you can group them up.

Suggested architecture – they have to run in a VPC. “You want to put on Internet? Is crazy idea! Never!”

Crypto doesn’t solve your problem, it just moves it to another place. How do you get the secrets onto your instances? When your instance starts, you don’t want those creds in S3 or the AMI…

So at instance bootstrap, send a request to a server in an internal DC with IP, instance ID, public and local hostnames, reservation ID, instance type… Validate using the API including instance start time, validate role, etc. and then pass it back. Check for dupes.  This isn’t perfect but what are ya gonna do?  You can assign a policy to a role and have an instance profile it uses.
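To make the bootstrap check a bit more concrete, here is a rough Python sketch of the server side. It is not Oleg's tool: `SecretBroker`, `describe_instance`, the field names and the five-minute window are all assumptions for illustration, and the cloud API lookup is stubbed out with a plain dict.

```python
import time

MAX_BOOT_AGE_SECONDS = 300  # assumption: how long after launch a request is accepted


class SecretBroker:
    def __init__(self, describe_instance):
        # describe_instance is a hypothetical callable wrapping the cloud API;
        # it returns a dict of facts about an instance ID, or None.
        self.describe_instance = describe_instance
        self.served = set()

    def handle(self, request, now=None):
        now = time.time() if now is None else now
        facts = self.describe_instance(request["instance_id"])
        if facts is None:
            return "deny: unknown instance"
        # What the instance claims must match what the API reports.
        for field in ("ip", "reservation_id", "instance_type"):
            if request.get(field) != facts[field]:
                return f"deny: {field} mismatch"
        if now - facts["launch_time"] > MAX_BOOT_AGE_SECONDS:
            return "deny: instance is not freshly started"
        if request["instance_id"] in self.served:
            return "deny: duplicate request"
        self.served.add(request["instance_id"])
        return "ok: release secrets"


# Stand-in for the real API: one known instance launched at t=1000.
known = {"i-123": {"ip": "10.0.0.5", "reservation_id": "r-1",
                   "instance_type": "m1.small", "launch_time": 1000}}
broker = SecretBroker(known.get)
request = {"instance_id": "i-123", "ip": "10.0.0.5",
           "reservation_id": "r-1", "instance_type": "m1.small"}
first = broker.handle(request, now=1060)
second = broker.handle(request, now=1070)
print(first, "/", second)
```

The duplicate check is what stops someone replaying a captured request later, within the limits the talk already admitted to.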

He has written a Python tool to help automate this, you can get it at http://sf.net/p/lunamech.


LASCON 2013 Report – Second Morning

Everyone shuffles in slowly on the second morning of the con. I spent the pre-keynote hour with other attendees sitting around looking tired and comparing notes on gout symptoms.  (PSA: if the ball of your foot starts hurting really bad one day, it’s gout, take a handful of Advil and go to your doctor immediately.)

  • Impact Security
  • NetIQ
  • SWAMP

You can also see a bunch of great pictures from the event courtesy Catherine Clark!

Blindspots

The keynote this morning is from Robert “RSnake” Hansen, now of White Hat. It’s about blind spots we all have in security.  Don’t take this as an attack, be self reflective.

Blindspot #1 – Network & Host Security

Internetworked computers are a very complex system and few of us 100% understand every step and part of it.

How many people do network segregation, have their firewall on an admin network, use something more secure than a default Linux install for their webservers, harden their kernel, log off-host and log beyond standard logs? These are all cheap and useful.

Like STS, it was only considered very tightly and the privacy considerations weren’t identified.

Blindspot #2 – Travel and OPSEC

Security used to be more of a game. Now the internet has become militarized. Don’t travel with your laptop. Because – secret reasons I’ll tell you if you ask. (?)

[Ed. Apparently I’m not security 3l33t enough to know what this is about, he really didn’t say.]

Blindspot #3 – Adversaries

You need to be able to see things from “both sides” and know your adversary (personally, ideally). Some of them want to talk! Don’t send them to jail, talk and learn. Yes, you can.

Blindspot #4 – Target Fixation

Vulnerabilities aren’t created equal. Severities vary. DREAD calculations vary widely. Don’t trust a scanner’s DREAD. Gut check but then do it on paper because your gut is often not correct. Often we have “really bad!” vulnerabilities we obsess about that aren’t really that severe.
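Doing it "on paper" can be as simple as the following Python sketch. The talk did not give a formula; this assumes one common DREAD convention, where Damage, Reproducibility, Exploitability, Affected users and Discoverability are each rated 1-10 and averaged. The two example findings are invented.

```python
from dataclasses import dataclass, astuple


@dataclass
class Dread:
    damage: int
    reproducibility: int
    exploitability: int
    affected_users: int
    discoverability: int

    def score(self):
        # Assumed convention: rate each factor 1-10 and take the mean.
        values = astuple(self)
        return sum(values) / len(values)


# Hypothetical findings: one that feels terrifying, one that feels dull.
scary_but_hard = Dread(9, 2, 2, 3, 2)
boring_but_easy = Dread(6, 10, 9, 8, 9)
print(scary_but_hard.score(), boring_but_easy.score())
```

Writing the five numbers down forces the comparison your gut skips: the "really bad!" finding can end up well below the dull one.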

Download Fierce to do DNS enumeration, do a Bing IP search, and use nmap/masscan/unicornscan to find open ports.

Blindspot #5 – Compliance vs Security

These aren’t very closely related.  Compliance gets you little badges and placated customers. Security actually protects your systems and data. Some people exercise willful negligence when they choose compliance over security. Compliance also pulls spend to areas that don’t help security. Compliance doesn’t care about what hackers do and it doesn’t evolve quickly.

Blindspot #6 – The Consumer

Consumers don’t really understand the most rudimentary basics of how the Internet works and really don’t understand the security risks of anything they do. They’re not bad or stupid but they can’t be expected to make well informed decisions. So don’t make security opt in.

We the security industry are not pro-consumer – we’re pro-business. Therefore we may be the first ones against the wall when the revolution comes. Give them their privacy now.

So pick one, work on it, we’ll be less blind!

Big Data, Little Security?

By Manoj Tripathi from PROS in Houston.

Big Data is still emerging and doesn’t have the mature security controls that older data platforms have.

Big data is a solution to needs for high volume, high velocity, and/or rich variety of data.  Often distributed, resilient, and not hardware constrained (but sometimes is).

Hadoop is really a framework, with HDFS, ZooKeeper, MapReduce, Pig/Hive, HBase (or Cassandra?). He’ll talk a lot about this framework because it’s so ubiquitous.

NoSQL – Cassandra (eventually consistent, highly available, partition tolerant), MongoDB (consistent, partition tolerant).

Security is an afterthought in Big Data.  It can be hard to identify sensitive data (schemaless). He says there’s provenance issues and enhanced insider attacks but I don’t know… Well, if you consider “Big Data” as just large mineable data separate from the actual technology, then sure, aggregate data insights are more valuable to steal… His provenance concern is that data is coming from less secured items like phones/sensors but that’s a bit of a strawman, the data sources for random smaller RDBMSes aren’t all high security either…

Due to the distributed architecture of Hadoop etc. there’s a large attack surface. Plus Hadoop has multiple communication protocols, auth mechanisms, endpoint types… Most default settings in Hadoop on all of these are “no security” and you can easily bypass most security mechanisms, spoof, accidentally delete data… Anonymous access, username in URL, no perm checking, service level auth disabled, etc.

Hadoop added Kerberos support, this helps a lot. You can encrypt data in transit, use SSL on the admin dashboards.

But – it’s hard to configure, and enterprises might not like “another” auth infrastructure. It also has preconditions like no root access to some machines and no communication over untrusted networks. And it has a lot of insecure-by-default choices itself (symmetric keys, HTTP SPNEGO has to be turned on in browsers, Oozie user is a super-user with auth disabled by default). No encryption at rest. Kerberos RPC is unencrypted. Etc, etc, etc.

To Cassandra.  Same deal. CLI has no auth by default. Insecure protocols.

NoSQL vulns – injections just like with SQL. Sensitive data is copied to various places, you can add new attributes to column families.

Practical Steps To Secure It

Cassandra – write your own authorization/authentication plugin.  [Ed. Really?] But this has keyspace and column family granularity only. 1.2 has internal auth. Enable node-node and client-node encryption. If you do this at least it’s not naively vulnerable. Also, use disk support for encryption.

Hadoop – basically wait for Project Rhino. Encryption, key mgmt, token based unified auth, cell level auth in HBase. Do threat modeling. Eliminate sensitive data, use field level encryption for sensitive fields, use OS or file level encryption mechanisms. Basically, run it in a secured environment or you’re in trouble.  Apache Knox can enforce a single point of access for auth to Hadoop services but has scalability/reliability issues. Can turn on Kerberos stuff if you have to…

Also, commercial Hadoop/Cassandra have more options.


LASCON 2013 Report – First Afternoon

We move into the afternoon of LASCON. The vendor room was all abuzz, complete with lockpicking village.

Stupid Webappsec Tricks

Zane Lackey, Security Engineer Manager from Etsy (@zanelackey)

XSS

Data driven security – look at your data instead of using your presuppositions about how attacks work.

Overwrite common methods but only phone home on interesting payloads.

8477 XSS attempts with mostly alert(), prompt(), confirm() (or multiples thereof). The payloads are mostly what you’d expect, “XSS,” document.cookie, integers (from scanners). Note you can’t match on “document.cookie” because it’ll already be expanded, so look for your domains, unique cookies, etc.

What else detects XSS well?  Chrome’s XSS Auditor. Works great.  It can defend the user, but it doesn’t fix the XSS.

Server side attempt –

  1. Scan input for HTML escapes/tag creation.
  2. If found, set flag to true and create array of hostile input.
  3. At output time, check flag, see if any hostile input is being output as valid HTML.
  4. If hostile input is being output, alert!
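The four steps above can be sketched roughly like this in Python. This is an illustration only, not Etsy's implementation; the class name, the "suspicious" regex and the way alerts are returned are all assumptions.

```python
import html
import re

# Assumption: anything that could open a tag, attribute or entity is "hostile".
SUSPICIOUS = re.compile(r"[<>\"']|&#")


class XssTripwire:
    """Per-request tracker: remember hostile input, check it at output time."""

    def __init__(self):
        self.flag = False
        self.hostile = []

    def scan_input(self, params):
        # Steps 1-2: set the flag and remember each hostile value.
        for value in params.values():
            if SUSPICIOUS.search(value):
                self.flag = True
                self.hostile.append(value)

    def check_output(self, page):
        # Steps 3-4: report hostile values that show up unescaped in the page.
        # Fail open: the page is never modified, we only return alerts.
        if not self.flag:
            return []
        return [value for value in self.hostile if value in page]


payload = "<script>alert(1)</script>"
tripwire = XssTripwire()
tripwire.scan_input({"q": payload, "page": "2"})
safe_page = "<p>Results for " + html.escape(payload) + "</p>"
unsafe_page = "<p>Results for " + payload + "</p>"
print(tripwire.check_output(safe_page))
print(tripwire.check_output(unsafe_page))
```

Because the check only alerts and never rewrites the page, a false positive costs you a log line rather than a broken app.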

Need to fail open, stripping will break your app… And it should take you 20 minutes to push to production so detect to fix is a short path!

SQL Injection

These are attack chains that can be instrumented. Detection step then exploit step.

Alert on SQL syntax errors showing up in your application today. It’s a bug even if it’s not an exploit.

Watch logs for unique sensitive db table names in requests.  Occasional false positives are OK.

A SQL injection exploit response will be huge, often much larger than normal; detect that. Whitelist stuff that is supposed to give huge responses.

The more alerts you have in an attack chain the more visibility you have, but false positives happen. But if it’s happening in order down the chain, it’s probably not false.
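Here is a small Python sketch of instrumenting that chain, with one check per signal mentioned above. The table names, the size limit and the whitelist are placeholders; the two error strings are the usual MySQL and PostgreSQL syntax error messages, and you would swap in whatever your own database emits.

```python
# Hypothetical values: pick table names unique to your schema and a size
# limit based on what your normal responses actually look like.
SENSITIVE_TABLES = ("user_credentials", "payment_tokens")
BIG_RESPONSE_OK = {"/export/csv"}
MAX_NORMAL_BYTES = 200_000
SQL_ERROR_MARKERS = ("you have an error in your sql syntax",
                     "unterminated quoted string")


def sqli_signals(path, query, response_bytes, error_log_line=""):
    """Return the names of the attack-chain signals this request tripped."""
    signals = []
    if any(marker in error_log_line.lower() for marker in SQL_ERROR_MARKERS):
        signals.append("sql_syntax_error")
    if any(table in query.lower() for table in SENSITIVE_TABLES):
        signals.append("sensitive_table_in_request")
    if response_bytes > MAX_NORMAL_BYTES and path not in BIG_RESPONSE_OK:
        signals.append("oversized_response")
    return signals


probe = sqli_signals("/search", "q='", 900,
                     "You have an error in your SQL syntax near ''")
exploit = sqli_signals("/search", "q=1 UNION SELECT * FROM user_credentials",
                       5_000_000)
print(probe, exploit)
```

A single signal on its own may be noise; the same client tripping the probe signal and then the exploit signals in order is the pattern worth paging someone for.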

“Temporary” debug stuff is permanent. How do you find this automatically? Access logs.

Map access logs to code paths. Endpoints that don’t get requests are anomalous. Alert off it then go take it out.
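A first pass at mapping access logs to code paths could look like the Python sketch below. It assumes common/combined log format and that you can list your routes from the app's router; the route names and log lines are invented.

```python
import re

# Assumption: common/combined log format, where the request line is quoted.
REQUEST_LINE = re.compile(r'"(?:GET|POST|PUT|DELETE|HEAD) (\S+) HTTP/[\d.]+"')


def unrequested_endpoints(known_routes, log_lines):
    """Return routes that exist in the code but never show up in the logs."""
    seen = set()
    for line in log_lines:
        match = REQUEST_LINE.search(line)
        if match:
            # Strip the query string so /search?q=hats counts as /search.
            seen.add(match.group(1).split("?", 1)[0])
    return sorted(set(known_routes) - seen)


routes = ["/cart", "/debug/dump_session", "/search"]
logs = [
    '203.0.113.9 - - [25/Oct/2013:10:00:01 +0000] "GET /search?q=hats HTTP/1.1" 200 5120',
    '203.0.113.9 - - [25/Oct/2013:10:00:07 +0000] "POST /cart HTTP/1.1" 302 0',
]
unused = unrequested_endpoints(routes, logs)
print(unused)
```

Run it over a long enough window that rarely used but legitimate endpoints show up before you start deleting things.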

Attacker Trix

Cheapest way to find webapp vulns – Automation. Your best attackers are doing it manually anyway, but may as well beat out the kiddies. Break off-the-shelf scanners. They give off strong detection signals. User agents, request patterns, requests for stuff that doesn’t exist (*.asp or php on a Java site, for example).

Blocking IPs is easy but dangerous. You’ll break lots of legit things. IPs are not a strong correlation to identity.

  1. Classify a request as being from a scanner
  2. If yes, weight based on confidence
  3. Feed request into rate limiter (see Nick G’s rate limiting at scale talk) and drop if above threshold. They return a 439 “Request Not Handmade” 🙂
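As a rough illustration of those three steps, here is a Python sketch with a weighted sliding-window limiter. The scanner signatures, weights, threshold and the idea of keying on a session rather than an IP are assumptions for the example, not details from the talk.

```python
import time
from collections import defaultdict, deque

# Hypothetical signals and weights; tune these against your own traffic.
SCANNER_AGENTS = ("nikto", "sqlmap", "nessus")
FOREIGN_SUFFIXES = (".asp", ".php")  # e.g. requested from a Java site


def scanner_confidence(user_agent, path):
    """Steps 1-2: classify the request and return a weight from 0.0 to 1.0."""
    score = 0.0
    if any(name in user_agent.lower() for name in SCANNER_AGENTS):
        score += 0.75
    if path.lower().endswith(FOREIGN_SUFFIXES):
        score += 0.25
    return score


class WeightedRateLimiter:
    """Step 3: sliding window over weighted hits per client key."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        self.hits = defaultdict(deque)

    def status_for(self, key, weight, now=None):
        now = time.time() if now is None else now
        hits = self.hits[key]
        # Forget hits that have aged out of the window.
        while hits and now - hits[0][0] > self.window:
            hits.popleft()
        if weight > 0:
            hits.append((now, weight))
        total = sum(w for _, w in hits)
        return 439 if total > self.threshold else 200


limiter = WeightedRateLimiter(threshold=5.0, window_seconds=60)
scanner_weight = scanner_confidence("sqlmap/1.0", "/login.php")
statuses = [limiter.status_for("session-1", scanner_weight, now=second)
            for second in range(8)]
browser = limiter.status_for(
    "session-2", scanner_confidence("Mozilla/5.0", "/listing/123"), now=1)
print(statuses, browser)
```

Requests that look nothing like a scanner carry zero weight, so ordinary browsing never approaches the threshold no matter how fast someone clicks.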

This doesn’t impact browsing but does scripting. Set your thresholds high; allows for false positives but a scanner will definitely peg it.

Be ready for the weirdness that is the Internet! Tried auto-banning accounts that do scanning. They saw 437 scanners over the last week and only 10 were authenticated and 5 were false positives. Browser plugins is our guess. So don’t auto-ban.

Attacks don’t always happen like you’d expect.  Look at the data before you make decisions. Get the instrumentation you need to make those decisions.

“Run a bug bounty program and the Internet shows up!”

And of course you can then insert false data sets to screw with people and increase the cost of attack.

We don’t run scanners of our own because it’s a time sink and requires manual babysitting. We have taken WAF concepts and built them into the apps; since we deploy 30x/day we don’t need the “coverage in the meanwhile” functionality they provide.

Stalking a City for Fun and Frivolity

By Brendan O’Connor, CTO of Malice Afterthought and law student. About CreepyDOL wifi surveillance. He was wearing a kilt and started out by telling us we’d “lost the mandate of heaven.” Why is this? Well…

Everything leaks too much data. Privacy has been disregarded. Fundamental changes are needed to fix this. We need to democratize security – the government is the worst way to do this.

Especially given the case of the US persecuting legitimate security researchers like Weev for doing things like accessing public information on Web sites.

Wireless.  Your devices advertise networks they know for all our convenience. His little doodads find your probe list of wifi locations and gps location. Now we need a distributed way of doing this on a large scale with no centralized control. Academic sensor networks are kinda like this, but expensive. Hence, the F-BOMB hardware gizmo.

Raspberry Pi based, 5W, $57.08. Uses connection to municipal wifi to phone home, with automatic portal-clickthrough. Reticle, leaderless command and control software. Uses Tor to go out.

CreepyDOL is distributed computation for distributed systems. Want to digest on the nodes to minimize net traffic. Centralized querying for centralized questions only.  Filters include Nosiness, Observation, and Mining. Visualization using Unity (the game engine). Oh look, you can see a map mashup of people wandering around and click on them and find their name and other useful info.

Bottom line is that all these technologies leak info about you like it’s going out of style and it’s pretty simple to get Orwellian levels of visibility on you for one low price.

Gauntlt

I missed this in favor of the next talk; I’ve seen about a dozen gauntlt presentations over time since I know James, but here’s the slides! Integrate security into your CI pipeline you freaks!

Penetration Testing: The Other Stuff

David Hughes, OWASP Austin president and Red Team analyst for GM.

This started as being about organizational skills… It’s general tips on making your life as a pen tester easier.

  • Clients aren’t always right about their environment and scope creep can happen.
  • Don’t assume you’ll have Internet, there’ll be proxies…
  • Prep your tools and do updates and test it ahead of time.
  • Rehearse your toolchain
  • Title your terminals
  • Use mind maps (Freemind), outline tools (NoteCase Pro) to organize tools, systems
  • HTTP-Screenshot module does screenshots as nmap scans
  • Use output options or pipe to a file
  • Reporting – keep organized, do it as you go, use ASCIIdoc to take text to pdf
  • Do things the easy way – look for low hanging fruit. Default credentials, bad passwords, cleartext, social engineering, dumpster diving, open wireless. Easy stuff is higher risk and the client cares about it more than esoteric crap.
  • Don’t rush recon, look for clues, broken windows
  • Have a plan (PTS framework) but range off as needed
  • Protect your customer’s data
  • Encrypt your stuff
  • Have backups
  • Learn and use a scripting language
  • Don’t rub it in with the client
  • Get involved with the community!

And that’s everything but the drinking… Time for happy hour and the mechanical bull!

Here’s some pictures of the volunteers hard at work, the speakers’ green room (there were chair massages there in the afternoon!), and organizer Josh Sokol with Robert “RSnake” Hansen!


LASCON 2013 Report – First Morning

Arriving at #LASCON 2013, hosted as usual at the Norris Conference Center, the first thing you see is the vintage video games throughout the lobby! As usual it’s well run and you get your metal badge and other doodads without any folderol; volunteers packed the venue ready to help folks with anything. I got a lovely media badge since I’m on the hook to blog/tweet it up while I’m there! It’s in a nice central location on Anderson Lane so getting there took a lot less time than my normal commute to work did.

The MCs, James Wickett and David Hughes, got us kicked off. Thanks went out to many of the LASCON sponsors!

  • White Hat
  • Qualys
  • Gemalto
  • Trustwave/Spider Labs
  • Critical Start
  • Sourcefire
  • SOS Security

Then everyone stood and raised their right hand to say the “LASCON pledge,” which consists of “I will not hack the Wi-fi,” “I will not social engineer other attendees and the nice Norris Conference Center staff who are hosting us,” and similar.

Then, the keynote!

Keynote- Nick Galbreath, The Origins of Insecurity

Nick Galbreath (@ngalbreath), VP of Engineering at Iponweb. He used to work for Etsy, now he works in Tokyo for a Russia-based ad infrastructure company.  Suck that, Edward Snowden.

Slides at speakerdeck.com/ngalbreath!

If you’re in security, you should be bringing someone else from dev or ops or something here! We can’t get much done by ourselves.

Crypto

There’s a lot of consternation about crypto and SSL and PKI lately. The math is sound!  See FP’s “The NSA’s New Code Breakers” – it’s way easier to get access other ways. I don’t know of any examples of brute forcing SSL keys – it’s attacking data at rest or bypassing it altogether.

But what about the android/bitcoin break and alleged fix re: Java SecureRandom PRNG? I can’t find the fix checked in anywhere.  Let’s look at SHA1PRNG. Where’s the spec? You’re forced to use it, where’s the open implementation, tests…

Basically everything went wrong in specification, implementation, testing, review, postmortem… Then the NIST’s Dual-EC-DRBG spec – slow and with a potential backdoor – but at least it’s not required by FIPS!  It’s broken but not mandatory and we know it’s broken, so fair enough. It’s a “standard turd.” Standards aren’t a replacement for common sense. Known turdy in 2007.  Why are you just removing it now? TLS 1.2 was approved in 2008, so why don’t all browsers support it, and why do no browsers support GCM mode? Old standards need augmentation and updates.

Fixing the CA system – four great ways, certificate pinning, pruning, HTTPS Strict-Transport-Security, certificate-transparency.org.

Everything Else

  • Network Security – stuff you didn’t write
  • App Security – stuff you did write
  • Endpoint Security – stuff you run

IT internal tech is mostly Windows/Mac CM and patching, 99% C-based stuff.

Tech Ops – Routers, Linux, Core server (all C too)

Dev:

  • Input validation – not hard
  • Configuration problems
  • Logical problems – more interesting
  • Language platform problems (most patches here also in C!)

Reactive work is patching, CM, fixing apps, patching infrastructure. You can focus your patching though – Win7 at current patches, Flash, Adobe, Java will get 99% of your problems, focus there – but it’s hard to do. But either you can do it trivially or it’s really hard.

Learn from the hardest apps to deploy.  The Chrome model of self updating gets 97% of people within a version in 4-6 weeks. Android, not so good- driven more by throwing out phones than any ability to upgrade. They’re chipping stuff away from the OS and making more into apps to speed it up. Apple/iOS just figured out app auto-update. Desktop lags though. WordPress is starting background updates. BSD is automatically installing security updates at first boot.

Releasing faster and safely is a competitive advantage AND makes you more secure.

For desktop upgrades, can’t we do something with containers? Why only one version installed? How can we find out about problems from users faster? How do we make patching and deployment easy for the dumbest users?

Even info on “How do I configure Apache securely” is wide and random on the Web. Silently breaks all the time, and it’s simple compared to firewalls, ssh, VPN, DNS… Rat’s nests full of crap, while it gets easier and easier to put servers on the internet. How can we make it safe to configure a server and keep it secure?

Can we do this for application development? Ruby BrakeMan is great, it does static analysis on commit and sends you email about rookie mistakes. Why not for apache config? (Where did chkconfig go?)
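As a rough idea of what a Brakeman-style check for Apache config could look like, here is a tiny linter sketch. The rule set is an assumption chosen for illustration (`ServerTokens`, `ServerSignature` and `TraceEnable` are real Apache directives, but which values you want is your call), and it ignores includes, `<VirtualHost>` scoping and everything else that makes real config hard.

```python
# Hypothetical rule set: directive name (lowercased) -> value we want to see.
RULES = {
    "servertokens": "prod",
    "serversignature": "off",
    "traceenable": "off",
}


def lint_apache_config(text):
    """Return warnings for watched directives that are missing or set differently."""
    seen = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split(None, 1)
        name = parts[0].lower()  # directive names are case-insensitive
        if name in RULES and len(parts) == 2:
            seen[name] = parts[1].strip().lower()  # last occurrence wins in this sketch
    warnings = []
    for name, wanted in RULES.items():
        if name not in seen:
            warnings.append(f"{name}: not set (wanted {wanted})")
        elif seen[name] != wanted:
            warnings.append(f"{name}: is {seen[name]} (wanted {wanted})")
    return warnings
```

Hooked into a commit check, an empty list means "nothing obviously wrong" and anything else gets mailed to whoever touched the file.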

PHP Crypt – great for legacy passwords and horrible for new ones. Approximately 0% chance of a dev getting its configuration right.

See @manicode’s best practices – have a business level API for that.

By default, every language has a non-crypto, insecure PRNG. So people use them. They are used for some science stuff, but seriously if you’re doing physics you’re going to link something else in. Being slightly slower for toy apps that don’t care about security isn’t a big deal. Make the default PRNG secure! And, there’s 100x more people interested in making things fast than making them secure, so make the default language PRNG secure and people will make it faster.
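Python is a handy illustration of the point: the default `random` module is a Mersenne Twister and is predictable, while the standard library `secrets` module draws from the operating system's CSPRNG. The two helper names below are made up for this sketch.

```python
import random
import secrets


def insecure_token(nbytes=16):
    """Hex token from the default PRNG: fine for simulations, guessable by an attacker."""
    return "".join(f"{random.getrandbits(8):02x}" for _ in range(nbytes))


def secure_token(nbytes=16):
    """Hex token from the OS-backed CSPRNG."""
    return secrets.token_hex(nbytes)
```

Seed `random` with the same value twice and `insecure_token` hands back the same "secret" twice, which is exactly why it should not be the thing people reach for by default.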

libinjection.client9.com to try to eliminate SQL injection! It’s C, fast, low false positives, plug in anywhere.

Products focus on blocking and offense/intrusion, but leave these areas (actual fixing) uncovered.  Think globally, act locally. Even if you’re not a dev, most open source doesn’t have a security anything – join in! Write fuzzers, compile with different flags, etc.

So think big, get involved, bring your friends!

Malware Automation

By Christopher Elisan from RSA, aka @tophs.

Total discovered malware is growing geometrically year over year. There are a lot of “DIY malware creation kits” nowadays; SpyEye, Zeus… These are more oriented around online crime; the kits of yesteryear were more about pissing contests about “mine is better than yours” (VCL, PS-MPC). The variation they can create is larger as well.

Armoring tools exist now – PFE CX for example, claims to encrypt, compress, etc. your executable – but all the functions don’t always work and buyers don’t check.  Indetecitbles.net is online and will do it! It was free but now it’s “hidden.”

Use a tool like ExeBundle to bundle up your malware and then share it out via whatever route (file sharing, google play, whatever). Or hacking and overwriting good wares – even those that bother publishing a hash to verify their software often keep it on the same Web site that is already getting hacked to change the executable, so the hash just gets changed too.
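The hash point is worth spelling out: a published hash only helps if it comes from somewhere the attacker did not also control. A minimal sketch of the check, assuming the trusted hex digest was obtained out of band (a signed announcement, a different host, whatever you trust); the function names are illustrative.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Stream the file so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_trusted_hash(path, trusted_hex):
    """Compare the download against a digest fetched from an independent source."""
    return hmac.compare_digest(sha256_of_file(path), trusted_hex.strip().lower())
```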

So you make your malware with a kit, put it through a crypter, a realtime packer, an EXE binder, other armoring tools, then run through QA in terms of on premise and cloud AV, then you’re ready to go.

Targeted vs opportunistic attacks… Delivery is a lot easier when you can target.

Anyway, many of those new malware samples are really just the same core malware run through a different variety of armoring tools. They’re counted as different malware but should get grouped into families; he’s working on that at RSA now.

Besides the variation in malware, domains serving malware can rotate in minutes. Since the malware can be created so quickly it effectively defeats AV by generating too many unique signatures. Reversing has to be done but it takes weeks/months.

Demo: Creating Malware in 2 Minutes!

ZeuS Builder – bang, bot.exe, one every couple seconds. Unique but not hash-unique at this point. They look different on disk and in memory. Then runs Saw Crypter, in seconds it creates multiple samples from one ZeuS sample. Bang, automated generation of billlllllyuns of armored samples.

There’s really just a handful of kits behind all the malware, need new solutions that go after the tools and do signature-less detection.

From Gates to Guardians: Alternate Approaches to Product Security

Jason Chan, Director of Engineering from Netflix, in charge of security for the streaming product. Here are his slides on Slideshare!

Agile, cloud, continuous delivery, DevOps – traditional security doesn’t adapt well to these. We want to move fast and stay safe at Netflix.

The challenges are speed (rapid change) and scale. To address these…

  • Culture – If your culture has moved towards rapid delivery, it’s innovation first. Don’t be “Doctor No” and go against your company culture, you won’t be successful.  Adapt.
  • Visibility – you need to be able to see what’s going on in a big distributed system.
  • Automation – no checklists and spreadsheets

At Netflix we do ~200+ pushes to production a day, 40M subscribers, 1000+ devices supported.

Culture

We have a lot of stuff on our site about this, it’s a big differentiator.  “Freedom and responsibility” is the summary. No buck passing. Responsible disclosure program externally.

We’re moving towards “full stack engineers” that know some about appsec, online operations, monitoring and response, infrastructure/systems/cloud – that can write some kind of code. The security industry seems to be moving towards superspecialists, we don’t see that as successful.

2 week sprint model, JIRA Scrum workflow (CLDSEC project!). No standups, weekly midsprint meeting. Bullpen shared-space model.

Visibility

Use their internal security dashboard (VPC, crypto, other services plug in and display their security metrics). Alerts send emails with descriptive subjects, the alert config, instructions/links as to where to check/what to do. Chat integration.

NSA asks, how do you verify software integrity in production?  How do you know you’re not backdoored?

They have their Mimir dashboard that is a CI/CD dashboard, that tracks source code to build to deploy to JIRA ticket. Traceability!

Canary testing because code reviews don’t catch much.  Deploy a new version and test it (regression, perf, security) and see if it’s OK. Automatic Canary Analyzer gets a confidence level – “99% GO!”
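The talk did not go into how the Automatic Canary Analyzer computes its confidence number, so the following is only a guess at the general shape: compare each canary metric to the baseline and report the percentage that stayed within a tolerance. The metric names, tolerance and threshold are all assumptions.

```python
def canary_score(baseline, canary, tolerance=0.10):
    """Percentage of shared metrics where the canary is at most `tolerance` worse.

    Both arguments map metric name -> value, where lower is better
    (error rate, latency and so on).
    """
    shared = [name for name in baseline if name in canary]
    if not shared:
        raise ValueError("no metrics in common")
    passing = sum(
        1 for name in shared if canary[name] <= baseline[name] * (1 + tolerance)
    )
    return 100.0 * passing / len(shared)


def decide(score, go_threshold=95.0):
    """Turn the score into the go/no-go call shown on the dashboard."""
    return "GO" if score >= go_threshold else "NO-GO"
```

A real analyzer would want statistical tests over time series instead of single values, but even this much turns "looks fine to me" into a number people can argue about less.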

Simian Army does ongoing testing. Go to prod… Then the monkeys test it.

Security Monkey shows config change timestamps of security groups and stuff.

So they have Babou (the ocelot from Archer) that does file integrity monitoring. They use the immutable server pattern so checking is kinda easy, but you still can be running multiple canary versions at the same time so there’s not one “golden master.” This allows multiple baselines.
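A sketch of what "multiple baselines" could mean in practice, not a description of how Babou actually works: hash the files on an instance and accept it if the hashes match any one of the known-good builds. The paths, build names and helper functions are hypothetical.

```python
import hashlib


def snapshot(files):
    """Map path -> SHA-256 hex digest, given a dict of path -> file bytes."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def matching_baseline(observed, baselines):
    """Name of the first baseline whose hashes equal the observed ones, else None."""
    for name, expected in baselines.items():
        if observed == expected:
            return name
    return None
```

With immutable servers the observed snapshot should match exactly one build, so `None` means either drift or tampering and is worth an alert either way.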

Q: How long did it take to make this change and implement? What were the triggers?
A: This push started when he started in 2011; previously IT security handled product security. He hired his first person last year and now they’re up to 10.

Q: What do you do earlier on in the lifecycle in arch and design (threat modeling etc.)?
A: Can’t be automated, the model here is optionally come engage us (with more aggressiveness for stuff that’s clearly sensitive/SOXey).

Q: So this finds problems but how do people know what to do in the first place, share mistakes cross teams?
A: As things happen, added libraries with training and documentation. But think of it as “libraries.”

Q: Competing with Amazon while renting their hardware? (Laaaaaame, the CEO has talked about this in multiple venues.)
A: AWS is the only real choice. Our CEOs talked.

Next – Lunch!  No liveblog of lunch, you foodie voyeurs!

Filed under Conferences, Security

Velocity 2013 Wrapup

Whew, we’re all finally back home from the conferencing. Fun was had by all.

@iteration1, @ernestmueller, @wickett

Over the next week I’ll go back to the liveblog articles and put in links to slides/videos where I can find them (feel free and post ones you know in comments on the appropriate post!). We’ll also try to sum up the best takeaways into a Velocity 2013 and DevOpsDays Silicon Valley 2013 quick guide, for those without the patience to read the extended dance remix.

Filed under Conferences, DevOps

DevOpsDays Silicon Valley 2013 Day 2 Liveblog

Woooo!  Last day of a week of conferencing.  DevOpsDays Day 1 was good and I have even more openspace topics I plan to propose next time.  As usual this is being livestreamed and will be viewable later as well at bmc.com/devops.

Sponsor Watch… Got to talk to our friends at PagerDuty (alert management) and Datadog (monitoring/dashboarding), we use them and love them. And I got to see Stormpath again, they first showed up at last DevOpsDays with a SaaS hosted auth solution (not like PingIdentity and Okta, they actually store the usernames/passwords for you, Les Hazlewood the Apache Shiro guy started it) and they’re growing quickly. Also talked to SaltStack which does salt, a remote command execution framework. 10gen was here with a MongoDB SaaS backup solution (nice!) and monitoring solution.

Leading the Horses to Drink

By Damon Edwards (@damonedwards) from DTO and now #SimplifyOps.

How to spread DevOps in enterprises.  There’s silos you know.  The term DevOps may work against you – it’s evangelical and being overused/washed already.

There is no ‘why’ other than the why of the business. Read your Deming/Collins/Four Steps to the Epiphany/etc.

Go ask people… Something.

Develop a common DevOps vision. Not a process because they’ll get blinders on. [Ed: I believe this is a false dichotomy – you should teach both. Vision without process lacks focus and process without vision lacks direction.  It’s like accuracy and precision.]

  1. See the system
  2. Focus on flow
  3. Recognize feedback loops

Do a value stream mapping – read Learning to See.  OK, this is the meat of the preso – very hard to read though.

Take your information flow and turn it into an artifact flow

Do a timeline analysis, find waste

Metrics.  Establish the metric chain of what matters to the business, driven down to a capability which influences what matters to the business, and driven down to an activity over which an individual can cause/influence outcomes.

Doesn’t require saying “devops.”

  1. Teach concepts
  2. Analysis
  3. Metrics chains
  4. Do something
  5. Iterate

Only takes like 3 days to bootcamp it. Then put in continuous improvement loops.

You can only break silos by brute-force being the boss, but misalignment will reassert itself. Have to change the alignment.

Q&A: Do it with everyone in the same room with whiteboards/postits, it works better than getting fancy

Beyond the Pretty Charts

Toufic Boubez from Metafor.  Cofounded Layer 7 and escaped when CA acquired them.

Came from a popular DOD Austin Openspace – see the blog post!

  1. We’ve moved beyond static thresholds – or, at least, everyone thinks they suck. Need more dynamic analytics.
  2. Context is important – planned and known (or should be known) events cause deviation. Correlate events with metric gathering.
  3. Don’t just look at timelines. Check the thinking around Etsy’s Kale and Skyline; many eval methods assume normal metric distribution and that’s uncommon. Look at a histogram of any given data – latency, for example, is usually gamma not gaussian.
  4. Is all data important to collect? There’s argument over that.  Get it all and analyze vs figure out what’s important to not waste time.
  5. We all want to automate. Need detection before it’s critical. Can’t always have a human in the loop. Whipping out the control theory – open loop control systems, closed loop – to get self healing systems we need current state/desired state diffing from our monitoring systems and taking action. [Ed. We experimented with this back at NI, we had Sitescope going to a homegrown system called “monolith” that would take actions. Hard to account for all factors though and eventually was discontinued.] Also supervised vs unsupervised loops [Ed: – we might have kept monolith around if it SMSed us and said “memory is high on this server I believe I should restart the java process, is that OK” and we could PagerDuty-like say yea or nay.]
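To see why the normal-distribution assumption in point 3 matters, here is a small simulation. It draws synthetic "latencies" from a gamma distribution (shape 2, scale 50 ms, both picked arbitrarily for this sketch) and counts how often they exceed a mean + 3 sigma threshold, next to a Gaussian sample for comparison.

```python
import random
import statistics


def tail_fraction(samples, sigmas=3.0):
    """Fraction of samples above mean + sigmas * standard deviation."""
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    threshold = mean + sigmas * stdev
    return sum(1 for s in samples if s > threshold) / len(samples)


rng = random.Random(42)  # fixed seed so the run is repeatable
# Skewed, latency-like data: gamma with shape 2 and scale 50 (assumed values).
latencies = [rng.gammavariate(2.0, 50.0) for _ in range(50_000)]
# Symmetric data with a similar mean and spread, for comparison.
normal = [rng.gauss(100.0, 70.7) for _ in range(50_000)]

gamma_tail = tail_fraction(latencies)
normal_tail = tail_fraction(normal)
```

For a Gaussian, about 0.1% of points sit beyond three sigmas; for this gamma it is on the order of 1%, so a threshold tuned on the Gaussian assumption would page you roughly ten times as often as expected.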

How much data do you need?  No more resolution than twice your highest frequency (Nyquist-Shannon). Most algorithms will smooth/average/etc.

Q&A: Are control systems more appropriate for small not large systems?  No – just like in industry, as long as you design for that then it’s not just for toys.
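The current state/desired state diffing from the talk, plus the supervised variant where a human gets to say yea or nay, can be sketched in a few lines. Everything here (function names, the shape of the state dicts, the `approve` callback) is invented for illustration.

```python
def plan_actions(desired, current):
    """Diff desired vs current state into (key, current_value, desired_value) tuples."""
    return [
        (key, current.get(key), want)
        for key, want in sorted(desired.items())
        if current.get(key) != want
    ]


def reconcile(desired, current, apply, approve=None):
    """One pass of a closed loop.

    With `approve` set, every action needs a yes from it first (supervised);
    with `approve=None` the loop acts on its own (unsupervised).
    """
    applied = []
    for key, have, want in plan_actions(desired, current):
        if approve is not None and not approve(key, have, want):
            continue  # the human said no, leave it alone
        apply(key, want)
        applied.append(key)
    return applied
```

In the supervised case `approve` is where the "memory is high, should I restart the java process?" message would go out and the answer come back.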

And now I step in for the vendor pitch for Riverbed.  Agile Admin Peco left yesterday and the other Riverbed booth guys made themselves scarce, so I did their shout-out for them. They have Zeus EC2 LBs, Aptimize web front end optimizer, and Opnet Appinternals Xpert APM tool!  Very cool.

Identifying Waste in your Build Pipeline

Scott Turnquest from Thoughtworks

Tools: Value stream mappings, fishbone analysis, “5 Whys”

So how do we do that value stream mapping? Here we go!  [Ed: Oh, this is nice, I was sad that in the DTO presentation they mentioned them and threw some up but didn’t really dive down into one.]

A day of analysis of one small feature –  a day of wait, 4 days of dev, 2 mins of wait, 1 hour of acceptance tests, 4 hours of deploy, 1 day in staging, 4 hours to deploy to prod. Note the waste areas – “4 days in dev?  Really?” and the long ass deploy windows [Ed: Our value stream looks depressingly like this.] Process cycle efficiency of 75% (value creation time/total time)
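The process cycle efficiency arithmetic is easy to redo yourself. This sketch assumes a "day" means one 8-hour working day and guesses which steps count as value-adding (analysis, dev, acceptance tests and the two deploys) and which as waiting; those labels are this sketch's assumption, not something stated in the talk. Under those assumptions it comes out at about 75%, which lines up with the quoted figure.

```python
HOURS_PER_DAY = 8  # assumption: a "day" in the notes is one 8-hour working day

# (step, hours, adds_value); the value/wait labels are guesses for illustration
STEPS = [
    ("analysis", 1 * HOURS_PER_DAY, True),
    ("wait", 1 * HOURS_PER_DAY, False),
    ("dev", 4 * HOURS_PER_DAY, True),
    ("wait", 2 / 60, False),
    ("acceptance tests", 1, True),
    ("deploy", 4, True),
    ("staging", 1 * HOURS_PER_DAY, False),
    ("deploy to prod", 4, True),
]


def process_cycle_efficiency(steps):
    """Value creation time divided by total elapsed time."""
    total = sum(hours for _, hours, _ in steps)
    value = sum(hours for _, hours, adds_value in steps if adds_value)
    return value / total


pce_percent = round(process_cycle_efficiency(STEPS) * 100)
```

Change a step's hours or flip its label and the number moves, which is a quick way to see which fix buys the most.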

So to determine the source of those waste areas, use the fishbone diagram. Had long feedback cycles from structure of code and build/deploy pipelines. Couldn’t test w/o AWS and can’t test individual components, provisioning was serial and repos were flaky.

Fix underlying cause (most impact first) – deploy pipelines. Reduce failure rate of deployments. Half were failing, and failing slow. Moved to AMI baking for reliability. [Ed: They said I was crazy a couple years ago when I said this, “no it’s a foil ball…” Bake when you can!] So this got them from 4 hours to 2 hours, and then parallelized and got down to 25 minutes. This cut down the staging and prod deploys but also the dev time. Process cycle efficiency up to 83%.

5 Whys root cause analysis method. Figured out manual, hard-to-automate deployments were at the root, automated them – don’t be afraid to restructure/redesign when complexity gets in the way.

Analysis techniques are not just for analysts!

Read Jez Humble’s “Continuous Delivery”, Poppendieck’s “Lean Soft Dev”/“Implementing Lean Soft Dev”, Derby/Larsen “Agile Retrospectives”

Clusters, developers, and the complexity in Infrastructure Automation

Antoni Batchelli of PalletOps. Complexity, essential and accidental. Building a system is simple but the systems are complex at runtime, and “complexity of a system is the degree of difficulty in predicting the properties of the system given the properties of the system’s parts.”

In DevOps we see infrastructure-aware software and concepts moving up into dev processes.

Devs want to run “their own” cluster with all the setups they need – productionlike, but with specific versions/timings/data/code/etc. Don’t care about infra details but want consistent envs/code.

Software has to be infrastructure aware now to autoscale, self-heal, etc. The app is the best informed actor to make/orchestrate infra decisions.

[Ed: This late into a conference week, I get a little irritated about presentations that are not really clear *why* they are telling you what they’re telling you.]

He hates incidental complexity. Me too.

OK, maybe we’re getting to a thesis. Let people solve problems where they are less complex: at the right level of abstraction. Build layers of abstraction – infrastructure, OS, services, actions. Make them into modules, make them functional and polymorphic.

Ignites!

James Wickett (@wickett) on Rugged DevOps and gauntlt for security + DevOps. gauntlt is a gem for continuous security testing as part of your build cycle. BDD your app’s security! Knock Out!!! go to gauntlt.org to get started.

Karthik Gaekwad (@iteration1) on DevOps Culture in the CIA. Devops is culture/automation/measurement/sharing. Seen Zero Dark Thirty? Well, the true story behind that details the CIA’s transformation from a split between analysts and operatives especially using Sisterhood, a group of female analysts tracking Bin Laden since 1980. Post 9/11 there was a mass reorg to become more tactical – analysts became Targeters and worked with Operatives hand in hand. Same kind of silo busting. The Phoenix Project is Zero Dark Thirty for DevOps!

Dave Mangot (@davemengot) for DevOps Do’s and Don’ts from Salesforce.  Do give everyone the tools they need to do their jobs. Don’t make ops the constraint. Do lots of communicating. Don’t forget to include everyone. Do get ops involved early. Don’t create a front door (loaded) process. Do have integration environments. Don’t forget config management. Do have blameless post-mortems. Don’t use the Phoenix Project as a bludgeon. Do use Agile as a cultural tool. Don’t rely on tools to change culture. Do get executive sponsorship. Don’t do shadow IT. Do use Damon Edwards’ levers. Don’t just lecture, it’s a participation sport. Do structure the org around delivery. Don’t make separate DevOps teams or jackets. Do get the whole company involved, DevOps is for everyone.

Jonathan Thorpe – Preventing DevOps success. Not planning for scale. Not having unit tests. Not designing automated tests to scale. Not managing your capacity. Not using your resources effectively. Not using same deployment process for all environments. Not knowing what/where/when/who (activity tracking). Getting covered in ants.

DevOps is the future – John Esser from ancestry.com. What keeps CIOs up at night? Besides ants? IT. Need time to value. Transform mindset/processes/tools/etc. Strangler pattern.

DevOps productivity survey by Oliver White from ZeroTurnaround. DevOps oriented teams spend more time on infrastructure improvements and less on firefighting and support. Problem recoveries are shorter. Release software faster. Use more custom tools. Make love for longer time. @rebel_labs

Nathan Harvey on leveling up your skills. Quit!  Go to a conference. Try new things. Do a project somewhere. Always be interviewing.

Filed under Conferences, DevOps

DevOpsDays Silicon Valley Day 1 Presentations

All right, the corporatey part of the week (Velocity) is over, and the tech Illuminati have stayed for DevOpsDays Silicon Valley (used to be Mountain View) – with like 500 people!

The hashtag is #devopsdays and all the presentations were live streamed at the usual place for DevOpsDays live streaming, www.bmc.com/devops.  The videos are now all up on Vimeo.

To open, a funny Point Break DevOps parody!  Ah, makes me want to watch that movie again.

DevOps + Agile = Business Transformation

The first talk is from Jesse Robbins (@jesserobbins)!  Ex-Velocity co-host and co-founder of Opscode, he has now found a home in the TECH UNDERGROUND which is DevOpsDays. He started Velocity because he couldn’t share all the secret stuff they were doing at Amazon but knew it was so important and crucial to the Web and thus the world… Sometimes frighteningly so.

DevOps, he says, is the ability to consistently create and deploy reliable software to an unreliable platform that scales horizontally. The right tools and culture are critical to doing this successfully.

The Internet is becoming pervasive.  Applications became customer service vehicles. Walmart and Amazon both understand this. Email killed the post office. These rips in the social fabric reveal something better. These changes are coming faster and faster and the technology that does this is ours – we build and run it.

Misaligned incentives cause conflict. “Operant conditioning.” People know what the good guys are doing, but they just can’t change themselves to do it – elephants can’t fly just by flapping their ears harder.

You can do your keep-it-small DevOps effort, but eventually you have to say “if we don’t do this everywhere we will fail” – that’s not a business or technology problem, it’s a culture problem.  He’s given this speech inside a bunch of organizations and knows how much resistance there is to change because they all wriggle around like itchy bear cubs when he says it.

Circuit City’s downfall and Blockbuster’s downfall due to Netflix are examples of cultures making agility impossible.  You can’t “agile out” of that. You can provide tools and culture but the overall foundation has to spread. True story, Blockbuster decided the brilliant way to get out of its death spiral was to buy Circuit City, which was also in a death spiral.  And “make it up in volume” I guess.  Is “being a meathead” a culture problem? I reckon.

Conway’s Law – you make things that are copies of your org structure.

Fundamental attributes of successful cultures:

  1. shared mission and incentives
  2. infrastructure as code
  3. application as services
  4. dev+ops+all as teams

Successful practices:

Full stack automation, commodity hw or cloud, reliability in the software, infrastructure APIs, code infra services – infra as product, app as customer.

Service orientation. versioned APIs, resiliency (design for failure), storage abstraction, push complexity up the stack, deep instrumentation

Agile, trust basis, shared metrics and monitoring, incident management, service owners on call, tight integration (maybe you end up with dedicated network or sec oncall, like SREs,  but at the core still collaborative), continuous integration, SRE/SRO to spread concepts, game days.

It takes time – amazon.com didn’t switch to EC2 till Nov 10, 2010.

Changing culture:

  • Start small, build trust & safety
  • Create champions
  • Use metrics to build confidence
  • Celebrate successes
  • Exploit compelling events – cause moments of openness

Continuous Quality: What DevOps Means for QA

By Jeff Sussna, @jeffsussna.

Old definition of quality – “does the software meet the spec?” But agile is about delivering value and cloud is about turning software into services. New definition of quality is “does the service help customers accomplish their jobs-to-be-done.”

A restaurant isn’t just about food delivery, there’s a lot of value creation in the whole chain. A service needs functionality, operability, deliverability, coherency (does it engage me throughout my journey).

New approaches include user-centered design, test-driven development, continuous delivery, MTTR over MTBF; build in testing and learn from failure.

So QA changes. Boundaries blur and automation takes over the manual activities. New and more valuable role: represent the “service not software” perspective, watchdog those 4 attributes.

QA engineers need to lift their gaze above the mechanics of testing, treat tests as code, focus on building quality into the system (quality advocate). New skills include understanding and thinking about service (e.g. outage comms), ops (sec, monitoring), process/automation

[Ed: If we add all this devopsy stuff into the definition of done, QA should look at all of it.]

Good testers see systems and their parts (and gaps), ask probing questions, design good tests, engage that proficiency in design and test plan critiques.

So we need this new kind of testing as well as the old kinds, so with continuous delivery how do we catch up?  And people give a lot of “buts” about automated functional testing. But there are frameworks and DSLs that allow you to make changeable, encapsulated testing.  And it’s a process problem – no one asks what it’ll take to test the system. Write code and tests together, commit them together.

Operability and deliverability need testing. Design for internal users too…

You still want QA (instead of obsoleting them) as attached to the customer and as an antidote to confirmation bias.

Continuous Quality – everyone is testing all the time, quality infused, QA is a mirror for the organization. There is still specialization.

Shout out to @guidostompff of designinteams.com.

Is your team instrument rated?

By J. Paul Reed (@SoberBuildEng) – also see the podcast theshipshow.com!

Culture. Is it hugs and beer? No, it’s incentives + human factors.

Why aviation as a DevOps analogy? It progressed from craft to trade to science to industry. DevOps is in the “late trade” phase of that development.

Incident response is good but the house is already on fire by then. In aviation there’s a lot of scale and you want to avoid the incidents in the first place.

Learning to fly – first, you learn visual flight rules. Use your eyeballs. Then you move to instrument flight rules – flying in the system.

Flying by instrument relies on standardization, communication (precision), expectations (responsibilities in a situation), remediation. It is not static, blindly relying on automation or process, or fun-verboten.

How to get there?  Define your current process even if it’s weird, focusing on operational requirements, derive primitives, define operational dictionary, and make sure the nonfunctional requirements are owned.

Formalize roles, responsibilities. There should be clear transfer of control on who “has the ball.” Drill/train and delegate. Priority classes. Fly|navigate|communicate.

Understand your org’s limitations.

Holding patterns/WIP are bad because it adds chaos to the system.

Investigate outcomes. Should you have an external team investigate? “No blame” postmortems aren’t about not being a jerk and making people sad, but  because it’s very unlikely a failure is “one guy’s fault” and it’s a red herring to think so. [I made a lovely “Root cause is a myth” custom t-shirt at the con! -Ed.]

You should have a day-to-day operational model that accounts for incentives and the human factors that make people able to deliver on them.

Leveling Up a New Engineer in a Devops Culture; Healthy Sustainability

By Gary Foster and Mercedes Coyle from Scripps

You want to hire a new engineer, teach them “our way,” inculcate a devops mindset from the beginning, add good practices and training to the local labor pool, and pay it forward.

Identify needs and outcomes desired and get a mentor.  THEN go hire! They go to hackathons and stuff to hire. Incubators, boot camps (e.g. Hackbright).

And now the new engineer! She was looking for a place where she could get up to speed quickly, support and challenge but no hand holding, senior engineers to help a new engineer grow. She had basic skills in coding/testing/deploying and willingness to learn.

What to do on the job as a new person? Question what you don’t understand, avoid perfectionism, and speak up.

The mentor’s responsibility is patience, giving them responsibility like seasoned engineers, ask them for ideas, teach problem solving not syntax and don’t give the answer.

So train ’em, listen to ’em, form a cult around ’em. Take responsibility for bad habits.

Ignite Time!!!!

Adrian Cockcroft of Netflix (@adrianco) on beer pineapples and bottlenecks. “Cockcroft headroom plot” helps you see when there’s serialization due to a bottleneck.

Peco!!! @bproverb on how we have an incident driven culture and effectively reward failure.  “Actionable alerts” are reactive and often we have sparse bad data. Analyze and track close calls, reward for prevention. Spend time with your data. No need to theorize if you have data, you can track close calls and pursue root cause. Find analyst ninjas. Close call focused analysis.

David Hatten from UrbanCode/IBM. The positive powers of negative thinking. And nihilism. And criticism.  Somebody needs a nap. Read Be Nice To Programmers.

Chantell Smith from ITSM Academy (subbing in for Jayne Groll) – what is devops culture? a multicultural society of frameworks and tools and standards and whatnot. There’s evangelists and detractors. Need communication to get over the cultural divide. Use the “git r done” scrum, not just for devs. Pair with a kanban board. ITSM is still a good thing and not lame! Let’s get a common dictionary/vocabulary. [Ed: So Gene and co., stop slacking on the devops cookbook!]

Systems theory for enjoyment of AWS – read John Gall’s Systemantics/The Systems Bible. Systems in general work poorly or not at all. Complex systems are always broken somewhere. Some simple services don’t even really work… Start simple and working and grow to complex and working. @whirlycott from Stackdriver (Philip Jacob)

Openspaces

I attended three openspaces on Day 1 afternoon.

Women in DevOps

The first was on getting more women into DevOps/related tech jobs. There were a lot of people and so we didn’t get too deep into any specific area of that. It was noted that benefits and especially maternity leave were super important and a good thing to stress in your job postings. Also that women are likely to not apply to “you must know all these 20 things” job descriptions. Though I’ve known guy engineers that fall into the same trap.  “I only meet 19 of those, I’d best not apply.” Hint as a hiring manager – if you meet like half of the things, you’re well advised to put a resume in!

We churned a bit over the fact that largely, people are hiring by exercising their known-people networks, and since historically more engineers are male, that tends to be self-reinforcing. You can go deliberately look into female-tech boot camps and the like.

In the end, I think the core problem is that we’re working at a fast pace. We put out a job posting (or a call for DevOpsDays presenters, as was brought up as an example) and we look through the responses we get.  If there’s no responses from women, then we can’t include them. But to break through that, we have to take the time to deliberately reach out (and figure out where to reach out to).

There was some talk about “wiping identifying information off” but I think that’s a blind alley.  I’m personally more likely to interview/etc. a woman or minority for an engineering position to try to level the playing field, if all the resumes are “Candidate 26” then so much for that.  I mean, maybe it’s true there’s a lot of old school tech companies out there who are like “wimmen on the front lines with us? Never!” but I have to say I’ve never seen that.

My main takeaway was that we should probably get the female engineers we have on staff, ask them to super-plumb their social networks, and get their views on what aspects of job descriptions/interviews/work environments are or are not attractive to them and double down on it.

Where the Hell are the Product Managers?

The second was one I proposed, entitled “Where the Hell are the Product Managers?” DevOps is nominally about bringing Ops into the agile team that is already a mashup of Product, Development, and QA. But unfortunately, despite 10 years post-agile manifesto, I find that healthy PM embedding into the agile team is honored more in the breach than in the observance.  Furthermore, in terms of owning “nonfunctional” requirements, or God forbid, an entire platform-type product, they tend to not want to do that.

We had a good discussion; some people had good PM engagement and others didn’t. Few had success with PMs doing effective prioritization of nonfunctional requirements and most “platform teams” didn’t have a PM, though some did and reported that it was super awesome. In fact, Bryan Dove from here at Bazaarvoice talked about one team he worked where the designer and marketing person came to colocate with the team as well and it was very effective.

The main takeaway was to continue to try to push the agile practice of crossfunctional, embedded and ideally colocated teams, because the results are so much better. And if one needs to hire more X (PMs, Ops, whatever) so that there can be one per product team, do it.

Running a DevOpsDays

The third was about running a DevOpsDays event.  Since I helped run DevOpsDays Austin I went to that to share the love.  If you’re looking to run one, we’ve made our budget and planning docs and everything available for others to crib from. My short playbook is:

  1. Get around 8 people as organizers, from a mix of companies.  2 will punk out and the other 6 will be able to share the load.
  2. Find a venue; that’s the most important thing.  It’ll determine your capacity and whether you need to charge. We did a free DevOpsDays and had a large (30%+) no-show rate, and then did a $120 DevOpsDays and had a small (<10%) no-show rate.
  3. Don’t get fancy.  Nail the hard requirements and then if you get excess sponsor money, add on other goodies.  For DOD Austin we added a band and a movie and more swag and more snacks later as our bank account swelled, but we could have cut off after venue/internet/some food and been done with it.
  4. BMC loves to do the A/V and stream the event! But beware, once they leave after the morning events you’ll be without mikes and stuff.
  5. Patrick Debois has the usual schedule/format for you to use.
  6. Don’t worry about sponsor money, they’re lining up to pay you. It’s more important to set expectations – this isn’t a “high traffic sales leads” event and you don’t get the attendees’ emails – it’s better to send engineers than salespeople, you’re trying to affect influencers.
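
If you want to turn those no-show rates into a registration cap, here's a rough sketch. The overbooking formula and the 200-seat venue are my own assumptions for illustration, not something from our planning docs; only the 30% and 10% rates come from the events above.

```python
def registrations_to_open(capacity: int, no_show_pct: int) -> int:
    """Return how many registrations to accept so that expected
    attendance stays at or under venue capacity.

    Assumption: no-shows are a flat percentage of registrations.
    """
    if not 0 <= no_show_pct < 100:
        raise ValueError("no_show_pct must be in [0, 100)")
    # Integer math avoids floating-point rounding surprises.
    return capacity * 100 // (100 - no_show_pct)


# Hypothetical 200-seat venue.
free_event = registrations_to_open(200, 30)   # free event, ~30% no-shows
paid_event = registrations_to_open(200, 10)   # paid event, ~10% no-shows
print(free_event, paid_event)
```

In practice you'd probably leave some headroom below these numbers, since the no-show rate for any single event can swing.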

And that’s Day 1, expanded!

Filed under Conferences, DevOps

Velocity 2013 Day 3 Liveblog: Retooling Adobe: A DevOps Journey from Packaged Software to Service Provider

Retooling Adobe: A DevOps Journey from Packaged Software to Service Provider

Srinivas Peri, Adobe and Alex Honor, SimplifyOPS/DTO

Adobe needed to move from desktop, packaged software to a cloud services model and needed a DevOps transformation as well.

Srini’s CoreTech Tools/Infrastructure group tries to transform wasted time to value time (enabling tools).

So they started talking SaaS and Srini went around talking to them about tooling.

Dan Neff came to Adobe from Facebook as operations guru.  He said “let’s stop talking about tools.” He showed Srini the 10+ deploys a day at Flickr preso. Time to go to Velocity!  And he met Alex and Damon of DTO and learned about loosely coupled toolchains.

They generated CDOT, a service delivery platform. Some teams started using it, then they bought Typekit and Paul Hammond thought it was just lovely.

And now all Adobe software is coming through the cloud.  They are now the CoreTech Solution Engineering team – which makes enabling services.

Do something next week! And don’t reinvent the wheel.

How To Do It

First problem to solve. There are islands of tools – CM, package, build, orchestration, package repos, source repos. Different teams, different philosophies.

And actually, probably in each business unit, you have another instantiation of all of the above.

CDOT – their service delivery platform, the 30k foot view

Many different app architectures and many data center providers (cloud and trad). CDOT bridges the gap.

CDOT has a UI and API service atop an integration layer.  It uses jenkins, rundeck, chef, zabbix, splunk under the covers.
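
To make the "integration layer over loosely coupled tools" idea concrete, here's a minimal sketch. This is not CDOT's actual code; the class, the capability names, and the stand-in adapters are all hypothetical, and real adapters would call each tool's own API instead of returning strings.

```python
from typing import Callable, Dict

# Hypothetical adapter type: takes a job name, returns a status string.
Adapter = Callable[[str], str]


class IntegrationLayer:
    """Routes a capability (build, deploy, ...) to whichever tool
    is currently registered for it, so tools can be swapped out."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Adapter] = {}

    def register(self, capability: str, adapter: Adapter) -> None:
        self._adapters[capability] = adapter

    def run(self, capability: str, job: str) -> str:
        if capability not in self._adapters:
            raise KeyError(f"no tool registered for {capability!r}")
        return self._adapters[capability](job)


# Stand-in adapters (assumptions, not real tool clients).
def jenkins_build(job: str) -> str:
    return f"jenkins built {job}"


def rundeck_deploy(job: str) -> str:
    return f"rundeck deployed {job}"


layer = IntegrationLayer()
layer.register("build", jenkins_build)
layer.register("deploy", rundeck_deploy)
print(layer.run("build", "my-service"))
print(layer.run("deploy", "my-service"))
```

The point of the shape is that the UI/API on top only knows about capabilities, so a team with a different build tool just registers a different adapter.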

On the code side – what is that? App code, app config, and verification code. But also operations code! It is part of YOUR product. It’s an input to CDOT.

So build (CI).  Takes from perforce/github to pk/jenkins, into moddav/nexus, for cloud stuff bake to an AMI, promote packages to S3 and AMIs to an AMI repo.

For deploy (CD), jenkins calls rundeck and chef server. Rundeck instantiates the cloudformation or whatever and does high level orchestration, the AMIs pull chef recipes and packages from S3, and chef does the local orchestration.  Is it pull or push?  Both/either. You can bake and you can fry.

So feature branches – some people don’t need to CD to prod, but they sure do to somewhere.  So devs can mess with feature branches on dev boxes, but then all master checkins CD to a CD environment.  You can choose how often to go to prod.
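
That branch policy is simple enough to sketch as a function. The environment names ("dev", "cd", "prod") and the `promote_to_prod` flag are my own labels for illustration, not anything CDOT exposes.

```python
from typing import List


def deploy_targets(branch: str, promote_to_prod: bool = False) -> List[str]:
    """Return the environments a checkin should be deployed to.

    Feature branches stay on dev boxes; every master checkin goes to
    the CD environment; prod is opt-in, so each team picks its cadence.
    """
    if branch != "master":
        return ["dev"]
    targets = ["cd"]
    if promote_to_prod:
        targets.append("prod")
    return targets


print(deploy_targets("feature/login"))                # feature branch
print(deploy_targets("master"))                       # master checkin
print(deploy_targets("master", promote_to_prod=True)) # team chose to ship
```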

Have a cool “devops workbench” UI with the deployment pipeline and state. So everyone has one-click self service deployment with no manual steps, with high confidence.

Now, CDOT video! It’s not really for us, it’s their internal marketing video to get teams to uptake CDOT.  Getting people on board is most of the effort!

What’s the value prop?

  • Save people time
  • Alleviate their headaches
  • Understand their motivations (for when they play politics)
  • Listen to and address their fears

Bring testimonials, data, presentations, do events, videos!  Sell it!

“Get out of your cube and go talk to people”

Think like a salesperson. Get users (devs/PMs) on board, then the buyers (managers/budget folks), partners and suppliers (other ops guys).

Filed under Conferences, DevOps