Monitoring and Observability

Ah, observability, the new buzzword of the day. Monitoring vendors aplenty are using the word to basically mean “better monitoring!” You know, #monitoringlove not #monitoringsucks. Because monitoring doesn’t help with debugging and doesn’t have app instrumentation, right?

Well, I have to say “bah” to that.  So here’s the thing.  I’m an electrical engineer by education, and I spent a lot of time working at National Instruments, an engineering test and measurement company.  You may be surprised to know these terms have actual definitions that don’t require Twitter arguments to discover.

Monitoring is an activity you perform. It’s simply observing the state of a system over a period of time.

Why do we monitor? For three reasons, in general.

  • Problem Detection – you know, alerting, or seeing issues on dashboards.
  • Problem Resolution – root cause and troubleshooting.
  • Continuous Improvement – capacity planning, financial planning, trending, performance engineering, reporting.

How do we monitor?  Well, that’s called instrumentation. You can instrument your systems and get CPU and stuff, you can use synthetic probes, you can use JavaScript bugs to get end user monitoring, you can emit metrics from applications, you can introspect services and apps via whatever parts are exposed (from JMX to nginx stats to sysdig traces), you can take network traces… (Some folks are similarly trying to redefine “instrumentation” to just mean application instrumentation, which is lame, and in defiance of the fact that application performance management tools that do app instrumentation have existed for decades.)

You can instrument metrics or events; metrics have a certain sampling frequency and resolution…

So what is observability?  This isn’t a new term. It comes from system control theory. You know, the stuff that makes your A/C system and electrical plants and your car work.

Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.

Observability is a property of a system. You can monitor a system using various instrumentation, but if the system doesn’t externalize its state well enough that you can figure out what’s actually going on in there, then you’re stuck.

So is observability hippy bullcrap?  No, of course not. In a DevOps world, it’s very important that the apps and systems concentrate on making themselves both observable and controllable (I leave it to the reader to research controllability, unless I get agitated enough to post about that too). Do you make yourself “easy to monitor”?

Externalizing custom metrics contributes to observability (you know, like with dropwizard metrics).  So does good logging.  So does proper architecture!  Take a system that sticks all kinds of messages into one message queue rather than using separate queues for separate types – the latter is more observable; you can more readily see how many of what is flowing through.  (It’s more controllable too, as you can shut off one queue or another.)
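
To make the custom-metrics point concrete, here’s a minimal sketch of externalizing a couple of application metrics. Dropwizard Metrics is a Java library, so this uses Python’s prometheus_client as a stand-in for the same idea; the metric names, port, and fake workload are my own invention, not anything from a specific system.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical business metrics we choose to externalize.
ORDERS = Counter("orders_processed_total", "Orders successfully processed")
LATENCY = Histogram("order_processing_seconds", "Time spent processing an order")

def process_order():
    with LATENCY.time():                  # record how long the work took
        time.sleep(random.random() / 10)  # stand-in for real work
    ORDERS.inc()                          # count the work we did

if __name__ == "__main__":
    start_http_server(8000)               # expose /metrics for a scraper to observe
    while True:
        process_order()
```

Now anything watching /metrics can infer what the service is doing from its external output – which is exactly the observability property described above.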

Making your system observable is therefore important, so that if you monitor it with appropriate instrumentation, you understand the state of the system and can make short or long term plans to change it.

While a monitoring tool can definitely contribute to this via its innovation in instrumentation, analysis, and visualization, in large part observability is a battle won or lost before you start sticking tools on top of the system. It’s very important to take it into account when designing and implementing services. No tool is going to “give you” observability; that’s the usual silver-bullet fallacy from someone who wants to sell you something.

I’m not saying every vendor is using the term wrongly (in fact I just came across this New Relic post that is very well done), but I have to say I am less than impressed when common engineering terms are so widely misused and misunderstood in our industry.

Would you like to know more?  Peco and I are working on a new lynda.com course on monitoring and observability!  There’ll be real engineering, a broad canvas of the different kinds of monitoring instrumentation, tips on implementation and use… We’ve both been using and/or building monitoring tools for decades now so we hope to have some useful info for you.


CNCF and K8s 101’s

I never make New Year’s resolutions, but I want to do something different for 2018!

One thing I’ve been learning a lot about over the past couple of years is Kubernetes and the CNCF ecosystem around it, and I often find myself having a hard time keeping up with that ecosystem. There are almost weekly releases across the many projects, and getting-started content for all the new tools and technology is hard to find.

So! I plan to do quick 101 blogs on different topics under the Container/Kubernetes/CNCF umbrella. My first blog article will be on Prometheus – the monitoring tool that integrates GREAT with k8s! It’ll be based on my GitHub code here: https://github.com/karthequian/prometheus-demo (shhh, sneak peek).

But, I need your help! Give me a list of things you are confused about in the container space, or want more info on, and I’ll be happy to do the legwork on it!

So, give me input here, or on Twitter!


Released! Learning Kubernetes and K8s: Native Tools


I’ve been working on the managed Kubernetes Engine at Oracle as described here by my StackEngine CEO Bob Quillin.

Being knee deep in the Kubernetes and CNCF ecosystem is very exciting, and it reminds me a lot of the early days of the Docker ecosystem. Kubecon in December had a lot going on, with a plethora of projects and lots of vendors. In the future, I believe Kubernetes will be the de facto platform that many large enterprises use as their orchestration and IT platform when they look to modernize their architecture – whether that’s all Kubernetes, all cloud native, serverless, or somewhere in between.

And speaking of K8s, my Lynda courses on Kubernetes were just released! I filmed them late last year at Lynda’s campus in Carpinteria, CA – Learning Kubernetes and Kubernetes: Native Tools!

Learning Kubernetes covers all the information you’ll need to get started using Kubernetes – the concepts, examples, and installation – everything you need to get rocking with k8s!

Kubernetes: Native Tools is a shorter course that covers the different tools available in the k8s ecosystem.

Let me know what you think – and reach out if you have questions or issues! K8s might be overwhelming initially, but stick with it, and it’ll make your container management life so much easier!


DevOpsDays Summit Austin 2018 – “DevOps Unplugged”

Hey all!  We’re starting work on next year’s DevOpsDays Austin – our seventh here in the ATX.  Many of you have come out to the event (or another of the great DevOpsDays around the world). Well, we have some changes in store this year!

Last year’s DevOpsDays Austin, “Monsters of DevOps” was bigger than ever and had a stadium rock theme – we had a huge venue,  all the DevOps VIPs we could pull down (including the first time all 4 authors of the DevOps Handbook managed to get together at an event), multiple content tracks, killer swag, great food, a hackathon, the best Happy Hour I’ve attended at a conference, we invited in and comped local user groups to give talks…  Part of our continuing trajectory to make DoDA more all encompassing and awesome.

But – every year we sit down and discuss vision before we launch into the conference.  What do we want to accomplish and why?  Who are we serving and why?  Why are we, personally, putting in huge amounts of unpaid work to serve the community? “Because it’s there and we did it last year” isn’t a good answer, so we like to really put some thought into it.

This time when we talked about it, first in our core group and then with the rest of the 2017 organizers, we realized that we’ve been concentrating on “bigger” but we’ve been putting more and more money and effort into the parts of the event that aren’t really of high DevOps value. Here in Texas, it’s easy to conflate bigger with better, since we’re both the biggest and the best!  But we’re not sure that’s right. Many of the more expert people we know here in Austin don’t really come out to the event any more, unless they are giving a talk or recruiting for their current gig.  Talks and openspaces have stayed focused on introducing new people to DevOps, enterprise folks, “horses and donkeys,” and so on.

And as we talked, we said “Well – what do we personally get out of the conference nowadays as attendees?”  The answer was “not much.” Openspaces are huge and end up being a couple of people talking.  Talks are either pretty familiar from the conference circuit or also designed for new folks.  We have more content, but it’s more passive content – sit and watch.  It’s good for the newbies but not as much for the experienced folks.

We contrasted this to the first couple DevOpsDays we went to in Silicon Valley.  The first couple were just in a big auditorium at LinkedIn.  There weren’t any sponsor booths. More of the event was focused on the openspaces and interaction between the highly driven participants. We ate box lunches wherever we could perch in the parking lot outside – and swag was just a t-shirt.  Heck, the third one was in a weird abandoned building Dave Nielsen had access to; we had to carry our own chairs around to talks, and the food and stuff was in a concrete-and-cage loading dock. But it’s those events we got the most out of.

Therefore, this year DevOpsDays Austin is going to go to what we call a “Summit” format.  We’re reducing the size of the event, and focusing more on local, motivated practitioners.  What does this mean?

  1. No sponsor tables.  We’d love sponsors to participate, but in recent years we’ve gotten more folks who have either just sent aggressive marketers, or sent people we enjoy and then locked them down behind tables. So we’ve come up with a sponsorship package that gets them exposure and value but lets them actually participate in the event.  Folks that just want to churn leads will self-select out.  The sponsorships are less expensive, and we’ll just have venue food etc. instead of premium.
  2. No preselected talks.  Well, OK, maybe we’ll have one keynote a day.  But I went to a ProductCamp here in Austin and they did something brilliant – they had an RFC but didn’t do a final selection; finalists show up and the audience votes on what talks they want to hear (kinda like openspaces but more prepared).  This means people who say ‘well… I’ll come to your event if I can talk (or sponsor if I can talk, or…)’ will self-select out. You come because you want to be here, and you can give a talk!
  3. Smaller headcount.  We’re lowering the cap (including sponsors and organizers and volunteers) to 400. We’re going to get openspaces to be the kind of highly engaged discussions that make them so valuable.  We’re going to be up front with people that attendees are expected to engage.  DoD used to be the only thing around to learn from.  But now, if you’re an enterprise person that wants to have some DevOps talked at them, you have a variety of options – you can go to DevOps Enterprise Summit (also a great event), or to another DevOpsDays like the one in Dallas using the conference format, or to one of a dozen events either completely DevOps or DevOps-tracked.  But for here in Austin this year, we need something where the unicorns can also have an event meaningful to them, so they can gather and refresh on what’s going on. Not to say only “unicorns” are welcome, but frankly we’d prefer people only come out if they intend to discuss, share, and engage; this will not be a passive-learning friendly event.
  4. No streaming.  Every year we put a lot of work and money into live-streaming and/or recording the event.  But it’s often problematic, and doesn’t get viewed a lot – there’s so much content out there now.  But even worse, we end up having to degrade the experience of real attendees around the requirements of broadcast – space, money, schedule, the presenter has to stay in a little box… So we’re not going to do it.  You want to participate – come out and participate.

But How Can This Work???

That was everyone’s initial reaction to this plan.  But that’s silly – it has worked.  We’re just doing things that DevOpsDays has already done, that ProductCamp has already done, and so on. It’s just not what’s become customary.  After the organizers had a little time for it to sink in, they all rallied behind it with a vengeance.

We’ve run the numbers and just the basic $200/head attendee fees can pay for the venue, basic food, and a shirt, even if we get zero sponsors.  (We won’t have zero sponsors – we just put our sponsor page up and someone bought in within the first hour it was live.) As we get more funding we’ll pump up the event, but deliberately focus on the core experience of highly skilled techies learning from each other, instead of adding distractions.

How Dare You Dis My Format???

This is the format we’d like to try this year.  Other events will use other formats and that’s fine. Here at DoDA we try something different every year!  We were the first to have multiple content tracks (over the complaints of some purists).  We added a hackathon, we added a local user group track… Last year we went big with a vengeance, and it was cool.  Now we’re going to go smaller and more exclusive, and that’ll be cool.  Next year, it’ll be different. Whatever your event is doing, more power to you; don’t confuse us having a vision we believe in with us thinking you’re “wrong.”

Come on down!

We’d love to see everyone out at DevOpsDays Austin 2018!  Come ready to interact and share.  Come ready to give a talk, with the risk it won’t make the cut.  Come sponsor your company – you just won’t have a table to lounge at. This change has gotten us excited about running our seventh DevOpsDays, and we bet you’ll love it!


Assigning Fault To Human Error Is A Human Error


We all know from DevOps blameless retrospective wisdom that there is no such thing as a single “root cause.”  One of the most common root causes people like to assign blame to is “human error”.  Not to mince words, this is usually political, buck-passing CYA of the highest order.

I just read a great article on the recent U.S. Navy ship collision issues that I wanted to pass on.  If you have been keeping up with the news, there has been a rash of Navy ships colliding with other ships, causing fatalities. When you Google it, you see a whole bunch of “Navy attributes it to human error…”

But now go read this article, Something’s Wrong In The Surface Fleet And We’re Not Talking About It.  It’s written by Capt. Michael Junge, an experienced Naval officer. The TL;DR is that you can say “human error” all you want, fire someone, and call it case closed, but these accidents stem from systemic understaffing of Navy surface ships, plus massive undertraining and deferred maintenance – a leading indicator of even worse to come should an actual wartime deployment be necessary.

Even in engineering, we are tempted to push the problem down onto the person that made a mistake.  Fully engaging with the system that caused the need for the action that caused the mistake, the lack of validation that makes mistakes possible, and so on is hard thinkin’.  It is threatening when people point out flaws in processes and systems and code you had a hand in.  But the only way to actually improve your situation is to soberly assess what the actual contributors to issues are, and work towards fixing them.


LASCON 2017 Conference Notes

Well, last Thursday and Friday I went to LASCON, our local Austin application security convention! It started back in 2010; here are the videos from previous years (the 2017 talks were all recorded and should show up there sometime soon). Some years I get a lot out of LASCON and some I don’t; this was a good one, and I took lots and lots of notes! Here they are in mildly edited form for your edification.  Here’s the full schedule – obviously I could only get to a subset of all the great content myself.  They pack about 500 people into the Norris Conference Center in Austin.

Day 1 Keynote

The opening keynote was Chris Nickerson, CEO of LARES, on thoughts inspired by pen testing.  Things I took away from his talk:

  • We need more mentorships/internships to get the skills we need, assuming someone else is going to prep them for us (school?) is risible
  • Automate and simplify to scale and enable lower skill folks to do the job – if you need all security geniuses to do anything that’s your fault
  • There’s a lack of non-made-up measurements – most of the threat severities etc. are, in the end, pure judgment calls only loosely based on objective measures
  • Testing – how do we know it’s working?
  • How do all the tools fit together? Only ops knows…
  • Use an attack inventory and continually test your systems
  • Red team automation plus blue team analytics gives you telemetry
  • Awareness of ego

Security for DevOps


Then the first track talk I went to was on Security for DevOps, by Shannon Lietz, DevSecOps Leader at Intuit. She’s a leader in this space and I’ve seen her before at many DevOps conferences.

Interesting items from the talk:

  • Give security defects to your devs, but characterize adversary interest so they can prioritize.
  • Reduce waste in providing info to devs.
  • 70-80% of bad guys return in 7 days – but 20% wait 30 days until your logs roll

She likes to use the killchain metaphor for intrusion and the MITRE severity definitions.

But convert those into “letter grades” for normal people to understand!  Learn development-ese to communicate with devs; don’t make them learn your lingo.
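
(As an aside, the “letter grades” idea is trivial to wire up; here’s a toy sketch that buckets a 0–10 severity score, CVSS-style, into grades. The thresholds are made up for illustration, not from her talk.)

```python
def severity_to_grade(score: float) -> str:
    """Bucket a 0-10 severity score into a letter grade normal people can read.

    Thresholds are illustrative only.
    """
    if score >= 9.0:
        return "F"  # critical: fix now
    if score >= 7.0:
        return "D"
    if score >= 4.0:
        return "C"
    if score > 0.0:
        return "B"
    return "A"      # nothing known outstanding

print(severity_to_grade(9.8))  # -> "F"
```
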
Read the Google BeyondCorp white papers for their newfangled security model:
1. Zoning and containment
2. Asset management
3. Authentication/authorization
4. Encryption

Vendors, please get to one tool per phase – it’s just too much.

Other things to read up on…

Startup Security: Making Everyone Happy

By Mike McCabe and Brian Henderson of Stratum Security (stratumsecurity.com, github.com/stratumsecurity), this was a great talk that reminded me of Paul Hammond’s seminal Infrastructure for Startups talk from Velocity. So you’re getting started and don’t have a lot of spare time or money – what’s the highest-leverage way to ensure product security?

They build security SaaS products (they’ve sold one off already and are now making XFIL) and do security consulting. Their view: if we get hacked, no one wants our product.

The usual startup challenges – small group of devs, short timelines, new tech, AWS, secrets.

Solutions:

  • Build security in and automate it
  • Make use of available tools, linters, SCA tools, fuzzing
  • Continuous testing
  • AWS hardening
  • Alerting
  • Not covering host security, office security, incident response here

They use AWS, Codeship, and Docker (benefits – dev like in prod, run tools locally, test locally). JavaScript and golang, no more Rust (too bleeding edge). There’s a lack of security tooling for the new stuff.

They need to not slow down CI, so they want tooling that will advise and not block the build. The highest-leverage areas are:

  • Linting – better than nothing. ESLint with detect-unsafe-regex and detect-child-process. Breaks build. High false positives, have to tweak your rules. Want a better FOSS tool.
  • Fuzzing – gofuzz based on AFL fuzz, sends random data at function, use on custom network protocols
  • Source code analysis – HP Gas
  • Automated dynamic testing – Burp/ZAP
  • Dependency checking. Dependencies should be somewhat researched – stats, sec issues (open/closed and how their process works)
  • Pull requests – let people learn from each other

Continuous integration – they use Codeship Pro and Docker.
Infrastructure is easy to own – many third-party items, many services to secure.

AWS Tips:

  • Separate environments into AWS accounts
  • Don’t use root creds ever
  • Alert on root access and failed logins with CloudWatch. [Ed. Or AlienVault!]
  • All users should use MFA
  • Rigorous password policy
  • Use groups and roles (not direct policy assignment to user)
  • Leverage policy conditions to limit console access to a single IP/range so you know you’re coming in via VPN (see the sketch after this list)
  • Bastion host – alert on access in Slack
  • Duo on SSH via PAM plugin
  • Must be on VPN
  • Use plenty of security groups
  • AWS alerting on failed logins and root account usage, sent to Slack
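
Here’s a minimal sketch of the policy-conditions tip above, using boto3 to create and attach a deny-outside-VPN policy. The policy name, group name, and CIDR are hypothetical, and a real policy would need more care (carve-outs for service roles, etc.).

```python
import json

import boto3

iam = boto3.client("iam")

# Deny everything when the request doesn't originate from the (hypothetical) VPN range.
deny_outside_vpn = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyWhenNotOnVPN",
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        "Condition": {"NotIpAddress": {"aws:SourceIp": ["203.0.113.0/24"]}},
    }],
}

resp = iam.create_policy(
    PolicyName="require-vpn-source-ip",          # hypothetical name
    PolicyDocument=json.dumps(deny_outside_vpn),
)
iam.attach_group_policy(
    GroupName="engineers",                       # hypothetical group
    PolicyArn=resp["Policy"]["Arn"],
)
```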

See also Ken Johnson’s AWS Survival Guide

Logging – centralize logs; use Splunk via the AWS Splunk plugin (send logs both directly and to CloudWatch for redundancy).

Building the infrastructure – use a curated base image, organize security groups, do infrastructure as code, manage secrets (with IAM when you can). They build the base image using Packer: strip it down, then add Splunk, CloudWatch, OSSEC, Duo, etc. and public keys. All custom images build off the base.

Security groups – use consistent naming. Don’t forget to configure the default security group even if you don’t intend to use it.

They wish they had used Terraform or some other infrastructure-as-code setup.

Managing secrets – don’t put them in plain text in GitHub, Docker images, AMIs, or S3. Put them into KMS, Lambda, Parameter Store, or Vault. They do Lambda + KMS + ECS: the Lambda pulls encrypted secrets out of S3 and pushes out container tasks to ECS with the secrets. See also “The Right Way To Manage Secrets With AWS” from the Segment blog about using the new Parameter Store for that.
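
For the Parameter Store approach mentioned above, here’s a minimal sketch of fetching a KMS-encrypted secret with boto3 at startup; the parameter name is hypothetical.

```python
import boto3

ssm = boto3.client("ssm")

# SecureString parameters are decrypted via KMS on read when WithDecryption=True.
resp = ssm.get_parameter(Name="/myapp/prod/db_password", WithDecryption=True)
db_password = resp["Parameter"]["Value"]
```
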
Next steps:

  • more alerting esp. from the apps (failed logins, priv escalation)
  • terraform
  • custom sca (static analysis)
  • automate and scale fuzzing maybe with spot instances

Security is hard but doesn’t have to be expensive – use what’s available, start from least privilege, iterate and review!

Serverless Security


By fellow Agile Admin James Wickett of Signal Sciences.  The first part introduces serverless and why it’s good, then it segues into securing serverless apps halfway in.

Serverless enables functions as a service with less messing with infrastructure.

What is serverless? Adrian Cockcroft: “if your PaaS can start instances in 20ms that run for half a second, it’s serverless.” AWS Lambda takes 343 ms to start and 84 ms on subsequent hits – not quite the 20ms Cockcroft touts, but eh. Also read https://martinfowler.com/articles/serverless.html and then stop arguing about the name for God’s sake.  What’s wrong with you people.  [James is too polite to come out and say that last part but I’m not.]

Not good for large local disk space, long-running jobs, big I/O, or super latency-sensitive work. Serverless frameworks include Serverless, Apex, Sparta (for Go), and Kappa. A framework really helps. You get an elastic, fast API running at very low cost. But IAM is complicated.

So how to keep it secure?

  • Externalize stuff out of the app/infra levels – do TLS in API gateway not the app, routing in API gateway not the app.
  • There’s stack element proliferation – tends to be “lambda+s3+kinesis+auth0+s3+…”
  • Good talk on bad IAM roles – “Gone in 60 seconds: Intrusion and Exfiltration in Serverless Architectures” – https://www.youtube.com/watch?v=YZ058hmLuv0
  • good security pipeline hygiene
  • security testing in CI w/gauntlt
  • DoS challenges including attack detection…
  • github/wickett/lambhack is a vulnerable lambda+api gateway stack like webgoat. you can use it to poke around with command execution in lambda… including making a temp file that persists across invocations
  • need to monitor longer run times, higher error rate occurrences, data ingestion (size), and log actions of lambdas (see the sketch after this list)
  • For defense: vandium (sqli wrapper), content security policies
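
On the monitoring point in that list, here’s a minimal sketch of pulling error counts and run times for a single Lambda out of CloudWatch with boto3. The function name is hypothetical, and in practice you’d alarm on these metrics rather than poll them by hand.

```python
from datetime import datetime, timedelta

import boto3

cw = boto3.client("cloudwatch")
now = datetime.utcnow()

# Last hour of error counts and average durations for one (hypothetical) function.
for metric, stat in [("Errors", "Sum"), ("Duration", "Average")]:
    resp = cw.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric,
        Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=300,
        Statistics=[stat],
    )
    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(metric, point["Timestamp"], point[stat])
```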

And then I was drafted to be in the speed debates!  The less said about that the better, but I got some free gin out of it.

Architecting for Security in the Cloud

By Josh Sokol, Security Spanker for National Instruments! He did a great job at explaining the basics. I didn’t write it all down because, as a 3l33t Cloud Guru, a lot wasn’t new to me, but it was very instructive in reminding me to go back to the super basics when talking to people.  “Did you know you can use ssh with a public/private key and not just a password?” I had forgotten that people don’t know that, and it’s super important to teach those simple things!

  • Code in private GitHub repo
  • Automation tool to check updates and deploy
  • Use a bastion to ssh in
  • Good db passwords
  • Wrap everything in security groups
  • Use vpcs
  • Understand your attack surfaces – console, github, public ports
  • Analyze attack vectors from these (plus insiders)
  • Background checks for employees
  • Use IAM, MFA, password policies
  • Audit changes
  • The apps are the big one
  • Https, properly configured
  • Use an IPS/WAF
  • Keys not just passwords for SSH
  • Encrypt data before storing in db

Digital Security For Nonprofits


Dr. Kelley Misata had an MBA in marketing and then got cyberstalked.  This led to her getting an InfoSec Ph.D. under Spaf at Purdue! She was communications director for Tor, and now runs the org that manages Suricata.

Her thesis was on the security gap in nonprofits, especially those serving violence victims and human trafficking survivors. In this talk, she shared her findings.

Non-profits are being targeted for the same reasons as for-profits, plus ideology, and by international attackers. They take money and cards and everything else like other companies.
63% of nonprofits suffered a data breach per a 2016 self-report survey.  Enterprises vet the heck out of their suppliers… but hand over data to nonprofits that may not have much infosec at all.

ISO 27000, COBIT 5… normal people don’t understand that crap. NIST guidance is more consumable – “watered down” to the infosec elite, but it maps back to the more complex guidelines.

She sent out surveys to 500 nonprofits expecting the normal rate of return, but got 222 replies back… That’s an extremely high response rate, indicating a high level of interest.
Nonprofits tend to have folks with fewer tech skills, and they have more urgent needs than cybersecurity, like “this person needs a bed tonight.”  They also don’t speak techie language – when she sent out a follow-up, a common question was “What does “inventory” mean?”

90% of nonprofits use Facebook and 53% use Twitter.  They tend to have old systems. Nonprofit environments are different because what they do is based on trust. They get physical security but don’t know tech.

They are not sure where to go for help, and don’t have much budget. Many just use PayPal for funds collection rather than a more general secure platform. And many outsource – “If we hand it off to someone, it must be secure!”

The scary but true message for nonprofits is that it’s not if but when you will have a breach. Have a plan. Cybersecurity insurance passes the buck.

You can’t be effective if you can’t message effectively to your audience. She uses “tinkerer” not hacker for white hats, because you can complain all you want about “hacker not cracker blah blah” but sorry, Hollywood forms people’s views, and normal people don’t want a “hacker” touching their stuff period.

Even PGP encrypting emails, which is very high value for most nonprofits, is ridiculously complicated for normal people.

What to do to improve security of nonprofits? Use an assessment tool in an engaging way. Help them prioritize.
She is starting a nonprofit, Sightline Security, for this purpose. Check it out! This was a great talk and it inspires me to keep working to bring security to everyone, not just the elite/rich – we’re not really safe until all the services we use are secure.


Malware Clustering

By Srini (Srivathsan Srinivasagopalan), a data scientist from my team at AlienVault!

Clustering malware into groups helps you characterize how families of it work, both in general and as they develop over time.

To cluster, you need to know what behavior you want to cluster on; it’s too computationally challenging to just tell the computers “You know… group this stuff similarly.”

You make signatures to match samples on that behavior. Analyzed malware (like output from Cuckoo) generally gives you static and dynamic sections of behavior you can use as inputs. There are various approaches, which he summed up.  If you’re not into math you should probably stop reading here so as not to hurt yourself.

  • To hash using shingling – concatenate a token sequence and hash it.
  • Jaccard similarity on raw shingle sets is computationally challenging at scale.
  • Min-hashing approximates Jaccard similarity much more cheaply.
  • Locality-sensitive-hash-based clustering groups similar signatures.
  • Hybrid approach: corpus vectorization
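
To make those notes slightly more concrete, here’s a toy sketch of the shingling/min-hashing idea: hash k-token shingles of each sample’s behavior, keep the smallest hashes as a compact signature, and estimate Jaccard similarity from the signatures. The API-call token streams are invented, not real sandbox output.

```python
import hashlib

def shingle_hashes(tokens, k=3):
    """Hash every k-token shingle (concatenated token sequence) to an integer."""
    return {
        int(hashlib.md5(" ".join(tokens[i:i + k]).encode()).hexdigest(), 16)
        for i in range(len(tokens) - k + 1)
    }

def signature(shingles, n=64):
    """Bottom-k MinHash signature: keep the n smallest shingle hashes."""
    return set(sorted(shingles)[:n])

def estimated_jaccard(sig_a, sig_b, n=64):
    """Estimate Jaccard similarity of the full shingle sets from the signatures."""
    bottom = set(sorted(sig_a | sig_b)[:n])
    return len(bottom & sig_a & sig_b) / len(bottom)

# Hypothetical API-call token streams from two sandbox (e.g. Cuckoo) behavior reports.
sample_a = "CreateFile WriteFile RegSetValue Connect Send Recv CloseHandle".split()
sample_b = "CreateFile WriteFile RegSetValue Connect Send CloseHandle Sleep".split()

sig_a = signature(shingle_hashes(sample_a))
sig_b = signature(shingle_hashes(sample_b))
print(estimated_jaccard(sig_a, sig_b))  # rough similarity in [0, 1]
```
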
Next… opcode clustering! Not covered here.

TL;DR, there’s a lot of data to be scienced around security data, and it takes time and experimentation to find algorithms that are useful.

Cloud Ops Master Class

By @mosburn and @nathanwallace
Trying to manage 80 teams and 20k instances in 1 account – eek!  Limits even AWS didn’t know about.
They split accounts, went to bakery model. Workload isolation.
They wrote tooling to verify versions across accounts. It sucked.
Ride the rockets – leverage the speed of cloud services.
Change how the team works to scale – teach, don’t do, to avoid bottlenecking. App teams self-serve; the cloud team teaches.

Policies: Simple rules. Must vs. should. Always exceptions.
The option requirement must be value in scope.
Learn by doing. Guardrails – detect and correct.
Change control boards are evil – use policy, not approval.
Sharing is the devil.
Abstracting removes value – use tools natively.

  • Patterns at scale
  • Common language and models
  • Automate and repeat patterns
  • Avoid custom central services
  • Accelerate don’t constrain
  • Slice up example repos
  • Visibility
  • Audit trail
  • Git style diff of infra changes
  • Automate extremely – tickets and l1-2 go away
  • All ops automated, all alerts go to apps so things get fixed fast

He’s created Turbot to do software defined ops – https://turbot.com/features/

  • Cross account visibility
  • Make a thing in the console… then it applies all the policies. Use native tools, don’t wrap.
  • Use resource groups for rolling out policies
  • Keep execution mostly out of the loop


And that was my LASCON 2017! Always a good show, and it’s clear that the DevOps mentality is now the cutting edge in security.


Java Docker Pull Travails

Just had a problem that I thought I’d document the solution to for the world…

In our build pipeline at work, we use Maven and the fabric8 docker-maven-plugin to manage our builds.  We love it – developers can just “mvn install” locally, and then the Atlassian Bamboo build system just “mvn deploy”s in the exact same way.

Well, we had some builds that suddenly weren’t able to pull the base images specified in our Dockerfiles down from Docker Hub, breaking the build with 500 error messages like:

[ERROR] DOCKER> Unable to pull 'library/debian:sid' from registry 'docker.io' : received unexpected HTTP status: 500 Server Error (Internal Server Error: 500) [received unexpected HTTP status: 500 Server Error (Internal Server Error: 500)]

But it worked fine on our local box. And it could pull our custom images from Artifactory fine. What’s the problem here?  Bamboo?  The plugin? Well, some helpful community folks helped home in on it: it turns out that some versions of Java 1.8 – 8u131 and prior, at least going back to 8u112 – have some problem (TLS? Root certs? Not really sure) that messes up pulling a docker.io container from inside Java during our docker build step.  My team’s microservices aren’t Java based, so the Java version doesn’t come up much – but of course Maven uses Java.

Upgrading the JDK version to 8u144 made the problem go away.  We actually have an up-to-date curated Java version we use in Bamboo for our Java builds, but folks doing Python builds were just using the default “JDK 1.8” that Atlassian puts on their Bamboo build agent AMI, which is of course old and suffers from this issue.

