Understanding and Preventing Computer Espionage – TRISC 2010

Talk given at TRISC 2010 by Kai Axford from Accretive

Kai has delivered over 300 presentations and he is a manager at an IT solutions company.  Background at Microsoft in security.

Kai starts with talking about noteworthy espionage events:

  • Anna Chapman.  The Russian spy that recently got arrested.  Pulls up her facebook and linkedin page.  Later in the talk he goes into the adhoc wireless network she setup to transfer files to other intelligence agents.
  • Gary Min (aka Yonggang Min) is a researcher at DuPont.  He accessed over 22,000 abstracts and 16,706 documents from the library at DuPont.  He downloaded 15x more documents than anyone else.  Gary was printing the documents instead of transferring on a USB.  Risk to DuPont was $400,000,000.  He got a $30,000 fine.
  • Jerome Kerviel was a trader that worked in compliance before he started abusing the company. Stock trading and was using insider knowledge to abuse trading.

Cyberespionage is a priority in China’s five-year plan.  Acquire Intellectual Property and technology for China.  R&D is at risk from tons of international exposure.  Washington Post released a map of all top secret government agencies and private in the US.  http://projects.washingtonpost.com/top-secret-america/map/

Microsoft and 0-day from SCADA is another example.  SCADA is a very dangerous area for us.  2600 did a recent piece about SCADA.

Lets step back and take a look at the threat.  Insiders.  They are the ones that will take our data and information.  The insider is a greater risk.  We are all worried about the 17-year old kid in Finland, but it is really insiders.

There is a question of, if you gave your employees access to something, are they ‘breaking in’ when they access a bunch of data and take it home with them?

Types of users:

  • Elevated users who have been with the company for a long time
  • Janitors and cleaning crew
  • Insider affiliate (spouse, girlfriend)
  • Outside affiliate

Why do people do this?

  • Out of work intelligence operators world wide
  • Risk and reward is very much out of skew.  Penalties are light.
  • Motivators: MICE (Money, Ideology, Coercion, Ego)

In everyone that does espionage, there is a trigger.  There is something that makes it happen.  Carnegie-Mellon did some research stating that everyone who was stealing data had someone else that knew it was happening.

Tools he mentioned and I don’t know where else to mention them:

  • Maltego
  • USB U3 tool.  Switchblade downloads docs upon plugin.  Hacksaw sets up smtp and stunnel to send out all the docs outbound of the computer.
  • Steganography tools >  S-Tools.  This is what Anna was doing by putting info in images.
  • Cell phone bridge between laptop and network.
  • Tor

Mitigate using:

  • defense in depth
  • Background checks, Credit Check,
  • gates, guards, guns,
  • shredding and burning of docs
  • clean desk policy
  • locks, cameras
  • network device blocking
  • encryption devices
  • Application Security.
  • Enterprise Rights Management.
  • Data classification.

1 Comment

Filed under Conferences, Security

Mac OSX Forensics (and security) – TRISC 2010

This talk was presented by Michael Harvey from Avansic.

This has little to do with my day job, but I am big fan of Mac and have really enjoyed using it for the last several years both personally and professionally.  Security tools are also really great for the Apple platform, which I use using Mac Ports: nmap, wireshark, fping, metasploit… Enough about me, on to the talk.

There is a lot of objection about doing forensics on Macs and it is really needed, but in reality it is about 10% of the compute base and a lot of higher level officers in a company are using Macs because they can do whatever they want and aren’t subject to IT restrictions.

Collection of data is the most important.  In a Mac, just pulling the hard drive can be difficult.  Might be useful to pre-download the PDFs on how to do this.  You want to use a firewire write-blocker to copy the drive.  Live CDs (Helix and Raptor LiveCD) lets you copy the data and write block.  Michael really likes Raptor because of its support for legacy Macs and Intel based Macs.  In a follow-up conversation with him he emphasized how Raptor is great for people that don’t do forensics all the time.

Forensics cares about Modified, Accessed, Created time stamps.  Macs add on a time stamp called “Birth Time.”  This is the real created date.  Look at the file properties.  You can use SleuthKit (Open Source forensics tool) to assemble a timeline with M-A-C and Birth Time.

Macs use .plist files in lieu of the windows registry that most people are familiar with. “Property List” files.  These can be ASCII, XML and Binary.  ASCII is pretty rare these days for plist files.  Macs more often dont use the standard epoch unix time and instead uses Jan 1, 2001.  Michael is releasing information on plist format.  Right now there is not a lot of documentation on it.  Plist is more or less equivalent to the windows registry.

Two ways to analyze plist: plutil.pl and Plist Edit Pro.

Dmg files.  Disk images, similar to iso or zip files.  Pretty much a dmg file is crucial for using a Mac.  We can keep an eye out for past-used dmg files to know what has been installed or created…

SQLite Databases.  Lightweight SQL database.  This is heavily used by Firefox, iPhone, or Mac apps.  This is real common on Macs.

Email.  Email forensics will usually come in three flavors: MS Entourage (Outlook), Mail.app, and Mozilla Thunderbird.  A good tool for this is Emailchemy and is forensically sound.  It takes in all the formats.

Useful plist File Examples to look at for more info

  • Installed Applications: ~/Library/Preferences/com.apple.finder.plist
  • CD/DVD Burning: ~/Library/Preferences/com.apple.DiskUtility.plist
  • Recent Accessed Docuents, Servers, and Aplications: ~/Library/Preferences/com.recentitems.plist
  • Safari History: ~/Library/Preferences/com.apple.Safari.plist
  • Safari Cache: ~/Library/Preferences/com.apple.Safari/cache.db
  • Firefox: didn’t get this one

Forensic Software that you can use

  • AccessData FTK3
  • Mac Forensics Lab
  • Sleuth Kit (great timeline)
  • Others exist

In conclusion, Mac OSX Investigations are not that scary.  Be prepared with hard drive removal guides and how to extract data off of them.  The best forensic imaging tool out there should be chosen by hardware speed (and firewire), write-blocking capabilities, ability to use dual-core.  You need to know your tools handle HFS+, Birthed times, plist files, dmg files, SQLite Databases.

Audience member asked about harddrive copying tool.  Michael recommends Tableau (sp?).

Here are some resources:

  • Apple Examiner – appleexaminer.com
  • Mac Forensics Lab Tips – macforensicslab.com
  • Access Data – accessdata.com
  • Emailchemy – weirdkid.com/products/emailchemy

File Juicer.  Extracts info from databases used by browsers for cache.  Favicons are a good browser history tool…  You can point File Juicer at a SQLite Database or .dmg files.

Also, talking with Michael afterward ended with two book recommendations: SysInternals for Mac OSX and Mac OSX Forensics (unsure of title but it includes a DVD).
All in all, a really interesting talk and I look forward to seeing what else Mike produces in this arena.

Leave a comment

Filed under Conferences, Security

Pen Testing, DNSSEC, Enterprise Security Assessments – TRISC Day 1 Summary

Yesterday’s TRISC event had some great talks. The morning talks were good and were higher-level keynotes that, to be honest, I didn’t take good notes on. The talk on legal implications for the IT industry was really interesting. I was able to talk with Dr. Gavin Manes (a fellow Oklahoman) about legal implications of cloud computing and shared compute resources. In the old days, a lawyer was able to get physical access to the box and use it as evidence but it sounds like with the growth of SaaS that the courts don’t expect have to have physical box access but the law seems to be 5 to 10 yrs behind on this and it could backfire on us.

The three classes I attended in the afternoon are added below. Some of the notes are only partially complete, so take it for what it is: notes. Interspersed with the notes are my comments, but unlike my astute colleague Ernest, I didn’t delineate my comments with italics. So, pre-apology to any speakers if they feel like I am putting words there that they didnt say. If there are any incorrect statements, please feel free to leave a comment and I will get it fixed up, but hopefully I captured the sessions in spirit.

Breaking down the Enterprise Security Assessment by Michael Farnum

Michael Farnum did a great job with this session. If you want to follow him on twitter, his id is @m1a1vet and he blogs over at infosecplace.com/blog.

External Assessments are crucial for compliance and really for just actual security. We can’t be all about compliance only. One of the main premises of the talk is to avoid assumptions. Ways to do that in the following categories are below.

In Information Gathering check for nodes even if you think they don’t exist:

  • Web Servers. Everything has a web server nowadays. Router, check. Switch, check. Fridge, check.
  • Web Applications and URLs
  • Web app with static content (could be vulnerable even if you have a dummy http server). Might have apps installed that you didn’t even know (mod_php)
  • Other infrastructure nodes. Sometimes we assume what we have in the infrastructure… Don’t do that

In addition to regular testing, we need to remember wireless and how it is configured. Most companies have a open wireless network that goes just to the internet. The question that needs to be addressed in an assessment is: is it really segmented? For this reason we need to make sure that wireless has an IDS tied to it.

Basic steps of any assessments are identification and penetration. We don’t need to always penetrate if we have the knowledge of what we are doing but we do need to make sure that we identify properly.  No use in penetrating if you can show that the wireless node allows WEP or your shopping cart allows non-https.

Culture issues are also something that we need to watch out for. Discussing security assessments with Windows and Linux people generally ends with agreeable and disagreeable dialogs respectively when talking with contractors and vendors.

Doing Network Activity Analysis

  • Threat > malicious traffic – Actually know what the traffic is
  • Traffic > policy compliance – don’t assume that the tools keep you safe
    Applications
  • Big security assumptions. Not internally secured apps. Too much reliance on firewalls. CSRF, XSS and DNS Rebinding work w/o firewalls stopping them.
  • Browsers need to be in scope

What is up with security guys trying to scare people with social engineering? Michael says why bother doing social engineering if you don’t have a security training and awareness program. He guarantees you will fail. Spend the money elsewhere.

The Gap Analysis of physical security includes video and lighting. The operations team will probably hate you for it though. Getting into “their” area… Be careful when testing physical security (guards, cameras, fences) w/o involving physical ops team.

Reviews and interviews need to happen with developers, architecture team, security coverage, and compliance. At the end of an assessment, you need to do remediation, transfer knowledge with with workshops, presentations, documentation, and scheduling a verification testing to make sure it is fixed. While it makes more money to do point in time evaluation without follow-up (because you can do the same review next year and say, “yep, its still broken and you didn’t fix it) it is better to get your customers actually secure and verify that they take the next steps.

Actual Security versus Compliance Security.

DNSSEC: What you don’t know will hurt you by Dean Bushmiller

This talk was very interesting to me because of my interest with DNS and DNS Rebinding.  Dean passed out notes on this, so my notes are a little light, however I will see if I can post his slides here. But here are my notes for further research.

Read the following RFCs: DNS 1034, 1035 and DNSSEC 4033, 4034, 4035.

One of the big takeaways is that DNSSEC is meant to solve the integrity issues with DNS and does not solve confidentiality at all. It just verifies integrity.

All top-level domains are signed now, so when reading DNSSEC material online, ignore the island talk. A good site to check out is root-DNSSEC.org.

DNSSEC works by implementing PKI. One of the problems that people will face is key expiration. Screw that up and your site will be unavailable. Default is 30-day expiration period.

DNSSEC has a nonexistent domain protection. Subdomains are chained together in a circular logic and there is no way for a bad guy to add in a subdomain in the middle. This dumps all subdomains… All of them. All domains are enumerated and could make it easier for a malicious user to look at all your subdomains. They can already do this now, but this should prevent injection of bad subdomains into your domain.

An Introduction to Real Pen Testing: What you don’t learn at DefCon by Chip Meadows

What is a Penetration Test?

  • Authorized test of the target (web app, network, system)
  • Testing is the attempt to exploit vulnerabilities
  • Not a scan, but a test
  • Scanners like Saint and Nessus are part of a test but they are not the test, they are just a scan

Why Pen Test?

  • Gain a knowledge of the true security posture of the first
  • Satisfy regulatory requirements
  • Compare past and present

PCI is not the silver bullet. Doesn’t really keep us secure.

Chip had a lot of other points that mimicked Michael Farnum’s earlier talk and they have been redacted from here, but he did mention the following tools and link that are also great for security guys to check out.

Testing Tools

  • fling -ag ip add > Feed into scanner
  • Hydra
  • Msswlbruteforcer
  • ikescan
  • nikto
  • burpsuite
  • dir buster
  • metasploit
  • firewalk
  • burp suite

http://vulnerabilityassessment.co.uk/Penetration%20Test.html

Wrap up

The talk that I was most interested in was the DNSSEC talk, but the most useful talks for most people are the security assessments and pen testing talks.  I have been thinking about writing a talk on Agile Security and about how to integrate security with Agile development methods.  Look for that in the near future.

One other note,  I am testing my new setup made just for conferences. Well, I can use it for other things too, but I always worry about ‘open’ networks at hotels especially at security conferences. What I have done is setup dd-wrt on my home router with OpenVPN running on it as well.  From my laptop (Mac Pro) I run Tunnelblick and get a VPN connection back home.  This is cool because if someone is watching the traffic they will just see an encrypted stream from my laptop.  That way, I don’t have to worry about whether or not they have WPA or just a plain open connection.  All my traffic is encrypted at that point.  OpenVPN was a little difficult to get setup and I found a lot of conflicting documentation, let me know and maybe I can piece together some instructions for the blog.

1 Comment

Filed under Conferences, Security

TRISC 2010 – Texas Regional Infrastucture Security Conference

TRISC starts today. Only one of the Agile Admins is up in Dallas today, and there are some pretty good speakers lined up today with some really interesting talks.

I am looking forward to talks on DNSSEC, Pen Testing, and a talk from Robert Hansen.

Stay tuned for more TRISC coverage and in the interim, feel free to follow the coverage on my twitter account.

Leave a comment

Filed under Conferences, General, Security

DevOps Time!

All right!  After the last three days of Velocity 2010, we’ve talked a lot about ops and even hinted at devops, although often in a “recycled from previous Velocity” fashion.  But today it’s time to mainline it with DevOpsDays!

I’m going to be too busy actually participating to do full writeups like I did from Velocity, but I’ll distill down the best takeaways and bring them here as soon as I can.  If you just can’t wait and aren’t here, follow along on twitter at #devopsdays!

Leave a comment

Filed under DevOps

Velocity 2010 – Facebook Performance Shenanigans

Pipelining, Progressive Enhancement, and More: Making Facebook Twice as Fast by Jason Sobel (Facebook), Changhao Jiang (Facebook)

It’s the last session of Velocity already! The companies are tearing down their booths, people are escaping to the airport. Today went really, really fast. The room is still mostly full though!

As we’ve heard before, they have loads of users.  They have a central performance team but also distributed and embedded throughout the company.

The core site speed team started working on PHP speed.  Then they read Steve Souder’s book and realized “Oh, crap…”

They are working on a “perflab” to measure performance impacts of all changes.  And detect regressions.

What are the three things they measure at Facebook?

  1. Server time
  2. Network time
  3. Client/render time

What are we optimizing for?  Shouldn’t be any of those three.  Optimize for people.  That doesn’t even mean end user response time – that means impression of performance.

How fast is Facebook?  Well, determine what the core of the experience is.  What do people look at first, what defines the experience?  Lazy load the rest of that crap.

Metric: Time To Interact (TTI).  It’s a very custom metric.  When is the user getting value out of the site?  This is subjective and requires you to really know your users.

For this, the critical pieces have to be there and have to WORK – you can’t just display it and have the functionality not there yet.  You can’t pick visible but not functional yet.

Techniques used to speed things up:

  1. Early flush.  Get them a list of the crucial elements.
  2. Components.  Pages used the same components with different names, the color blue was defined a thousand times.  Make a reusable set of visual components that can appear on any page and share the same CSS rules.  Besides enforcing visual standards, you can optimize them and then reuse them.  Theirs are a grid, an image block, some buttons, page headers…
  3. JavaScript!  We love it, but it is hard.  They wrote a lot before they knew what they were doing.  They have something called “primer.”  There’s a simple JS library that lives in the head and can bootstrap the rest of the javascript and respond to simple stuff devs were writing over and over again.  An event handler that can do a popup, get and insert content, or do a form submit.  And go get other javascript.  Then you tag something with a rel=”dialog” and it pops a dialog.  And once the page is done you can go get the stuff instead of making it on demand.  “async” gets content. In the feedback interface; Like and View and Delete use it.
  4. BigPipe is an attempt to rethink how we present pages.  The problem is the page generation, network latency, and page rendering being serial.  They render personalized pages and have to query several back end services to make the page.  The page is waiting on the slowest back end query.  So pipeline it out!  Decompose pages into “pagelets” and pipeline them through different execution stages in the server and browser.  They give priorities to different pagelets.
    How does it work?  First you get a nearly empty doc.  In the head, script src bigpipe.js.  Then there are divs on the page with IDs; a template with the logical structure of the page.  For each pagelet, it’s flushed separately in a script tag, JSON encoded.  BigPipe on th  client downloads CSS for the pagelet, displays it, downloads JS, and executes onLoad()s.
    This gave them a 2x improvement in perceived latency (defined by TTI) across all browsers.
    What about search engines?  Well, first of all, for devs to use the pipe, they have to write pagelets, and they have a pagelet abstraction for them to use.  Only has three functions: initialize, prepare, and render.  To pipeline you create a BigPipe instance, specify your page layout and place holders, add pagelets to the pipe (source file and wrapper id) and then call render.  So you can do pipeline, singleflush, parallel, or prepare models.  One parameter in Bigpipe::GetInstance controls it. Use singleflush for search and non-JS stuff.  Preparelets you batch multiple pages.  Parallel lets you use multiple threads for different pagelets (at the cost of server resources!).

Whew!  All this was a success – on Dec 22 they got their goal of making Facebook twice as fast.

Combine with ESIs for even more fun!

Thoughts from the Site Speed team:

To build a culture of performance…  Make tshirts!  They gave shirts to those who made improvements.

  1. Getting the right metrics, that people buy into, then they’ll work on optimizing it.
  2. Build the right abstraction to make the site fast by default, and if devs use them then you are fast without them having to do loads of work.
  3. Partnership.  If other teams are committed you’ll have success.  Find the people that get it and work with them.  Ignore the ignorant.

Final thought – spriting!  He likes spriting.it’s crazy but the platform is a little broken so you have to do stuff like that.  But let’s fic the platform so you don’t hav eto do crazy stuff.  Fast by default!!!

And that’s a wrap for Velocity 2010!  Next, stuff from DevOpsDays, and my thoughts and reflections on what we’ve learned!

Leave a comment

Filed under Conferences, DevOps

Velocity 2010 – Always Ship Trunk

Always Ship Trunk: Managing Change In Complex Websites by Paul Hammond (Typekit)

No rest for the wicked.  More sessions to write up.  Let’s find out how to do feature switches, Flickr-style.  My comments are in italics.

Use revision control. Branching is sad because of merging.  But Mercurial and git make it all magically delicious.

Revision control is nice but what it doesn’t answer is what is running on a given Web server.

There are three kinds of software.

  1. Installed
  2. Open Source installed
  3. Web apps/SaaS

Web apps are not like installed apps.  Revision control is meant to deal with loads of versions.  With a Web app there’s about 1 version of your app in use.  If you administer every computer your software is installed on, you don’t have to worry about a lot of stuff.  Once you upgrade, the old code will never be run again.  It has a very linear flow.

But not really.  Upgrades don’t happen on every box simultaneously.  And shouldn’t – best practice is rolling to a subset.

And you push to a staging/QA environment first.  So suddenly you have more “installs.”  And beta environments.

You have stuff (dependencies) outside your control – installed library dependencies, Web service dependencies – all that change has to be managed.

Coordinating lots of peopel working at the same time is hard.

Deep thought alert: Nobody knows you just deployed unless you tell them.

You can separate the code deployment from the launch.  You can rewrite your infrastructure and keep the UI the same and no one knows.

Deep thought alert 2: You can run different versions in production at the same time.

Put it out.  Ramp up usage.  Different people can see different UIs and they don’t know.

What we need is a revision control system that lets up manage multiple parallel versions of the code and switch between them at runtime.

Branches don’t solve that problem for us (by themselves).  And they don’t help with dependency changes that affect all branches at once – if someone changes their Web API you call, it affects every version!

revision vs version.

Manage the different versions within your application – “branching in code.”  You know, if statements.

This is really dangerous if you don’t have super duper regression testing right?  I’m rolling a new version but not really…  Good luck on that.

This is the “switch concept.”  It allows for feature testing on production servers.

Join it with cookies and you can have a “feature flip” page!  You can put all kinds of private functionality into the app and rely on whatever if statement you wrote to make sure no one bad gets to it!  Good Lord!

There are benefits to production testing (even if it’s not from end users) – firewall stuff, CDN stuff, et cetera.  It’s very flexible.  You can do dark launches.  Run the code in the background and don’t display it.  Now that’s clever.

There are three types of feature flags

  1. user facing feature development
  2. infrastructure development
  3. kill switches

Disable login!

They have loads of $cfg[‘disable_random_feature’] = false

The cost of this is complexity.

Separate your operational controls from development flags.

Be disciplined about removing unused feature flags so it’s not full of cruft.

If you’re going to do this,  just go all in and always deploy trunk to every server on every deploy and manage versions with config.

Definitely daring.  I wonder if it’s appropriate for more “real” workloads than “I’m uploading my pics to a free service for kicks” though.

Joel Spolsky sayeth:  This is retarded.

With new style distributed merge, instead:

  • Use branches for early development. Branches should be merged into trunk.
  • Use flags for rollout of almost-finished code.

Is there a better alternative?  Everyone who makes revision  control systems makes them for installed software not Web software – what would one for installed software look like?

Q&A Tidbits: Put all the switches in one place… Not spread through the code.

What about Sarbanes/Oxley division of labor?  Pshaw.  This is for apps that are just for funsies.

You have to build some culture stuff to about devs not jsut hitting deploy and wandering off, but following up on production state.

1 Comment

Filed under Conferences, DevOps

Velocity 2010 – Performance Indicators In The Cloud

Common Sense Performance Indicators in the Cloud by Nick Gerner (SEOmoz)

SEOmoz has been  EC2/S3 based since 2008.  They scaled from 50 to 500 nodes.  Nick is a developer who wanted him some operational statistics!

Their architecture has many tiers – S3, memcache, appl, lighttpd, ELB.  They needed to visualize it.

This will not be about waterfalls and DNS and stuff.  He’s going to talk specifically about system (Linux system) and app metrics.

/proc is the place to get all the stats.  Go “man proc” and understand it.

What 5 things does he watch?

  • Load average – like from top.  It combines a lot of things and is a good place to start but explains nothing.
  • CPU – useful when broken out by process, user vs system time.  It tells you who’s doing work, if the CPU is maxed, and if it’s blocked on IO.
  • Memory – useful when broken out by process.  Free, cached, and used.  Cached + free = available, and if you have spare memory, let the app or memcache or db cache use it.
  • Disk – read and write bytes/sec, utilization.  Basically is the disk busy, and who is using it and when?  Oh, and look at it per process too!
  • Network – read and write bytes/sec, and also the number of established connections.  1024 is a magic limit often.  Bandwidth costs money – keep it flat!  And watch SOA connections.

Perf Monitoring For Free

  1. data collection – collectd
  2. data storage- rrdtool
  3. dashboard management – drraw

They put those together into a dashboard.  They didn’t want to pay anyone or spend time managing it.  The dynamic nature of the cloud means stuff like nagios have problems.

They’d install collectd agents all over the cluster.  New nodes get a generic config, and node names follow a convention according to role.

Then there’s a dedicated perf server with the collectd server, a Web server, and drraw.cgi.  In a security group everyone can connect in to.

Back up your performance data- it’s critical to have history.

Cloudwatch gives you stuff – but not the insight you have when breaking out by process.  And Keynote/Gomez stuff is fine but doesn’t give you the (server side) nitty gritty.

More about the dashboard. Key requirements:

  • Summarize nodes and systems
  • Visualize data over time
  • Stack measurements per process and per node
  • Handle new nodes dynamically w/o config chage

He showed their batch mode dashboard.  Just a row per node, a metric graph per column.  CPU broken out by process with load average superimposed on top.  You see things like “high load average but there’s CPU to spare.”  Then you realize that disk is your bottleneck in real workloads.  Switch instance types.

Memory broken out by process too.  Yay for kernel caching.

Disk chart in bytes and ops.  The steady state, spikes, and sustained spikes are all important.

Network – overlay the 95th percentile cause that’s how you get billed.

Web Server dashboard from an API server is a little different.

Add Web requests by app/request type.  app1, app2, 302, 500, 503…  You want to see requests per second by type.

mod_status gives connections and children idleness.

System wide dashboard.  Each graph is a request type, then broken out by node.  And aggregate totals.

And you want median latency per request.  And any app specific stuff you want to know about.

So get the basic stats, over time, per node, per process.

Understand your baseline so you know what’s ‘really’ a spike.

Ad hoc tools -try ’em!

  • dstat -cdnml for system characteristics
  • iotop for per process disk IO
  • iostat -x 3 for detailed disk stats
  • netstat -tnp for per process TCP connection stats

His slides and other informative blog posts are at nickgerner.com.

A good bootstrap method… You may want to use more/better tools but it’s a good point that you can certainly do this amount for free with very basic tooling, so something you pay for best be better! I think the “per process” intuition is the best takeaway; a lot of otherwise fancy crap doesn’t do that.

But in the end I want more – baselines, alerting, etc.

Leave a comment

Filed under Cloud, Conferences, DevOps

Velocity 2010 – Grendel

Protecting “Cloud” Secrets With Grendel by Sam Quigley (Square, Inc) and Coda Hale (Yammer, Inc.)

Everyone stores private data.  Passwords, credit cards, documents, etc.  But also personal conversations, personal histories, usage patternns – that’s all private too.  So you store private info – yes you – so how do you protect it?  Firewalls and VPNs?  Passwords?  Bah.  They are useful against last decade’s attacks.

Application level attacks are the new hotness – see the OWASP Top 10.  What you want to do is encryption.  But that’s complex.  Veracode has analyzed a lot of apps and crypto problems are the #1 problem.

What do we do?  Here’s some ideas.

Grendel

It is a secure document storage system. Open and does minimal/simple.  It does data storage, authentication, and access control using the OpenPGP message format and a RESTful interface, it’s in Java, and uses a normal DB backend.

OpenPGP – mature, flexible.  It’s for confidentiality and integrity.  It uses asymmetric keys.  The keys are stored encrypted with passphrases.  The keys are used to encrypt documents to one or more recipients.

REST API – http native.  Why REST?  For all the reasons everyone uses REST.  Ubiquitous, well understood, simple, easily debugged (charles), free features.

Java 1.6 + RDBMS.  Java because it’s fast and stable and well understood.  Uses hibernate.  RDBMS because you already have one.

Grendel is simple.  One config file.  DB location and password and some c3p0 stuff.

java -jar grendel.jar schema -c database.properties

generates a schema.  Three tables; users, documents, and links.

java -jar grendel.jar server -c database.properties -p 8080

starts it.

The API has users, docs, links, and linked docs.  JSON based.

You can create a user, which makes a new key set behind the scenes.

You can store a document.  PUT /users/name/documents/docname with a basic auth header.  It decrypts the user’s keys, signs and encrypts the doc, and stores it.

GET /users/name/documents gets you a JSON list.  Or get the document and you get the document (duh).

Then you can link the document to another user to share it with them.

So what’s the big deal?

Self defending data.  The data itself enforces the access control rules.  And business logic is enforced with math.

He didn’t even mention the brilliance of this related to scenarios like things like subpoenas causing Amazon to give up your S3 data to people…

Authentication done right.  It’s hard to do it right.  Adaptive hashing.  A centralized service model.  Resistant to modern attacks.

It makes it “sudo for the Web.”  You can grant long lived session coolies, and re-auth for privileged access.  Yeah, we do that in general not with encryption…  Like Amazon.com remembers you but when it’s purchase time you have to reauth securely.

It also mitigates XSS/CSRF attacks, kinda.

This creates a privacy wall.  You the admin are locked out of the data.  Insider threat defeated.

In the future…

Support for sessions.  OAuth 2.0.  And spreading the idea in general!

How is this better than symmetric encryption with the user’s password?  Since you’re proxying it anywhere.  Because you can’t share then.

I guess one downside is that you can’t see inside the docs to search index, etc.

You could use client side certs instead of passwords right?  No.

Does it have support for password change?  Yes.

I personally am psyched about this – I think we have a product underway that could really benefit from using it.

Leave a comment

Filed under Cloud, Conferences, DevOps, Security

Velocity 2010 – Memcached Scalability

After lunch, we start off with Hidden Scalability Gotchas in Memcached and Friends by Neil Gunther (Performance Dynamics Company), Shanti Subramanyam (Oracle Corporation), and Stefan Parvu (Oracle Finland).

Scaling up versus scaling out.  Bigger or more.  There is no “best approach” – you need to be quantitative, with controlled measurements and numbers to see the cost-benefit.

Data isn’t information, you need to transform it.  Capacity planning  has “planning” in it.  Like with finance, you need a model.  Metrics + models = information.

Controlled Measurements

You want to take measurements in a known environment with a specific workload.  Using production time series data is like predicting the stock market using Google finance graphs.

You need throughput measured in steady state.  No load vs vusers with varying throughput…

So they did some controlled tests.

Memcached scaling is thread limited.  Past about 4-6 threads, throughput levels off.

By using a SPARC multicore friendly hash map patch, it did scale up to maybe 30 threads.

Quantifying Scalability

1.  Equal bang for the buck.  Ideal parallelism is a linear load vs capacity graph, but really it plateaus and degenerates at some point. But there’s an early art of the graph that looks like this.

2.  Cost of sharing resources – when the curve falls away from linear.

3.  Resource limitation – where the curve tops out (Amdahl’s law)

4.  Degradation/negative return – more capacity makes things worse after a point.

Formula: c(N)=N/(1+a(N-1)+bN(N-1))

N is the number of threads.  1=concurrency, a=contention , b= coherency

Run it through excel USL analysis and calculate a and b.

As memcached versions came out, the concurrency was improved, but N didn’t budge.  People can say they make improvements, but if it doesn’t affect the data, then bah.

Anyway, the model is semi predictive but not perfect.  If you know whether your problem is a contention (like queuing) or coherence (like point to point transfers) issue you know what to look for in your code.

Memcached Gotchas

Throw more hardware at it!  Well, current strategies are around old cheap hardware with single CPUs.  As multicore arrives, if you can’t use all the cores, you won’t utilize your hardware fully.

As memcached is thread limited it’ll be a problem on multicore.

Take controlled measurements with steady state throughput to analyze data.

Quantify scalability using a model.  Reduce contention and coherency.

Follow them at:

There’s a lot of discussion about the model predictability because they had a case where the model predicted one thing until there were higher order data points and then it changed.  The more data, the more the model works – but he stresses you need to trust the model with what data you have.  You’re not predicting, you’re explaining with the model.  It’s not going to tell you exactly what is wrong…  Lots of questions, people are mildly confused.

Leave a comment

Filed under Conferences, DevOps