Velocity 2009 – Hadoop Operations: Managing Big Data Clusters

Hadoop Operations: Managing Big Data Clusters (see the link on that page for the preso) was given by Jeff Hammerbacher of Cloudera.

Other good references –
book: “Hadoop: The Definitive Guide”
preso: “Hadoop Cluster Management” from USENIX 2009

Hadoop is an Apache project inspired by Google’s infrastructure; it’s software for programming warehouse-scale computers.

It has recently been split into three main subprojects – HDFS, MapReduce, and Hadoop Common – and sports an ecosystem of various smaller subprojects (Hive, etc.).

Usually a Hadoop cluster is a mess of stock 1 RU servers with 4x1TB SATA disks in them.  “I like my servers like I like my women – cheap and dirty,” Jeff did not say.

HDFS:

  • Pools servers into a single hierarchical namespace
  • It’s designed for large files, written once/read many times
  • It does checksumming, replication, compression
  • Access is from Java, C, the command line, etc. (example below).  Not usually mounted at the OS level.

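For a taste of the command-line access, the basic file operations look like this – a minimal sketch assuming a 0.20-era hadoop client on the path and a made-up /user/jdoe home dir:

  hadoop fs -mkdir /user/jdoe/logs                  # directories in the single namespace
  hadoop fs -put access.log /user/jdoe/logs/        # write once...
  hadoop fs -ls /user/jdoe/logs                     # ...read many times
  hadoop fs -cat /user/jdoe/logs/access.log | head
  hadoop fs -rm /user/jdoe/logs/access.log
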
MapReduce:

  • Is a fault tolerant data layer and API for parallel data processing
  • Has a key/value pair model
  • Access is via Java, C++, streaming (for scripts – see the sketch after this list), SQL (Hive), etc.
  • Pushes work out to the data

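Since streaming is the scripting route in, here’s a trivial streaming job in the spirit of the cat/wc example from the streaming docs – the jar path and input/output dirs are assumptions for a 0.20-era install:

  hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
    -input  /user/jdoe/logs \
    -output /user/jdoe/counts \
    -mapper  /bin/cat \
    -reducer /usr/bin/wc        # map passes records through; reduce just counts them
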
Subprojects:

  • Avro (serialization)
  • HBase (like Google BigTable)
  • Hive (SQL interface)
  • Pig (language for dataflow programming)
  • ZooKeeper (coordination for distrib. systems)

Facebook used Scribe (a log aggregation tool) to pull a big wad of info into Hadoop, then published it out to MySQL for the user dash and to Oracle RAC for internal use…
Yahoo! uses it too.

Sample projects Hadoop would be good for – log/message warehouse, database archival store, search team projects (autocomplete), targeted web crawls…
For boxes you can use unused desktops, retired DB servers, Amazon EC2…

Tools they use to build Hadoop include Subversion/JIRA/Ant/Ivy/JUnit/Hudson/Javadoc/Forrest.
It uses the Apache 2.0 license.

Good configs for Hadoop:

  • use 7200 RPM SATA, ECC RAM, 1U servers
  • use Linux with ext3 (or maybe XFS), mounted noatime
  • JBOD disk config, no RAID (checked in the sketch after this list)
  • Java 6 update 14 or later

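Two quick checks that a box actually matches that spec – mount options and the JBOD data directories (dfs.data.dir is the 0.20-era property name; paths are made up):

  mount | grep /data                        # each disk should be its own ext3, noatime mount
  grep -A1 dfs.data.dir conf/hdfs-site.xml  # should list one directory per spindle, comma separated:
  #   <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
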
To manage it –

unix utes: sar, iostat, iftop, vmstat, nfsstat, strace, dmesg, and friends

java utes: jps, jstack, jconsole
Get the rpm!  http://www.cloudera.com/hadoop

config: my.cloudera.com
modes – standalone, pseudo-distributed, distributed
“It’s nice to use dsh, cfengine/puppet/bcfg2/chef for config management across a cluster; maybe use scribe for centralized logging”

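Here’s roughly what the dsh half of that looks like – this assumes the Debian dsh with a “datanodes” machine group already defined, so treat it as a sketch:

  dsh -M -g datanodes -c -- 'df -h ; uptime'   # -M tags output with hostnames, -c runs in parallel
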
I love hearing what tools people are using, that’s mainly how I find out about new ones!

Common Hadoop problems:

  • “It’s almost always DNS” – use hostnames
  • open ports
  • distribute ssh keys (expect helps)
  • write permissions
  • make sure you’re using all the disks
  • don’t share NFS mounts for large clusters
  • set JAVA_HOME to the new JVM (stick to Sun’s) – a few quick sanity checks are sketched below

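A few of those checks as one-liners – the hostnames are made up and 8020 is just a common NameNode RPC port from 0.20-era configs:

  hostname -f && host $(hostname -f)        # forward and reverse DNS should agree on every node
  ssh datanode01 true && echo "ssh ok"      # passwordless ssh should just work
  nc -z namenode01 8020 && echo "namenode rpc port reachable"
  echo $JAVA_HOME                           # should point at the Sun JVM you actually installed
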
HDFS In Depth

1.  NameNode (master)
The VERSION file shows the data structure versions; the filesystem image is held in memory and the edit log is persisted – if those formats change, upgrades are painful

2.  Secondary NameNode (aka checkpoint node) – checkpoints the FS image and then truncates the edit log; usually run on a separate node
The new backup node in 0.21 removes the need to write to an NFS mount for HA

3.  DataNode (workers)
stores data in the local filesystem
stores data in blk_<id> files, round-robining through its directories (see the listing below)
heartbeats to the NameNode
serves data to clients over a raw socket

4.  Client (Java HDFS lib)
other bindings (libhdfs) are less stable

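Roughly what that looks like on disk – the paths below are placeholders for whatever dfs.name.dir and dfs.data.dir are set to:

  ls /data/1/dfs/nn/current/          # namenode: VERSION, fsimage, edits, fstime
  ls /data/1/dfs/dn/current/ | head   # datanode: VERSION plus blk_<id> files and their .meta checksums
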
HDFS operator utilities (examples after the list):

  • safe mode – the read-only state the NameNode sits in while it starts up
  • fsck – Hadoop’s own version, not the OS one
  • dfsadmin
  • block scanner – runs every 3 weeks, has a web interface
  • balancer – examines the ratio of used to total capacity across the cluster
  • har (like tar) archives – bunch up smaller files
  • distcp – parallel copy utility (uses MapReduce) for big loads
  • quotas

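Illustrative invocations of the utilities above, using 0.20-era syntax – the paths and numbers are made up:

  hadoop dfsadmin -report                      # capacity and per-datanode status
  hadoop dfsadmin -safemode get                # is the namenode still read-only?
  hadoop fsck / -files -blocks -locations      # hadoop's fsck, not the OS one
  hadoop balancer -threshold 10                # run until nodes are within 10% of mean utilization
  hadoop archive -archiveName logs.har /user/jdoe/logs /user/jdoe   # bunch up small files
  hadoop distcp hdfs://nn1:8020/data hdfs://nn2:8020/data           # parallel copy between clusters
  hadoop dfsadmin -setQuota 100000 /user/jdoe  # cap the number of names under a directory
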
has users, groups, and permissions – including an x bit, but since there is no execution it only matters for dirs
Hadoop has some access/trust issues – use it through a gateway cluster or in a trusted env
audit logs – turn them on in log4j.properties (snippet below)

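For the audit logs, the knob in a 0.20-era log4j.properties is the NameNode audit logger, which ships at WARN – flipping it to INFO is the “turn on” step (treat the exact logger name as an assumption from that era’s stock config):

  grep FSNamesystem.audit $HADOOP_HOME/conf/log4j.properties
  # log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=WARN   <- change WARN to INFO
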
has loads of Web UIs – on the NameNode go to /metrics, /logLevel, /stacks
non-HDFS access – HDFS proxy for HTTP, or thriftfs
has trash (.Trash in home dir) – turn it on (config below)

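The web endpoints answer plain HTTP so they’re easy to script against, and trash stays off until you give it an interval – the hostname is made up and 50070 is the 0.20-era default NameNode web port:

  curl http://namenode01:50070/metrics
  curl http://namenode01:50070/stacks
  # trash: set fs.trash.interval (minutes) in core-site.xml, e.g.
  #   <property><name>fs.trash.interval</name><value>1440</value></property>
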
includes benchmarks – TestDFSIO, NNBench

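Running TestDFSIO looks like this – the test jar name varies by release, so the glob below is an assumption (fileSize is in MB):

  hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
  hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -read  -nrFiles 10 -fileSize 1000
  hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -clean
  hadoop jar $HADOOP_HOME/hadoop-*-test.jar nnbench   # namenode load test; see its usage output for flags
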
Common HDFS problems

  • disk capacity, esp. due to log file sizes – crank up reserved space (see the config sketch after this list)
  • slow-but-not-dead disks and flapping NICs dropping to a slow mode
  • checkpointing and backing up metadata – monitor that it happens hourly
  • losing the write pipeline for long-lived writes – restarting the write every hour is recommended
  • upgrades
  • many small files

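“Crank up reserved space” maps to a per-volume DataNode property – the property name is from 0.20-era configs and the value (10 GB per disk) is just an example:

  grep -A1 dfs.datanode.du.reserved conf/hdfs-site.xml
  #   <value>10737418240</value>    <- bytes reserved per volume for non-HDFS use (logs, etc.)
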
MapReduce

use the Fair Share or Capacity scheduler
use the distributed cache to ship side data to tasks
JobControl for job ordering

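Wiring in the Fair Scheduler is a one-property change in mapred-site.xml (it ships as a contrib jar in 0.20), and side data can be pushed into the distributed cache from the command line via the generic -files option – the job jar, class, and paths below are made up, and the job is assumed to use ToolRunner:

  grep -A1 mapred.jobtracker.taskScheduler conf/mapred-site.xml
  #   <value>org.apache.hadoop.mapred.FairScheduler</value>
  hadoop jar myjob.jar MyJob -files /local/lookup.txt /user/jdoe/in /user/jdoe/out
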
Monitoring – they use Ganglia, JConsole, Nagios, and canary jobs for functionality checks

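A canary job can be as dumb as cron firing a tiny job from the stock examples jar and alerting when it fails or stops touching a timestamp file – the paths here are illustrative:

  #!/bin/sh
  # run a trivial mapreduce job end to end; nagios can alert on a missing or stale timestamp file
  hadoop jar $HADOOP_HOME/hadoop-*-examples.jar pi 2 10 >/var/log/hadoop-canary.log 2>&1 \
    && date > /var/run/hadoop-canary.ok
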
Question – how much admin resource would you need for Hadoop?  Answer – Facebook’s ops team had 20% of 2 guys hadooping; estimate you can use 1 person per 100 nodes

He also notes that this preso and maybe more are on slideshare under “jhammerb.”

I thought this presentation was very complete and bad ass, and I may have some use cases that hadoop would be good for coming up!

Filed under Conferences, DevOps
