MapR M7 - Meetup

MapR M7

Making HBase Easy, Dependable and Fast

Tomer Shiran [email protected]

©MapR Technologies - Confidential

MapR Distribution for Apache Hadoop

§ Open, enterprise-grade distribution for Hadoop
  – Easy, dependable and fast
  – Open source with standards-based extensions
  – Includes numerous Apache-licensed open source projects
    • Hive, Pig, Cascading, HBase, ZooKeeper, HCatalog, Flume, Sqoop, Whirr, Mahout, Oozie
    • Integrated, tested and hardened
§ MapR is deployed at thousands of companies
  – From small Internet startups to the world's largest enterprises
§ MapR customers analyze massive amounts of data:
  – Hundreds of billions of events daily
  – 90% of the world's Internet population monthly
  – $1 trillion in retail purchases annually

MapR Distribution for Apache Hadoop

[Architecture diagram. Components shown: MapR Control System (MCS); MapR Distribution for Apache Hadoop (Hive, Pig, Cascading, Oozie, Mahout, Flume, HCatalog, Sqoop, Whirr, Apache HBase, MapReduce); MapR Data Platform (MDP) with NFS interface and HDFS API.]

MapR combines 10+ Apache-licensed open source projects with MapR's innovation

HBase Adoption

§ Used by 45% of Hadoop users
§ Database Operations:
  – Large scale key-value store
  – Blob store
  – Lightweight OLTP
§ Real-time Analytics:
  – Logistics
  – Shopping cart
  – Billing
  – Auction engine
  – Log analysis

HBase: A Compelling Alternative

HBase Advantages
§ Scale
§ Strong Consistency
§ Leverage of Hadoop

Current MapR HBase Options
§ Open Source Free HBase
§ Open Source with Support

Issues Constraining HBase Deployments

Reliability
• Compactions disrupt operations
• Very slow crash recovery
• Unreliable splitting

Business continuity
• Common hardware/software issues cause downtime
• Administration requires downtime
• No point-in-time recovery
• Complex backup process

Performance
• Many bottlenecks result in low throughput
• Limited data locality
• Limited # of tables

Manageability
• Compactions, splits and merges must be done manually (in reality)
• Basic operations like backup or table rename are complex

Announcing M7

EASY, DEPENDABLE, FAST

§ Snapshots
§ No Compactions
§ No RegionServers
§ Mirroring
§ No Manual Splits
§ Consistent Low Latency
§ Region Recovery in Seconds

M7 Enterprise Grade

Extending the MapR Data Platform

[Architecture diagram. Components shown: MapR Control System (MCS); MapR Distribution for Apache Hadoop (Hive, Pig, Cascading, Oozie, Mahout, Flume, HCatalog, Sqoop, Whirr, Apache HBase, MapReduce); MapR Data Platform (MDP) with HBase API, NFS interface and HDFS API.]

Apache HBase on MapR

Limited data management, data protection and disaster recovery for tables.

M7 – An Integrated System for Unstructured and Structured Data

Unified Namespace for Files and Tables

$ pwd
/mapr/default/user/dave

$ ls
file1  file2  table1  table2

$ hbase shell
hbase(main):003:0> create '/user/dave/table3', 'cf1', 'cf2', 'cf3'
0 row(s) in 0.1570 seconds

$ ls
file1  file2  table1  table2  table3

$ hadoop fs -ls /user/dave
Found 5 items
-rw-r--r--   3 mapr mapr         16 2012-09-28 08:34 /user/dave/file1
-rw-r--r--   3 mapr mapr         22 2012-09-28 08:34 /user/dave/file2
trwxr-xr-x   3 mapr mapr          2 2012-09-28 08:32 /user/dave/table1
trwxr-xr-x   3 mapr mapr          2 2012-09-28 08:33 /user/dave/table2
trwxr-xr-x   3 mapr mapr          2 2012-09-28 08:38 /user/dave/table3

Fewer Layers

[Diagram: three software stacks, each listed top to bottom.]

Other Distributions w/ HBase: HBase, Java Virtual Machine (JVM), Distributed File System (DFS), Java Virtual Machine (JVM), Local File System (ext3), Disks
MapR M3/M5 w/ HBase: HBase, Java Virtual Machine (JVM), Distributed File System (DFS), Disks
MapR M7: Unified Platform, Disks

Why No RegionServers?

One network hop
No daemons to manage
One cache

Region Assignment

Instant Recovery

§ Apache HBase experiences an outage when any node crashes
  – Each RegionServer has a single log (WAL) that must be replayed before any region can be recovered
  – Typically 10-30 minutes
  – All regions served by that RegionServer cannot be accessed
§ M7 provides instant recovery
  – M7 uses small WALs
    • Multiple WALs per region vs. 1 per RegionServer (1000 regions)
  – Instant recovery on put
  – 1000-10000x faster recovery on get
§ How?
  – M7 leverages unique MapR-FS capabilities, not impacted by HDFS limitations
    • Append support
    • No limit to # of files
    • MapR-FS translates random writes to sequential writes on disk
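The effect of log granularity on recovery can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not M7's or HBase's recovery code: the log is modeled as a list of (region, edit) pairs, the function names are hypothetical, and one WAL per region is used as a simplification of the slide's "multiple WALs per region".

```python
def replay_cost_shared_wal(wal, region):
    """One WAL per server: every entry is scanned before any region is usable."""
    return len(wal)


def replay_cost_per_region_wal(wal, region):
    """One WAL per region (simplifying assumption): only that region's entries are replayed."""
    return sum(1 for entry_region, _ in wal if entry_region == region)


# Hypothetical workload: 100,000 edits spread evenly over 1,000 regions.
wal = [(f"region-{i % 1000}", f"edit-{i}") for i in range(100_000)]

shared_cost = replay_cost_shared_wal(wal, "region-7")
per_region_cost = replay_cost_per_region_wal(wal, "region-7")
print(shared_cost, per_region_cost, shared_cost // per_region_cost)
```

In this model the work needed before a single region is readable shrinks in proportion to the number of regions sharing the log, which is the intuition behind the speedup the slide claims for reads.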


Background: Log-Structured Merge Trees

§ Traditional disk-based index structures like B-Trees are expensive to maintain in real-time
§ LSM Trees reduce the cost by deferring and batching index changes
§ Writes
  – Writes go to an in-memory index
    • And a commit log in case the node crashes and recovery is needed
  – The in-memory index is occasionally merged into the disk-based index
    • This may trigger a compaction
§ Reads
  – Reads hit the in-memory index and the disk-based index

[Diagram: a write goes to the index in memory and the log on disk; a read consults the index in memory and the index on disk.]
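The write and read paths of a log-structured merge tree can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not how HBase or M7 is implemented: the class name `ToyLSM`, the `memtable_limit` parameter, and the use of in-process dictionaries in place of on-disk files are all hypothetical.

```python
class ToyLSM:
    """Toy LSM tree: in-memory index, commit log, and sorted 'on-disk' segments."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}          # in-memory index
        self.log = []               # commit log, kept for crash recovery
        self.segments = []          # stand-in for on-disk indexes, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.log.append((key, value))   # log first, then update memory
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Merge the in-memory index into the disk-based structure as a new segment.
        self.segments.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.log = []                   # flushed edits no longer need the log

    def get(self, key):
        # Reads consult memory first, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            if key in segment:
                return segment[key]
        return None

    def compact(self):
        # Rewrite all segments into one; newer values overwrite older ones.
        merged = {}
        for segment in self.segments:
            merged.update(segment)
        self.segments = [dict(sorted(merged.items()))]


db = ToyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # second put reaches the limit and triggers a flush
db.put("a", 3)      # newer value for "a" lives in memory
print(db.get("a"), db.get("b"))
```

Note that `compact` rewrites every stored value, which is where the extra disk I/O discussed on the following slides comes from.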

Background: Compactions

[Chart: response time (ms), with the compaction period marked.]

§ Compactions disrupt operations because the response time spikes
§ HBase users disable compactions and run them manually
  – Downtime on Sunday is better than downtime on Monday…

Eliminating Compactions

                        HBase-style                        LevelDB-style     M7
Examples                BigTable, HBase, Cassandra, Riak   Cassandra, Riak
WAF                     Low                                High              Low
RAF                     High                               Low               Low
I/O storms              Yes                                No                No
Disk space overhead     High (2x)                          Low               Low
Skewed data handling    Bad                                Good              Good
Rewrite large values    Yes                                No                Yes

Write-amplification factor (WAF): The ratio between writes to disk and application writes. Note that data must be rewritten in every indexed structure.
Read-amplification factor (RAF): The ratio between reads from disk and application reads.
Skewed data handling: When inserting values with similar keys (e.g., increasing keys, trending topic), do other values also need to be rewritten?
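The two ratios defined above reduce to simple arithmetic. The sketch below assumes hypothetical byte counts chosen only for illustration; the function name `amplification_factor` is not from the slides.

```python
def amplification_factor(disk_bytes, application_bytes):
    """Ratio of bytes moved at the disk to bytes requested by the application."""
    if application_bytes <= 0:
        raise ValueError("application_bytes must be positive")
    return disk_bytes / application_bytes


# Hypothetical write workload (MB): the application writes 100 MB, and the
# store writes it to the log, flushes it once, then rewrites it in compactions.
app_written = 100
disk_written = 100 + 100 + 300      # log + flush + compaction rewrites
waf = amplification_factor(disk_written, app_written)

# Hypothetical read workload (MB): 10 MB requested, 40 MB read from disk
# because several on-disk structures had to be consulted.
raf = amplification_factor(40, 10)

print(waf, raf)
```

A factor of 1.0 would mean the disk does exactly as much work as the application asked for; the table compares how far each design departs from that.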


Portability (In Both Ways)

§ HBase applications work as is with M7
  – No need to recompile
  – No vendor lock-in
§ Customers can also run Apache HBase on an M7 cluster
  – Recommended during a migration
  – Table names with a slash (/) are in M7, table names without a slash are in Apache HBase (this can be overridden to allow table-by-table migration)
§ Use standard CopyTable tool to copy a table from HBase to M7 and vice versa
  – hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=/user/tshiran/mytable mytable
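The naming rule from this slide (a slash in the table name means M7, no slash means Apache HBase) can be sketched as a small routing function. This is an illustration only: the slides say the rule can be overridden but do not say how, so the `overrides` mapping and the function name `backend_for` are hypothetical.

```python
def backend_for(table_name, overrides=None):
    """Return which store serves a table under the slide's naming rule."""
    overrides = overrides or {}
    if table_name in overrides:          # hypothetical per-table override
        return overrides[table_name]
    return "M7" if "/" in table_name else "Apache HBase"


print(backend_for("/user/tshiran/mytable"))
print(backend_for("mytable"))
print(backend_for("mytable", overrides={"mytable": "M7"}))
```

During a migration, a per-table override of this kind would let one table move to M7 while the rest keep resolving to Apache HBase.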


The Platform for Big Data

§ Unprecedented Hadoop and NoSQL capabilities integrated on an easy, dependable and fast platform
§ Supports Big Data operations ranging from batch analytics to real-time database operations
§ The only platform that makes Hadoop and HBase enterprise grade

MapR Editions

§ Control System
§ NFS Access
§ Performance
§ Unlimited Nodes
§ Free

§ Control System
§ NFS Access
§ Performance
§ High Availability
§ Snapshots & Mirroring
§ 24 X 7 Support
§ Annual Subscription

§ All the Features of M5
§ Simplified Administration for HBase
§ Increased Performance
§ Consistent Low Latency
§ Unified Snapshots, Mirroring

Also Available through: Compute Engine

Questions?

§ Want to join the beta?
  – Request access at www.mapr.com
§ Follow-up questions?
  – Tomer Shiran ([email protected])