MapR M7
Making HBase Easy, Dependable and Fast
Tomer Shiran
[email protected]
©MapR Technologies - Confidential
MapR Distribution for Apache Hadoop
§ Open, enterprise-grade distribution for Hadoop
  – Easy, dependable and fast
  – Open source with standards-based extensions
  – Includes numerous Apache-licensed open source projects
    • Hive, Pig, Cascading, HBase, ZooKeeper, HCatalog, Flume, Sqoop, Whirr, Mahout, Oozie
    • Integrated, tested and hardened
§ MapR is deployed at 1000s of companies
  – From small Internet startups to the world's largest enterprises
§ MapR customers analyze massive amounts of data:
  – Hundreds of billions of events daily
  – 90% of the world's Internet population monthly
  – $1 trillion in retail purchases annually
MapR Distribution for Apache Hadoop
[Diagram labels: MapR Control System (MCS); MapR Distribution for Apache Hadoop: Hive, Pig, Cascading, Oozie, Mahout, Flume, HCatalog, Sqoop, Whirr, Apache HBase, MapReduce; MapR Data Platform (MDP): NFS interface, HDFS API]
MapR combines 10+ Apache-licensed open source projects with MapR's innovation
HBase Adoption
§ Used by 45% of Hadoop users
§ Database Operations:
  – Large scale key-value store
  – Blob store
  – Lightweight OLTP
§ Real-time Analytics:
  – Logistics
  – Shopping cart
  – Billing
  – Auction engine
  – Log analysis
HBase: A Compelling Alternative
HBase Advantages
§ Scale
§ Strong Consistency
§ Leverage of Hadoop
Current MapR HBase Options
§ Open Source Free HBase
§ Open Source with Support
Issues Constraining HBase Deployments
Reliability
  • Compactions disrupt operations
  • Very slow crash recovery
  • Unreliable splitting
Business continuity
  • Common hardware/software issues cause downtime
  • Administration requires downtime
  • No point-in-time recovery
  • Complex backup process
Performance
  • Many bottlenecks result in low throughput
  • Limited data locality
  • Limited # of tables
Manageability
  • Compactions, splits and merges must be done manually (in reality)
  • Basic operations like backup or table rename are complex
Announcing M7
EASY, DEPENDABLE, FAST
§ Snapshots
§ No Compactions
§ No RegionServers
§ Mirroring
§ No Manual Splits
§ Consistent Low Latency
§ Region Recovery in Seconds
M7: Enterprise Grade
Extending the MapR Data Platform
[Diagram labels: MapR Control System (MCS); MapR Distribution for Apache Hadoop: Hive, Pig, Cascading, Oozie, Mahout, Flume, HCatalog, Sqoop, Whirr, Apache HBase, MapReduce; MapR Data Platform (MDP): HBase API, NFS interface, HDFS API]
Apache HBase on MapR
Limited data management, data protection and disaster recovery for tables.
M7 – An Integrated System for Unstructured and Structured Data
Unified Namespace for Files and Tables
    $ pwd
    /mapr/default/user/dave
    $ ls
    file1 file2 table1 table2
    $ hbase shell
    hbase(main):003:0> create '/user/dave/table3', 'cf1', 'cf2', 'cf3'
    0 row(s) in 0.1570 seconds
    $ ls
    file1 file2 table1 table2 table3
    $ hadoop fs -ls /user/dave
    Found 5 items
    -rw-r--r--   3 mapr mapr   16 2012-09-28 08:34 /user/dave/file1
    -rw-r--r--   3 mapr mapr   22 2012-09-28 08:34 /user/dave/file2
    trwxr-xr-x   3 mapr mapr    2 2012-09-28 08:32 /user/dave/table1
    trwxr-xr-x   3 mapr mapr    2 2012-09-28 08:33 /user/dave/table2
    trwxr-xr-x   3 mapr mapr    2 2012-09-28 08:38 /user/dave/table3
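In the listing above, tables carry a leading `t` in the mode string where plain files carry `-`. The following is a minimal Python sketch, assuming exactly that listing format, of how a script could separate tables from files. The function name `split_listing` is hypothetical and is not part of any MapR or Hadoop tool.

```python
def split_listing(listing: str):
    """Split `hadoop fs -ls` style output into (files, tables) by mode flag."""
    files, tables = [], []
    for line in listing.splitlines():
        fields = line.split()
        # An entry has 8 fields: mode, replication, owner, group, size,
        # date, time, path. This also skips the "Found N items" header.
        if len(fields) < 8:
            continue
        mode, path = fields[0], fields[-1]
        if mode.startswith("t"):      # assumption: 't' marks a table
            tables.append(path)
        elif mode.startswith("-"):    # '-' marks a regular file
            files.append(path)
    return files, tables


SAMPLE = """Found 5 items
-rw-r--r-- 3 mapr mapr 16 2012-09-28 08:34 /user/dave/file1
-rw-r--r-- 3 mapr mapr 22 2012-09-28 08:34 /user/dave/file2
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:32 /user/dave/table1
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:33 /user/dave/table2
trwxr-xr-x 3 mapr mapr 2 2012-09-28 08:38 /user/dave/table3
"""

files, tables = split_listing(SAMPLE)
print("files: ", files)
print("tables:", tables)
```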
Fewer Layers (each stack listed top to bottom)
§ Other Distributions w/ HBase: HBase, Java Virtual Machine (JVM), Distributed File System (DFS), Java Virtual Machine (JVM), Local File System (ext3), Disks
§ MapR M3/M5 w/ HBase: HBase, Java Virtual Machine (JVM), Distributed File System (DFS), Disks
§ MapR M7: Unified Platform, Disks
Why No RegionServers?
§ One network hop
§ No daemons to manage
§ One cache
Region Assignment
Instant Recovery
§ Apache HBase experiences an outage when any node crashes
  – Each RegionServer has a single log (WAL) that must be replayed before any region can be recovered
  – Typically 10-30 minutes
  – All regions served by that RegionServer cannot be accessed
§ M7 provides instant recovery
  – M7 uses small WALs
    • Multiple WALs per region vs. 1 per RegionServer (1000 regions)
  – Instant recovery on put
  – 1000-10000x faster recovery on get
§ How?
  – M7 leverages unique MapR-FS capabilities, not impacted by HDFS limitations
    • Append support
    • No limit to # of files
    • MapR-FS translates random writes to sequential writes on disk
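To illustrate why small per-region logs shorten recovery, here is a toy Python model. It is a sketch under stated assumptions, not M7's or HBase's actual recovery code: the workload (1000 regions, 10 edits each), the function names, and the use of one log per region (the slide says multiple) are all hypothetical simplifications. It only counts how many log entries must be scanned to bring one region back.

```python
from collections import defaultdict


def replay_shared_wal(wal, region):
    """Recover one region from a single per-server log: scan every entry."""
    scanned, state = 0, {}
    for entry_region, key, value in wal:
        scanned += 1
        if entry_region == region:
            state[key] = value
    return state, scanned


def replay_region_wal(wals, region):
    """Recover one region from its own small log: scan only its entries."""
    scanned, state = 0, {}
    for key, value in wals[region]:
        scanned += 1
        state[key] = value
    return state, scanned


# Hypothetical workload: edits from 1000 regions interleaved in one log.
NUM_REGIONS, EDITS_PER_REGION = 1000, 10
shared = [(r, f"k{i}", i)
          for i in range(EDITS_PER_REGION)
          for r in range(NUM_REGIONS)]

# The same edits, partitioned into one small log per region.
per_region = defaultdict(list)
for r, k, v in shared:
    per_region[r].append((k, v))

state_a, cost_a = replay_shared_wal(shared, region=42)
state_b, cost_b = replay_region_wal(per_region, region=42)

print("entries scanned, shared log:    ", cost_a)   # 10000
print("entries scanned, per-region log:", cost_b)   # 10
```

Both paths rebuild the same region state; the difference is only in how much log has to be read before the region can serve requests again.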
Background: Log-Structured Merge Trees
§ Traditional disk-based index structures like B-Trees are expensive to maintain in real-time
§ LSM Trees reduce the cost by deferring and batching index changes
§ Writes
  – Writes go to an in-memory index
    • And a commit log in case the node crashes and recovery is needed
  – The in-memory index is occasionally merged into the disk-based index
    • This may trigger a compaction
§ Reads
  – Reads hit the in-memory index and the disk-based index
[Diagram labels: Write, Read; Memory: Index; Disk: Log, Index]
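The write and read paths described above can be sketched as a toy LSM tree in Python. This is an illustrative sketch only: the class name `ToyLSM`, the `memtable_limit` parameter, and the use of in-process lists in place of files on disk are assumptions made for the example, and real systems add details (bloom filters, tombstones, leveled runs) that are omitted here.

```python
import bisect


class ToyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}      # in-memory index
        self.log = []           # commit log (a list stands in for a file)
        self.runs = []          # disk-based index: sorted runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.log.append((key, value))   # logged first, for crash recovery
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Merge the in-memory index into the disk-based index as a new run."""
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.log = []                   # flushed edits no longer need the log

    def get(self, key):
        """Reads hit the in-memory index first, then runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Rewrite all runs into one; newer values overwrite older ones."""
        merged = {}
        for run in self.runs:           # oldest to newest
            merged.update(run)
        self.runs = [sorted(merged.items())]


db = ToyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # memtable is full, so it is flushed as run 1
db.put("a", 3)
db.put("c", 4)      # flushed as run 2; "a" now exists in both runs
print(db.get("a"), db.get("b"), len(db.runs))
db.compact()        # every surviving value is rewritten into a single run
print(db.runs)
```

Note that `get` may have to look in several runs before compaction, and that `compact` rewrites values that did not change; those two costs are what the read and write amplification factors measure.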
Background: Compactions
[Chart: response time (ms), with a spike labeled "Compaction"]
§ Compactions disrupt operations because the response time spikes
§ HBase users disable compactions and run them manually
  – Downtime on Sunday is better than downtime on Monday…
Eliminating Compactions

                        HBase-style                       LevelDB-style    M7
Examples                BigTable, HBase, Cassandra, Riak  Cassandra, Riak
WAF                     Low                               High             Low
RAF                     High                              Low              Low
I/O storms              Yes                               No               No
Disk space overhead     High (2x)                         Low              Low
Skewed data handling    Bad                               Good             Good
Rewrite large values    Yes                               No               Yes

Write-amplification factor (WAF): The ratio between writes to disk and application writes. Note that data must be rewritten in every indexed structure.
Read-amplification factor (RAF): The ratio between reads from disk and application reads.
Skewed data handling: When inserting values with similar keys (e.g., increasing keys, trending topic), do other values also need to be rewritten?
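As a worked example of the two ratios defined above, the Python sketch below computes WAF and RAF from byte counts. All the numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements of HBase, LevelDB, or M7, and the function name `amplification` is made up for this example.

```python
def amplification(disk_bytes: float, application_bytes: float) -> float:
    """Ratio of bytes moved on disk to bytes the application asked for."""
    if application_bytes <= 0:
        raise ValueError("application_bytes must be positive")
    return disk_bytes / application_bytes


GB = 1024 ** 3
KB = 1024

# Hypothetical write path: the application writes 1 GB. The store writes it
# once to the commit log, once when flushing the in-memory index, and then
# rewrites it in each of 3 later compactions.
app_written = 1 * GB
disk_written = (1 + 1 + 3) * GB
waf = amplification(disk_written, app_written)

# Hypothetical read path: the application reads one 4 KB value, but the
# store has to read a 4 KB block from each of 3 on-disk files to find it.
app_read = 4 * KB
disk_read = 3 * 4 * KB
raf = amplification(disk_read, app_read)

print(f"WAF = {waf:.1f}")   # WAF = 5.0
print(f"RAF = {raf:.1f}")   # RAF = 3.0
```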
Portability (In Both Ways)
§ HBase applications work as is with M7
  – No need to recompile
  – No vendor lock-in
§ Customers can also run Apache HBase on an M7 cluster
  – Recommended during a migration
  – Table names with a slash (/) are in M7, table names without a slash are in Apache HBase (this can be overridden to allow table-by-table migration)
§ Use standard CopyTable tool to copy a table from HBase to M7 and vice versa
  – hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=/user/tshiran/mytable mytable
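The table-name rule above can be sketched in a few lines of Python. This is a sketch of the stated rule only, not MapR client code: the function name `backend_for` and the `overrides` dictionary are hypothetical, and the slide does not say how the override is actually configured.

```python
from typing import Dict, Optional


def backend_for(table_name: str,
                overrides: Optional[Dict[str, str]] = None) -> str:
    """Return which store serves a table under the slash rule.

    `overrides` is a hypothetical per-table mapping standing in for the
    override mechanism mentioned above; its real form is not given here.
    """
    if overrides and table_name in overrides:
        return overrides[table_name]
    # Names containing a slash are paths in the unified namespace (M7);
    # names without a slash are Apache HBase tables.
    return "M7" if "/" in table_name else "Apache HBase"


print(backend_for("/user/tshiran/mytable"))          # M7
print(backend_for("mytable"))                        # Apache HBase
print(backend_for("mytable", {"mytable": "M7"}))     # M7
```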
The Platform for Big Data
§ Unprecedented Hadoop and NoSQL capabilities integrated on an easy, dependable and fast platform
§ Supports Big Data operations ranging from batch analytics to real-time database operations
§ The only platform that makes Hadoop and HBase enterprise grade
MapR Editions
M3
§ Control System
§ NFS Access
§ Performance
§ Unlimited Nodes
§ Free
M5
§ Control System
§ NFS Access
§ Performance
§ High Availability
§ Snapshots & Mirroring
§ 24 X 7 Support
§ Annual Subscription
M7
§ All the Features of M5
§ Simplified Administration for HBase
§ Increased Performance
§ Consistent Low Latency
§ Unified Snapshots, Mirroring
Also Available through: [partner logos; legible text: "Compute Engine"]
Questions?
§ Want to join the beta?
  – Request access at www.mapr.com
§ Follow-up questions?
  – Tomer Shiran ([email protected])