Big data


SpagoBI and Talend jointly support Big Data scenarios

Monica Franceschini - SpagoBI Architect
SpagoBI Competency Center - Engineering Group

Copyright © 2013 Engineering Group, SpagoBI Competency Center. All rights reserved.

www.spagobi.org

Big-data

• Agenda
– Intro & definitions
– Layers
– Talend & SpagoBI
– SpagoBI big-data roadmap


Big Data - 3Vs

"Big data" is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Source: The Importance of 'Big Data': A Definition, Mark Beyer, Douglas Laney. Gartner, 21 June 2012.

VOLUME The increase in data volumes within enterprise systems is caused by transaction volumes and other traditional data types, as well as by new types of data. Too much volume is a storage issue, but too much data is also a massive analysis issue.

VARIETY IT leaders have always had an issue translating large volumes of transactional information into decisions; now there are more types of information to analyze, mainly coming from social media and mobile (context-aware). Variety includes tabular data (databases), hierarchical data, documents, e-mail, metering data, video, still images, audio, stock ticker data, financial transactions and more.

VELOCITY This involves streams of data, structured record creation, and availability for access and delivery. Velocity means both how fast data is being produced and how fast the data must be processed to meet demand.

Source: Gartner Press Release, “Gartner Says Solving ‘Big Data’ Challenge Involves More Than Just Managing Volumes of Data”, June 27, 2011.


Big Data - 3Vs & more

VARIABILITY Variance in meaning, in lexicon.

VERACITY 1 in 3 business leaders don’t trust the information they use to make decisions. How can you act upon information if you don’t trust it? Establishing trust in big data presents a huge challenge as the variety and number of sources grow.

VALUE The economic value of different data varies significantly. Typically there is good information hidden amongst a larger body of nontraditional data; the challenge is identifying what is valuable and then transforming and extracting that data for analysis.


Big data - Layers

• Infrastructure
– On-site
– IaaS

• Data management (ETL):
– capture
– cleaning
– loading
– store

• View and Analyse (Business Intelligence)
– Text analysis
– Text mining
– exploration, navigation, presentation

• Application (Services)
– Cloud
– SaaS
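The data management layer above (capture, cleaning, loading, store) can be illustrated with a minimal sketch. It is not taken from the slides and does not use Talend: the sample feed, the table name `readings`, and all function names are hypothetical, and an in-memory SQLite database stands in for the real store.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for a captured feed (not from the slides).
RAW_FEED = """sensor_id,reading
s1, 20.5
s2,not-a-number
s1,21.0
,19.0
"""


def capture(text):
    """Capture: read raw records from a source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(records):
    """Cleaning: drop rows with a missing id or a non-numeric reading."""
    cleaned = []
    for row in records:
        sensor_id = (row["sensor_id"] or "").strip()
        try:
            reading = float(row["reading"])
        except (TypeError, ValueError):
            continue  # unparsable reading: discard the row
        if sensor_id:
            cleaned.append((sensor_id, reading))
    return cleaned


def load_and_store(rows, conn):
    """Loading / store: write the cleaned rows into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT, reading REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run all stages, then return the average reading per sensor."""
    load_and_store(clean(capture(text)), conn)
    return conn.execute(
        "SELECT sensor_id, AVG(reading) FROM readings GROUP BY sensor_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline(RAW_FEED, sqlite3.connect(":memory:")))
```

Keeping each stage as a separate function mirrors the layer breakdown on the slide; the final query is the point where a BI tool would take over from the ETL side.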


Big data & Business Intelligence

• Tasks:
– Manage big-data (ETL) → Talend
– Read, interpret and show big-data (BI) → SpagoBI
– Big-data and real-time (BI) → SpagoBI


Talend - Big Data Management

[Diagram: Big Data Production, Big Data Management, Big Data Consumption]

• Big Data Production: RDBMS, Analytical DB, NoSQL DB, ERP/CRM, SaaS, Social Media, Web Analytics, Log Files, RFID, Call Data Records, Sensors, Machine-Generated
• Big Data Management: Big Data Integration, Big Data Quality
• Big Data Consumption: Mining, Analytics
• Other diagram labels: Storage, Processing, Filtering, Parsing, Checking, Search, Enrichment

Turn Big Data into actionable information


Talend Goal: democratize Big Data

Talend Open Studio for Big Data: “Big Data for the Masses”
– Improves efficiency of big data job design with graphic interface
– Abstracts and generates code
– Run transforms inside Hadoop
– Native support for HDFS, Sqoop, HBase, HCatalog, Mahout, Pig, Hive & MapReduce code generation

…an open source ecosystem
– Apache License 2.0
– Embedded in Hortonworks Data Platform
– Certified with Cloudera, MapR and Greenplum
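As a rough illustration of the programming model behind "MapReduce code generation", the sketch below runs the map, shuffle and reduce phases of a word count in plain Python. It is not the code Talend generates (the slides do not show it) and it does not run on Hadoop; the function names are hypothetical and chosen only to label the phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Chain the three phases over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big insight"]))
```

On a cluster the map and reduce calls would be distributed across nodes; a graphical job designer hides this boilerplate behind components, which is the efficiency gain the slide refers to.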


ETL: Analytical databases & appliances

Connectors from/to:
– Greenplum
– Netezza
– Sybase
– Teradata
– VectorWise
– Vertica
– HDFS
– HBase
– Hive
– Cassandra
– MongoDB


SpagoBI - load

Certified appliances:
– Teradata
– VectorWise

Connectors from:
– Cassandra
– HBase
– Hive
– Impala
– Hadoop

Real-time (RT) with:
– Storm
– WSO2

More:
– Scheduled data-set
– In-memory data set


SpagoBI - meaning

Connectors from:
– Neo4J
– Freebase
– OrientDB

Support for open standards:
– RDF (Resource Description Framework) http://www.w3.org/RDF/
– OWL (Web Ontology Language) http://www.w3.org/OWL/
– R
– Mahout
– Text mining
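RDF, listed above, models data as subject-predicate-object triples. The sketch below shows that idea with plain Python tuples and a wildcard pattern match. It is an illustration only: it uses no RDF library and no SpagoBI API, and the `http://example.org/` URIs and the `connectsTo` predicate are made up for the example.

```python
# Hypothetical namespace and predicate, invented for this illustration.
EX = "http://example.org/"

# Each triple is (subject, predicate, object), the basic RDF statement shape.
TRIPLES = {
    (EX + "SpagoBI", EX + "connectsTo", EX + "Neo4J"),
    (EX + "SpagoBI", EX + "connectsTo", EX + "OrientDB"),
    (EX + "Talend", EX + "connectsTo", EX + "MongoDB"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # List everything stated about the SpagoBI resource.
    for subject, predicate, obj in match(TRIPLES, s=EX + "SpagoBI"):
        print(subject, predicate, obj)
```

Because every statement has the same three-part shape, data from graph stores and ontologies can be merged and queried uniformly, which is what makes such standards useful for the "meaning" step.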


SpagoBI - show

– Explorative front-end
– Network analysis
– Exploration In-memory
– Data visualization


SpagoBI - roadmap

• Capture / Store
  – Talend, connector to/from:
    • Greenplum
    • Netezza
    • Sybase
    • Teradata
    • VectorWise
    • Vertica
    • HDFS
    • HBase
    • Hive
    • Cassandra
    • MongoDB
    • …

• Load
  – Certified appliances:
    • Teradata
    • VectorWise
  – Connectors from:
    • Cassandra
    • HBase
    • Hive
    • Impala
    • Hadoop
    • MongoDB
  – RT with:
    • Storm
    • WSO2
  – More:
    • Scheduled data-set
    • In-memory data set

SpagoBI - roadmap

• Meaning
  – Connectors from:
    • Neo4J
    • Freebase
    • OrientDB
  – Support for open standards:
    • RDF
    • OWL
  – Mining
    • R
    • MashR
    • Text mining

• Show
  – Explorative front-end
  – Network analysis
  – Data visualization

• Services
  – Big data as a service
    • Multitenant
    • Cloud
    • BI as a service (ad-hoc+self-service)

Data scientist


Bundle Talend - SpagoBI

SpagoBI and Talend announce their bundle!

The bundle will provide:
– a distribution of both tools interacting with each other
– a use-case that can be run to explore their functionalities


[email protected] @twittmonique
