Big Data is about the VOLUME, VARIETY and VELOCITY of data. Let us see how the SAP HANA platform fulfills the 3 V (volume, variety and velocity) challenges of Big Data.
VOLUME
The volume of data is increasing day by day and is expected to reach 40 zettabytes by 2020, so the challenge for Big Data is to store this high volume of data. SAP HANA has successfully addressed the volume aspect of Big Data by fortifying the SAP HANA platform. Following are two game-changing features in the SAP HANA platform related to data volume.
- SAP HANA and HADOOP integration
- Dynamic Tiering
SAP HANA and HADOOP integration
Hadoop makes it possible to store a virtually unlimited volume of data using its distributed file system. With the release of SPS09, SAP HANA is very tightly integrated with Hadoop. Following are the SAP HANA and Hadoop integration options:
- SDA (Smart Data Access)
- SAP Data Services
- SAP BO-IDT (Information Design Tool)
- HANA XS Engine and Hadoop Hbase
SMART DATA ACCESS: Smart Data Access (SDA) provides SAP HANA with data virtualization capabilities. This technology allows you to create a virtual table that combines SAP HANA data with other heterogeneous data sources such as Hadoop, Teradata, MS SQL Server, Oracle, SAP Sybase ASE, SAP Sybase IQ and SAP HANA.
In SAP HANA SPS07, HANA connects to HIVE:
- Hadoop Pig and HBase do not support ODBC, but HIVE does. SAP HANA can connect to Hadoop HIVE virtually using SDA. Please refer to the installation and configuration guide for Hadoop and SAP HANA using SDA.
- SAP HANA Academy video tutorial to configure Hadoop HIVE with SAP HANA using SDA.
- SQL script to create a remote data source to Hadoop HIVE:
CREATE REMOTE SOURCE HIVE
ADAPTER "hiveodbc"
CONFIGURATION 'DSN=HIVE'
WITH CREDENTIAL TYPE 'PASSWORD'
USING 'user=hive;password=hive';
- Create a virtual table on the HIVE remote data source and consume it in the HANA catalog.
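As a minimal sketch of that last step (the virtual table and remote HIVE table names here are hypothetical; the four-part path after AT is remote source, remote database, schema and table):

CREATE VIRTUAL TABLE "SYSTEM"."VT_HIVE_SALES"
AT "HIVE"."HIVE"."default"."sales";

-- The virtual table can then be queried like any local HANA catalog table;
-- SDA pushes the work down to HIVE where possible.
SELECT COUNT(*) FROM "SYSTEM"."VT_HIVE_SALES";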
In SAP HANA SPS08, HANA connects to Apache Spark:
- SQL script to create a remote data source to Hadoop Spark:
CREATE REMOTE SOURCE SPARK
ADAPTER "hiveodbc"
CONFIGURATION 'DSN=SPARK'
WITH CREDENTIAL TYPE 'PASSWORD'
USING 'user=hive;password=SHa12345';
- Create a virtual table on the SPARK remote data source and consume it in the HANA catalog.
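A sketch of what that enables (all table and column names here are hypothetical, and a remote source pointing at Spark, named SPARK, is assumed): a virtual table over a Spark SQL table can be joined with a native HANA table in a single federated query.

CREATE VIRTUAL TABLE "SYSTEM"."VT_SPARK_EVENTS"
AT "SPARK"."SPARK"."default"."events";

-- Federated query: HANA in-memory data joined with data living in Spark.
SELECT c."NAME", COUNT(*) AS "EVENT_COUNT"
FROM "SYSTEM"."CUSTOMERS" c
JOIN "SYSTEM"."VT_SPARK_EVENTS" e ON e."customer_id" = c."ID"
GROUP BY c."NAME";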
- SAP HANA Academy video tutorial to configure Hadoop Spark with SAP HANA.
- HANA Academy: SAP Smart Data Access for Apache Spark - Video 1 of 4. - YouTube
- HANA Academy: SAP Smart Data Access for Apache Spark - Video 2 of 4. - YouTube
- HANA Academy: SAP Smart Data Access for Apache Spark - Video 3 of 4. - YouTube
- HANA Academy: SAP Smart Data Access for Apache Spark - Video 4 of 4. - YouTube
In SAP HANA SPS09, HANA connects directly to Hadoop HDFS:
- Create a Map Reduce Archives package in the SAP HANA Development Perspective using Java
- Create a remote data source directly to Hadoop HDFS:
CREATE REMOTE SOURCE HADOOP_SOURCE
ADAPTER "hadoop"
CONFIGURATION 'webhdfs_url=<url:port>;webhcat_url=<url:port>'
WITH CREDENTIAL TYPE 'PASSWORD'
USING 'user=hive;password=hive';
- Create a virtual function:
CREATE VIRTUAL FUNCTION HADOOP_WORD_COUNT
RETURNS TABLE ("word" NVARCHAR(60), "count" INTEGER)
PACKAGE DEV01."DEV01.HanaShared::WordCount"
CONFIGURATION 'enable_remote_cache;mapred_jobchain=[{"mapred_input":"/data/mockingbird","mapred_mapper":"com.sap.hana.hadoop.samples.WordMapper","mapred_reducer":"com.sap.hana.hadoop.samples.WordReducer"}]'
AT HADOOP_SOURCE;
- Create a virtual UDF to connect directly to an HDFS file:
CREATE VIRTUAL FUNCTION HADOOP_PRODUCT_UDF()
RETURNS TABLE ("product_class_id" INTEGER, "product_id" INTEGER, "brand_name" VARCHAR(255))
CONFIGURATION 'datetime_format=yyyy-MM-dd HH:mm:ss;date_format=yyyy-MM-dd;time_format=HH:mm:ss;enable_remote_caching=true;cache_validity=3600;
hdfs_location=/apps/hive/warehouse/dflo.db/product'
AT HADOOP_SOURCE;
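Once created, virtual functions are consumed like table functions in plain SQL (a sketch; result rows are produced by the Hadoop job or HDFS file behind the HADOOP_SOURCE remote source, and repeated calls can be served from the remote cache when caching is enabled):

-- Runs the word-count map/reduce chain on the cluster and returns its output.
SELECT "word", "count"
FROM HADOOP_WORD_COUNT()
ORDER BY "count" DESC;

-- Reads the product file directly from HDFS.
SELECT * FROM HADOOP_PRODUCT_UDF();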
- SAP HANA Academy video tutorial to configure Hadoop HDFS with SAP HANA.
- SAP HANA Academy: SDA; Hadoop Enhancements - 1. Creating User [SPS09] - YouTube
- SAP HANA Academy: SDA; Hadoop Enhancements - 2. Map Reduce Archives [SPS09] - YouTube
- SAP HANA Academy: SDA; Hadoop Enhancements - 3. Remote Sources [SPS09] - YouTube
- SAP HANA Academy: SDA; Hadoop Enhancements - 4. Virtual Functions [SPS09] - YouTube
- SAP HANA Academy: SDA; Hadoop Enhancements - 5. Virtual UDFs [SPS09] - YouTube
CONNECT TO HADOOP USING SAP DATA SERVICES
- Select the File Format tab from the Local Object Library -> right-click on HDFS File and click New
- Provide the following parameter values in the HDFS File Format editor
- Name: HDFS
- Namenode host: <host name of hadoop installation>
- Namenode port: <hadoop port>
- Root Directory: <Hadoop file path> (e.g. /user/hadoop/input)
- File Name: hdfs_data.txt
- Click Save & Close, then double-click the created HDFS file again to view the file format.
- Create a Project -> Job -> Data Flow
- Drag the HDFS file onto the canvas and make it the source -> drag a Query transformation and a target table onto the data flow canvas and join them.
- Double-click the Query transformation and map schema IN to schema OUT.
- Execute the Job and view the data brought into HANA from Hadoop.
- Please refer to the HANA Academy video.
SAP BO(IDT)-HADOOP INTEGRATION
- We can create a universe on Hadoop HIVE using SAP BO IDT (Information Design Tool). Please read the complete document for installation and configuration of SAP BO IDT on HIVE.
HANA XSENGINE AND HADOOP HBASE
The HANA XS Engine can talk to Hadoop HBase via server-side JavaScript. Please refer to the following article for more details.
Streaming Real-time Data to HADOOP and HANA
DYNAMIC TIERING
Dynamic tiering is SAP HANA extended storage, an SAP IQ ES server integrated with an SAP HANA node; it has been included since SPS09. HOT data resides in SAP HANA in-memory storage, while WARM data resides in the IQ ES server's columnar storage on disk. This extends SAP HANA's terabyte-scale in-memory capacity with petabyte-scale columnar disk storage without using Hadoop.
HOT & WARM Table creation:
CREATE TABLE "SYSTEM"."SalesOrder_HOT" (
"ID" INTEGER NOT NULL,
"CUSTOMERID" INTEGER NOT NULL,
"ORDERDATE" DATE NOT NULL,
"FINANCIALCODE" CHAR(2) NULL,
"REGION" CHAR(2) NULL,
"SALESREPRESENTATIVE" INTEGER NOT NULL,
PRIMARY KEY("ID")
);
CREATE TABLE "SYSTEM"."SalesOrder_WARM" (
"ID" INTEGER NOT NULL,
"CUSTOMERID" INTEGER NOT NULL,
"ORDERDATE" DATE NOT NULL,
"FINANCIALCODE" CHAR(2) NULL,
"REGION" CHAR(2) NULL,
"SALESREPRESENTATIVE" INTEGER NOT NULL,
PRIMARY KEY("ID")
)USING EXTENDED STORAGE;
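With the two tables in place, aging data out of the in-memory HOT table into the extended-storage WARM table is plain SQL (a sketch; the cutoff date is hypothetical, and in practice the move would be wrapped in a transaction or a scheduled procedure):

-- Copy orders older than the cutoff into warm extended storage...
INSERT INTO "SYSTEM"."SalesOrder_WARM"
SELECT * FROM "SYSTEM"."SalesOrder_HOT"
WHERE "ORDERDATE" < '2014-01-01';

-- ...then remove them from the in-memory hot table to free memory.
DELETE FROM "SYSTEM"."SalesOrder_HOT"
WHERE "ORDERDATE" < '2014-01-01';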
Reference Document:
SAP HANA SPS 09 - Dynamic Tiering.pdf
Reference SAP HANA Academy Video:
SAP HANA Academy - SAP HANA Dynamic Tiering : Installation Overview [SPS 09] - YouTube
SAP HANA Academy - SAP HANA Dynamic Tiering: Introduction [SPS09] - YouTube