Channel: SCN : All Content - SAP HANA Developer Center

BIG DATA & SAP HANA-Part1


Big Data is about the VOLUME, VARIETY and VELOCITY of data. Let us see how the SAP HANA platform fulfills the 3 V's (Volume, Variety and Velocity) challenges of Big Data.

VOLUME

The volume of data is increasing day by day, and by 2020 it is projected to reach 40 zettabytes. The challenge for Big Data is therefore to store this high volume of data. SAP HANA has successfully addressed the volume aspect of Big Data by fortifying the SAP HANA platform. The following are two game-changing features in the SAP HANA platform related to data volume.


      • SAP HANA and HADOOP integration
      • Dynamic Tiering

SAP HANA and HADOOP integration

HADOOP facilitates storing a virtually unlimited volume of data using a distributed file system. With the release of SPS09, SAP HANA is very tightly integrated with Hadoop. The following are the SAP HANA and HADOOP integration options:

      • SDA (Smart Data Access)
      • SAP Data Services
      • SAP BO-IDT (Information Design Tool)
      • HANA XS Engine and Hadoop Hbase


SMART DATA ACCESS: Smart Data Access (SDA) provides SAP HANA with data virtualization capabilities. This technology allows you to create a virtual table that combines SAP HANA data with heterogeneous data sources such as HADOOP, TERADATA, MS SQL SERVER, ORACLE, SAP Sybase ASE, SAP Sybase IQ and SAP HANA itself.

            

In SAP HANA SPS07, HANA connects to HIVE:



    CREATE REMOTE SOURCE HIVE
    ADAPTER "hiveodbc"
    CONFIGURATION 'DSN=HIVE'
    WITH CREDENTIAL TYPE 'PASSWORD'
    USING 'user=hive;password=hive';


        • Create a virtual table on the HIVE remote data source and consume it in the HANA catalog.
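
A virtual table over the HIVE remote source is created with plain SQL and then behaves like a local catalog table. A minimal sketch, assuming a Hive table default.sales exists on the remote side (the schema, table and virtual table names here are illustrative, not from the article):

```sql
-- Create a virtual table pointing at the Hive table "sales" in Hive's "default" schema.
-- For Hive sources the remote database component is typically given as "<NULL>".
CREATE VIRTUAL TABLE "SYSTEM"."V_HIVE_SALES"
  AT "HIVE"."<NULL>"."default"."sales";

-- The virtual table now appears in the HANA catalog and can be queried directly;
-- the query is federated to Hive by Smart Data Access at runtime.
SELECT COUNT(*) FROM "SYSTEM"."V_HIVE_SALES";
```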


In SAP HANA SPS08, HANA connects to Apache SPARK:

                           

        • SQL script to create a remote data source to HADOOP SPARK


    CREATE REMOTE SOURCE SPARK
    ADAPTER "hiveodbc"
    CONFIGURATION 'DSN=SPARK'
    WITH CREDENTIAL TYPE 'PASSWORD'
    USING 'user=hive;password=SHa12345';


        • Create a virtual table on the SPARK remote data source and consume it in the HANA catalog.

     


In SAP HANA SPS09, HANA connects directly to Hadoop HDFS:


        • Create a MapReduce archive package in the SAP HANA Development perspective using Java
        • Create a remote data source directly to Hadoop HDFS


    CREATE REMOTE SOURCE HADOOP_SOURCE
    ADAPTER "hadoop"
    CONFIGURATION 'webhdfs_url=<url:port>;webhcat_url=<url:port>'
    WITH CREDENTIAL TYPE 'PASSWORD'
    USING 'user=hive;password=hive';


        • Create Virtual Function


    CREATE VIRTUAL FUNCTION HADOOP_WORD_COUNT
    RETURNS TABLE ("word" NVARCHAR(60), "count" INTEGER)
    PACKAGE DEV01."DEV01.HanaShared::WordCount"
    CONFIGURATION 'enable_remote_cache;mapred_jobchain=[{"mapred_input":"/data/mockingbird","mapred_mapper":"com.sap.hana.hadoop.samples.WordMapper","mapred_reducer":"com.sap.hana.hadoop.samples.WordReducer"}]'
    AT HADOOP_SOURCE;


        • Create a virtual UDF to connect directly to an HDFS file.


    CREATE VIRTUAL FUNCTION HADOOP_PRODUCT_UDF()
    RETURNS TABLE ("product_class_id" INTEGER, "product_id" INTEGER, "brand_name" VARCHAR(255))
    CONFIGURATION 'datetime_format=yyyy-MM-dd HH:mm:ss;date_format=yyyy-MM-dd;time_format=HH:mm:ss;enable_remote_caching=true;cache_validity=3600;
    hdfs_location=/apps/hive/warehouse/dflo.db/product'
    AT HADOOP_SOURCE;
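
Once created, a virtual function is consumed in the FROM clause like a table, with the columns declared in its RETURNS TABLE clause. A sketch of how the two functions above might be queried (result ordering and column selection are illustrative):

```sql
-- Trigger the MapReduce word-count job on the Hadoop cluster and read its output:
SELECT "word", "count"
FROM HADOOP_WORD_COUNT()
ORDER BY "count" DESC;

-- Read the product files in HDFS directly through the virtual UDF:
SELECT "product_id", "brand_name"
FROM HADOOP_PRODUCT_UDF();
```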


     

                                      

             

    CONNECT TO HADOOP USING SAP DATA SERVICES

        • Select the File Format tab in the Local Object Library, right-click on HDFS File and click New

                                       SDA4.png

        • Provide following parameter values in HDFS File Format editor
          • Name: HDFS
          • Namenode host: <host name of hadoop installation>
          • Namenode port: <hadoop port>
          • Root Directory: <Hadoop file path>, e.g. /user/hadoop/input
          • File Name: hdfs_data.txt

                                                  SDA5.png

        • Click Save & Close, then double-click the created HDFS file again to view the file format.

                                                    SDA6.png

        • Create Project -> Job -> Data Flow
        • Drag the HDFS file onto the canvas and make it the source; drag a Query transformation and a target table onto the data flow canvas and join them.

                                            SDA7.png

        • Double-click the Query transformation and map Schema In to Schema Out

                                             SDA8.png

        • Execute the job and view the data brought into HANA from Hadoop.

                                             SDA10.png


    SAP BO(IDT)-HADOOP INTEGRATION

     

                                            HADOOP_IDT.png

    HANA XSENGINE AND HADOOP HBASE

        

    HANA XSEngine can talk to Hadoop HBase via server-side JavaScript. Please refer to the following article for more details.

                                                 XSEngine.png

     

                        Streaming Real-time Data to HADOOP and HANA

        

    DYNAMIC TIERING

     

Dynamic tiering is SAP HANA extended storage: an SAP IQ ES server integrated with the SAP HANA node. Dynamic tiering was included in SPS09. HOT data resides in SAP HANA in-memory storage, while WARM data resides in the IQ ES server's columnar, petabyte-scale storage on disk. It provides an environment to extend terabyte-scale SAP HANA in-memory capacity to petabyte-scale columnar disk storage without using Hadoop.

     

         HOT & WARM Table creation:

     

    CREATE TABLE "SYSTEM"."SalesOrder_HOT" (
    "ID" INTEGER NOT NULL,
    "CUSTOMERID" INTEGER NOT NULL,
    "ORDERDATE" DATE NOT NULL,
    "FINANCIALCODE" CHAR(2) NULL,
    "REGION" CHAR(2) NULL,
    "SALESREPRESENTATIVE" INTEGER NOT NULL,
    PRIMARY KEY("ID")
    );

     

     

    CREATE TABLE "SYSTEM"."SalesOrder_WARM" (
    "ID" INTEGER NOT NULL,
    "CUSTOMERID" INTEGER NOT NULL,
    "ORDERDATE" DATE NOT NULL,
    "FINANCIALCODE" CHAR(2) NULL,
    "REGION" CHAR(2) NULL,
    "SALESREPRESENTATIVE" INTEGER NOT NULL,
    PRIMARY KEY("ID")
    ) USING EXTENDED STORAGE;
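
Both tables are used with identical SQL; only the storage tier differs. A sketch of aging data out of the hot table into extended storage (the cutoff date is illustrative, not from the article):

```sql
-- Move sales orders older than the cutoff into the warm (extended storage) table:
INSERT INTO "SYSTEM"."SalesOrder_WARM"
SELECT * FROM "SYSTEM"."SalesOrder_HOT"
WHERE "ORDERDATE" < '2014-01-01';

DELETE FROM "SYSTEM"."SalesOrder_HOT"
WHERE "ORDERDATE" < '2014-01-01';

-- Queries against the warm table run transparently against disk-based columnar storage:
SELECT COUNT(*) FROM "SYSTEM"."SalesOrder_WARM";
```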

     

     

    Reference Document:

    SAP HANA SPS 09 - Dynamic Tiering.pdf

     

    Reference SAP HANA Academy Video:

     

    SAP HANA Academy - SAP HANA Dynamic Tiering : Installation Overview [SPS 09] - YouTube

    SAP HANA Academy - SAP HANA Dynamic Tiering: Introduction [SPS09] - YouTube

