Big Data technology not only supports the ability to collect large amounts of data, but it also provides the ability to understand it and take advantage of its value. The goal of all organizations with access to large data collections should be to harness the most relevant data and use it for optimized decision making.
There are many different analytic techniques that can be brought to bear on one or more repositories of unstructured data. Textual analytics and semantics have been around for many years. However, these techniques have been brought into the limelight by the ubiquity of born-digital documents and the fact that modern processors can execute very sophisticated algorithms in a timely manner. Added to this is the potential additional insight made available by cross-linking to conventional BI analysis. The recent availability of huge amounts of data, along with advanced tools for exploratory data analysis, data mining and data visualization, offers a whole new way of understanding the world.
What is Big Data
Big Data means an immense amount of data that is difficult to collect, store, manage, and analyze with general-purpose database software.
- Traditionally we have used databases and Business Intelligence tools to make sense of the data.
- Big Data goes beyond conventional analytical tools and databases, and it deals with a variety of data.
- Traditional databases reported on historical data, whereas Big Data analysis can happen in real time.
- Big Data has the capacity to deal with data at petabyte and exabyte scale.
Big Data is defined by the dimensions below. Any data that fits one or more of these dimensions can be considered Big Data.
Volume:
- Machine-generated data is produced in much larger quantities, on the order of petabytes per day.
- A single jet engine can generate 10 TB of tracking data in 30 minutes; with 25,000 airline flights per day, the daily volume reaches petabytes.
- Structured data needs high-bandwidth and high-capacity storage.
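The jet engine figures above can be checked with a quick back-of-the-envelope calculation. The sketch below is only an illustration: the function name, the default of one engine per flight, and the 30-minute flight duration are assumptions chosen to give a conservative lower bound, not figures from any airline.

```python
TB_PER_PB = 1000  # decimal (SI) units: 1 PB = 1000 TB


def daily_volume_pb(tb_per_30_min, flights_per_day,
                    engines_per_flight=1, hours_per_flight=0.5):
    """Estimate daily engine-tracking data in petabytes.

    engines_per_flight and hours_per_flight are illustrative
    assumptions; the defaults give a conservative lower bound.
    """
    half_hours = hours_per_flight / 0.5
    tb_per_flight = tb_per_30_min * half_hours * engines_per_flight
    return tb_per_flight * flights_per_day / TB_PER_PB


# 10 TB per 30 minutes and 25,000 flights per day, as quoted above.
print(daily_volume_pb(10, 25_000))  # 250.0
```

Even with one engine and half an hour of flight time per flight, the estimate comes to 250 PB per day. Raising the assumed engine count or flight duration only pushes it higher.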
Velocity:
- Social media data streams produce a large influx of opinions and relationships valuable to customer relationship management.
- Even at 140 characters per tweet, the high velocity (or frequency) of Twitter data ensures large volumes (over 8 TB per day).
Variety:
- As new services are added, new sensors deployed, or new marketing campaigns executed, new data types are needed to capture the resultant information.
Value:
- The challenge lies in identifying what is valuable in the large volume of data, and then transforming and extracting that data for analysis.
Why Big Data
We are relying more and more on information and data in the decision-making process, and the volume of information that a business has to manage keeps increasing. In 2011, over 1.8 exabytes of data were created. By 2020, more than 35,000 exabytes (or 35 zettabytes) will have been created, which is nearly 20,000 times today’s amount.
Big Data - Features
- Real-time analysis of transactional and analytic data from disparate sources.
- Adaptable, powerful analytic models that expose insight.
- Integration with predictive analysis software and algorithms.
- Native full-text search, graphical search modeling, and user interface toolkit.
- Find valuable and actionable information in massive amounts of data.
- Accelerate business processes with rapid analysis and reporting.
- Invent new business models and processes.
- Reduce total cost of ownership (TCO) with less hardware and maintenance.
Big Data - Benefits
- Real-time business insight by analyzing business operations as they happen.
- Adaptable, powerful analytic models: create flexible views that expose analytic information at the speed of thought without assistance from IT.
- Extensive, source-agnostic data access: add external data to analytic models to incorporate data from across the entire organization.
- Multipurpose, in-memory technology: instantly explore and analyze all transactional and analytical data in real time from virtually any data source.
- Better decisions more quickly by gaining immediate access to all relevant information.
- Greater analytic flexibility through reduced reliance on IT.
- Dramatically reduced hardware and maintenance costs through a flexible, cost-effective, real-time approach for managing large data volumes.
- Improved planning, forecasting, and financial close processes by employing analytic models that uncover trends and patterns.
Big Data - Opportunities
- Gartner predicts that through 2015, 85% of Fortune 500 organizations will be unable to exploit Big Data for competitive advantage.
- There is a small but rapidly growing segment of Big Data projects that serve as an extension to on-premises enterprise BI projects, but they often deliver incremental value rather than being completely integrated into the organization-wide BI strategy.
- Gartner has observed that despite a significant amount of interest among large enterprises, the early deployments were either tactical, one-off projects or extensions of traditional BI strategies with additional analytics and data processing.
- A completely integrated Big Data strategy is almost nonexistent among enterprise users, and Gartner expects this situation to continue until viable solutions and proof points emerge.
- Big Data projects mean different things to different constituencies, and the costs and infrastructure planning elements also differ drastically.
- Service providers and high-performance computing (HPC) environments that deal with vast amounts of unstructured, fast-moving real-time data are likely to deploy "greenfield" Big Data projects faster than enterprise users with large investments in traditional BI tools and technologies.
A Few Case Studies on Big Data
Case Study 1: eBay uses big data to make you buy more
- With 180 million active buyers and sellers on eBay, the website generates a lot of data. At any given point in time, there can be around 350 million items listed for sale, with over 250 million queries made per day through eBay's auction search engine.
- eBay uses big data to predict whether a listed item will sell and how much it will sell for, which affects how high the item ranks in the auction site's search results.
- eBay optimizes its search engine and the results it returns by making tweaks based on customer behaviour patterns deciphered from the collected data.
- eBay alters and rewrites the search queries that users enter, adding synonyms and alternative terms so that it can bring up more relevant results.
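The query rewriting described in the last point can be pictured with a small sketch. This is not eBay's implementation: the synonym table, the function name `expand_query`, and the AND/OR output syntax are all hypothetical, chosen only to show how adding synonyms widens a search.

```python
# Hypothetical synonym table; a real system would derive this
# from behaviour data rather than hard-code it.
SYNONYMS = {
    "sneakers": ["trainers", "running shoes"],
    "tv": ["television"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Rewrite a query so each term also matches its synonyms."""
    groups = []
    for term in query.lower().split():
        alternatives = [term] + synonyms.get(term, [])
        # Quote multi-word alternatives so they stay together as phrases.
        quoted = [f'"{a}"' if " " in a else a for a in alternatives]
        if len(quoted) > 1:
            groups.append("(" + " OR ".join(quoted) + ")")
        else:
            groups.append(quoted[0])
    return " AND ".join(groups)


print(expand_query("red sneakers"))
# red AND (sneakers OR trainers OR "running shoes")
```

A listing titled "red trainers" would now match a search for "red sneakers", which is the kind of added relevance the case study describes.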
Case Study 2: Yahoo! Improves Campaign Effectiveness, Boosts Ad Revenue with Big Data Solution
Yahoo! operates one of the most popular websites in the world, with more than 700 million unique visitors a month globally. The company owns and operates an online advertising exchange for the large number of customers who purchase advertising on various Yahoo! sites. These customers use the exchange to better target and manage their ad campaigns. To give them more meaningful and useful analytical data faster, Yahoo! implemented a solution that takes data from its vast data stores within the Apache Hadoop open-source framework and ultimately moves it to Microsoft SQL Server 2008 R2. Using this solution, Yahoo! has improved campaign effectiveness, while advertisers have increased their spending with Yahoo! The company also provides more relevant advertising data, and the solution’s partitioning design means faster loading of large data sets.
Case Study 3: Big data catches illegal immigrants - Australia
- Government cloud computing and big data are changing the face of enterprise IT solutions within government agencies around the world.
- The Australian government is using big data to help assess who is an illegal immigrant at ports around the country.
- The new computerized system of Australia’s Department of Immigration and Citizenship has already helped catch hundreds of illegal travelers while saving taxpayers millions of dollars.
Case Study 4: TV Show Satyamev Jayate - India
Key Features of the show:
- The topics were uncomfortable to discuss in Indian society
- A 90-minute show, telecast between 11 AM and 12 PM (not prime time these days)
- Contained very thorough statistics and research about the topics
- Simultaneously telecast in multiple languages, with an unbelievable reach in India
Why was analytics necessary?
- To understand the pulse of the audience and measure the impact of the show.
- Has the audience been touched? How can success be measured?
- The goal was not to revolutionize society but to change the perception of the individual.
How did data analytics help the show close the loop with the audience?
- To understand how people's sentiments came across. Example: for the 1st episode, 99% of people responded positively. The data was sent to the Rajasthan CM, and he took action.
- To measure the impact of the show on society and analyze the response across different demographics.
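Measuring audience response by demographic comes down to a simple grouped aggregation. The sketch below assumes each response has already been labelled with a demographic group and a sentiment. The sample records and the function name are invented for illustration and are not data from the show.

```python
from collections import defaultdict

# Hypothetical sample: (demographic group, sentiment label).
responses = [
    ("18-25", "positive"),
    ("18-25", "positive"),
    ("18-25", "negative"),
    ("18-25", "positive"),
    ("26-40", "positive"),
    ("26-40", "negative"),
]


def positive_share_by_group(records):
    """Return the percentage of positive responses per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, sentiment in records:
        totals[group] += 1
        if sentiment == "positive":
            positives[group] += 1
    return {g: 100.0 * positives[g] / totals[g] for g in totals}


print(positive_share_by_group(responses))
# {'18-25': 75.0, '26-40': 50.0}
```

The same aggregation over all responses, ignoring the group, gives a single headline figure such as the 99% quoted for the first episode.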
Suparna Goswami Blog : Big Data & its Implications
http://scn.sap.com/people/suparna.goswami/blog/2012/11/12/big-data-and-its-implications
Siva Darivemula Blog : App to data or Data to App : Rise of the Data Platform
Thanks for reading this blog.
Suggestions are most welcome.
I will soon post another, more advanced blog on Big Data.