Literature Review On Big Data

The term ‘Big Data’ appeared for the first time in 1998 in a Silicon Graphics (SGI) slide deck by John Mashey titled “Big Data and the Next Wave of InfraStress” [2]. Big Data mining was relevant from the beginning: the first book mentioning ‘Big Data’ is a data mining book by Weiss and Indurkhya that also appeared in 1998 [3]. However, the first academic paper with the words ‘Big Data’ in the title appeared a little later, in 2000, by Diebold [4]. The term ‘Big Data’ originates from the fact that we are creating a huge amount of data every day. Usama Fayyad [5], in his invited talk at the KDD BigMine’12 Workshop, presented striking figures about internet usage, among them the following: each …
The Big Data mining revolution is not restricted to the industrialized world, as mobile phones are spreading in developing countries as well. It is estimated that there are over five billion mobile phones, and that 80% of them are located in developing countries.

In 2001, Gartner introduced the 3Vs definition of data growth, which was then in its early stages. The 3Vs denote three dimensions: Volume, Velocity and Variety. The industry now recognizes 5Vs, adding Veracity and Value as further aspects of data. While volume, velocity and variety are familiar and remain major considerations in the data domain, veracity and value are aspects of modern data that pose major challenges. The growing number of data channels and sources, such as social media, has made users major contributors to and consumers of data. This is both a boon and a bane. While it opens a wide window into understanding consumers, it also creates a massive amount of junk at various levels.
Data influx happens in all shapes and sizes in terms of sources, formats, modes and feeds. Analytic tools therefore need to be smart enough to decipher these diverse kinds of data and to assimilate them, through advanced algorithm development, optimization and automation, into a uniform, consumable format.
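As a rough illustration of bringing varied feeds into one consumable format, the following Python sketch parses three input formats into a single record shape. The schema (`user`, `amount`), the three formats chosen and all function names are assumptions made for this example; they are not drawn from any particular Big Data tool.

```python
import csv
import io
import json

# Hypothetical uniform schema: every record becomes
# {"user": str, "amount": float}, whatever format it arrived in.

def from_json(text):
    """Parse a JSON array of objects."""
    return [{"user": str(r["user"]), "amount": float(r["amount"])}
            for r in json.loads(text)]

def from_csv(text):
    """Parse CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"user": row["user"], "amount": float(row["amount"])}
            for row in reader]

def from_kv(text):
    """Parse log-style lines such as 'user=cy amount=7.25'."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = dict(pair.split("=", 1) for pair in line.split())
        records.append({"user": fields["user"],
                        "amount": float(fields["amount"])})
    return records

# One parser per source format (illustrative names only).
PARSERS = {"json": from_json, "csv": from_csv, "kv": from_kv}

def normalize(feeds):
    """Merge (format_name, raw_text) pairs into one list of uniform records."""
    unified = []
    for fmt, raw in feeds:
        unified.extend(PARSERS[fmt](raw))
    return unified

feeds = [
    ("json", '[{"user": "ana", "amount": 12.5}]'),
    ("csv", "user,amount\nbob,3\n"),
    ("kv", "user=cy amount=7.25\n"),
]
records = normalize(feeds)
```

Once every feed has passed through `normalize`, downstream analysis only has to handle one record shape, which is the point the paragraph above makes about a uniform format.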
Data governance and security: Increased mobility and access to information have led to extensive discussion around data governance, protection and security. Industries such as banking, healthcare, pharma, and defense operate under strict compliance and regulatory mandates that make it difficult to create a proper data protection framework. It is not enough to have IT infrastructure and security in place. Data governance has taken on primary importance in these sectors, where the opportunity in Big Data is large but the risks can be huge.
Infrastructure and system architecture: While advanced technologies such as Hadoop and MapReduce are scaled to meet the 5Vs of big data, they place significant demands on infrastructure in terms of scale and of storage capacity that is efficient and cost-effective. Intelligent storage can make better use of capacity through data compression, automatic data tiering and data deduplication. The question is how much is needed to implement Big Data and how much is …