Big Data Challenges

CHALLENGES WITH BIG DATA MINING: A SURVEY

Libina Rose Sebastian
Dept. of Computer Science and Engineering
St. Joseph College of Engineering and Technology
Palai, Kerala, India
libina.libu@gmail.com

Mereen Thomas
Dept. of Computer Science and Engineering
St. Joseph College of Engineering and Technology
Palai, Kerala, India
mereen.thomas@gmail.com

Abstract: Big Data is a collection of data sets that are very large and complex. Data sets are growing day by day, and sharing, transfer, capture, and storage are among the main challenges in Big Data. Data mining discovers patterns in large data sets. Data mining with Big Data is a complex task. Here the HACE theorem is proposed, which finds Complex and Evolving relationships among data. It finds the characteristics…
Each blind man will conclude independently that the elephant feels like a hose, a tree, a wall, or a rope. But the actual situation is entirely different. When the elephant is continuously growing and changing, the problem becomes more complex. Each blind man may have his own information sources that give him biased knowledge about the elephant. Each collects information from a particular region, and the accuracy of the information collected may be low. The blind men cannot collect information within a fixed amount of time; the time required varies with the type and size of the data. Aggregating heterogeneous information from different sources (the blind men) helps to draw the best possible picture. Aggregation becomes difficult when the amount of data is large. The characteristics of such Big Data are modeled using the HACE theorem.

4. FEATURES

On social networking sites like Facebook, data take different forms such as text, images, and relations. One person is connected to more than one person. These kinds of relationships can be represented as graphs. Discovering and extracting useful information from these heterogeneous sources is complicated. Twitter also contains such information. Blogs and news sites are content based, so a large amount of content must be stored. It is difficult to store such huge information…
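The graph representation of social-network relationships described above can be sketched in a few lines of Python. This is only a minimal illustration and not the authors' method: the function name `build_graph`, the user names, and the choice of an adjacency-list dictionary are all assumptions made for the example.

```python
from collections import defaultdict


def build_graph(friendships):
    """Build an undirected adjacency list from (person, person) pairs."""
    graph = defaultdict(set)
    for a, b in friendships:
        # A friendship is mutual, so record the tie in both directions.
        graph[a].add(b)
        graph[b].add(a)
    return graph


# Hypothetical data: these users and ties are invented for illustration.
friendships = [("alice", "bob"), ("alice", "carol"), ("bob", "dave")]
graph = build_graph(friendships)

# One person can be connected to more than one person.
print(sorted(graph["alice"]))
```

An adjacency list keeps only the ties that exist, which matters when each person is linked to a small fraction of all users.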
Managing and mining Big Data is challenging because it is complex. The HACE theorem suggests that the key characteristics of Big Data are Heterogeneous and diverse data sources, Autonomous sources with distributed and decentralized control, and Complex and Evolving relationships. Big Data contains large volumes of data represented by heterogeneous and diverse dimensionalities. Different information collectors have their own ways of recording data. The nature of different applications also results in different representations. Data sets are generated from multiple autonomous sources. Being autonomous, a data source is able to generate and collect information without centralized control. The autonomous sources may create complicated conditions such as missing or uncertain data. Social networking sites like Twitter and Facebook have ties between nodes. These ties are represented using a sample-feature relationship representation. These ties make Big Data more…
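The heterogeneity and missing-data conditions described above can be illustrated with a small Python sketch. It is not taken from the paper: the two sources, their field names, and the normalizer functions are hypothetical, chosen only to show how autonomous collectors may record the same kind of information differently and leave some values missing.

```python
# Hypothetical records from two autonomous sources; all field names
# and values are invented for illustration.
source_a = [{"name": "Alice", "age": 30}, {"name": "Bob"}]  # Bob's age is missing
source_b = [{"full_name": "Carol", "years": "41"}]          # different schema and types


def normalize_a(rec):
    """Map a source A record to the shared schema; a missing age becomes None."""
    return {"name": rec["name"], "age": rec.get("age")}


def normalize_b(rec):
    """Map a source B record to the shared schema, converting the age to int."""
    years = rec.get("years")
    return {"name": rec["full_name"], "age": int(years) if years is not None else None}


# Aggregate both sources into one uniform list of records.
unified = [normalize_a(r) for r in source_a] + [normalize_b(r) for r in source_b]

# Flag the records whose age is missing after aggregation.
missing_age = [r["name"] for r in unified if r["age"] is None]
print(unified)
print(missing_age)
```

Each source keeps its own normalizer here because there is no centralized control over how the sources record data; the shared schema exists only on the aggregating side.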
