1.1. DATA MINING
Data mining refers to extracting, or mining, knowledge from large amounts of data. It has attracted a great deal of attention in the information industry and in society as a whole in recent years, due to the wide availability of huge amounts of data and the imminent need to turn such data into useful information and knowledge. The information and knowledge gained can be used for applications ranging from market analysis, fraud detection, and customer retention to production control and science exploration. Data mining can be viewed as a result of the natural evolution of information technology.
Data mining is the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
B.2 Introduction
The growing popularity and development of data mining technologies bring a serious threat to the security of individual's
Data Science vs Statistics
Data science is one of the rapidly emerging trends in computing and is a vast multidisciplinary area. It combines the application of several subjects, namely computer science, software engineering, mathematics and statistics, programming, economics, and business management. Data science is based on the collection, preparation, analysis, management, visualization, and storage of large volumes of information. In simple terms, data science can be understood as having strong connections with databases, including big data, and with computer science. A data scientist is an individual with adequate domain knowledge relevant to the question being addressed.
On the other hand, we can also see all the good technology can do.
Big Data
There are many different definitions of Big Data. SAS (n.d.), an analytics software company, describes it as “a popular term used to describe the exponential growth and availability of data, both structured and unstructured.” Many think Big Data has only just come into existence, but it has been around for years. Banks, retailers, and advertisers have been using big data for marketing purposes. Tracking consumers’ habits in many different aspects of their lives has allowed them to target specific products in specific ways.
• High marketing and communication costs.
• There are cities where they are not yet present (like Montrose).
Opportunities
• Highly scalable model that offers the opportunity to grow across different countries.
• Large market that is continuously growing.
• Potential increase in in-market and out-of-market M&A.
The extensive use of information and communication technology has generated large volumes of stored data. These data repositories may contain a massive amount of useful information. Extracting useful knowledge from them to support better decisions requires proper methods of knowledge extraction. Machine learning is an important technique that extracts knowledge and information such as associations, patterns, changes, and anomalies from various data repositories (Barka et al., 2010). The idea of machine learning arose from this environment.
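As a rough illustration of what extracting an association from a data repository can look like, the sketch below counts how often pairs of items occur together in a small set of transactions and keeps the pairs whose support (the fraction of transactions containing both items) reaches a threshold. The function name `pair_support`, the sample transactions, and the 0.5 threshold are assumptions made for this example; they do not come from Barka et al. (2010).

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions, min_support=0.5):
    """Return item pairs whose support is at least min_support.

    Support of a pair = (transactions containing both items) / (all transactions).
    """
    n = len(transactions)
    counts = Counter()
    for items in transactions:
        # Sort and de-duplicate so each pair is counted once per transaction
        # and always appears in the same order.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical shopping baskets, used only for illustration.
transactions = [
    ["bread", "milk"],
    ["bread", "milk", "butter"],
    ["bread", "milk"],
    ["butter", "eggs"],
]

frequent = pair_support(transactions, min_support=0.5)
print(frequent)  # {('bread', 'milk'): 0.75}
```

Counting pairs directly is only practical for small item sets; real association-rule miners prune the search space instead of enumerating every combination.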
Web data content includes text, images, audio, video, metadata, and hyperlinks. In short, Web content mining is the process of extracting knowledge from web content. Web content mining deals directly with information. The goal is to mine content from web documents in order to build knowledge from it. This knowledge can be latent or simply difficult to analyze in a straightforward way.
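A first step in mining web content is usually pulling the raw text and hyperlinks out of a document. The sketch below does this with Python's standard `html.parser` module. The class name `ContentExtractor`, the helper `extract_content`, and the sample page are hypothetical, and a production crawler would need to handle far messier markup than this.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect visible text and hyperlink targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or stylesheet source.
        if self._skip_depth == 0:
            stripped = data.strip()
            if stripped:
                self.text_parts.append(stripped)


def extract_content(html):
    """Return (text, links) extracted from an HTML string."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


# Hypothetical page, used only for illustration.
sample_html = """
<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body><p>Data mining finds patterns.</p>
<a href="https://example.com/more">Read more</a></body></html>
"""

text, links = extract_content(sample_html)
print(text)
print(links)
```

The extracted text and links are only raw material; building knowledge from them would take further steps such as tokenization, classification, or clustering.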
A. Group Assignment
a. Discuss the two data mining methodologies
The process of going through massive sets of data looking for unsuspected patterns that can provide us with advantageous information is known as data mining. Data mining can help us predict future events or even group populations of people by similar characteristics. The Cross Industry Standard Process for Data Mining (CRISP-DM) is a six-phase model of the entire data mining process. It is commonly used across industries for a wide array of data mining projects and provides a structured approach to planning a data mining project.
The ease with which people can share knowledge, information, and opinions online has resulted in an abundance of information. This abundance results in information overload and a scarcity of attention (Simon, 1971). Attention is focused mental engagement on a particular item of information. Items come into our awareness, we attend to a particular item, and then we decide whether to act (Davenport, 2002). Herbert A. Simon (1971) was perhaps the first person to articulate the concept of attention when he wrote: "... in an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes.
Knowledge discovery, also known as data mining, is the process of examining tremendous amounts of data with the support of computer and web technology. Data mining is the process of discovering interesting knowledge by extracting, or mining, it from large amounts of data, and of finding correlations or patterns among dozens of fields in large relational databases [3, 4]. Privacy-Preserving Data Publishing (PPDP) is very important in data mining when publishing individual information on the web. Improvements aim at more effective methods that preserve privacy while also reducing the information loss for researchers. There is also research on improving algorithms so that they resist certain attacks on the data.
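The trade-off between privacy and information loss can be sketched with k-anonymity, one commonly used PPDP model. The text above does not name a specific model, so choosing k-anonymity here is an assumption. In the sketch, ages are generalized into ranges and ZIP codes are truncated until every combination of quasi-identifiers appears at least k times. The records, field names, and helper functions are all hypothetical.

```python
from collections import Counter


def generalize_age(age, width=10):
    """Replace an exact age with a range such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical raw records, used only for illustration.
raw = [
    {"age": 23, "zip": "47677", "diagnosis": "flu"},
    {"age": 27, "zip": "47602", "diagnosis": "cold"},
    {"age": 35, "zip": "47905", "diagnosis": "flu"},
    {"age": 38, "zip": "47909", "diagnosis": "asthma"},
]

# Generalized view: coarser ages and ZIP codes mean less detail for
# researchers but larger groups of indistinguishable individuals.
published = [
    {
        "age": generalize_age(r["age"]),
        "zip": r["zip"][:3] + "**",
        "diagnosis": r["diagnosis"],
    }
    for r in raw
]

print(is_k_anonymous(raw, ["age", "zip"], k=2))        # False
print(is_k_anonymous(published, ["age", "zip"], k=2))  # True
```

The published table satisfies 2-anonymity only because exact ages and full ZIP codes were given up, which is the information loss that improved methods try to keep small.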