
Abstract

The sinking of the RMS Titanic, which killed more than 1,500 passengers and crew, is one of the deadliest maritime disasters in history. One reason the shipwreck caused such loss of life was that there were not enough lifeboats for everyone on board. Although luck played a part in surviving the sinking, some groups of people, such as women, children and the upper class, were more likely to survive than others. The objective of this work is to apply different machine learning models to analyse what sorts of people were likely to survive. The results of the algorithms are compared and analysed on the basis of accuracy.

Keywords: Titanic, Logistic Regression, Random…

The iceberg collision ripped open Titanic's hull in several places. That fateful night Titanic was carrying more than 2,200 people of all ages, genders and classes; only a few hundred escaped in lifeboats, and the rest died in the icy water. The dead included a large number of men, many of whom gave up lifeboat places to the women and children on board. Men travelling in second class had among the lowest survival rates of any group.

Machine learning techniques are applied to predict which passengers survived the sinking of the Titanic. Features such as ticket fare, age, sex and passenger class are used to make the predictions. Predictive analysis uses computational methods to find important and useful patterns in large datasets. Survival is predicted using different combinations of these features.
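One way to turn the features named above into numeric model input is sketched below. It assumes the column names of the Kaggle Titanic CSV (Pclass, Sex, Age, Fare) and fills missing ages with the median known age; the helper name and the imputation choice are illustrative assumptions, not the preprocessing stated by the author.

```python
from statistics import median


def encode_passengers(rows):
    """Return one [pclass, is_female, age, fare] vector per passenger.

    Hypothetical helper; assumes rows are dicts shaped like csv.DictReader
    output from the Kaggle Titanic data (all values are strings).
    """
    # Some ages are missing, so fill the gaps with the median of the known ages
    # (an assumed imputation strategy, chosen only for illustration).
    known_ages = [float(r["Age"]) for r in rows if r.get("Age") not in (None, "")]
    fill_age = median(known_ages) if known_ages else 0.0

    vectors = []
    for r in rows:
        age = float(r["Age"]) if r.get("Age") not in (None, "") else fill_age
        vectors.append([
            int(r["Pclass"]),                  # ticket class: 1, 2 or 3
            1 if r["Sex"] == "female" else 0,  # sex encoded as a 0/1 flag
            age,
            float(r["Fare"]),
        ])
    return vectors


# Hypothetical rows, for demonstration only.
sample = [
    {"Pclass": "1", "Sex": "female", "Age": "38", "Fare": "71.28"},
    {"Pclass": "3", "Sex": "male", "Age": "", "Fare": "8.05"},
    {"Pclass": "2", "Sex": "male", "Age": "30", "Fare": "13.00"},
]
print(encode_passengers(sample))
# [[1, 1, 38.0, 71.28], [3, 0, 34.0, 8.05], [2, 0, 30.0, 13.0]]
```

The second passenger's missing age is replaced by 34.0, the median of the two known ages.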

The objective is to perform exploratory data analysis on the dataset available on Kaggle and to determine the effect of each field on passenger survival by analysing every field of the dataset against the survival field ("Survived" in the Kaggle data). Machine learning algorithms are then applied to predict the outcome for new data. The accuracy of each algorithm is evaluated, the algorithms are compared on that basis, and the best-performing model for this dataset is recommended.
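The field-by-field analysis against survival, and the accuracy measure used to compare models, can be sketched as follows. The function names and the five passenger rows are hypothetical; only the idea (group passengers by a field, compute the share who survived, then count correct predictions) comes from the text.

```python
from collections import defaultdict


def survival_rate_by(rows, field):
    """Fraction of passengers with Survived == 1 for each value of `field`."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for r in rows:
        totals[r[field]] += 1
        survivors[r[field]] += r["Survived"]  # Survived is 0 or 1
    return {value: survivors[value] / totals[value] for value in totals}


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels (assumes non-empty input)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical rows, not taken from the Kaggle data.
rows = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 2, "Survived": 0},
]
print(survival_rate_by(rows, "Sex"))          # {'female': 0.5, 'male': 0.3333333333333333}
print(survival_rate_by(rows, "Pclass"))       # {1: 1.0, 3: 0.0, 2: 0.0}
print(accuracy([1, 0, 1, 0], [1, 0, 0, 0]))   # 0.75
```

Running the same accuracy function on each model's predictions for a held-out set gives the figures on which the algorithms are compared.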

Data

…

Random Forest

As the name suggests, this algorithm builds a forest made up of a number of decision trees. Adding more trees generally makes the combined prediction more stable and accurate, although the improvement levels off beyond a certain number of trees. Random forest can be used for both classification and regression problems. For instance, each tree might be built from a random sample of 100 observations (drawn with replacement) using 5 randomly chosen input variables. The process is repeated (say) 10 times, and the final prediction for each observation combines the individual trees' predictions: the mean for regression, or the majority vote for classification.
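The procedure described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the author's implementation: the helper names are hypothetical, each tree is reduced to a single split (a depth-one "stump") to keep the code short, and the subset of variables is drawn once per tree, mirroring the text's description, whereas Breiman's random forest redraws it at every split. The defaults follow the text's example (10 trees, 100 observations, 5 variables).

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Depth-one tree: pick the (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, threshold, majority(left), majority(right))
    if best is None:  # no usable split: always predict the majority class
        label = majority(y)
        return lambda row: label
    _, f, threshold, left_label, right_label = best
    return lambda row: left_label if row[f] <= threshold else right_label


def fit_forest(X, y, n_trees=10, n_samples=100, n_features=5, seed=0):
    rng = random.Random(seed)
    n_features = min(n_features, len(X[0]))
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n_samples row indices with replacement.
        idx = [rng.randrange(len(X)) for _ in range(n_samples)]
        # Random subset of the input variables for this tree.
        features = rng.sample(range(len(X[0])), n_features)
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return trees


def predict_forest(trees, row):
    """Combine the trees' predictions by majority vote (classification)."""
    return majority([tree(row) for tree in trees])


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0, 5], [1, 3], [2, 8], [3, 1], [10, 4], [11, 9], [12, 2], [13, 7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [-1, 6]), predict_forest(forest, [20, 6]))
```

For a regression target, the vote in `predict_forest` would be replaced by the mean of the trees' outputs, as the text notes.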

Decision Tree

A decision tree is a type of supervised learning algorithm that is mostly used in classification problems. It works with both categorical and continuous input and output variables. Each internal node, starting from the root, represents a single input variable (x) and a split point on that variable, and each leaf node contains an output value (y) that is used to make a prediction. For example, given a dataset with two inputs (x), height in centimetres and weight in kilograms, a tree could predict an output (y) of sex as male or female (a hypothetical example, for demonstration purposes only). There are two types of decision tree, based on the type of target variable: classification trees for a categorical target and regression trees for a continuous target.
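To make the height/weight example concrete, the sketch below grows a small classification tree by choosing, at each internal node, the variable and split point that minimise Gini impurity. Gini is a common splitting criterion but is an assumption here, since the text does not name one; the data points and function names are hypothetical, for demonstration only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth=0, max_depth=3):
    """Return a leaf label, or a dict describing an internal split node."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority  # leaf node: holds the output value (y)
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority
    _, f, t, left, right = best
    return {
        "feature": f,    # internal node: one input variable (x) ...
        "threshold": t,  # ... and a split point on that variable
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }


def predict(node, row):
    """Follow the splits from the root until a leaf label is reached."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node


# Hypothetical (height_cm, weight_kg) -> sex data, for demonstration only.
X = [[160, 55], [165, 60], [158, 52], [170, 65], [180, 80], [185, 85], [175, 78], [190, 90]]
y = ["female", "female", "female", "female", "male", "male", "male", "male"]
tree = build_tree(X, y)
print(tree)                      # {'feature': 0, 'threshold': 170, 'left': 'female', 'right': 'male'}
print(predict(tree, [162, 57]))  # female
print(predict(tree, [183, 82]))  # male
```

On this toy data the root split on height at 170 cm already separates the two classes, so the tree has a single internal node and two leaves.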

