 # The Chi-Square Test

Chi-Square Test
The chi-square test is a statistical test used to compare observed data with the data expected under a specific hypothesis, known as the null hypothesis. It asks how likely it is that the difference between the observed and the expected distribution is due to chance alone. It is also known as a goodness-of-fit statistic, because it measures how well the observed distribution of data fits the expected distribution. In a test of independence, the expected distribution is the one implied by assuming the variables are independent. It is used for categorical data.
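The comparison of observed and expected data can be sketched in a few lines. The sketch below uses the standard Pearson statistic, the sum of (observed - expected)^2 / expected over all categories. The function name and the die-roll counts are hypothetical and chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 60 rolls of a die, 10 expected per face if the die is fair.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

statistic = chi_square_statistic(observed, expected)
print(statistic)  # about 4.2
```

A larger statistic means the observed counts sit further from the expected ones. Whether it is large enough to reject the null hypothesis depends on the degrees of freedom.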
Null Hypothesis
The null hypothesis is that the variables are independent. If the observed data do not fit the distribution expected under independence, this is evidence that the variables are dependent, and the null hypothesis is rejected.
Degrees of freedom, which determine the chi-square distribution against which the test statistic is judged for significance, can be explained as the number of values that are free to vary. For example, three dice are tossed and their total score is 12. If one die shows 2 and the second shows 4, then the third die must show 6 (otherwise the sum would not be 12). In this example, two dice are free to vary whereas the third is not. For that reason, the degrees of freedom equal 2.
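The dice example can be checked with a short sketch. The helper name `third_die` is hypothetical. It only shows that once the total and two of the dice are fixed, the third value is no longer free.

```python
def third_die(total, first, second):
    """Value the third die is forced to take once the total and two dice are known."""
    value = total - first - second
    if not 1 <= value <= 6:
        raise ValueError("no valid die face gives this total")
    return value


forced = third_die(12, 2, 4)  # the third die has no freedom left

# Three dice with one constraint (the fixed total) leave 3 - 1 = 2 free values.
degrees_of_freedom = 3 - 1
print(forced, degrees_of_freedom)
```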
For a single categorical variable:
Degrees of freedom = number of categories - 1
For data with more than one categorical variable represented in a table:
Degrees of freedom = (number of rows in the table - 1) x (number of columns in the table - 1)
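As a minimal sketch, the two rules can be written as functions. The function names are hypothetical. The one-variable rule is read here as the number of categories minus one, and the table rule subtracts one from the rows and from the columns before multiplying.

```python
def df_goodness_of_fit(num_categories):
    """Degrees of freedom for a single categorical variable."""
    return num_categories - 1


def df_contingency_table(num_rows, num_cols):
    """Degrees of freedom for a table of two categorical variables."""
    return (num_rows - 1) * (num_cols - 1)


print(df_goodness_of_fit(6))       # six faces of a die
print(df_contingency_table(2, 2))  # a 2 x 2 table
print(df_contingency_table(3, 4))  # a 3 x 4 table
```

The parentheses matter: computing rows - 1 x columns - 1 without them would give a different and incorrect result.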
The chi-square test is used to determine whether two variables are independent or not. If two variables are dependent on each other, their values tend to vary together: knowing the value of one changes the expected distribution of the other.
Example
Consider a data set of 100 individuals classified by gender (Male/Female) and by university admission (Yes/No). A chi-square test is applied to assess how likely it is that gender and university admission are completely independent variables.
The table should include the "marginal" totals, giving the sum of every row and every column, plus the total for the entire data set.
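A rough sketch of the whole calculation for a table like this follows. The counts are hypothetical, since the text gives none. They were made up so that the 100 individuals split into 50 males and 50 females. The p-value line uses a closed form that holds only for one degree of freedom.

```python
import math

# Hypothetical counts for 100 individuals (not taken from the text).
table = {
    "Male": {"Yes": 30, "No": 20},
    "Female": {"Yes": 20, "No": 30},
}

# Marginal totals: one sum per row, one per column, and the grand total.
row_totals = {gender: sum(counts.values()) for gender, counts in table.items()}
col_totals = {}
for counts in table.values():
    for outcome, n in counts.items():
        col_totals[outcome] = col_totals.get(outcome, 0) + n
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
chi_square = 0.0
for gender, counts in table.items():
    for outcome, observed in counts.items():
        expected = row_totals[gender] * col_totals[outcome] / grand_total
        chi_square += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(col_totals) - 1)

# Upper-tail probability of the chi-square distribution, valid for df == 1 only.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(row_totals, col_totals, grand_total)
print(chi_square, df, round(p_value, 4))
```

For these made-up counts the statistic is 4.0 with 1 degree of freedom and a p-value of roughly 0.046, so at the conventional 5% level the hypothesis that gender and admission are independent would be rejected. Real data could of course lead to the opposite conclusion.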