The Chi-Square Test

Chi-Square Test
The chi-square test is a statistical test generally used to compare observed data with the data expected under a specific hypothesis, known as the null hypothesis. The test asks: how likely is it that the difference between the observed distribution and the expected distribution is due to chance alone? It is also known as a goodness-of-fit statistic, because it measures how well the observed distribution of the data fits the expected distribution, in this case the one expected if the variables are independent. It is used for categorical data.
Null Hypothesis
The null hypothesis is that the variables are independent. If the observed data do not fit the distribution expected under independence, this is evidence that the variables are dependent, and the null hypothesis is rejected.
Degrees of freedom, which are needed to determine the significance level, can be explained as the number of scores that are free to vary. For example, three dice are tossed and the total score of the three dice is 12. If a 2 is rolled on one die and a 4 on the second, then the third die has to be 6 (otherwise the sum would not be 12). In this example, two dice are free to vary whereas the third is not. For that reason, the degrees of freedom equal 2.
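The dice example can be checked with a few lines of Python. This is only an illustrative sketch; the function name `third_die` is hypothetical and not part of any library.

```python
def third_die(first: int, second: int, total: int = 12) -> int:
    """Return the only value the third die can take once the total
    and the other two dice are fixed."""
    third = total - first - second
    if not 1 <= third <= 6:
        raise ValueError("no valid die face gives this total")
    return third


# Two dice are free to vary; the third is then determined.
print(third_die(2, 4))  # 6
```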
For data with one categorical variable:
Degrees of freedom = number of categories - 1
For data with two categorical variables represented in a table:
Degrees of freedom = (number of rows in the table - 1) x (number of columns in the table - 1)
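The two rules above can be written as small Python helpers. This is a minimal sketch: the function names are hypothetical, and the single-variable rule is read here as counting the categories of that variable, which is an assumption about what the text intends.

```python
def df_one_variable(num_categories: int) -> int:
    """Degrees of freedom for one categorical variable
    (assumed to be: number of categories minus 1)."""
    if num_categories < 2:
        raise ValueError("need at least two categories")
    return num_categories - 1


def df_table(num_rows: int, num_cols: int) -> int:
    """Degrees of freedom for a rows x columns table of counts."""
    if num_rows < 2 or num_cols < 2:
        raise ValueError("need at least two rows and two columns")
    return (num_rows - 1) * (num_cols - 1)


print(df_table(2, 2))  # 1
```

Note that the subtraction is applied to rows and columns separately before multiplying, so a table with 2 rows and 2 columns has 1 degree of freedom.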
The chi-square test is used to determine whether or not two variables are independent. If two variables depend on each other, their values tend to move together, either in the same direction or in opposite directions.
Consider a data set of 100 individuals classified by gender (male/female) and by university admission (yes/no). A chi-square test is applied to assess whether gender and university admission are independent variables.
The table should include the "marginal" totals, giving the sum for every row and every column, plus the grand total for the entire data set.
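As a rough sketch of how marginal totals lead to expected counts, the Python below builds a 2 x 2 table and computes, for each cell, row total x column total / grand total. The counts 20, 30, 30 and 20 are the observed values that appear in the chi-square calculation later in the text; which cell each count belongs to (rows as male/female, columns as admitted yes/no) is an assumption made only for illustration.

```python
# Observed counts; the cell layout is assumed, not given in the text.
# Rows: male, female. Columns: admitted yes, admitted no.
observed = [
    [20, 30],
    [30, 20],
]

# Marginal totals: one sum per row, one per column, plus the grand total.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence for each cell.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

print(row_totals, col_totals, grand_total)  # [50, 50] [50, 50] 100
print(expected)  # [[25.0, 25.0], [25.0, 25.0]]
```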
For each cell, calculate the difference between the observed value and the expected value (also known as the residual). Square this difference so that negative and positive differences do not cancel each other out. Divide the result by the expected value in order to normalize it. Evaluate this for each cell in the table, and then take the sum of all the values.
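\[
\begin{aligned}
\chi^2 &= \frac{(20-25)^2}{25} + \frac{(30-25)^2}{25} + \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} \\
&= \frac{25}{25} + \frac{25}{25} + \frac{25}{25} + \frac{25}{25} \\
&= 1 + 1 + 1 + 1
\end{aligned}
\]
Chi-square value = 4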
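The same arithmetic can be reproduced in Python. This is a minimal sketch using the four observed counts and the expected count of 25 from the calculation above; the function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [20, 30, 30, 20]
expected = [25, 25, 25, 25]

chi_square = chi_square_statistic(observed, expected)
print(chi_square)  # 4.0
```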
The chi-square value and the degrees of freedom are used together to obtain the p-value. A chart of the chi-square distribution is used to look up the p-value.
In the chart, select the degrees of freedom (DF) value on the left, locate the number in that row closest to the chi-square value, and then read the corresponding number in the top row of probabilities. This gives the approximate probability, or significance level (p-value), for that chi-square test. Here the p-value is approximately 0.05, which is also the conventional threshold for significance. If the calculated chi-square value is greater than the chart value at 0.05, we reject the null hypothesis; otherwise we fail to reject it. For a 2 x 2 table (1 degree of freedom) the chart value at 0.05 is 3.84, so a chi-square value of 4 lies only just above it. There’s actually not much we can conclude as it is floating on the threshold of
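Instead of reading a chart, the p-value can also be computed directly. The sketch below covers only the case of 1 degree of freedom, which is what a 2 x 2 table gives, and relies on the identity that the upper-tail probability of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x / 2)). The function name `p_value_df1` is hypothetical; other degrees of freedom would need a statistics library.

```python
import math


def p_value_df1(chi_square: float) -> float:
    """Upper-tail probability of the chi-square distribution
    with 1 degree of freedom (valid for df = 1 only)."""
    if chi_square < 0:
        raise ValueError("chi-square value cannot be negative")
    return math.erfc(math.sqrt(chi_square / 2.0))


p = p_value_df1(4.0)
print(round(p, 4))  # 0.0455
```

A p-value below 0.05 counts as evidence against the null hypothesis of independence; a value of about 0.0455 sits only just under that threshold.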