Reliability, Validity and Fairness
The quality of any test is indicated by three measures: reliability, validity, and fairness. Reliability indicates the extent to which an individual examinee’s test scores remain stable when the examinee retakes the test in a different version, on a different date, and so on. Validity indicates the extent to which the test scores correspond to the level of knowledge or ability that the test is meant to measure, and the extent to which decisions based on those scores are justified. A test is considered fair if it is administered to all examinees under conditions that are as uniform as possible, and if it does not discriminate in favor of or against any sector of the population.
NITE makes every effort to ensure that the test is reliable, valid, and fair. The quality of the Psychometric Test is assessed on an ongoing basis. These assessments show levels of reliability and validity that are very high and comparable to those of similar tests around the world. The reliability coefficient of the test is 0.95 (a value of 1 indicates perfect reliability, and a value of 0.80 is considered acceptable for a high-stakes test). The most widely accepted measure of a test’s predictive validity, that is, of the connection between test scores and academic achievement, is the Pearson correlation coefficient, which ranges from -1.00 to +1.00. For the years 2005 to 2010, the composite (“Sekhem”) score, which is based on the Psychometric Test and the high-school matriculation average, showed a correlation of 0.46 with achievement in the first year of undergraduate programs across different university faculties. This value is considered high for a predictive validity coefficient, and is comparable with the values accepted for other test batteries around the world. More information can be found in the article summarizing findings concerning the reliability and validity of the test.
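For readers unfamiliar with the Pearson correlation coefficient, the following is a minimal sketch of how such a value is computed from paired observations. It is an illustration only, not NITE's analysis procedure: the function name `pearson_r` and the sample figures are hypothetical and were invented for this example.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data, invented for illustration only:
# an admission score and a first-year average for six students.
composite_scores = [520, 580, 610, 650, 700, 740]
first_year_average = [72, 70, 81, 78, 85, 90]

r = pearson_r(composite_scores, first_year_average)
print(f"Pearson r = {r:.2f}")
```

A result near +1.00 means that higher scores go together with higher achievement almost perfectly, a result near 0 means there is no linear relationship, and a result near -1.00 means that higher scores go together with lower achievement.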
One of the Psychometric Test’s chief advantages is that all examinees are tested in a uniform fashion. This uniformity is reflected in the test’s structure and content, the administration conditions, and the way in which the score is calculated.
The structure and content of the Psychometric Test are uniform and constant in every version of the test and at every test date, except for minor changes made for non-Hebrew-speaking examinees or those entitled to special test accommodations. This uniformity is reflected in the domains and skills assessed by the test, the types of questions, the number of questions of each type that appear in the test, and the distribution of difficulty levels.
NITE makes every effort to administer the test under uniform conditions at every test date and in every location. To this end, all proctors read out directions from the same text, and all examinees are given the same amount of time to answer the questions in each chapter of the test (except for non-Hebrew speakers and examinees entitled to special test accommodations). NITE handles individually the rare cases in which the conditions for one or more examinees turn out not to have met the uniformity requirements.
In order to allow each examinee’s ability in the Psychometric Test to be compared with that of all other examinees, scores are calculated relative to the scores of all examinees since the first test was administered, rather than relative only to those of the examinees taking the test on the same date. In other words, an examinee who achieved a particular score in July 2010 would achieve exactly the same score if he or she took the test in December 2010 and gave the same answers.
Every population contains groups who are more successful in some respects and less successful in others. For example, it is well-established that, on the whole, girls do better than boys at expressing themselves in writing, and boys do better than girls in mathematics. So the fact that group A has a higher average score in a particular test than group B does not necessarily mean that the test is unfair and discriminates in favor of group A.
In order to assess the fairness of a test, we need to examine whether members of different groups who have the same test score have, on average, similar academic achievements. If they do, the test is fair. One way to check this is to predict academic achievement from the Psychometric Test score using a single prediction method common to all groups, and then to examine whether the predictions for each group match that group’s actual academic achievements. The purpose of the Psychometric Test is to predict examinees’ chances of success in academic studies, and so any assessment of its fairness must be based on achievements in academic studies. In order to ensure that future Psychometric Tests do not discriminate between sectors of the population, NITE’s question-writing staff checks every question and confirms that the ability to answer it does not depend on the examinee’s gender, ethnicity, cultural background, etc.
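The common-prediction check described above can be sketched in a few lines of code. The sketch below assumes the simplest possible prediction method, a single least-squares straight line fitted to all examinees together; the text does not say which method NITE actually uses. The function names, group labels, and figures are hypothetical and were invented for this example.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all scores are identical; no line can be fitted")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_residual_by_group(records):
    """records: iterable of (group, test_score, achievement) tuples.

    Fits ONE line to all records, then returns for each group the mean of
    (actual achievement - predicted achievement).
    """
    records = list(records)
    slope, intercept = fit_line(
        [score for _, score, _ in records],
        [achievement for _, _, achievement in records],
    )
    residuals = defaultdict(list)
    for group, score, achievement in records:
        predicted = intercept + slope * score
        residuals[group].append(achievement - predicted)
    return {group: sum(vals) / len(vals) for group, vals in residuals.items()}


# Hypothetical data, invented for illustration only.
sample = [
    ("group A", 500, 70), ("group A", 600, 80),
    ("group B", 550, 80), ("group B", 650, 90),
]
for group, gap in mean_residual_by_group(sample).items():
    print(f"{group}: mean (actual - predicted) = {gap:+.1f}")
```

A mean gap close to zero for every group suggests that the common prediction matches each group's actual achievements. A clearly positive gap means that the group's actual achievements were higher than predicted, and a clearly negative gap means that they were lower than predicted.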
In order to confirm that the Psychometric Tests already administered did not discriminate between sectors of the population, NITE conducts ongoing fairness studies using various approaches and advanced statistical methods to assess to what extent each test treated different sectors of the population fairly. These studies assess fairness with respect to gender (male/female), ethnic origin and affiliation (Jews, Arabs, new immigrants), age (young/old), special needs, and more.
The results of the fairness studies can be read in the reports listed below.