Quality of the Arrays - MOR & MIRKAM:
Reliability, Validity and Fairness
The quality of any test is indicated by three measures: reliability, validity, and fairness.
The reliability measure indicates the extent to which an individual examinee’s test scores would remain constant if the examinee retook the test in a different version, on a different date, and so on.
The validity measure indicates the extent to which the test scores correspond to the level of knowledge or ability they are intended to measure, and the extent to which the decisions reached on the basis of those scores are justified.
The fairness measure reflects the extent to which the test is administered to all examinees under uniform conditions and does not discriminate in favor of or against any sector of the population.
The quality of the MOR and MIRKAM systems is assessed on an ongoing basis. NITE makes every effort to ensure that the evaluations performed by these systems are reliable, valid and fair, bearing in mind the inherent difficulty of measuring cognitive variables.
Statistical checks assess the internal reliability of the systems by comparing the ratings of different assessors (inter-rater reliability) and by comparing test scores with retest scores (test-retest reliability). Over the years, the results of these checks have shown that the reliability of the MOR and MIRKAM systems meets the standards accepted in the professional literature and is comparable to that of similar screening systems in other countries.
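As a purely hypothetical illustration of the two checks described above (the data and the specific statistic are invented for this sketch, not NITE's actual figures or procedure), both reliability estimates can be expressed as a correlation between two paired lists of scores:

```python
# Illustrative sketch only: two common reliability checks, each computed as a
# Pearson correlation on made-up data (not real MOR/MIRKAM scores).

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Inter-rater reliability: ratings given by two assessors to the same candidates.
rater_a = [6, 4, 5, 7, 3, 6, 5, 4]
rater_b = [5, 4, 6, 7, 3, 5, 5, 4]

# Test-retest reliability: scores of the same candidates on two test versions.
test_1 = [72, 65, 80, 58, 90, 77, 61, 84]
test_2 = [70, 68, 78, 60, 88, 75, 65, 82]

print(f"inter-rater r = {pearson(rater_a, rater_b):.2f}")
print(f"test-retest r = {pearson(test_1, test_2):.2f}")
```

A coefficient near 1.0 indicates that assessors largely agree, or that scores are stable across test versions; published reliability work on such systems (e.g., Gafni et al., 2012, in the reference list below) uses more elaborate estimates, but the underlying idea of consistency across raters and occasions is the same.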
Testing the predictive validity of an assessment center is more complex than for other assessment tools, since the criterion against which validity is tested is actual professional performance in the field for which successful candidates will be training. The level of professional performance is difficult to measure for a number of reasons. First, it is difficult to define what constitutes good performance in the profession: the question of who is a good doctor, for example, is liable to be answered differently by hospitals, by colleagues on the medical staff, and by patients, and each of these answers is significant in its own right. Second, objective quantitative assessment is impossible because every doctor deals with unique cases that cannot be compared with those of another doctor. Finally, the more competent a doctor is, the more complex and challenging the cases to which that doctor is assigned.
In spite of these difficulties, research data on the validity of assessment centers have accumulated over the years. Studies conducted in Israel and abroad on screening systems identical or similar to MOR and MIRKAM have shown a positive correlation between scores in these systems and behavioral measures assessed during clinical practice in medical school, internships, and accreditation examinations.
Several aspects of the fairness of the system are tested. One of these is uniformity: in the MOR and MIRKAM systems, all candidates are tested at the same types of testing stations and with the same biographical questionnaire. Both tests are conducted in an almost identical fashion, and professionals who have undergone the same training assess the candidates by means of standard structured assessment forms. This degree of uniformity did not exist in some of the screening tools used in the past, such as an extended interview before a board. Another aspect of fairness is the number of independent observations: in the MOR and MIRKAM systems, decisions are not made on the basis of a single opinion. Finally, experts examine all the elements of the test and of the MOR and MIRKAM assessment centers to ensure that the ability to perform in them does not depend on the candidates’ gender, origin, cultural background, or similar characteristics.
References
Gafni, N., Moshinsky, A., & Kapitulnik, J. (2003). A standardized open-ended questionnaire as a substitute for a personal interview in dental admissions. Journal of Dental Education, 67(3), 348-353.
Ziv, A., Rubin, O., Moshinsky, A., Gafni, N., Kotler, M., Dagan, Y., … Mittelman, M. (2008). MOR: A simulation-based assessment centre for evaluating the personal and interpersonal qualities of medical school candidates. Medical Education, 42(10), 991-998.
Gafni, N., Moshinsky, A., Eisenberg, O., Zeigler, D., & Ziv, A. (2012). Reliability estimates: Behavioural stations and questionnaires in medical school admissions. Medical Education, 46(3), 277-288.
Eva, K. W., Reiter, H. I., Rosenfeld, J., Trinh, K., Wood, T. J., & Norman, G. R. (2012). Association between a medical school admission process using the multiple mini-interview and national licensing examination scores. JAMA, 308(21), 2233-2240.
Hadad, A., Gafni, N., Moshinsky, A., Turvall, E., Ziv, A., & Israeli, A. (2016). The multiple mini-interviews as a predictor of peer evaluations during clinical training in medical school. Medical Teacher, 38(11), 1172-1179.
Moshinsky, A., Ziegler, D., & Gafni, N. (2017). Multiple mini-interviews in the age of the Internet: Does preparation help applicants to medical school? International Journal of Testing, 17(3), 253-268.