Year : 2016 | Volume : 5 | Issue : 3 | Page : 183-186
Item analysis of multiple choice questions of undergraduate pharmacology examinations in an International Medical School in India
Yeshwanth Rao Karkal, Ganesh Shenoy Kundapur
Department of Pharmacology, Melaka Manipal Medical College (MMMC), Manipal University, Manipal, Karnataka, India
Date of Web Publication: 10-Oct-2016
Yeshwanth Rao Karkal
Department of Pharmacology, Melaka Manipal Medical College (MMMC), Manipal University, Manipal - 576 104, Karnataka
Source of Support: None, Conflict of Interest: None
Background: Item analysis is widely used to improve test quality. By examining the characteristics of each item, it helps ensure that questions are of an appropriate standard for inclusion in a test. This study was therefore undertaken to evaluate the multiple choice questions of an undergraduate pharmacology program.
Materials and Methods: A total of 488 items were randomly selected and subjected to item analysis. Facility value (FV) and discrimination index (DI) were calculated by applying the appropriate formulae with the help of MS Excel.
Results: The overall mean FV (difficulty index) and DI were 56.64% (±2.36) (mean range: 23.89-71.25%) and 0.22 (±0.84) (mean range: 0.16-0.44), respectively. Based on the FV, 71.09% of the items analyzed were “good/optimal” (14.13% optimal, 56.96% good); based on the DI, 36.26% were “very/reasonably” good (20.49% very good, 15.77% reasonably good). The proportion of “poor” items was 22.95% based on the FV and 18.23% based on the DI. When both parameters were considered together, only 23% of the items were found to be “good” and 17.11% were found to be “poor.” The Pearson correlation between the two indices was negative but not statistically significant (r = −0.001379, P = 0.9774).
Conclusion: Item analysis, when regularly incorporated, can help to develop a useful, valid, and reliable question bank.
Keywords: Assessment, item analysis, multiple choice questions
How to cite this article:
Karkal YR, Kundapur GS. Item analysis of multiple choice questions of undergraduate pharmacology examinations in an International Medical School in India. J NTR Univ Health Sci 2016;5:183-6
How to cite this URL:
Karkal YR, Kundapur GS. Item analysis of multiple choice questions of undergraduate pharmacology examinations in an International Medical School in India. J NTR Univ Health Sci [serial online] 2016 [cited 2019 Nov 15];5:183-6. Available from: http://www.jdrntruhs.org/text.asp?2016/5/3/183/191842
Introduction
Making fair and systematic evaluations of others' performance can be a challenging task. Judgments cannot be made solely on the basis of intuition, haphazard guessing, or custom. Teachers, employers, and others in evaluative positions use a variety of tools to assist them in their evaluations. Tests are tools that are frequently used to facilitate the evaluation process. Developing the perfect test is an unattainable goal for anyone in an evaluative position. Even when guidelines for constructing fair and systematic tests are followed, a plethora of factors may influence a student's perception of the test items. Looking at an item's difficulty and discrimination helps the test developer determine what is wrong with individual items. Item and test analysis provide empirical data about how individual items and whole tests perform in real test situations.
Item analysis “investigates the performance of items considered individually either in relation to some external criterion or in relation to the remaining items on the test.” These analyses evaluate the quality of items and of the test as a whole. Such analyses can also be employed to revise and improve both items and the test as a whole.
Item difficulty is simply the percentage of students taking the test who answered the item correctly. The larger the percentage getting an item right, the easier the item; in other words, the higher the difficulty index, the easier the item is understood to be. Item difficulty has a profound effect on both the variability of test scores and the precision with which test scores discriminate among different groups of examinees.
When all of the test items are extremely difficult, the great majority of the test scores will be very low. When all items are extremely easy, most test scores will be extremely high. In either case, test scores will show very little variability. An item that everyone gets correct or that everyone gets incorrect will have a discrimination index (DI) = 0.
Item analysis is a valuable yet relatively simple procedure, performed after the examination, that provides information regarding the reliability and validity of a test item. Hence, this study was undertaken to perform an item analysis of the multiple choice questions (MCQs) used in formative/summative examinations in pharmacology.
Materials and Methods
Assessment by MCQs is the main mode of evaluation in the institution where this study was carried out (KMC-IC, Manipal institution). The MCQs are predominantly of the “single best response” type with five options. Every MCQ has three components: the stem, the key, and the distractors. A total of 488 MCQs were randomly picked from the summative (university examination) and the formative (sessional) examinations. All the MCQs had five distractors. All these 488 items from nine exams (seven formative, two summative) were subjected to “item analysis” using MS Excel, separately after each sessional/university examination.
The first step in item analysis is to arrange the evaluated answer scripts in rank order, with the students scoring the highest marks at the top. The next step is to break this distribution into two groups, that is, a higher ability group (HAG) and a lower ability group (LAG). In our study, since the number of students in every examination was <50, the students were divided equally between the HAG and the LAG. Then, for each question, the number of students ticking options a, b, c, or d was counted in each of these two groups. Once this is done for all the items of an examination, two indices are calculated for each item: the facility value (FV) (also known as the difficulty index) and the DI. FV is the percentage of the group answering a question correctly; for example, if 40% of the group answers the question correctly, the FV is 40%. FV is calculated by the formula: FV = (HAG + LAG)/N × 100, where HAG and LAG are the numbers of students in the respective groups who answered the item correctly and N is the total number of students in the two groups. It is a measure of how easy or how difficult a question is, hence the name “difficulty index”: the higher the FV, the easier the question. Similarly, DI indicates the ability of the question to discriminate between higher and lower ability students. It is calculated from the formula: DI = 2 × (HAG − LAG)/N. Unlike FV, DI is expressed as a fraction. The maximum value of DI is 1.0, which indicates an ideal question with perfect discrimination between the HAG and the LAG. Negative discrimination occurs when the value is <0, which means that more LAG students than HAG students answered the question correctly. Such items were not observed in this study.
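The ranking, splitting, and index calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the spreadsheet procedure used in the study: the function names and the sample data are hypothetical, N is taken to be the number of students in the two groups combined, and for an odd class size the middle student is assumed to be left out so that the groups stay equal.

```python
def split_groups(total_scores):
    """Rank students by total score (highest first) and return the index
    lists of the higher ability group (HAG) and lower ability group (LAG).
    Assumption: with an odd number of students the middle one is dropped."""
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    half = len(order) // 2
    return order[:half], order[len(order) - half:]


def item_indices(total_scores, item_correct):
    """Return (FV in percent, DI as a fraction) for a single item.
    item_correct[i] is 1 if student i answered the item correctly, else 0."""
    hag, lag = split_groups(total_scores)
    h = sum(item_correct[i] for i in hag)  # correct answers in HAG
    l = sum(item_correct[i] for i in lag)  # correct answers in LAG
    n = len(hag) + len(lag)                # students in both groups
    if n == 0:
        raise ValueError("need at least two students")
    fv = 100 * (h + l) / n                 # FV = (HAG + LAG)/N x 100
    di = 2 * (h - l) / n                   # DI = 2 x (HAG - LAG)/N
    return fv, di


# Hypothetical class of six students and one item.
scores = [90, 80, 70, 60, 50, 40]
correct = [1, 1, 1, 0, 1, 0]
fv, di = item_indices(scores, correct)     # fv is about 66.7, di about 0.67
```

An item that every student answers correctly (or incorrectly) gives HAG = LAG and therefore DI = 0, which matches the statement in the Introduction.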
After doing the item analysis for all the 488 items, the various items were then classified based on the:
- FV (into optimal items [FV = 50%], good items [FV = 30-70%], poor items [FV: >70% or <30%]),
- DI (into very good items [DI > 0.4], reasonably good items [DI: 0.3-0.39], marginal items [DI: 0.2-0.29], poor [DI < 0.19] and incorrect/negative items [DI < 0]),
- FV and DI (into good items [FV: 30-70%, DI > 0.3] and poor items [FV: <30% or >70%, DI < 0.19]).
Results
A total of nine examinations (seven formative, two summative), comprising an average of 54.2 questions per test (total questions N = 488, single best MCQs), were used to analyze the items.
The overall mean FV (difficulty index) and DI were 56.64% (±2.36) (mean range 23.89-71.25%) and 0.22 (±0.84) (mean range 0.16-0.44), respectively. Based on the FV, 71.09% of the items analyzed were “good/optimal” (14.13% optimal, 56.96% good) [Table 1]; based on the DI, 36.26% were “very good/reasonably good” (20.49% very good, 15.77% reasonably good) [Table 2]. The proportion of “poor” items was 22.95% based on the FV and 18.23% based on the DI. When both parameters were considered together, only 23% of the items were found to be “good” and 17.11% were found to be “poor” [Table 3].
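The classification above could be expressed as a small set of Python functions. This is a sketch under stated assumptions rather than the authors' own procedure: the function names and the "unclassified" label are hypothetical, and the published DI bands leave small gaps (for example between 0.19 and 0.2), so the boundaries are assumed here to be closed at 0.2, 0.3, and 0.4 to make the bands contiguous.

```python
def classify_fv(fv):
    """Classify an item by facility value (percent)."""
    if fv == 50:
        return "optimal"
    if 30 <= fv <= 70:
        return "good"
    return "poor"          # FV < 30% or FV > 70%


def classify_di(di):
    """Classify an item by discrimination index (fraction).
    Assumption: band edges are closed at 0.2, 0.3 and 0.4."""
    if di < 0:
        return "negative"
    if di >= 0.4:
        return "very good"
    if di >= 0.3:
        return "reasonably good"
    if di >= 0.2:
        return "marginal"
    return "poor"


def classify_combined(fv, di):
    """Classify an item using both indices together."""
    fv_in_band = 30 <= fv <= 70
    if fv_in_band and di >= 0.3:
        return "good"
    if not fv_in_band and di < 0.2:
        return "poor"
    return "unclassified"  # hypothetical label for everything in between


# Hypothetical item with FV = 60% and DI = 0.35.
labels = (classify_fv(60), classify_di(0.35), classify_combined(60, 0.35))
```

Items that fall in neither combined category are the ones that would need case-by-case review.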
Pearson correlation between difficulty (FV) and discrimination indices showed a negative correlation, but without statistical significance (r = −0.001379, P = 0.9774).
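For readers who wish to reproduce this kind of check on their own item data, the Pearson coefficient between per-item FV and DI values can be computed as sketched below. The function name and the sample values are hypothetical and are not the study data. The P value is not computed here, since that would need a statistical library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-item values: FV in percent, DI as a fraction.
fv_values = [35.0, 50.0, 62.5, 70.0, 85.0]
di_values = [0.40, 0.35, 0.30, 0.20, 0.10]
r = pearson_r(fv_values, di_values)  # negative for this made-up sample
```

A coefficient close to zero, as reported above, indicates essentially no linear relationship between the two indices in the sample.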
Discussion
This study was undertaken to analyze the quality of the MCQs used in the pharmacology examinations of our institute by performing an item analysis. 71.09% of the items were found to be “good” based on the FV. This may be attributable to the fact that MCQs are the primary assessment tool used in our institute. There are no essay type questions, as the students are trained for the USMLE, which is MCQ based. Hence, the faculty spends a significant amount of time preparing the questions. Around 28.89% of the items were poor items based on the FV, that is, they were either very easy or very hard. One advantage of easy items is that they can be placed at the start of the test as “warm up” questions. On the other hand, the difficult items should be reviewed for possible confusing language, areas of controversy, or even an incorrect key. Inclusion of very difficult items in the test depends upon the target of the teacher, who may want to include them to identify top scorers.
There could be several reasons for the construction of poor items: most of the faculty were teaching or were trained in the Indian setup, where the MCQ component is either nil or very small, or the faculty may not have been trained in the construction of MCQs. Considine et al. feel that despite their widespread use, there is a lack of evidence-based guidelines relating to the design and use of MCQs. Our institution may need to come up with clear-cut and uniform guidelines for the development of MCQ test items. The quality of items written for in-house examinations in medical schools remains a cause of concern. Several faculty development programs are aimed at improving faculty's item writing skills. This study emphasizes that items written by faculty without faculty development are generally lacking in quality. It also provides evidence of the value of faculty development in improving the quality of items generated by faculty. Item analysis results can be used to identify and remove nonfunctioning distractors from MCQs that have been used in previous tests. Similarly, based on the DI, 36.26% of the items analyzed were found to be good items, that is, they were able to differentiate properly between the good and the poor students. Hence, these items can be archived in the question bank and may not need modification. However, the remaining 64% of the items require discussion and modification.
The difficulty and discrimination indices are reciprocally related. The sign of the correlation coefficient in our study is consistent with this. We hypothesize that this could also be the reason why, when both FV and DI were considered, only 23% of the items in our study were found to be good. However, in spite of a negative correlation coefficient (−0.001379), the correlation between the two indices was not statistically significant in our study. A much bigger sample running into thousands of items may be required in the future to get accurate information regarding this statistical aspect. 9.83% of the total items were incorrect based on the DI. Flawed items must either be rectified by the faculty or deleted from the question bank. Tarrant and Ware demonstrated that flawed MCQ items affected the performance of high achieving students more than that of borderline students. Assessment of MCQs by these indices highlights the importance of assessment tools for the benefit of both student and teacher.
Item analysis, when regularly incorporated, can help to develop a useful, valid, and reliable question bank with thousands of questions categorized into easy, difficult, and ideal questions. Depending on the quality of the students/batch or the examination (formative/summative/selection), the department can decide the proportion of questions to be selected from each category.
Acknowledgment
The author would sincerely like to thank Mr. Santhosh for his technical help in the calculation of the indices using MS Excel.
References
Sax G. Principles of Educational and Psychological Measurement and Evaluation. 3rd ed. Belmont, CA: Wadsworth; 1989.
Matlock-Hetzel S. Basic Concepts in Item and Test Analysis. US: Texas A & M University; 1997.
Thompson B, Levitov JE. Using microcomputers to score and evaluate test items. Collegiate Microcomputer 1985;3:163-8.
Wood DA. Test Construction: Development and Interpretation of Achievement Tests. Columbus, OH: Charles E. Merrill Books, Inc.; 1960.
Thorndike RM, Cunningham GK, Thorndike RL, Hagen EP. Measurement and Evaluation in Psychology and Education. 5th ed. New York: MacMillan; 1991.
Hingorjo MR, Jaleel F. Analysis of one-best MCQs: The difficulty index, discrimination index and distractor efficiency. J Pak Med Assoc 2012;62:142-7.
Ebel RL, Frisbie DA. Essentials of Educational Measurement. 5th ed. New Jersey: Prentice-Hall; 1991.
Considine J, Botti M, Thomas S. Design, format, validity and reliability of multiple choice questions for use in nursing research and education. Collegian 2005;12:19-24.
Naeem N, van der Vleuten C, Alfaris EA. Faculty development on item writing substantially improves item quality. Adv Health Sci Educ Theory Pract 2012;17:369-76.
Tarrant M, Ware J, Mohammed AM. An assessment of functioning and non-functioning distractors in multiple-choice questions: A descriptive analysis. BMC Med Educ 2009;9:40.
Mitra NK, Nagaraja HS, Ponnudurai G, Judson JP. The levels of difficulty and discrimination indices in type a multiple choice questions of pre-clinical semester 1 multidisciplinary summative tests. Int e-J Sci Med Educ 2009;3:2-7.
Tarrant M, Ware J. Impact of item-writing flaws in multiple-choice questions on student achievement in high-stakes nursing assessments. Med Educ 2008;42:198-206.
Pelligrino J, Chudowsky N, Glaser R, editors. Knowing What Students Know: The Science and Design of Educational Measurement: Issues and Practice. Vol. 24. 2005; p. 3-13.
[Table 1], [Table 2], [Table 3]