Introduction
Thalassemia is the most common inherited blood disorder in Southeast Asia and is considered an emerging health burden worldwide (1, 2). Although several forms of thalassemia occur in this region, only three severe forms, namely haemoglobin (Hb) Bart’s hydrops fetalis (homozygous α0-thalassemia), homozygous β-thalassemia and Hb E-β-thalassemia, are of major concern and require an appropriate prevention and control program (3). Such a program has long been implemented, with the ultimate goal of reducing the incidence of these three severe diseases. The program comprises three steps: 1) screening for thalassemia in pregnant women and their husbands, 2) providing genetic counselling to at-risk couples, and 3) offering, after genetic counselling, the option of prenatal diagnosis and termination of a pregnancy with an affected foetus (3, 4).
In Thailand, screening for thalassemia is obligatory and financed by the National Health Security Office. National policy requires all primary health care facilities to provide thalassemia screening to all pregnant women and their husbands at the first visit. Screening procedures include the measurement of mean corpuscular volume (MCV) and/or mean corpuscular haemoglobin (MCH) to identify suspected carriers of α0-thalassemia and β-thalassemia, and the dichlorophenolindophenol (DCIP) test for Hb E (5, 6). Under this screening strategy, blood samples of positive cases are sent to referral centres for further investigation by Hb and DNA analyses to reach a definite diagnosis. For those with negative results, no further investigation is required. It is therefore crucial that the initial screening is accurate. Because no proficiency testing scheme has existed, information on the performance of initial thalassemia screening by laboratories in the country has not been available so far.
In this study, we developed and initiated a proficiency testing (PT) program, the first of its kind in Thailand, to assess the screening performance of laboratory staff and their competency in interpreting the screening results. Because of the complexity of thalassemia in the country, we hypothesized that there might be incorrect interpretation and/or risk assessment among laboratory staff. Information obtained from the study may encourage health authorities in similar settings to establish PT programs within the region.
Materials and methods
Preparation of PT items for thalassemia screening and the participants
Ethical approval of the study protocol was obtained from the Ethical Committee of Khon Kaen University, Thailand (HE582243). Appropriate human blood samples were selectively collected and used to prepare the PT items. After informed consent was obtained, 15-20 mL of peripheral blood anticoagulated with acid citrate dextrose (ACD) was collected from each of two donors, i.e. a normal individual (normal Hb type with normal MCV and MCH) and an Hb E carrier. Only one sample was taken from each donor. Baseline values of MCV and MCH were measured using a Sysmex XS-800i haematology analyser (Sysmex, Kobe, Japan) and a Coulter haematology analyser (COULTER® LH780 Hematology Analyzer, Beckman Coulter, USA). Blood samples were then preserved with the phosphate–adenine–glucose–guanosine–saline–mannitol (PAGGSM) reagent as described previously (7). In brief, one volume of PAGGSM was added to 3.5 volumes of blood sample and the mixture was mixed continuously on a rotating mixer at 400 x g for 60 minutes. After storage at 2-6 °C for 48 hours to stabilize the red blood cell (RBC) parameters, aliquots of the mixtures were distributed on ice to the participants for analysis within 1 week. Prior to distribution, two aliquots of each PT item were randomly selected for MCV and MCH measurement and DCIP testing. Homogeneity and stability of the PT items were tested according to ISO/IEC 17043 (8). General characteristics of the PT items are summarized in Tables 1 and 2. To minimize variation in the types of haematology analysers used (9), members of two manufacturers, i.e. Sysmex Co., Ltd. and Beckman Coulter Co., Ltd., were invited. Both companies agreed to the PT program. The representatives of each company were responsible for inviting participants and for the timely transportation of PT items to them.
Table 1
Table 2
The proficiency testing scheme
Three cycles of the PT program were conducted between June 2015 and May 2016. The number of participating laboratories using Sysmex Hematology Analyzer series was 30, 26 and 27 in cycles 1, 2 and 3, respectively. In contrast, the number of participants using Beckman Coulter Analyzer series increased from 29 in cycle 1 to 39 in cycle 2 and 40 in cycle 3. Participants were asked to measure MCV and MCH values and perform the DCIP test within 24 hours of receiving the PT items. In brief, laboratory staff were asked to add 0.02 ml of each blood sample to 2 ml of DCIP reagent and incubate the mixture at 37 °C for 15 minutes. After stopping the reaction by adding 0.02 ml of the stop solution, laboratory staff reported the DCIP screening results according to the turbidity observed by the naked eye.
In each cycle, 2 PT items were prepared and assigned as blood samples of a pregnant woman and her husband. Along with the PT items, detailed instructions, necessary information and a report form were provided to participants. The performance assessment comprised 3 parts: Part I, accuracy of MCV & MCH measurements and DCIP testing; Part II, competency in interpretation of the screening results; and Part III, competency in assessing the risk of the foetus having the three severe thalassemia diseases (Table 3).
Table 3
Evaluation criteria
Different evaluation criteria were applied, as described below.
Accuracy of initial screening test
To minimize variation associated with different haematology analysers, performance of MCV and MCH measurements was evaluated separately for the 2 groups of participants (9). Accuracy of MCV and MCH measurements was evaluated against assigned values derived from the consensus values of each participant group. Assigned values were calculated using robust statistical methods as recommended by ISO 13528 (10). For proficiency assessment, a z-score was calculated directly from the result reported by each participant as its deviation from the assigned value relative to the acceptable variation of all results: z = (participant result - assigned value) / standard deviation. A z-score within -2 to 2 was considered acceptable. A bar chart of z-scores was constructed to show the performance of all participants. For the qualitative DCIP test for Hb E, the result of each participant (either positive or negative) was compared directly with the assigned value defined by Hb analysis using a capillary electrophoresis system (Capillarys II Flex Piercing, Sebia corp., France) at our laboratory (11). Proportions of participants reporting an incorrect result were calculated and compared between cycles, i.e. cycles 1 and 2, cycles 1 and 3, and cycles 2 and 3.
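The robust consensus calculation and z-score described above can be sketched as follows. This is a minimal illustration only, assuming the commonly described form of ISO 13528 Algorithm A (median and scaled median absolute deviation as starting values, then iterative winsorisation at 1.5 robust standard deviations). The function names, the convergence tolerance and the MCV values are hypothetical and are not taken from the study data, which were analysed in Excel.

```python
import statistics


def algorithm_a(values, tol=1e-6, max_iter=100):
    """Robust mean and SD by iterative winsorisation (sketch of Algorithm A)."""
    # Starting values: median and scaled median absolute deviation.
    x_star = statistics.median(values)
    s_star = 1.483 * statistics.median([abs(v - x_star) for v in values])
    for _ in range(max_iter):
        delta = 1.5 * s_star
        # Pull values outside x* +/- delta back to the limits.
        clipped = [min(max(v, x_star - delta), x_star + delta) for v in values]
        new_x = statistics.mean(clipped)
        new_s = 1.134 * statistics.stdev(clipped)
        converged = abs(new_x - x_star) < tol and abs(new_s - s_star) < tol
        x_star, s_star = new_x, new_s
        if converged:
            break
    return x_star, s_star


def z_score(result, assigned, sd):
    """z = (participant result - assigned value) / standard deviation."""
    return (result - assigned) / sd


def is_acceptable(z):
    """A z-score within -2 to 2 is considered acceptable."""
    return -2.0 <= z <= 2.0


# Hypothetical MCV results (fL) from one analyser group; 95.0 is a high outlier.
mcv_results = [84.0, 85.0, 85.5, 86.0, 86.5, 87.0, 95.0]
assigned_value, robust_sd = algorithm_a(mcv_results)
for value in mcv_results:
    z = z_score(value, assigned_value, robust_sd)
    print(f"{value:5.1f} fL  z = {z:+.2f}  acceptable = {is_acceptable(z)}")
```

In this sketch the outlying result has only a limited influence on the assigned value, so it still receives a z-score above 2. If more than half of the reported results are identical, the robust standard deviation becomes zero and the z-score is undefined; a real scheme would need to handle that case separately.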
Competency in interpretation of thalassemia screening
Competency in interpretation was evaluated according to standard screening guidelines, using MCV and/or MCH in combination with the DCIP test (5, 6). There are 4 categories of screening results and interpretations: category A (MCV > 80 fL or MCH > 27 pg with negative DCIP, the (-/-) variety), non-thalassemia or non-clinically significant thalassemia; category B (MCV < 80 fL or MCH < 27 pg with negative DCIP, the (+/-) variety), suspected α0-thalassemia and/or β-thalassemia; category C (MCV > 80 fL or MCH > 27 pg with positive DCIP, the (-/+) variety), suspected Hb E trait; and category D (MCV < 80 fL or MCH < 27 pg with positive DCIP, the (+/+) variety), suspected Hb E with or without α- and/or β-thalassemia. These 4 categories were provided in the report form, allowing participants to select the appropriate category based on the screening result of each PT item.
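The four categories above amount to a two-by-two decision rule, which can be sketched as below. This is an illustrative sketch rather than the guideline itself. The function name is hypothetical, and two points are assumptions not specified in the text: when both MCV and MCH are available, the red cell index screen is treated as positive if either value is below its cut-off, and values exactly at a cut-off are treated as negative.

```python
def screening_category(dcip_positive, mcv_fl=None, mch_pg=None):
    """Return the screening category (A-D) from red cell indices and DCIP."""
    if mcv_fl is None and mch_pg is None:
        raise ValueError("at least one of MCV or MCH is required")
    # Assumption: either index below its cut-off makes the screen positive.
    low_mcv = mcv_fl is not None and mcv_fl < 80.0
    low_mch = mch_pg is not None and mch_pg < 27.0
    indices_positive = low_mcv or low_mch
    if indices_positive:
        # (+/+) suspected Hb E with or without alpha-/beta-thalassemia,
        # (+/-) suspected alpha0-thalassemia and/or beta-thalassemia.
        return "D" if dcip_positive else "B"
    # (-/+) suspected Hb E trait,
    # (-/-) non-thalassemia or non-clinically significant thalassemia.
    return "C" if dcip_positive else "A"


# Hypothetical PT item results.
print(screening_category(dcip_positive=False, mcv_fl=88.0, mch_pg=29.0))
print(screening_category(dcip_positive=True, mcv_fl=72.0, mch_pg=23.0))
```

The first call corresponds to the (-/-) pattern and the second to the (+/+) pattern.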
Competency in risk assessment of the foetus
Each participant was asked to assess the diseases for which the foetus was at risk by combining the screening results obtained from the 2 PT items (assigned as the pregnant woman and her husband, respectively). The options assessed were A: Hb Bart’s hydrops fetalis (homozygous α0-thalassemia); B: homozygous β-thalassemia; C: Hb E-β-thalassemia; and D: no risk for the three severe thalassemia diseases. Possible patterns of screening results of the couples and the diseases for which the foetuses are at risk are listed in Table 4.
Table 4
As with the DCIP test, the proportions of participants reporting incorrect results (incorrect interpretation of thalassemia screening as well as incorrect risk assessment) were calculated and compared between cycles.
Performance levels of the participants
The overall performance of each participant was defined according to the above three evaluation criteria: accuracy of screening tests (MCV, MCH and DCIP), competency in interpretation of screening results, and competency in risk assessment of the foetus. The four levels of performance are given in Table 5.
Table 5
Statistical analysis
Outcome variables evaluated included the z-scores of MCV and MCH, DCIP result, interpretation of screening test, risk assessment, and performance level. All data obtained from participants were entered into Excel spreadsheets. For the quantitative tests (MCV and MCH measurements), robust statistical analysis (Algorithm A; ISO 13528) was applied to identify outliers (10). These analyses were performed with Excel 2013. For the qualitative tests, proportions of participants reporting incorrect results (incorrect interpretation of thalassemia screening as well as incorrect risk assessment) were calculated and compared between cycles using the z-test. Statistical comparison was performed using Minitab Statistical Software version 12.2 (Minitab Inc, Pennsylvania, USA). A P-value < 0.05 was considered significant.
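The between-cycle comparison of proportions can be illustrated with a two-sided, two-proportion z-test. The sketch below assumes the pooled-variance form of the test; the exact variant and options used in Minitab are not stated in the text, so results may differ slightly from those of the study. The function name is hypothetical, and the counts are the numbers of incorrect reports given in the Results section.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions (pooled)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided P-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Incorrect risk assessment: cycle 1 (11/59) versus cycle 2 (37/65).
z_risk, p_risk = two_proportion_z_test(11, 59, 37, 65)
print(f"risk assessment, cycle 1 vs 2: z = {z_risk:.2f}, P = {p_risk:.2g}")

# Incorrect interpretation: cycle 1 (20/59) versus cycle 3 (14/67).
z_int, p_int = two_proportion_z_test(20, 59, 14, 67)
print(f"interpretation, cycle 1 vs 3: z = {z_int:.2f}, P = {p_int:.2g}")
```

Under these assumptions, the first comparison falls below the 0.05 threshold and the second does not, which agrees in direction with the reported findings. With small numbers of incorrect results, the normal approximation is less reliable and an exact test may be preferable.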
Results
Accuracy of MCV & MCH measurements and DCIP test
Bar charts of the z-scores of participants are shown in Figure 1 (for Beckman Coulter Analyzers) and Figure 2 (for Sysmex Haematology Analysers). The majority of participants in both groups reported acceptable MCV and MCH values, as indicated by z-scores within ±2. For the DCIP test, the proportion of laboratories reporting an incorrect result was 6/59 in the first cycle, 4/65 in the second cycle, and 1/67 in the third cycle of the PT scheme (Table 6). There was no statistically significant difference between cycles.
Table 6
Competency in interpretation and assessing the risk of foetus
Incorrect interpretation of screening results and incorrect risk assessment of the foetus are shown in Table 6. In the first cycle, 20 out of 59 participants interpreted the screening results incorrectly. The proportion of incorrect interpretation decreased to 16/65 in the second cycle and to 14/67 in the third cycle. However, the differences between cycles were not statistically significant.
Competency in assessing the risk of the foetus based on the screening test results fluctuated. A high proportion of incorrect risk assessment was observed in the second cycle (37/65), whereas the proportions in cycle 1 (11/59) and cycle 3 (13/67) were significantly lower than that of cycle 2.
Proficiency levels of participants
Figure 3 illustrates the proficiency levels of participants in the 3 cycles of the PT scheme. Excellent performance was achieved by 28/59 (47.5%) of participants in cycle 1, 25/65 (38.5%) in cycle 2, and 38/67 (56.7%) in cycle 3. In contrast, poor performance was noted in 14/59 (23.7%), 17/65 (26.2%) and 14/67 (20.9%) of participants in cycles 1, 2 and 3, respectively. The performance of individual participants in each cycle is also illustrated (Figure 4). A few laboratories maintained excellent performance in all 3 cycles, e.g. participant numbers 1-5 of the first group using Beckman Coulter Haematology Analysers and participant numbers 1-2 of the second group using Sysmex Haematology Analysers. However, many participants improved their performance in the second and/or third cycle of the PT scheme.
Discussion
In this study, we report the results of a PT program for thalassemia screening initiated for the first time in our country. Approximately half of the participants had excellent performance, and most of the non-excellent performances were due to incorrect interpretation and/or risk assessment. Our PT scheme included quantitative, qualitative and interpretive tests. For the quantitative tests, although most laboratories reported acceptable MCV and MCH values, falsely high values were occasionally seen, resulting in unacceptable z-scores higher than 2 (Figures 1 and 2). This is especially important for thalassemia screening because falsely high MCV and MCH values could lead to false negative screening results. False positives may also occur in thalassemia screening, as reported previously (5, 6, 12), but these are less of a concern because positive samples are referred for confirmatory Hb and DNA analyses. It is therefore crucial that false negatives be kept to a minimum, since they can result in the birth of newborns with severe thalassemia syndromes. The evaluation results obtained from the PT program are thus helpful for participants to identify and fix such serious problems.
In addition to MCV and MCH, the DCIP test is necessary for identifying individuals with Hb E (5, 6, 13, 14). Although the test is simple and rapid, the result is highly observer dependent, and adequate practice is required before performing the test (15). Results from this PT program showed that some participants performed the DCIP test incorrectly (Table 5). We observed that participants from smaller community hospitals performed the test less accurately than those from provincial hospitals. Consistent with this finding, we have also observed a high rate of incorrect DCIP results in samples referred from community hospitals to our centre for confirmation of thalassemia. This is not unexpected: with limited resources and budgets and a high routine workload, staff of these community hospitals usually have fewer opportunities for training. Nonetheless, without the PT program they would not know their performance on thalassemia screening. The test is used widely in Thailand and is currently being distributed to nearby low- and middle-income countries (5, 6, 12, 16). This information should therefore help health staff recognize the limitations of the test.
Competency in the interpretation of screening results and in the assessment of foetal risk for severe thalassemias were the other issues evaluated. Whether laboratory staff understand the results is crucial for communicating with the couple and with other allied health personnel at the hospital, especially for genetic diseases (17). Also, knowing the risk to the expected foetus of a screened couple should allow them to realize the importance of accurate screening results. Under this PT program, approximately 1/3 of participants in the first cycle interpreted thalassemia screening incorrectly. Although the differences were not statistically significant, the proportion of incorrect results decreased gradually in the second cycle and reached approximately 1/5 in the third cycle. This may reflect an improvement in the interpreting skill of laboratory staff and supports the benefit of the PT program.
In contrast to the competency in interpretation of screening results, incorrect assessment of foetal risk fluctuated from cycle to cycle. Notably, the proportion of incorrect risk assessment was particularly high in the second cycle of the PT scheme. This can be explained by the fact that more complex cases were supplied in the second cycle. Both PT items were positive for MCV & MCH, and only one of them was positive for the DCIP test (Tables 3 and 4). Therefore, based on the screening results, the husband was a suspected carrier of α- or β-thalassemia, whereas the pregnant woman could be a carrier of α- or β-thalassemia or Hb E (5). Accordingly, the expected foetus could be at risk of all 3 thalassemia diseases, including Hb Bart’s hydrops fetalis, β-thalassemia major and Hb E-β-thalassemia. Unfortunately, many participants reported a risk of only one or two diseases. This reflects in part insufficient knowledge among participants. As with the DCIP test mentioned above, most of the misinterpretations and incorrect assessments of foetal risk came from community hospitals.
There are several approaches to evaluating the overall performance of participants. For a thalassemia prevention and control program, we believe that detection of carriers of severe thalassemia, accurate interpretation of laboratory results, and risk assessment of the corresponding inheritance and interaction are very important. These require appropriate knowledge of the pathophysiology, genetic inheritance, and genotype-phenotype interactions of the diseases. As with other genetic disorders, it is necessary to have accurate laboratory results as well as accurate interpretation and risk assessment (17). In this study, we combined the accuracy of the screening tests and the competency in interpretation and risk assessment as the performance criteria. On average, only half of the participants had excellent performance (Figure 3). Taking all evaluated items into account, the main reasons for poor performance appear to be incorrect interpretation and incorrect risk assessment of the expected foetus. Similarly, misinterpretations and incorrect risk assessments were observed previously in a PT program on Hb analysis (18). Because cases of differing complexity were assigned in each cycle, it is not surprising that several laboratories could not maintain their excellent performance. The performance of some participants even swung up and down (Figure 4). Our results indicate that the PT program should be operated continuously, in parallel with an extensive education program on thalassemia screening for all laboratories, to ensure better understanding and to sustain the goal of the PT program.
To the best of our knowledge, only a few papers concerning PT programs for thalassemia have been published. Proficiency testing of hemoglobinopathy techniques in Ontario laboratories, initiated in 1989, focused on the accuracy of screening tests for sickle cell disease and Hb electrophoresis (19). More recent studies of external quality assessment conducted in Italy focused on Hb A2 measurement and genotyping of β-thalassemia (20, 21). The Dutch Foundation for Quality Assessment in Medical Laboratories (SKML) currently provides an international PT program for thalassemia that assesses the accuracy of Hb A2 and Hb F measurement as well as the diagnostic skill of participants (22). In Thailand, the first PT program for Hb analysis in the prevention and control of severe thalassemia was initiated in 2012 (18). However, that program does not include assessment of initial screening performance or of the competency of laboratory staff in interpretation and risk assessment, even though initial screening is the first step that can affect the success of thalassemia prevention.
Our study has some limitations. Firstly, because we used human blood to prepare the PT items, we could not provide a large batch of samples for a larger number of participants. This explains why only members of two haematology analyser manufacturers were invited and why we could not use the same PT items for the 2 groups of participants. Secondly, even though PT items were sent to participants under appropriate conditions and packaging, they sometimes deteriorated during transportation, leading to the withdrawal of participants, particularly community hospitals. Nevertheless, the study provides a model of a PT program for thalassemia screening and emphasizes the need to set up such a PT scheme to improve the prevention of thalassemia in this region.