Cell Tissue Bank (2012) 13:9–13 DOI 10.1007/s10561-011-9240-x

BRIEF COMMUNICATION

Development of a data entry auditing protocol and quality assurance for a tissue bank database

Matloob Khushi • Jane E. Carpenter • Rosemary L. Balleine • Christine L. Clarke

Received: 23 September 2010 / Accepted: 31 January 2011 / Published online: 18 February 2011
© Springer Science+Business Media B.V. 2011

Abstract Human transcription error is an acknowledged risk when extracting information from paper records for entry into a database. For a tissue bank, it is critical that accurate data are provided to researchers with approved access to tissue bank material. The challenges of tissue bank data collection include manual extraction of data from complex medical reports that are accessed from a number of sources and that differ in style and layout. As a quality assurance measure, the Breast Cancer Tissue Bank (http://www.abctb.org.au) has implemented an auditing protocol and, in order to execute the process efficiently, has developed an open source database plugin tool (eAuditor) to assist in auditing of data held in our tissue bank database. Using eAuditor, we have identified that human entry errors range from 0.01% when entering donor’s clinical follow-up details, to 0.53% when entering pathological details, highlighting the importance of an audit protocol tool such as eAuditor in a tissue bank database. eAuditor was developed and tested on the Caisis open source clinical-research database; however, it can be integrated in other databases where similar functionality is required.

Keywords eAuditor · Caisis · Quality control protocol · Auditing · BCTB

M. Khushi (✉) · J. E. Carpenter · R. L. Balleine · C. L. Clarke
Breast Cancer Tissue Bank, The University of Sydney at the Westmead Millennium Institute, Darcy Road, Westmead, NSW 2145, Australia
e-mail: [email protected]

R. L. Balleine · C. L. Clarke
Translational Oncology, Western Sydney Local Health Network, Westmead, NSW 2145, Australia

R. L. Balleine · C. L. Clarke
Westmead Institute for Cancer Research, The University of Sydney at the Westmead Millennium Institute, Westmead, NSW 2145, Australia

Introduction

The value of correctly consented, collected, processed and stored biological specimens is now well recognized for the advance of basic and translational research (Womack and Gray 2009). Of almost equal importance is the clinical information that accompanies each specimen. It is a major challenge to collect and process clinical information and link it to the available research material, and the complexities of clinical data management can surpass the complexities of specimen management. The central database is a critical component of any tissue bank, holding information on specimen collection and donor consent, as well as demographic, specimen and clinical data for each case. The Breast Cancer Tissue Bank (BCTB, http://www.abctb.org.au), based in Sydney, Australia, employs Caisis (Carpenter et al. 2007), an open source clinical database. In collaboration with the Caisis Development team, in April 2008, the BCTB developed a specimen management module that has been released as a core Caisis component since version 4.1.1.

The BCTB database is frequently updated with the addition of new donor records and the accrual of additional follow-up and treatment information on existing donors. It is essential to ensure the accuracy of data held in the BCTB database, as the data supplied to approved research projects are drawn directly from the database. In recognition of the importance of quality assurance (Verspoor et al. 2009), the BCTB has implemented a protocol for auditing the data contained within Caisis. We developed an in-house database plugin which appeared as a separate screen in Caisis version 3.5. With the move to Caisis version 4.0, and the accompanying upgrade of the software language from ASP.Net 1.1 to ASP.Net 2.0, the BCTB also upgraded the language of the auditing module. The workflow of our auditing protocol was re-assessed and we developed the process described here.

Methods

The Caisis web interface screen for each donor record is divided into two vertical parts (Fearn et al. 2007; Fearn and Sculli 2010). On the left appears a running list of all case record items in the database, sorted in chronological order. On the right, the data entry forms for all of the data that can be stored in Caisis can be displayed. These include consent, demographics, specimen details and treatment information.

eAuditor is built on Microsoft ASP.Net 3.5 and SQL Server 2005 Express. Once integrated with Caisis, eAuditor appears as a button on all data entry forms. Personnel employed to conduct the audit print the records to be audited, check the data shown on the database record against the original hard copy records, correct any error(s) in the database and mark these on the printed copy, then sign off on the hard copy.

The database audit event begins when the auditor clicks on the eAuditor button for the selected data form. This brings up a windowless popup containing a text box for the total number of errors found on the form, a text box for comments, and two check boxes: one to confirm that all fields have been verified and the other to confirm that the audited form has been printed and signed. Once the eAuditor popup has been ticked and saved, a tick icon appears beside the audited record on the main chronological list, so that records for which auditing has been completed are easy to identify. Each screen requires auditing only once. After a screen has been audited, the eAuditor button changes to a green tick, which gives view-only access to the auditor’s comments and the total errors recorded.

The scope of auditing carried out using eAuditor includes the most significant data sets: the tissue donor’s demographic information, pathology reports, consent, and tissue and blood accession details.

To facilitate management of the auditing tasks, and to easily identify records that have yet to be audited, we have developed a new auditing tab in Caisis. This displays a list of records that are pending audit, by year and tissue bank collection site. For each donor record with pending audit items, the list displays both the audited and the outstanding records. Selecting any of the donors on this screen returns the auditor to the donor data entry screen, where they can view and undertake pending audit tasks. Once all mandatory records have been audited, that donor disappears from the audit listing. Adding a new data entry form (pathology, blood or tissue accession) for that donor returns the donor to the audit listing, and alerts the user to the existence of a new record requiring auditing.
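The audit event and the pending-audit listing described above can be pictured with a small sketch. It is not the eAuditor source (which is ASP.Net/C#); the class and function names are hypothetical, and two behaviours are assumptions made for illustration: that both check boxes must be ticked before an audit can be saved, and that a second save on an already audited form is rejected.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AuditEntry:
    """Contents of the audit popup (field names are hypothetical)."""
    total_errors: int
    comments: str
    all_fields_verified: bool
    printed_and_signed: bool


@dataclass
class FormRecord:
    """One data entry form belonging to a donor."""
    form_type: str                      # e.g. "pathology", "consent"
    record_id: int
    audit: Optional[AuditEntry] = None  # None means audit is still pending

    def save_audit(self, entry: AuditEntry) -> None:
        # Assumption: a form is audited once; afterwards it is view-only.
        if self.audit is not None:
            raise ValueError("form has already been audited")
        # Assumption: both confirmation boxes are mandatory.
        if not (entry.all_fields_verified and entry.printed_and_signed):
            raise ValueError("both confirmation boxes must be ticked")
        if entry.total_errors < 0:
            raise ValueError("total_errors cannot be negative")
        self.audit = entry


def pending_donors(donors: Dict[str, List[FormRecord]]) -> List[str]:
    """Donors that still have at least one unaudited form."""
    return [donor for donor, forms in donors.items()
            if any(form.audit is None for form in forms)]
```

With this model, a donor drops out of `pending_donors` once every form has an `AuditEntry`, and reappears as soon as a new unaudited `FormRecord` is appended, mirroring the behaviour of the audit listing.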

Design and integration with Caisis

The eAuditor source code and installation instructions for the latest version of Caisis, v5.0, are available at http://caisis4tb.sourceforge.net/. A fully customized copy of Caisis 5.0, in use by the BCTB, can be requested from the corresponding author. The eAuditor database table, named ‘Auditing’, records the patient database identity, the audited table name with the primary key of the audited record, and the login name of the auditor with a timestamp. The Caisis database records the auditor name, the data fields that were changed and the date/time of the auditing event, allowing the performance of each data entry officer to be monitored. Caisis also provides a ‘Reports’ section where database reports can be generated in response to any SQL query. As a consequence of these measures, data input quality is monitored through a number of auditing/quality check queries.

For the integration of eAuditor in databases other than Caisis, the host database has to be web-based and written in ASP.Net, C# and SQL Server. If the integrator (data manager) has basic knowledge of these languages, integration is the same as for any standard ASP.Net code integration. This includes copying the eAuditor files into the host database installation folder, removing links to Caisis core classes at the top of the code files, creating a hyperlink from host pages, and writing a small amount of code to transfer the logged-in auditor’s username, the patient’s identity and the data form’s name and identity to eAuditor as a query string. For SQL database interaction, host database classes (code) could be used, in which case references to the database classes would need to be corrected accordingly. Changes can be kept to a minimum by using the Caisis open source core SQL database classes. For databases written in other languages, eAuditor can be re-written adopting our protocol as described above.
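As an illustration of the ‘Auditing’ table and the query-string hand-off described above, the following sketch uses Python with SQLite as a stand-in for C# and SQL Server. The column names, the query parameter names, the `/eAuditor/Audit.aspx` path and the uniqueness constraint are all assumptions made for the example; only the kinds of information stored (patient identity, audited table name and primary key, auditor login, timestamp) come from the text.

```python
import sqlite3
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical layout of the 'Auditing' table (SQLite syntax, not SQL Server).
SCHEMA = """
CREATE TABLE Auditing (
    AuditId     INTEGER PRIMARY KEY,
    PatientId   INTEGER NOT NULL,
    TableName   TEXT    NOT NULL,
    PrimaryKey  INTEGER NOT NULL,
    AuditedBy   TEXT    NOT NULL,
    AuditedTime TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (TableName, PrimaryKey)  -- assumption: one audit per record
);
"""


def audit_url(base: str, auditor: str, patient_id: int,
              table_name: str, primary_key: int) -> str:
    """Build the link a host page would use to open the audit popup."""
    query = urlencode({
        "auditor": auditor,
        "patientId": patient_id,
        "tableName": table_name,
        "primaryKey": primary_key,
    })
    return f"{base}?{query}"


def record_audit(conn: sqlite3.Connection, url: str) -> None:
    """Read the query string back and store one row in the Auditing table."""
    params = parse_qs(urlsplit(url).query)
    conn.execute(
        "INSERT INTO Auditing (PatientId, TableName, PrimaryKey, AuditedBy) "
        "VALUES (?, ?, ?, ?)",
        (int(params["patientId"][0]), params["tableName"][0],
         int(params["primaryKey"][0]), params["auditor"][0]),
    )
    conn.commit()
```

A host page would call something like `audit_url("/eAuditor/Audit.aspx", "jsmith", 42, "PathologyReports", 7)` when rendering the audit button; the timestamp is filled in by the database default when the row is saved.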

Results

Implementation of eAuditor by the BCTB revealed errors in 273 fields, of 81,714 total fields audited in 3,242 forms (Table 1). The overall error rate was 0.33%. However, inspection of the errors observed in different forms revealed that human entry errors occurred at different rates depending on the type of data being entered. Error rates of 0.53, 0.34, 0.26, 0.13 and 0.01% were detected for data entered from pathology reports, specimen accessions, tissue donor consent forms, donor’s demographics and clinical follow-up forms, respectively.

The highest error rates were noted in entry of data from pathology reports. Information entered from the pathology report includes cancer type and grade, hormone receptor status (ER, PR), HER-2, and TNM stages. These data are critical to the correct clinicopathological annotation of the specimen, and so errors in such data entry would be a major source of concern for the accuracy of information given to approved researchers if not identified and corrected by the auditing process.

The second highest rate of error related to entry of specimen accessions. These errors are attributable to assigning incorrect positions to samples within the cryostorage boxes. Such errors create confusion and difficulty in locating specimens, but do not compromise the clinical or the consent information.

The 11 errors identified on patient consent forms refer to errors in the date consented, the identity of the person obtaining consent, or the version of the consent form in use. There were no errors in the section that records consent status (agreeable, refused or revoked). This could be due to a particular emphasis within BCTB on ensuring that consent status and changes of consent status are carefully monitored, and that if consent is refused or revoked all data are removed from the database immediately. Only two patients have revoked their consent in the last 5 years.

Most errors on demographics forms were minor and related to the contact details of patients. As errors in the date of birth entry will affect the age at diagnosis, this field is always carefully checked and errors in this field seldom occur.

Table 1 The error rate on different forms

Form name           No. of fields per form (A)  No. of forms audited (B)  Total fields (X = A × B)  Fields with errors (Y)  Error rate* (Y/X) × 100 (%)
Breast pathology    48                          733                       35,184                    188                     0.53
Specimen accession  28                          667                       18,676                    63                      0.34
Patient consent     11                          379                       4,169                     11                      0.26
Demographics        20                          348                       6,960                     9                       0.13
Clinical follow-up  15                          1,115                     16,725                    2                       0.01
Total               122                         3,242                     81,714                    273                     0.33

* The error rate was calculated by expressing the number of fields with errors as a percentage of the total fields audited for each form type
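The calculation in the table footnote can be reproduced directly from the counts in Table 1. The short sketch below does this; the variable and function names are illustrative only, and the figures are copied from the table.

```python
# (form name, fields per form A, forms audited B, fields with errors Y)
FORMS = [
    ("Breast pathology", 48, 733, 188),
    ("Specimen accession", 28, 667, 63),
    ("Patient consent", 11, 379, 11),
    ("Demographics", 20, 348, 9),
    ("Clinical follow-up", 15, 1115, 2),
]


def error_rate(fields_per_form: int, forms_audited: int,
               fields_with_errors: int) -> float:
    """Error rate in percent: (Y / X) * 100, where X = A * B."""
    total_fields = fields_per_form * forms_audited
    return 100.0 * fields_with_errors / total_fields


def overall_error_rate(forms) -> float:
    """Pooled error rate over all form types, in percent."""
    total_fields = sum(a * b for _, a, b, _ in forms)
    total_errors = sum(y for _, _, _, y in forms)
    return 100.0 * total_errors / total_fields


for name, a, b, y in FORMS:
    print(f"{name}: {error_rate(a, b, y):.2f}%")
print(f"Overall: {overall_error_rate(FORMS):.2f}%")
```

Rounded to two decimal places, the per-form values agree with the last column of Table 1, and the pooled figure (273 of 81,714 fields) gives the overall rate of 0.33%.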


Neither demographics nor specimen accession data entry requires specialist knowledge. However, specimen accession entry is a complex task, as each specimen has to be allocated a correct geographical site, storage vessel, rack within the storage vessel, box within the rack, and position within the box. Most errors identified in specimen accession fields related to incorrect positions in the box, for example where one position was allocated to two samples. The demographics form, on the other hand, has fewer data items, and the nature of the data is less complex, consisting of patient name, address, medical record number, date of birth, country of birth, ethnicity etc. The different error rates identified with eAuditor between data forms suggest that the complexity of the data entered in a form increases the risk of erroneous data entry. Clinical follow-up entries had the lowest error rate (0.01%), which is potentially attributable to the fact that these data are entered by an experienced person dedicated solely to clinical follow-up. This may suggest that training and specialisation of tasks can reduce the risk of erroneous data entry.
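The most common specimen accession error, one box position allocated to two samples, lends itself to an automated check. The sketch below is a hypothetical illustration and not part of eAuditor: it models a storage location with the five levels listed above and reports any location that holds more than one specimen. All names are assumed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Location(NamedTuple):
    """Storage hierarchy as described in the text (names are illustrative)."""
    site: str
    vessel: str
    rack: str
    box: str
    position: int


def double_allocations(
    accessions: Iterable[Tuple[str, Location]]
) -> Dict[Location, List[str]]:
    """Map each location assigned to more than one specimen to its specimen IDs."""
    by_location: Dict[Location, List[str]] = defaultdict(list)
    for specimen_id, location in accessions:
        by_location[location].append(specimen_id)
    return {loc: ids for loc, ids in by_location.items() if len(ids) > 1}
```

Run over the accession records, a check of this kind would list conflicting positions for the auditor to resolve against the paper records.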

Discussion

This report details the development and implementation of an audit tool, eAuditor, as an adjunct in the validation of data held in a tissue bank database. The error rates identified overall were less than 1%, but error rates differed between forms. The lowest error rate was on the clinical follow-up forms, whereas the highest error rate was observed on the pathology forms. The reasons for the range of error rates across different forms could include the training level of data entry personnel; ambiguity in the data type that can be entered into the database fields; or the complexity of the data requiring entry.

The skill level of the data entry officers is unlikely to be an issue, as our data entry personnel are either scientists or registered nurses at participating hospitals, who would be familiar with the terminology on the data forms. There are also Standard Operating Procedures associated with data entry, to minimize variations in data entry protocols between different data entry officers.

To minimize ambiguity in the data type that can be entered into database fields, we have restricted data entry by the deployment of drop-down boxes that contain only a limited choice of data entry options. To further reduce errors relating to the wrong selection of a value in a drop-down box, selected drop-down options are linked to other fields in the form. For example, on the specimen accession screen, the choices in the drop-down box for ‘specimen type’ are ‘blood’, ‘FFPE’ and ‘fresh tissue’. If ‘blood’ is selected as a specimen type then under ‘specimen sub type’, the available sub-options only relate to blood (namely, ‘Buffy Coat’, ‘Serum’, ‘Whole Blood’ etc.). In addition, a range of other database level constraints have been implemented to avoid many types of erroneous entries.

The complexity of the data requiring entry is identified as a factor in the differing error rates across different forms. Potential solutions include narrowing the task range of data entry officers to allow a greater focus on data entry tasks, or further development of the data storage inventory system to prevent double allocation of samples to the same storage location. In addition, specific measures to reduce the errors in entry of data from pathology reports could include provision of pathology reports as electronically parsable XML data, along with the paper version. However, obstacles to achieving such an objective would include the challenges in standardisation of the data fields to be reported, and determination of the XML schema elements and attributes syntax for pathological reporting.

There are other auditing techniques available, including read-aloud and double data entry. Kawado et al. (2003) reported error rates for read-aloud and double entry, with an overall error rate of 0.34%, comparable to our overall error rate of 0.33%. In contrast, another study using double entry reported much larger error rates, ranging from 2.3 to 26.9% in two research databases (Goldberg et al. 2008).

Double data entry is an auditing technique considered to be the gold standard approach. However, as double entry needs to be carried out by two separate data entry officers, the expenses associated with it are substantial. Moreover, when biospecimens and data are being collected from geographically dispersed hospitals, the data entry officers are located in these participating institutions, and double data entry would necessitate employing additional staff at each site, with attendant increases in cost and other limitations.
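For readers unfamiliar with the technique, double data entry amounts to a field-by-field comparison of two independently keyed copies of the same form. The sketch below is a generic illustration with hypothetical field names; it is not taken from eAuditor or from the cited studies.

```python
from typing import Dict, List, Optional, Tuple

Discrepancy = Tuple[str, Optional[str], Optional[str]]


def compare_entries(first: Dict[str, str],
                    second: Dict[str, str]) -> List[Discrepancy]:
    """List (field, first value, second value) wherever the two entries disagree.

    A field present in only one entry is reported with None for the other.
    """
    fields = sorted(set(first) | set(second))
    return [(name, first.get(name), second.get(name))
            for name in fields
            if first.get(name) != second.get(name)]
```

Each discrepancy then has to be resolved against the source document, which is where the cost of the second data entry officer arises.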


A further limitation of double entry is that it cannot identify or correct errors originating in the original documents. For instance, fresh tissue may be assigned to a freezer location on the paper records, whereas it should be assigned to liquid nitrogen storage. A trained auditor who performs auditing for all centres with the help of eAuditor can concurrently correct such errors in the original paper records. During the manual checking of records, the auditor can also identify and resolve inconsistencies in the paper records and correct the data in the database. Another advantage of conducting a manual audit is that if the data entry protocol has been changed since the data were entered, then prior data can be re-categorised. For example, the occurrence of basal cell carcinoma in a patient was previously recorded in the ‘comments’ field. The decision was subsequently made to record such findings in a separate data form in the Caisis database, as a ‘non-breast procedure’. Records entered under the earlier procedure can therefore be brought into line during the eAuditor process.

Database auditing is intended to be a cyclical process; in this regard, we intend to re-audit any records that have been changed subsequent to the first audit. We are currently developing the next release of eAuditor, which will include the functionality to flag for re-auditing those patient records that have been altered since the first audit.

In summary, the rationale and need for auditing of manually entered biological data is clear (Burrage et al. 2006; Fan and Friedman 2008). By implementing eAuditor, the possibility of incorrect specimen annotation, and of provision of inaccurate material and data to researchers, is greatly diminished.

Acknowledgments

The authors would like to thank Dr. Dinny Graham for her helpful feedback. The Breast Cancer Tissue Bank is funded by the National Health and Medical Research Council of Australia, the National Breast Cancer Foundation and the Cancer Institute NSW (CINSW). RLB is a CINSW Fellow.

References

Burrage K, Hood L et al (2006) Advanced computing for systems biology. Briefings in Bioinformatics 7(4):390–398
Carpenter JE, Miller JA et al (2007) The Caisis system for biorepository data requirements: Breast cancer tissue bank, Australia. Cell Preserv Technol 5(1):51–52
Fan J-W, Friedman C (2008) Semantic reclassification of the UMLS concepts. Bioinformatics 24(17):1971–1973
Fearn P, Regan K et al (2007) Lessons learned from Caisis: an open source, web-based system for integrating clinical practice and research. In: Twentieth IEEE international symposium on computer-based medical systems (CBMS ’07), 2007
Fearn P, Sculli F (2010) The CAISIS research data system. In: Ochs MF, Casagrande JT, Davuluri RV (eds) Biomedical informatics for cancer research. Springer, US, pp 215–225
Goldberg SI, Niemierko A et al (2008) Analysis of data errors in clinical research databases. AMIA Annu Symp Proc 2008:242–246
Kawado M, Hinotsu S et al (2003) A comparison of error detection rates between the reading aloud method and the double data entry method. Control Clin Trials 24(5):560–569
Verspoor K, Dvorkin D et al (2009) Ontology quality assurance through analysis of term transformations. Bioinformatics 25(12):i77–i84
Womack C, Gray NM (2009) Banking human tissue for research: vision to reality. Cell and Tissue Banking 10(3):267–270
