Data Management and Information Systems Overview

The Data Management and Information Systems (DMIS) Unit provides HNRP investigators and staff with state-of-the-art web-based data management and reporting systems, along with the hardware and software needed for collaborative research and communication.

Organization

The work of the HNRP DMIS Unit is directed by Anthony Gamst, Ph.D., and managed by Clint Cushman. Dr. Gamst ensures that the conceptual and practical operations of the DMIS Unit conform to the highest standards and meet the needs of HNRP investigators. He is responsible for developing the conceptual data management model and for ensuring implementation of the data management plan, which uses that model to provide flexibility, accountability, and timeliness in data acquisition, storage and access.

As the DMIS Manager for the HNRP, Mr. Clint Cushman, under Dr. Gamst’s direction, oversees the design, development and use of the HNRP web-based client/server databases. He ensures that, on a day-to-day basis, work for specific HNRP projects is closely coordinated. Mr. Cushman directly supervises the programming and technical support staff.

Resources Provided

The HNRP DMIS Unit is focused on developing web-based tools for fast, accurate, and secure data entry; designing, building and maintaining secure, high-availability data systems; and developing tools for the secure tracking, reporting and downloading of data and summaries.

In addition to the data collected in pursuit of specific scientific aims, the HNRP data management system tracks and displays agendas, minutes, reports, publications and other information facilitating scholarly exchange, collaboration, education and training. Collaboration is further facilitated in software through the use of secure content management systems, XMPP/IM services, digital white-boards, and video-teleconferencing solutions. The DMIS Group is focused on providing first-class support to HNRP investigators through testing and development of novel systems.

To provide ongoing support for research associated with the long-term longitudinal studies of the HNRP, it has been important to provide utilities that facilitate reproducible research. To that end, the DMIS Unit has developed a suite of utilities that track data access and associate it with publications and research projects. This information is stored so that the data can later be retrieved again, or combined with additional data, from a date-associated frozen dataset.
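As an illustration only, access tracking of this kind might be sketched as follows. The class names, field names and project labels are assumptions made for the example and do not describe the actual HNRP utilities.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AccessRecord:
    """One logged data request (all field names are hypothetical)."""
    user: str
    project: str        # publication or research project the data supports
    freeze_date: date   # date of the frozen dataset that was queried
    variables: tuple    # names of the variables that were retrieved


class AccessLog:
    """Append-only list of access records, searchable by project."""

    def __init__(self):
        self._records = []

    def record(self, user, project, freeze_date, variables):
        entry = AccessRecord(user, project, freeze_date, tuple(variables))
        self._records.append(entry)
        return entry

    def for_project(self, project):
        """Return every access made on behalf of one project."""
        return [r for r in self._records if r.project == project]

    def variables_used(self, project):
        """Map each freeze date to the sorted variables drawn from it."""
        used = {}
        for r in self.for_project(project):
            used.setdefault(r.freeze_date, set()).update(r.variables)
        return {d: sorted(v) for d, v in used.items()}
```

Because each record carries the freeze date, a later analysis can request the same variables from the same frozen dataset, or add variables to the list without changing the snapshot.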

DMIS Technical Details

The principal focus of the Data Management Unit’s innovation is creating an infrastructure for distance-independent national and international research collaboration using the Internet. For example, using a database-driven approach to dynamic web publication called a ‘portal’ approach, web pages are constructed by scripts that assemble each page ‘on the fly’ as users click to request various types of information.

The HNRP DMIS Group will continue to build upon the experience and expertise of the existing DMIS staff, leveraging existing HNRP infrastructure to meet future needs. The DMIS Group at the HNRP has nearly 20 years of experience in HIV research and related areas. It has supported HNRP-associated studies including the HIV Neurobehavioral Research Center (HNRC, 5 P30 MH062512-08) and over 80 associated studies, among them major projects such as the NIDA Program Project (5 P01 DA012065-08) and the CNS HIV Anti-Retroviral Therapy Effects Research (CHARTER) Project (N01 MH22005). Through this work, the DMIS Group has developed a highly effective infrastructure capable of capturing, validating and distributing data, meeting the diverse needs of investigators in a wide range of clinical and cohort studies.

Systems Security. The local network is protected from the outside world by a firewall, which restricts access based on IP address and service/port. All machines on the local network have up-to-date patches and virus and spyware protection. Access to the database is available through secure web forms only, and all communication over the local network is encrypted using the Secure Sockets Layer (SSL). User management tools enforce fine-grained access control based on user (login) ID. Authentication is password-based, with two-factor authentication available using RSA SecurID tokens or USB keys.
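The idea of restricting access by IP address and port can be sketched in a few lines. This is a minimal illustration, not the HNRP firewall configuration: the rule list and the function name are hypothetical, and the addresses are taken from the documentation-only ranges reserved by RFC 5737.

```python
import ipaddress

# Hypothetical allow-list of (network, port) pairs. The addresses are
# documentation-only examples, not addresses on the HNRP network.
ALLOW_RULES = [
    (ipaddress.ip_network("192.0.2.0/24"), 443),     # HTTPS from one subnet
    (ipaddress.ip_network("198.51.100.7/32"), 22),   # SSH from a single host
]


def is_allowed(source_ip: str, port: int) -> bool:
    """Return True if some rule permits this source address and port."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network and port == allowed_port
               for network, allowed_port in ALLOW_RULES)
```

Anything that matches no rule is rejected, so the sketch is a default-deny policy: access has to be granted explicitly.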

Sharing. Users wishing to access HNRP resources are asked to establish a Resource Account. A User Management tool has been developed for this purpose, and registered users have access to standardized reports. To reduce the potential for misunderstanding or inadvertent misuse of particular variables, the on-line, searchable data dictionaries are annotated with 'usage and analysis notes' for key (potentially all) variables. As part of the registration process, users are asked to provide basic contact information and are required to assent to a Data Use Agreement. Registered users are also able to request access to additional data, specimens, and expertise (including subject-matter, data management, laboratory, and statistical support). Access requests take the form of brief proposals, which are reviewed for scientific merit, feasibility, and availability of resources. This entire process is managed over the web (through a Resource Request Review tool). We request that data derived from HNRP resources be given back to the program for access by other investigators (perhaps after a short embargo period). To further facilitate data exchange between external investigators, user-driven bulk upload utilities have been created to provide an easily accessible method of data sharing.

Data Entry. All projects start with the creation of a searchable on-line data dictionary. Data entry screens, back-end tables, and bulk-upload tools (for laboratory, imaging and other high-throughput data) are generated directly from the dictionary. Bulk upload tools are also available for incorporation of data with essentially arbitrary structure, using an entity-attribute-value (EAV) model for the back-end tables. All data entry screens are accessible via the internet to allow for off-site data entry.
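As a rough illustration of generating back-end tables directly from a data dictionary, the sketch below builds a SQLite table from dictionary entries. The dictionary layout, the form and variable names, and the use of SQLite are all assumptions made for the example; they are not taken from the HNRP system.

```python
import sqlite3

# Hypothetical dictionary: one form with three variables.
DATA_DICTIONARY = {
    "visit": [
        {"name": "participant_id", "type": "TEXT", "label": "Participant identifier"},
        {"name": "visit_date", "type": "TEXT", "label": "Date of visit (ISO 8601)"},
        {"name": "cd4_count", "type": "INTEGER", "label": "CD4 count (cells/uL)"},
    ]
}


def create_table_sql(form: str, variables: list) -> str:
    """Build a CREATE TABLE statement from one form's dictionary entries."""
    columns = ", ".join(f'"{v["name"]}" {v["type"]}' for v in variables)
    return f'CREATE TABLE "{form}" ({columns})'


def build_database(dictionary: dict) -> sqlite3.Connection:
    """Create an in-memory database with one table per form."""
    conn = sqlite3.connect(":memory:")
    for form, variables in dictionary.items():
        conn.execute(create_table_sql(form, variables))
    return conn
```

The same dictionary entries could drive the labels and input types of a data entry screen, so that the form and the table cannot drift apart.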

Integrity. The database is fully normalized, reducing the risk of mismatch, synchronization, and other errors. Data warehousing, where necessary, is accomplished by spinning off views. Data is never deleted from the system and all modifications are stored in a transaction log which can be used to roll back to previous dataset states in the event of detected errors.
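The rollback idea can be shown with a small append-only log. This is a sketch under simplifying assumptions (a single key-value store held in memory, with hypothetical class and method names), not a description of the HNRP transaction log.

```python
from dataclasses import dataclass, field


@dataclass
class LoggedStore:
    """Key-value store in which every change is appended to a log."""
    log: list = field(default_factory=list)   # (key, value) entries, never deleted

    def set(self, key, value) -> int:
        """Record a change and return the version number it produced."""
        self.log.append((key, value))
        return len(self.log)

    def state_at(self, version=None) -> dict:
        """Replay the log up to `version` (default: all entries)."""
        entries = self.log if version is None else self.log[:version]
        state = {}
        for key, value in entries:
            state[key] = value
        return state
```

Nothing is overwritten: an erroneous update adds a log entry, and any earlier dataset state can be rebuilt by replaying the log up to the version before the error.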

Quality Control (QC). We have a robust, multi-level data quality control system. Data entry error rates (on- and off-site) have been less than 0.1% (on all items). Double entry is used for on-site quality control. Duplicate records are compared after entry, local QC staff follow up on and correct all discrepancies, and the (corrected) record is committed to the database. Logic, range and cross-form checks are then applied to all data, and a report on errors and unusual values is sent to staff of the appropriate core or project, who return corrections to DMIS staff. Single-entry data is subject to a 20% random audit, with additional follow-up and training should error rates rise above 0.1%. Staff experts in each Core are associated with each element of the dataset and are expected to annotate the on-line searchable data dictionary, commenting on issues related to coding, analysis or interpretation of the corresponding values. End-users can use this information to avoid common inferential errors. End-users who discover potential errors in the data are asked to write to the DMIS QC email alias, so that DMIS staff can follow up with Core experts to resolve the issue. An email describing the resolution is then sent to the end-user.
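Three of the checks described above (double-entry comparison, range checks, and a 20% random audit) can be sketched as follows. The function names, variable names and range limits are hypothetical and chosen only to make the example concrete.

```python
import random


def double_entry_discrepancies(first: dict, second: dict) -> list:
    """Return the field names on which two entries of a record disagree."""
    fields = sorted(set(first) | set(second))
    return [f for f in fields if first.get(f) != second.get(f)]


# Hypothetical plausible ranges; real limits would come from the dictionary.
RANGES = {"age": (18, 100), "cd4_count": (0, 2000)}


def range_errors(record: dict) -> list:
    """Return the fields whose values fall outside their allowed range."""
    errors = []
    for name, (low, high) in RANGES.items():
        value = record.get(name)
        if value is not None and not (low <= value <= high):
            errors.append(name)
    return errors


def audit_sample(record_ids: list, fraction: float = 0.20, seed=None) -> list:
    """Draw a random audit sample, without replacement, of the given size."""
    rng = random.Random(seed)
    k = round(len(record_ids) * fraction)
    return rng.sample(record_ids, k)
```

For example, comparing `{"age": 41}` with `{"age": 14}` reports a discrepancy on `age`, and the second value would also fail the range check under the limits assumed here.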

Availability. Mission-critical systems are continuously monitored, and pages are sent to IT staff on failure. Routers, servers and HVAC equipment are attached to three-hour uninterruptible power supplies (UPS). Developers work on hardware identical to that used in production so that components can be swapped should a hardware failure occur. We have less than two hours of unplanned down-time per year.

Backup and Disaster Recovery. Incremental back-ups are performed nightly. Full back-ups are obtained weekly and stored, in perpetuity, off-site in locked, fire-safe media storage.

IT Support. The information technology support component of the HNRP is responsible for resolving over 6,000 requests per year and maintaining 100+ desktop/laptop machines for staff at multiple locations. The group is also responsible for the service and deployment of videoconferencing resources and for updates to and security maintenance of web and database servers. During the proposed funding period, the existing data team will be enhanced with a dedicated IT Security specialist who will be responsible for auditing current resources and developing plans for future expansion.

Innovations. Given the diverse requirements of multiple studies, the HNRP DMIS Group has developed extremely versatile data structures over its 20-year history. By using an event-based data management system, the DMIS Unit provides a structure that allows events to be defined on the fly and accommodates both cross-sectional and longitudinal data. This addresses long-standing issues of database flexibility for long-term cohort studies. The event-based nature of this structure allows us to adapt to changing circumstances, interests, and scientific focus without having to completely rewrite the data system. This structure, combined with our robust protocol-based data dictionary, allows for personalized case-report forms and easy integration of data across multiple protocols.
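One way to picture an event-based structure is the sketch below, in which event types are registered at run time and every observation is attached to a participant, a date and an event. The class, method and event names are assumptions for illustration and do not describe the actual HNRP schema.

```python
from collections import defaultdict


class EventStore:
    """Observations keyed by participant and tagged with an event type."""

    def __init__(self):
        self.event_types = {}              # event name -> description
        self.records = defaultdict(list)   # participant -> [(date, event, data)]

    def define_event(self, name, description=""):
        """Register a new event type without changing any table layout."""
        self.event_types[name] = description

    def record(self, participant, date, event, data):
        if event not in self.event_types:
            raise KeyError(f"undefined event: {event}")
        self.records[participant].append((date, event, data))

    def longitudinal(self, participant, variable):
        """One participant's values over time (ISO dates sort as text)."""
        rows = sorted(self.records[participant], key=lambda r: r[0])
        return [(date, data[variable]) for date, _, data in rows
                if variable in data]

    def cross_section(self, event, variable):
        """All participants' values of one variable at a single event."""
        return {p: data[variable]
                for p, rows in self.records.items()
                for _, e, data in rows
                if e == event and variable in data}
```

The same stored rows answer both kinds of question: `longitudinal` follows one participant across events, while `cross_section` compares participants at one event. Adding a new visit type is a call to `define_event` rather than a schema change.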

A recent development has been the conversion from trigger-based calculations with stored summary scores to dynamic calculations performed on the fly. This supports version control of scoring algorithms and easy comparison with earlier results: newly collected data can be examined alongside older results using common algorithms, and processor overhead at the time of entry is reduced.
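A minimal sketch of on-the-fly scoring with versioned algorithms is shown below. The two scoring rules and all names are invented for the example; the point is only that raw items are stored and a summary score is computed at read time with whichever algorithm version is requested.

```python
def score_v1(items):
    """Hypothetical version 1: the summary score is the item total."""
    return sum(items)


def score_v2(items):
    """Hypothetical version 2: the summary score is the item mean."""
    return sum(items) / len(items)


# Registry of scoring algorithm versions; only raw items are stored.
SCORING_VERSIONS = {1: score_v1, 2: score_v2}


def score(raw_items, version):
    """Compute a summary score at read time with the requested version."""
    return SCORING_VERSIONS[version](raw_items)


def compare_versions(raw_items):
    """Score the same raw data under every registered version."""
    return {v: score(raw_items, v) for v in sorted(SCORING_VERSIONS)}
```

Since no summary score is persisted, data collected years apart can be scored with one common version, and no calculation runs when a record is entered.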

Flexibility. A specific example of flexibility in data structure is our emphasis on Entity Attribute Value (EAV) structures for external data. The EAV structure allows for flexibility in storage (for example, of variable-length lists). It has also let us develop web-based upload tools with which off-site investigators and future external investigators can define, upload and report on data (such as imaging or neuropathology results) they provide to support HNRP projects.
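The EAV idea can be sketched with a single three-column table in which each row holds one value of one attribute for one entity. The table name, column names and sample attributes are assumptions for illustration, and SQLite stands in for whatever database is actually used.

```python
import sqlite3


def make_eav_db() -> sqlite3.Connection:
    """Create an in-memory database holding one generic EAV table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eav ("
        "entity TEXT NOT NULL, attribute TEXT NOT NULL, value TEXT)"
    )
    return conn


def store_values(conn, entity, attribute, values):
    """Store a list of any length as one row per value."""
    conn.executemany(
        "INSERT INTO eav VALUES (?, ?, ?)",
        [(entity, attribute, str(v)) for v in values],
    )


def fetch_entity(conn, entity) -> dict:
    """Reassemble an entity as {attribute: [values in insertion order]}."""
    result = {}
    query = "SELECT attribute, value FROM eav WHERE entity = ? ORDER BY rowid"
    for attribute, value in conn.execute(query, (entity,)):
        result.setdefault(attribute, []).append(value)
    return result
```

A new attribute needs no schema change, which is what lets uploaders define their own variables. The trade-off in this sketch is that values are stored as text, so types and ranges have to be enforced elsewhere, for example against the data dictionary.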

Reporting. DMIS has created several web-based utilities to support multi-site collaborations. These utilities range from dynamically generated web-based cohort growth charts and reports to interactive enrollment, scheduling and cohort/specimen tracking utilities. To accommodate the needs of HNRP investigators, all data tables have been made available for download via a web-based data archiving utility. Additionally, investigators can store queries that generate project-specific datasets on the fly.

Collaboration. To facilitate communication among investigators who are physically dispersed (e.g., imaging resources are on the UCSD La Jolla campus while the HNRC resides on the Hillcrest campus), we coordinate as much as possible via the web to reduce the need for numerous face-to-face meetings while maintaining close communication. To that end, we have deployed a wiki-based content management system for all study web resources, giving HNRP investigators and staff direct control of content; Instant Messaging (IM) services for study staff; and digital white-boards and video-teleconferencing for cross-site meetings. As an example, Participant Unit staff can interface directly with the participant community by providing health links and recruiting information for the various projects as new information comes to them. Additionally, project personnel are able to interact directly with support staff within the administrative core via IM and web-based meeting resources to help maintain a cohesive center.

Contact

For assistance, please contact Clint Cushman (DMIS Manager).


Copyright ©2022 HIV Neurobehavioral Research Program | University of California, San Diego