Clinical Trials Data Model

Entity-Relationship Diagram

MySQL Diagram

Figure 1. MySQL relational diagram generated for the clinical trials support database.

Overview

Model Overview

The schema is centered on clinical studies, which act as the main organizational entity of the system. Each study is associated with a responsible laboratory, includes pseudonymized participants, defines its own variable set, and allows controlled access for users through study-specific roles.

The design also supports longitudinal data collection by storing repeated hospital visits for each participant. Data recorded during those visits may correspond either to structured scalar values or to file-based clinical resources. Direct patient identifiers are intentionally excluded from this operational database and would be managed in a separate restricted environment in a real hospital setting.

Core Entities

Main Structural Blocks

1. Study Administration

The laboratories and clinical_studies tables define the institutional and organizational context of the system. They store the identity of each study together with its code, description, status, contact information, and responsible laboratory.

2. User Management

The users, roles, and user_study_access tables control system access. This structure allows assigning different roles to the same user in different studies.

3. Pseudonymized Participants

The participants table stores only study-specific pseudonymized identifiers. The participant_seq table supports automatic identifier generation for each study.

4. Longitudinal Visits

The visits table represents repeated participant follow-up over time. Each visit stores date, label, optional notes, and the responsible user.

5. Dynamic Study Variables

The variable_set table defines the variables collected in each study. This makes the schema flexible and avoids using fixed clinical columns for every possible trial.

6. Clinical Records

The schema stores clinical observations in clinical_data_items and file metadata in clinical_data_files, both linked to visits and study variables.

Technical Documentation

Relational Structure

Technical relational structure including entities, keys, cardinalities, and referential integrity rules.

Tables in the Schema

laboratories — stores laboratories responsible for clinical studies.
clinical_studies — central table describing each clinical trial.
roles — defines application roles (admin, supervisor, data entry).
users — stores system users with authentication data.
user_study_access — junction table linking users, studies, and roles.
participant_seq — sequence counter for automatic participant ID generation.
participants — pseudonymized study participants.
visits — participant visits over time (longitudinal data).
variable_set — study-specific variable definitions.
clinical_data_items — scalar clinical values per visit/variable.
clinical_data_files — file metadata (images, sequencing) per visit/variable.

Primary Keys

laboratories.laboratory_id
clinical_studies.clinical_study_id
roles.role_id
users.user_id
user_study_access.access_id
participant_seq.clinical_study_id
participants.participant_id
visits.visit_id
variable_set.variable_id
clinical_data_items.item_id
clinical_data_files.file_id

Foreign Keys

clinical_studies.laboratory_id → laboratories.laboratory_id
user_study_access.user_id → users.user_id
user_study_access.clinical_study_id → clinical_studies.clinical_study_id
user_study_access.role_id → roles.role_id
participant_seq.clinical_study_id → clinical_studies.clinical_study_id
participants.clinical_study_id → clinical_studies.clinical_study_id
visits.participant_id → participants.participant_id
visits.user_in_charge → users.user_id
variable_set.clinical_study_id → clinical_studies.clinical_study_id
clinical_data_items.visit_id → visits.visit_id
clinical_data_items.variable_id → variable_set.variable_id
clinical_data_files.visit_id → visits.visit_id
clinical_data_files.variable_id → variable_set.variable_id
clinical_data_files.uploaded_by → users.user_id

Relationship Types

One-to-Many (1 : N)
This is the dominant pattern in the schema. Examples include: laboratories → clinical_studies, clinical_studies → participants, participants → visits, clinical_studies → variable_set, visits → clinical_data_items, and visits → clinical_data_files.
Many-to-Many (N : N)
The relationship between users and studies is many-to-many and is implemented through the user_study_access junction table.
One-to-One (1 : 1)
A logical one-to-one relationship exists between clinical_studies and participant_seq, since each study uses a single sequence counter for participant ID generation.

Referential Integrity

CASCADE — dependent records deleted with parent.
SET NULL — reference removed, dependent record kept.
RESTRICT — deletion blocked while dependents exist.

Pseudonymization

Participants are stored through pseudonymized study-specific identifiers. The participant_seq table supports automatic ID generation. Patient-identifying information is intentionally excluded from this operational schema and would be handled externally in a restricted environment.

Summary

Technical Summary Table

Table	Primary Key	Main Foreign Keys	Relationship Type
laboratories	laboratory_id	—	1:N with clinical_studies
clinical_studies	clinical_study_id	laboratory_id	1:N with participants, variable_set, access
roles	role_id	—	1:N with user_study_access
users	user_id	—	N:N with studies via access table
user_study_access	access_id	user_id, clinical_study_id, role_id	Junction table
participant_seq	clinical_study_id	clinical_study_id	1:1 with clinical_studies
participants	participant_id	clinical_study_id	1:N with visits
visits	visit_id	participant_id, user_in_charge	1:N with data items/files
variable_set	variable_id	clinical_study_id	1:N with data items/files
clinical_data_items	item_id	visit_id, variable_id	N-side of visit/variable
clinical_data_files	file_id	visit_id, variable_id, uploaded_by	N-side of visit/variable

Rationale

Design Decisions

Support for multiple studies — the schema manages several independent clinical studies within the same database.
Flexible variable definition — different studies define different variables through the variable_set table.
Longitudinal follow-up — the visit table supports repeated data collection over time.
Controlled access by study — user_study_access assigns users different roles per study.
Privacy-aware design — direct personal identifiers are excluded; handled externally in a restricted environment.