Basic principles

Goupile allows you to design an eCRF with a slightly different approach from the usual tools, since it involves programming the content of forms, while automating other aspects common to all eCRFs:

In addition to the usual features, we have made every effort to minimize the time between developing a form and entering data.

This is a tool in development and many design choices are still underway. [WIP] indications designate features or options that are under development or improvement.

Even if a compiled public version is not yet available, you can test the demo online.

Domains and projects

Each Goupile service has a domain (or subdomain). For example, demo.goupile.fr and psy-lille.goupile.fr are separate services with separate databases and different users, even if possibly hosted on the same server.

When creating a domain, one (or more) trusted administrators are designated to manage its projects and users. A pair of encryption keys is generated to perform backups of the domain databases. The public key is stored on the server to create the backups. The private key is entrusted to designated administrators and is not stored; its loss leads to the loss of all backups of this domain.

Details on the encryption used are detailed in the section on architectural choices.

Administrators can create projects and assign users to them, either to design forms or to enter data.

User Management

Manage your projects and users within the administration module.

Each Goupile domain has its own table of users.

These user accounts are managed by the administrator(s) designated for this domain, who can create, modify and delete them.

Each user can be assigned to one or more projects, with a set of permissions according to his preprogatives. There are two sets of permissions:

These permissions are detailed in the tables below:

PermissionExplanation
DevelopEdition of forms
PublishPublication of modified forms
ConfigureProject and center configuration (multi-center)
AssignChange user permissions on the project
PermissionExplanation
LoadRead records
SaveEdit records
ExportExport data en masse (CSV, XLSX, etc.)
BatchRecompute all calculated variables on all records

By default, user authentication is based on a username / password pair. This password is stored hashed in the database (libsodium pwhash).

Several strong authentication modes are available or planned:

Goupile usage

Goupile is coded as a SPA (Single Page Application) type web application. A brief overview of the different aspects of Goupile is given below; refer to the user manual for more detailed information.

Design of an eCRF

Program the page in the editor (left), and the result (right) is displayed immediately

When designing a form, the user programs it in JavaScript via an online text editor, by calling predefined functions which generate the fields desired by the user. The eCRF is displayed directly on the same screen.

To create an eCRF, the user begins by defining the organization and sequence of entry pages and the underlying data tables. It is possible to create multiple table eCRFs with 1-to-1 and 1-to-N (parent-child) relationships in this phase.

The content of the pages is also defined in Javascript. The use of a programming language to design the forms gives us a lot of possibilities, in particular the creation of complex forms (conditions, loops, dynamic calculations, specific widgets, etc.), without sacrificing simplicity for more usual forms.

Data validation

The verification of the validity of the data in relation to the imposed constraints takes place on the client side (systematically) and on the server side (on the pages where this option is activated). This is based on the Javascript code of each page, which can define conditions and errors depending on the data entered.

These errors feed the quality control base which can then be monitored [WIP].

To ensure the security of the server despite the execution of potentially malicious JavaScript code, several measures are taken and detailed in the section about server architecture.

Offline support

eCRFs developed with Goupile can work in offline mode (if this option is enabled). In this case, Goupile uses the PWA features of modern browsers to be able to cache its files and form scripts, and be installed as a native pseudo-application.

In this case, the data is synchronized in a second step when the connection is available.

Offline data is symmetrically encrypted using a key specific to each user and known to the server. This key is communicated to the browser after successful authentication.

In order for the user to be able to connect to their offline application, a copy of their profile (including the offline data encryption key) is stored on their machine, encrypted by a key derived from their password. When he tries to connect offline, his username and password are used to decrypt this profile and be able to access local data.

If the installable client is used (based on Electron), offline authentication can also be configured in strong mode, with the need to plug in a USB stick containing a secondary encryption key to be able to connect to the eCRF.

Server architecture

General view

The Goupile server is developed in C++. The compiled binary directly contains the database engine (SQLite), an HTTP server (libmicrohttpd) as well as the HTML / CSS / Javascript code sent to web browsers.

Several SQLite databases are created and used for each domain. First of all, there is a master database which contains the list of projects, users and permissions. Then, each project uses 1 to several databases (1 for the project + 1 per center in the case of a multi-center project). For example, a domain with 2 projects including a multi-center could use the following files:

goupile.db # Main database
instances/projet1.db
instances/projet2.db
instances/projet2@lille.db
instances/projet2@paris.db

PostgreSQL support to be able to deport the database to another machine is planned for later [WIP] .

Isolation of services

Each domain is managed by a dedicated service (launched, for example, by systemd), which is able to self-containerize on Linux (using POSIX capabilities, namespaces and seccomp filters) in a namespace with virtually no access except to the SQLite files.

This service can use a single (single-process mode) or several processes (multi-process mode [WIP] ) to manage each project. In multi-process mode, HTTP communication is relayed by the master process to the process in charge of managing each project.

In all cases, when the server validates the form data (non-systematic option according to the data validation needs of a form), the Javascript code is executed by the SpiderMonkey engine in a forked process with completely restricted rights (no access to file system or database).

Compilation options

In addition to containerization, several Clang compilation options are used to mitigate code vulnerabilities of the server in the event of a breach. As described later, the main Goupile binary is built in Paranoid mode.

Several measures are intended to prevent attacks by corrupting the call stack or hijacking the execution flow:

In addition, during development we use different sanitizers (ASan, TSan and UBsan) to detect memory access errors, multi-threading and the use of undefined behaviors in C / C ++.

Data Format

Each Goupile database is encrypted at rest (SQLite3 Multiple Ciphers). The main database encryption key is communicated to Goupile during launch by a means to be determined by the person who administers the server. Each other database has a specific key stored in the main database.

The script of the forms of a project are stored and versioned in the SQLite databases.

The data entered in a project are stored in the corresponding SQLite database (for multi-center studies, each center has a separate database). Two SQLite tables are used for the data:

The master key for a record is in ULID format. This makes it possible to generate unique record identifiers client-side (with infinitesimal risk of collision) which simplifies the implementation of the offline mode, while avoiding the performance problems posed by the indexing of the UUID identifiers.

Goupile installation

Compilation

The goupile server is cross-platform, but it is recommended to use it on Linux for maximum security.

Indeed, on Linux Goupile can operate in sandboxed mode thanks to seccomp and Linux namespaces (unshare system call). Sandboxing support is planned in the long term for other operating systems but is not available at this time. It is recommended to use it on Debian 10+ distribution.

Goupile is based on C++ (server side) and HTML / CSS / JS (client side). The compilation of Goupile uses a dedicated build tool which is itself stored in the repository.

Start by getting the code from the Git repository: https://framagit.org/interhop/goupile

git clone https://framagit.org/interhop/goupile
cd goupile

Linux

To compile a development and test version, proceed as follows from the root of the repository:

# Bootstrap the Felix tool used to build Goupile itself
./bootstrap.sh

# The goupile binary will be deposited in bin/Debug
./felix

For production use, it is recommended to build Goupile in Paranoid mode using Clang 11+ and the LLD 11+ linker. Under Debian 10 you can proceed like this:

# Bootstrap the Felix tool used to build Goupile itself
./bootstrap.sh

# Install recent LLVM: https://apt.llvm.org/
sudo bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
sudo apt install clang-11 lld-11

# The goupile binary will be deposited in bin/Paranoid
./felix -pParanoid --host=,clang-11,lld-11

Other systems

To compile a development and test version, proceed as follows from the root of the repository:

POSIX systems (macOS, WSL, etc.)

# Bootstrap the Felix tool used to build Goupile itself
./bootstrap.sh

# The goupile binary will be deposited in bin/Debug
./felix

Windows

# Bootstrap the Felix tool used to build Goupile itself
# You may need to start the Microsoft Visual Studio x64 (or x86) native console
# before you can continue
bootstrap.bat

# The goupile binary will be deposited in bin/Debug
felix

It is not recommended to use Goupile in production on non-Linux systems, as sandboxing and compilation in Paranoid mode are not available for these platforms.

However, you can use the command ./felix --help(or felix --helpon Windows) to view the build modes and options available for your system.

Manual deployment

Once the Goupile executable has been compiled, it is possible to create a Goupile domain using the following command:

# For this example, the domain will be created inside a tmp
# subdirectory of the repository, but you can create it where you want!
mkdir tmp

# This command will ask to create the first admin account and to define its password.
bin/Paranoid/goupile init tmp/test_domain

This process will create an archive recovery key that you must store in order to be able to restore an archive created in the Goupile domain administration panel. If lost, this key can be changed but archives created with the previous key will not be recoverable!

To access this domain through a web browser, you can launch it using the following command:

# Start this, Goupile will run on http://localhost:8889/
bin/Paranoid/goupile -C tmp/test_domain/goupile.ini

For a production release, it is recommended to run Goupile behind an HTTPS reverse proxy such as NGINX.

To automate the deployment of a complete server in production (with multiple Goupile and NGINX domains configured automatically), we provide a ready-to-use playbook and Ansible roles that you can use as-is or adapt to your needs.

Ansible deployment

The Ansible scripts provided are suitable for secure deployment on Debian 10+. They can theoretically be used and / or adapted for other systems but this is not regularly tested.

This PKnet playbook is configured to install the following services:

In this playbook, these services are spread over 3 machines:

You can quickly test this playbook using the Vagrant script which is included in the repository using the following commands:

cd deploy
vagrant up --no-provision
vagrant provision

The following test domains will then be configured and accessible on the local machine:

The playbook is defined in deploy/pknet.yml and the Vagrant inventory which serves as an example is defined in deploy/inventories/pknet/ vagrant/hosts.yml. You can copy the inventory and adapt it to set up your own production environment, with your own machines and domains. It contains comments explaining the different settings available.

Goupile development

Code and license

Goupile is an open source application, licensed under AGPL 3 . You can freely download and use the source code of Goupile. Anyone has permission to run, copy, modify, and redistribute modified versions of this code.

The source code is available in our git repository.

Please note, this is a single repository (monorepo) which contains several projects. Goupile specific code is available in src/goupile , but it references several other libraries (internal or external) in this repository.

Bugs and requests

Use our online ticket tracker to request new features or report bugs.

HDS server configuration

Environments and servers

Our HDS servers are deployed automatically using Ansible scripts, which are hosted host GPLExpert (HDS subcontractor and outsourcing).

We use two deployment environments: a pre-production environment (which manages subdomains *.preprod.goupile.fr) and a production environment. The pre-production environment is identical to production and allows us to test our deployment scripts. It only contains domains and test data.

Each environment uses two servers:

Communication between the proxy server and the back-end server takes place via a secure channel (IPSec and TLS 1.2+). The exchanges between the two NGINX services are protected by server and client certificates signed by an internal certificate created at the time of deployment (the private key is deleted immediately).

Disaster recovery plan [WIP]

Server environments are fully configured by automated Ansible scripts and can be replicated identically in minutes.

Restoring data after loss of the primary server can be done from several sources:

  1. Databases continuously replicated on another server [WIP]
  2. Encrypted nightly backup of SQLite databases carried out and copied to a separate server in another datacenter [WIP]
  3. Snapshot of VPS taken every night and kept for 14 days, which can be restored quickly by GPLExpert