PhD Candidate

Extracting and linking information from the scientific literature: a case study on bioinformatic workflows

Supervisors: Sarah Cohen-Boulakia, Olivier Ferret, Aurélie Névéol

Scientific workflows enable bioinformaticians to represent and exchange their analysis pipelines and to make them more reproducible. Workflows are described in the literature (text) and/or stored in shared repositories (code). A major challenge in improving workflow reuse is rebuilding the link between a workflow's documentation and its code.

In my work, I develop methods to extract and represent workflow components from full-text articles. I also analyze code structures in shared repositories (e.g., GitHub) to understand how workflows are implemented. By comparing these two sources, I aim to automatically link documentation and code, fostering better integration between literature and repositories.

I am developing a tool named CoPaLink. It automatically extracts the names of bioinformatics tools mentioned in a scientific article describing a Nextflow workflow, as well as those appearing in the workflow's executable source code. It then links the tool names found in the two sources.
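The three steps described above (extraction from the article, extraction from the code, linking) can be illustrated with a deliberately simple sketch. It is not CoPaLink's implementation: the tool lexicon, the function names, the sample article sentence and the sample Nextflow snippet are all hypothetical, and simple dictionary lookup and regular expressions stand in for whatever extraction methods the tool really uses.

```python
import re

# Hypothetical toy lexicon; a real system would not rely on a fixed list.
KNOWN_TOOLS = ["FastQC", "BWA", "SAMtools", "MultiQC"]


def normalize(name):
    """Lowercase and drop non-alphanumerics so 'SAMtools' matches 'samtools'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def tools_in_text(text):
    """Return the lexicon entries mentioned in the article text."""
    found = set()
    for tool in KNOWN_TOOLS:
        if re.search(rf"\b{re.escape(tool)}\b", text, flags=re.IGNORECASE):
            found.add(tool)
    return found


def tools_in_code(nextflow_source):
    """Collect the first word of each command inside triple-quoted script blocks."""
    commands = set()
    for block in re.findall(r'"""(.*?)"""', nextflow_source, flags=re.DOTALL):
        for line in block.splitlines():
            line = line.strip()
            if line:
                commands.add(line.split()[0])
    return commands


def link(text_tools, code_tools):
    """Pair each article mention with the code command sharing its normalized name."""
    code_by_key = {normalize(c): c for c in code_tools}
    return {
        t: code_by_key[normalize(t)]
        for t in text_tools
        if normalize(t) in code_by_key
    }


# Hypothetical inputs, for illustration only.
ARTICLE = "Reads were checked with FastQC and aligned using BWA; results were summarised with MultiQC."
WORKFLOW = '''
process QC {
    script:
    """
    fastqc reads.fq
    """
}
process ALIGN {
    script:
    """
    bwa mem ref.fa reads.fq > out.sam
    """
}
'''

links = link(tools_in_text(ARTICLE), tools_in_code(WORKFLOW))
```

With these inputs, FastQC and BWA are each linked to a command in the code, while MultiQC is mentioned in the article but has no counterpart in the workflow. Unmatched mentions of this kind are exactly the cases where exact-name matching falls short and a more robust linking method is needed.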

This work has received support from the French government (Agence Nationale de la Recherche) under the France 2030 program, grant agreement ANR-22-PESN-0007 (ShareFAIR).

Diplomas
