PhD Candidate
Extracting and linking information from the scientific literature: a case study on bioinformatic workflows
Supervisors: Sarah Cohen-Boulakia, Olivier Ferret, Aurélie Névéol
Scientific workflows enable bioinformaticians to represent and exchange their analysis pipelines and to make them more reproducible. Workflows are described in the literature (text) and/or stored in shared repositories (code). A major challenge in improving workflow reuse is to rebuild the link between documentation and workflow code.
In my work, I develop methods to extract and represent workflow components from full-text articles. I also analyze code structures in shared repositories (e.g., GitHub) to understand how workflows are implemented. By comparing these two sources, I aim to automatically link documentation and code, fostering better integration between literature and repositories.
I have developed a tool named CoPaLink. It automatically extracts the names of bioinformatics tools mentioned in a scientific article describing a Nextflow workflow, as well as those found in the workflow's executable source code. A linking step then matches the tool names from both sources.
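As a rough illustration of the linking step only, the sketch below pairs tool names from the two sources by normalized string similarity. This is not CoPaLink's actual method: the function names, the similarity threshold, and the example tool lists are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and drop non-alphanumeric characters, so 'BWA-MEM' and 'bwa mem' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def link_tool_names(article_names, code_names, threshold=0.85):
    """Pair each article mention with its most similar code mention, or None if nothing is close enough.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    links = {}
    for article_name in article_names:
        best, best_score = None, 0.0
        for code_name in code_names:
            score = SequenceMatcher(
                None, normalize(article_name), normalize(code_name)
            ).ratio()
            if score > best_score:
                best, best_score = code_name, score
        links[article_name] = best if best_score >= threshold else None
    return links


# Hypothetical inputs: names as they might appear in an article and in workflow code.
article = ["FastQC", "BWA-MEM", "SAMtools", "MultiQC"]
code = ["fastqc", "bwa mem", "samtools"]
links = link_tool_names(article, code)
```

In this toy example, a name that appears in the article but has no close counterpart in the code (here "MultiQC") is left unlinked rather than forced onto the nearest match.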
This work has received support from the French government (Agence Nationale de la Recherche) under the France 2030 program, grant agreement ANR-22-PESN-0007 (ShareFAIR).
Diplomas
- Master’s Degree in Computational Biology: Analysis, Modeling, and Engineering of Biological and Medical Information
- Paris-Saclay University | 2021-2023
- Double Bachelor’s Degree in Computer Science and Mathematics
- Paris-Saclay University | 2018-2021