Revamping OpenRefine

Antonin Delpeuch (Mastodon: pintoch@mamot.fr)

Open Research Tools and Technologies devroom, FOSDEM 2020

A brief history

How to attract contributions?

Reach out to neighbouring communities

OpenRefine - Wikidata workshop at the WikiTechStorm 2019

Photo: CC-BY-SA Mx Lucy

Improve localization with Weblate

Weblate screen in French

Start a W3C Community Group to improve the reconciliation API

  • Vladimir Alexiev (Ontotext)
  • Thad Guidry (OpenRefine)
  • Ricardo Usbeck (Fraunhofer Gesellschaft)
  • Raphaël Troncy (INSTITUT TELECOM)
  • Osma Suominen (National Library of Finland)
  • Juliane Schneider (Harvard Medical School)
  • Jeff Young (OCLC)
  • Jeff Mixter (OCLC)
  • Fabian Steeg (Hochschulbibliothekszentrum des Landes NRW)
  • Ettore Rizza
  • David Newbury (J. Paul Getty Trust)
  • Brendan Quinn (IPTC - International Press Telecommunications Council)
  • Ben De Meester (Imec vzw)
  • Alan Buxton (OpenCorporates)
  • Adrian Pohl (Hochschulbibliothekszentrum des Landes NRW)
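
For readers unfamiliar with the reconciliation API this group works on, here is a minimal sketch of a query batch and a response, in the shape used by the existing protocol: queries are sent as a JSON object in a `queries` form parameter, and each query key maps to a list of scored candidates. The helper names, the `person` type, the item ids and the scores are made up for illustration, and no request is sent to a real service.

```python
import json
from urllib.parse import urlencode


def build_query_batch(names, type_id=None, limit=3):
    """Encode a batch of names as the 'queries' form parameter."""
    batch = {}
    for i, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if type_id is not None:
            # Optional: restrict candidates to one type known to the service.
            query["type"] = type_id
        batch[f"q{i}"] = query
    return urlencode({"queries": json.dumps(batch)})


def best_matches(response_text):
    """Map each query key to the id of its top-scoring candidate, or None."""
    response = json.loads(response_text)
    best = {}
    for key, value in response.items():
        candidates = value.get("result", [])
        top = max(candidates, key=lambda c: c.get("score", 0), default=None)
        best[key] = top["id"] if top else None
    return best


# Hand-written sample response (hypothetical ids and scores).
SAMPLE_RESPONSE = json.dumps({
    "q0": {"result": [
        {"id": "item-1", "name": "Douglas Adams", "score": 100, "match": True},
        {"id": "item-2", "name": "Douglas Adams", "score": 62, "match": False},
    ]},
    "q1": {"result": []},
})

if __name__ == "__main__":
    print(build_query_batch(["Douglas Adams", "Unknown Person"], type_id="person"))
    print(best_matches(SAMPLE_RESPONSE))
```

A client would POST the encoded batch to a service endpoint and pass the response body to `best_matches`; which endpoint to use and how to treat low scores is left open here.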

Create a steering committee

Student internship programmes?

We are applying to the Google Summer of Code and Outreachy programmes in 2020.

Revamping our stack

  • Migrating the build system from Ant to Maven;
  • Getting rid of non-free dependencies (org.json);
  • Improving continuous integration and release processes;
  • Migrating away from our unmaintained web framework (Butterfly), which is still to be done.

Plans for 2020

  • Migrate the data processing backend to Spark, to make it easier to work on large datasets;
  • Write proper documentation to replace the current GitHub wiki.

This work is supported by a grant from CZI's Essential Open Source Software for Science program.

Open questions

  • How to introduce breaking changes without disrupting our ecosystem of extensions?
  • Which tasks should we leave for new contributors to pick up, and which should we tackle ourselves?
  • Anything else we could do better?