Grants:Project/Diegodlh/Web2Cit: Visual Editor for Citoid Web Translators/Midpoint: Difference between revisions

Diegodlh (talk | contribs)
paws dns update
 
(7 intermediate revisions by 2 users not shown)
Line 1:
{{Project/Proposals/Selected/Tabs | midpoint = true }}
 
{{Project Grant report|Status=Accepted|Date=2020-21|Grantee=individual|Type=Midpoint}}
 
<!--Please fill in all of the below sections to complete your midpoint report. We really want to learn along with you, so provide as much detail as you can, while also keeping your answers concise (bullet points and bolding for key points is always appreciated!). Include screenshots, links, or other examples of what you've made as often as you can. You may also add additional sections for any additional information you would like to share!-->
Line 9:
==Summary==
''In a few short sentences or bullet points, give the main highlights of what happened with your project so far.'' <!--Hint: this can be a good section to write after you’ve created the rest of your report!-->
 
Our project involves three main subprojects:
 
* Development
** Web2Cit translation is already available for early adopters willing to use it before the visual editor is released. Just prepend <code>https://web2cit.toolforge.org/</code> to any URL and you will get a translation result, based on collaboratively defined configuration files, that you can use with Wikipedia's automatic citation generator.
** All source code has been published under a GPL free software license on Wikimedia's recent GitLab installation, making this one of the first projects to be hosted there.
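
As a minimal sketch of the "prepend" step described above, the following Python helper builds a Web2Cit URL by plain string concatenation. The function name <code>web2cit_url</code> and the example target URL are hypothetical and used only for illustration; only the base address comes from this report.

```python
# Base address given in this report.
WEB2CIT_BASE = "https://web2cit.toolforge.org/"


def web2cit_url(target_url: str) -> str:
    """Prepend the Web2Cit base address to a target URL (hypothetical helper)."""
    # Plain concatenation, mirroring the "just prepend" instruction;
    # no extra encoding is applied in this sketch.
    return WEB2CIT_BASE + target_url


if __name__ == "__main__":
    # Hypothetical target URL, for illustration only.
    print(web2cit_url("https://example.org/some-article"))
```

The resulting address can then be pasted into Wikipedia's automatic citation generator in place of the original URL.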
 
Line 47 ⟶ 48:
** '''GitLab repository.''' Based on our previous experience with GitHub and GitLab (and lack of experience with Gerrit), and considering that Wikimedia had set up a [[mw:GitLab | custom installation of GitLab]], we decided to host our source code there, becoming [[phab:T282842#7597153 | one of the first projects]] to do so.
** '''GPL license.''' Our code has been published under the [[w:GNU General Public License | GNU General Public License v3]], to make sure others can freely reuse it, under the same conditions of freedom.
** '''Automated tests.''' In [[wikidata:Wikidata:Zotero/Cita | Cita]], a lead developer's previous project, lack of time and knowledge resulted in the absence of [[w:Test automation | automated tests]], which is currently somewhat [https://github.com/diegodlh/zotero-cita/issues/30 hindering project evolution]. To prevent this from happening in Web2Cit, we included automated tests from the beginning.
** '''Code formatter.''' Based on previous experience with [[wikidata:Wikidata:Zotero/Cita | Cita]] as well, we decided to use the Prettier code formatter.<ref>https://prettier.io/</ref> This should help us focus on the important aspects when discussing changes to the code, rather than wasting time on coding style.
** '''TypeScript.''' Because we wanted our [[phab:tag/w2c-editor | translation editor]] to run on web browsers, and because it would consume our [[phab:tag/w2c-core | core library]], which in turn would be used by other components of the Web2Cit ecosystem, we decided to use JavaScript for all our codebases. We were in doubt whether to use [[m:TypeScript | TypeScript]] (a JavaScript superset supporting static typing) because we were afraid it could deter potential contributors (given its relatively higher complexity), but in the end we decided to do so, given its potential to prevent bugs and increase code quality.
 
Line 111 ⟶ 112:
The goal of our research sub-project is to describe the extent and nature of the current Citoid coverage gap (i.e., webpages for which Citoid returns wrong or incomplete metadata), by developing an automated script that can be run now and any time in the future, after Web2Cit has been implemented. So far, we have:
* Created a list of citation templates and relevant parameters for different language Wikipedias.<ref name="citation templates"/>
* Partially developed the automated script (hosted on [https://public-paws.wmcloud.org/User:Nidiah/Web2Cit-research/understand-citoid-coverage.ipynb Wikimedia's PAWS]), including:
** Retrieving the list of featured articles for a predefined set of different language Wikipedias.
** Fetching the wikitext for these articles.
Line 128 ⟶ 129:
** A video overview of Web2Cit translation.<ref name="overview video"/>
** Draft technical specifications, including [https://docs.google.com/document/d/1OlT9VYje1dqQ-WLoEOziU-VphGuAAj_HihqT5RMe5d8/edit general] and [https://docs.google.com/document/d/12RpJGayIYarrH9euC468YHRqFmnLK648FAX_18UDN9Y/edit core-specific] documents.
** A (ready for translation) [[Web2Cit/Early_adopters | guide]] for early adopters who would like to start using Web2Cit (see below).
* The creation of a collaborative list of problematic URLs,<ref name="problematic urls">[https://docs.google.com/spreadsheets/d/1me0pHR8ZeNXjLFicWZzLRHpf2lRfftjAEKCDS0yH6i8/edit Collaborative list of problematic URLs]</ref> which may be used by Web2Cit contributors to start defining domain configurations.
* The configuration of [[phab:tag/web2cit | Phabricator project tags and workboards]] to keep track of and engage the community around project tasks, including feature requests, bug reports, etc.
 
==== Early adopter guidelines ====
We published a [[Web2Cit/Early_adopters | guideline]], including a series of companion videos, for those interested in starting to use Web2Cit. Although this requires more advanced technical skills than those that will be needed once the translation editor is available, we expect the barrier to be lower than it currently is for developing Citoid/Zotero translators.<ref>[https://www.zotero.org/support/dev/translators Zotero translator development documentation]</ref>
 
This guideline and its companion videos walk potential contributors through the process of using Web2Cit to fix Citoid/Zotero translation for problematic URLs. Consider the following example, covered in detail in our hands-on video:<ref name="demo video">[https://www.youtube.com/watch?v=kICOBUcmKNI Hands-on demonstration for Web2Cit early adopters]</ref>
Line 173 ⟶ 174:
* '''Workload.''' As mentioned above, this project reminded us of how much time and effort project management requires. However, we had not planned for project management hours in our proposal, nor had we set a separate budget line for it. Going forward, in future projects, we would consider this role separately, allocating specific time and budget to these tasks, as done in other projects (see for example the [[Grants:Project/CS&S/Structured Data on Wikimedia Commons functionalities in OpenRefine#Budget | budget]] of the ''Structured data on Wikimedia Commons functionalities in OpenRefine'' project).
* '''Task tracking.''' From a project management perspective, keeping track of our pending tasks across different sub-projects and teams represents a challenge. We try to use Phabricator as our central task tracker, but it works differently for different sub-projects, because understandably some teams find it more useful than others. Going forward, we plan to continue using Phabricator as our central point of coordination, providing a general overview of pending tasks, in addition to other task tracking strategies that individual sub-projects may use.
* '''Balancing creativity and decisiveness.''' As a project manager I sometimes find it challenging to strike a balance between giving our team members enough freedom to explore, come up with creative solutions, propose new ideas, etc., and making decisions and prioritizing tasks from the overall perspective of project management. I think (and hope) this project is giving me very valuable experience in this never-ending process of becoming a better leader.
 
==== Development ====
Line 194 ⟶ 195:
 
==== Project management ====
* '''Documenting discussions.''' Our project includes three sub-projects and we meet regularly with team members to discuss updates and plan future moves. We keep notes of every meeting to keep track of what has been discussed and what decisions were made, as described in the [[Learning_patterns/Keeping_documentation_of_discussions_with_team | Keeping documentation of discussions with team]] learning pattern. These notes have proved helpful, for example to write our submission to WikiWorkshop 2022.
* '''Acknowledging failures and just telling people.''' During the first half of the project, I (the lead developer) was writing my PhD thesis and it soon became apparent that it was taking me longer than expected. This was causing lots of frustration because while I could not postpone writing any further, I was starting to feel the growing weight of the development sub-project backlog. Following Scann's advice, and as described in the invaluable [[Learning_patterns/When_things_go_wrong,_just_tell_people | When things go wrong, just tell people]] learning pattern, we finally decided to reschedule some development milestones and ask for a [[Grants_talk:Project/Diegodlh/Web2Cit:_Visual_Editor_for_Citoid_Web_Translators#Delays_and_new_midpoint_report_due_date | two-month extension]] of the midpoint report due date. This was such a relief! I finished writing my thesis in December 2021, and we were able to meet the rescheduled milestones [[Grants:Project/Diegodlh/Web2Cit:_Visual_Editor_for_Citoid_Web_Translators/Timeline#Month_8_(February) | on time]].
* '''Budget update.''' As mentioned in the [[#Finances | Finances]] section above, even though all parties agreed that the time estimate for the research sub-project was OK, we then found out that it was taking longer than originally planned. Even though we could have just continued, as the project manager I felt this would have been unfair, as described by the [[Learning_patterns/Grant_projects_are_not_startups | Grant projects are not startups]] learning pattern. We therefore decided to request using part of our contingency funds to increase the research budget accordingly.
 
==== Development ====
Line 205 ⟶ 206:
 
==== Communications & Community ====
* '''Advisory board.''' As suggested by grant officer [[User:Mjohnson_(WMF) | Marti Johnson]], we put together an Advisory Board with people from diverse backgrounds. This has proved to be a good decision and it is helping us with:
** Thinking about and discussing challenging aspects of the project (see learning patterns [[Learning_patterns/Sustaining_dialogue_with_your_community | Sustaining dialogue with your community]], [[Learning_patterns/Expert_involvement | Expert involvement]], and [[Learning_patterns/Feedback_cycle | Feedback cycle]]).
** Spreading the word to relevant communities (see learning pattern [[Learning_patterns/Let_the_community_know | Let the community know]]).
** Supporting the project once the grant finishes (see learning pattern [[Learning_patterns/Community_impact | Community impact]]).
* '''Video documentation.''' Our project includes publishing documentation, such as technical documentation and user guidelines. Even though we acknowledge that in some cases written documentation may be better, as it can be kept up to date more easily, sometimes the time needed to write documentation prevents documentation from being written at all. For these cases, we have learned that releasing video documentation may be a good middle point. On the one hand, it is much easier to create. On the other hand, it may later be adapted by project members and volunteers, who may create written documentation based on it. Finally, sometimes it may also be easier to consume and follow. See for example our Early adopters theoretical introduction.<ref>[https://www.youtube.com/watch?v=gjdTG5UPW9k Theory for Web2Cit early adopters]</ref>
 
 
==== Research ====
* '''PAWS.''' As mentioned above, the research team uses [[w:Project_Jupyter | Jupyter notebooks]] to write the automated script. This is very useful because it interleaves code, comments and results. At one point we decided to try [[wikitech:PAWS | PAWS]], a Jupyter notebooks instance hosted by Wikimedia. This turned out to be a great decision, with impressive performance improvements, as described in our [[Grants:Project/Diegodlh/Web2Cit:_Visual_Editor_for_Citoid_Web_Translators/Timeline#Month_7_(January) | January report]].
 
 
==Next steps and opportunities==
''What are the next steps and opportunities you’ll be focusing on for the second half of your project? Please list these as short bullet points.''
 
* Development
** Develop and release the [[phab:tag/w2c-editor | translation editor]] to enable editing configuration files visually, including internationalization and translation to other languages. Name and estimated dates have been updated in the project's [[Grants:Project/Diegodlh/Web2Cit:_Visual_Editor_for_Citoid_Web_Translators/Timeline | milestones]].
** Add core library support for translation tests ([[phab:T302722]]).
** Publish core library as npm package for easier reuse by other software projects ([[phab:T303294]]).
** Continue testing and fixing bugs in [[phab:tag/w2c-core | core library]] and [[phab:tag/w2c-server | translation server]].
** Internationalize and translate the translation server results page ([[phab:T304837]]).
** Continue writing technical documentation, including guidelines for developers.
** Develop and kick off the [[phab:tag/w2c-monitor | translation monitor]]. Name and estimated dates have been updated in the project's [[Grants:Project/Diegodlh/Web2Cit:_Visual_Editor_for_Citoid_Web_Translators/Timeline | milestones]].
 
* Research
** Fetch Citoid metadata for all 450k reference URLs, following the optimization strategies [[phab:T301510 | discussed]].
** Compare Citoid metadata vs extracted metadata to estimate Citoid's coverage gap. Group data by Wikipedia language and reference domain.
** Write and publish research results.
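
The comparison and grouping step planned above could look roughly like the following Python sketch. This is not the research team's script: the record layout (<code>lang</code>, <code>url</code>, <code>citoid</code>, <code>cited</code>), the function name <code>coverage_by_group</code>, and the case-insensitive exact-match rule are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def coverage_by_group(records, field="title"):
    """Share of references whose Citoid value matches the cited value,
    grouped by (Wikipedia language, reference domain).

    Each record is assumed (hypothetically) to look like:
    {"lang": "en", "url": "https://...", "citoid": {...}, "cited": {...}}
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for rec in records:
        key = (rec["lang"], urlsplit(rec["url"]).hostname)
        totals[key] += 1
        # Assumed matching rule: case-insensitive equality after trimming.
        citoid_value = rec["citoid"].get(field, "").strip().lower()
        cited_value = rec["cited"].get(field, "").strip().lower()
        if citoid_value and citoid_value == cited_value:
            matches[key] += 1
    return {key: matches[key] / totals[key] for key in totals}
```

A low share for a given language and domain pair would point to a candidate for a Web2Cit domain configuration. A real comparison would likely need fuzzier matching than exact equality.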
 
* Communication & community
** Create and publish end-user guidelines, including:
*** How to integrate Web2Cit translation into the Wikipedia editing workflow.
*** How to use the translation editor to visually tweak translation configuration.
*** How to use the translation monitor to identify and fix translation issues.
** Organize workshops for early adopters and for end users.
** Promote using Web2Cit to address sources identified in our collaborative list of problematic URLs, and by the research team.
** Promote translation of the translation editor and other resources to other languages.
 
==Grantee reflection==
 
''We’d love to hear any thoughts you have on how the experience of being a grantee has been so far. What is one thing that surprised you, or that you particularly enjoyed from the past 3 months?''
 
''(This section is written from the perspective of the main grantee, [[User:Diegodlh | Diego]])''
 
I am really enjoying working on this project so far. On the one hand, as the development lead I am consolidating my skills and learning new ones, such as automated tests and Kubernetes, to mention just a few. Additionally, it has helped me further understand how much I enjoy the software design phase, about which I am eager to continue learning, including design patterns, unified modelling language, and software architecture in general.
 
On the other hand, as the project manager, this has been a great experience as well. It is the second time that I am leading a team of colleagues, which is a very challenging and gratifying task. This has further highlighted to me to what extent management is a role of its own, one that must be continuously learned and improved.
 
I am very happy with how all sub-projects are developing. I am glad to see that what we so carefully thought of and designed is finally becoming a concrete piece of working software, without having encountered unexpected obstacles that would have sent us back to the drawing board!
 
I am excited to see the research team working so proactively and independently, and with such promising preliminary results. This is the first time that a team from a project led by me is submitting work to a research conference (WikiWorkshop 2022), and that makes me very happy.
 
Finally, I also think we are starting to build a community of interested and enthusiastic people around the project, thanks to Evelin. We already have something to show, and I cannot wait to start with our workshops, and to see how people begin to use Web2Cit to improve Wikipedia's compatibility with other sources.
 
==References==