StrepHit: ?
HTML dumps
Enterprise maintains HTML dumps for 6 Wikipedias (as of 2/25)
Coordinated project: Structured Wikipedia
Structure pages for external reuse: do the parsing that reusers already do or need
- Hugging Face (talk to Poli) -- detailed drop templates incl. numerical conversions
- Other embeddings: often use bespoke parsing (wikitext, not HTML)
- Note the high template/infobox count on some wikis
abstract / entity / sections / infobox / image / ORES scores / revert risk / redirects
- Todo: talk page activity, references, what links here, tables
- Investigation into annotation as upgrade to talk page sections
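For discussion, the field list above could be sketched as a single per-page record. This is only an illustration: the class name, field names, types, and the JSON-lines serialization are all assumptions made here, not the actual Structured Wikipedia schema.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional
import json


@dataclass
class StructuredPage:
    """Hypothetical record; fields mirror the list in these notes."""
    title: str
    abstract: str = ""
    entity: Optional[str] = None        # assumed to be an item identifier
    sections: list = field(default_factory=list)
    infobox: dict = field(default_factory=dict)
    image: Optional[str] = None
    ores_scores: dict = field(default_factory=dict)
    revert_risk: Optional[float] = None
    redirects: list = field(default_factory=list)


def to_json_line(page: StructuredPage) -> str:
    """Serialize one page as a single JSON line (assumed output format)."""
    return json.dumps(asdict(page), ensure_ascii=False, sort_keys=True)


# Placeholder values only; none of this is real data.
example = StructuredPage(
    title="Example",
    abstract="A placeholder abstract.",
    sections=["History", "References"],
    infobox={"Founded": "1900"},
    redirects=["Example (disambiguation)"],
)
line = to_json_line(example)
```

The todo items (talk page activity, references, what links here, tables) would slot in as further optional fields in the same way.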
Other
Classification pages
Infoboxes
Images: extraction and listing
References
Sections
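As a rough illustration of the infobox item, a minimal sketch of pulling label/value rows out of page HTML with the Python standard library. It assumes infoboxes are rendered as a table whose class list contains "infobox", which is common but not guaranteed on every wiki; the class name InfoboxParser and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect label -> value pairs from <th>/<td> rows of an infobox table."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # table nesting depth, counted from the infobox
        self.cell = None    # "th" or "td" while inside a top-level cell
        self.label = []
        self.value = []
        self.rows = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table":
            # Start counting at the infobox; keep counting nested tables.
            if self.depth or "infobox" in classes:
                self.depth += 1
        elif self.depth == 1 and tag == "tr":
            self.label, self.value = [], []
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell = tag

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell = None
        elif self.depth == 1 and tag == "tr":
            label = " ".join("".join(self.label).split())
            value = " ".join("".join(self.value).split())
            if label and value:     # skip title rows with no value cell
                self.rows[label] = value

    def handle_data(self, data):
        # Text inside nested tables (depth > 1) is ignored in this sketch.
        if self.depth == 1 and self.cell == "th":
            self.label.append(data)
        elif self.depth == 1 and self.cell == "td":
            self.value.append(data)


# Made-up sample markup, not taken from any real page.
SAMPLE = """
<p>Intro</p>
<table class="infobox vcard">
  <tr><th colspan="2">Example City</th></tr>
  <tr><th>Country</th><td><a href="./X">Exampleland</a></td></tr>
  <tr><th>Population</th><td>1,234</td></tr>
</table>
<table class="wikitable"><tr><th>Other</th><td>ignored</td></tr></table>
"""

parser = InfoboxParser()
parser.feed(SAMPLE)
rows = parser.rows
```

Real infoboxes vary a lot between templates and wikis (nested tables, image rows, multi-value cells), so a production parser would need per-wiki handling well beyond this.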