User:Sj/!/struct

This is an archived version of this page, as edited by Sj at 22:36, 24 February 2025. It may differ significantly from the current version.

StrepHit: ?

HTML dumps

Wikimedia Enterprise maintains HTML dumps for 6 Wikipedias (as of 2/25)

Coordinated project: Structured Wikipedia

Repository: https://huggingface.co/datasets/wikimedia/structured-wikipedia

Structure pages for external reuse. Do the parsing that reusers already do or need.

  • Hugging Face (talk to Poli) -- detailed drop templates, incl. numerical conversions
  • Other embeddings: often use bespoke parsing (wikitext, not HTML)
  • Note the high template/infobox count on some wikis
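A rough way to check the template/infobox count on a page is to scan its wikitext directly. The sketch below is only an approximation: `count_templates` and `infobox_count` are hypothetical helper names, the regex does not distinguish templates from parser functions or `{{{parameter}}}` placeholders, and the first-letter normalization assumes the usual MediaWiki title rules.

```python
import re
from collections import Counter

# Matches the name part of "{{Name|..." or "{{Name}}" (approximate).
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?=[|}])")

def count_templates(wikitext: str) -> Counter:
    """Count template invocations by name, including nested ones."""
    counts = Counter()
    for raw in TEMPLATE_NAME.findall(wikitext):
        name = raw.strip()
        if name:
            # Treat "cite web" and "Cite web" as the same template.
            counts[name[0].upper() + name[1:]] += 1
    return counts

def infobox_count(counts: Counter) -> int:
    """Sum the invocations whose template name starts with 'Infobox'."""
    return sum(n for name, n in counts.items()
               if name.lower().startswith("infobox"))

sample = (
    "{{Infobox person|name=Ada|born={{birth date|1815|12|10}}}} "
    "Text {{cite web|url=x}} and {{cite web|url=y}}"
)
counts = count_templates(sample)
```

Running this over a sample of pages per wiki would give a first estimate of which wikis are template-heavy; a proper wikitext parser would be needed for anything more precise.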

abstract / entity / sections / infobox / image / ORES scores / revert risk / redirects
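One way to picture a structured page is as a single record carrying the parts listed above. This is a minimal sketch, not the dataset's actual schema: the class name `StructuredArticle`, every field name, and every type are assumptions chosen to mirror the list.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional

@dataclass
class StructuredArticle:
    # Hypothetical record; field names mirror the list in the notes,
    # not the published schema.
    title: str
    abstract: str = ""
    entity: Optional[str] = None            # e.g. a Wikidata QID
    sections: List[dict] = field(default_factory=list)
    infobox: Dict[str, str] = field(default_factory=dict)
    image: Optional[str] = None             # main image URL or file name
    ores_scores: Dict[str, float] = field(default_factory=dict)
    revert_risk: Optional[float] = None
    redirects: List[str] = field(default_factory=list)

article = StructuredArticle(
    title="Douglas Adams",
    entity="Q42",
    redirects=["Douglas Noel Adams"],
)
# One JSON object per article, e.g. as a line in a JSON Lines dump.
line = json.dumps(asdict(article))
```

Keeping every part optional except the title lets a record exist before all extractors (infobox, scores, redirects) have run.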

Todo: talk page activity, references, what links here, tables
Investigate annotation as an upgrade to talk page sections

Other

Classification pages

Infoboxes

Images: extraction and listing

References

Sections
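
For the image extraction and listing item, a first pass over a page's HTML could look like the sketch below. It uses only the Python standard library; `ImageLister` and `list_images` are hypothetical names, and the sketch assumes images appear as plain `<img>` tags with a `src` attribute, which may not cover everything in a real HTML dump.

```python
from html.parser import HTMLParser

class ImageLister(HTMLParser):
    """Collect the src and alt text of every <img> tag seen."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if src:
            # alt may be missing or empty; normalize to an empty string.
            self.images.append({"src": src, "alt": attributes.get("alt") or ""})

def list_images(html: str) -> list:
    """Return the images found in an HTML string, in document order."""
    parser = ImageLister()
    parser.feed(html)
    parser.close()
    return parser.images

page = (
    '<p>Intro</p><figure><img src="//upload.example/a.jpg" alt="A">'
    '<figcaption>caption</figcaption></figure><img src="b.png"/>'
)
images = list_images(page)
```

Captions and licensing details would need extra handling, since they live outside the `<img>` tag itself.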