Module talk:Tabular data: Difference between revisions

 
:{{ping|Timbaaa}} That's definitely feasible, though it might be easier to integrate something with {{tl|Graph:Lines}}, which is already pretty usable with tabular data, as seen in [[COVID-19 pandemic in the San Francisco Bay Area#Cases by county over time]]. It would look pretty similar to the existing <code>_wikitable()</code> function, but just the part that collects the <code>title</code>s of the elements in <code>data.schema.fields</code>. If you're planning to use this functionality inside a module instead of directly inside a template or article, I'd suggest working with <code>mw.ext.data.get(…).schema.fields</code> directly so you have maximum control over formatting. &ndash;&nbsp;[[User:Mxn|Minh <span style="font-variant: small-caps;">Nguyễn</span>]]&nbsp;<sup>[[User talk:Mxn|<span style="display: inline-block;">&#x1f4ac;</span>]]</sup> 19:46, 19 September 2020 (UTC)
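: A minimal sketch of the title-collecting step described above. It assumes the table returned by <code>mw.ext.data.get(…)</code> has a <code>schema.fields</code> list whose entries carry <code>name</code> and <code>title</code>; the dataset below is a made-up stand-in so the snippet runs outside Scribunto, and <code>fieldTitles</code> is a hypothetical helper, not part of the module.
<syntaxhighlight lang="lua">
-- Stand-in for mw.ext.data.get("Example.tab"); field names and titles are made up.
local data = {
    schema = {
        fields = {
            { name = "date", type = "string", title = "Date" },
            { name = "cases", type = "number", title = "Cases" },
        },
    },
    data = {},
}

-- Collect the display title of every column, falling back to the internal name.
local function fieldTitles(dataset)
    local titles = {}
    for _, field in ipairs(dataset.schema.fields) do
        titles[#titles + 1] = field.title or field.name
    end
    return titles
end

print(table.concat(fieldTitles(data), ", "))
</syntaxhighlight>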
 
== Search as Number ==
 
Great job!
 
For some reason, it doesn't work for me. For example, a request like this:
 
<pre><nowiki>{{#invoke: Tabular data | lookup | COVID-19 Slovenia cases per capita.tab | search_value = 261 | search_column = cases | output_column = name}}</nowiki></pre>
 
returns an empty string instead of "Ajdovščina".
 
Help me please.<!-- Template:Unsigned --><span class="autosigned" style="font-size:85%;">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:Игорь Темиров|Игорь Темиров]] ([[User talk:Игорь Темиров#top|talk]] • [[Special:Contributions/Игорь Темиров|contribs]]) 19:14, 8 November 2020 (UTC)</span><!-- Template:Xsign -->
 
: There's no {{tq|261}} in the {{mono|cases}} column of [[c:Data:COVID-19 Slovenia cases per capita.tab]]. The {{mono|cases}} value for ''Ajdovščina'' is 2204. — [[User:Guarapiranga|𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚]]&nbsp;[[User talk:Guarapiranga|☎]] 01:05, 24 June 2021 (UTC)
 
== Search more than 1 column ==
 
Would it be possible to make it search two (or more) columns?<br>I.e.:<br>{{mlx|{{BASEPAGENAME}}|''lookup''|''Page name.tab''|search_value{{=}}|search_column{{=}}|{{uline|search_value2{{=}}{{!}}search_column2{{=}}}}|...|output_column{{=}}|output_column2{{=}}|...|output_format{{=}}}}<br>E.g.:<br>{{mlx|{{BASEPAGENAME}}|lookup|UN:Total population, both sexes combined.tab|search_value{{=}}''Afghanistan''|search_column{{=}}''Country''|search_value2{{=}}''1950''|search_column2{{=}}''Year''|output_column{{=}}''Value''}}
 
— [[User:Guarapiranga|𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚]]&nbsp;[[User talk:Guarapiranga|☎]] 01:00, 24 June 2021 (UTC)
 
: I've created [[Module:Tabular_data/sandbox]] with a function to try to handle the second search requirement. It doesn't work yet, and I can't even get the existing module to return data from [[:c:Data:UN:Total population, both sexes combined.tab]].
{|
|-
|<pre>
{{#invoke:Tabular data|lookup
|search_column=date
|search_value=2020-03-16
|output_column=totalConfirmedCases
|COVID-19 cases in Santa Clara County, California.tab}}</pre>
|
{{#invoke:Tabular data|lookup
|search_column=date
|search_value=2020-03-16
|output_column=totalConfirmedCases
|COVID-19 cases in Santa Clara County, California.tab}}
|-
|<pre>
{{#invoke:Tabular data|lookup
|search_value=Afghanistan|search_column=Country
|output_column=Value
|UN:Total population, both sexes combined.tab}}
</pre>
|
{{#invoke:Tabular data|lookup
|search_value=Afghanistan|search_column=Country
|output_column=Value
|UN:Total population, both sexes combined.tab}}
|-
|<pre>
{{#invoke:Tabular data/sandbox|lookup
|UN:Total population, both sexes combined.tab
|search_value=Afghanistan|search_column=Country
|output_column=Value}}
</pre>
|
{{#invoke:Tabular data/sandbox|lookup
|UN:Total population, both sexes combined.tab
|search_value=Afghanistan|search_column=Country
|output_column=Value}}
|-
|<pre>
{{#invoke:Tabular data/sandbox|lookup2
|UN:Total population, both sexes combined.tab
|search_value=Afghanistan|search_column=Country
|search_value2=1950|search_column2=Year
|output_column=Value}}
</pre>
|
{{#invoke:Tabular data/sandbox|lookup2
|UN:Total population, both sexes combined.tab
|search_value=Afghanistan|search_column=Country
|search_value2=1950|search_column2=Year
|output_column=Value}}
|-
|<pre>
{{#invoke:Tabular data/sandbox|lookup2
|UN:Total population, both sexes combined.tab
|search_value=Zambia|search_column=Country
|search_value2=2020|search_column2=Year
|output_column=Value}}
</pre>
|
{{#invoke:Tabular data/sandbox|lookup2
|UN:Total population, both sexes combined.tab
|search_value=Zambia|search_column=Country
|search_value2=2020|search_column2=Year
|output_column=Value}}
|}
: What am I missing? Could the colon in the page name be making it invalid? —&nbsp;<span style="font-family:Arial;background:#d6ffe6;border:solid 1px;border-radius:5px;box-shadow:darkcyan 0px 1px 1px;">&nbsp;[[User:Jts1882|Jts1882]]&nbsp;&#124;[[User talk:Jts1882|&nbsp;talk]]&nbsp;</span> 13:41, 30 June 2021 (UTC)
:: There were a few issues:
::# The page name is an unnamed (numbered) parameter, so unlike named parameters it keeps any surrounding white space and linefeeds and has to be trimmed. The sandbox does this now, and the non-sandbox example above has been edited to remove the white space and linefeeds.
::# The search comparisons assume string values. The population data holds numbers, so these need to be converted before comparing (as done in the sandbox), or the module needs to be modified to use the column types (more involved).
:: Anyway, this shows how the data in the data page at commons can be retrieved. The two examples (Afghanistan 1950 and Zambia 2020) get the right numbers. —&nbsp;<span style="font-family:Arial;background:#d6ffe6;border:solid 1px;border-radius:5px;box-shadow:darkcyan 0px 1px 1px;">&nbsp;[[User:Jts1882|Jts1882]]&nbsp;&#124;[[User talk:Jts1882|&nbsp;talk]]&nbsp;</span> 16:32, 30 June 2021 (UTC)
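:: The two fixes listed above can be sketched roughly as follows. This is not the sandbox code: <code>trim</code>, <code>columnIndex</code>, <code>matches</code> and <code>lookupBoth</code> are hypothetical helpers, and the dataset is a made-up stand-in (with invented values) for what <code>mw.ext.data.get()</code> would return.
<syntaxhighlight lang="lua">
-- Stand-in for the table returned by mw.ext.data.get(); the rows are made-up values.
local data = {
    schema = { fields = {
        { name = "Country", type = "string" },
        { name = "Year", type = "number" },
        { name = "Value", type = "number" },
    } },
    data = {
        { "Afghanistan", 1950, 100 },
        { "Afghanistan", 1951, 110 },
        { "Zambia", 2020, 200 },
    },
}

-- Strip the white space and linefeeds that unnamed template parameters keep.
local function trim(s)
    return (s:match("^%s*(.-)%s*$"))
end

-- Find a column's position by its field name; returns nil when there is no such column.
local function columnIndex(dataset, name)
    for i, field in ipairs(dataset.schema.fields) do
        if field.name == name then return i end
    end
end

-- Template arguments are always strings, so convert before comparing with a numeric cell.
local function matches(cell, wanted)
    if type(cell) == "number" then
        return cell == tonumber(wanted)
    end
    return cell == wanted
end

-- Return the output cell of the first row that matches both search criteria.
local function lookupBoth(dataset, args)
    local c1 = columnIndex(dataset, trim(args.search_column))
    local c2 = columnIndex(dataset, trim(args.search_column2))
    local out = columnIndex(dataset, trim(args.output_column))
    if not (c1 and c2 and out) then return nil end
    local v1, v2 = trim(args.search_value), trim(args.search_value2)
    for _, row in ipairs(dataset.data) do
        if matches(row[c1], v1) and matches(row[c2], v2) then
            return row[out]
        end
    end
end

print(lookupBoth(data, {
    search_column = "Country", search_value = "Afghanistan",
    search_column2 = "Year", search_value2 = " 1950\n",
    output_column = "Value",
}))
</syntaxhighlight>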
:::Great! Thanks, {{u|Jts1882}}.
:::I wrapped it (that particular application) in a [[Template:UN pop|template]], and started testing it out [[Draft:List of countries and dependencies by population density|here]]. It works, but hits the time limit pretty quickly. Could it be made more efficient? — [[User:Guarapiranga|𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚]]&nbsp;[[User talk:Guarapiranga|☎]] 02:46, 1 July 2021 (UTC)
::::You probably don't need ''ustring'' (do you?), and ''string'' will do the trick, but I don't see that making a big difference there.
::::One option is specifying the columns by number so the module doesn't have to search for them by name. Inconvenient, but... faster. (Then again, with only 3 columns on that table, I don't see it making much of a difference). — [[User:Guarapiranga|𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚]]&nbsp;[[User talk:Guarapiranga|☎]] 02:55, 1 July 2021 (UTC)
:::::Oh, I see, you have to pull down the whole table at every call:<pre>local data = args.data or mw.ext.data.get(page)</pre>Yeah, it ain't small (it's at the 2MB limit). What's the alternative; slicing it into a different table for each year? Any batch proc for doing that? — [[User:Guarapiranga|𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚]]&nbsp;[[User talk:Guarapiranga|☎]] 07:40, 1 July 2021 (UTC)
:::::: I was concerned about memory usage too, but I think the data is cached. I did a few tests on a blank page and the memory usage didn't increase dramatically when calling the template multiple times. It's clearly the processing time that goes up with the number of calls. There is no noticeable difference between Afghanistan and Zambia, so the looping is fast (as expected). It's still possible that <code>mw.ext.data.get()</code> is responsible, even if it isn't loading the table each time, as dealing with the cache might take time. It needs some more tests.
:::::: Incidentally your template seems to have considerable overhead, both doubling the time and causing a high expansion depth. Using invoke gets the time down to just over 100ms each call. —&nbsp;<span style="font-family:Arial;background:#d6ffe6;border:solid 1px;border-radius:5px;box-shadow:darkcyan 0px 1px 1px;">&nbsp;[[User:Jts1882|Jts1882]]&nbsp;&#124;[[User talk:Jts1882|&nbsp;talk]]&nbsp;</span> 08:00, 1 July 2021 (UTC)
::::::: I've created a function that does the bare minimum (<code>p.lookup2_minimal()</code>, line 208). It doesn't reduce the time substantially (100-150ms depending on run). So I think this sort of template can only be used safely about 50 times on a page.
::::::: For generating tables, the alternative is a module that takes a list of countries like [[Module:Country population]]. A lot more work, but you can get it to do exactly what is needed with appropriate options. —&nbsp;<span style="font-family:Arial;background:#d6ffe6;border:solid 1px;border-radius:5px;box-shadow:darkcyan 0px 1px 1px;">&nbsp;[[User:Jts1882|Jts1882]]&nbsp;&#124;[[User talk:Jts1882|&nbsp;talk]]&nbsp;</span> 09:05, 1 July 2021 (UTC)
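::::::: A rough sketch of the list-of-countries approach: read the dataset once, index it in a single pass, then answer every country from that index within one call. It assumes the per-call cost measured above is paid once per <code>#invoke</code>, which has not been verified here. <code>buildIndex</code> and <code>valuesFor</code> are hypothetical names, and the rows are made-up stand-ins for <code>mw.ext.data.get(page).data</code>.
<syntaxhighlight lang="lua">
-- Made-up rows standing in for mw.ext.data.get(page).data; columns: Country, Year, Value.
local rows = {
    { "Afghanistan", 1950, 100 },
    { "Zambia", 1950, 150 },
    { "Zambia", 2020, 200 },
}

-- One pass over the rows builds index[country][year] = value.
local function buildIndex(dataRows)
    local index = {}
    for _, row in ipairs(dataRows) do
        local country, year, value = row[1], row[2], row[3]
        index[country] = index[country] or {}
        index[country][year] = value
    end
    return index
end

-- Answer every requested country from the same index instead of re-reading the dataset.
local function valuesFor(index, countries, year)
    local out = {}
    for _, country in ipairs(countries) do
        local byYear = index[country]
        out[#out + 1] = country .. ": " .. tostring(byYear and byYear[year])
    end
    return out
end

local index = buildIndex(rows)
print(table.concat(valuesFor(index, { "Afghanistan", "Zambia" }, 1950), "; "))
</syntaxhighlight>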
:::::::{{tq|Incidentally your template seems to have considerable overhead, both doubling the time and causing a high expansion depth. Using invoke gets the time down to just over 100ms each call.}}
:::::::Can you pinpoint what it is? I invoke the module once if the given date parameter is a year, and twice to interpolate values for a specific date (and once again before those to verify that the country is on the table). What do you reckon needs to be done to reduce the overhead, {{u|Jts1882}}? — [[User:Guarapiranga|𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚]]&nbsp;[[User talk:Guarapiranga|☎]] 08:30, 3 July 2021 (UTC)
:::::::: It looks like it's invoked twice if the given date parameter is a year and three times if it is a specific date. It's invoked once for the test and then once or twice for the output. Is the test necessary? I've removed it in the template and the output in the documentation is the same, but takes less processing time (a bit more than half). —&nbsp;<span style="font-family:Arial;background:#d6ffe6;border:solid 1px;border-radius:5px;box-shadow:darkcyan 0px 1px 1px;">&nbsp;[[User:Jts1882|Jts1882]]&nbsp;&#124;[[User talk:Jts1882|&nbsp;talk]]&nbsp;</span> 09:01, 3 July 2021 (UTC)
:::::::::I saw that a large chunk of the {{tq|transclusion expansion time}} was from {{t|density}}, so I replaced the call to {{t|convert}}, which in turn invokes a very general and flexible [[module:convert|module]], by a simple calc only for km{{sup|2}} and sqmi. {{t|Density}} is still at 77% of {{tq|transclusion expansion time}} (not sure how those pcts add up, with {{t|UN pop}} at 65%), and [[Draft:List of countries and dependencies by population density]] is still timing out Lua (it seems) at the 14th table row (Jersey). I see that 93% of the Lua time is consumed by {{mono|Scribunto_LuaSandboxCallback::get}} -- what's that? Cheers. — [[User:Guarapiranga|𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚]]&nbsp;[[User talk:Guarapiranga|☎]] 12:24, 3 July 2021 (UTC)
:::::::::: I assume that is the function behind <code>mw.ext.data.get()</code>.
:::::::::: There is something odd in [[Draft:List of countries and dependencies by population density]]. While it starts giving timeout errors at the 14th line, other lines display without errors down to line 59. What makes those lines avoid the timeout? —&nbsp;<span style="font-family:Arial;background:#d6ffe6;border:solid 1px;border-radius:5px;box-shadow:darkcyan 0px 1px 1px;">&nbsp;[[User:Jts1882|Jts1882]]&nbsp;&#124;[[User talk:Jts1882|&nbsp;talk]]&nbsp;</span> 13:46, 3 July 2021 (UTC)
::::::::::: Yeah, I noticed that. Bloody good question. I sort of assumed the invoke queue doesn't follow the order in the code. — [[User:Guarapiranga|𝐆𝐮𝐚𝐫𝐚𝐩𝐢𝐫𝐚𝐧𝐠𝐚]]&nbsp;[[User talk:Guarapiranga|☎]] 14:23, 3 July 2021 (UTC)
 
== Null value and output format error ==
 
If the lookup points to a null value cell and there is an output format, it gives an error.
{|
|-
|
<syntaxhighlight lang="wikitext">
A{{#invoke:Tabular data|lookup
|search_column=date
|output_column=hospitalized
|search_value=2020-01-27
|COVID-19 cases in Santa Clara County, California.tab}}B
</syntaxhighlight>
|
A{{#invoke:Tabular data|lookup
|search_column=date
|output_column=hospitalized
|search_value=2020-01-27
|COVID-19 cases in Santa Clara County, California.tab}}B
|-
|
<syntaxhighlight lang="wikitext" highlight="2">
A{{#invoke:Tabular data|lookup
|output_format=There are %d people
|search_column=date
|output_column=hospitalized
|search_value=2020-01-27
|COVID-19 cases in Santa Clara County, California.tab}}B
</syntaxhighlight>
|
A{{#invoke:Tabular data|lookup
|output_format=There are %d people
|search_column=date
|output_column=hospitalized
|search_value=2020-01-27
|COVID-19 cases in Santa Clara County, California.tab}}B
|}
Could you fix this, please? [[User:Bean49|Bean49]] ([[User talk:Bean49|talk]]) 20:05, 23 January 2024 (UTC)
 
There could also be several output columns with only one of them null, e.g. <code>%d out of %d users are administrators</code>. [[User:Bean49|Bean49]] ([[User talk:Bean49|talk]]) 20:24, 23 January 2024 (UTC)
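
One possible guard, sketched under the assumption that the error comes from passing a <code>nil</code> cell to <code>string.format</code> with a <code>%d</code> placeholder (not confirmed against the module source). <code>safeFormat</code> and its <code>fallback</code> argument are hypothetical and not part of the module.
<syntaxhighlight lang="lua">
-- Works in Lua 5.1 (Scribunto) and in later versions where unpack moved to table.unpack.
local unpack = table.unpack or unpack

-- Format the looked-up values, returning a fallback instead of raising
-- an error when any of the first `count` values is missing.
local function safeFormat(fmt, values, count, fallback)
    for i = 1, count do
        if values[i] == nil then
            return fallback or ""
        end
    end
    return string.format(fmt, unpack(values, 1, count))
end

print(safeFormat("There are %d people", { 12 }, 1))
print(safeFormat("%d out of %d users are administrators", { 3, nil }, 2, "n/a"))
</syntaxhighlight>
Returning an empty string by default would match what the lookup without <code>output_format</code> appears to produce for a null cell in the first example above.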