Module talk:String

This is an old revision of this page, as edited by Trappist the monk (talk | contribs) at 18:52, 10 October 2013 (Match problem). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

sub

Why does sub only return a single character? It returns characters in the string from position "i" to position "i" (only a single character). Shouldn't it go from "i" to "j" like the Lua support page suggests? Banaticus (talk) 00:04, 21 February 2013 (UTC)

Because I made a stupid typo. Already fixed by Tim Starling. Anomie 00:53, 21 February 2013 (UTC)Reply

error_category

The documentation/comment in the top says: error_category: The default category is ... [Category:Errors reported by Module String]. The category has regular double brackets I assume, or is there an exception in play? -DePiep (talk) 02:44, 27 February 2013 (UTC)Reply

Yeah, the issue is that double bracket gets interpreted as open / close comment in Lua, so you can't really write that in the middle of the documentation without messing things up. Dragons flight (talk) 03:44, 27 February 2013 (UTC)Reply
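
For anyone who runs into the same thing: a minimal sketch of how Lua's leveled long brackets sidestep the problem. This only illustrates the language feature; it is not how the module's header is actually written, and the variable name is made up for the example.

```lua
-- A level-1 long comment ends only at "]=]", so the "]]" that closes the
-- wikilink below does not end the comment early.
--[=[
error_category: the default category is
[[Category:Errors reported by Module String]].
]=]

-- The same applies to long strings: this one is delimited by [=[ and ]=].
local default_category = [=[[[Category:Errors reported by Module String]]]=]
print(default_category)
```

With plain `--[[ ... ]]` delimiters the first `]]` inside the text would close the comment, which is the issue described above.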

Error category: two arguments for one?

To set (overrule) the error category, one can use two arguments: error_category=... and no_category=true. Why not a single argument, so that entering error_category=<blank> would suppress the category? As it is now, there is even the futile combination error_category=[[MyCategory]] with no_category=true. -DePiep (talk) 12:02, 6 March 2013 (UTC)

The presence of two parameters came about in an effort to support the existing templates. As I recall, some want one kind of control, and some want the other. It could probably be standardized, but in the initial migration I was trying to avoid making too many changes to the behavior of existing templates. There is also a bit of a notation problem if one has a default category, since then it becomes unclear whether error_category= (empty string) is meant as "no category" or as "use the default category". Dragons flight (talk) 17:24, 9 March 2013 (UTC)

Match

Is there a way to use match to eliminate hyphens from ISBN numbers? For instance: 978-1-4200-9050-X to 978142009050X. I tried,

{{#invoke:String|match|s=978-1-4200-9050-X|pattern=^(%d*)-*(%d*)-*(%d*)-*(%d*)-*(%d*X*)}} > 978

but I couldn't make it work. Can anybody help me? —– Jaider Msg 20:06, 12 March 2013 (UTC)

If you just want to eliminate hyphens, shouldn't you replace them with empty strings, i.e.
{{#invoke:String|replace| source=978-1-4200-9050-X | pattern=- | replace= }} = 978142009050X
You can use match to ensure that the input or output has the appropriate ISBN form, if that is also important. Dragons flight (talk) 00:50, 13 March 2013 (UTC)Reply
Thanks! But my question is not just about ISBNs. How can we access several values returned from {{#invoke:String|match|...}}? (in other words, several (...) in patterns). And how can we use match to ensure that the input and output have the appropriate ISBN form? —– Jaider Msg 01:15, 13 March 2013 (UTC)
At present, you can't access multiple (...), not from a template anyway. This is something I should think about how to address. As to using match for checking, something like:
{{#invoke:String|match|s=978-1-4200-9050-X|pattern=^%d[%d-]*X?$ | nomatch = Not ISBN }} = 978-1-4200-9050-X
{{#invoke:String|match|s=978-1-BARK-9050-X|pattern=^%d[%d-]*X?$ | nomatch = Not ISBN }} = Not ISBN
Will work if you aren't picky about the number of digits or the placement of dashes. If you want to be careful about the details you can build a more sophisticated test by using several test calls or writing a short script in Lua. Dragons flight (talk) 02:08, 13 March 2013 (UTC)Reply
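
As a rough illustration of the "short script in Lua" option, here is a sketch that strips the hyphens and then checks the shape of what remains. The function name is hypothetical, the length rule (10 or 13 characters, optionally ending in X) follows the example in this thread, and no check-digit validation is attempted.

```lua
-- Hypothetical helper: remove hyphens, then check the overall form.
-- It does not verify the ISBN check digit.
local function normalize_isbn(raw)
    -- "%-" is a literal hyphen; the outer parentheses drop gsub's
    -- second return value (the replacement count).
    local stripped = (string.gsub(raw, "%-", ""))

    -- 9 or 12 digits followed by a final digit or X.
    local isbn10 = "^" .. string.rep("%d", 9) .. "[%dX]$"
    local isbn13 = "^" .. string.rep("%d", 12) .. "[%dX]$"

    if string.match(stripped, isbn10) or string.match(stripped, isbn13) then
        return stripped
    end
    return nil
end

print(normalize_isbn("978-1-4200-9050-X"))
print(normalize_isbn("978-1-BARK-9050-X"))
```

In a module this logic would sit inside a function that reads `frame.args`; it is shown here as plain Lua so it can be tried outside the wiki.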
Great! Well, I am not a programmer and I am not sure about Lua stuff, but I made the following script:
local p = {}
function p.isbn(frame)
    local isbnString = frame.args[1]  or ""
    local value1, value2, value3, value4, value5 = string.match(isbnString, "^(%d*)-*(%d*)-*(%d*)-*(%d*)-*(%d*X*)")
    return value1, value2, value3, value4, value5
end
return p

And it works ({{#invoke:SomePage|isbn|978-1-4200-9050-X}} = 978142009050X). Could it be a kind of a solution for several "(...)" in patterns? —– Jaider Msg 12:46, 13 March 2013 (UTC)Reply

Yes, Lua can match and return multiple patterns. The tricky part is writing a template interface that could access that in a sensible way, especially if you don't know in advance how many capture patterns (...) the template author might want to use. The string module exists mostly to support legacy template code and to provide some string functionality to editors who understand templates but aren't willing to try Lua directly. For a simple dedicated task, like finding an ISBN, writing a short Lua script is probably easier. Congratulations on your first one. Dragons flight (talk) 13:50, 13 March 2013 (UTC)Reply
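
One possible shape for such an interface, sketched in plain Lua: let the caller pick which capture to return by number. The helper name and its parameters are assumptions made for this example, not something Module:String currently offers.

```lua
-- Hypothetical helper: return the n-th capture of a pattern,
-- or a fallback value when there is no such capture.
local function nth_capture(s, pattern, n, nomatch)
    -- Collect however many captures the pattern produces into a table.
    local captures = { string.match(s, pattern) }
    return captures[n] or nomatch
end

local isbn_pattern = "^(%d*)-*(%d*)-*(%d*)-*(%d*)-*(%d*X*)"
print(nth_capture("978-1-4200-9050-X", isbn_pattern, 1, "no match"))
print(nth_capture("978-1-4200-9050-X", isbn_pattern, 3, "no match"))
```

This keeps the template side to one extra argument, at the cost of one call per capture.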

Pages as strings?

Is it possible to modify this script to allow whole pages as input? For example, if one wanted to include information about article size to Wikipedia:Vital articles? Or would 1000 instances of the script be too much to run on every single page load? — Yerpo Eh? 12:07, 28 April 2013 (UTC)Reply

Yes, there are ways to operate on an entire page's content, though if all you wanted was page size then the parser function {{PAGESIZE:page name}} probably makes more sense. However, loading entire pages is expensive, which means somewhat slow and limited to no more than 500 times per page. That limit applies to the PAGESIZE: parser function as well, so neither Lua nor PAGESIZE: would work if you needed 1000 iterations on a single page. Dragons flight (talk) 17:15, 28 April 2013 (UTC)Reply
Ah, I somehow assumed that Lua would magically make loading pages trivial in terms of server load, silly me. Thanks. — Yerpo Eh? 18:19, 29 April 2013 (UTC)Reply
There are ways to minimize the need to call expensive parser functions repeatedly in some cases. In Lua the result can be stored in a variable for reuse. Something similar can be done with templates by passing the result of an expensive parser function as the value of a template argument/parameter. Either way the result could be printed a trillion times with one invocation or template call, as long as the time allocated for Lua and template expansions isn't exceeded. If you look at the result of {{#invoke:string|rep|{{PAGESIZE}}•|1000}} for example, the page size is displayed 1000 times with only one invoke, because the expensive parser function was only called once. This wouldn't work for the specific use case you have in mind though, as there are more than 500 different pages to get the page size for. --darklama 19:52, 29 April 2013 (UTC)
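
The "store it in a variable" idea can be sketched like this. The lookup function is a stand-in with a made-up result, since the real expensive call is only available on the wiki; the point is just that the second request for the same title does not trigger a second lookup.

```lua
local lookups = 0

-- Stand-in for an expensive call; the returned size is invented.
local function expensive_page_size(title)
    lookups = lookups + 1
    return #title * 1000
end

-- Remember each result so every title is looked up at most once.
local cache = {}

local function cached_page_size(title)
    if cache[title] == nil then
        cache[title] = expensive_page_size(title)
    end
    return cache[title]
end

local first = cached_page_size("Example")
local second = cached_page_size("Example")

-- The stored value can be repeated as often as needed without new lookups.
print(string.rep(first .. " ", 3))
print(lookups)
```

As noted above, this only helps when the same page is needed repeatedly; 1000 different titles still mean 1000 lookups.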

Not really a script-related subquestion, but come to think about it, parsing pages isn't necessarily unavoidable if all I want is page size. Is there an on-wiki handle available to extract it from page history? — Yerpo Eh? 05:51, 30 April 2013 (UTC)Reply

Besides {{PAGESIZE:page name}}, the MediaWiki API can be queried through JavaScript to find out page sizes. JavaScript is probably going to be the only way you will be able to include the current size for every page. --darklama 10:53, 30 April 2013 (UTC)Reply

Replication on other wikis

Hi, I have just discovered this new "Lua programming" functionality in Wikimedia (I am an Italian user). I want to ask a question (I don't know whether other users have already discussed this). I have noticed that all wikis are replicating this base library (String), changing only the error messages (localization). Is there no feature for using a single shared String library among wikis (like images on Commons), instead of replicating it on each wiki? "String" is a very basic library, and if someone discovers a bug here, the fix should be propagated to each wiki (or vice versa).

If a shared library is not possible, at least it would be better to set the localization error strings as variables at the beginning of the source code, so that in other wikis we can cut&paste all the remaining part of the code without changing a line.

If somewhere you have already talked about these problems I would be happy to read about it. Thanks! --Rotpunkt (talk) 11:52, 1 May 2013 (UTC)Reply

No sharing mechanism currently exists, other than cut and paste. There has been general discussion at the WMF about creating a central code repository for key scripts, but that is likely to be at least months away. Yes, we should do a better job of making localization easier. Dragons flight (talk) 13:39, 1 May 2013 (UTC)Reply
Ok, thanks. I will look for that discussion. As a repository, it would be nice, for example, if the modules on Commons (http://commons.wikimedia.org/wiki/Commons:Lua/Modules) could be called from all wikis, so that we could put the most used libraries (like String) there, in the same way we use Commons for images. I don't know where we could ask for such a feature... here: http://www.mediawiki.org/wiki/Extension_talk:Scribunto ? --Rotpunkt (talk) 14:39, 1 May 2013 (UTC)
Translations might be possible with something like msg = mw.message.new('Empty string'):plain();. If I've understood the documentation correctly the message is retrieved from MediaWiki:Empty_string. $1, $2, etc. can be filled in by including additional parameters to mw.message.new. It might also be possible to use MediaWiki:Empty_string/it to include both English and Italian translations for example. If I've understood the documentation correctly this would cut down on needing to edit the module at all. --darklama 16:27, 1 May 2013 (UTC)Reply
To fully localize the script it would also be necessary to localize the default error category.--Moroboshi (talk) 06:56, 3 May 2013 (UTC)
Well, actually a full localization would also localize the arguments.--Snaevar (talk) 23:57, 3 May 2013 (UTC)Reply
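
A sketch of the "strings as variables at the top" suggestion, in plain Lua so it does not depend on mw.message. The message keys, the message texts and the format_message helper are all illustrative assumptions, not the module's actual code.

```lua
-- Everything a translator needs to touch lives in this one block.
-- Keys and texts are illustrative only.
local messages = {
    empty_string = "Target string is empty",
    index_out_of_range = "String index $1 is out of range",
}
local error_category = "Errors reported by Module String"

-- Hypothetical helper: look up a message and fill in $1, $2, ...
local function format_message(key, ...)
    local text = messages[key] or key
    local params = { ... }
    -- Leave a placeholder untouched when no parameter was supplied for it.
    return (string.gsub(text, "%$(%d+)", function(n)
        return tostring(params[tonumber(n)] or ("$" .. n))
    end))
end

print(format_message("index_out_of_range", 12))
print("[[Category:" .. error_category .. "]]")
```

Another wiki would then copy the module and edit only the table and the category name at the top.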

Help needed

Please do not deactivate this {{help me}} until 09 Jun 2013 unless you are answering my question. I know that anyone who can help probably has this page watchlisted, but just in case... Now, my questions:

  1. Is there any way I can shorten the following replace sequence?
    • {{#invoke:String|replace|{{#invoke:String|replace|{{Str sub old|{{{TEST-STRING}}}|0|25}}|[^%[%]\{}%`%^%-%w]|_|plain=false}}|^[^%[%]\{}%`%^%a]|_|plain=false}}
    • The process currently truncates {{{TEST-STRING}}} to 25 characters, replaces all characters outside of the "allowed" set [^%[%]\{}%`%^%-%w] with _, then finally replaces the first character of the string with _ if it is outside the "allowed" first character set [^%[%]\{}%`%^%a]
  2. The next question is, how do I test the result of the above process to see if all of the characters have been replaced with _?
    • I was thinking something like {{#ifeq:{{#invoke:String|len|{{{TEST-STRING}}}}}|MATCH|100% invalid input...|{{#invoke:String|replace|{{#invoke:String|replace|{{Str sub old|{{{TEST-STRING}}}|0|25}}|[^%[%]\{}%`%^%-%w]|_|plain=false}}|^[^%[%]\{}%`%^%a]|_|plain=false}}}} but I don't know how to count the instances of "_" in the string to fill in the "MATCH" section...
i think maybe it would be better if you try to explain what are you actually trying to do, rather than asking us to suggest methods to optimize some obscure piece of code, no? peace - קיפודנחש (aka kipod) (talk) 18:29, 7 June 2013 (UTC)Reply
It is for work on the Template:Freenode/sandbox that adds an argument to allow the person leaving the template to specify an IRC handle based on the user's wikipedia username. Technical 13 (talk) 18:34, 7 June 2013 (UTC)Reply
More accurately, to make sure the inputted string (username) is appropriately modified so it follows the IRC rules for names
  • Maximum 25 characters [So truncating the string]
  • First character cannot be number [So replacing first character by _]
  • No character can be outside a list of characters (a-z,A-Z,0-9,_) [So replacing all of them by _]
Correct me if there are more rules/ the rules listed are incomplete.
TheOriginalSoni (talk) 19:02, 7 June 2013 (UTC)Reply
i still do not understand what you try to do. let me try to focus the question: are you looking for a template/function that will receive a string and will return a boolean (or 0/1 or whatever) that indicates whether this string is "kosher" (according to some criteria), or are you trying to create something that receives a string and cook a "legal" string out of it? or maybe something else entirely? if it's something else, can you explain it again? maybe i'll have better luck understanding it this time around... peace - קיפודנחש (aka kipod) (talk) 20:13, 7 June 2013 (UTC)Reply
You should write the logic in Lua instead of parser functions that call Lua, and then replace that mess in your template with {{#invoke:YourModule|functionName|{{{VARIABLE}}}}}. Seriously, there's no reason at all to do what you did there. And I also note that those replace calls won't even do what you want, since Freenode doesn't appear to allow UTF-8 in nicks. Anomie 20:41, 7 June 2013 (UTC)Reply
Anomie would you be willing to help me with that? I don't know how to write the logic in Lua yet. I came here to ask because I knew that there had to be an easier shorter way to do it, but I did not know how. To answer your question kipod, create something that receives a string and cook a "legal" string out of it is the goal. Technical 13 (talk) 21:07, 7 June 2013 (UTC)Reply
Something vaguely like this should get you started.
local p = {}

function p.guessNick( frame )
    -- Fall back to an empty string so a missing argument does not raise an error
    local username = frame.args[1] or ''
    local nick

    -- First, strip out non-ASCII as best we can
    -- Note this will totally fail for non-Latin-script usernames. Nothing much we can do about that.
    nick = mw.ustring.toNFD( username )
    nick = string.gsub( nick, '[^\32-\126]', '' )

    -- Next, replace other unacceptable characters
    if string.match( nick, '^[0-9%-]' ) then
        -- Begins with a digit or hyphen, so prepend an underscore
        nick = '_' .. nick
    end
    -- '\\' puts a literal backslash into the allowed set
    nick = string.gsub( nick, '[^a-zA-Z0-9_%-%[\\%]{|}^`]+', '_' )

    -- Cut to 25 characters
    nick = string.sub( nick, 1, 25 )

    return nick
end

return p
Anomie 03:55, 8 June 2013 (UTC)Reply
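
On the second question above (detecting input that ends up as nothing but underscores), a small sketch in plain Lua. Both helper names are made up for the example. Note that a username that legitimately consists only of underscores would also be flagged.

```lua
-- Count the underscores in a string: gsub's second return value is the
-- number of replacements it made.
local function count_underscores(s)
    local _, n = string.gsub(s, "_", "")
    return n
end

-- True when the string is non-empty and consists only of underscores.
local function is_all_underscores(s)
    return string.match(s, "^_+$") ~= nil
end

print(count_underscores("a_b__c"))
print(is_all_underscores("____"))
print(is_all_underscores("_a_"))
```

A function like guessNick could call is_all_underscores on its result and return an error message instead of the nick when the check is true.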

Match problem

Is there a problem with match or is it something that I don't understand? Match is supposed to return the string that matches a pattern.

If I want all of the digits up to and including the '4' in the string '1234567890' I do this:

{{#invoke:String|match|1234567890|%w*4|nomatch=no match}} → 1234

If I want the length of a string I do this:

{{#invoke:String|len|1234}} → 4

If I want to find the length of the matched string I do this:

{{#invoke:String|len|{{#invoke:String|match|1234567890|%w*4|nomatch=nomatch}} }} → 5

Isn't '5' the wrong result?

Trappist the monk (talk) 14:32, 10 October 2013 (UTC)Reply

Try it without the space on the end. -- WOSlinker (talk) 18:19, 10 October 2013 (UTC)Reply
{{#invoke:String|len|1234}} → 4
{{#invoke:String|len|1234 }} → 5
{{#invoke:String|len|{{#invoke:String|match|1234567890|%w*4|nomatch=nomatch}}}} → 4
"O that he were here to write me down an idiot! But, masters, remember that I am an idiot; though it be not written down, yet forget not that I am an idiot." (apologies to Shakespeare's Dogberry).
Trappist the monk (talk) 18:52, 10 October 2013 (UTC)Reply
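
For reference, the same effect in plain Lua: the trailing space is counted like any other character, and MediaWiki passes unnamed (positional) parameters through without trimming. The trim helper below is a common idiom written out for illustration; it is not a function of the module.

```lua
-- Strip leading and trailing whitespace. The outer parentheses keep only
-- the first return value of string.match.
local function trim(s)
    return (string.match(s, "^%s*(.-)%s*$"))
end

print(#"1234")         -- 4
print(#"1234 ")        -- 5, the trailing space counts
print(#trim("1234 "))  -- 4
```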