Talk:Comparison of programming languages (string functions): Difference between revisions

Revision as of 05:18, 22 June 2009 by SineBot (talk | contribs) m Signing comment by 203.206.162.148 - "→ASC: new section"

Latest revision as of 15:25, 27 July 2024 by 5.189.81.197 (talk) →Mentioning strtok as C/C++ way of splitting strings: new section
(19 intermediate revisions by 12 users not shown)
Line 1:
{{WikiProject banner shell|class=B|
{{WikiProject Computing|importance=mid}}
}}
==C function toupper() in UpperCase==
Line 7 ⟶ 10:
If c is a lowercase letter (a-z), toupper() returns the uppercase version (A-Z). Otherwise toupper() returns c unchanged. toupper() does not convert international characters (those with codes above the ASCII range, 0x80 and up), like ă or ç. To uppercase a whole string you need to write a function something like this:
<syntaxhighlight lang="c">
#include <ctype.h> // standard C header file with the prototype of toupper()
Line 22 ⟶ 25:
  }
}
</syntaxhighlight>
In C, strings are essentially pointers to a character, and they end where there is a null ('\0') character. It would be worthwhile to explain what strings are in different languages.[[User:Senor Cuete|Senor Cuete]] ([[User talk:Senor Cuete|talk]]) 03:41, 10 May 2008 (UTC)Senor Cuete
Line 28 ⟶ 31:
The 1. should appear as a pound sign and the box is put there by Wiki's text engine. I didn't type it like that.[[User:Senor Cuete|Senor Cuete]] ([[User talk:Senor Cuete|talk]]) 03:44, 10 May 2008 (UTC)Senor Cuete
:The <nowiki><syntaxhighlight lang="...">...</syntaxhighlight></nowiki> tag should fix it. [[User:Ghettoblaster|Ghettoblaster]] ([[User talk:Ghettoblaster|talk]]) 12:43, 10 May 2008 (UTC)

== Compare (integer result, fast/non-human ordering) ==
Line 34 ⟶ 37:
In the table row for C, why would you go through the hassle of writing your own function when you could call the C function strncmp?
<syntaxhighlight lang="C">
#include <string.h>
int strncmp(const char *s1, const char *s2, size_t n);
</syntaxhighlight>[[User:Senor Cuete|Senor Cuete]] ([[User talk:Senor Cuete|talk]]) 00:52, 16 May 2008 (UTC)Senor Cuete

== substring ==
Shouldn't the table row for C just mention the C function strncpy?
<syntaxhighlight lang="C">
#include <string.h>
char *strncpy(char *s1, const char *s2, size_t n);
</syntaxhighlight>
Why concatenate when you can copy?[[User:Senor Cuete|Senor Cuete]] ([[User talk:Senor Cuete|talk]]) 00:53, 16 May 2008 (UTC)Senor Cuete
Line 73 ⟶ 76:
Not exactly equivalent to any string function in any language which handles strings differently, but in BASIC it was a string function. <span style="font-size: smaller;" class="autosigned">—Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/203.206.162.148|203.206.162.148]] ([[User talk:203.206.162.148|talk]]) 05:17, 22 June 2009 (UTC)</span><!-- Template:UnsignedIP --> <!--Autosigned by SineBot-->

It's called ORD() in many languages (since the character set / language / font may not be ASCII, but the idea is the same). This Wikipedia page's string function comparison could use a section on (number to/from string, character to/from string) http://rosettacode.org/wiki/Character_code#Python --[[User:BrianFennell|BrianFennell]] ([[User talk:BrianFennell|talk]]) 22:37, 3 September 2009 (UTC)

== substring, startpos, base? ==
Ark! The substring table does not list the base for startpos and endpos. Is the startpos=1 the first character in the parent string, or the second? <span style="font-size: smaller;" class="autosigned">—Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/203.206.162.148|203.206.162.148]] ([[User talk:203.206.162.148|talk]]) 05:57, 22 June 2009 (UTC)</span><!-- Template:UnsignedIP --> <!--Autosigned by SineBot-->

== Square bracket as syntax ==
There is a problem here: sometimes the square brackets indicate an optional field: string(1[,n]), and sometimes they are part of the language: string[1,n]. That leaves the problem that we can't always see that part of the command is optional: string[1 /,n/].
<span style="font-size: smaller;" class="autosigned">—Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/203.206.162.148|203.206.162.148]] ([[User talk:203.206.162.148|talk]]) 06:03, 22 June 2009 (UTC)</span><!-- Template:UnsignedIP --> <!--Autosigned by SineBot-->
:I see that it's been fixed now - thank you whoever :~) [[Special:Contributions/203.206.162.148|203.206.162.148]] ([[User talk:203.206.162.148|talk]]) 07:22, 14 January 2010 (UTC)

== LUA missing as programming language ==
I missed Lua in this page. I'm willing to add Lua examples (which might take some time) but there should be someone to cross-read them. Or are there reasons ''not'' to have Lua in the examples?

== LUA string.find and string.gsub misplaced? ==
These functions work with pattern matching, not with plain strings (well, find can be forced to do so with additional options). There should be at least a comment about this. [[User:Bassklampfe|Bassklampfe]] ([[User talk:Bassklampfe|talk]]) 15:12, 30 November 2010 (UTC)

== Removal of "Compare (integer result, fast/non-human ordering)" ==
I am removing the '''Compare (integer result, fast/non-human ordering)''' section, for the following reasons:
# ''This is not a common or primitive operation.'' Observe that of the languages listed, ''not one'' provides a built-in operator or standard library function to perform this type of comparison. Only one of the examples calls a single function, and that is in an uncommon third-party library. The rest are all implemented in terms of structural comparison of tuples (not a string operation at all) or sequential boolean OR (using the basic string comparison already detailed in the previous section). The section therefore does not in fact provide any new information about string functions at all. It merely describes an alleged optimisation technique. But...
# ''This is not even an optimisation in most cases.'' The complicated "fast" approaches given in the article all involved more operations than the straightforward standard approach, nullifying any speed improvement they might have brought. The OCaml and Ruby examples were particularly bad, since the "fast" versions actually involved allocating and freeing memory on the heap!<br />I ran some benchmarks in Perl and OCaml, and I was unable to find any cases where the "fast" version was not actually ''slower'' than the standard approach. In one case (OCaml, comparing short strings), the code given in the article was literally 33% slower than a straightforward <tt>String.compare</tt>!<br />It's possible that things might be different in other languages, and the technique ''might'' be generally faster in some very restricted circumstances (maybe when comparing ''very'' long strings that are ''very'' similar?), but it is clearly not something that anyone should be using without benchmarking it against their own data; and it's unlikely that string comparisons will frequently be enough of a bottleneck to justify this kind of micro-optimisation in the first place.
In short, this is not the kind of useful information that Wikipedia prides itself on spreading, and I don't think it belongs in this article. [[Special:Contributions/87.194.117.80|87.194.117.80]] ([[User talk:87.194.117.80|talk]]) 17:04, 26 July 2009 (UTC)

== equivalence relation missing ==
This article deals with three ways to compare strings (equality, compare, and strcmp). This might have some issues:
* From my understanding, all three cover the same feature.
* This feature is not defined as long as lexicographical order is not defined.
* It is not clear if this comparison is a low-level comparison, or on an equivalence basis. For instance, how do you compare [[Montréal]] and [[Montréal]] (the two canonically equivalent UTF-16 Unicode forms)?
{| class="wikitable" style="text-align: center;"
|+ ''Montréal'' (a city in North America) with its two canonically equivalent UTF-16 Unicode forms (NFC and NFD)
|-
! style="width: 10em;" |character
| M || o || n || t || r || colspan="2" | é || a || l
|-
| colspan="24" style="height: 1pt;" |
|-
! rowspan="1" |UTF-16 NFC
|004d ||006f ||006e ||0074 ||0072 || colspan="2" |00e9 ||0061 ||006c
|-
! rowspan="1" |UTF-16 NFD
|004d ||006f ||006e ||0074 ||0072 ||0065 ||0301 ||0061 ||006c
|-
! UTF-16 NFD (code points)
| M || o || n || t || r || e || ◌́ || a || l
|}

== "Code" format ==
The "code" tags on the keywords in the tables (or perhaps other changes) have destroyed the formatting, making the tables almost illegible. If you go back a decade and look at the original tables, you'll see that the keywords are clearly delimited, making the tables clear and easy to read. The present formatting makes the whole exercise almost worthless: if you can't read it easily, what's the point of having pages of text? <!-- Template:Unsigned IP --><small class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/203.206.162.148|203.206.162.148]] ([[User talk:203.206.162.148#top|talk]]) 09:27, 18 July 2017 (UTC)</small> <!--Autosigned by SineBot-->

== How-to guide ==
It seems to me that this article, as useful as it is, is outside of Wikipedia's scope, in light of the principle that [[WP:NOTHOWTO|Wikipedia is not a how-to guide]], which is exactly what this article is. [[User:Largoplazo|Largoplazo]] ([[User talk:Largoplazo|talk]]) 10:00, 18 June 2020 (UTC)

== Mentioning strtok as C/C++ way of splitting strings ==
<code>strtok(char *restrict str, const char *restrict delim)</code> returns tokens (aka split strings).
This is essentially what string.split does in most other languages, except that it doesn't allocate memory to store an array of tokens; instead it mutates the original string (replacing each delimiter with '\0')[https://stackoverflow.com/questions/3889992/how-does-strtok-split-the-string-into-tokens-in-c] and returns the tokens in order of occurrence, one by one. Also, it never returns empty tokens[https://en.cppreference.com/w/c/string/byte/strtok]. A "proper" implementation of split (using strtok) is something like:
<syntaxhighlight lang=c>
#include <stdlib.h>
#include <string.h>

struct stringArray {
    size_t size;
    char **strings; // pointers into the (mutated) input string
};

struct stringArray splitString(char *restrict str, const char *restrict delim) {
    char **strings = malloc(sizeof(char *));
    if (strings == NULL) abort();
    char *token = strtok(str, delim);
    size_t count = 0, allocated = 1;
    while (token != NULL) {
        if (count >= allocated) {
            // Doubling reallocation, to provide acceptable performance
            strings = realloc(strings, (allocated *= 2) * sizeof(char *));
            if (strings == NULL) abort();
        }
        strings[count++] = token;
        token = strtok(NULL, delim);
    }
    if (allocated != count && count > 0) {
        // Shrink to fit; skipped for zero tokens, since realloc with size 0 is not portable
        strings = realloc(strings, count * sizeof(char *));
        if (strings == NULL) abort();
    }
    return (struct stringArray){count, strings};
}
</syntaxhighlight>
[[Special:Contributions/5.189.81.197|5.189.81.197]] ([[User talk:5.189.81.197|talk]]) 15:25, 27 July 2024 (UTC)
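:A minimal sketch of the two strtok behaviours described above (the input buffer is modified in place, and empty tokens are skipped), assuming a writable buffer and a single-threaded caller. The helper name <code>countTokens</code> is hypothetical and used only for illustration; it is not part of the C library.
<syntaxhighlight lang="c">
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens that strtok finds in buf.
   buf must be writable, because strtok overwrites delimiters with '\0'. */
size_t countTokens(char *buf, const char *delims) {
    assert(buf != NULL && delims != NULL);
    size_t n = 0;
    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims)) {
        n++; /* each non-empty run between delimiters is one token */
    }
    return n;
}
</syntaxhighlight>
:For a writable array holding "a,,b" and the delimiter ",", this counts 2 tokens rather than 3, because the empty field between the two commas is skipped. Afterwards strlen() of the array is 1, since the first comma has been overwritten with '\0'. Passing a string literal instead of an array would be undefined behaviour, because strtok writes to its argument.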