Talk:Null-terminated string/Archive 1: Difference between revisions

Vegaswikian (talk | contribs)
m Archiving 1 discussion(s) from Talk:Null-terminated string) (bot
[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 03:23, 21 January 2011 (UTC)
:The reason nothing like this has been implemented is that, as I mentioned earlier, the C standard does not concern itself with complex structures, typedefs and pointer hiding. What purpose does "reference_count" have? Taking a substring is not O(1). Finding a tail by character is O(1). I could have put the two fields into a structure, but that would have meant I'd have to create a new object for the substring. ;) Besides, it would have bloated the code more. I wrote what was necessary to prove my point, and my point is still that C strings are not '''faster''' than every other string representation at obtaining a tail. I don't think it's suitable to pit the C string representation against an ambiguous variety of string representations and say "it's far faster". Aren't ''"can often", "often", "usually", etc.'' weasel words? In terms of constant time, the representation I defined is no slower at reading a string from input than the C string representation; I could write the other operations to be just as fast, if not faster. In terms of specification, one should not attempt to compare, because it is the implementation that defines actual speed. The specification does not define any requirements for the actual speed of these functions. "I hereby declare that a call to any standard function that operates on a C string shall take at least one second." If you're comparing implementations, then your argument makes no sense, because there are slow C implementations and fast C implementations; a slow C implementation is likely just as slow as, or possibly slower than, a fast JavaScript implementation at finding the tail of a string. However, there is no reason a fast JavaScript implementation can't be just as fast as a fast C implementation at finding the tail of a string; it's all about optimisation. [[User:Plebbeh|Plebbeh]] ([[User talk:Plebbeh|talk]]) 04:48, 21 January 2011 (UTC)
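For readers following the tail argument, here is a minimal sketch in C of the comparison being discussed. It is not the code posted earlier in this thread: the <code>str_view</code> type and all three function names are hypothetical, chosen only for illustration. It assumes a pointer-plus-length representation and shows that taking a tail at a known offset is constant time in both representations, and that neither needs a new allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer-plus-length string view (not from the thread). */
typedef struct {
    const char *data; /* first byte of the viewed text */
    size_t len;       /* number of bytes in the view */
} str_view;

/* Tail of a C string at byte offset n: plain pointer arithmetic, O(1).
   The caller must guarantee n <= strlen(s); nothing is checked here. */
const char *cstr_tail(const char *s, size_t n)
{
    return s + n;
}

/* Builds a view over an existing C string. This step is O(n) because
   strlen has to scan for the terminating NUL. */
str_view view_from_cstr(const char *s)
{
    str_view v = { s, strlen(s) };
    return v;
}

/* Tail of a view at byte offset n: also O(1), and the bound can be
   checked because the length is stored. No bytes are copied. */
str_view view_tail(str_view v, size_t n)
{
    assert(n <= v.len);
    str_view t = { v.data + n, v.len - n };
    return t;
}
```

With these definitions, <code>view_tail(view_from_cstr("hello world"), 6)</code> yields a 5-byte view over "world", and <code>cstr_tail("hello world", 6)</code> yields a pointer to the same bytes. The only difference the sketch shows is that the view carries its length with it.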
 
== character vs byte ==
 
I think we should call characters ''characters'', since these functions are for character string manipulation. I agree that there's an issue with multi-byte characters, but using ''bytes'' doesn't completely remove the source of confusion either, as the reader still must know that multi-byte characters exist. What if we changed ''bytes'' back to ''characters'' and added a notice that str* functions operate on '''single-byte''' characters? [[User:1exec1|1exec1]] ([[User talk:1exec1|talk]]) 19:05, 18 October 2011 (UTC)
 
:Saying "it only works on the one-byte characters" is wrong, because the string operations will work on the individual bytes that make up parts of multi-byte characters. For instance, in UTF-8 you can count the number of characters, assuming no bad encoding, by counting the bytes whose top two bits are not binary 10, so there are useful operations you can do working with the bytes. The proper term for the units it operates on is "whatever your C compiler means when you say 'char'", but that is hard to read and looks like the word 'character' misspelled, and 'byte' is probably a much more popular term. The C99 documentation is technically correct because it defines the word "character" as being "char", but that is not how the word "character" is defined in any Wikipedia article about text.
:The main problem is that there are a lot of programmers out there who are just smart enough to do horrible things when they think that strlen() has to return the 'number of characters'. If they were a bit stupider we would be OK, because they would not get anything to work. But there seems to be an overlap, perhaps best defined as 'idiot savants' or something, where they will actually write working but horrifically complicated code because they took the word "character" literally. These code writers are probably the biggest impediment to getting Unicode to work. There are active attempts to clear up the documentation, such as the BSD man pages which I was quoting, but there remains a huge amount of legacy documentation, including stuff from standards organizations. Anyway, I see no reason not to have Wikipedia use modern notation. [[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 02:05, 19 October 2011 (UTC)
 
:: OK, I agree. C++11 uses ''byte string'' to name single-byte character strings, so I think it's a good idea to stick with it. [[User:1exec1|1exec1]] ([[User talk:1exec1|talk]]) 02:26, 19 October 2011 (UTC)
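To illustrate the byte-versus-character distinction discussed above, here is a minimal sketch in C. It assumes the string is valid UTF-8, and the function name <code>utf8_count_code_points</code> is hypothetical, not a standard library function. It counts every byte that is not a continuation byte (one of the form 10xxxxxx), which is the counting trick mentioned in the thread, and it can be compared against what <code>strlen</code> reports for the same string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts code points in a NUL-terminated string,
   assuming the bytes are valid UTF-8. A byte whose top two bits are 10
   is a continuation byte and does not start a new code point. */
size_t utf8_count_code_points(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        /* Cast to unsigned char so the mask works even where char is signed. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the UTF-8 encoding of "naïve" (written in C source as <code>"na\xC3\xAF" "ve"</code>), <code>strlen</code> returns 6 because it counts bytes, while the helper returns 5. Note that this counts code points, which still need not match what a reader perceives as characters when combining marks are involved.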