Trimming (computer programming)

{{Refimprove|date=February 2015}}
 
In [[computer programming]], '''trimming''' ('''trim''') or '''stripping''' ('''strip''') is a [[string (computer science)|string manipulation]] in which leading and trailing [[whitespace character|whitespace]] is removed from a [[string (computer science)|string]].
 
For example, the string (enclosed by apostrophes)
 
<syntaxhighlight lang=text>' this is a test '</syntaxhighlight>
 
would be changed, after trimming, to
 
<syntaxhighlight lang=text>'this is a test'</syntaxhighlight>
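
The example above can be reproduced in Python as a minimal sketch. It uses the built-in <code>str.strip</code> method; the variable names are illustrative only.

```python
# Trim leading and trailing whitespace from the example string.
original = " this is a test "
trimmed = original.strip()

# Python strings are immutable, so strip() returns a new string
# and leaves the original untouched.
print(repr(original))  # ' this is a test '
print(repr(trimmed))   # 'this is a test'
```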
 
==Variants==
 
===Left or right trimming===
 
:The most popular variants of the trim function strip only the beginning or only the end of the string. They are typically named '''ltrim''' and '''rtrim''' respectively, or, in the case of Python, '''lstrip''' and '''rstrip'''. C# uses '''TrimStart''' and '''TrimEnd''', and Common Lisp '''string-left-trim''' and '''string-right-trim'''. Pascal does not have these variants built-in, and Java lacked them until version 11 added '''stripLeading''' and '''stripTrailing'''; [[Object Pascal]] (Delphi) has '''TrimLeft''' and '''TrimRight''' functions.<ref>{{cite web|url=https://www.freepascal.org/docs-html/rtl/sysutils/trim.html |title=Trim |publisher=Freepascal.org |date=2013-02-02 |access-date=2013-08-24}}</ref>
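
As a brief illustration, the Python variants named above behave as follows. This is a sketch with an arbitrary sample string, not an excerpt from any cited source.

```python
text = "  padded  "

left_trimmed = text.lstrip()   # removes leading whitespace only
right_trimmed = text.rstrip()  # removes trailing whitespace only
both_trimmed = text.strip()    # removes both

print(repr(left_trimmed))   # 'padded  '
print(repr(right_trimmed))  # '  padded'
print(repr(both_trimmed))   # 'padded'
```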
 
===Whitespace character list parameterization===
 
:Many trim functions have an optional parameter that specifies a list of characters to trim in place of the default whitespace characters. For example, PHP and Python allow this optional parameter, while Pascal and Java do not. With Common Lisp's <code>string-trim</code> function, the parameter (called ''character-bag'') is required. The C++ [[Boost library]] defines space characters according to [[Locale (computer software)|locale]], and also offers variants with a [[predicate (computer programming)|predicate]] parameter (a [[functor]]) to select which characters are trimmed.
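
The optional parameter can be sketched with Python's <code>str.strip</code>, <code>str.lstrip</code> and <code>str.rstrip</code>. In Python the argument is treated as a set of characters rather than as a prefix or suffix; the sample strings are illustrative.

```python
# Every leading/trailing character that appears in the argument is removed,
# regardless of order; trimming stops at the first character not in the set.
title = "--==title==--".strip("-=")
number = "000420".lstrip("0")
line = "value;;;".rstrip(";")

print(title)   # title
print(number)  # 420
print(line)    # value
```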
 
===Special empty string return value===
 
:An uncommon variant of trim returns a special result if no characters remain after the trim operation. For example, [[Jakarta Project|Apache Jakarta]]'s '''StringUtils''' has a function called <code>stripToNull</code> which returns <code>null</code> in place of an empty string.
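
A comparable behaviour can be sketched in Python, with <code>None</code> standing in for <code>null</code>. The function name <code>strip_to_none</code> is hypothetical and is not part of any standard library; passing <code>None</code> through unchanged is an assumption of this sketch.

```python
from typing import Optional


def strip_to_none(text: Optional[str]) -> Optional[str]:
    """Trim text; return None if nothing remains (hypothetical helper)."""
    if text is None:
        return None
    stripped = text.strip()
    # An empty string is falsy, so "or None" maps it to None.
    return stripped or None


print(strip_to_none("  abc  "))  # abc
print(strip_to_none("     "))    # None
```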
 
===Space normalization===
 
:Space normalization is a related string manipulation in which, in addition to removing surrounding whitespace, any sequence of whitespace characters within the string is replaced with a single space. Space normalization is performed by the function named <code>Trim()</code> in spreadsheet applications (including [[Microsoft Excel|Excel]], [[OpenOffice.org Calc|Calc]], [[Gnumeric]], and [[Google Docs]]), and by the <code>normalize-space()</code> function in [[XSL Transformations|XSLT]] and [[XPath]].
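
Space normalization can be approximated in Python by splitting on whitespace and rejoining with single spaces. This is a sketch only: the helper name <code>normalize_space</code> is hypothetical, and Python's <code>str.split</code> recognizes Unicode whitespace, which may differ from the whitespace set used by a given spreadsheet or XPath implementation.

```python
def normalize_space(text: str) -> str:
    """Trim text and collapse internal whitespace runs to one space."""
    # split() with no argument splits on runs of whitespace and
    # discards empty leading/trailing fields.
    return " ".join(text.split())


print(repr(normalize_space("  this   is\ta \n test ")))  # 'this is a test'
```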
 
===In-place trimming===
 
:While most algorithms return a new (trimmed) string, some alter the original string [[in-place]]. Notably, the [[Boost library]] allows either in-place trimming or a trimmed copy to be returned.
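
The distinction can be sketched in Python, where <code>str</code> and <code>bytes</code> are immutable and always yield a trimmed copy, while a mutable <code>bytearray</code> can be shortened in place. The function <code>trim_in_place</code> and the constant <code>ASCII_WHITESPACE</code> are hypothetical names chosen for this example.

```python
ASCII_WHITESPACE = b" \t\n\r\x0b\x0c"


def trim_in_place(buffer: bytearray) -> None:
    """Remove leading and trailing ASCII whitespace from buffer itself."""
    # Drop the trailing run first so the leading deletion moves fewer bytes.
    end = len(buffer)
    while end > 0 and buffer[end - 1] in ASCII_WHITESPACE:
        end -= 1
    del buffer[end:]

    start = 0
    while start < len(buffer) and buffer[start] in ASCII_WHITESPACE:
        start += 1
    del buffer[:start]


data = bytearray(b"  data \n")
alias = data             # second reference to the same object
trim_in_place(data)
print(alias)             # bytearray(b'data'): the original object changed

copy = b"  data \n".strip()  # immutable bytes: a new trimmed object is returned
```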
 
==Definition of whitespace==
The characters that are considered whitespace vary between programming languages and implementations. For example, C's <code>isspace</code> function in the default locale recognizes only space, horizontal tab, line feed, vertical tab, form feed, and carriage return, while languages that support [[Unicode]] typically include all Unicode space characters. Some implementations also include [[ASCII]] control codes (non-printing characters) along with whitespace characters.
 
Java's trim method considers ASCII spaces and control codes as whitespace, contrasting with the Java <code>isWhitespace()</code> method,<ref>{{cite web|url=https://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character.html#isWhitespace(char) |title=Character (Java 2 Platform SE 5.0) |publisher=Java.sun.com |access-date=2013-08-24}}</ref> which recognizes all Unicode space characters.
 
Delphi's Trim function considers characters U+0000 (NULL) through U+0020 (SPACE) to be whitespace.
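
The effect of these differing definitions can be sketched in Python. The helper <code>trim_through_space</code> is a hypothetical function that mimics the rule described above for Delphi (treating U+0000 through U+0020 as whitespace); it is compared with Python's own <code>str.strip</code>, which uses a Unicode-aware definition.

```python
def trim_through_space(text: str) -> str:
    """Trim characters U+0000..U+0020 from both ends (hypothetical helper)."""
    start, end = 0, len(text)
    while start < end and text[start] <= "\u0020":
        start += 1
    while end > start and text[end - 1] <= "\u0020":
        end -= 1
    return text[start:end]


control_padded = "\x00\x01 hi \x01"
nbsp_padded = "\u00a0hi\u00a0"  # NO-BREAK SPACE on both sides

# Control codes are removed by the U+0000..U+0020 rule but not by str.strip().
print(repr(trim_through_space(control_padded)))  # 'hi'
print(repr(control_padded.strip()))              # '\x00\x01 hi \x01'

# NO-BREAK SPACE is removed by str.strip() but not by the U+0000..U+0020 rule.
print(repr(nbsp_padded.strip()))                 # 'hi'
print(repr(trim_through_space(nbsp_padded)))     # '\xa0hi\xa0'
```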
 
==External links==
*[https://www.tcl.tk/man/tcl8.4/TclCmd/string.htm#M46 Tcl: string trim]
*[https://blog.stevenlevithan.com/archives/faster-trim-javascript Faster JavaScript Trim] - compares various JavaScript trim implementations
*[http://webwidetutor.com/php/PHP-Change-String-value-behaviour-or-look-?id=8 PHP string cut and trimming]