Rope (data structure): Difference between revisions

Content deleted Content added
Kimbly (talk | contribs)
copyedit summary
Bender the Bot (talk | contribs)
m External links: HTTP to HTTPS for SourceForge
 
(212 intermediate revisions by more than 100 users not shown)
Line 1:
{{Short description|Data structure for storing strings}}
{{copyedit|date=May 2011}}
[[File:Vector Rope example.svg|right|x200px|thumb|A simple rope built on the string of "Hello_my_name_is_Simon".]]
 
In [[computer programming]], a '''rope''', or '''cord''', is a [[data structure]] composed of smaller [[String (computer science)|strings]] that is used to efficiently store and manipulate longer strings or entire texts. For example, a [[text editing]] program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently.<ref name="Boehm">
{{cite journal
| last = Boehm
| first = Hans-J |author2=Atkinson, Russ |author3=Plass, Michael
| title = Ropes: an Alternative to Strings
| journal = Software: Practice and Experience
| volume = 25
| issue = 12
| pages = 1315–1330
| publisher = John Wiley & Sons, Inc.
| location = New York, NY, USA
| date = December 1995
| doi = 10.1002/spe.4380251203
| url = http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.14.9450&rep=rep1&type=pdf
| archive-url = https://web.archive.org/web/20200308005351/https://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.14.9450&rep=rep1&type=pdf
| archive-date = 2020-03-08
| url-status = live
}}</ref>
 
 
==Description==
A rope is a type of [[binary tree]] where each leaf (end node) holds a string of manageable size and length (also known as a ''weight''), and each node further up the tree holds the sum of the lengths of all the leaves in its left [[subtree]]. A node with two children thus divides the whole string into two parts: the left subtree stores the first part of the string, the right subtree stores the second part of the string, and a node's weight is the length of the first part.
 
For rope operations, the strings stored in nodes are assumed to be constant [[immutable object]]s in the typical nondestructive case, allowing for some [[copy-on-write]] behavior. Leaf nodes are usually implemented as [[string (computer science)|basic fixed-length strings]] with a [[reference counting|reference count]] attached for deallocation when no longer needed, although other [[Garbage collection (computer science)|garbage collection]] methods can be used as well.
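
The node layout and weight rule described above can be sketched in Java as follows. This is a minimal illustration rather than the implementation of any particular library: the names <code>RopeWeightSketch</code>, <code>Node</code>, <code>length</code> and <code>flatten</code> are assumptions made for this example, and the way the sample string is cut into four leaves is arbitrary.

<syntaxhighlight lang="java">
final class RopeWeightSketch {
    /** A leaf holds a short immutable string; an inner node holds two children. */
    static final class Node {
        final Node left, right; // null for leaves
        final String text;      // null for inner nodes
        final int weight;       // leaf: text length; inner: total length of the left subtree

        Node(String text) {
            this.left = null;
            this.right = null;
            this.text = text;
            this.weight = text.length();
        }

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
            this.text = null;
            this.weight = left.length();
        }

        /** Total number of characters stored below this node. */
        int length() {
            return text != null ? weight : weight + right.length();
        }

        /** Rebuilds the ordinary string by visiting the leaves from left to right. */
        String flatten() {
            return text != null ? text : left.flatten() + right.flatten();
        }
    }

    /** Builds a four-leaf rope over "Hello_my_name_is_Simon" (hypothetical leaf boundaries). */
    static Node example() {
        Node left = new Node(new Node("Hello_"), new Node("my_"));
        Node right = new Node(new Node("name_is"), new Node("_Simon"));
        return new Node(left, right);
    }
}
</syntaxhighlight>

In this sketch the root's weight is 9, the length of "Hello_my_", while the whole rope has length 22.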
Viewed another way, the tree of a rope can be seen as a series of levels. The bottom level contains all the nodes that hold a short string; each higher level has fewer nodes, until the top level contains a single node. A rope can therefore be built bottom-up: place all the string-holding nodes in the bottom level, then form the next level by randomly picking half of the bottom-level nodes and creating new parent nodes for them. Each remaining bottom-level node, which has no parent, becomes the right child of the node to its left. Repeating this for each higher level eventually leaves a level with only one node, which is the root.
 
==Operations==
In the following definitions, ''N'' is the length of the rope, that is, the weight of the root node.
 
=== Collect leaves ===
: ''Definition:'' Create a stack ''S'' and a list ''L''. Traverse down the left-most spine of the tree until you reach a leaf l', adding each node ''n'' to ''S''. Add l' to ''L''. The parent of l' (''p'') is at the top of the stack. Repeat the procedure for p's right subtree.
 
<syntaxhighlight lang="java">
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// RopeLike is the node type used by the listings in this article.
// @NonNull is a null-check annotation kept from the original listing; its import is not shown.
final class InOrderRopeIterator implements Iterator<RopeLike> {

    private final Deque<RopeLike> stack;

    InOrderRopeIterator(@NonNull RopeLike root) {
        stack = new ArrayDeque<>();
        // Push the left-most spine so that the first leaf ends up on top of the stack.
        var c = root;
        while (c != null) {
            stack.push(c);
            c = c.getLeft();
        }
    }

    @Override
    public boolean hasNext() {
        return !stack.isEmpty();
    }

    @Override
    public RopeLike next() {
        // Assuming every inner node has two children, the top of the stack is the next leaf.
        var result = stack.pop();

        if (!stack.isEmpty()) {
            // Replace the leaf's parent with the left-most spine of the parent's right subtree.
            var parent = stack.pop();
            var right = parent.getRight();
            if (right != null) {
                stack.push(right);
                var cleft = right.getLeft();
                while (cleft != null) {
                    stack.push(cleft);
                    cleft = cleft.getLeft();
                }
            }
        }

        return result;
    }
}
</syntaxhighlight>
 
=== Rebalance ===
: ''Definition:'' Collect the set of leaves ''L'' and rebuild the tree from the bottom-up.
 
<syntaxhighlight lang="java">
// A rope counts as balanced when its weight is at least the entry of
// FIBONACCI_SEQUENCE at index depth + 2.
static boolean isBalanced(RopeLike r) {
    var depth = r.depth();
    if (depth >= FIBONACCI_SEQUENCE.length - 2) {
        return false;
    }
    return FIBONACCI_SEQUENCE[depth + 2] <= r.weight();
}

static RopeLike rebalance(RopeLike r) {
    if (!isBalanced(r)) {
        // Collect the leaves in order and rebuild the tree from them.
        var leaves = Ropes.collectLeaves(r);
        return merge(leaves, 0, leaves.size());
    }
    return r;
}

static RopeLike merge(List<RopeLike> leaves) {
    return merge(leaves, 0, leaves.size());
}

// Builds a tree over leaves[start, end) by halving the range recursively.
static RopeLike merge(List<RopeLike> leaves, int start, int end) {
    int range = end - start;
    if (range == 1) {
        return leaves.get(start);
    }
    if (range == 2) {
        return new RopeLikeTree(leaves.get(start), leaves.get(start + 1));
    }
    int mid = start + (range / 2);
    return new RopeLikeTree(merge(leaves, start, mid), merge(leaves, mid, end));
}
</syntaxhighlight>
 
=== Insert ===
: ''Definition:'' <code>Insert(i, S’)</code>: insert the string ''S’'' beginning at position ''i'' in the string ''s'', to form a new string {{math|''C''<sub>1</sub>, ..., ''C<sub>i</sub>'', ''S''', ''C''<sub>''i'' + 1</sub>, ..., ''C<sub>m</sub>''}}.
: ''Time complexity:'' {{tmath|O(\log N)}}.
 
This operation can be done by a <code>Split()</code> and two <code>Concat()</code> operations. The cost is the sum of the three.
<syntaxhighlight lang="java">
public Rope insert(int idx, CharSequence sequence) {
    if (idx == 0) {
        return prepend(sequence);
    }
    if (idx == length()) {
        return append(sequence);
    }
    // Split at idx, append the new text to the left half, then rejoin the two halves.
    var lhs = base.split(idx);
    return new Rope(Ropes.concat(lhs.fst.append(sequence), lhs.snd));
}
</syntaxhighlight>

=== Index ===
 
[[File:Vector Rope index.svg|right|x200px|thumb|Figure 2.1: Example of index lookup on a rope.]]
 
: ''Definition:'' <code>Index(i)</code>: return the character at position ''i''
: ''Time complexity:'' {{tmath|O(\log N)}}
 
To retrieve the ''i''-th character, we begin a [[Recursion|recursive]] search from the root node:
 
<syntaxhighlight lang="java">
@Override
public int indexOf(char ch, int startIndex) {
    // Descend into the subtree that contains startIndex.
    if (startIndex > weight) {
        return right.indexOf(ch, startIndex - weight);
    }
    return left.indexOf(ch, startIndex);
}
</syntaxhighlight>
 
For example, to find the character at {{code|1=i=10}} in Figure 2.1 shown on the right, start at the root node (A), find that 22 is greater than 10 and there is a left child, so go to the left child (B). 9 is less than 10, so subtract 9 from 10 (leaving {{code|1=i=1}}) and go to the right child (D). Then, because 6 is greater than 1 and there is a left child, go to the left child (G). 2 is greater than 1 and there is a left child, so go to the left child again (J). Finally, 2 is greater than 1 but there is no left child, so the character at index 1 of the short string "na" (i.e. "n") is the answer. Indices in this example are 1-based.
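
The descent in this walkthrough can be sketched as a short loop. The following is an illustrative sketch, not the article's reference implementation: <code>RopeIndexSketch</code>, <code>Node</code> and <code>charAt</code> are hypothetical names, the leaf boundaries differ from those in Figure 2.1, and the sketch uses 0-based positions, so the tenth character is at position 9.

<syntaxhighlight lang="java">
final class RopeIndexSketch {
    static final class Node {
        final Node left, right; // null for leaves
        final String text;      // null for inner nodes
        final int weight;       // leaf: text length; inner: length of the left subtree

        Node(String text) {
            this.left = null;
            this.right = null;
            this.text = text;
            this.weight = text.length();
        }

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
            this.text = null;
            this.weight = length(left);
        }
    }

    static int length(Node n) {
        return n.text != null ? n.weight : n.weight + length(n.right);
    }

    /** Returns the character at 0-based position i, moving down one level per step. */
    static char charAt(Node n, int i) {
        while (n.text == null) {
            if (i < n.weight) {
                n = n.left;     // the position lies in the left part
            } else {
                i -= n.weight;  // skip the left part and continue on the right
                n = n.right;
            }
        }
        return n.text.charAt(i);
    }

    /** Four-leaf rope over "Hello_my_name_is_Simon" (hypothetical leaf boundaries). */
    static Node example() {
        Node left = new Node(new Node("Hello_"), new Node("my_"));
        Node right = new Node(new Node("name_is"), new Node("_Simon"));
        return new Node(left, right);
    }
}
</syntaxhighlight>

With this example rope, <code>charAt(example(), 9)</code> returns <code>'n'</code>, matching the result of the walkthrough.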
 
=== Concat ===
[[File:Vector Rope concat.svg|right|x200px|thumb|Figure 2.2: Concatenating two child ropes into a single rope.]]
: ''Definition:'' <code>Concat(S1, S2)</code>: concatenate two ropes, ''S''<sub>1</sub> and ''S''<sub>2</sub>, into a single rope.
: ''Time complexity:'' {{tmath|O(1)}} (or {{tmath|O(\log N)}} time to compute the root weight)
 
A concatenation can be performed simply by creating a new root node with {{mono|1=left = S1}} and {{mono|1=right = S2}}, which is constant time. The weight of the parent node is set to the length of the left child ''S''<sub>1</sub>, which would take {{tmath|O(\log N)}} time, if the tree is balanced.
 
As most rope operations require balanced trees, the tree may need to be re-balanced after concatenation.
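
A minimal sketch of concatenation, assuming immutable nodes, is shown below. The names <code>RopeConcatSketch</code>, <code>Node</code>, <code>concat</code>, <code>length</code> and <code>flatten</code> are hypothetical. Creating the new root is a single allocation; computing its weight walks the right spine of the left operand, which is where the logarithmic cost on a balanced tree comes from. Rebalancing is left out.

<syntaxhighlight lang="java">
final class RopeConcatSketch {
    static final class Node {
        final Node left, right; // null for leaves
        final String text;      // null for inner nodes
        final int weight;       // leaf: text length; inner: length of the left subtree

        Node(String text) {
            this.left = null;
            this.right = null;
            this.text = text;
            this.weight = text.length();
        }

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
            this.text = null;
            this.weight = length(left); // follows the right spine of the left operand
        }
    }

    static int length(Node n) {
        return n.text != null ? n.weight : n.weight + length(n.right);
    }

    /** Joins two ropes under a new root; both inputs are shared, not copied. */
    static Node concat(Node s1, Node s2) {
        return new Node(s1, s2);
    }

    static String flatten(Node n) {
        return n.text != null ? n.text : flatten(n.left) + flatten(n.right);
    }
}
</syntaxhighlight>

Because the operands are not modified, the ropes passed to <code>concat</code> remain valid after the call.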
 
=== Split ===
[[File:Vector Rope split.svg|right|x600px|thumb|Figure 2.3: Splitting a rope in half.]]
 
: ''Definition:'' <code>Split (i, S)</code>: split the string ''S'' into two new strings ''S''<sub>1</sub> and ''S''<sub>2</sub>, {{math|1=''S''<sub>1</sub> = ''C''<sub>1</sub>, ..., ''C<sub>i</sub>''}} and {{math|1=''S''<sub>2</sub> = ''C''<sub>''i'' + 1</sub>, ..., ''C<sub>m</sub>''}}.
: ''Time complexity:'' {{tmath|O(\log N)}}
 
There are two cases that must be dealt with:
# The split point is at the end of a string (i.e. after the last character of a leaf node)
# The split point is in the middle of a string.
 
The second case reduces to the first by splitting the string at the split point to create two new leaf nodes, then creating a new node that is the parent of the two component strings.
 
For example, to split the 22-character rope pictured in Figure 2.3 into two equal component ropes of length 11, query the 12th character to locate the node ''K'' at the bottom level. Remove the link between ''K'' and ''G''. Go to the parent of ''G'' and subtract the weight of ''K'' from the weight of ''D''. Travel up the tree and remove any right links to subtrees covering characters past position 11, subtracting the weight of ''K'' from their parent nodes (only node ''D'' and ''A'', in this case). Finally, build up the newly orphaned nodes ''K'' and ''H'' by concatenating them together and creating a new parent ''P'' with weight equal to the length of the left node ''K''.
 
As most rope operations require balanced trees, the tree may need to be re-balanced after splitting.
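
Both cases can be handled by one recursive function, sketched below under the assumption of immutable nodes. The names <code>RopeSplitSketch</code>, <code>Node</code> and <code>split</code> are hypothetical, and positions are 0-based counts of characters in the first part. The sketch does not rebalance the results and may produce empty leaves when the split point is at either end of a leaf.

<syntaxhighlight lang="java">
final class RopeSplitSketch {
    static final class Node {
        final Node left, right; // null for leaves
        final String text;      // null for inner nodes
        final int weight;       // leaf: text length; inner: length of the left subtree

        Node(String text) {
            this.left = null;
            this.right = null;
            this.text = text;
            this.weight = text.length();
        }

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
            this.text = null;
            this.weight = length(left);
        }
    }

    static int length(Node n) {
        return n.text != null ? n.weight : n.weight + length(n.right);
    }

    /** Returns {first i characters, remaining characters} as two ropes. */
    static Node[] split(Node n, int i) {
        if (n.text != null) {
            // Split point inside a leaf: cut the string into two new leaves.
            return new Node[] { new Node(n.text.substring(0, i)), new Node(n.text.substring(i)) };
        }
        if (i < n.weight) {
            // Split point in the left subtree: its tail joins the right subtree.
            Node[] p = split(n.left, i);
            return new Node[] { p[0], new Node(p[1], n.right) };
        }
        if (i > n.weight) {
            // Split point in the right subtree: its head joins the left subtree.
            Node[] p = split(n.right, i - n.weight);
            return new Node[] { new Node(n.left, p[0]), p[1] };
        }
        // Split point falls exactly between the two children.
        return new Node[] { n.left, n.right };
    }

    static String flatten(Node n) {
        return n.text != null ? n.text : flatten(n.left) + flatten(n.right);
    }

    /** Four-leaf rope over "Hello_my_name_is_Simon" (hypothetical leaf boundaries). */
    static Node example() {
        Node left = new Node(new Node("Hello_"), new Node("my_"));
        Node right = new Node(new Node("name_is"), new Node("_Simon"));
        return new Node(left, right);
    }
}
</syntaxhighlight>

Splitting the example rope at position 11 yields ropes for "Hello_my_na" and "me_is_Simon".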
 
<syntaxhighlight lang="java">
public Pair<RopeLike, RopeLike> split(int index) {
    if (index < weight) {
        // Split point lies in the left subtree; its tail is joined with the right subtree.
        var split = left.split(index);
        return Pair.of(rebalance(split.fst), rebalance(new RopeLikeTree(split.snd, right)));
    } else if (index > weight) {
        // Split point lies in the right subtree; its head is joined with the left subtree.
        var split = right.split(index - weight);
        return Pair.of(rebalance(new RopeLikeTree(left, split.fst)), rebalance(split.snd));
    } else {
        // Split point falls exactly between the two children.
        return Pair.of(left, right);
    }
}
</syntaxhighlight>
 
=== Delete ===
: ''Definition:'' <code>Delete(i, j)</code>: delete the substring {{math|''C<sub>i</sub>'', …, ''C''<sub>''i'' + ''j'' − 1</sub>}}, from ''s'' to form a new string {{math|''C''<sub>1</sub>, …, ''C''<sub>''i'' − 1</sub>, ''C''<sub>''i'' + ''j''</sub>, …, ''C<sub>m</sub>''}}.
: ''Time complexity:'' {{tmath|O(\log N)}}.
 
This operation can be done by two <code>Split()</code> and one <code>Concat()</code> operation. First, split the rope in three, divided by ''i''-th and ''i+j''-th character respectively, which extracts the string to delete in a separate node. Then concatenate the other two nodes.
 
<syntaxhighlight lang="java">
@Override
public RopeLike delete(int start, int length) {
    // Keep everything before start and everything from start + length onwards.
    var lhs = split(start);
    var rhs = split(start + length);
    return rebalance(new RopeLikeTree(lhs.fst, rhs.snd));
}
</syntaxhighlight>
 
=== Report ===
: ''Definition:'' <code>Report(i, j)</code>: output the string {{math|''C<sub>i</sub>'', …, ''C''<sub>''i'' + ''j'' − 1</sub>}}.
: ''Time complexity:'' {{tmath|O(j + \log N)}}
 
To report the string {{math|''C<sub>i</sub>'', …, ''C''<sub>''i'' + ''j'' − 1</sub>}}, find the node ''u'' that contains ''C<sub>i</sub>'' and {{code|1=weight(u) >= j}}, and then traverse ''T'' starting at node ''u''. Output {{math|''C<sub>i</sub>'', …, ''C''<sub>''i'' + ''j'' − 1</sub>}} by doing an [[in-order traversal]] of ''T'' starting at node ''u''.
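
One way to sketch this traversal is a recursive walk that descends only into subtrees overlapping the requested range. The names <code>RopeReportSketch</code>, <code>Node</code> and <code>report</code> are hypothetical, and the sketch takes a 0-based start position and a length instead of the 1-based positions used in the definition.

<syntaxhighlight lang="java">
final class RopeReportSketch {
    static final class Node {
        final Node left, right; // null for leaves
        final String text;      // null for inner nodes
        final int weight;       // leaf: text length; inner: length of the left subtree

        Node(String text) {
            this.left = null;
            this.right = null;
            this.text = text;
            this.weight = text.length();
        }

        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
            this.text = null;
            this.weight = length(left);
        }
    }

    static int length(Node n) {
        return n.text != null ? n.weight : n.weight + length(n.right);
    }

    /** Returns len characters starting at 0-based position start. */
    static String report(Node root, int start, int len) {
        StringBuilder out = new StringBuilder(len);
        report(root, start, len, out);
        return out.toString();
    }

    private static void report(Node n, int start, int len, StringBuilder out) {
        if (len <= 0) {
            return;
        }
        if (n.text != null) {
            // Copy the part of this leaf that falls inside the requested range.
            out.append(n.text, start, Math.min(n.text.length(), start + len));
            return;
        }
        if (start < n.weight) {
            // The range begins in the left subtree and may continue into the right one.
            int fromLeft = Math.min(len, n.weight - start);
            report(n.left, start, fromLeft, out);
            report(n.right, 0, len - fromLeft, out);
        } else {
            report(n.right, start - n.weight, len, out);
        }
    }

    /** Four-leaf rope over "Hello_my_name_is_Simon" (hypothetical leaf boundaries). */
    static Node example() {
        Node left = new Node(new Node("Hello_"), new Node("my_"));
        Node right = new Node(new Node("name_is"), new Node("_Simon"));
        return new Node(left, right);
    }
}
</syntaxhighlight>

For the example rope, <code>report(example(), 6, 7)</code> returns "my_name", which spans two leaves.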
 
==Comparison with monolithic arrays==
{| class="wikitable floatright"
|+ Complexity{{citation needed|date=October 2010}}
! Operation !! Rope !! String
|- align="center"
| Index<ref name="Boehm" /> || {{bad|O(log n)}} || {{yes|O(1)}}
|- align="center"
| Split<ref name="Boehm" /> || {{bad|O(log n)}} || {{yes|O(1)}}
|- align="center"
| Concatenate || {{yes|O(1) amortized, O(log n) worst case}}{{Citation needed|date=June 2022|reason=earlier this article claims "A concatenation can be performed simply by ... which is constant time.", or O(1).}} || {{bad|O(n)}}
|- align="center"
| Iterate over each character<ref name="Boehm" />
| {{okay|O(n)}} || {{okay|O(n)}}
|- align="center"
| Insert<ref name=":0" />{{Failed verification|date=September 2023}} || {{yes|O(log n)}} || {{bad|O(n)}}
|- align="center"
| Append<ref name=":0">{{Cite web|url=https://www.sgi.com/tech/stl/ropeimpl.html|title=Rope Implementation Overview|website=www.sgi.com|access-date=2017-03-01
|archive-url=https://web.archive.org/web/20171219030153/https://www.sgi.com/tech/stl/ropeimpl.html
|archive-date=2017-12-19
}}</ref>{{Failed verification|date=September 2023}} || {{yes|O(1) amortized, O(log n) worst case}} || {{bad|O(1) amortized, O(n) worst case}}
|- align="center"
| Delete || {{yes|O(log n)}} || {{bad|O(n)}}
|- align="center"
| Report || {{bad|O(j + log n)}} || {{yes|O(j)}}
|- align="center"
| Build || {{okay|O(n)}} || {{okay|O(n)}}
|}
 
Advantages:
* Ropes enable much faster insertion and deletion of text than monolithic string arrays, on which these operations have time complexity O(n).
* Ropes do not require O(n) extra memory when operated upon (arrays need that for copying operations).
* Ropes do not require large contiguous memory spaces.
* If only nondestructive versions of operations are used, a rope is a [[persistent data structure]]. For the text editing program example, this makes it easy to support multiple [[undo]] levels.

Disadvantages:
* Greater overall space use when not being operated on, mainly to store parent nodes. There is a trade-off between how much of the total memory is such overhead and how long pieces of data are being processed as strings. The strings in the example figures above are unrealistically short for modern architectures. The overhead is always O(n), but the constant can be made arbitrarily small.
* Increase in time to manage the extra storage
* Increased complexity of source code; greater risk of bugs

The table above compares the ''algorithmic'' traits of string and rope implementations, not their ''raw speed''. Array-based strings have smaller overhead, so (for example) concatenation and split operations are faster on small datasets. However, when array-based strings are used for longer strings, the time complexity and memory use of inserting and deleting characters become unacceptably large. In contrast, a rope data structure has stable performance regardless of data size. Further, the space complexity of both ropes and arrays is O(n). In summary, ropes are preferable when the data is large and modified often.
 
==See also==
* The [[Cedar (programming language)|Cedar]] programming environment, which used ropes "almost since its inception"<ref name="Boehm"/>
* The [[Enfilade (Xanadu)|Model T enfilade]], a similar data structure from the early 1970s.
* [[Gap buffer]], a data structure commonly used in text editors that allows efficient insertion and deletion operations clustered near the same location
* [[Piece table]], another data structure commonly used in text editors
 
==References==
{{Reflist}}
 
==External links==
{{Commons category}}
*[https://github.com/abseil/abseil-cpp/blob/master/absl/strings/cord.h "absl::Cord" implementation of ropes within The Abseil library]
*[https://github.com/ivmai/bdwgc/ "C cords" implementation of ropes within the Boehm Garbage Collector library]
*[https://web.archive.org/web/20121225183151/http://www.sgi.com/tech/stl/Rope.html SGI C++ specification for ropes] (supported by STLPort and [https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.3/a00223.html libstdc++])
*[https://github.com/thyer/Ropes Ropes] for [[C Sharp (programming language)|C#]]
*[https://github.com/Ramarren/ropes ropes] for [[Common Lisp]]
*[http://ahmadsoft.org/ropes/ Ropes for Java]
*[https://github.com/sunshower-io/sunshower-arcus/tree/master/arcus-lang/src/main/java/io/sunshower/lang/primitives String-Like Ropes for Java]
*[https://github.com/component/rope Ropes for JavaScript]
*[https://github.com/KenDickey/Limbo-Ropes Ropes] for [[Limbo (programming language)|Limbo]]
*[https://nim-lang.org/docs/ropes.html ropes] for [[Nim (programming language)|Nim]]
*[https://github.com/Chris00/ocaml-rope Ropes] for [[OCaml]]
*[https://sourceforge.net/projects/pyropes/files/?source=navbar pyropes] for [[Python (programming language)|Python]]
*[https://github.com/KenDickey/Cuis-Smalltalk-Ropes Ropes] for [[Smalltalk]]
*[https://github.com/fweez/SwiftRope SwiftRope] for [[Swift (programming language)|Swift]]
*[https://docs.rs/ropey/ "Ropey"] for [[Rust (programming language)|Rust]]
*[https://pub.dev/packages/rope/ Rope] for Dart
*[https://zed.dev/blog/zed-decoded-rope-sumtree/ Rope & SumTree] in Zed Editor
 
{{Strings |state=collapsed}}
{{DEFAULTSORT:Rope (Computer Science)}}
 
{{DEFAULTSORT:Rope Data Structure}}
[[Category:Binary trees]]
[[Category:String data structures]]
 
 
{{datastructure-stub}}
 
[[fr:Corde (informatique)]]
[[th:โร้ป]]