* <code>?</code> (the only type with [[pointer arithmetic]] allowed, [[fat pointer|"fat" pointer]]s).
The purpose of introducing these new pointer types is to avoid common problems when using pointers. Take, for instance, a function called <code>foo</code> that takes a pointer to an int:
<syntaxhighlight lang="C">
int foo(int *);
</syntaxhighlight>
Although the person who wrote the function <code>foo</code> could have inserted <code>NULL</code> checks, let us assume that for performance reasons they did not. If <code>foo</code> dereferences its argument, calling <code>foo(NULL);</code> results in [[undefined behavior]] (typically, although not necessarily, a [[SIGSEGV]] [[Unix signal|signal]] being sent to the application). To avoid such problems, Cyclone introduces the <code>@</code> pointer type, which can never be <code>NULL</code>. Thus, the "safe" version of <code>foo</code> would be:
<syntaxhighlight lang="C">
int foo(int @);
</syntaxhighlight>
This tells the Cyclone compiler that the argument to <code>foo</code> can never be <code>NULL</code>, avoiding the aforementioned undefined behavior. The simple change of <code>*</code> to <code>@</code> saves the programmer from having to write <code>NULL</code> checks and the operating system from having to trap <code>NULL</code> pointer dereferences. This extra restriction, however, can be a rather large stumbling block for most C programmers, who are used to manipulating their pointers directly with arithmetic. Although pointer arithmetic is convenient, it can lead to [[buffer overflow]]s and other "off-by-one"-style mistakes. To avoid these, the <code>?</code> pointer type carries a known bound, the size of the array. Although this adds overhead due to the extra information stored with the pointer, it improves safety and security. Take, for instance, a simple (and naïve) <code>strlen</code> function, written in C:
<syntaxhighlight lang="C">
int strlen(const char *s)
{
    /* ... */
    return iter;
}
</syntaxhighlight>
This function assumes that the string passed in is terminated by the null character (<code>'\0'</code>). However, what would happen if <code>char buf[6] = {'h','e','l','l','o','!'};</code> were passed to this function? This is perfectly legal in C, yet it would cause <code>strlen</code> to iterate through memory not necessarily associated with the string <code>s</code>. There are functions, such as <code>strnlen</code>, which can be used to avoid such problems, but they are not part of [[ANSI C]] and are not available in every implementation. The Cyclone version of <code>strlen</code> is not so different from the C version:
<syntaxhighlight lang="C">
int strlen(const char ? s)
{
    /* ... */
    return n;
}
</syntaxhighlight>
Here, <code>strlen</code> bounds itself by the length of the array passed to it, so it never reads past the end of that array. Each pointer type can be safely cast to each of the others, and arrays and strings are automatically cast to <code>?</code> by the compiler. (Casting from <code>?</code> to <code>*</code> invokes a [[bounds checking|bounds check]], and casting from <code>?</code> to <code>@</code> invokes both a <code>NULL</code> check and a bounds check. Casting from <code>*</code> to <code>?</code> results in no checks whatsoever; the resulting <code>?</code> pointer has a size of 1.)
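The bounds-carrying behaviour of the <code>?</code> pointer can be approximated in plain C. The following is a minimal sketch and not Cyclone's actual representation: the struct <code>fat_str</code> and the functions <code>fat_from_array</code> and <code>fat_strlen</code> are hypothetical names introduced here purely for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a Cyclone `?` pointer: a base pointer plus
   the number of elements that may legally be read through it. */
struct fat_str {
    const char *base;
    size_t size;
};

/* Build a fat pointer from an array whose length the caller knows. */
static struct fat_str fat_from_array(const char *buf, size_t size)
{
    struct fat_str f = { buf, size };
    return f;
}

/* Length of the string, never reading past the recorded bound. */
static size_t fat_strlen(struct fat_str s)
{
    size_t i;

    if (s.base == NULL)
        return 0;
    for (i = 0; i < s.size; i++) {
        if (s.base[i] == '\0')
            return i;          /* terminator found inside the bound */
    }
    return s.size;             /* no terminator: stop at the bound */
}
```

Under these assumptions, passing the unterminated <code>buf</code> from the earlier example together with its size makes <code>fat_strlen</code> stop at the end of the array instead of reading adjacent memory.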
===Dangling pointers and region analysis===
Consider the following code, in C:
<syntaxhighlight lang="C">
char *itoa(int i)
{
    /* ... */
    return buf;
}
</syntaxhighlight>
The function <code>itoa</code> allocates an array of chars, <code>buf</code>, on the stack and returns a pointer to the start of <code>buf</code>. However, the stack memory used for <code>buf</code> is deallocated when the function returns, so the returned value cannot be used safely outside of the function. While [[GNU Compiler Collection|gcc]] and other compilers will warn about such code, the following will typically compile without warnings:
<syntaxhighlight lang="C">
char *itoa(int i)
{
    /* ... */
    return z;
}
</syntaxhighlight>
[[GNU Compiler Collection|gcc]] can produce warnings for such code as a side effect of the options <code>-O2</code> or <code>-O3</code>, but there is no guarantee that all such errors will be detected.
Cyclone performs region analysis on each segment of code, preventing dangling pointers such as the one returned from this version of <code>itoa</code>. All of the local variables in a given scope are considered to be part of the same region, separate from the heap or any other local region. Thus, when analyzing <code>itoa</code>, the Cyclone compiler would see that <code>z</code> is a pointer into the local stack, and would report an error.
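In plain C, where no such analysis is available, one common way to avoid returning a pointer into a dead stack frame is to let the caller own the storage. The following is a minimal sketch under that assumption; the name <code>itoa_r</code> and its signature are hypothetical and are not taken from Cyclone or from any standard library.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical caller-owned-buffer variant of itoa: writes the decimal
   text of i into buf (capacity len bytes) and returns buf, or NULL if
   the text, including its terminator, does not fit. */
static char *itoa_r(int i, char *buf, size_t len)
{
    int n;

    if (buf == NULL || len == 0)
        return NULL;
    n = snprintf(buf, len, "%d", i);
    if (n < 0 || (size_t)n >= len)
        return NULL;           /* encoding error or truncated output */
    return buf;
}
```

Because the buffer belongs to the caller's scope, the returned pointer stays valid for as long as the caller's buffer does, which is the lifetime relationship that Cyclone's regions check at compile time.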