{{Short description|Software development methodology}}
{{Use American English|date=November 2020}}
{{howto|date=March 2012}}
 
'''Defensive programming''' is a form of [[defensive design]] intended to develop programs that are capable of detecting potential security abnormalities and making predetermined responses.<ref>{{Citation |last=Boulanger |first=Jean-Louis |title=6 - Technique to Manage Software Safety |date=2016-01-01 |url=https://www.sciencedirect.com/science/article/pii/B9781785481178500064 |work=Certifiable Software Applications 1 |pages=125–156 |editor-last=Boulanger |editor-first=Jean-Louis |publisher=Elsevier |language=en |isbn=978-1-78548-117-8 |access-date=2022-09-02}}</ref> It ensures the continuing function of a piece of [[software]] under unforeseen circumstances. Defensive programming practices are often used where [[high availability]], [[safety]], or [[computer security|security]] is needed.
 
Defensive programming is an approach to improve software and [[source code]], in terms of:
* General quality – reducing the number of [[software bug]]s and problems.
* Making the source code comprehensible – the source code should be readable and understandable so it is approved in a [[code audit]].
* Making the software behave in a predictable manner despite unexpected inputs or user actions.
 
Overly defensive programming, however, may safeguard against errors that will never be encountered, thus incurring run-time and maintenance costs.
 
== Secure programming ==
{{main|Secure coding}}
 
Secure programming is the subset of defensive programming concerned with [[computer security]]. Security is the concern, not necessarily safety or availability (the [[software]] may be allowed to fail in certain ways). As with all kinds of defensive programming, avoiding bugs is a primary objective; however, the motivation is not as much to reduce the likelihood of failure in normal operation (as if safety were the concern), but to reduce the [[attack surface]] – the programmer must assume that the software might be misused actively to reveal bugs, and that bugs could be exploited maliciously.
 
An example:

<syntaxhighlight lang="c">
int risky_programming(char *input) {
    char str[1000];
    // ...
    strcpy(str, input); // Copy input.
    // ...
}
</syntaxhighlight>
The function will result in undefined behavior when the input is over 1000 characters. Some programmers may not feel that this is a problem, supposing that no user will enter such a long input. This particular bug demonstrates a vulnerability which enables [[buffer overflow]] [[exploit (computer security)|exploit]]s. Here is a solution to this example:
 
<syntaxhighlight lang="c">
int secure_programming(char *input) {
    char str[1000+1]; // One more for the null character.
    // ...

    // Copy input without exceeding the length of the destination.
    strncpy(str, input, sizeof(str));

    // If strlen(input) >= sizeof(str) then strncpy won't null terminate.
    // We counter this by always setting the last character in the buffer to NUL,
    // effectively cropping the string to the maximum length we can handle.
    // One can also decide to explicitly abort the program if strlen(input) is
    // too long.
    str[sizeof(str) - 1] = '\0';

    // ...
}
</syntaxhighlight>
 
== Offensive programming ==
{{main|Offensive programming}}
 
Offensive programming is a category of defensive programming, with the added emphasis that certain errors should ''not'' be [[graceful degradation|handled defensively]]. In this practice, only errors from outside the program's control are to be handled (such as user input); the software itself, as well as data from within the program's line of defense, are to be trusted in this [[methodology]].
 
=== Trusting internal data validity ===
 
;Overly defensive programming
<syntaxhighlight lang="c">
const char* trafficlight_colorname(enum traffic_light_color c) {
    switch (c) {
    case TRAFFICLIGHT_RED:    return "red";
    case TRAFFICLIGHT_YELLOW: return "yellow";
    case TRAFFICLIGHT_GREEN:  return "green";
    }
    return "black"; // To be handled as a dead traffic light.
}
</syntaxhighlight>
 
;Offensive programming
<syntaxhighlight lang="c">
const char* trafficlight_colorname(enum traffic_light_color c) {
    switch (c) {
    case TRAFFICLIGHT_RED:    return "red";
    case TRAFFICLIGHT_YELLOW: return "yellow";
    case TRAFFICLIGHT_GREEN:  return "green";
    }
    assert(0); // Assert that this section is unreachable.
}
</syntaxhighlight>
 
=== Trusting software components ===
 
;Overly defensive programming
<syntaxhighlight lang="c">
if (is_legacy_compatible(user_config)) {
    // Strategy: Don't trust that the new code behaves the same
    old_code(user_config);
} else {
    // Fallback: Don't trust that the new code handles the same cases
    if (new_code(user_config) != OK) {
        old_code(user_config);
    }
}
</syntaxhighlight>
 
;Offensive programming
<syntaxhighlight lang="c">
// Expect that the new code has no new bugs
if (new_code(user_config) != OK) {
    // Loudly report and abruptly terminate program to get proper attention
    report_error("Something went very wrong");
    exit(-1);
}
</syntaxhighlight>
 
== Techniques ==
Here are some defensive programming techniques:
 
=== Intelligent source code reuse ===
If existing code is tested and known to work, reusing it may reduce the chance of bugs being introduced.
 
However, reusing code is not ''always'' good practice. Reuse of existing code, especially when widely distributed, can allow for exploits to be created that target a wider audience than would otherwise be possible and brings with it all the security and vulnerabilities of the reused code.
 
When considering using existing source code, a quick review of the modules (sub-sections such as classes or functions) will help the developer become aware of any potential vulnerabilities and ensure the code is suitable to use in the project. {{Citation needed|reason=Cannot find source, Was from a video viewed~April 2015|date=November 2021}}
 
==== Legacy problems ====
Before reusing old source code, libraries, APIs, configurations and so forth, it must be considered whether the old work is valid for reuse, or whether it is likely to be prone to [[Legacy system|legacy]] problems.
 
Legacy problems are problems inherent when old designs are expected to work with today's requirements, especially when the old designs were not developed or tested with those requirements in mind.
 
Many software products have experienced problems with old legacy source code; for example:
* [[Legacy code]] may not have been designed under a defensive programming initiative, and might therefore be of much lower quality than newly designed source code.
* Legacy code may have been written and tested under conditions which no longer apply. The old quality assurance tests may have no validity any more.
** '''Example 1''': legacy code may have been designed for ASCII input but now the input is [[UTF-8]].
** '''Example 2''': legacy code may have been compiled and tested on 32-bit architectures, but when compiled on 64-bit architectures, new arithmetic problems may occur (e.g., invalid signedness tests, invalid type casts, etc.).
** '''Example 3''': legacy code may have been targeted for offline machines, but becomes vulnerable once network connectivity is added.
* Legacy code is not written with new problems in mind. For example, source code written in 1990 is likely to be prone to many [[code injection]] vulnerabilities, because most such problems were not widely understood at that time.
 
Notable examples of the legacy problem:
* [[BIND|BIND 9]], presented by Paul Vixie and David Conrad as "BINDv9 is a [[Rewrite (programming)|complete rewrite]]", "Security was a key consideration in design",<ref>{{Cite web|url=http://impressive.net/archives/fogo/20001005080818.O15286@impressive.net|title=fogo archive: Paul Vixie and David Conrad on BINDv9 and Internet Security by Gerald Oskoboiny|website=impressive.net|access-date=2018-10-27}}</ref> naming security, robustness, scalability and new protocols as key concerns for rewriting old legacy code.
* [[Microsoft Windows]] suffered from "the" [[Windows Metafile vulnerability]] and other exploits related to the WMF format. Microsoft Security Response Center describes the WMF-features as ''"Around 1990, WMF support was added... This was a different time in the security landscape... were all completely trusted"'',<ref>{{Cite news|url=http://blogs.technet.com/msrc/archive/2006/01/13/417431.aspx|title=Looking at the WMF issue, how did it get there?|work=MSRC|access-date=2018-10-27|language=en-US|archive-url=https://web.archive.org/web/20060324152626/http://blogs.technet.com/msrc/archive/2006/01/13/417431.aspx|archive-date=2006-03-24|url-status=dead}}</ref> not being developed under the security initiatives at Microsoft.
* [[Oracle Corporation|Oracle]] is combating legacy problems, such as old source code written without addressing concerns of [[SQL injection]] and [[privilege escalation]], resulting in many security vulnerabilities which have taken time to fix and also generated incomplete fixes. This has given rise to heavy criticism from security experts such as [[David Litchfield]], [[Alexander Kornbrust]], [[Cesar Cerrudo]].<ref>{{Cite web|url=http://seclists.org/lists/bugtraq/2006/May/0039.html|title=Bugtraq: Oracle, where are the patches???|last=Litchfield|first=David|website=seclists.org|access-date=2018-10-27}}</ref><ref>{{Cite web|url=http://seclists.org/lists/bugtraq/2006/May/0045.html|title=Bugtraq: RE: Oracle, where are the patches???|last=Alexander|first=Kornbrust|website=seclists.org|access-date=2018-10-27}}</ref><ref>{{Cite web|url=http://seclists.org/lists/bugtraq/2006/May/0083.html|title=Bugtraq: Re: [Full-disclosure] RE: Oracle, where are the patches???|last=Cerrudo|first=Cesar|website=seclists.org|access-date=2018-10-27}}</ref> An additional criticism is that default installations (largely a legacy from old versions) are not aligned with their own security recommendations, such as [[Oracle Database]] Security Checklist, which is hard to amend as many applications require the less secure legacy settings to function correctly.
 
=== Canonicalization ===
Malicious users are likely to invent new kinds of representations of incorrect data. For example, if a program attempts to reject accessing the file "/etc/[[Passwd (file)|passwd]]", a cracker might pass another variant of this file name, like "/etc/./passwd". [[Canonicalization]] libraries can be employed to avoid bugs due to non-[[Canonical form|canonical]] input.
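As an illustrative sketch (the helper <code>is_forbidden</code> is hypothetical, assuming a POSIX system), the canonical form of a requested path can be compared against a blocked one using <code>realpath</code>:

<syntaxhighlight lang="c">
#include <limits.h>
#include <stdlib.h>
#include <string.h>

// Resolve ".", ".." and symbolic links with realpath(), then compare
// the canonical result against the blocked path. Comparing the raw
// string would miss variants like "/etc/../etc/passwd".
int is_forbidden(const char *requested, const char *blocked) {
    char canonical[PATH_MAX];
    if (realpath(requested, canonical) == NULL)
        return 1; // Fail closed: paths that cannot be resolved are rejected.
    return strcmp(canonical, blocked) == 0;
}
</syntaxhighlight>

With this check, "/etc/passwd" and "/etc/../etc/passwd" resolve to the same canonical name and are rejected alike. Note that <code>realpath</code> requires the path to exist; production code would also inspect <code>errno</code>.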
 
=== Low tolerance against "potential" bugs ===
Assume that code constructs that appear to be problem prone (similar to known vulnerabilities, etc.) are bugs and potential security flaws. The basic rule of thumb is: "I'm not aware of all types of [[security exploit]]s. I must protect against those I ''do'' know of and then I must be proactive!".
 
===Other ways of securing code===
* One of the most common problems is unchecked use of constant-size or pre-allocated structures for dynamic-size data{{cn|date=December 2023}} such as inputs to the program (the [[buffer overflow]] problem). This is especially common for [[string (computer programming)|string]] data in [[C (programming language)|C]]{{cn|date=December 2023}}. C library functions like <code>gets</code> should never be used, since the maximum size of the input buffer is not passed as an argument. C library functions like <code>scanf</code> can be used safely, but require the programmer to take care with the selection of safe format strings, sanitizing them before use.
* Encrypt/authenticate all important data transmitted over networks. Do not attempt to implement your own encryption scheme, use a [[Cryptography standards|proven one]] instead. Message checking with a hash or similar technology will also help secure data sent over a network.
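The point about format strings can be sketched as follows (the helper <code>first_token</code> is hypothetical): bounding a <code>%s</code> conversion with an explicit field width keeps the <code>scanf</code> family from overflowing a fixed buffer:

<syntaxhighlight lang="c">
#include <stdio.h>

// Copy the first whitespace-delimited token of `input` into a fixed
// 32-byte buffer. The "%31s" field width bounds the write to at most
// 31 characters plus the terminating NUL, so an oversized token is
// cropped rather than overflowing `out`.
int first_token(const char *input, char out[32]) {
    out[0] = '\0';
    return sscanf(input, "%31s", out) == 1;
}
</syntaxhighlight>

An unbounded <code>%s</code> here would reintroduce exactly the buffer overflow described above.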
 
====The three rules of data security====
* All [[data]] is important until proven otherwise.
* All data is tainted until proven otherwise.
* All code is insecure until proven otherwise.
** You cannot prove the security of any code in [[userland (computing)|userland]]; this is more commonly known as: ''"never trust the client"''.
These three rules about data security describe how to handle any data, internally or externally sourced:
 
'''All data is important until proven otherwise''' - means that all data must be verified as garbage before being destroyed.
 
'''All data is tainted until proven otherwise''' - means that all data must be handled in a way that does not expose the rest of the runtime environment without verifying integrity.
 
'''All code is insecure until proven otherwise''' - while a slight misnomer, does a good job reminding us to never assume our code is secure as bugs or [[undefined behavior]] may expose the project or system to attacks such as common [[SQL injection]] attacks.
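The second rule can be sketched as follows (the helper <code>is_valid_username</code> is hypothetical): tainted input is accepted only when every character matches an explicit allowlist, verifying what the data ''is'' rather than what it is not:

<syntaxhighlight lang="c">
#include <ctype.h>
#include <string.h>

// Treat `name` as tainted: accept it only if its length is bounded and
// every character is in an explicit allowlist (letters, digits, '_').
// Enumerating the valid characters is safer than trying to enumerate
// every possible malicious one.
int is_valid_username(const char *name) {
    size_t len = strlen(name);
    if (len == 0 || len > 32)
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && c != '_')
            return 0;
    }
    return 1;
}
</syntaxhighlight>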
 
====More Information====
* If data is to be checked for correctness, verify that it is correct, not that it is incorrect.
* [[Design by contract]]
* [[Assertion (computing)|Assertions]] (also called '''assertive programming''')
* Prefer [[Exception handling|exceptions]] to return codes
** Generally speaking, it is preferable{{According to whom|date=December 2023}} to throw exception messages that enforce part of your [[application programming interface|API]] [[Design by contract|contract]] and guide the developer, instead of returning error code values that do not point to where the exception occurred or what the program stack looked like. Better logging and exception handling will increase robustness and security of your software{{cn|date=December 2023}}, while minimizing developer stress{{cn|date=December 2023}}.
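Assertive programming can be sketched in C as follows (the <code>average</code> function is a hypothetical example): the assertions state the function's preconditions, so contract violations fail fast in debug builds:

<syntaxhighlight lang="c">
#include <assert.h>
#include <stddef.h>

// Contract: `values` must be non-NULL and `count` must be positive.
// Violations abort immediately in debug builds instead of silently
// dereferencing NULL or dividing by zero.
double average(const double *values, size_t count) {
    assert(values != NULL); // Precondition: valid pointer.
    assert(count > 0);      // Precondition: non-empty input.
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += values[i];
    return sum / (double)count;
}
</syntaxhighlight>

In release builds compiled with <code>NDEBUG</code>, these assertions compile away, so they document the contract at no run-time cost.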
 
==See also==
* [[Computer security]]
 
== References ==
<references />
 
==External links==
* [https://www.securecoding.cert.org/confluence/display/seccode/SEI+CERT+Coding+Standards CERT Secure Coding Standards]
 
[[Category:Programming paradigms]]
[[Category:Programming principles]]