Low-level programming language: Difference between revisions

{{Short description |Programming language that provides little or no abstraction from underlying hardware}}
{{multiple issues|
{{original research|date=March 2017}}
{{More citations needed|date=July 2015}}
}}
 
A '''low-level programming language''' is a [[programming language]] that provides little or no [[Abstraction (computer science)|abstraction]] from a computer's [[instruction set architecture]], memory or underlying physical hardware; commands or functions in the language are structurally similar to a processor's instructions. These languages provide the programmer with full control over program memory and the underlying machine code instructions. Because of the low level of abstraction (hence the term "low-level") between the language and machine language, low-level languages are sometimes described as being "close to the hardware". Programs written in low-level languages tend to be relatively [[Software portability|non-portable]].
 
Low-level languages can be converted to machine code without a compiler or interpreter: [[second-generation programming language]]s use a simpler translator called an [[Assembly language#Assemble|assembler]], and the resulting code runs directly on the processor. A program written in a low-level language can be made to run very quickly, with a small [[memory footprint]]. An equivalent program in a [[high-level language]] can be less efficient and use more memory. Low-level languages are simple, but considered difficult to use, due to numerous technical details that the programmer must remember. By comparison, a [[high-level programming language]] isolates execution semantics of a computer architecture from the specification of the program, which simplifies development.

== Machine code ==
[[File:Digital pdp8-e2.jpg|thumb|Front panel of a [[PDP-8/e]] minicomputer. The row of switches at the bottom can be used to toggle in machine code.]]
 
[[Machine code]], classified as a [[first-generation programming language]],<ref name=":3">{{Cite web |date=2017-10-22 |title=Generation of Programming Languages |url=https://www.geeksforgeeks.org/generation-programming-languages/ |access-date=2024-04-27 |website=GeeksforGeeks |language=en-US}}</ref><ref name=":4">{{Cite web |title=What is a Generation Languages? |url=https://www.computerhope.com/jargon/num/1gl.htm |access-date=2024-04-27 |website=www.computerhope.com |language=en}}</ref> is [[data]] [[encoded]] and structured per the [[instruction set architecture]] of a [[CPU]]. The instructions specify operations such as moving values in and out of memory locations, Boolean logic, arithmetic, comparing values, and flow control (branching and jumping).
 
Programmers almost never program directly in machine code; instead, they use an [[assembly language]] or a higher-level programming language.<ref name=":0" /> Although few programs are written in machine code, some programmers learn to read it through experience with [[core dump]]s and debugging.

Machine code is the only language a computer can process directly without a previous transformation. Writing it by hand requires attention to numerous details that a high-level language handles automatically, as well as memorizing or looking up numerical codes for every instruction, and the resulting programs are extremely difficult to modify.
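
Machine code met in a debugger or core dump is usually displayed as hexadecimal bytes. As an illustration only, the following C sketch formats a byte sequence in that style. The helper name <code>format_hex</code> is hypothetical and not part of any library; the sample bytes in the usage note are the first two instructions of the x86-64 listing in the Comparison section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from any library): writes `len` bytes of `code`
   into `out` as space-separated two-digit hexadecimal values, the way raw
   machine code is commonly displayed.
   Returns the number of characters written, or -1 if `out` is too small. */
static int format_hex(const unsigned char *code, size_t len,
                      char *out, size_t out_size)
{
    size_t pos = 0;

    assert(code != NULL && out != NULL);
    if (out_size == 0) {
        return -1;
    }
    out[0] = '\0';
    for (size_t i = 0; i < len; ++i) {
        /* Two hex digits per byte, with a separator before all but the first. */
        int n = snprintf(out + pos, out_size - pos,
                         i == 0 ? "%02x" : " %02x", (unsigned)code[i]);
        if (n < 0 || (size_t)n >= out_size - pos) {
            return -1;
        }
        pos += (size_t)n;
    }
    return (int)pos;
}
```

Called on the four bytes <code>0x89, 0xf8, 0x85, 0xff</code> with a sufficiently large buffer, the function fills the buffer with <code>89 f8 85 ff</code>.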
 
True ''machine code'' is a stream of raw, usually [[Binary code|binary]], data. A programmer coding in "machine code" normally codes instructions and data in a more readable form such as [[decimal]], [[octal]], or [[hexadecimal]], which is translated to internal format by a program called a [[Loader (computing)|loader]] or toggled into the computer's memory from a [[front panel]].

== Assembly language ==
An [[assembly language]], classified as a [[second-generation programming language]],<ref name=":3"/><ref name=":4"/> provides a level of abstraction on top of machine code. A program written in assembly language is [[Software portability |non-portable]], due to being written and optimized for a particular architecture.<ref name=":0">{{Cite web |date=2021-03-05 |title=3.1: Structure of low-level programs |url=https://workforce.libretexts.org/Bookshelves/Information_Technology/Information_Technology_Hardware/Advanced_Computer_Organization_Architecture_(Njoroge)/03%3A_Computer_Organization_and_low-level_Programming/3.01%3A_Structure_of_low-level_programs |access-date=2023-04-03 |website=Workforce LibreTexts |language=en}}</ref><ref>{{Cite web |date=2023-11-19 |title=What is a Low Level Language? |url=https://www.geeksforgeeks.org/what-is-a-low-level-language/ |access-date=2024-04-27 |website=GeeksforGeeks |language=en-US}}</ref><ref>{{Cite web |title=Low Level Language? What You Need to Know {{!}} Lenovo US |url=https://www.lenovo.com/us/en/glossary/low-level-language/ |access-date=2024-04-27 |website=www.lenovo.com |language=en |url-status=dead |archive-url=https://web.archive.org/web/20240724093734/https://www.lenovo.com/us/en/glossary/low-level-language/ |archive-date=2024-07-24}}</ref><ref>{{Cite web |title=Low-level languages - Classifying programming languages and translators - AQA - GCSE Computer Science Revision - AQA |url=https://www.bbc.co.uk/bitesize/guides/z4cck2p/revision/2 |access-date=2024-04-27 |website=BBC Bitesize |language=en-GB}}</ref>
 
Assembly language has little [[Semantics (computer science)|semantics]] or formal specification, being only a mapping of human-readable symbols, including symbolic addresses, to [[opcode]]s, [[memory address|addresses]], numeric constants, [[string (computer science)|strings]] and so on. Typically, one [[machine instruction (computing)|machine instruction]] is represented as one line of assembly code, commonly called a ''mnemonic''.<ref>{{Cite web |title=Machine Language/Assembly Language/High Level Language |url=https://www.cs.mtsu.edu/~xyang/2170/computerLanguages.html |access-date=2024-04-27 |website=www.cs.mtsu.edu |archive-url=https://web.archive.org/web/20241214053921/https://www.cs.mtsu.edu/~xyang/2170/computerLanguages.html |archive-date=2024-12-14 |url-status=dead}}</ref> Assemblers produce [[object file]]s that can [[linker (computing)|link]] with other object files or be [[loader (computing)|loaded]] on their own. Most assemblers provide [[macro (computer science)|macros]] to generate common sequences of instructions.
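
The mapping from symbols to opcodes can be pictured as a lookup table. The following C sketch is a toy illustration, not how any real assembler is implemented: the names <code>opcode_table</code> and <code>assemble_mnemonic</code> are hypothetical, and the table covers only four x86 instructions that take no operands and encode to a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table row: a mnemonic and the single byte it encodes to. */
struct opcode_entry {
    const char *mnemonic;
    unsigned char opcode;
};

/* A few x86 instructions with no operands and a one-byte encoding. */
static const struct opcode_entry opcode_table[] = {
    { "nop",  0x90 },
    { "ret",  0xC3 },
    { "int3", 0xCC },
    { "hlt",  0xF4 },
};

/* Hypothetical toy assembler step: looks up `mnemonic` in the table.
   Returns 1 and stores the encoding in `*opcode` if it is known,
   otherwise returns 0 and leaves `*opcode` untouched. */
static int assemble_mnemonic(const char *mnemonic, unsigned char *opcode)
{
    assert(mnemonic != NULL && opcode != NULL);
    for (size_t i = 0; i < sizeof opcode_table / sizeof opcode_table[0]; ++i) {
        if (strcmp(opcode_table[i].mnemonic, mnemonic) == 0) {
            *opcode = opcode_table[i].opcode;
            return 1;
        }
    }
    return 0;
}
```

A real assembler additionally has to encode operands, resolve symbolic addresses and emit an object file, none of which this sketch attempts.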
 
In the early days of coding on computers like [[TX-0]] and [[PDP-1]], the first thing [[MIT]] [[Hacker culture|hackers]] did was to write assemblers.<ref name=":1">{{cite book|last=Levy|first=Steven|year=1994|title=Hackers: Heroes of the Computer Revolution|title-link=Hackers: Heroes of the Computer Revolution|publisher=Penguin Books|page=32|isbn=0-14-100051-1}}</ref>
 
== C programming language ==
The [[C (programming language)|C programming language]], a [[third-generation programming language]],<ref name=":3"/><ref name=":4"/> is sometimes classified as high-level or low-level, depending on what one means by those terms.<ref>{{cite journal |last1=Jindal |first1=G. |first2=P. |last2=Khurana |first3=T. |last3=Goel |date=January 2013 |title=Comparative study of C, Objective C, C++ programming language |journal=International Journal of Advanced Trends in Computer Science and Engineering |volume=2 |issue=1 |page=203}}</ref> The syntax of C is inherently higher level than that of an assembly language, since assembly language syntax is platform dependent whereas C syntax is platform independent. C does support low-level programming {{endash}} directly accessing computer hardware {{endash}} but other languages, sometimes considered higher level than C, can also access hardware directly. With C, developers might need to handle relatively low-level aspects that other languages abstract away, such as memory management and pointer arithmetic. However, C can encode abstractions that hide details such as hardware access, memory management and pointer arithmetic, so that at least part of a C [[codebase]] might be as conceptually high-level as if it were written in any other language. Whether C is classified as a high-level or low-level language is contended, but it is higher level than assembly languages (especially syntactically) and lower level than many other languages in some aspects.
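
As a minimal sketch of how C code can hide memory management and pointer arithmetic behind an interface, the following defines a growable list of integers. The type <code>int_list</code> and its functions are hypothetical names chosen for this illustration; callers use only the four functions and never touch the underlying pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list of ints; callers treat the fields as private. */
struct int_list {
    int *items;
    size_t count;
    size_t capacity;
};

static void int_list_init(struct int_list *list)
{
    assert(list != NULL);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}

/* Appends `value`, growing the storage when full.
   Returns 1 on success, 0 if memory could not be allocated. */
static int int_list_push(struct int_list *list, int value)
{
    assert(list != NULL);
    if (list->count == list->capacity) {
        size_t new_capacity = list->capacity ? list->capacity * 2 : 4;
        int *grown = realloc(list->items, new_capacity * sizeof *grown);
        if (grown == NULL) {
            return 0;
        }
        list->items = grown;
        list->capacity = new_capacity;
    }
    *(list->items + list->count) = value;   /* pointer arithmetic stays in here */
    list->count++;
    return 1;
}

/* Returns the element at `index`, which must be less than the count. */
static int int_list_get(const struct int_list *list, size_t index)
{
    assert(list != NULL && index < list->count);
    return *(list->items + index);
}

/* Releases the storage and resets the list to its empty state. */
static void int_list_free(struct int_list *list)
{
    assert(list != NULL);
    free(list->items);
    int_list_init(list);
}
```

Code that uses <code>int_list_push</code> and <code>int_list_get</code> reads much like code in a language with built-in lists, although the low-level details remain visible to whoever maintains the four functions.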
 
Although C is not architecture independent, it can be used to write code that is [[cross-platform]] even though doing so can be technically challenging. An aspect of C that facilitates cross-platform development is the [[C standard library]] that provides “an [[interface (computing)|interface]] to system-dependent objects that is itself relatively system independent”.<ref>{{cite book |last=Kernighan |first=B. |author-link1=Brian Kernighan |last2=Ritchie |first2=D. |author-link2=Dennis Ritchie |date=1988 |title=The C Programming Language, 2nd Edition |page=163}}</ref>
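
One common obstacle to cross-platform C code is byte order, which differs between architectures. The sketch below, with the hypothetical helper names <code>store_u32_be</code> and <code>load_u32_be</code>, shows one way to avoid depending on it: shift operators act on values rather than on the memory layout, so the same source produces the same bytes on any conforming implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores `value` into `out` in big-endian order
   (most significant byte first), whatever the host's byte order is. */
static void store_u32_be(uint32_t value, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (unsigned char)((value >> 24) & 0xFFu);
    out[1] = (unsigned char)((value >> 16) & 0xFFu);
    out[2] = (unsigned char)((value >> 8) & 0xFFu);
    out[3] = (unsigned char)(value & 0xFFu);
}

/* Hypothetical helper: reads back a value written by store_u32_be. */
static uint32_t load_u32_be(const unsigned char in[4])
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) |
           (uint32_t)in[3];
}
```

Copying the integer's bytes directly with <code>memcpy</code> would be shorter, but the result would then depend on the byte order of the machine running the program.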
 
==Comparison==
The following is [[x86-64]] machine code for an algorithm to calculate the ''n''th [[Fibonacci number]], with values in [[hexadecimal]] representation and each line corresponding to one instruction:

<pre>
89 f8
85 ff
74 26
83 ff 02
76 1c
89 f9
ba 01 00 00 00
be 01 00 00 00
8d 04 16
83 f9 03
74 0d
89 d6
ff c9
89 c2
eb f0
b8 01 00 00 00
c3
</pre>

The following is the same algorithm written in [[x86 assembly language|x86-64 assembly language]] using [[Intel syntax]]. The [[Processor register |registers]] of the x86-64 processor are named and manipulated directly. The function loads its 64-bit argument from {{code|rdi}} in accordance with the [[x86 calling conventions#System V AMD64 ABI|System V application binary interface for x86-64]] and performs its calculation by manipulating values in the {{code|rax}}, {{code|rcx}}, {{code|rdx}}, and {{code|rsi}} registers until it has finished and returns. Note that in this assembly language, there is no concept of returning a value. The result is stored in the {{code |rax}} register, again in accordance with the System V application binary interface; the {{code |ret}} instruction simply removes the top 64-bit element from the [[Stack-based memory allocation |stack]] and causes the next instruction to be fetched from that location (usually the instruction immediately after the one that called this function). x86-64 assembly language imposes no standard for passing values to a function or returning values from a function (and in fact, has no concept of a function); those are defined by an [[application binary interface]] (ABI), such as the System V ABI for a particular instruction set.

<syntaxhighlight lang="asm">
fib:
    mov rax, rdi                ; The argument is stored in rdi, put it into rax
    test rdi, rdi               ; Is the argument zero?
    je .return_from_fib         ; Yes - return 0, which is already in rax
    cmp rdi, 2                  ; No - compare the argument to 2
    jbe .return_1_from_fib      ; If it is less than or equal to 2, return 1
    mov rcx, rdi                ; Otherwise, put it in rcx, for use as a counter
    mov rdx, 1                  ; The first previous number starts out as 1, put it in rdx
    mov rsi, 1                  ; The second previous number also starts out as 1, put it in rsi
.fib_loop:
    lea rax, [rsi + rdx]        ; Put the sum of the previous two numbers into rax
    cmp rcx, 3                  ; Is the counter 3?
    je .return_from_fib         ; Yes - rax contains the result
    mov rsi, rdx                ; No - make the first previous number the second previous number
    dec rcx                     ; Decrement the counter
    mov rdx, rax                ; Make the current number the first previous number
    jmp .fib_loop               ; Keep going
.return_1_from_fib:
    mov rax, 1                  ; Set the return value to 1
.return_from_fib:
    ret                         ; Return
</syntaxhighlight>
 
The following is the same algorithm again, but in C. This is similar in structure to the assembly example but there are significant differences in abstraction:
* The input (parameter {{code |n}}) is an abstraction that does not specify any storage location on the hardware. In practice, the C compiler follows one of many possible [[calling convention]]s to determine a storage location for the input.
* The local variables {{code|f_nminus2}}, {{code|f_nminus1}}, and {{code|f_n}} are abstractions that do not specify any storage location on the hardware. The C compiler decides how to actually store them for the target architecture.
* The {{code|return}} statement specifies the value to return, but does not dictate ''how'' it is returned. The C compiler for any specific architecture implements a '''standard''' mechanism for returning the value. Compilers for the x86-64 architecture typically (but not always) use the {{code |rax}} register to return a value, as in the assembly language example (the author of the assembly language example has ''chosen'' to follow the System V application binary interface for x86-64, but assembly language does not require this).
 
These abstractions make the C code compilable without modification for any architecture that is supported by a C compiler; whereas the assembly code above only runs on processors using the x86-64 architecture.
 
<syntaxhighlight lang="c">
unsigned int fib(unsigned int n)
{
    if (!n)
    {
        return 0;
    }
    else if (n <= 2)
    {
        return 1;
    }
    else
    {
        unsigned int f_nminus2, f_nminus1, f_n;
        /* n counts down; the loop is entered only when n is at least 3 */
        for (f_nminus2 = f_nminus1 = 1, f_n = 0; ; --n)
        {
            f_n = f_nminus2 + f_nminus1;
            if (n <= 3)
            {
                return f_n;
            }
            f_nminus2 = f_nminus1;
            f_nminus1 = f_n;
        }
    }
}
</syntaxhighlight>
 
==Low-level programming in high-level languages==
Some [[High-level programming language |high-level languages]], such as [[IBM PL/S|PL/S]], [[BLISS]], [[BCPL]], extended [[ALGOL]] and [[NEWP]], and C, provide some degree of access to low-level programming features. One method for doing this is [[inline assembly]], in which assembly code is embedded in the high-level language code. Some of these languages also allow architecture-dependent [[Optimizing compiler |compiler optimization directives]] to adjust the way a compiler uses the target processor architecture.

The following block of C code, from the documentation of the [[GNU Compiler Collection]] (GCC), demonstrates its inline assembly feature.<ref>{{Cite web |title=Extended Asm (Using the GNU Compiler Collection (GCC)) |url=https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html |access-date=2024-04-27 |website=gcc.gnu.org}}</ref>
<syntaxhighlight lang="c">
int src = 1;
int dst;

asm ("mov %1, %0\n\t"
    "add $1, %0"
    : "=r" (dst)
    : "r" (src));

printf("%d\n", dst);
</syntaxhighlight>
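
Because inline assembly ties the code to one instruction set, it is often paired with a portable fallback. The following sketch assumes GCC or Clang and wraps the same two instructions in a hypothetical function <code>add_one</code>, using them only when compiling for x86 and plain C otherwise. The predefined macros tested here are the ones those compilers provide; other compilers may need different checks.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns value + 1.
   Uses inline assembly on x86 with GCC or Clang, and plain C elsewhere. */
static int add_one(int value)
{
    assert(value < INT_MAX);    /* keep the result representable as an int */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    int result;
    __asm__ ("mov %1, %0\n\t"   /* copy the input into the output register */
             "add $1, %0"       /* add one to it */
             : "=r" (result)
             : "r" (value));
    return result;
#else
    return value + 1;           /* portable fallback for other targets */
#endif
}
```

Both branches give the same result, so callers do not need to know which one was compiled.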
== References ==
{{Reflist}}
 
== Bibliography ==
* {{cite book |last1=Zhirkov |first1=Igor |title=Low-level programming: C, assembly, and program execution on Intel 64 architecture |date=2017 |publisher=Apress |location=California |isbn=978-1-4842-2402-1}}
 
{{Types of programming languages}}
{{Programming language}}
{{X86 assembly topics}}
{{Authority control}}
 
{{DEFAULTSORT:Low-Level Programming Language}}
[[Category:Low-level programming languages| ]]
[[Category:Programming language classification]]
[[Category:Articles with example C code]]