Java class file: Difference between revisions

{{Short description|Executable Java file format}}
{{About|the data format|classes in Java|Class (computer programming)}}
{{Infobox file format
| name = Java class file
| icon =
| logo =
| screenshot =
| caption =
| extension = <tt>.class</tt>
| mime =
| type code =
| latest release version =
| latest release date =
| genre = [[Bytecode]]
| container for =
| contained by =
}}
 
A '''Java class file''' is a [[Computer file|file]] (with the {{mono|.class}} [[filename extension]]) containing [[Java bytecode]] that can be executed on the [[Java virtual machine|Java Virtual Machine (JVM)]]. A Java class file is usually produced by a [[Java compiler]] from [[Java (programming language)|Java programming language]] [[source file]]s ({{mono|.java}} files) containing Java [[Class (programming)|classes]] (alternatively, other [[JVM languages]] can also be used to create class files). If a source file has more than one class, each class is compiled into a separate class file. Thus, it is called a {{mono|.class}} file because it contains the bytecode for a single class.
 
JVMs are available for many [[platform (computing)|platform]]s, and a class file compiled on one platform will execute on a JVM of another platform. This makes Java applications [[cross-platform|platform-independent]].
 
==History==
On 11 December 2006, the class file format was modified under [[Java Specification Request]] (JSR) 202.<ref>[http://www.jcp.org/en/jsr/detail?id=202 JSR 202] Java Class File Specification Update</ref>
 
==File layout and structure==
 
===Sections===
There are 10 basic sections to the Java class file structure:
* '''[[Magic number (programming)|Magic Number]]''': <code>0xCAFEBABE</code>
* '''Version of Class File Format''': the minor and major versions of the class file
* '''Constant Pool''': Pool of constants for the class
* '''[[Interface (object-oriented programming)|Interfaces]]''': Any interfaces in the class
* '''Fields''': Any fields in the class
* '''[[Method (computer programming)|Method]]s''': Any methods in the class
* '''Attributes''': Any attributes of the class (for example, the name of the source file)
 
===Magic Number===
Class files are identified by the following 4 [[byte]] [[header (computing)|header]] (in [[hexadecimal]]): <code>CA FE BA BE</code> (the first 4 entries in the table below). The history of this [[Magic number (programming)|magic number]] was explained by [[James Gosling]] referring to a restaurant in [[Palo Alto, California|Palo Alto]]:<ref>[http://radio-weblogs.com/0100490/2003/01/28.html James Gosling private communication to Bill Bumgarner]</ref>
<blockquote>
"We used to go to lunch at a place called St Michael's Alley. According to local legend, in the deep dark past, the [[Grateful Dead]] used to perform there before they made it big. It was a pretty funky place that was definitely a Grateful Dead Kinda Place. When [[Jerry Garcia|Jerry]] died, they even put up a little Buddhist-esque shrine. When we used to go there, we referred to the place as Cafe Dead. Somewhere along the line it was noticed that this was a HEX number. I was re-vamping some file format code and needed a couple of [[Magic number (programming)|magic numbers]]: one for the persistent object file, and one for classes. I used CAFEDEAD for the object file format, and in [[grep]]ping for 4 character hex words that fit after "CAFE" (it seemed to be a good theme) I hit on BABE and decided to use it.
At that time, it didn't seem terribly important or destined to go anywhere but the trash-can of history. So CAFEBABE became the class file format, and CAFEDEAD was the persistent object format. But the persistent object facility went away, and along with it went the use of CAFEDEAD - it was eventually replaced by [[Java remote method invocation|RMI]]."
</blockquote>
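
The magic-number check can be illustrated with a minimal Python sketch. The function name <code>has_class_magic</code> is hypothetical and chosen only for this example; the sketch inspects the first four bytes and does not validate the rest of the file.

<syntaxhighlight lang="python">
import struct

CLASS_MAGIC = 0xCAFEBABE  # the 4-byte header CA FE BA BE


def has_class_magic(data: bytes) -> bool:
    """Return True if data starts with the class file magic number."""
    if len(data) < 4:
        return False
    # ">I" reads one unsigned 32-bit integer in big-endian byte order.
    (magic,) = struct.unpack(">I", data[:4])
    return magic == CLASS_MAGIC


print(has_class_magic(bytes.fromhex("CAFEBABE00000034")))  # True
print(has_class_magic(b"PK\x03\x04"))                      # False
</syntaxhighlight>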
 
* '''u2''': an unsigned [[16-bit]] integer in [[Endianness|big-endian]] byte order
* '''u4''': an unsigned [[32-bit]] integer in big-endian byte order
* '''table''': an array of variable-length items of some type. The number of items in the table is identified by a preceding count number (the count is a u2), but the size in bytes of the table can only be determined by examining each of its items.
 
Some of these fundamental types are then re-interpreted as higher-level values (such as strings or floating-point numbers), depending on context.
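
As an illustration, the fundamental types can be read with Python's <code>struct</code> module. This is only a sketch: the helper names <code>read_u1</code>, <code>read_u2</code> and <code>read_u4</code> are hypothetical, and the input bytes are made up for the example. The last step shows a u4 being re-interpreted as a floating-point number.

<syntaxhighlight lang="python">
import io
import struct


def read_u1(stream):
    """Read one unsigned 8-bit integer."""
    return struct.unpack(">B", stream.read(1))[0]


def read_u2(stream):
    """Read one unsigned 16-bit integer in big-endian byte order."""
    return struct.unpack(">H", stream.read(2))[0]


def read_u4(stream):
    """Read one unsigned 32-bit integer in big-endian byte order."""
    return struct.unpack(">I", stream.read(4))[0]


stream = io.BytesIO(bytes.fromhex("0A01023F800000"))
print(read_u1(stream))  # 10
print(read_u2(stream))  # 258, i.e. 0x0102
raw = read_u4(stream)   # 0x3F800000

# Re-interpret the same four bytes as an IEEE 754 single-precision float.
as_float = struct.unpack(">f", struct.pack(">I", raw))[0]
print(as_float)         # 1.0
</syntaxhighlight>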
{| class="wikitable"
|-
! Byte offset
! Size
! Type or value
! Description
|-
| 0
| rowspan="2" | 2 bytes
| rowspan="2" | u2
| rowspan="2" | major version number of the class file format being used.<ref>{{Cite web|url=https://docs.oracle.com/javase/specs/jvms/se23/html/jvms-4.html#jvms-4.1-200-B.2|title=Table 4.1-A. class file format major versions}}</ref><br />
Java SE 25 = 69 (0x45 hex),<br />
Java SE 24 = 68 (0x44 hex),<br />
Java SE 23 = 67 (0x43 hex),<br />
Java SE 22 = 66 (0x42 hex),<br />
Java SE 21 = 65 (0x41 hex),<br />
Java SE 20 = 64 (0x40 hex),<br />
Java SE 19 = 63 (0x3F hex),<br />
Java SE 18 = 62 (0x3E hex),<br />
Java SE 17 = 61 (0x3D hex),<br />
Java SE 16 = 60 (0x3C hex),<br />
Java SE 15 = 59 (0x3B hex),<br />
Java SE 14 = 58 (0x3A hex),<br />
Java SE 13 = 57 (0x39 hex),<br />
Java SE 12 = 56 (0x38 hex),<br />
Java SE 11 = 55 (0x37 hex),<br />
Java SE 10 = 54 (0x36 hex),<ref>{{cite web |url=http://www.oracle.com/technetwork/java/javase/10-relnote-issues-4108729.html#Remaining |title = JDK 10 Release Notes}}</ref><br />
Java SE 9 = 53 (0x35 hex),<ref>{{cite web |url=https://bugs.openjdk.java.net/browse/JDK-8148785 |title = [JDK-8148785] Update class file version to 53 for JDK-9 - Java Bug System}}</ref><br />
Java SE 8 = 52 (0x34 hex),<br />Java SE 7 = 51 (0x33 hex),<br />Java SE 6.0 = 50 (0x32 hex),<br />Java SE 5.0 = 49 (0x31 hex),<br />JDK 1.4 = 48 (0x30 hex),<br />JDK 1.3 = 47 (0x2F hex),<br />JDK 1.2 = 46 (0x2E hex),<br />JDK 1.1 = 45 (0x2D hex).<br />For details of earlier version numbers see footnote 1 at [https://docs.oracle.com/javase/specs/jvms/se6/html/ClassFile.doc.html The JavaTM Virtual Machine Specification 2nd edition]
|-
| 7
| rowspan="4" | ''isize'' (variable)
| rowspan="4" | table
| rowspan="4" | interface table: a variable-length array of constant pool indexes describing the interfaces implemented by this class
|-
| ...
| rowspan="4" | table
| rowspan="4" | field table: a variable-length array of fields;
each element is a <code>field_info</code> structure defined in https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.5
|-
| ...
| rowspan="4" | table
| rowspan="4" | method table: a variable-length array of methods;
each element is a <code>method_info</code> structure defined in https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.6
|-
| ...
| rowspan="4" | table
| rowspan="4" | attribute table: a variable-length array of attributes;
each element is an <code>attribute_info</code> structure defined in https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.7
|-
| ...
|}
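
The first eight bytes of the layout above can be decoded as in the following Python sketch. The names <code>parse_version</code> and <code>release_name</code> are hypothetical. The rule "major version minus 44" is only an observation about the rows listed in the table for Java SE 7 through Java SE 25, not a formula taken from the specification.

<syntaxhighlight lang="python">
import struct

# Releases listed in the table whose names do not follow "Java SE n".
EARLY_RELEASES = {
    45: "JDK 1.1",
    46: "JDK 1.2",
    47: "JDK 1.3",
    48: "JDK 1.4",
    49: "Java SE 5.0",
    50: "Java SE 6.0",
}


def parse_version(data: bytes):
    """Return (minor, major) from the first 8 bytes of a class file."""
    # u4 magic number, u2 minor version, u2 major version, all big-endian.
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file: bad magic number")
    return minor, major


def release_name(major: int):
    """Map a major version to the release named in the table, or None."""
    if major in EARLY_RELEASES:
        return EARLY_RELEASES[major]
    if 51 <= major <= 69:  # rows for Java SE 7 to Java SE 25
        return f"Java SE {major - 44}"
    return None


minor, major = parse_version(bytes.fromhex("CAFEBABE00000041"))
print(minor, major, release_name(major))  # 0 65 Java SE 21
</syntaxhighlight>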
 
===Representation of a class file===
The following is a representation of a {{mono|.class}} file as if it were a C-style struct.
Since [[C (programming language)|C]] doesn't support multiple variable length arrays within a struct, the code below won't compile and only serves as a demonstration.
<syntaxhighlight lang="cpp">
struct ClassFileFormat {
    u4 magicNumber;

    u2 minorVersion;
    u2 majorVersion;

    u2 constantPoolCount;
    ConstantPoolInfo[constantPoolCount - 1] constantPool;

    u2 accessFlags;

    u2 thisClass;
    u2 superClass;

    u2 interfacesCount;
    u2[interfacesCount] interfaces;

    u2 fieldsCount;
    FieldInfo[fieldsCount] fields;

    u2 methodsCount;
    MethodInfo[methodsCount] methods;

    u2 attributesCount;
    AttributeInfo[attributesCount] attributes;
}
</syntaxhighlight>
 
===The constant pool===
The constant pool table is where most of the literal constant values are stored. This includes values such as numbers of all sorts, strings, identifier names, references to classes and methods, and type descriptors. All indexes, or references, to specific constants in the constant pool table are given by 16-bit (type u2) numbers, where index value 1 refers to the first constant in the table (index value 0 is invalid).
 
Due to historic choices made during the file format development, the number of constants in the constant pool table is not actually the same as the constant pool count which precedes the table. First, the table is indexed starting at 1 (rather than 0), so the count should actually be interpreted as the maximum index plus one.<ref name="jvms-4.4">{{Cite web|url=https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4|title=Chapter 4. The class File Format}}</ref> Additionally, two types of constants (longs and doubles) take up two consecutive slots in the table, although the second such slot is a phantom index that is never directly used.
 
The type of each item (constant) in the constant pool is identified by an initial byte ''tag''. The number of bytes following this tag and their interpretation are then dependent upon the tag value. The valid constant types and their tag values are:
! Additional bytes
! Description of constant
! Version introduced
|-
| 1
| 2+''x'' bytes<br />(variable)
| UTF-8 (Unicode) string: a character string prefixed by a 16-bit number (type u2) indicating the number of bytes in the encoded string which immediately follows (which may be different than the number of characters). Note that the encoding used is not actually [[UTF-8]], but involves a slight modification of the Unicode standard encoding form.
| 1.0.2
|-
| 3
| 4 bytes
| Integer: a signed 32-bit [[two's complement]] number in big-endian format
| 1.0.2
|-
| 4
| 4 bytes
| Float: a 32-bit single-precision [[IEEE 754]] floating-point number
| 1.0.2
|-
| 5
| 8 bytes
| Long: a signed 64-bit two's complement number in big-endian format (takes two slots in the constant pool table)
| 1.0.2
|-
| 6
| 8 bytes
| Double: a 64-bit double-precision IEEE 754 floating-point number (takes two slots in the constant pool table)
| 1.0.2
|-
| 7
| 2 bytes
| Class reference: an index within the constant pool to a UTF-8 string containing the fully qualified class name (in ''internal format'') (big-endian)
| 1.0.2
|-
| 8
| 2 bytes
| String reference: an index within the constant pool to a UTF-8 string (big-endian)
| 1.0.2
|-
| 9
| 4 bytes
| Field reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor. (big-endian)
| 1.0.2
|-
| 10
| 4 bytes
| Method reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor. (big-endian)
| 1.0.2
|-
| 11
| 4 bytes
| Interface method reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor. (big-endian)
| 1.0.2
|-
| 12
| 4 bytes
| Name and type descriptor: two indexes to UTF-8 strings within the constant pool, the first representing a name (identifier) and the second a specially encoded type descriptor.
| 1.0.2
|-
| 15
| 3 bytes
| Method handle: this structure is used to represent a method handle and consists of one byte of type descriptor, followed by an index within the constant pool.<ref name="jvms-4.4" />
| 7
|-
| 16
| 2 bytes
| Method type: this structure is used to represent a method type, and consists of an index within the constant pool.<ref name="jvms-4.4" />
| 7
|-
| 17
| 4 bytes
| Dynamic: this is used to specify a dynamically computed constant produced by invocation of a bootstrap method.<ref name="jvms-4.4" />
| 11
|-
| 18
| 4 bytes
| InvokeDynamic: this is used by an ''invokedynamic'' instruction to specify a bootstrap method, the dynamic invocation name, the argument and return types of the call, and optionally, a sequence of additional constants called static arguments to the bootstrap method.<ref name="jvms-4.4" />
| 7
|-
| 19
| 2 bytes
| Module: this is used to identify a module.<ref name="jvms-4.4" />
| 9
|-
| 20
| 2 bytes
| Package: this is used to identify a package exported or opened by a module.<ref name="jvms-4.4" />
| 9
|}
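
The indexing rules and the sizes in the table above can be combined into a short Python sketch that walks a constant pool and records the tag stored at each index. The function name <code>scan_constant_pool</code> is hypothetical and the sample bytes are hand-built for this example. The sketch skips over the contents of each constant rather than decoding them, and it does not guard against truncated input.

<syntaxhighlight lang="python">
import struct

# Additional bytes after the tag, taken from the table above.
# Tag 1 (UTF-8 string) is variable-length and handled separately.
FIXED_SIZES = {
    3: 4, 4: 4, 5: 8, 6: 8, 7: 2, 8: 2, 9: 4, 10: 4, 11: 4,
    12: 4, 15: 3, 16: 2, 17: 4, 18: 4, 19: 2, 20: 2,
}
TWO_SLOT_TAGS = {5, 6}  # Long and Double occupy two consecutive indexes


def scan_constant_pool(data: bytes, offset: int = 0):
    """Walk the pool whose u2 count starts at offset.

    Returns ({index: tag}, offset just past the last constant).
    """
    (count,) = struct.unpack_from(">H", data, offset)
    offset += 2
    tags = {}
    index = 1  # index 0 is invalid; valid indexes run from 1 to count - 1
    while index < count:
        tag = data[offset]
        offset += 1
        if tag == 1:
            # u2 byte length, then that many bytes of string data
            (length,) = struct.unpack_from(">H", data, offset)
            offset += 2 + length
        elif tag in FIXED_SIZES:
            offset += FIXED_SIZES[tag]
        else:
            raise ValueError(f"unknown constant pool tag {tag}")
        tags[index] = tag
        index += 2 if tag in TWO_SLOT_TAGS else 1
    return tags, offset


# Three constants using indexes 1, 2 (and phantom 3) and 4: count is 5.
pool = (
    struct.pack(">H", 5)
    + b"\x01" + struct.pack(">H", 2) + b"Hi"  # index 1: UTF-8 string "Hi"
    + b"\x05" + struct.pack(">q", 42)         # index 2: Long, also uses 3
    + b"\x07" + struct.pack(">H", 1)          # index 4: Class reference
)
print(scan_constant_pool(pool))  # ({1: 1, 2: 5, 4: 7}, 19)
</syntaxhighlight>

In this sample the count is 5 although only three constants are stored, which shows both effects described above: the count is the maximum index plus one, and the Long constant leaves index 3 unused.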
 
Class names in Java, when fully qualified, are traditionally dot-separated, such as "java.lang.Object". However, within the low-level Class reference constants, an internal form appears which uses slashes instead, such as "java/lang/Object".
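
For the simple case described here, the conversion between the two forms is a character substitution, as in this Python sketch. The function names are hypothetical, and array types or other descriptor syntax are not considered.

<syntaxhighlight lang="python">
def to_internal_name(dotted: str) -> str:
    """Convert a dot-separated class name to the slash-separated internal form."""
    return dotted.replace(".", "/")


def to_dotted_name(internal: str) -> str:
    """Convert the internal form back to the dot-separated form."""
    return internal.replace("/", ".")


print(to_internal_name("java.lang.Object"))  # java/lang/Object
print(to_dotted_name("java/lang/Object"))    # java.lang.Object
</syntaxhighlight>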
 
The Unicode strings, despite the moniker "UTF-8 string", are not actually encoded according to the Unicode standard, although it is similar. There are two differences (see [[UTF-8]] for a complete discussion). The first is that the code point U+0000 is encoded as the two-byte sequence <code>C0 80</code> (in hex) instead of the standard single-byte encoding <code>00</code>. The second difference is that supplementary characters (those outside the [[Basic Multilingual Plane|BMP]] at U+10000 and above) are encoded using a surrogate-pair construction similar to [[UTF-16]] rather than being directly encoded using UTF-8. In this case each of the two surrogates is encoded separately in UTF-8. For example, U+1D11E is encoded as the 6-byte sequence <code>ED A0 B4 ED B4 9E</code>, rather than the correct 4-byte UTF-8 encoding of <code>F0 9D 84 9E</code>.
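
The two differences can be reproduced with a small Python encoder. This is a sketch of the rules as described above, not the JVM's own implementation, and the function name <code>encode_modified_utf8</code> is hypothetical. It relies on Python's <code>surrogatepass</code> error handler to encode each surrogate as a three-byte sequence.

<syntaxhighlight lang="python">
def encode_modified_utf8(text: str) -> bytes:
    """Encode text using the two deviations from standard UTF-8."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # U+0000 becomes the two-byte sequence C0 80.
            out += b"\xC0\x80"
        elif cp < 0x10000:
            # BMP code points use ordinary UTF-8; "surrogatepass" also
            # lets lone surrogates already present in the input through.
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Split into a UTF-16 style surrogate pair, then encode each
            # surrogate separately as three bytes.
            cp -= 0x10000
            high = 0xD800 | (cp >> 10)
            low = 0xDC00 | (cp & 0x3FF)
            for surrogate in (high, low):
                out += chr(surrogate).encode("utf-8", "surrogatepass")
    return bytes(out)


print(encode_modified_utf8("\x00").hex())        # c080
print(encode_modified_utf8("\U0001D11E").hex())  # eda0b4edb49e
print("\U0001D11E".encode("utf-8").hex())        # f09d849e (standard UTF-8)
</syntaxhighlight>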
 
==See also==
{{Portal|Computer programming}}
* [[Java bytecode]]
 
==References==
| author = [[Tim Lindholm]], Frank Yellin
| title = The Java Virtual Machine Specification
| edition = Second Edition
| publisher = Prentice Hall
| year = 1999
| isbn = 0-201-43294-3
| url = https://docs.oracle.com/javase/specs/jvms/se6/html/ClassFile.doc.html
| access-date = 2008-10-13
}} The official defining document of the [[Java virtual machine|Java Virtual Machine]], which includes the class file format. Both the first and second editions of the book are freely available [https://docs.oracle.com/javase/specs/ online for viewing and/or download].
 
{{Java (Sun)}}
 
{{DEFAULTSORT:Class (File Format)}}
[[Category:Java platform]]
[[Category:Computer file formats]]