This is a list of differences between Java and C, and assumes that the reader knows less about the latter. It's worth familiarising yourself with all the points, even if you don't fully understand them, so that you can recognise the cause of any problem you might encounter.
Many features of C, particularly its standard library facilities, are not dealt with in any great depth, since you can look those up in help files or manual pages, or a good reference book.
- Item 1: Portability
- Item 2: Ease of development
- Item 3: Speed
- Item 4: Primitive types
- Item 5: Comments
- Item 6: Limited encapsulation
- Item 7: Structures instead of classes
- Item 8: Enumerations
- Item 9: Unions
- Item 10: Single namespace for functions and global variables
- Item 11: Lack of function name overloading
- Item 12: Type aliasing
- Item 13: Declarations and definitions
- Item 14: Functions and their prototypes
- Item 15: Global objects
- Item 16: Local objects
- Item 17: Scope
- Item 18: Empty parameter lists
- Item 19: Program modularity
- Item 20: Preprocessing
- Item 21: File inclusion
- Item 22: Macros
- Item 23: Conditional compilation
- Item 24: Pointers instead of references
- Item 25: Pointer types
- Item 26: Null and undefined pointers
- Item 27: Dangling pointers
- Item 28: Passing arguments by reference
- Item 29: Pointers to structures and unions
- Item 30: Pointers to functions
- Item 31: Pointers to pointers
- Item 32: Generic pointers
- Item 33: Arrays and pointer arithmetic
- Item 34: Initialising arrays
- Item 35: Array-pointer relationship
- Item 36: Passing arrays to functions
- Item 37: Array length
- Item 38: Arrays as function parameters
- Item 39: const instead of final
- Item 40: Pointers to const objects
- Item 41: const pointers
- Item 42: Inline functions
- Item 43: Characters and strings
- Item 44: Wide characters
- Item 45: Dynamic memory management
- Item 46: Lack of exceptions
- Item 47: main() function
- Item 48: Standard library facilities
Item 1: Portability
Java source and binaries are entirely portable, since the source format is standardized, and the binaries run on a software emulation of a standardized processor (JVM).
In C, binaries are not usually portable from one platform to another, because they use the platform's native hardware processor directly. However, C source can be portable with little modification if it adheres to an ISO C standard, e.g., C90 (ISO/IEC 9899:1990), C95 (ISO/IEC 9899:1990 with Amendment 1 of 1995), C99 (ISO/IEC 9899:1999) or C11 (ISO/IEC 9899:2011), so long as the additional libraries it uses are also portable.
Item 2: Ease of development
Java is a very safe language, in that when you do something wrong, you get a relatively clear indication of what you did. If you attempt to access beyond the bounds of an array, or do anything else wrong, you get an immediate and diagnostic failure. If you lose all references to an object, it gets cleaned up by the garbage collector. There's no way to attempt to access inaccessible or unallocated memory.
Not so with C. A C program will happily access beyond the bounds of an array, possibly resulting in an immediate crash of the program (with little hint of where it was when it happened), no effect, or a delayed effect, as corrupted data is only detected later when accessed. Dynamic memory management is under manual control, and if you forget to deallocate something, the memory is wasted until reclaimed when the program exits. It's also very easy to deallocate memory, but continue to use it accidentally, again with unpredictable results.
In C, these activities require greater responsibility from the programmer.
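Because C performs no bounds checking of its own, code that wants that safety has to check indices explicitly. The following is a minimal sketch of the idea, using pointers and arrays that are covered in later items; the function name get_checked and its return convention are invented for this illustration and are not part of any standard library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies arr[idx] into *out and returns 1 if idx is
   within bounds; otherwise leaves *out untouched and returns 0. */
int get_checked(const int *arr, size_t len, size_t idx, int *out)
{
    /* Null pointers here are programming errors, so fail loudly. */
    assert(arr != NULL && out != NULL);

    if (idx >= len)
        return 0;       /* out of bounds: refuse to touch memory */

    *out = arr[idx];
    return 1;
}
```

The caller must still pass the correct length, since C cannot discover it from the pointer alone.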
Item 3: Speed
C is usually a compiled language, i.e., a compiler translates it into the native machine code of a specific target platform, so that platform's hardware processor can interpret it directly, yielding a considerable speed advantage. Java is also compiled, but its target platform is a virtual machine, the JVM, and it must be interpreted by an emulation of that machine, which itself will be a native machine-code program running on the physical processor.
One would therefore expect Java programs to be slower than C programs, and that was probably true when Java first appeared. However, modern JVMs take several steps to improve execution speed:
- Java bytecode can be converted to native code by the JVM. Not all code is converted, but the JVM will try to identify the most used pieces of code.
- The JVM can also in-line methods dynamically.
- The JVM can be careful about loading only parts of the (rather large) standard library that are actually used.
These optimizations are less effective on short-lived programs, because they must be repeated anew for each execution, but a big advantage of applying run-time optimizations is that existing compiled Java programs can take advantage of them without having to be recompiled.
So, is C still faster than Java? It depends so much on things like program longevity and the type of activity, that there's no simple answer, and you will choose a language based on other needs.
Item 4: Primitive types
In C, the primitive types are referred to using a combination of the keywords char, int, float, double, signed, unsigned, long, short and void. The allowable combinations are listed below, but their meanings depend on the compiler and platform in use, unlike Java.
- unsigned char: The narrowest unsigned integral type, typically (and always at least) 8 bits wide
- signed char: The narrowest signed integral type, of the same width as unsigned char
- char: An integral type equivalent to one or other of the signed/unsigned variants, but its signedness is implementation-dependent; C treats it as a distinct type, though.
- unsigned short: An unsigned integral type at least as wide as unsigned char, typically (and always at least) 16 bits
- short: A signed integral type of the same width as unsigned short
- unsigned: An unsigned integral type at least as wide as unsigned short, and usually wider than the char types; 16- or 32-bit widths are common.
- int: A signed integral type of the same size as unsigned
- unsigned long: An unsigned integral type at least as wide as unsigned, typically (and always at least) 32 bits
- long: A signed integral type of the same size as unsigned long
- unsigned long long: In C99, an unsigned integral type at least as wide as unsigned long, typically (and always at least) 64 bits
- long long: In C99, a signed integral type of the same size as unsigned long long
- float: A single-precision floating-point type
- double: A double-precision floating-point type
- long double: An extended double-precision floating-point type
- void: An empty type. It has no values, and cannot be accessed. As in Java, C functions with no return value are defined to return void. Unlike Java, a function with no parameters has void in its parameter list.
In summary, C appears to have a lot of the same types as Java, but this is not so, as you can't guarantee that a C int (for example) means the same as a Java int. Furthermore, C's signed types do not have to use two's-complement notation, and Java does not have unsigned types.
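Since the widths are platform-dependent, a program can inspect them through the sizeof operator and the standard header <limits.h>. The sketch below uses only those standard facilities; the two function names are invented for this example.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits occupied by an int on this platform
   (CHAR_BIT is the number of bits in a char, at least 8). */
size_t int_width_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* Returns 1 if plain char behaves as a signed type here, 0 if unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

On many desktop systems the first function returns 32, but portable code should not rely on that.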
Note also that there is no boolean type. Instead, the test conditions of if, while and for statements, and the operands of the logical operators (!, && and ||), are integer expressions with a boolean interpretation: zero means false, non-zero means true. The relational operators (==, !=, <=, >=, < and >) and logical operators return 0 for false and 1 for true.
C99 adds a boolean type, _Bool (which is really just a very small integer type). Including the standard header <stdbool.h> makes it available under the name bool, along with the symbolic values true and false (i.e. just 1 and 0), but the other integer types work just as well as before.
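A short sketch of both styles follows: the traditional int-as-boolean convention, and the C99 bool obtained from <stdbool.h>. The function names are made up for this example.

```c
#include <stdbool.h>

/* Pre-C99 style: returns 1 if n is even, 0 otherwise. */
int is_even_int(int n)
{
    return n % 2 == 0;      /* == yields 1 or 0 */
}

/* C99 style: the same test, returning bool. */
bool is_even(int n)
{
    return n % 2 == 0;
}

/* Any non-zero value counts as true in a condition. */
int count_nonzero(const int *v, int len)
{
    int count = 0, i;

    for (i = 0; i < len; i++)
        if (v[i])           /* same as: if (v[i] != 0) */
            count++;
    return count;
}
```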
Item 5: Comments
Java allows the use of these forms of comment:
/* a multiline comment */
// a single-line comment
Prior to C99, C did not permit the single-line form.
In both C and Java, you might disable large sections of temporarily unwanted code by using comments, although that can be problematic because comments do not nest. However, as C uses a preprocessor [Item 20], it has a more robust method:
/* enabled code */
#if 0
/* disabled code */
#endif
/* enabled code */
Item 6: Limited encapsulation
Languages like Java and C++ were developed out of C to support, among other things, better encapsulation and information hiding. C only provides very basic support, and otherwise expects enough discipline from the programmer to avoid breaking any intended encapsulation.
There are no classes [Item 7] and no packages [Item 10] in C, so there is nothing for C functions to belong to, and they must be carefully named to avoid clashes. C structures [Item 7] encapsulate several data as a unit, but do not restrict access, so there is no abstraction.
The use of header files [Item 21] and separate modules [Item 19] can provide some hiding of internals. An incomplete structure type (one declared without its members) might be used in the published header, while its full declaration would appear only in the source file that needed it, or in an unpublished header if needed by several. Functions and globals local to a module can be hidden from other modules with static.
Here's a fairly robust template for abstract types in C. First, write a header declaring (say) a type for a handle to access a very simple database:
/* file db.h */
#ifndef db_included
#define db_included

/* pointer to incomplete structure type */
typedef struct db_handle *db_ref;

/* a constructor */
db_ref db_open(const char *addr);

/* methods */
int db_get(db_ref, const char *key);
/* etc */

/* a destructor */
void db_close(db_ref);

#endif
Note that C does not have any notion of ‘constructor’ or ‘method’; these are all just ordinary functions.
Now write a source file to complete the structure type, and define the functions:
/* file db.c */

/* Include the header, so we ensure that our definitions and declarations are consistent. */
#include "db.h"

struct db_handle {
    /* . . . */
};

/* Use static for internal state and private functions... */
static void normalize_key(char *to, const char *from)
{
    /* . . . */
}

db_ref db_open(const char *addr)
{
    /* Allocate a struct db_handle, and initialize it... */
}

int db_get(db_ref db, const char *key)
{
    /* Use info in db to access an entry... */
}

/* The parameter must be named in a definition, unlike in the prototype. */
void db_close(db_ref db)
{
    /* Release memory... */
}
As a result, the internal structure of your database handle can change over time without affecting its users, as they can't see inside it. The functions declared with static are not visible outside db.c, so they won't clash with identically named functions in other parts of the program. However, the compiler will not force a user of your library to initialize a db_ref correctly with db_open, or to release it after use correctly with db_close. The user is expected to have enough self-discipline to do that.
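The template above can be condensed into a single-file sketch to show the pattern working end to end. The counter type and its three functions are hypothetical names chosen for this example; in a real project the first part would live in a header and the second in a separate source file.

```c
#include <stdlib.h>

/* --- what would go in counter.h --- */
typedef struct counter *counter_ref;    /* pointer to incomplete type */
counter_ref counter_open(int start);
int counter_next(counter_ref c);
void counter_close(counter_ref c);

/* --- what would go in counter.c --- */
struct counter {
    int value;
};

/* Private helper: static keeps it invisible to other modules. */
static int clamp_start(int start)
{
    return start < 0 ? 0 : start;
}

counter_ref counter_open(int start)
{
    counter_ref c = malloc(sizeof *c);

    if (c != NULL)
        c->value = clamp_start(start);
    return c;               /* NULL if allocation failed */
}

/* Returns the current value, then advances the counter. */
int counter_next(counter_ref c)
{
    return c->value++;
}

void counter_close(counter_ref c)
{
    free(c);
}
```

Code that sees only the header part can hold and pass a counter_ref, but cannot read or change value directly.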
Further reading
- Item 7: Structures instead of classes
- Item 10: Single namespace for functions and global variables
- Item 11: Lack of function name overloading
Item 7: Structures instead of classes
C does not allow you to declare class types (as you can in Java using the class construct), but you can declare C structures using the struct construct. A C structure is like a Java class that only contains public data members: there must be no functions, and all parts are visible to any code that knows the declaration. For example:
struct point { int x, y; };
This declares a type called struct point (NB: ‘struct’ is part of the name; point is known as the structure type's tag).
Members of a C structure are accessed using the . operator, as class members can be in Java:
struct point location;
location.x = 10;
location.y = 13;
A structure object may be initialised where it is defined:
struct point location = { 10, 13 }; /* okay; initialisation (part of definition) */
location = { 4, 5 }; /* illegal; assignment (not part of definition) */
In C99, you can use a compound literal to create an unnamed structure object and assign all its members in one step:
location = (struct point) { 4, 5 }; /* legal in C99 */
In C99, a structure initialisation can specify which members are being set:
struct point location = { .y = 13, .x = 10 }; /* legal in C99 */
Unlike Java, where class variables are references to objects, C structure variables are the objects themselves. Assigning one to another causes copying of the members:
struct point a = { 1, 2 };
struct point b;
b = a; /* copies a.x to b.x, and a.y to b.y */
b.x = 10; /* does not affect a.x */
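The same copying happens when a structure is passed to or returned from a function. A minimal sketch, in which the function name moved_right is invented for the example:

```c
struct point { int x, y; };

/* p is a copy of the caller's structure, so changing it has no effect
   on the original. The modified copy is handed back as the result. */
struct point moved_right(struct point p, int dx)
{
    p.x += dx;
    return p;
}
```

A Java method given a reference to a mutable object could alter the caller's object; here the caller's structure is left untouched unless the result is assigned back to it.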
Background
- Item 6: Limited encapsulation
Item 8: Enumerations
An enumeration is a range of (usually) distinct symbolic constants.
Until Java 1.5, an enumeration was simply an informal grouping of static final variables:
public static final int RED = 0;
public static final int REDAMBER = 1;
public static final int GREEN = 2;
public static final int AMBER = 3;
(Java also has a class Enumeration, which serves a different, unrelated purpose.)
Java 1.5 introduced a specific concept for enumerations:
public enum LightState { RED, REDAMBER, GREEN, AMBER }
These are distinct instances of a class type, and don't correspond to integer values (apart from their order).
C has a concept with a similar syntax, but with semantics rather more like static finals:
enum light { RED, REDAMBER, GREEN, AMBER };
This defines a new type enum light, and defines the symbols RED for 0, REDAMBER for 1, GREEN for 2, and AMBER for 3.
The first symbol is assigned the value 0, and each subsequent symbol is assigned the next integer. However, a symbol can be assigned a particular value:
enum light { RED = 3, REDAMBER, GREEN = 1, AMBER };
This also implies that REDAMBER is 4, and that AMBER is 2.
If a new type is not required, the tag can be omitted:
enum { RED, REDAMBER, GREEN, AMBER };
The symbols can be used in any expression, and may be assigned to any integral type, not just the enum type. For this reason, the tag is rarely used.
C enums are no more sophisticated than that. In contrast to Java, the symbols are not objects, and cannot take parameters.
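As a small sketch of how such symbols behave as ordinary integers, the following uses the default numbering (RED is 0). The function names next_light and light_as_int are invented for this example.

```c
enum light { RED, REDAMBER, GREEN, AMBER };

/* Returns the state that follows s in the traffic-light cycle. */
enum light next_light(enum light s)
{
    switch (s) {
    case RED:      return REDAMBER;
    case REDAMBER: return GREEN;
    case GREEN:    return AMBER;
    default:       return RED;     /* AMBER, or any out-of-range value */
    }
}

/* Enumeration values convert to int without any cast. */
int light_as_int(enum light s)
{
    int n = s;

    return n;
}
```

Note the default case: because any integer can be stored in an enum light, the compiler cannot guarantee that s is one of the four named values.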
Item 9: Unions
C allows an area of memory to be occupied by data of several types, though only one at a time, using a union. Unions are syntactically similar to structures:
union number { char c; int i; float f; double d; };
This declares a type called union number (NB: ‘union’ is part of the name; number is known as the union's tag).
Members of a C union are accessed using the . operator, just as structure members are accessed:
union number n;
int j;

n.i = 10;
j = n.i;
Only the member to which a value was last assigned contains valid information to be read. There is no way to determine that member implicitly, so the programmer must take steps to identify it, for example, by using a separate variable to indicate the type:
union number n;
enum { CHAR, INT, FLOAT, DOUBLE } nt;

n.i = 10;
nt = INT;

switch (nt) {
case CHAR:
    /* access n.c */
    break;
case INT:
    /* access n.i */
    break;
case FLOAT:
    /* access n.f */
    break;
case DOUBLE:
    /* access n.d */
    break;
}
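A common way to keep the indicator and the union together is to wrap both in a structure. The following is one possible sketch; the names tagged_number, number_kind and number_as_double are invented here.

```c
union number { char c; int i; float f; double d; };

enum number_kind { NUM_CHAR, NUM_INT, NUM_FLOAT, NUM_DOUBLE };

struct tagged_number {
    enum number_kind kind;  /* says which member of value is valid */
    union number value;
};

/* Converts whichever member is currently valid to a double. */
double number_as_double(struct tagged_number n)
{
    switch (n.kind) {
    case NUM_CHAR:   return n.value.c;
    case NUM_INT:    return n.value.i;
    case NUM_FLOAT:  return n.value.f;
    case NUM_DOUBLE: return n.value.d;
    }
    return 0.0;             /* not reached for a valid kind */
}
```

It remains the programmer's job to update kind whenever a different member of value is assigned.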
Java does not have unions, although it is possible for a reference to refer to any class derived from its own class. A reference of type Object can refer to any class of object, since all classes are originally derived from Object.
Item 10: Single namespace for functions and global variables
Each class in Java defines a namespace which allows functions and variables in separate, unrelated classes to share the same name. When identifying a function or variable in Java, the namespace must be expressed, or implied using an import directive. For example, the method Integer.toString() is distinct from Long.toString(). Java packages similarly allow distinct classes and interfaces to share the same name. For example, the name Object could refer to either java.lang.Object or org.omg.CORBA.Object.
In C, all functions are global, and must share a single namespace (i.e., one per program). Global variables can also be declared and defined, and they also share that namespace. Care must be taken in choosing names for functions in large projects, and often a strategy of using a common prefix for groups of related functions is employed. For example, WSA prefixes most of the WinSock functions.
Note that other namespaces exist in a C program: a single namespace is shared by the tags of all structures, unions and enumerations; each structure and union holds a unique namespace for its members; each block statement holds a namespace for local variables.
Background
- Item 6: Limited encapsulation
Item 11: Lack of function name overloading
In Java, two functions in the same namespace may share the same name if their parameter types are sufficiently different. In C, this is simply not the case, and all function names must be unique.
void myfunc(int a)
{
/* . . . */
}
void myfunc(float b) /* error: myfunc already defined */
{
/* . . . */
}
Background
- Item 6: Limited encapsulation
Item 12: Type aliasing
New names or aliases for existing types may be created using typedef. For example:
typedef int int32_t;
This allows int32_t to be used anywhere in place of int. Such aliases are often used to hide implementation- or platform-specific details, or to allow the choice of a widely-used type to be changed easily.
typedefs are also useful for expressing complex compound types. For example, a prototype for the standard-library function signal has the following, rather cryptic form (in ISO C):
void (*signal(int signum, void (*handler)(int)))(int);
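Erm, what? It becomes a little clearer with a typedef, as in the declaration used by some POSIX systems (POSIX is an operating-system standard which incorporates the C standard; the name sighandler_t itself is a GNU extension):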
typedef void (*sighandler_t)(int); sighandler_t signal(int signum, sighandler_t handler);
Now we can see that the function's second parameter has the same type as its return value, and that that type is, in fact, a pointer-to-function type.
Note that a typedef is syntactically similar to a variable declaration, with the new type name appearing in the place of the variable name.
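The same technique works for your own pointer-to-function types. In this sketch, int_op, add_three and apply_twice are hypothetical names used only for illustration.

```c
/* int_op names the type "pointer to function taking int, returning int". */
typedef int (*int_op)(int);

static int add_three(int n)
{
    return n + 3;
}

/* Without the typedef, this would have to be written as:
   int apply_twice(int (*op)(int), int n) */
int apply_twice(int_op op, int n)
{
    return op(op(n));
}
```

A call such as apply_twice(add_three, 1) applies add_three to 1 and then to the result.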
There is no equivalent of type aliasing in Java.
Item 13: Declarations and definitions
C programs are built from collections of functions (which have behaviour) and objects (which have values; variables are objects), the natures of which are indicated by their types. C compilers read through source files sequentially, looking for names of types, objects and functions being referred to by other types, objects and functions.
A declaration of a type, object or function tells the compiler that a name exists and how it may be used, and so may be referred to later in the file. If the compiler encounters a name that does not have a preceding declaration, it may generate an error or a warning because it does not understand how the name is to be used.
In contrast, a Java compiler can look forward or back, or even into other source files, to find definitions for referenced names.
A definition of an object or function tells the compiler which module the object or function is in (see Program modularity [Item 19]). For an object, the definition may also indicate its initial value. For a function, the definition gives the function's behaviour.
Further reading
- Item 14: Functions and their prototypes
- Item 15: Global objects
- Item 16: Local objects
- Item 17: Scope
Item 14: Functions and their prototypes
In Java, the use of a function may appear earlier than its definition. In C, all functions being used in a source file should be declared somewhere earlier than their invocations in that file, allowing the compiler to check if the call's arguments match the function's formal parameters. A function declaration (or prototype) looks like a function definition, but its body (the code between and including the braces (‘{’ and ‘}’)) is replaced by a semicolon (syntactically similar to a native method, or an interface method, in Java). If the compiler finds a function invocation before any declaration, it will try to infer a declaration from the invocation, and this may not match the true definition. A proper declaration can be inferred from a function definition, should that be encountered first.
/* a declaration; parameter names may be omitted */
int power(int base, int exponent);
/* From here until the end of the file, we can make calls to power(),
   even though the definition hasn't been encountered. */
/* a definition; parameter names do not need to match declaration */
int power(int b, int e)
{
int r = 1;
while (e-- > 0)
r *= b;
return r;
}
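Prototypes become unavoidable when two functions call each other, since one of them must be used before it is defined. A minimal sketch, with function names made up for the example:

```c
/* Declaration (prototype): lets is_even() call is_odd() before the
   compiler has seen the definition of is_odd(). */
int is_odd(unsigned n);

int is_even(unsigned n)
{
    return n == 0 ? 1 : is_odd(n - 1);
}

int is_odd(unsigned n)
{
    return n == 0 ? 0 : is_even(n - 1);
}
```

This is a deliberately inefficient way to test parity; it is shown only because the mutual recursion forces a prototype.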
Background
- Item 13: Declarations and definitions
Item 15: Global objects
Global objects also have distinct declarative and definitive forms. A definition may be accompanied by an initialiser, e.g.:
int globval = 34; /* initialized */
int another; /* initialized with 0 */
…while a declaration should not have an initialiser, and should be preceded by extern:
extern int globval;
extern int another;
(extern can also appear before a function declaration, but it is optional.)
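A declaration and a definition of the same global object may appear in one source file, as happens whenever a module includes its own header. The sketch below shows that arrangement in a single file; call_count and bump are names invented for the example.

```c
/* Declaration: would normally come from a header file. */
extern int call_count;

/* The object can be used once it has been declared. */
int bump(void)
{
    return ++call_count;
}

/* Definition: exactly one per program, here with an initialiser. */
int call_count = 100;
```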
Background
- Item 13: Declarations and definitions
Item 16: Local objects
For local objects in C, the definition and declaration are not distinguished. Unlike Java, all local variables must be defined at the beginning of their enclosing block, before any statements are reached. This restriction does not apply in C99.
{
    int x; /* a definition */
    x = 10; /* a statement */
    int y; /* illegal; follows a statement */
}
Furthermore, an iteration variable in a for loop cannot be declared within the initialisation of the statement:
{
    for (int x = 0; x < 10; x++) { /* illegal */
        /* ... */
    }
}
This restriction does not apply in C99.
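For comparison, here is a small function that uses both C99 freedoms. It is only a sketch (the name sum_to is invented), and a compiler restricted to C90 would reject it.

```c
/* Requires a C99 (or later) compiler. Returns 1 + 2 + ... + n,
   or 0 if n is negative. */
int sum_to(int n)
{
    if (n < 0)
        return 0;

    int total = 0;                  /* declared after a statement */

    for (int i = 1; i <= n; i++)    /* i exists only inside the loop */
        total += i;
    return total;
}
```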
Background
- Item 13: Declarations and definitions
Item 17: Scope
All declarations have scope, which is the part of the program in which the declared name has the meaning it is declared to have. ‘File scope’ means from the declaration to the end of the file, and applies to types, functions and global objects.
‘Block scope’ means from the declaration to the end of the block statement in which it is declared. This always applies to local objects (including formal parameters), but can also apply to types, functions and global objects. All of the following declarations have block scope, and can be used by the trailing statements, but not by statements beyond the block:
{
/* a local type */
typedef int MyInteger;
/* a local variable */
MyInteger x;
/* global variable */
extern int y;
/* function (extern is implicit) */
int power(int base, int exponent);
/* statements... */
}
Unlike Java, a local variable in an inner block may hide one in an outer block by having the same name:
{
    int x;
    {
        int x; /* hides the other */
    }
    /* first one visible again */
}
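The effect can be observed by giving the two variables different values. In this sketch the function name shadow_demo is made up for the example.

```c
int shadow_demo(void)
{
    int x = 1;
    int seen_inside;

    {
        int x = 2;          /* hides the outer x */
        seen_inside = x;    /* reads the inner x */
    }

    /* The outer x is visible again, and was never modified. */
    return seen_inside * 10 + x;    /* 2 * 10 + 1 */
}
```

Many compilers can warn about such hiding, since it is often accidental.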
Background
- Item 13: Declarations and definitions
Item 18: Empty parameter lists
In Java, a function that takes no arguments is expressed using (). In C, such a function should be expressed with (void) in its declaration and definition. However, it is still invoked with ():
/* prototype/declaration */
int myfunc(void);

/* definition */
int myfunc(void)
{
    /* ... */
}

/* invocation */
myfunc();
The form () is permitted in declarations, but it means ‘unspecified arguments’ rather than ‘no arguments’. This tells the compiler to abandon type-checking of arguments where that function is invoked. It comes from an obsolete pre-standard version of C, and is not recommended.
Item 19: Program modularity
Java programs, particularly large ones, are usually built in a modular fashion that supports code re-use. The source code is spread over several source files (.java), and is used to generate Java byte-code in class files (.class) which are identified by the class they represent, so in Java, there is a direct relationship between the name of a class and the file containing the code for that class. These are combined at run-time to produce the executing program. Java's standard library of utilities for file access, GUIs, internationalisation, etc, is a practical example of such modular programming.
A large C program may also be split into several source files (usually with a .c extension), and separate compilation of each of these produces an object file of (usually) the same name with a different extension (.o or .obj). These are the modules of C that can be combined to form an executable program. An object file contains named representations of the functions and global data defined in its source file, and allows them to refer to other functions and data by name, even if in a separate module. In C, there doesn't have to be any relationship between the names of functions and variables and the names of the modules that contain them.
A final executable program is produced by supplying all the relevant modules (as object files) to a linker (which is often built into the compiler). This attempts to resolve all the referred names into the memory addresses required by the generated machine code, and linking will fail if some names cannot be resolved, or if there are two representations of the same name.
For example, the object file generated from the code below would contain references to the names pow (because it is invoked as a function) and errno (because it is accessed as a global variable). The file would also provide a representation of the name func (because the source contains a definition of that function).
extern int errno;

void func(void)
{
    double pow(double, double);
    double x = 3.0, y = 12.7, r;
    int e;

    r = pow(x, y);
    e = errno;
    /* ... */
}
Like Java, C comes with a standard library of general-purpose support routines, an implementation of which is supplied with your compiler. Its source code is not usually required, since it has already been compiled into object files for your system, and these will be used automatically when linking.
Other pre-compiled libraries may also exist (e.g. to support sockets), but it will normally be necessary to link with them explicitly to use them.
Here is an illustration of a program built from several components:
The source code consists of four source files (foo.c, bar.c, baz.c, quux.c) and three header files for preprocessing ("yan.h", "tan.h", "tither.h"; see File inclusion [Item 21]). The program also uses some header files (<wibble.h>, <wobble.h>) from an additional library. Compiling each of the source files in turn generates the object files foo.o, bar.o, baz.o, quux.o, and these are linked with an archive of pre-compiled objects (libwubble.a) from the library to produce an executable program myprog.
Item 20: Preprocessing
Each C source file undergoes a lexical preprocessing stage which serves several purposes, including conditional compilation and macro expansion. The main purpose is to allow declarations of commonly used types, global data and functions to be conveniently and consistently made available to modules which need to access them. In general, the preprocessor is able to insert, remove or replace text from the source code as it is supplied to the compiler. (The original source code doesn't change.)
There is no equivalent of preprocessing in Java, but its purposes don't usually apply to Java anyway.
Further reading
- Item 21: File inclusion
- Item 22: Macros
- Item 23: Conditional compilation
Item 21: File inclusion
When a large C program is split over several modules, code in one module might need to make references to named code in another, or two modules might need to refer to the same type declaration consistently. The usual way to achieve these is to precede the reference with a declaration that shows what the name means. Some example declarations:
/* This declares the type struct point. */
struct point { int x, y; };

/* This declares the global variable errno. */
extern int errno;

/* This declares the function getchar. */
int getchar(void);
It would be tedious to repeat such declarations in each source file that requires them, particularly if they need to be modified as the program develops. Instead, these could be placed in a separate file (usually with a .h extension), and inserted automatically by the preprocessor when it encounters an #include directive embedded in the source code, for example:
#include "mydecls.h"
These header files are also preprocessed, and so may contain further #include (or other) directives.
Header files containing declarations for the standard library are also available to the preprocessor. These are normally accessed with a variant of the #include directive:
/* Include declarations for input/output routines. */
#include <stdio.h>
You should normally use the "" form for your own headers rather than <>.
Do not put definitions of functions or variables in header files — it may result in multiple definitions of the same name within one program, so linking will fail. Header files should normally only contain types, function prototypes, variable declarations, and macro definitions. Note that inline functions [Item 42] are exceptional.
Background
- Item 20: Preprocessing
Item 22: Macros
The preprocessor allows macros to be defined which serve a number of purposes:
- Some macros are used to hold constants or expressions:
#define PI 3.14159
double pi_twice = PI * 2;
PI will be replaced by the numeric value wherever it is used.
- Some macros take arguments:
#define MAX(A,B) ((A) > (B) ? (A) : (B))
…that provide a convenient way to emulate functions without the overhead of a real function call. (See a good book on C for the limitations of this.)
- Some macros are merely defined to exist:
#define JOB_DONE
…and are used in conditional compilation [Item 23].
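One limitation of function-like macros such as MAX above is that an argument may be evaluated more than once, because the preprocessor substitutes text rather than values. The sketch below illustrates this; the two function names are invented for the example.

```c
#define MAX(A,B) ((A) > (B) ? (A) : (B))

/* Safe: the arguments have no side effects. */
int larger(int a, int b)
{
    return MAX(a, b);
}

/* Surprising: when *counter is the larger operand it is incremented
   twice, because (*counter)++ appears twice in the expansion. */
int max_with_side_effect(int *counter)
{
    return MAX((*counter)++, 0);
}
```

If a variable holding 1 is passed to max_with_side_effect, the function returns 2 and leaves the variable at 3, whereas a real function would have returned 1 and left it at 2.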
Background
- Item 20: Preprocessing
Item 23: Conditional compilation
The preprocessor allows code to be compiled selectively, depending on some condition. For example, if we assume that the macro __unix__ is defined only when compiling for a UNIX system, and that the macro __windows__ is defined only when compiling for a Windows system, then we could provide a single piece of code containing two possible implementations depending on the intended target:
int file_exists(const char *name)
{
#if defined __unix__
    /* Use UNIX system calls to find out if the file exists. */
    . . .
#elif defined __windows__
    /* Use Windows system calls to find out if the file exists. */
    . . .
#else
    /* Don't know what to do - abort compilation. */
#error "No implementation for your platform."
#endif
}
The most common use of conditional compilation, though, is to prevent the declarations in a header file from being made more than once, should the file be inadvertently #included more than once:
/* in the file mydecls.h */
#if !defined(mydecls_header)
#define mydecls_header

typedef int myInteger;

#endif
You should routinely protect all your header files in this way.
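Conditional compilation can also test macros that the compiler itself predefines. As a small sketch, the function below (the name have_c99 is made up for this example) uses the standard macro __STDC_VERSION__ to report whether the file is being compiled as C99 or later.

```c
/* Returns 1 if this file was compiled as C99 or later, 0 otherwise.
   __STDC_VERSION__ is not defined at all by C90 compilers, hence the
   defined() test before comparing its value. */
int have_c99(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    return 1;
#else
    return 0;
#endif
}
```

Only one of the two return statements ever reaches the compiler proper; the other is removed during preprocessing.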
Background
- Item 20: Preprocessing
Item 24: Pointers instead of references
All variables of non-primitive types in Java are references. C has no concept of ‘reference’, but instead has pointers, which Java does not. They are similar, but you can do much more with pointers, with a correspondingly greater risk of mistakes.
A pointer is an address in memory of some ordinary data. A variable may be of pointer type, i.e. it holds the address of some data in memory.
/* We'll assume we're inside some block statement, as in a function. */
int i, j; /* i and j are integer variables. */
int *ip; /* ip is a variable which can point to an integer variable. */

i = 10; j = 20; /* values assigned */
ip = &i; /* ip points to i. */
*ip = 5; /* Indirectly assign 5 to i. */
ip = &j; /* ip points to j. */
*ip += 7; /* j now contains 27. */
i += *ip; /* i now contains 32. */
The & operator obtains the address of a variable. The * operator dereferences the pointer. While ip points to i, then *ip is synonymous with i, and you can use it in any expression where you could use i. A dereferenced pointer can be used on the left-hand side of an assignment, i.e. it is a modifiable lvalue (‘el-value’), as in the two examples above.
The characters & and * also serve as binary operators for bitwise AND and multiplication, but the pointer operators are unary, so the syntax ensures that there is no ambiguity. Furthermore, it is a common convention to put spaces around binary operators, but not between a unary operator and its operand, so it's usually easy to see what these characters are being used for at a glance.
C doesn't care whether you declare int *ip; or int* ip;. The type of ip is int * (or pointer-to-int) either way. However, some prefer the first form, because it acts as a mnemonic that *ip is an int; while some prefer the second, because it keeps the type separate from the variable name. Note that the * does not propagate across multiple variables in a single declaration, such as int *ip, i; where i is still just an int.
Further reading
- Item 25: Pointer types
- Item 26: Null and undefined pointers
- Item 27: Dangling pointers
- Item 28: Passing arguments by reference
- Item 29: Pointers to structures and unions
- Item 30: Pointers to functions
- Item 31: Pointers to pointers
- Item 32: Generic pointers
Item 25: Pointer types
For every type, there is a pointer type. Since there is an int type, there is also a pointer-to-int type, written int *. float * is the pointer-to-float type. When assigning a pointer value to a variable, or comparing two pointer values, the types must match. Given these declarations:
int i, j;
float f;
int *ip;
float *fp;
…then i is of type int, so the expression &i must be of type int *. ip is also of type int *, so you can assign &i to it. &j is of type int *, so it can be compared with &i, and so on. But &f is of type float *, so it cannot be assigned to ip, or compared with ip, &i or &j.
Background
- Item 24: Pointers instead of references
Item 26: Null and undefined pointers
A valid value for a pointer is null (it equals 0), indicating that it points to no object. Many of the standard header files define a macro for a null pointer, NULL, which many programmers may prefer.
#include <stdlib.h>

int *ip;
ip = NULL;
It is permissible to use pointers as integer expressions treated as boolean expressions to detect a null pointer. (Null means ‘false’ in this context.) For example:
int *ip = NULL;   /* or the address of some int */
if (ip) {
    /* ip is not null. */
}
if (!ip) {
    /* ip is null. */
}
Direct comparisons are also possible (e.g. ip != NULL).
If a pointer variable has not been given a value, it could be pointing anywhere, or be null. Its value is indeterminate. (Java refuses to compile programs that try to use indeterminate values.)
Do not dereference a null pointer. Do not dereference an indeterminate pointer. Unlike Java, where trying to access an object through a null reference will immediately raise a NullPointerException, dereferencing an invalid pointer in C could have any effect! The program might fail immediately, or later, or it might not appear to fail before it terminates normally.
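Since C will not check for you, one common habit is to test a pointer for null before dereferencing it. The following is only a minimal sketch; value_or is a hypothetical helper name, not a standard function, and it can guard against null only, not against indeterminate pointers.

```c
#include <stddef.h>

/* Hypothetical helper: return the pointed-to value, or a fallback
   when the pointer is null. Indeterminate or dangling pointers
   cannot be detected this way; only null can be tested for. */
int value_or(const int *ip, int fallback)
{
    if (ip == NULL)
        return fallback;
    return *ip;
}
```

With int n = 7; the call value_or(&n, -1) yields 7, while value_or(NULL, -1) yields -1.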
Background
- Item 24: Pointers instead of references
Item 27: Dangling pointers
In Java, an object will remain in existence so long as there is a reference to it. In C, an object may go out of existence even if there are pointers to it: the programmer is entirely responsible for ensuring that pointers contain valid addresses (either 0, or the address of an existing object) when used. This badly written function returns a pointer to an integer variable:
int *badfunc(void)
{
    int x = 18;
    return &x; /* Bad - x won't exist after the call has finished. */
}
The pointer returned by badfunc() is invalid because the variable no longer exists. Although the memory for the variable still exists, it is no longer allocated to that variable, and it might not even be accessible any more. Do not dereference a dangling pointer! Although the likely result is that it may appear to work, it could fail at once if the memory is no longer accessible, or if it has been corrupted. It might quietly fail later if you attempt to overwrite it, thus corrupting what it is now being used for.
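Two common ways to avoid the problem in badfunc() are sketched below, under the assumption that the value 18 is all we want to deliver. The names fill_value and make_value are hypothetical. The second uses malloc(), which is described under dynamic memory management [Item 45].

```c
#include <stdlib.h>

/* Option 1: the caller owns the storage and passes its address. */
void fill_value(int *out)
{
    *out = 18;
}

/* Option 2: allocate storage that outlives the call. The caller
   must check for a null result and eventually call free(). */
int *make_value(void)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = 18;
    return p;
}
```

In both cases the pointed-to object still exists after the function returns, so the pointer is safe to dereference.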
Background
- Item 24: Pointers instead of references
Item 28: Passing arguments by reference
In Java, all primitive types are passed to functions by value — the function is unable to change values of variables in the invoking context. All class types are passed by reference — the function can alter the public contents of the referenced object.
In C, almost all types are passed by value, and so no variables supplied as arguments can be altered by a function. It can only alter its local copies of the variables. However, by passing a pointer to the variable, the function is able to dereference its copy of the pointer, and indirectly assign to the variable. Consider these two functions which are intended to swap the values of two variables:
void badswap(int a, int b)
{
    int tmp = b;
    b = a;
    a = tmp;
    /* a and b are swapped but they're only copies. */
}

void goodswap(int *ap, int *bp)
{
    int tmp = *bp;
    *bp = *ap;
    *ap = tmp;
}

/* Assume we're in a function body. */
int x = 10, y = 4;

/* Print state of variables. */
printf("1: x = %d y = %d\n", x, y);
badswap(x, y);    /* x and y are copied, and the copies are swapped so x and y are unchanged. */
printf("2: x = %d y = %d\n", x, y);
goodswap(&x, &y); /* Pointers tell goodswap() where we store x and y. */
printf("3: x = %d y = %d\n", x, y);
This reports:
1: x = 10 y = 4
2: x = 10 y = 4
3: x = 4 y = 10
…indicating that badswap had no effect on the variables given as arguments.
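The same technique is often used to let a function deliver more than one result. The sketch below uses a hypothetical function, divide, and assumes the caller never passes a zero divisor.

```c
/* Hypothetical example: compute quotient and remainder in one call.
   The caller passes the addresses of the variables to fill in.
   Assumes den is not zero. */
void divide(int num, int den, int *quot, int *rem)
{
    *quot = num / den;
    *rem = num % den;
}
```

After int q, r; divide(17, 5, &q, &r); the variable q holds 3 and r holds 2.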
Background
- Item 24: Pointers instead of references
Item 29: Pointers to structures and unions
A pointer to a variable of structure type may exist. Accessing a member of the structure is straightforward: dereference the pointer, and apply the . operator. The syntax requires parentheses to ensure the correct meaning, so a short form also exists (and is widely used) for convenience:
struct point loc;
struct point *locp = &loc;
(*locp).x = 10; /* correct */
*locp.x = 10; /* incorrect; same as *(locp.x) */
locp->x = 10; /* correct, shorter form */
Syntactically, pointers to unions are accessed identically.
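As a small illustration, a function can modify a structure in place when given a pointer to it. The definition of struct point here is an assumption (two int members), and translate is a hypothetical name.

```c
/* Assumed definition; the struct point used above may differ. */
struct point {
    int x, y;
};

/* Move a point in place through a pointer to it. */
void translate(struct point *p, int dx, int dy)
{
    p->x += dx;       /* same as (*p).x += dx; */
    (*p).y += dy;     /* same as p->y += dy;   */
}
```

Because only the pointer is copied, the caller's structure is the one that changes.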
Background
- Item 24: Pointers instead of references
Item 30: Pointers to functions
Functions also have addresses, for which there are pointer-to-function types expressing the parameters and return type. The pointers can be passed to or returned from other functions just as other data can.
void goodswap(int *, int *);
void (*swapfunc)(int *, int *); /* a pointer called swapfunc */
int x, y;

swapfunc = &goodswap;  /* Now it points to a function */
                       /* with matching parameters. */
(*swapfunc)(&x, &y);   /* Invokes goodswap(&x, &y). */
Since pointers to functions are just values like any other, they can be passed to and returned from functions, so that ‘behaviour’ itself becomes just another form of data.
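To make this concrete, here is a sketch of a function that takes ‘behaviour’ as an argument. All three names (square, negate, apply_to_each) are hypothetical.

```c
#include <stddef.h>

int square(int x) { return x * x; }
int negate(int x) { return -x; }

/* Replace each element with the result of calling fn on it. */
void apply_to_each(int *first, size_t length, int (*fn)(int))
{
    size_t i;
    for (i = 0; i < length; i++)
        first[i] = fn(first[i]);   /* same as (*fn)(first[i]) */
}
```

The same loop can then square or negate an array, depending only on which function's address is passed in.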
Background
- Item 24: Pointers instead of references
Item 31: Pointers to pointers
A pointer may point to a variable which itself holds another pointer, and this is expressed in the pointer's type:
int i;              /* i holds an integer. */
int *ip = &i;       /* ip points to i. */
int **ipp = &ip;    /* ipp points to ip. */
int ***ippp = &ipp; /* ippp points to ipp. */
/* et cetera */
The fact that the pointed-to object also holds a pointer does not fundamentally change the behaviour of the pointer that points to it. It just allows a further level of indirection — in practice, you rarely need more than a couple of levels.
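A typical use of a second level of indirection is to let a function set a pointer belonging to its caller. The sketch below is one possible shape for this; find_largest is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: make *maxp point at the largest element of
   the array, or at nothing (null) if the array is empty. */
void find_largest(int *first, size_t length, int **maxp)
{
    size_t i;
    *maxp = NULL;
    for (i = 0; i < length; i++)
        if (*maxp == NULL || first[i] > **maxp)
            *maxp = &first[i];
}
```

Here *maxp is the caller's pointer, and **maxp is the integer that it currently points to.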
Background
- Item 24: Pointers instead of references
Item 32: Generic pointers
It is sometimes necessary to store or pass pointers without knowing what type they point to. For this, you can use the generic pointer type void *. You can convert between the generic pointer type and other pointer types (except pointer-to-function types) whenever you need to:
int x;
int *xp, *yp;
void *vp;

xp = &x;
vp = xp; /* Types are compatible. */
/* later... */
yp = vp; /* Types are compatible. */
A generic pointer cannot be dereferenced, nor can pointer arithmetic [Item 33] be applied to it.
x = *vp; /* error: cannot dereference void * */
vp++;    /* error: cannot do arithmetic on void * */
The generic pointer type simply allows you to tell the compiler that you're taking responsibility for a pointer's interpretation, and so no error messages or warnings are to be reported when assigning. It is the programmer's responsibility to ensure that the pointer value is interpreted as the correct type.
int *ip;
float *fp;
void *vp;

fp = ip; /* error: incompatible types */
vp = ip; /* okay */
fp = vp; /* no compiler error, but is misuse */
Generic pointers are used with dynamic memory management [Item 45], among other things.
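Another use is writing a function that works on objects of any type. The sketch below swaps two objects byte by byte; swap_objects is a hypothetical name, and the caller is assumed to pass the correct size and two non-overlapping objects.

```c
#include <stddef.h>

/* Hypothetical generic swap: exchanges two objects of the same size.
   The caller is responsible for passing pointers to objects that
   really are `size` chars long and do not overlap. */
void swap_objects(void *a, void *b, size_t size)
{
    unsigned char *pa = a;   /* void * converts without a cast */
    unsigned char *pb = b;
    size_t i;
    for (i = 0; i < size; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```

Note that the generic pointers are converted to unsigned char * before use, since a void * cannot itself be dereferenced or indexed.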
Background
- Item 24: Pointers instead of references
Item 33: Arrays and pointer arithmetic
Arrays in Java are object types whose elements are accessed only by integer offset. In C, arrays are groups of variables of the same type guaranteed to be in adjacent memory. An array of integers may look like this:
int array[10]; /* numbered 0 to 9 */
int i = 6;

array[3] = 12;
array[i] = 13;
Allocation for dynamic arrays is handled by the programmer.
Further reading
- Item 34: Initialising arrays
- Item 35: Array-pointer relationship
- Item 36: Passing arrays to functions
- Item 37: Array length
- Item 38: Arrays as function parameters
Item 34: Initialising arrays
Arrays may be initialised when defined:
int myArray[4] = { 9, 8, 7, 6 };
The size is optional in this case, since the compiler sees that there are four elements in the initialiser. The initialiser must not be bigger than the size if specified, but it can be smaller, in which case the remaining elements are set to zero. Either way, the size must be known at compile time: it cannot be an expression in terms of the values of other objects or function calls. C99 relaxes this for arrays defined inside functions (variable-length arrays), but such arrays cannot be given an initialiser.
In C99, you can specify which elements of an array are initialised:
int myArray[4] = { [2] = 7, [0] = 9, [1] = 8, [3] = 6 };
Background
- Item 33: Arrays and pointer arithmetic
Item 35: Array-pointer relationship
The address of an array element can be taken, and simple arithmetic can be applied to it. Adding one to the address makes it point to the next element in the array. Subtracting one instead makes it point to the previous element.
int myArray[4] = { 9, 8, 7, 6 };
int *aep = &myArray[2];
int x, i;

*(aep + 1) = 2;    /* Set myArray[3] to 2. */
*(aep - 1) += 11;  /* Set myArray[1] to 19. */
x = *(aep - 2);    /* Set x to 9. */
By definition, *(aep + i) is equivalent to aep[i], and in many contexts, an array name such as myArray evaluates to the address of the first element, which is how expressions such as myArray[2] work (it becomes *(myArray + 2)). The code above could be written as:
int myArray[4] = { 9, 8, 7, 6 };
int *aep = &myArray[2];
int x, i;

aep[1] = 2;    /* Set myArray[3] to 2. */
aep[-1] += 11; /* Set myArray[1] to 19. */
x = aep[-2];   /* Set x to 9. */
Note that an array name such as myArray cannot be made to point elsewhere:
int myArray[4];
int i;
int *ip;

ip = myArray;  /* Okay: myArray is a legal expression; ip now points to myArray[0]. */
myArray = &i;  /* Error: myArray is not a variable. */
Background
- Item 33: Arrays and pointer arithmetic
Item 36: Passing arrays to functions
Arrays are effectively passed to functions by reference. The array name evaluates to a pointer to the first element, so the function's parameter has a type of ‘pointer-to-element-type’. For example, given the function:
void fill_array_with_square_numbers(int *first, int length)
{
    int i;
    for (i = 0; i < length; i++)
        first[i] = i * i;
}
…we could write code such as:
int squares[4], moresquares[10];
void fill_array_with_square_numbers(int *first, int length);

fill_array_with_square_numbers(squares, 4);
fill_array_with_square_numbers(moresquares + 2, 7);
The second call fills only part of the array moresquares: elements 2 to 8, which the function treats as an array of length 7.
Note that the programmer must take steps to indicate the length of the array, in this case by defining the function to take a length argument. (An alternative would be to identify a special value within the array to mark its end.)
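The alternative of marking the end with a special value might look like the sketch below. The choice of -1 as the marker is purely an assumption for illustration, and sum_until_sentinel is a hypothetical name.

```c
/* Sum the elements up to, but not including, a sentinel value of -1.
   Assumes -1 never occurs as genuine data, and that the sentinel
   is always present; otherwise the loop runs off the end. */
int sum_until_sentinel(const int *first)
{
    int total = 0;
    while (*first != -1)
        total += *first++;
    return total;
}
```

This is the same idea that C strings use, with a null character as the marker.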
Background
- Item 33: Arrays and pointer arithmetic
Item 37: Array length
If the declaration of an array is visible, one can find its length by dividing its total size by the size of one element:
int squares[4];
int len = sizeof squares / sizeof squares[0];
Because squares above is the name of an array, sizeof squares yields its total size as a number of chars. sizeof squares[0] yields the size (in chars) of one element, and since all the elements are of the same size, the ratio of these two sizeofs is the number of elements in the array:
void fill_array_with_square_numbers(int *first, int length);
int squares[4];

fill_array_with_square_numbers(squares, sizeof squares / sizeof squares[0]);
(For arrays of chars, the divisor can be omitted, since sizeof(char) is defined to be 1.)
However, this technique doesn't work if the argument to sizeof is only a pointer that happens to point to an element of an array, rather than an array name. Consider that such a pointer looks identical to a pointer to a single object, as far as the compiler is concerned: neither contains any information about the length. This is why the example function above requires the length as a separate argument: within the function, sizeof first would only give the size of a pointer to an integer, not the length of the array.
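Some programmers wrap the division in a macro [Item 22] to avoid repeating it. The following is a sketch only; ARRAY_LENGTH is a hypothetical name, and it silently gives a meaningless answer if applied to a pointer rather than an array.

```c
#include <stddef.h>

/* Hypothetical convenience macro; only valid when applied to a real
   array whose declaration is visible, not to a pointer. */
#define ARRAY_LENGTH(a) (sizeof (a) / sizeof (a)[0])
```

For example, given int squares[4]; the expression ARRAY_LENGTH(squares) is 4, and given char word[] = "Hello!"; the expression ARRAY_LENGTH(word) is 7, because the terminating null character is counted.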
Background
- Item 33: Arrays and pointer arithmetic
Item 38: Arrays as function parameters
Note that a function parameter of array type isn't treated as an array, but as a pointer. (The array syntax is allowed, but only pointer semantics are implemented.) The following two declarations are equivalent:
void fill_array_with_square_numbers(int *first, int length);
void fill_array_with_square_numbers(int first[], int length);
Within the definition of this function, sizeof first will still equal sizeof(int *), even if we place a length inside the square brackets (such a value is ignored anyway).
Background
- Item 33: Arrays and pointer arithmetic
Item 39: const instead of final
Java uses the keyword final to indicate ‘variables’ which can only be assigned to once (usually where they are declared). C uses the keyword const with an object declaration to indicate a constant object that can (and must) be initialised, but cannot subsequently be assigned to. It is not a variable, but it still has an address and a size, so you can write &obj or sizeof obj.
double sin(double); /* mathematical function sine */
const double pi = 3.14159;
double val;

val = sin(pi); /* legal expression */
pi = 3.0;      /* illegal; not a modifiable lvalue */
Further reading
- Item 40: Pointers to const objects
- Item 41: const pointers
Item 40: Pointers to const objects
const is useful when declaring functions that take pointers or arrays as arguments, but do not modify the dereferenced contents:
int sum(const int *ar, int len)
{
    int s = 0, i;
    for (i = 0; i < len; i++)
        s += ar[i];
    return s;
}

int array[] = { 1, 2, 4, 5 };
int total = sum(array, 4);
The const assures the caller that the invocation will not attempt to assign to *array (or array[1], array[2], etc), even though the elements of array are modifiable in other contexts.
Background
- Item 39: const instead of final
Item 41: const pointers
Pointers themselves can be declared const just like other objects. In these cases, the pointer can't be made to point elsewhere, but this does not prevent modification of what it points to. Careful positioning of the keyword const is required to distinguish constant pointers from pointers to constants:
int array[] = { 1, 2, 4, 5 };
int *ip = array;               /* a pointer to an integer */
int *const ipc = array;        /* a constant pointer to an integer */
const int *const icpc = array; /* a constant pointer to a constant integer */

ipc[0] = ipc[1] + ipc[2]; /* okay */
ip += 2;                  /* okay */
ipc += 1;                 /* wrong; pointer is constant */
icpc[1] += 4;             /* wrong; pointed-to object is constant */
This example shows a modifiable array whose members are being accessed through three pointers with slightly different types.
Background
- Item 39: const instead of final
Item 42: Inline functions
C99 supports inline functions. The programmer can indicate to the compiler that a function's speed is critical by marking it inline:
inline int square(int x)
{
    return x * x;
}
If this definition is in scope, and you make a call to it, the compiler may choose not to translate the C call into a machine-code call, but instead replace it with a copy of the function body, thus avoiding the overhead of a true call.
Inline function definitions can (and often should) appear in header files [Item 21] instead of their prototypes [Item 14]. A normal (‘external’) definition must still be provided — for example, some part of your program may try to obtain a pointer [Item 30] to the function, and only a normal definition can provide that.
If the inline definition is in scope, an equivalent external definition can be generated from it by simply re-declaring the function with extern:
extern int square(int x);
If the inline definition isn't in scope, you could provide a normal definition which doesn't actually match the inline definition — but this could lead to confusing behaviour.
Java doesn't have explicit inline functions, but virtual machines are permitted to inline functions automatically at runtime.
Item 43: Characters and strings
A Java variable of type char can hold any 16-bit Unicode character. In C, the char type can represent any character in a character set that depends on the type of system or platform for which the program is compiled. This is usually a variation of US ASCII, but it doesn't have to be, so beware. In particular, it could be a multibyte encoding, e.g. UTF-8, where each character of a larger set is represented by several char objects. The characters of a basic set, however, are always represented as single chars.
Java strings are objects of class String or StringBuilder, and represent sequences of char. Strings in C are just arrays of, or pointers to, char, and don't exist as a formal type. Functions which handle strings typically assume that the string is terminated with a null character '\0', rather than being passed a length parameter. A character array can be initialised like other arrays:
char word[] = { 'H', 'e', 'l', 'l', 'o', '!', '\0' };
char another[] = "Hello!";
Note that the second initialiser is a shorter form of the first, including the terminating null character. Such a string literal can also appear in an expression. It evaluates to a pointer to the first character.
const char *ptr;
ptr = "Hello!";
ptr now points to an anonymous, statically allocated array of characters. Attempting to write to a string literal like this has undefined behaviour, so the use of const ensures that such attempts are detected while compiling.
Utilities for handling character strings are declared in <string.h>. For example, the function to copy a string from one place to another is declared as:
char *strcpy(char *to, const char *from);
…and may be used like this:
#include <string.h>

char words[100];
strcpy(words, "Madam, I'm Adam.");
Like many of the other <string.h> functions, strcpy assumes that you have already allocated sufficient space to store the string.
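One way to honour that assumption is to check the length before copying. The wrapper below is a sketch only; copy_if_fits is a hypothetical name, not part of <string.h>, and it relies on the caller stating the destination size truthfully.

```c
#include <string.h>

/* Hypothetical wrapper: copy `from` into `to` only if it fits,
   including the terminating null character.
   Returns 0 on success, -1 if the destination is too small. */
int copy_if_fits(char *to, size_t to_size, const char *from)
{
    if (strlen(from) + 1 > to_size)
        return -1;
    strcpy(to, from);
    return 0;
}
```

A caller with char words[8]; would pass sizeof words as the second argument, and must check the returned value before using the contents.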
Further reading
- Item 44: Wide characters
Item 45: Dynamic memory management
Dynamic memory management is built into Java through its new keyword and its garbage collector. In C, it is available through two functions in <stdlib.h> which are declared as:
void *malloc(size_t s); /* Reserve memory for s chars. */
void free(void *);      /* Release memory reserved with malloc. */
(size_t is an alias for an unsigned integral type.)
malloc(s) returns a pointer to the start of a block of memory big enough for s chars. It returns a generic pointer which can be assigned to a pointer variable of any type. The memory is not initialised. All such allocated memory must be released when it is no longer required, by passing a pointer to its start to free(). Only pointer values returned by malloc() can be passed to free().
You can find out the amount of memory needed to store an object of a particular type using sizeof(type). For an array, multiply this by the number of elements required in the array.
long *lp;
long *lap;

lp = malloc(sizeof(long));
lap = malloc(sizeof(long) * 10);
/* Now we can access *lp as a long integer, and lap[0]..lap[9] form an array. */
free(lap);
free(lp);
/* Now we can't. */
malloc() returns a null pointer (0) if it cannot allocate the requested amount of memory.
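Because of this, the result of malloc() should be checked before it is used. The sketch below shows one way to do so; make_filled_array is a hypothetical name, and for brevity it does not guard against the multiplication overflowing for very large counts.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate an array of `count` longs, each set
   to `value`. Returns a null pointer if the allocation fails; the
   caller must pass a non-null result to free() when done. */
long *make_filled_array(size_t count, long value)
{
    long *lap = malloc(sizeof *lap * count);
    size_t i;
    if (lap == NULL)
        return NULL;
    for (i = 0; i < count; i++)
        lap[i] = value;
    return lap;
}
```

The caller in turn checks for a null result, which is how the failure is passed back up in the absence of exceptions.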
Item 46: Lack of exceptions
Java supports exceptions to cover application-defined mistakes as well as more serious system or memory-access errors, such as accessing beyond the bounds of an array.
In C, application-defined error conditions are normally expressed through careful definition of the meaning of values returned by functions. More serious errors, such as an attempt to access memory that hasn't been allocated in some way, may go unnoticed, because the behaviour is undefined. Write-access to such memory may cause corruption of critical hidden data, which only results in an error at a later stage, so the original cause of the error may be difficult to trace. Just because some activity is illegal in C, it doesn't mean that you will necessarily be told about it when you do it, either by the compiler or by the running program.
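A common convention, sketched below under assumed names, is for a function to return a status code and deliver its real result through a pointer. Neither digit_value nor the choice of 0 and -1 as codes comes from any standard; they are illustrative only.

```c
/* Hypothetical convention: return 0 on success and -1 on failure;
   the result is delivered through a pointer and is left untouched
   when the call fails. */
int digit_value(char c, int *out)
{
    if (c < '0' || c > '9')
        return -1;          /* not a decimal digit */
    *out = c - '0';
    return 0;
}
```

Unlike a Java exception, the returned code can be silently ignored, so it is up to the caller to test it every time.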
Item 47: main() function
In a Java application, execution begins in a static method (void main(String[])) of a specified class. In C, execution also begins at a function called main, but it has the following prototype:
int main(int argc, char **argv);
The parameters represent an array of character strings that form the command that ran the program. argv[0] is usually the name of the program, argv[1] is the first argument, argv[2] is the second, …, argv[argc - 1] is the last, and argv[argc] is a null pointer. For example, the command:
myprog wibbly wobbly
…may cause main to be invoked as if by:
char a1[] = "myprog";
char a2[] = "wibbly";
char a3[] = "wobbly";
char *argv[4] = { a1, a2, a3, NULL };
main(3, argv);
The parameters are optional (you can replace them with a single void), but main always returns int in any portable program. Returning 0 tells the environment that the program completed successfully. Other values (implementation-defined) indicate some sort of failure. <stdlib.h> defines the macros EXIT_SUCCESS and EXIT_FAILURE as symbolic return codes.
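Because argv[argc] is a null pointer, the arguments can be scanned without consulting argc at all. The sketch below shows a helper that main might call; has_flag is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: report whether `flag` appears among the
   arguments, skipping argv[0] (the program name). Relies on the
   array being terminated by a null pointer. */
int has_flag(char **argv, const char *flag)
{
    char **ap;
    if (argv[0] == NULL)
        return 0;
    for (ap = argv + 1; *ap != NULL; ap++)
        if (strcmp(*ap, flag) == 0)
            return 1;
    return 0;
}
```

For the command myprog wibbly wobbly, has_flag(argv, "wobbly") would return 1 and has_flag(argv, "myprog") would return 0.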
Item 48: Standard library facilities
Java comes with a rich and still-developing set of classes to support I/O, networking, GUIs, etc, to access a process's environment.
Similarly, the C language has a core of facilities to access its environment. These functions, types and macros form C's Standard Library. However, it is necessarily limited in order to support maximum portability. Here are some obvious omissions:
- GUI
- Networking
- Collections and containers
Access to other facilities is through additional libraries that are usually specific to your platform.
The headers of the C Standard Library are briefly summarised below:
- <stddef.h>: Some essential macros and additional type declarations
- <stdlib.h>: Access to environment; dynamic memory allocation; miscellaneous utilities
- <stdio.h>: Streamed input and output of characters
- <string.h>: String handling
- <ctype.h>: Classification of characters (upper/lower case, alphabetic/numeric etc)
- <limits.h>: Implementation-defined limits for integral types
- <float.h>: Implementation-defined limits for floating-point types
- <math.h>: Mathematical functions
- <assert.h>: Diagnostic utilities
- <errno.h>: Error identification
- <locale.h>: Regional/national variations in character sets, time formats, etc
- <stdarg.h>: Support for functions with variable numbers of arguments
- <time.h>: Representations of time, and clock access
- <signal.h>: Handling of exceptional run-time events
- <setjmp.h>: Restoration of execution to a previous state
C95 additionally provides the following headers:
- <iso646.h>: Alphabetic names for operators
- <wchar.h>: Manipulation of wide-character streams and strings
- <wctype.h>: Classification of wide characters (upper/lower case, alphabetic/numeric etc)
C99 additionally provides the following headers:
- <stdbool.h>: The boolean type and constants
- <complex.h>: The complex types and constants
- <inttypes.h>, <stdint.h>: Integer types of specific or minimum widths
- <fenv.h>: Access to the floating-point environment
- <tgmath.h>: Type-generic mathematics functions
C11 additionally provides the following headers:
- <stdalign.h>: Alignment
- <stdatomic.h>: Atomic types and operations
- <stdnoreturn.h>: Non-returning functions
- <threads.h>: Threads
- <uchar.h>: Unicode characters