C Data Types
The compiler in the C programming language must know the type of the data in order to operate on it. Once the data type of a value is known, it is possible to know the characteristics of that value and how to manipulate it.
There are three basic data types: character (char), integer (int), and floating point (float). Complex types are built on top of them.
Character types
A character type is a single character, and the type declaration uses the char keyword.
char c = 'B';
The above example declares the variable c as a character type and assigns it the letter B.
The C language specifies that character constants must be placed inside single quotes.
Character types are stored in a single byte (8 bits) and are treated as integers by the C programming language, so a character type is an integer with a width of one byte. Each character corresponds to an integer (its ASCII code); for example, B corresponds to the integer 66.
The default range of the character type varies from computer to computer. Some systems default to -128 to 127, while others default to 0 to 255. Both ranges cover the ASCII character range of 0 to 127.
Integers and characters are interchangeable: an integer can be assigned to a variable of the character type as long as it is within the range of the character type.
char c = 66;
// equal to
char c = 'B';
In the above example, the variable c is a character type and the value assigned to it is the integer 66, which has the same effect as assigning the character B.
Two variables of character type can perform mathematical operations.
char a = 'B'; // equal to char a = 66;
char b = 'C'; // equal to char b = 67;
printf("%d\n", a + b); // output 133
In the above example, the character variables a and b are added together as if they were two integers. The placeholder %d prints a decimal integer, so the output is 133.
The single quote itself is also a character, and in order to represent this character constant, it must be escaped using a backslash.
char t = '\'';
In the above example, the variable t holds the single-quote character. Since character constants must be placed inside single quotes, the inner single quote is escaped with a backslash.
Escape sequences are mainly used to represent non-printable control characters defined in ASCII, which are also character values.
- \a: alert, which causes the terminal to sound an alarm, blink, or both.
- \b: backspace, the cursor goes back one character but does not delete it.
- \f: form feed, the cursor moves to the next page.
- \n: newline character.
- \r: carriage return character.
- \t: tab character, the cursor moves to the next horizontal tab position, usually the next multiple of 8.
- \v: vertical tab, the cursor moves to the next vertical tab position, usually the same column of the next line.
- \0: null character, representing no content. Its value is the integer 0; note that it is not the same as the digit character '0', whose ASCII code is 48.
A character can also be written as an escape sequence using its octal or hexadecimal code.
char x = 'B';
char x = 66;
char x = '\102'; // octal
char x = '\x42'; // hexadecimal
All four of the above examples are written in equivalent ways.
Integer Types
The integer type is used to represent larger integers than a character can hold, and the type declaration uses the int keyword.
int a;
The above example declares an integer variable a.
The size of the int type varies from computer to computer. It is most common to use 4 bytes (32 bits) to store a value of type int, but 2 bytes (16 bits) or 8 bytes (64 bits) can also be used. The range of integers they can represent is as follows.
- 16-bit: -32,768 to 32,767.
- 32-bit: -2,147,483,648 to 2,147,483,647.
- 64-bit: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
signed, unsigned
The C programming language uses the signed keyword to indicate that a type has a sign and can hold negative values, and the unsigned keyword to indicate that the type has no sign and can only represent zero and positive integers.
For the int type, the default is signed, that is, int is equivalent to signed int. Since this is the default, the keyword signed is usually omitted, but writing it is not an error.
signed int a;
// equal to
int a;
The int type can also be used without a sign, to represent only non-negative integers. In this case, the variable must be declared with the keyword unsigned.
unsigned int a;
The advantage of declaring an integer variable as unsigned is that the maximum integer value that can be represented in the same amount of memory is roughly doubled. For example, the maximum value of a 16-bit signed int is 32767, while the maximum value of a 16-bit unsigned int is 65535.
The int keyword in unsigned int can be omitted, so the above variable declaration can also be written as follows.
unsigned a;
The character type char can also be declared signed or unsigned.
signed char c; // range -128 to 127
unsigned char c; // range 0 to 255
Subtypes of integers
If the int type uses 4 or 8 bytes to represent an integer, this is a waste of space for small integers. On the other hand, in some cases larger integers are needed than int can hold. To solve these problems, C provides three integer subtypes in addition to the int type. This allows the range of integer variables to be chosen more precisely and better expresses the intent of the code.
- short int (abbreviated as short): occupies no more space than int, generally 2 bytes (the integer range is then -32768 to 32767).
- long int (abbreviated as long): occupies no less space than int, at least 4 bytes.
- long long int (abbreviated as long long): occupies no less space than long, at least 8 bytes.
short int a;
long int b;
long long int c;
The above code declares three variables of integer subtypes.
By default, short, long, and long long are signed, i.e., the signed keyword is omitted. They can also be declared unsigned, which roughly doubles the maximum value that can be represented.
unsigned short int a;
unsigned long int b;
unsigned long long int c;
C allows the int keyword to be omitted, so the variable declaration statements can also be written as follows.
short a;
unsigned short a;
long b;
unsigned long b;
long long c;
unsigned long long c;
The byte lengths of data types differ between computers. When you really need a 32-bit integer, you should use the long type, which is guaranteed to be no less than 4 bytes, instead of the int type; when you really need a 64-bit integer, you should use the long long type, which is guaranteed to be no less than 8 bytes. On the other hand, to save space, you should use the short type when only a 16-bit integer is needed, and the char type when an 8-bit integer is needed.
Limit values for integer types
Sometimes we need to check the maximum and minimum values of different integer types on the current system. The C header file limits.h provides the corresponding constants, such as SCHAR_MIN for the minimum value of the signed char type (-128) and SCHAR_MAX for its maximum value (127).
For the sake of code portability, you should try to use these constants when you need to know the limit value of some integer type.
- SCHAR_MIN, SCHAR_MAX: the minimum and maximum values of signed char.
- SHRT_MIN, SHRT_MAX: the minimum and maximum values of short.
- INT_MIN, INT_MAX: the minimum and maximum values of int.
- LONG_MIN, LONG_MAX: the minimum and maximum values of long.
- LLONG_MIN, LLONG_MAX: the minimum and maximum values of long long.
- UCHAR_MAX: the maximum value of unsigned char.
- USHRT_MAX: the maximum value of unsigned short.
- UINT_MAX: the maximum value of unsigned int.
- ULONG_MAX: the maximum value of unsigned long.
- ULLONG_MAX: the maximum value of unsigned long long.
Integers in Different Bases
The integers in C are decimal numbers by default. If you want to represent octal and hexadecimal numbers, you must use a specialized representation.
Octal uses 0 as a prefix, such as 017, 0377.
int a = 012; // octal, equivalent to 10 in decimal
Hexadecimal uses 0x or 0X as a prefix, such as 0xf, 0X10.
int a = 0x1A2B; // hexadecimal, equivalent to 6699 in decimal
Many compilers also accept the 0b prefix for binary numbers. It was standardized only in C23; before that it was a compiler extension.
int x = 0b101010;
The printf() placeholders for integers in different bases are as follows:
- %d: decimal integer.
- %o: octal integer.
- %x: hexadecimal integer.
- %#o: displays an octal integer prefixed with 0.
- %#x: displays a hexadecimal integer prefixed with 0x.
- %#X: displays a hexadecimal integer prefixed with 0X.
int x = 100;
printf("dec = %d\n", x); // 100
printf("octal = %o\n", x); // 144
printf("hex = %x\n", x); // 64
printf("octal = %#o\n", x); // 0144
printf("hex = %#x\n", x); // 0x64
printf("hex = %#X\n", x); // 0X64
Floating-point type
Any value with a decimal point will be interpreted by the compiler as a floating point number.
The type declaration for floating point numbers uses the float keyword, which can be used to declare floating-point variables.
float c = 10.5;
In the above example, the variable c is a floating-point type.
The float type usually takes up 4 bytes (32 bits): 8 bits hold the exponent and its sign, and the remaining 24 bits hold the fractional part and the sign of the number. The float type provides at least 6 significant decimal digits, and the decimal exponent ranges from at least -37 to 37.
Sometimes the precision or range of values provided by 32-bit floating-point numbers is not enough, so C provides two other larger floating-point types.
- double: usually occupies 8 bytes (64 bits). The standard guarantees at least 10 significant decimal digits, and the common 64-bit format provides about 15.
- long double: often occupies 16 bytes, but the size varies between systems.
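Note that floating-point numbers have limited precision and are only approximations, so comparing them for exact equality may give unexpected results.
if (0.1 + 0.2 == 0.3) // false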
C allows the use of scientific notation for floating-point numbers, using the letter e to separate the decimal part from the exponent.
double x = 123.456e+3; // 123.456 x 10^3
// equal to
double x = 123.456e3;
Boolean type
C originally did not have a separate type for Boolean values, but instead used the integer 0 for false and all non-zero values for true.
int x = 1;
if (x) {
    printf("x is true!\n");
}
In the above example, the variable x is equal to 1. C treats this value as true and therefore executes the body of the if statement.
The C99 standard adds the _Bool type, which represents a boolean value. However, this type is really just an unsigned integer type that can only hold 0 and 1, and it still uses 0 for false and 1 for true, as shown in the example below.
_Bool isNormal;
isNormal = 1;
if (isNormal)
    printf("Everything is OK.\n");
The header file stdbool.h defines the type alias bool for _Bool, and defines true as 1 and false as 0. These names can be used as long as this header file is included.
#include <stdbool.h>
bool flag = false;
In the above example, after including the header file stdbool.h, you can use bool to declare a boolean variable.
Literal types
A literal is a value that appears directly inside the code.
int x = 123;
In the above code, x is the variable and 123 is the literal.
Literals are also written to memory at compile time, so the compiler must assign a data type to each literal, just as a data type must be specified for each variable.
Normally, a decimal integer literal (e.g. 123) is given the type int. If the value is larger than what int can represent, the compiler gives it the type long int; if it exceeds long int as well, the type is long long int (since C99). Octal and hexadecimal literals may also be given the corresponding unsigned type when the value does not fit the signed one.
Literals with a decimal point (e.g. 3.14) are given the type double.
Literal suffixes
Sometimes a programmer wants to specify a different type for a literal. For example, if the compiler would give an integer literal the type int but the programmer wants it to be long, the literal can be suffixed with l or L, and the compiler will give the literal the type long.
int x = 123L;
In the above code, the literal 123 has the suffix L, and the compiler will give it the type long.
Octal and hexadecimal values can also be given the type long using the suffixes l and L, such as 020L and 0x20L.
int y = 0377L;
int z = 0x7fffL;
If you wish to specify the unsigned int type, you can use the suffix u or U.
int x = 123U;
L and U can be used in combination to represent the unsigned long type. The case and combination order of L and U do not matter.
int x = 123LU;
For floating point numbers, the compiler specifies the double type by default. If you wish to specify another type, add the suffix f (float) or l (long double) after the number.
The following literal suffixes are commonly used.
- f and F: float type.
- l and L: long int for integers, long double for floating-point numbers.
- ll and LL: long long type, such as 3LL.
- u and U: unsigned int, such as 15U, 0377U.
Below are some examples.
int x = 1234;
long int x = 1234L;
long long int x = 1234LL;
unsigned int x = 1234U;
unsigned long int x = 1234UL;
unsigned long long int x = 1234ULL;
float x = 3.14f;
double x = 3.14;
long double x = 3.14L;
Overflow
Each data type has a range of values. If a value falls outside this range (less than the minimum or greater than the maximum), it needs more binary bits than the type provides, and an overflow occurs. A value greater than the maximum is called an overflow; a value less than the minimum is called an underflow.
Generally, the compiler will not report an error for overflow and will execute the code normally. For unsigned types, the extra binary bits are ignored and only the remaining bits are kept, which often gives unexpected results; for signed types, the behavior on overflow is undefined. Therefore, overflow should be avoided.
unsigned char x = 255;
x = x + 1;
printf("%d\n", x); // output: 0
In the above example, the variable x is incremented by 1. The result is not 256 but 0, because x is an unsigned char with a maximum value of 255 (binary 11111111). After adding 1, an overflow occurs: the highest bit of 256 (binary 100000000), the leading 1, is discarded, leaving the value 0.
See the following example again:
unsigned int ui = UINT_MAX; // 4,294,967,295
ui++;
printf("ui = %u\n", ui); // 0
ui--;
printf("ui = %u\n", ui); // 4,294,967,295
In the above example, the constant UINT_MAX is the maximum value of the unsigned int type. Adding 1 overflows the type and yields 0. Since 0 is the minimum value of the type, subtracting 1 then yields UINT_MAX again.
Overflows are easy to ignore and the compiler doesn’t report errors, so you have to be very careful.
for (unsigned int i = n; i >= 0; --i) // error
The above code seems to be fine, but the type of the loop variable i is unsigned int, and the minimum value of this type is 0, so it is impossible to get a result less than 0. When i is equal to 0 and 1 is subtracted from it, the result is not -1 but the maximum value of type unsigned int. The condition i >= 0 is therefore always true, resulting in an infinite loop.
To avoid overflow, the best way is to compare the result of the operation with the limit value of the type.
unsigned int a;
unsigned int b;
// error
if (a + b > UINT_MAX) too_big();
else b = a + b;
//correct
if (a > UINT_MAX - b) too_big();
else b = b + a;
In the above example, the variables b and a are both unsigned int, and their sum is still unsigned int, so there is a possibility of overflow. The correct way is to compare UINT_MAX - b with a.
Here is another wrong way to write it.
unsigned int i = 5;
unsigned int j = 7;
if (i - j < 0) // error
    printf("negative\n");
else
    printf("positive\n");
The above example outputs "positive", because both variables i and j are of the unsigned int type, so the result of i - j is also of this type, whose minimum value is 0. It is impossible to get a result less than 0.
sizeof operator
sizeof is an operator provided by the C programming language that returns the number of bytes occupied by a data type or a value. Its argument can be the keyword of a data type, a variable name, or a specific value.
// The argument is a data type
int x = sizeof(int);
// The argument is a variable
int i;
sizeof(i);
// parameter is a numeric value
sizeof(3.14);
The first example above returns the number of bytes occupied by the int type (usually 4).
The second example returns the number of bytes occupied by an integer variable, and the result is exactly the same as in the previous example.
The third example returns the number of bytes occupied by the floating-point number 3.14. Since floating-point literals without a suffix have the type double, it returns the size of double, which is usually 8 bytes.
C specifies only that the sizeof operator returns an unsigned integer; it does not specify a specific type, but leaves it up to the system to decide. The return type may be unsigned int, unsigned long, or even unsigned long long on different systems, and the corresponding printf() placeholders are %u, %lu, and %llu. This is inconvenient for program portability.
C provides a solution by creating a type alias, size_t, to uniformly represent the return type of sizeof. This alias is defined in the stddef.h header file (stdio.h also defines it, so including stdio.h is enough) and corresponds to the return type of sizeof on the current system, which may be, for example, unsigned int or unsigned long.
C also provides a constant SIZE_MAX, defined in stdint.h, which indicates the maximum integer that size_t can represent. Therefore, the range of integers that size_t can represent is [0, SIZE_MAX].
printf() has a special placeholder, %zu, to handle values of type size_t. (The related placeholder %zd is meant for the corresponding signed type.)
printf("%zu\n", sizeof(int));
In the above code, the %zu placeholder prints the value correctly regardless of the underlying type of the sizeof return value.
If the current system does not support %zu, you can cast the value and use %u (unsigned int) or %lu (unsigned long int) as an alternative.
Automatic type conversion
In some cases, C will automatically convert the type of a value.
Assignment Operation
The assignment operator automatically converts the value on the right to the type of the variable on the left.
Assigning floating-point numbers to integer variables
When floating point numbers are assigned to integer variables, C discards the fractional part directly, rather than rounding.
int x = 3.14;
In the above example, the variable x is an integer type and the value assigned to it is a floating-point number. The compiler first automatically converts 3.14 to int, discarding the fractional part, and then assigns that value to x, so the value of x is 3.
This automatic conversion may result in the loss of some data (3.14 loses its fractional part), so it is better not to assign values across types, and to make sure that a variable and the value assigned to it have the same type.
Assigning integers to floating-point variables
Integers are automatically converted to floating-point numbers when assigned to floating point variables.
float y = 12 * 2;
In the above example, the value of the variable y is not 24, but 24.0, because the integer to the right of the equal sign is automatically converted to a floating-point number.
Wide and Narrow typecast in C
When a narrow byte-width integer type is assigned to a wide byte-width integer variable, the narrow type is automatically converted to a wide type.
For example, a char or short value assigned to an int variable is automatically converted to int.
char x = 10;
short y = 20;
int i = x + y;
When a type with a wider byte width is assigned to a variable with a narrower byte width, a type degradation occurs and the type is automatically converted to a type with a narrower byte width. This may result in truncation, where the system automatically truncates the extra binary bits, leading to unpredictable results.
int i = 321;
char ch = i; // the value of ch is 65 (321 - 256)
In the above example, the variable ch is a char type with a width of 8 binary bits. The variable i is an int type, and i is assigned to ch. ch can only hold the last 8 bits of i (101000001 in binary form, 9 bits in total), so the extra binary bit in front is discarded, keeping the last 8 bits, 01000001 (65 in decimal, equivalent to the character A).
Mixed Type Arithmetic
When values of different types are mixed together for calculation, they must be converted to the same type before calculation. The conversion rules are as follows.
When mixing integer and floating point operations, integers are converted to floating point types.
3 + 1.2 // 4.2
The above example is a mix of an int value and a floating-point value (the literal 1.2 is a double). 3 is converted to the floating-point value 3.0 and then the sum is calculated to get 4.2.
- When different floating-point types are mixed, the type with narrower width is converted to the type with wider width, such as float to double and double to long double.
- When different integer types are mixed, the type with a narrow width is converted to the type with a wider width. For example, short to int, int to long, etc.
Function Return Type
The parameters and return values of the function are automatically converted to the types specified in the function definition.
int testfunc(int, unsigned char);
char a = 10;
unsigned short b = 20;
long long int c = testfunc(a, b);
In the above example, the arguments a and b are converted to the parameter types declared by the function testfunc(), regardless of their original types.
The following is an example of automatic type conversion of a function return value.
char testfunc(void) {
    int a = 65;
    return a;
}
In the above example, the variable a inside the function is an int type, but the returned value is a char type because that is the type returned in the function definition.
Explicit Type Conversion
We should avoid automatic type conversions to prevent unexpected results. Instead, C provides explicit type conversions, which make the conversion visible in the code.
A value or variable can be converted to a given type by writing the type in parentheses in front of the value or variable. This is called "casting".
(unsigned char) ch
The above example converts the variable ch to an unsigned character type.
Portability Type
The integer types in C (short, int, long) may occupy different byte widths on different computers, and it is not possible to know exactly how many bytes they occupy in advance.
For better portability of C programs, the header file stdint.h creates some new type aliases.
Exact-width integer types, which guarantee that the width of an integer type is fixed.
- int8_t: 8-bit signed integer.
- int16_t: 16-bit signed integer.
- int32_t: 32-bit signed integer.
- int64_t: 64-bit signed integer.
- uint8_t: 8-bit unsigned integer.
- uint16_t: 16-bit unsigned integer.
- uint32_t: 32-bit unsigned integer.
- uint64_t: 64-bit unsigned integer.
All of the above are type aliases, and the compiler decides which underlying type they point to. For example, on a given system, if the int type is 32-bit, int32_t will point to int; if the long type is 32-bit, int32_t will point to long.
Here is an example of usage.
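#include <inttypes.h> // PRId32
#include <stdint.h>
#include <stdio.h>
int main(void) {
    int32_t x32 = 45933945;
    // PRId32 expands to the correct printf placeholder for int32_t
    printf("x32 = %" PRId32 "\n", x32);
    return 0;
}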
In the above example, the variable x32 is declared as type int32_t, which is guaranteed to be 32 bits wide.
Minimum-width types, which guarantee the minimum width of an integer type.
- int_least8_t
- int_least16_t
- int_least32_t
- int_least64_t
- uint_least8_t
- uint_least16_t
- uint_least32_t
- uint_least64_t
The types above are guaranteed to be no narrower than the specified number of bits. For example, int_least8_t is the narrowest signed type that can hold an 8-bit signed integer.
Fast minimum-width types, the types that enable the fastest integer calculation.
- int_fast8_t
- int_fast16_t
- int_fast32_t
- int_fast64_t
- uint_fast8_t
- uint_fast16_t
- uint_fast32_t
- uint_fast64_t
The above types guarantee the minimum width while aiming for the fastest arithmetic speed; for example, int_fast8_t indicates the fastest type for signed integers of at least 8 bits.
Integer types that can hold a pointer.
- intptr_t: signed integer type that can store a pointer (memory address).
- uintptr_t: unsigned integer type that can store a pointer.
Maximum-width integer types, for storing the largest integers.
- intmax_t: a type that can store any valid signed integer.
- uintmax_t: a type that can store any valid unsigned integer.
These two types are at least as wide as long long and unsigned long long, respectively.