In C programming, strings and characters are important data types. Learn what they are and how to use them like an expert, even if you’re a beginner.

Most programming languages distinguish characters from integer values. Under the hood, though, every computer stores characters as whole numbers, and C lets you work with both representations.

Characters are stored as numbers ( image )

→ For more information on how data are represented in a computer, read this article by BBC

In fact, one of the most amazing features of the C language is its ability to control the computer both at low-level – machine language – and high-level – human language.

The same concept applies also to strings.

Strings are stored in computer memory as sequences of characters, so they are similar to arrays – with a little difference that we’ll discuss later in this post.

 

Characters and ASCII


 
In order to translate numbers to characters, computers use encoding systems which link each character to a numeric code.

Most C implementations use ASCII ( American Standard Code for Information Interchange ), which defines 128 symbols. Each symbol fits in 1 byte and uses only half of the 256 values possible with 8 bits.

Ascii table from Wikipedia

→ You can convert a character to decimal, hex and binary code on RapidTables

This amount is enough to represent all English symbols, but not the symbols of many other languages, in particular Chinese and Japanese.

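Because of that, many other encoding systems have been created after ASCII, such as Unicode with its UTF-8 and UTF-16 encodings. A plain char can’t hold most of their symbols in a single byte, so C handles them with multi-byte and wide characters.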

Let me know if you want me to make a post about multi-byte characters, in order to write in other languages.

 

Char


 
The data type used to store characters is char.

char is an integral data type, similar to int, with one difference:

  • the first takes up 1 byte;
  • the second 2 or 4 bytes – depending on the compiler.

Characters take up 1 byte

→ Compare char with other data types at GeeksforGeeks

A signed char can contain values from -128 to 127 ( whether a plain char is signed depends on the compiler ):

  • values from 0 to 127 represent ASCII’s characters
  • values from -128 to -1 ( or 128 to 255 using the unsigned modifier ) represent Extended ASCII’s characters

In order to assign a value to a char variable, you can use either the character’s numerical code or the character itself between apostrophes.

Example:

65 is the ASCII code that stands for A, so both the methods do the same thing.

You must use apostrophes ( ‘ ) for characters and quotation marks ( ” ) for strings.

 

Read and Write characters


 
In order to read a character from the command line, you can use scanf the same way you do for int.

However, you can choose between two different special characters:

%d to read a decimal ASCII code ( e.g. 65 )
%c to read a character ( e.g. A )

→ To learn more about format specifiers in C, read this

If you ask the user to enter a symbol – such as A – you have to read it using %c.

If you ask for the numerical code of a symbol – such as 65 – you have to read it with %d into an int variable, because %d expects an int and not a char.

In order to write a character on the command line, you can use printf – as we’ve done so far for int – with the same special characters we’ve just seen.

If I define a variable…

… I can print the character it contains using

and its ASCII code using

To sum up: you can continue using printf and scanf, writing:

%c for characters
%d for whole numbers – including ASCII codes

 

Operations with char


 
Being a whole number, char supports all operations that int does, obviously inside the limits of its possible values.

Therefore you can sum and compare two characters.

For example, ‘A’ (= 65 ) + ‘#’ (= 35 ) = ‘d’ (= 100 )

However, operations of this kind are rarely useful, while comparisons and another operation – that we’re about to see – are very useful.

The operation I’m talking about is converting uppercase to lowercase and vice versa.

In fact in the ASCII table the difference between each lowercase letter and its uppercase version is always 32.

That said, we’re going to learn how to do this conversion first by ourselves and then using a library.

Uppercase to lowercase ( Mathematically )

Let’s explain this piece of code.

First of all there are some comments ( indicated by the // ), which, in case you don’t know, are ignored by the compiler.

There is a variable called uppercase, initialised to ‘B’ and a variable lowercase which isn’t initialised.

The program prints the value of the first one and then it sets the second one equal to the sum of uppercase plus 32.

In the end it prints the value of lowercase.

Copy and paste this piece of code inside the main function of your program and it’ll convert the character ‘B’ to ‘b’.

Today’s program does almost the same thing: it converts each uppercase letter to lowercase and vice versa.

Keep reading to find out how it works.

Uppercase to lowercase ( using a library )

The library ctype.h contains two functions to convert letters:

int tolower(int c); it returns the lowercase version of the character c
int toupper(int c); it returns the uppercase version of the character c

Here’s a comparison between the previous program with and without the library ctype.h



 

Strings


 
By definition, a string is an array of characters that ends with a null character.

A null character, also called null terminator or NUL, is a special character represented by either ‘\0’ or 0.

  • ‘\0’ is a single character, like ‘\n’
  • 0 is its ASCII code

Why do only character arrays end with a null character? Why not integer arrays?

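As a control character, NUL does literally nothing.

It was originally sent to printers as padding, to give the mechanism time to return to the first printing position on the next line.

Because it never appears in ordinary text, in programming it’s used to indicate the end of a string. An integer array has no such spare value: 0 is a perfectly valid number.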

Do you remember when last time I said that the length of an array is the number of elements it contains?

Well, when using strings you must remember that the null character counts as an element as well.

So, if you want to define a string of 4 letters, you can’t set the array’s size to 4, but to 5.

 

Defining a string


 
In order to create a string variable, you have to define a char array of length x+1, where:

  • x is the number of symbols the string contains
  • +1 is used to include the null terminator

Let’s make a string containing the word “Hello!”.

→ If you missed the previous article about arrays, click here

“Hello!” is made of 6 symbols, so the length of the array is 6+1=7.

When you assign a constant value to a string, you must use quotation marks, not apostrophes as you do for characters.

Since a string is an array, you can also initialise it as a normal array, specifying the value of each element.

Each element of a string is a character, so we use apostrophes.

To sum up:

‘ ‘ is used for single characters
” “ is used for strings

Did you know that if you initialise only some elements of an array, the remaining ones are automatically set to 0?

And 0 is equal to ‘\0’, the null terminator.

So I can also write:

The last element of word – word[6] – is automatically set to 0.

Important: don’t assign to an array or a string more values than the number of elements it can contain.

Since C can control the computer memory at low-level, if you try to insert 5 values in an array with length = 4, you can end up modifying a part of the memory reserved for another variable and the effects can be unpredictable.

You can either change the value of another variable in your program or crash the program.

→ To learn more about this topic, have a look at this

The following program, in fact, is potentially harmful for your machine and I’m not going to try it on mine:

 

Reading and Writing strings


 
There are a lot of methods to read and write strings.

In order to print a string on the command line you can use printf and the special character %s.

The following piece of code

prints the value of the array word that we defined in the previous chapter.

Reading a string is more complex.

Why is that?

As I said previously, you must make sure that the number of elements that are assigned to an array isn’t greater than its length.

You can either use scanf with some modifications to the special character %s, or two other functions defined in stdio.h: gets and fgets.

scanf

If you want to use scanf, you must make sure of two things:

  1. It mustn’t read more characters than the string can contain;
  2. It shouldn’t stop at the first white space

The basic %s doesn’t care about the length of what it’s reading and it stops at the first white space.

Solving the 1st problem

To solve the first problem, you can use a modifier.

A modifier changes the behaviour of special characters, such as %s.

First of all, %s, %d, %c aren’t special characters as I’ve been calling them so far.

I’ve been doing it to make them more understandable.

Their real name is format specifiers and, for scanf, their syntax is:

%[*][width][length]specifier

where:

% indicates the start of the format specifier
[*] An asterisk tells the program to read a value and skip it without storing it
[width] A number that specifies the maximum number of characters to be read
[length] It alters the expected amount of storage used by a value.
We’re going to analyse it better in a future post.
specifier The letter that sets the type of the value, such as s, d or c

In order to prevent scanf from reading more characters than it should, you have to replace [width] with the array’s length – 1 ( because you have to leave space for the null terminator ).

For example:

or

and so on and so forth.

As you may have noticed, when scanf reads into a string you don’t put & before the variable’s name.

Do you want to know why?

If so, follow the blog and wait for the article about pointers. You can do it by writing a comment below and marking the “Follow” box or by going to the Home Page and subscribing to the newsletter.

Solving the 2nd problem

In order to solve the second problem, let me introduce a new modifier to you.

By using [^x] in place of the s, where x is a character – not between apostrophes – you can make scanf stop at that character and not at the first white space.

For example:

This piece of code stops only when it encounters the ‘\n’ character, which is found at the end of a line – basically, where you’ve pressed enter on the command line.

So, in the end the code you should use to read a string using scanf is:

where l is str’s length – 1 ( because you have to leave an element for the null character) . For example:

or

gets

gets is another function from stdio.h

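Its syntax is:

gets(string);

This function reads a whole line from the terminal, with no extra tricks needed.

However, there’s no way to limit the number of characters read, so a long line overflows the array. For this reason gets was removed from the language in the C11 standard.

In short: don’t use it.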

fgets

fgets is an advanced version of gets and it’s what most programmers prefer to use.

It lets you set the maximum number of characters to read and the stream from which to read the line.

Furthermore, you can directly insert the length of the array and it will automatically save space for the null terminator.

Its syntax is:

fgets(string, length, stream);

Where:

string is a string variable
length is the maximum number of characters to store, including the null terminator
stream is the place from which the characters are read

This function also allows you to read from a file.

In a future post we’re going to talk about files and we will come back to this function and in particular the argument stream.

For now, set stream to stdin, that is the input of the command line – while stdout is the output of the command line.

For example:

Note: while using scanf for an array of length=10 you have to write 10-1=9, using fgets you directly use the whole length.

 

string.h


 
In today’s program – that you can find above, at the beginning of this post – I’ve used the library string.h and one of its functions strlen.

The library string.h contains lots of useful functions for strings.

Since strings are arrays, there can be some errors and unexpected results when you try to use operators.

The functions contained in string.h are used to make operations with strings avoiding errors.

The list is long and you can find it all at Tutorials Point.

I think these are the most important:

strcpy(destination, source); It copies the string source into destination
strcat(destination, source); It appends source to the end of destination
strcmp(str1, str2); It compares str1 to str2 and it returns 0 if they’re equal
strchr(str, c); It returns the part of the string str that starts at the first occurrence of the character c, or NULL if c isn’t found
strlen(str); It returns the length of a string – up to ‘\0’ and not counting it, not the length of the whole array
strtok(str, delim); It returns the first piece of str that comes before any of the characters in delim, and it modifies str while doing so

There are lots of other functions, but this post is getting long and I can’t cover them all.

If you want to practice using strings, check these exercises from w3resource

 

Let’s analyse today’s program


 
We’re about to go through the new lines of code used in today’s program.

Its objective is to convert each lowercase letter in a string entered by the user to uppercase and vice versa.

#include <string.h>

#include is used to include a library, that is a set of functions and values already written.

As we use it for stdio.h – which contains the main input/output functions of C – we can use it also to include string.h, which contains useful functions to manage strings.

In that program I used the strlen function from string.h.

char str[20]

We are defining an array ( […] ) of characters ( char ) containing 20 elements ( [20] ).

Since an array of characters is a string, we are basically creating a string.

fgets(str, 20, stdin)

It’s the function most programmers prefer for reading a line from the terminal.

It reads at most 19 characters from the stdin stream, that is the command line, adds the null terminator – 20 elements in total – and assigns the result to the string str.

for(int i=0; i<strlen(str); i++)

This is a for loop that goes through the whole string - character by character.

strlen(str) gets the length of str, so it will stop at the null character of str.

if(str[i]>=65 && str[i]<=90)

This line of code checks if the current character analysed in the loop, str[i], is a value between 65 ( 'A' ) and 90 ( 'Z' ).

str[i] += 32

If the previous if statement is true, this line is executed.

It adds 32 to the decimal code of the current character, to convert the uppercase character to lowercase.

else if(str[i]>=97 && str[i]<=122)

If the previous conditional statement is false and the character is a lowercase letter - a value between 97 ( 'a' ) and 122 ( 'z' ) - the next line is executed.

str[i] -= 32

By subtracting 32 from a lowercase character, you turn it into an uppercase one.

printf("Uppercase to lowercase and vice versa: %s", str);

It prints the new value of the string.

 

Translating the code to human language


 
Computer, include stdio.h
Include string.h

Here's the main function: start from here
Create a string called str of 20 characters, including the null terminator
Print "Enter a string: "
Read a maximum amount of 20 characters from the command line and assign it to str
Get the real length of str and for each one of its elements do this:

If the current character of str is uppercase

Turn it lowercase

Otherwise, if it's lowercase

Turn it uppercase

Print "Uppercase to lowercase and vice versa:" followed by the value of str.

 

C and C++ Comparison



Apart from the usual differences, there are 3 others.

The first isn't in the program, but it's the most important.

C++ is an object-oriented programming language, and its standard library includes a header called string, with a new data type just for strings.

We haven't introduced the object-oriented paradigm, so I used the C strings library, which in C++ is called cstring.

The other 2 differences are:

  1. cstring instead of string.h - they're the same library, only the name is different
  2. cin.getline(string, length) instead of fgets(string, length, stream): they're similar, but fgets also requires the input stream

 

Conclusion


 
Today you've learnt how to use characters and strings.

Now you have a complete understanding of how characters work and how to operate with them both in human and in machine language.

Many programming languages don't let you control these types of variables at such a low level ( the lower the level, the higher the control ).

Furthermore, you've learnt one of the most important features of C: strings.

Now you know the difference between a simple array and a string and that only strings can be printed with a single format specifier ( %s ).

If you have any questions, feel free to ask them in a comment.

If you think this might help a friend of yours, share it: it would make my day.

Anyway, the post is finished, have a good day and see you next week!

That's all from Zephyro, bye!
