Introduction to Data Structures: Arrays

Updated on April 11, 2018

What is a data structure?

A data structure is a method for organising a set of data. The structure is defined by how the data is stored and how operations, such as data access, insertion and deletion, are performed on the stored data. Data structures are essential tools for programmers, as each structure has a set of benefits that make it useful for solving certain types of problem.

General idea

An array is used to store a fixed number of data elements of the same data type. A single block of memory is set aside to store the whole array. The array's data elements are then contiguously stored within the designated block.

Conceptually, an array is best thought of as a collection of items that are related in someway. For example, an array storing numbers that represent the value of the cards within your hand while playing poker. Arrays are the most commonly used data structure and as such are directly included within most programming languages.

An example array, called Numbers, storing five integers. Stored data is coloured blue.
An example array, called Numbers, storing five integers. Stored data is coloured blue.

Initialisation

Like any other variable, arrays should be initialised before being used in the program. C++ provides different methods to initialise an array. Each array element can be set manually by looping over each array index. Alternatively, an initialiser list can be used to initialise the whole array in a single line. Different variations of the initialiser list syntax are allowed, as shown in the code below. An empty list will initialise the array to contain zeros or specific values for each element can be specified.

//Declaration without initialisation
int test1[4]; //test1 = [four unknown values]
//Manually setting each value
for(int i{0}; i < 4; i++){
	test1[i] = i + 1;
}
//test1 = [1,2,3,4]

//Using an initialiser list
int test2[4] {}; //test2 = [0,0,0,0]
int test3[4] {1,2,3,4}; //test3 = [1,2,3,4]
int test4[4] {1}; //test4 = [1,0,0,0]
int test5[] {1,2,3,4}; //test5 = [1,2,3,4]

Accessing data

Array elements are accessed through requesting an array index. In C++ this is done through the subscript operator, the syntax being: "Array_name[int index]". Arrays are zero-indexed, this means that the first element is given the index 0, the second element is given the index 1 and up to the last element being given an index equal to 1 less than the array's size.

Because the array's data is stored contiguously, it is simple for the computer to find the requested data element. The array variable stores the starting memory address of the array. This can then be moved forward by the requested index multiplied by the size of the data type stored in the array, reaching the starting address of the requested element. Storing the array as a block of memory also allows the computer to implement random access of individual elements, this is a fast operation, scaling as O(1).

Insertion and deletion

Inserting a new element or deleting a current array element isn't possible due to the restriction of the array being a fixed size. A new array (bigger or smaller by one element) would have to be created and the relevant elements copied over from the old array. This makes the operations inefficient and best handled by using a dynamic data structures instead of using an array.

Passing arrays to a function

In C++, the default method for passing parameters into functions is passing by value. You would then expect that passing an array would create a copy of the whole array. This isn't the case, instead the address of the first array element is passed by value. It is said that the array decays to a pointer (it can even be explicitly passed as a pointer). The decayed pointer no longer knows that it is meant to point to an array and any information relating to the array size is lost. This is why you will see most functions also taking a seperate array size variable. Care must also be taken as a non-constant pointer will allow modification of the array variables from within the function.

An array can also be passed by reference but the array size must be specified. This will pass the address of the first element by reference but it still retains the information that the pointer is pointing to an array. Due to the need to specify array size, this method is rarely used. In C++ 11, a standard library array class was introduced to deal with the issue of pointer decay.

Printing an array

#include <iostream>
//Pass array address by value (implicit pointer)
void Print1(const int arr[], int size) {
	std::cout << "[";
	for (int i{ 0 }; i < size; i++) {
		std::cout << arr[i];
		if (i < size - 1) { std::cout << ","; }
	}
	std::cout << "]";
}
//Pass array address by value (explicit pointer)
void Print2(const int* arr, int size) {
	std::cout << "[";
	for (int i{ 0 }; i < size; i++) {
		std::cout << arr[i];
		if (i < size - 1) { std::cout << ","; }
	}
	std::cout << "]";
}
//Pass array address by reference (size must be specified)
void Print3(const int(&arr)[5]) {
	//size of array is preserved and can be used
	int size{ sizeof(arr) / sizeof(arr[0]) };
	std::cout << "[";
	for (int i{ 0 }; i < size; i++) {
		std::cout << arr[i];
		if (i < size - 1) { std::cout << ","; }
	}
	std::cout << "]";
}

int main(){
	int numbers[5]{1,2,3,4,5};
	Print1(numbers);
	Print2(numbers);
	Print3(numbers);
	return 0;	
}

Multidimensional arrays

Multidimensional arrays are arrays whose elements are also arrays. This allows increasingly complex structures to be created, but 2D arrays are the most commonly used. When accessing a multidimensional array, the subscript operators are evaluated from left to right.

A common use of a 2D array is to represent a matrix. The 2D array can be thought of a storing a collection of rows (or columns). Each of these rows is a 1D array of numbers.

An example 2D array of integers, which could be used to represent a 3x5 matrix. The chosen visual layout clearly demonstrates how it is analogous to a matrix. However, the computer would store the numbers as a single, contiguous block of memory.
An example 2D array of integers, which could be used to represent a 3x5 matrix. The chosen visual layout clearly demonstrates how it is analogous to a matrix. However, the computer would store the numbers as a single, contiguous block of memory.

Initialising a 3x3 identity matrix

const int size{3};
int identity[size][size];
for(int i{0}; i < size; i++){
	for(int j{0}; j < size; j++){
		if(i == j){
			identity[i][j] = 1;
		} else {
			identity[i][j] = 0;
		}
	}
}

Advantages and disadvantages

+ Arrays are the most efficient data structure for storing data. Only the data is stored and no extra memory is wasted.

+ Random access allows the fast access of individual data elements.

+ Multidimensional arrays are useful for representing complex structures.

- The size of the array needs to be declared at compile time (in advance of the program running).

- The array size is fixed and cannot be resized during runtime. This can lead to arrays being used that are oversized, to leave space for potential new elements but wasting memory on empty elements.

Uses

Arrays are ubiquitous in programming and can be used for almost any problem. However, the key to using data structures is to select the structure whose attributes best suit the problem. Some examples for arrays being:

  • To store the objects placed on the board of a game. The board will always be a fixed size and fast access to a specific board space may be needed to modify the data stored there. For example, the user clicks an empty board space and the array element representing it needs to be changed from empty to full.
  • To store a constant table of values. Arrays are the best option to store a constant set of values that will be looked up by the program. For example an array of the alphabetic characters, allowing the conversion of a number to a character by using it as an array index.
  • As discussed previously, 2D arrays can store matrices.

Dynamic arrays

The C++ STL (standard template library) contains an implementation of a dynamic array, known as a vector. The vector class removes the requirement of a fixed size by including methods to remove existing elements and add new elements. A very simple code example is included below to demonstrate these features.

#include <vector>
#include <string>

int main(){
	std::vector<std::string> fruits
	{"apple","banana","blueberry","cucumber"};
	//fruits = ["apple","banana","blueberry","cucumber"]
	fruits.pop_back();
	//fruits = ["apple","banana","blueberry"]
	fruits.push_back("kiwi");
	//fruits = ["apple","banana","blueberry","kiwi"]
	return 0;
}

Test your knowledge

view quiz statistics

Alternative data structures

Questions & Answers

    © 2018 Sam Brind

    Comments

      0 of 8192 characters used
      Post Comment

      No comments yet.

      working