Introduction
You only need to know a few fundamental data structures to be a good programmer.
These basic structures are the building blocks of the solution to any problem you can imagine.
Everything from the simplest shell script to the most advanced systems imaginable use at least one of these structures in some way.
Overview
Each data structure defines a specific set of rules for organizing data.
As a general note, we will refer to “Containers” and “Things”.
-
Containersrefers to the data structure itself. -
Thingsrefers to the data contained by the data structure.
In some data structures, the Container enforces the rules of the data structure. In others, the rules are enforced by the Things themselves.
- Arrays, Lists, and Sets are examples of data structures where the rules are enforced by the Container.
- The rules of Trees and Graphs are enforced by the Things inside the Container.
- Hash Tables are a mix of both.
Level 1: Fundamental Data Structures
Arrays
An Array is a Container with a specific number of “slots” (references to Things) usually called it’s “capacity”.
Each “slot” is referenced by it’s position (usually called an index).
Arrays are created with a specific number of slots and cannot be resized. If you need to add more things to an Array than it has slots, you need to create a new Array with more slots and copy the things from the old Array to the new Array.
Array Operations
| Operation | Description | Example |
|---|---|---|
| Get | Get the thing at a specific index | thing = array[index] |
| Set | Set the thing at a specific index | array[index] = thing |
| Length | Get the number of things in the Array | length = array.length |
Lists
A List is a data structure that contains Things in a specific order. The order doesn’t have anything to do with the Things themselves (for example, the Things in the List don’t have to be in alphabetical order). The order is just the order in which the Things were added to the List.
Lists can be created without a specific initial capacity and they can grow or shrink in size.
To summarize Arrays vs. Lists:
- Arrays are created with a specific initial capacity and cannot be resized.
- Lists are created without an initial capacity and can grow or shrink as needed.
List Operations
| Operation | Description | Example |
|---|---|---|
| Add | Add a thing to the end of the List | list.add(thing) |
| Remove | Remove a thing from the List | list.remove(thing) |
| Get | Get the thing at a specific index | thing = list.get(index) |
| Set | Set the thing at a specific index | list.set(index, thing) |
| Length | Get the number of things in the List | length = list.length |
Sets
A Set is a collection of unique Things. Things are considered unique when they are not equal.
How Sets are Different from Arrays and Lists
Elements in a Set cannot be referenced by their position. You can’t get the first element, the second element, etc.
The order in which you add things to a Set does not matter.
If you need the elements in a Set to be in a specific order, you should use a List instead and enforce the uniqueness constraint yourself when adding elements to the List. This will be slower than using a Set. But you can work around that by using a List and a Set together.
Set Operations
| Operation | Description | Example |
|---|---|---|
| Add | Add a thing to the Set. Adding something to a Set that already contains that thing will leave the Set unchanged. | set.add(thing) |
| Remove | Remove a thing from the Set | set.remove(thing) |
| Contains | Check if the Set contains a thing | contains = set.contains(thing) |
| Length | Get the number of things in the Set | length = set.length |
Hash Tables
A Hash Table is a collection of key-value pairs where the keys are unique, and each key is associated with exactly one value.
Another generic name for a Hash Table is an Associative Array. This is because a Hash Table is an array, not of single values, but of key-value pairs.
Every programming language has a built-in implementation of a Hash Table, but many languages have different names for them.
For example:
- In Python, the built-in Hash Table is called a Dictionary.
- In JavaScript, it’s called an Object.
- In Ruby, it’s called a Hash.
- In Java, it’s called a Map.
Despite having different names, these data structures are all conceptually the same. They are all Hash Tables at heart.
Hash Table Operations
| Operation | Description | Example |
|---|---|---|
| Put | Add a key-value pair to the Hash Table | hashTable.put(key, value) |
| Get | Get the value associated with a key | value = hashTable.get(key) |
| Remove | Remove a key-value pair from the Hash Table | hashTable.remove(key) |
| Contains | Check if the Hash Table contains a key | contains = hashTable.contains(key) |
| Length | Get the number of key-value pairs in the Hash Table | length = hashTable.length |
Graphs
A Graph is composed of two Sets with specific rules about the things in each Set.
- The first Set contains elements called Nodes (also called Vertices).
- The other Set contains elements called Edges. An Edge connects two Nodes.
Each Node has some unique identifier that distinguishes it from other Nodes in the Graph. Each Edge defines a relationship between exactly two Nodes.
Graph Operations
| Operation | Description | Example |
|---|---|---|
| Add Node | Add a Node to the Graph | graph.addNode(node) |
| Remove Node | Remove a Node from the Graph | graph.removeNode(node) |
| Add Edge | Add an Edge to the Graph | graph.addEdge(edge) |
| Remove Edge | Remove an Edge from the Graph | graph.removeEdge(edge) |
| Get Nodes | Get all the Nodes in the Graph | nodes = graph.getNodes() |
| Get Edges | Get all the Edges in the Graph | edges = graph.getEdges() |
Summary of Fundamental Data Structures
- We
addandremovethings from Lists and Sets.- We use Lists when order is important.
- We use Sets when uniqueness is important.
- We refer to an element in a List by it’s position.
- We don’t refer to elements in a Set directly.
- We
getandputthings in Hash Tables (also called Associative Arrays).- We use a Hash Tables when a) we need to look things up quickly and b) we can refer each thing by a unique id, key, name, or label.
- We
addandremoveNodes and Edges from Graphs.- We use Graphs when we need to represent relationships among things and when need to remember properties of the elements, the relationships, or both.
Level 2: Common Specialized Data Structures
Specialized Lists
Linked Lists
A Linked List is a special type of List where each element has a reference to the next element in the List.
There is also a Doubly Linked List where each element has a reference to both the next and previous elements.
Stacks
A Stack is a List where you can only add or remove elements from the end of the List.
This is commonly called a Last-In-First-Out (LIFO) data structure.
Queues
A Queue is a List where you can only add elements to the end of the List and remove elements from the beginning of the List.
This is commonly called a First-In-First-Out (FIFO) data structure.
Specialized Sets
Trees
A Tree is a Set where each Node can have zero or more child Nodes.
This may seem similar to a Graph, but there are some important differences.
- In a Tree, there is only one path from one Node to another.
- In a Graph, there can be multiple paths from one Node to another.
- In a Tree, there is a special Node called the Root that is the starting point for all paths in the Tree.
- In a Graph, there is no special Node that is the starting point for all paths.
- In a Tree, there are no cycles (a cycle is a path that starts and ends at the same Node).
- In a Graph, there can be cycles.
- In a Tree, there is a special Node called a Leaf that has no children.
- In a Graph, there are no special Nodes that have no children.
In terms of the core data structure though, while a Graph is made up of two Sets (Nodes and Edges), a Tree is made up of just one Set (Nodes).