Arrays
- Access & Search: Direct index access in O(1) time. Searching an unsorted array takes O(n) time, while binary search on a sorted array takes O(log n) time. For example, an array of size n in non-decreasing order can be searched in O(log n) using binary search (volume2_removed (8).pdf).
- Insertion & Deletion: Inserting or deleting at the end of an array is O(1), but at an arbitrary position requires shifting elements (O(n) worst-case). Thus, arrays have O(n) insertion/deletion in general (Time complexities of different data structures | GeeksforGeeks).
- 2D Array Mapping: In row-major order, the element at row i, column j (0-indexed) is at index `i * numCols + j` in the linear memory. For special matrices stored in 1D, use the appropriate indexing formula. Example: for a lower triangular matrix of size N stored row-wise in a 1D array, the index of element A[i,j] (1-indexed, with i ≥ j) is $\text{index}(i,j) = \frac{i(i-1)}{2} + (j-1)$. This counts all elements of the previous rows plus the offset within the i-th row (Convert given lower triangular Matrix to 1D array | GeeksforGeeks).
- Memory Formula: If an array starts at memory address `base`, element A[i][j] in a row-major layout has address `base + (i * cols + j) * element_size`. This formula is often used to compute addresses quickly.
- Key Point: Arrays support constant-time random access due to contiguous memory, but operations that require shifting (insertion, deletion, merging) cost linear time.
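A minimal C sketch of both index formulas above (function names are illustrative, not from any particular source):

```c
#include <stdio.h>

/* Row-major offset of A[i][j] in a 2D array with `cols` columns (0-indexed). */
int row_major_index(int i, int j, int cols) {
    return i * cols + j;
}

/* 1D index of A[i][j] (1-indexed, i >= j) for a lower triangular matrix stored
 * row-wise: the i-1 previous rows hold i*(i-1)/2 elements, then j-1 elements
 * precede A[i][j] inside row i. Returns a 0-based index. */
int lower_tri_index(int i, int j) {
    return i * (i - 1) / 2 + (j - 1);
}

int main(void) {
    /* A 3x4 matrix: A[2][1] sits at offset 2*4 + 1 = 9. */
    printf("%d\n", row_major_index(2, 1, 4));   /* 9 */
    /* Lower triangular: A[3][2] sits at index 3*2/2 + 1 = 4. */
    printf("%d\n", lower_tri_index(3, 2));      /* 4 */
    return 0;
}
```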
Linked Lists
- Types: Singly linked list (nodes with next pointer), Doubly linked list (nodes with prev & next), Circular linked list (last node points back to first).
- Traversal & Search: Accessing the k-th element or searching for a value takes O(n) time since traversal is sequential (volume2_removed (8).pdf). Linked lists cannot perform binary search efficiently because there’s no constant-time random access (binary search on a linked list would be O(n) per step). In fact, “binary search using a linear linked list is [not] efficient” (volume2_removed (8).pdf).
- Insertion & Deletion: Given a reference to the node (or its previous node), insertion or removal is O(1) since it involves pointer updates. For singly linked lists, inserting at head or deleting at head is O(1). In a doubly linked list, deletion is O(1) if you have a pointer to the node (since you can update both neighbors directly). Without direct access, you must traverse to find the position (O(n)).
- Example: To concatenate two lists in O(1), maintain a tail pointer. If list1's tail and list2's head are known, do `tail1.next = head2` (and update tail1 to tail2 for future operations). With a tail pointer (or a header with pointers to both first and last nodes), merging lists is constant time (How to write an O(1) algorithm to merge two doubly linked lists into a NEW list and then free the original two lists? : r/programminghelp). Without a tail pointer, concatenation would require traversing list1 to find its end, costing O(n). (A minimal sketch appears at the end of this section.)
- Advantages: Insertion and deletion at the ends (or at known node positions) are efficient. Linked lists grow dynamically (no resizing needed) and can easily insert elements in the middle without shifting others.
- Drawbacks: No random access — to get to the i-th element, you must traverse i nodes (O(n)). This also makes sorting or searching by value relatively slow (O(n) unless specialized structures are used). Linked lists also use extra memory for pointers. They are not suitable for problems requiring efficient indexing; e.g., performing a binary search on a linked list is inefficient (volume2_removed (8).pdf).
- Variants: Circular lists make it easy to cycle through data (useful for round-robin scheduling). Doubly linked lists allow backward traversal and direct deletion given a pointer. However, maintaining extra pointers has overhead.
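A minimal C sketch of the tail-pointer concatenation described above (struct and helper names are illustrative):

```c
#include <stdio.h>
#include <stdlib.h>

/* Singly linked list with head and tail pointers, so concatenation is O(1). */
typedef struct Node { int val; struct Node *next; } Node;
typedef struct { Node *head, *tail; } List;

void push_back(List *l, int v) {             /* O(1) append using the tail pointer */
    Node *n = malloc(sizeof(Node));
    n->val = v; n->next = NULL;
    if (l->tail) l->tail->next = n; else l->head = n;
    l->tail = n;
}

/* O(1) concatenation: splice l2 onto l1's tail, then empty l2. */
void concat(List *l1, List *l2) {
    if (!l2->head) return;
    if (l1->tail) l1->tail->next = l2->head; else l1->head = l2->head;
    l1->tail = l2->tail;
    l2->head = l2->tail = NULL;
}

int main(void) {
    List a = {0}, b = {0};
    push_back(&a, 1); push_back(&a, 2);
    push_back(&b, 3); push_back(&b, 4);
    concat(&a, &b);                          /* a is now 1 -> 2 -> 3 -> 4 */
    for (Node *p = a.head; p; p = p->next) printf("%d ", p->val);
    printf("\n");
    return 0;
}
```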
Stacks
- LIFO Principle: A stack is a Last-In First-Out structure. The main operations are Push (insert element on top) and Pop (remove element from top), both of which run in O(1) time (Time complexities of different data structures | GeeksforGeeks). Peek (view top element) is also O(1).
- Implementations: Can be implemented via arrays (with a top index) or linked lists (top as head). Array implementation has a fixed capacity (unless using dynamic array resizing), while a linked list implementation grows as needed. Both give O(1) push/pop.
- Applications:
- Expression Conversion/Evaluation: Stacks are essential for converting infix expressions to postfix/prefix and for evaluating postfix expressions. The infix-to-postfix algorithm uses an operator stack: scan tokens left to right, send operands directly to the output, and for each operator pop (to output) any stacked operators of greater or equal precedence before pushing it; the stack thus enforces the correct order of operations (volume2_removed (8).pdf). A GATE question asked "Which of the following is essential for converting an infix expression to postfix form efficiently?" – the answer is an operator stack (volume2_removed (8).pdf); operands go directly to the output, and an additional operand stack is not needed for the conversion. (A minimal sketch appears at the end of this section.)
- Parentheses Matching: Push opening brackets, pop when a closing bracket is encountered, to check for balanced parentheses in O(n).
- Function Call Stack: Recursion uses an implicit stack to store return addresses and local variables.
- Undo mechanisms: Many applications (text editors, etc.) use stacks to track actions for undo/redo.
- Miscellaneous: A stack can be used to implement DFS (deepest node exploration) or to evaluate arithmetic expressions (postfix evaluation by pushing operands and computing on operators). Also, you can implement a stack using a single or double queue, or even a priority queue with time-stamped keys (e.g., push(x) as insert (x, clock++) into max-heap so that the most recent item has the highest key and is popped first) (volume2_removed (8).pdf). However, these alternative implementations usually come with higher constant factors or complexity trade-offs.
- Complexities: Push – O(1), Pop – O(1), Top/Peek – O(1). Searching within a stack (if needed) is O(n) since you’d have to pop items or traverse the underlying structure. But normally, stacks are not used for search – only top access is needed.
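For reference, a compact C sketch of the operator-stack conversion described in the applications above. It assumes single-character operands, the left-associative operators + - * /, and parentheses, with no error handling:

```c
#include <stdio.h>

/* Shunting-yard style infix -> postfix. Teaching sketch only. */
static int prec(char op) {
    switch (op) {
        case '+': case '-': return 1;
        case '*': case '/': return 2;
        default:            return 0;   /* '(' stays on the stack until ')' */
    }
}

void infix_to_postfix(const char *in, char *out) {
    char stack[128];
    int top = -1, k = 0;
    for (int i = 0; in[i]; i++) {
        char c = in[i];
        if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')) {
            out[k++] = c;                       /* operands go straight to output */
        } else if (c == '(') {
            stack[++top] = c;
        } else if (c == ')') {
            while (top >= 0 && stack[top] != '(')
                out[k++] = stack[top--];        /* pop until matching '(' */
            top--;                              /* discard '(' */
        } else {                                /* operator */
            while (top >= 0 && prec(stack[top]) >= prec(c))
                out[k++] = stack[top--];        /* pop >= precedence operators */
            stack[++top] = c;
        }
    }
    while (top >= 0) out[k++] = stack[top--];   /* flush remaining operators */
    out[k] = '\0';
}

int main(void) {
    char out[128];
    infix_to_postfix("a+b*(c-d)/e", out);
    printf("%s\n", out);   /* abcd-*e/+ */
    return 0;
}
```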
Queues
- FIFO Principle: A queue is a First-In First-Out structure. Primary operations are Enqueue (insert at rear) and Dequeue (remove from front), both in O(1) time with a proper implementation (Time complexities of different data structures | GeeksforGeeks).
- Implementations: Commonly via circular arrays or linked lists. With a circular array of capacity N, use two indices `front` and `rear`: enqueue at `rear` and increment it modulo N; dequeue from `front` and increment it modulo N. A linked list implementation uses pointers: enqueue at tail, dequeue at head, both O(1).
- Circular Queue Details: To distinguish full vs empty in a circular buffer of size N, one trick is to keep a count of elements or to reserve one slot empty. For example, one policy is to declare the queue full when `(rear+1) mod N == front` (leaving one unused slot) (volume2_removed (8).pdf). Alternatively, maintain a size counter. (A minimal circular-queue sketch appears at the end of this section.)
- Applications: Scheduling (CPU task scheduling uses queues for round-robin), BFS traversal (uses a queue to manage the frontier) (Why is the complexity of both BFS and DFS O(V+E)? | GeeksforGeeks), buffers in networking/IO (producer-consumer problem).
- Double-Ended Queue (Deque): A generalized queue where insertion and deletion can happen at both ends. Useful in algorithms like the sliding window technique. Typical deque operations are also O(1).
- Interconversion: Queues and stacks can simulate each other with some overhead. For instance, a stack can be made from two queues (with O(n) per operation in the worst case), and a queue can be made from two stacks: the standard two-stack method (one stack as inbox, the other as outbox) gives amortized O(1) per operation (volume2_removed (8).pdf). GATE asked: "Minimum number of stacks of size n to implement a queue of size n?" – Two stacks are sufficient (enqueue onto one; dequeue by transferring elements to the other when it is empty).
- Priority Queue vs Normal Queue: A priority queue retrieves the highest (or lowest) priority element first (not strictly FIFO). It’s usually implemented with a heap (see Heaps section). A normal queue has no priority; it’s strictly FIFO. One GATE question involved using a max-heap to implement a stack by assigning incrementing priority keys to each pushed element (volume2_removed (8).pdf) (volume2_removed (8).pdf), illustrating the versatility of priority queues.
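A minimal C sketch of the circular queue described above, using the one-empty-slot "full" test (identifiers are illustrative):

```c
#include <stdio.h>

#define N 5   /* capacity of the circular buffer; one slot is kept unused */

typedef struct { int buf[N]; int front, rear; } Queue;   /* rear = next free slot */

void q_init (Queue *q)       { q->front = q->rear = 0; }
int  q_empty(const Queue *q) { return q->front == q->rear; }
int  q_full (const Queue *q) { return (q->rear + 1) % N == q->front; }

int enqueue(Queue *q, int v) {
    if (q_full(q)) return 0;            /* reject when only one slot remains */
    q->buf[q->rear] = v;
    q->rear = (q->rear + 1) % N;
    return 1;
}

int dequeue(Queue *q, int *v) {
    if (q_empty(q)) return 0;
    *v = q->buf[q->front];
    q->front = (q->front + 1) % N;
    return 1;
}

int main(void) {
    Queue q; q_init(&q);
    for (int i = 1; i <= 10; i++)
        if (!enqueue(&q, i)) break;     /* only N-1 = 4 items fit */
    int v;
    while (dequeue(&q, &v)) printf("%d ", v);   /* 1 2 3 4 */
    printf("\n");
    return 0;
}
```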
Trees (General and Binary Trees)
- Terminology: A tree is a hierarchical structure of nodes with parent-child relationships (no cycles). A binary tree is a tree where each node has at most two children (referred to as left and right). Key properties and formulas for binary trees often appear in problems:
- Nodes and Edges: A tree with n nodes has n–1 edges (since each edge connects a node to its parent, except the root) (In binary tree, number of nodes with two children when number of leaves is given – Mathematics Stack Exchange).
- Tree Traversals: Inorder, Preorder, Postorder are depth-first traversal methods. Given two of these (inorder + preorder or inorder + postorder), one can uniquely reconstruct a binary tree. However, given only preorder and postorder, reconstruction is not unique in general (unless additional constraints hold) – this was the basis of a true/false question (volume2_removed (8).pdf).
- Binary Tree Properties:
- Height (h): The length of the longest root-to-leaf path. A tree with a single node has height 0 (by the edge-count definition) or 1 (by the node-count definition), depending on convention. Common formulas: a perfect binary tree of height h (in edges) has $2^{h+1}-1$ nodes; conversely, a perfect tree with n nodes has height ≈ log₂(n). An arbitrary binary tree with n nodes can have minimum height ≈ log₂(n) (balanced tree) and maximum height n–1 (degenerate chain).
- Leaf and Internal Nodes: A leaf is a node with no children (degree 0). An internal node has at least one child. In any binary tree, the number of leaves (N₀) and the number of internal nodes with two children (N₂) satisfy N₀ = N₂ + 1 (In binary tree, number of nodes with two children when number of leaves is given – Mathematics Stack Exchange). For example, if a binary tree has 5 leaves, it must have 4 nodes with two children (assuming the tree is non-empty). Nodes with exactly one child do not appear in the equation: attaching a single child to a leaf turns that leaf into a one-child node while adding one new leaf, so neither N₀ nor N₂ changes. (A small counting sketch appears at the end of this section.)
- Full Binary Tree: A binary tree is full (sometimes also called proper) if every internal node has exactly 2 children. In a full binary tree with I internal nodes, there are L = I + 1 leaves (by the above formula).
- Complete Binary Tree: A binary tree is complete if all levels are fully filled except possibly the last, which is filled from left to right. The height of a complete tree with n nodes is ⌊log₂ n⌋.
- Traversal sequences: Preorder = Root-Left-Right, Inorder = Left-Root-Right, Postorder = Left-Right-Root. Knowing inorder + preorder (or inorder + postorder) allows unique construction of the tree. However, knowing only preorder and postorder is not sufficient to uniquely determine the tree (multiple trees can have the same preorder and postorder if the structure is not constrained).
- Binary Tree Computations:
- External Path Length: Some questions involve path lengths. The external path length is the sum of the depths of all leaves; in a full binary tree it has fixed relationships to the number of nodes (each edge is counted once for every leaf in the subtree below it).
- Induction Proofs: By induction, one can prove properties such as: in a binary tree where every non-leaf node has exactly 2 children, L = I + 1 (volume2_removed (8).pdf); or a perfect binary tree has 2^d leaves at depth d. These often appear as theoretical questions.
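A tiny C check of the N₀ = N₂ + 1 relation from the properties above (the tree shape and helper names are made up for illustration):

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct TNode { int key; struct TNode *left, *right; } TNode;

TNode *node(int key, TNode *l, TNode *r) {
    TNode *n = malloc(sizeof(TNode));
    n->key = key; n->left = l; n->right = r;
    return n;
}

void count(const TNode *t, int *n0, int *n2) {   /* n0: leaves, n2: two-child nodes */
    if (!t) return;
    if (!t->left && !t->right) (*n0)++;
    else if (t->left && t->right) (*n2)++;
    count(t->left, n0, n2);
    count(t->right, n0, n2);
}

int main(void) {
    /* A tree with 5 leaves: expect 4 nodes with two children. */
    TNode *t = node(1,
        node(2, node(4, NULL, NULL), node(5, NULL, NULL)),
        node(3, node(6, node(8, NULL, NULL), node(9, NULL, NULL)),
                node(7, NULL, NULL)));
    int n0 = 0, n2 = 0;
    count(t, &n0, &n2);
    printf("leaves=%d, two-child nodes=%d\n", n0, n2);   /* leaves=5, two-child nodes=4 */
    return 0;
}
```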
Binary Search Trees (BST)
- BST Property: A binary search tree is a binary tree where for every node, all keys in the left subtree are smaller and all keys in the right subtree are larger than the node’s key (assuming no duplicate keys). This ordering allows efficient search.
- Operations Complexity: Search, Insert, Delete take O(h) time where h is the height of the tree. On average, for a balanced BST, h = O(log n), so operations are O(log n). However, in the worst case (if the tree becomes skewed like a linked list), h = O(n) (volume2_removed (8).pdf). Thus, unbalanced BST insertions can degrade to linear time.
- Traversal: Inorder traversal of a BST yields the keys in sorted order. This property is often used in questions (e.g., given a sequence of keys and the inorder traversal result, deduce the BST shape or output). (A minimal insert-and-inorder sketch appears at the end of this section.)
- Construction & Insertion Order: The shape of a BST depends on insertion order. For example, inserting keys in sorted order produces a skewed tree. GATE questions may ask things like: “A BST is generated by inserting keys in a certain order; what is the inorder traversal of the resulting tree?” – The answer is just the sorted keys (BST property) (volume2_removed (8).pdf). Or given an insertion sequence, determine number of nodes in left/right subtrees of the root (volume2_removed (8).pdf) (insert the first key as root, then observe which keys go left or right).
- Deletion in BST: Three cases – deleting a leaf (just remove it), deleting a node with one child (splice it out), deleting a node with two children (replace it with inorder successor or predecessor, then remove that successor). These maintain BST order. Typically O(h) time.
- Number of Possible BSTs: With n distinct keys, the number of structurally unique BSTs is the n-th Catalan number: $C_n = \frac{1}{n+1}\binom{2n}{n}$. For example, for n = 3 keys there are $C_3 = \frac{1}{4}\binom{6}{3} = 5$ possible BST structures. Equivalently, the total BST count is $\frac{(2n)!}{(n+1)!\,n!}$ (Count Total Number of Unique Binary Search Trees with n Keys). This formula often appears when counting BSTs or full binary trees in exam problems.
- Rank Queries and Order Statistics: Some problems involve finding the k-th smallest element. Augmented BSTs (with subtree size counts) can support this in O(h) time by traversing left or right based on counts. (Mentioned in case such a question arises, e.g., order statistic tree concept.)
- BST vs Other Structures: A question might compare BST search cost with other structures. For example: "Searching an AVL tree is O(log n) but in a binary search tree it is O(n)" – this statement is misleading: a BST is O(n) only in the worst case (skewed tree); on average it is O(log n) as well. A correct statement would be: "The worst-case cost of searching an unbalanced BST is O(n), while in an AVL tree it is O(log n)" (volume2_removed (8).pdf). Another: "Searching a complete binary search tree is O(log n)" – true, since a complete tree has height ⌊log₂ n⌋. Be ready to identify such true/false comparisons.
- Use Cases: BSTs are good for in-memory dictionaries when insertions and deletions are frequent and uniform random (to avoid worst-case). They maintain sorted order, which is useful for order-based queries (find min, max, next larger, etc., all in O(log n)).
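A minimal C sketch of BST insertion and the sorted inorder traversal mentioned above (no balancing; duplicate keys go right; identifiers are illustrative):

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct BNode { int key; struct BNode *left, *right; } BNode;

BNode *insert(BNode *root, int key) {        /* O(h) per insertion */
    if (!root) {
        BNode *n = malloc(sizeof(BNode));
        n->key = key; n->left = n->right = NULL;
        return n;
    }
    if (key < root->key) root->left  = insert(root->left, key);
    else                 root->right = insert(root->right, key);
    return root;
}

void inorder(const BNode *root) {            /* Left - Root - Right */
    if (!root) return;
    inorder(root->left);
    printf("%d ", root->key);
    inorder(root->right);
}

int main(void) {
    int keys[] = {50, 30, 70, 20, 40, 60, 80};
    BNode *root = NULL;
    for (int i = 0; i < (int)(sizeof keys / sizeof keys[0]); i++)
        root = insert(root, keys[i]);
    inorder(root);          /* 20 30 40 50 60 70 80, regardless of insertion order */
    printf("\n");
    return 0;
}
```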
Balanced BST (AVL Tree as example)
- Need for Balance: Self-balancing BSTs (like AVL trees, Red-Black trees, etc.) maintain tree height ≈ O(log n) even in worst cases, by rebalancing after insertions or deletions. AVL trees strictly enforce balance factor conditions (difference in height of left and right subtree ≤ 1 for every node). Red-Black trees enforce a color and black-height invariant. The result is guaranteed O(log n) time for search, insert, delete (AVL tree – Wikipedia) (AVL tree – Wikipedia).
- AVL Tree Height Formula: AVL trees are very tightly balanced. The maximum height of an AVL tree with n nodes is about 1.44·log₂(n+2) − 0.328. More precisely, the minimum number of nodes in an AVL tree of height h follows the Fibonacci-like recurrence $M(h) = M(h-1) + M(h-2) + 1$ (with base $M(0)=1$, $M(1)=2$), giving $M(h) \approx \phi^h$ (φ is the golden ratio ≈ 1.618) and hence $h \approx 1.44 \log_2 n$ (5.1.1 Maximum Height of an AVL Tree). For example, an AVL tree with 15 nodes has height at most 5, since $1.44 \log_2 15 \approx 1.44 \times 3.91 = 5.63$. (A small recurrence sketch appears at the end of this section.)
- Rotations: AVL insertions and deletions may trigger rotations (single or double) to restore balance. Each rotation is an O(1) pointer update. An insertion needs at most one rebalancing step (a single or double rotation), while a deletion can require rebalancing at O(h) nodes along the path; either way the whole operation remains O(log n).
- Red-Black Tree: Another balanced BST variant with height at most 2·log₂(n+1). It relaxes balance a bit (allowing certain 2x height difference) but guarantees O(log n) operations. GATE sometimes asks conceptual comparisons (e.g., AVL vs Red-Black). Key point: AVL is more rigidly balanced (faster lookups, ~1.44 log₂n height) while Red-Black allows faster insert/delete (fewer rotations amortized) at the cost of slightly taller trees (height ≤ 2 log₂n). Both are O(log n) search.
- Cost of Operations: Searching an AVL or Red-Black tree is O(log n) worst-case (AVL tree – Wikipedia). Insertion/deletion are O(log n) as well (AVL tree – Wikipedia). For example, a question might state “cost of searching an AVL tree is O(log n) but that of a binary search tree is O(n)”. This statement is misleading: a normal BST can also be O(log n) on average. The true distinction is worst-case: AVL (balanced) worst-case O(log n) vs unbalanced BST worst-case O(n) (volume2_removed (8).pdf).
- Augmentations: Balanced BSTs can be augmented for order statistics, interval trees, etc., and still maintain O(log n) operations.
- When to use: Use balanced BSTs when you need ordered data with guaranteed log-time operations (for instance, implementing maps, sets, or maintaining a sorted list of elements dynamically). In contests or interviews, if asked for a data structure supporting fast insertion, deletion, search, and min/max, a balanced BST is a valid answer (as is a heap for min/max, but heap doesn’t support search or ordered iteration efficiently).
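A quick C sketch of the M(h) recurrence above, used to compute the exact maximum AVL height for a given n (function name is illustrative):

```c
#include <stdio.h>

/* Minimum number of nodes M(h) in an AVL tree of height h (height in edges):
 * M(h) = M(h-1) + M(h-2) + 1, with M(0) = 1, M(1) = 2.
 * The largest h with M(h) <= n is the maximum AVL height for n nodes. */
int max_avl_height(int n) {
    if (n < 1) return -1;
    if (n == 1) return 0;
    int prev = 1, cur = 2, h = 1;       /* M(0), M(1) */
    while (cur + prev + 1 <= n) {       /* advance while M(h+1) still fits in n nodes */
        int next = cur + prev + 1;
        prev = cur; cur = next; h++;
    }
    return h;
}

int main(void) {
    /* M(h): 1, 2, 4, 7, 12, 20, 33, ... so 15 nodes allow height at most 4,
     * comfortably within the ~1.44*log2(n) = 5.6 bound quoted above. */
    printf("%d\n", max_avl_height(15));   /* 4 */
    printf("%d\n", max_avl_height(20));   /* 5 */
    return 0;
}
```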
Heaps and Priority Queues
- Priority Queue ADT: A priority queue stores elements with priorities and supports at least two operations: insert (with a priority) and extract-extreme (remove and return the element with highest priority for max-heap or lowest for min-heap). Also often supports peek (look at highest priority) and possibly decrease-key or merge in advanced variants.
- Binary Heap Implementation: A binary heap is a complete binary tree structure (usually array-backed) that satisfies the heap property: for a max-heap, every parent is ≥ its children (for min-heap, parent ≤ children). The largest (or smallest) element is at the root.
- Represented as an array `A[1…n]` (1-indexed for convenience, or 0-indexed similarly). For a node at index i: left child at `2i`, right child at `2i+1`, parent at `⌊i/2⌋` (volume2_removed (8).pdf). (For 0-indexing: left = `2i+1`, right = `2i+2`, parent = `⌊(i-1)/2⌋`.)
- Insertion: Add the element at the end (next array slot), then "heapify-up" (bubble up) by swapping with the parent while the heap property is violated. Cost O(log n) in the worst case (at most one swap per level, up the height of the heap).
- Extract-Max/Min: Remove the root (extreme element), replace it with the last element, then "heapify-down" by swapping it with the larger child (for a max-heap) or smaller child (for a min-heap) until order is restored. Cost O(log n).
- Peek: The max (or min) is at the root (index 1 or 0), so it is accessible in O(1).
- Build Heap: Transforming an unsorted array of n elements into a heap can be done in O(n) time (Floyd's build-heap algorithm). The algorithm heapifies subtrees from the bottom up: call heapify on indices n/2 down to 1. The work per node is proportional to its height, and summing over all nodes gives $\sum_{h=0}^{\lfloor \log n\rfloor} \frac{n}{2^h} \cdot O(h) = O(n)$ (Binary heap – Wikipedia). Important: this linear-time construction is faster than inserting n elements one by one (which would be O(n log n)). (A heapify/build-heap sketch appears at the end of this section.)
- Heap Sort: Heaps provide an in-place sorting algorithm: build a max-heap in O(n), then repeatedly extract the max in O(log n) (and place it at end of the array). Total time O(n log n). Notably, heap-sort’s O(n log n) has a larger constant than quicksort’s average case, but it’s worst-case O(n log n) and in-place.
- Complexity Summary: For n elements in a heap, find-max or find-min is O(1); insert is O(log n); extract-max/min is O(log n); increase-key or decrease-key (adjusting a priority) is O(log n) via bubbling up or down. Merging two binary heaps is not trivial and takes O(n) (concatenate and rebuild); other heap types such as binomial and Fibonacci heaps support faster merging.
- Common Questions:
- Given an array, is it a valid max-heap? (Check if each parent ≥ children). Example: “Which of the following arrays represents a binary max-heap?” – one needs to verify the heap property for all indices (volume2_removed (8).pdf).
- Heap height: A heap with n nodes has height ⌊log₂ n⌋. For instance, a 16-node heap has height ⌊log₂ 16⌋ = 4. This is relevant in complexity reasoning and sometimes asked implicitly.
- Min in Max-Heap: In a max-heap, the minimum element could be anywhere in the last level. To find the min, you might have to scan ~n/2 leaves. Thus finding the min in a max-heap is O(n) in worst case (volume2_removed (8).pdf). Dually, find-max in a min-heap is O(n). (One question explicitly asked: “In a binary max heap of n numbers, the smallest element can be found in O(?) time.” – answer: O(n) (volume2_removed (8).pdf).)
- Second largest: The second largest element in a max-heap is one of the root’s children (either left or right child of root) – you can find it in O(1) by comparing those two (volume2_removed (8).pdf). In general, the k-th largest is not so trivial, but the second largest is easy. For a min-heap, the second smallest is among the root’s children.
- Priority Queue Implementation: Usually by binary heap as above. Some GATE questions combined a priority queue with another structure, e.g., implementing a stack using a priority queue by time-stamping pushes (volume2_removed (8).pdf). Also, one question gave a scenario of inserting characters into a priority queue with certain keys to simulate stack behavior (volume2_removed (8).pdf). The takeaway is understanding how altering the priority can simulate different orders.
- Heap vs BST: A common confusion: a heap is not a BST. The heap property is weaker (it doesn’t guarantee order between siblings or across subtrees), so an inorder traversal of a heap is not sorted. Heaps are designed for quick access to extremal elements, not for general searching or sorted traversal. For example, you cannot binary-search a heap array because it’s not globally sorted.
- Uses: Priority scheduling, Dijkstra’s shortest path (priority queue of vertices by distance), Huffman coding tree construction, any scenario requiring repeated removal of min/max.
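A compact C sketch of heapify-down, Floyd's build-heap, and extract-max as described above (0-indexed array; identifiers are illustrative):

```c
#include <stdio.h>

/* Max-heap on a 0-indexed array: children of i are 2i+1 and 2i+2. */
static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

void heapify_down(int a[], int n, int i) {
    for (;;) {
        int largest = i, l = 2*i + 1, r = 2*i + 2;
        if (l < n && a[l] > a[largest]) largest = l;
        if (r < n && a[r] > a[largest]) largest = r;
        if (largest == i) break;          /* heap property restored */
        swap(&a[i], &a[largest]);
        i = largest;                      /* continue sinking the element */
    }
}

void build_heap(int a[], int n) {         /* Floyd's method: O(n) total */
    for (int i = n/2 - 1; i >= 0; i--)
        heapify_down(a, n, i);
}

int extract_max(int a[], int *n) {        /* O(log n) */
    int max = a[0];
    a[0] = a[--*n];                       /* move last element to the root */
    heapify_down(a, *n, 0);
    return max;
}

int main(void) {
    int a[] = {3, 9, 2, 1, 4, 5}, n = 6;
    build_heap(a, n);
    while (n > 0) printf("%d ", extract_max(a, &n));   /* 9 5 4 3 2 1 */
    printf("\n");
    return 0;
}
```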
Hashing
- Hash Table Basics: A hash table stores key-value pairs and uses a hash function h(key) to compute an index (bucket) for each key. Ideally, h(key) is uniform and fast. Hashing provides average-case O(1) time for insert, search, and delete.
- Collision Handling: Since different keys can hash to the same index, collision resolution strategies are needed:
- Separate Chaining: Each bucket holds a linked list (or similar structure) of entries that hash to that index. Insertions happen at the bucket list (O(1)), searches traverse the list (O(k) where k is entries in that bucket). With a good hash and load factor α = n/m (n entries, m buckets), the average list length is α, so average search/insert ~ O(1 + α).
- Open Addressing: All entries reside in the array itself (no extra chains). On collision, probe for another slot using a probe sequence. Types of probing: Linear Probing uses the sequence `h(key), h(key)+1, h(key)+2, ... (mod m)` until an empty slot is found (Solved Insert the keys <13, 19, 35, 71, 31, 6, 23, 4> into | Chegg.com). Quadratic Probing uses the sequence `h(key) + c1*i + c2*i^2 (mod m)`. Double Hashing uses a second hash on collision: `h1(key) + i * h2(key) (mod m)`. In linear probing, the probe function is $h(k,i) = (h'(k) + i) \bmod m$, i.e., on each collision, try the next slot. For example, if h(k) = k mod 11 and we insert keys 44, 55, …: 44 mod 11 = 0 occupies index 0; then 55 mod 11 = 0 collides, so try 1, 2, … until an empty slot is found (Solved Insert the keys <13, 19, 35, 71, 31, 6, 23, 4> into | Chegg.com).
- Clustering: Linear probing can cause primary clustering (long runs of filled slots), which degrades performance as α approaches 1. Quadratic probing and double hashing aim to reduce clustering.
- Load Factor & Performance: Hash table operations are O(1) average if load factor (α = n/m) is kept reasonable (say ≤ 0.7). As α grows, collisions increase. In open addressing, once α > 0.5, performance drops and α must always be < 1 by design. Many implementations resize the table (rehash) when α exceeds a threshold (rehashing is costly O(n) but happens infrequently amortized).
- Hash Function: Should distribute keys uniformly. Common methods: division method (k mod m), multiplication method, universal hashing, etc. `m` (the table size) is often taken prime to reduce clustering for certain key patterns. GATE might present a sequence of insertions into a hash table and ask for the final state or a specific index. Example: "Hash table of size 11, linear probing, hash function h(k) = k mod 11. Insert keys 10, 22, 31, 44, … What is the index of the last inserted record?" You simulate the linear probe: if a collision occurs, move to the next index (volume2_removed (8).pdf).
- Example Calculation: If we insert keys 18, 41, 22 into a table of size 11 with linear probing and h(key) = key mod 11:
- 18 mod 11 = 7 → place at index 7.
- 41 mod 11 = 8 → place at index 8.
- 22 mod 11 = 0 → place at index 0 (no collision so far).
Searching for a key follows the same probe sequence until the key is found or an empty slot is hit (in open addressing, hitting an empty slot during a search means the key is not in the table). (A short linear-probing simulation appears at the end of this section.)
- Deletion in Open Addressing: Instead of removing outright (which can break probe chains), one often marks a slot as “deleted” (a tombstone) to allow searches to continue correctly, but signals that spot can be reused for insertion.
- Double Hashing: h(k,i) = (h1(k) + i * h2(k)) mod m. This avoids primary clustering and, if h2 is relatively prime to m, will probe all slots if needed (guaranteeing finding an empty slot if table not full).
- Common Pitfalls: If a hash table is too full, performance degrades. If poorly implemented (e.g., bad hash function causing many collisions or not handling deletion properly), the expected O(1) can become O(n). But under uniform hashing assumption, expect constant time on average.
- Applications: Hash tables are ubiquitous for lookups (symbol tables in compilers, dictionaries in languages like Python, caches). They can also be used to check existence of an element in constant time (e.g., for two-sum problem or detecting duplicates).
- Example GATE Question: “A sequence of keys is inserted into an empty hash table of size 11 using open addressing with linear probing… Which index does the last key land in?” – You’d simulate each insertion modulo 11 and linearly probe (volume2_removed (8).pdf). Another: “Which data structure for storing integers supports search, insert, delete in O(1) average?” – answer: hash table (with chaining or open addressing) is a strong candidate, assuming no pathological hash behavior.
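A short C simulation of the linear-probing insertions described above, extending the 18/41/22 example with a few extra (made-up) keys to show collisions:

```c
#include <stdio.h>

#define M 11              /* table size used in the worked example above */
#define EMPTY -1

/* Linear-probing insert: h(k) = k mod M, step to the next slot on collision.
 * Returns the index where the key lands (the table is assumed not full). */
int insert_linear(int table[], int key) {
    int i = key % M;
    while (table[i] != EMPTY)
        i = (i + 1) % M;
    table[i] = key;
    return i;
}

int main(void) {
    int table[M];
    for (int i = 0; i < M; i++) table[i] = EMPTY;

    /* 18 -> 7, 41 -> 8, 22 -> 0; then 44 collides at 0 and lands at 1,
     * and 73 probes 7, 8, 9, 10, 0, 1 before landing at 2. */
    int keys[] = {18, 41, 22, 44, 59, 32, 31, 73};
    int idx = -1;
    for (int k = 0; k < (int)(sizeof keys / sizeof keys[0]); k++) {
        idx = insert_linear(table, keys[k]);
        printf("key %2d -> index %d\n", keys[k], idx);
    }
    printf("last key landed at index %d\n", idx);   /* 2 */
    return 0;
}
```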
Graphs
- Basics: A graph G = (V, E) consists of a set of vertices V and edges E (connections between vertices). Graphs can be directed or undirected. Key terms: adjacent vertices, degree (number of edges incident on a vertex), path, cycle, connected (an undirected graph is connected if there is a path between every pair of vertices).
- Representation:
- Adjacency Matrix: |V|×|V| matrix M, where M[i][j] = 1 (or weight) if there is an edge from i to j, else 0. Uses O(V²) space. Efficient to check if a given edge exists (O(1)), but iterating over neighbors of a vertex is O(V). Good for dense graphs.
- Adjacency List: For each vertex, store list of adjacent vertices. Uses O(V + E) space. Efficient to iterate neighbors (proportional to degree). Checking if a specific edge exists is O(deg(v)). Good for sparse graphs.
- Traversal Algorithms:
- Breadth-First Search (BFS): Explores the graph level by level from a source vertex. Uses a queue. BFS runs in O(V + E) time (Why is the complexity of both BFS and DFS O(V+E)? | GeeksforGeeks). It finds shortest paths in an unweighted graph (in terms of number of edges). For example, BFS can determine connected components: run BFS/DFS from an unvisited vertex, mark all reachable vertices (that's one component), then repeat from another unvisited vertex; the total cost to find all components is O(V+E). (A minimal BFS sketch appears at the end of this section.)
- Depth-First Search (DFS): Explores as far as possible along each branch before backtracking. Uses a stack (or recursion). DFS also runs in O(V + E) time (Why is the complexity of both BFS and DFS O(V+E)? | GeeksforGeeks) (Why is the complexity of both BFS and DFS O(V+E)? | GeeksforGeeks). DFS is useful for classifying edges (tree, back edges, etc.), finding cycles, topological sorting, and generating DFS spanning trees.
- Connected Components: In an undirected graph, a connected component is a maximal set of vertices with paths between every pair. BFS or DFS can label components in linear time. In a directed graph, strongly connected components (SCCs) are groups where each vertex is reachable from any other in the same group. Kosaraju’s or Tarjan’s algorithm can find all SCCs in O(V + E). Notably, a directed graph and its transpose (edges reversed) have the same set of strongly connected components (strongcomponents.dvi). Kosaraju’s algorithm exploits this by doing a DFS on G, then DFS on G^T in the order of decreasing finish times.
- Articulation Points & Bridges: An articulation point (or cut vertex) in a connected graph is a vertex whose removal (with edges) increases the number of connected components ([Solved] An articulation point in a connected graph is a vertex such ). A bridge is an edge whose removal disconnects the graph. DFS can be used to find articulation points and bridges in O(V+E) using discovery times and low-link values. For example, a GATE question defined an articulation point exactly as above and asked about properties of articulation points in a DFS tree ([Solved] An articulation point in a connected graph is a vertex such ) ([Solved] An articulation point in a connected graph is a vertex such ). Key points: the root of a DFS tree is an articulation point if it has two or more children in the DFS tree; any other vertex u is articulation if it has a child v such that no back-edge from v or its descendants connects to u or ancestors of u.
- Graph Algorithms:
- Shortest Paths: In weighted graphs with non-negative weights, Dijkstra’s algorithm finds shortest distances in O((V+E) log V) using a min-priority queue. With arbitrary weights (no negative cycles), Bellman-Ford works in O(VE). For unweighted graphs, BFS gives shortest path (in edge count).
- Minimum Spanning Tree (MST): For connecting all vertices with minimum total weight in a weighted undirected graph. Kruskal’s algorithm (sort edges, union-find) and Prim’s algorithm (greedy, using min-heap for edges) run in O(E log V). A complete graph with V vertices has MST weight predictable by certain formulas if edge weights follow patterns, but in general one applies algorithms. (If a question gives an adjacency matrix of weights for a complete graph, they might ask for MST weight or structure by applying Prim/Kruskal conceptually.)
- Topological Sort: For a DAG (directed acyclic graph), a linear ordering of vertices respecting all edges (u→v implies u comes before v). Found via DFS (using finish times) or Kahn’s algorithm (repeatedly remove in-degree 0 vertices). If a directed graph has a cycle, no topological order exists.
- Example Graph Problems:
- Counting components: “Given a graph with vertices 1..n and an edge between i and j if |i-j| is prime, how many connected components?” – here you would interpret the rule and perhaps find a pattern (graph theory meets number theory).
- Identifying special subgraphs: bipartite graphs (can be checked by 2-coloring via BFS/DFS), complete graphs (K_n has n(n-1)/2 edges), trees (an undirected graph with V vertices and V-1 edges that is connected is a tree).
- Degree properties: Handshaking lemma – in any undirected graph, sum of all vertex degrees = 2|E|. In directed, sum of in-degrees = sum of out-degrees = |E|. Sometimes used in proofs or option elimination.
- Notable Concepts:
- Eulerian path/circuit: Path that uses every edge exactly once. Existence criteria: for undirected, 0 or 2 vertices of odd degree (Eulerian circuit if 0 odd-degree, Eulerian path if 2 odd-degree).
- Hamiltonian path/cycle: Path that visits every vertex exactly once (NP-complete to decide generally, unlikely in quick review unless a specific simple case).
- Planarity, Graph Coloring: Possibly out of scope unless specifically mentioned. Usually GATE focuses on traversal, connectivity, and basic properties rather than heavy theory unless specified.
- Key Complexity: Remember BFS/DFS are O(V+E) (Why is the complexity of both BFS and DFS O(V+E)? | GeeksforGeeks) – linear in the size of the graph. This is important because many graph problems reduce to or use BFS/DFS as a subroutine. Also note that if using an adjacency matrix, BFS/DFS would be O(V²), but typically graphs are given in adjacency list format for complexity.
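A minimal C sketch of BFS on an adjacency list, illustrating the O(V+E) traversal and unweighted shortest distances noted above (unvisited vertices keep distance −1, which also exposes separate components; array sizes and names are illustrative):

```c
#include <stdio.h>

#define MAXV 100

/* Adjacency lists stored in fixed arrays for brevity. */
int adj[MAXV][MAXV], deg[MAXV];       /* adj[u][k] = k-th neighbor of u */

void add_edge(int u, int v) {         /* undirected edge */
    adj[u][deg[u]++] = v;
    adj[v][deg[v]++] = u;
}

void bfs(int src, int n, int dist[]) {
    int queue[MAXV], head = 0, tail = 0;
    for (int i = 0; i < n; i++) dist[i] = -1;   /* -1 = not yet visited */
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];                  /* dequeue */
        for (int k = 0; k < deg[u]; k++) {
            int v = adj[u][k];
            if (dist[v] == -1) {                /* first time seen: enqueue */
                dist[v] = dist[u] + 1;
                queue[tail++] = v;
            }
        }
    }
}

int main(void) {
    int n = 6, dist[MAXV];
    add_edge(0, 1); add_edge(0, 2); add_edge(1, 3); add_edge(2, 3); add_edge(4, 5);
    bfs(0, n, dist);
    for (int v = 0; v < n; v++)
        printf("dist(0,%d) = %d\n", v, dist[v]);   /* vertices 4 and 5 stay at -1 */
    return 0;
}
```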
Sources: Key facts and formulas are verified from standard references: CLRS, GeeksforGeeks, and previous GATE questions. For example, the AVL height derivation (5.1.1 Maximum Height of an AVL Tree) (5.1.1 Maximum Height of an AVL Tree), the Catalan number formula for BST count (Count Total Number of Unique Binary Search Trees with n Keys), the lower triangular matrix indexing formula (Convert given lower triangular Matrix to 1D array | GeeksforGeeks), the leaves vs internal nodes relation (In binary tree, number of nodes with two children when number of leaves is given – Mathematics Stack Exchange), and the O(V+E) complexity of BFS/DFS (Why is the complexity of both BFS and DFS O(V+E)? | GeeksforGeeks) are all well-established results in data structure theory.