A comprehensive master-list for Digital Design, Computer Architecture, and Firmware Engineering. Master the skills needed for Google, Apple, and top-tier hardware roles.
Linear Algebra and Digital Fundamentals.
Before touching a circuit, master the math that powers ML and Signal Processing.
Caltech Masterclasses vs MIT.
Crucial Advice: Start with Prof. Ali Hajimiri. His intuition involves solid-state physics essential for 1nm nodes.
From Registers to Real-Time Systems.
Miro Samek is mandatory for understanding how software interacts with hardware registers. Phil's Lab bridges code with PCB design.
Memory Management, Drivers, and Algorithms.
The Cherno is legendary for memory/pointers—Taiwanese professors will grill you on this. Kunal Kushwaha is your best bet for LeetCode "approach".
Books, Repos, and Notebooks.
Essential reading and code repositories to support your journey.
Professor Ali Hajimiri's legendary courses from Caltech
Master circuit theorems, source transformation, substitution theorem, maximum power transfer, and Y-Delta transformations. Essential foundation for analog design.
Deep dive into solid-state physics, energy bands, electrons and holes. The latest 2019 edition covering modern analog design principles and semiconductor physics.
Covers converters, inverters, and power systems analysis and design. Essential for understanding modern power conversion and energy-efficient circuit design.
Deep dive into diodes, BJTs, MOSFETs, and op-amp circuits. Comprehensive foundation for understanding analog circuit design and semiconductor devices.
Focuses on microcontrollers, peripherals, and real-time system concepts. Master the hardware-software interface for embedded applications.
Kernel modules, character drivers, and hardware interfacing in Linux. Essential for firmware engineers working on Linux-based embedded systems.
High-Performance Computing with GPUs and parallel programming models. Master GPU acceleration and parallel algorithms for compute-intensive applications.
Full course for EE/CS students learning register-level programming. Deep understanding of ARM Cortex M processor architecture and bare-metal firmware development.
Why these courses? This comprehensive collection covers the entire hardware engineering stack - from fundamental analog circuits and power electronics through embedded systems and kernel development, to high-performance GPU computing. These are industry-standard resources used by engineers at top tech companies like Google, Apple, and NVIDIA.
Use two 2:1 MUXes for the first stage (controlled by S0) and one 2:1 MUX for the second stage (controlled by S1).
I0 ──┐
├─── MUX1 ──┐
I1 ──┘ │
S0 ├─── MUX3 ─── Output
I2 ──┐ │ S1
├─── MUX2 ──┘
I3 ──┘
S0
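As a quick sanity check of the two-stage structure, here is a minimal behavioral sketch in Python. The names `mux2` and `mux4` are illustrative, and the select encoding (S1 as the MSB) is an assumption.

```python
def mux2(a, b, sel):
    """2:1 MUX: passes a when sel is 0, b when sel is 1."""
    return b if sel else a

def mux4(i0, i1, i2, i3, s1, s0):
    """4:1 MUX built from three 2:1 MUXes."""
    m1 = mux2(i0, i1, s0)    # first stage, upper MUX (selects I0/I1)
    m2 = mux2(i2, i3, s0)    # first stage, lower MUX (selects I2/I3)
    return mux2(m1, m2, s1)  # second stage picks between the two

# Walk through all four select combinations with distinguishable inputs
inputs = (10, 11, 12, 13)
selected = [mux4(*inputs, s1, s0) for s1 in (0, 1) for s0 in (0, 1)]
print(selected)
```

Each select value {S1, S0} = 0..3 should route the matching input I0..I3 to the output.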
// Combinational Logic
always @(*) begin
y = a & b; // Blocking
end
// Sequential Logic
always @(posedge clk) begin
q <= d; // Non-blocking
end
A state where a flip-flop output hovers between logic 0 and 1, unable to settle to a valid logic level within the required time.
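MTBF (Mean Time Between Failures): statistical measure of how often metastability failures occur. Formula:
$$ \text{MTBF} = \frac{e^{t_r/\tau}}{T_0 \cdot f_c \cdot f_d} $$
Where: t_r = resolution time, τ = time constant, T_0 = device-dependent metastability window constant, f_c = clock frequency, f_d = data frequency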
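To get a feel for the exponential dependence on resolution time, the sketch below evaluates the commonly quoted form MTBF = e^(t_r/τ) / (T_0 · f_c · f_d). T_0 is a device-dependent constant, and every numeric value here is a made-up assumption used only to show the trend, not data for any real flip-flop.

```python
import math

def mtbf_seconds(t_r, tau, t0, f_c, f_d):
    """MTBF = exp(t_r / tau) / (t0 * f_c * f_d), all in SI units."""
    return math.exp(t_r / tau) / (t0 * f_c * f_d)

# Hypothetical values chosen only to illustrate the trend
TAU = 100e-12   # time constant
T0 = 50e-12     # metastability window constant
F_C = 200e6     # clock frequency
F_D = 10e6      # data toggle frequency

short_wait = mtbf_seconds(1e-9, TAU, T0, F_C, F_D)
long_wait = mtbf_seconds(2e-9, TAU, T0, F_C, F_D)

print(f"t_r = 1 ns: {short_wait:.3e} s")
print(f"t_r = 2 ns: {long_wait:.3e} s")
print(f"improvement: {long_wait / short_wait:.1f}x")
```

Doubling the resolution time multiplies MTBF by e^(Δt_r/τ), which is why adding a second synchronizer flop helps so much.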
Compare Read/Write pointers in Gray code to avoid metastability issues when crossing clock domains.
// Gray Code Conversion
gray = binary ^ (binary >> 1);
// Full Logic: the two MSBs are inverted and the remaining bits match
full = (wptr_gray[N:N-1] == ~rptr_gray_sync[N:N-1]) &&
(wptr_gray[N-2:0] == rptr_gray_sync[N-2:0]);
// Empty Logic
empty = (wptr_gray_sync == rptr_gray);
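The Gray-code properties this relies on can be checked with a small Python model. This is a sketch under stated assumptions: pointers carry one extra wrap bit (4 bits for an 8-entry FIFO), the helper names are hypothetical, and "full" uses the common test that the top two Gray bits are inverted while the rest match.

```python
def bin_to_gray(b):
    """Binary to Gray: same expression as the conversion above."""
    return b ^ (b >> 1)

def gray_to_bin(g):
    """Gray to binary: XOR together all right-shifted copies of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

def is_full(wptr_gray, rptr_gray, ptr_bits):
    """Full when the top two Gray bits are inverted and the rest match."""
    top2 = 0b11 << (ptr_bits - 2)
    return wptr_gray == (rptr_gray ^ top2)

PTR_BITS = 4  # assumed: 8-entry FIFO plus one wrap bit
codes = [bin_to_gray(i) for i in range(1 << PTR_BITS)]

# Number of bits that change between consecutive codes (including wraparound)
bit_flips = [bin(codes[i] ^ codes[(i + 1) % len(codes)]).count("1")
             for i in range(len(codes))]
print(bit_flips)

# Write pointer exactly one FIFO depth (8) ahead of the read pointer
rptr, wptr = 3, 11
print(is_full(bin_to_gray(wptr), bin_to_gray(rptr), PTR_BITS))
```

Only one bit changes per increment, so a pointer sampled mid-transition in the other clock domain resolves to either the old or the new value, never to an unrelated one.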
Setup Time: Data must be stable BEFORE the active clock edge.
Hold Time: Data must be stable AFTER the active clock edge.
Detect sequence 1011. Contrast Overlapping vs Non-overlapping detection. The table below is the overlapping (Moore) version; for non-overlapping detection, S4 returns to S0 on In=0 instead of S2.
State    | In=0 | In=1 | Output
---------|------|------|-------
S0 (R)   | S0   | S1   | 0
S1 (1)   | S2   | S1   | 0
S2 (10)  | S0   | S3   | 0
S3 (101) | S2   | S4   | 0
S4 (1011)| S2   | S1   | 1 <-- Detection!
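To compare the two detection modes concretely, the state table can be run as a small Python simulation. This is an illustrative sketch: the dictionary encoding and the `detect` helper are assumptions, and the non-overlapping variant differs only in sending S4 back to S0 on a 0.

```python
# Moore machine: state -> (next state on 0, next state on 1)
OVERLAPPING = {
    "S0": ("S0", "S1"),
    "S1": ("S2", "S1"),
    "S2": ("S0", "S3"),
    "S3": ("S2", "S4"),
    "S4": ("S2", "S1"),  # reuse the trailing "1" / "10" of the match
}
# Non-overlapping: after a match, a 0 restarts from reset
NON_OVERLAPPING = dict(OVERLAPPING, S4=("S0", "S1"))

def detect(bits, table):
    """Return the indices of the input bit that completes each detection."""
    state = "S0"
    hits = []
    for i, bit in enumerate(bits):
        state = table[state][int(bit)]
        if state == "S4":  # Moore output is 1 in S4
            hits.append(i)
    return hits

stream = "1011011"
print(detect(stream, OVERLAPPING))
print(detect(stream, NON_OVERLAPPING))
```

On the stream 1011011 the overlapping machine reports two matches because the final "1" of the first match is reused, while the non-overlapping machine reports one.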
Enable ──┐ ____
├──>| \
CLK ─────┤ |LAT |──┐ ____
│ |____/ └──| \
| | AND |─── Gated_CLK
CLK ─────────────────────|____/
To prevent glitches on the gated clock. The latch is transparent only while CLK is low, so the Enable value seen by the AND gate stays stable during the entire high phase of the clock. Without it, if Enable changes while CLK is high, the clock pulse gets chopped (a glitch).
At nodes < 20nm, Planar MOSFETs suffer from excessive Short Channel Effects (SCE).
Drain-Induced Barrier Lowering: As Drain voltage ($V_{ds}$) increases, the depletion region of the drain expands and lowers the potential barrier at the source.
Result: Channel turns on at a lower $V_{gs}$ (Threshold voltage $V_{th}$ decreases), causing leakage.
Curve plotting $V_{out}$ vs $V_{in}$.
If you increase $(W/L)_{PMOS}$: The PMOS becomes stronger. It pulls the output HIGH more easily.
Result: The switching threshold $V_M$ shifts to the RIGHT (towards VDD).
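The shift can be estimated numerically with the long-channel square-law approximation, $V_M = \frac{V_{Tn} + r\,(V_{DD} - |V_{Tp}|)}{1 + r}$ with $r = \sqrt{k_p/k_n}$. This is a rough sketch: it assumes both devices are saturated at $V_M$, ignores velocity saturation, and uses hypothetical voltages and gain factors.

```python
import math

def switching_threshold(vdd, vtn, vtp_abs, kn, kp):
    """Long-channel estimate of the inverter switching threshold V_M.

    kn, kp are the device gain factors (proportional to mobility * W/L).
    """
    r = math.sqrt(kp / kn)
    return (vtn + r * (vdd - vtp_abs)) / (1 + r)

# Hypothetical process values, for illustration only
VDD, VTN, VTP_ABS, KN = 1.8, 0.5, 0.5, 1.0

# Sweep the PMOS strength relative to the NMOS
for kp in (0.5, 1.0, 2.0, 4.0):
    vm = switching_threshold(VDD, VTN, VTP_ABS, KN, kp)
    print(f"kp/kn = {kp:>3}: V_M = {vm:.3f} V")
```

With matched strengths and thresholds the estimate sits at VDD/2, and it rises as the PMOS gets stronger.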
IF (Instruction Fetch): Fetch instruction from memory using PC (Program Counter)
ID (Instruction Decode): Decode instruction and read register operands
EX (Execute): Perform ALU operations or calculate memory address
MEM (Memory Access): Read from or write to data memory (for load/store)
WB (Write Back): Write result back to register file
Modified (M): Dirty data, exclusive to this cache. Must write back to memory before eviction.
Exclusive (E): Clean data, exclusive to this cache. Matches memory, no other cache has it.
Shared (S): Clean data, shared by multiple caches. Effectively read-only.
Invalid (I): Data is not usable. Must fetch from memory or another cache.
CPU programs the DMA controller with the transfer parameters (typically source address, destination address, and transfer count)
DMA controller requests and gains control of system bus
DMA moves data directly between peripheral and memory (bypassing CPU)
DMA releases bus and interrupts CPU to signal completion
64KB Cache, 64-byte line size, 4-way Set Associative, 32-bit Address. Find Offset, Index, Tag bits.
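One way to work this out is sketched below in Python. It assumes byte addressing, 64KB meaning 64 × 1024 bytes, and power-of-two sizes; the function name is illustrative.

```python
def cache_fields(cache_bytes, line_bytes, ways, addr_bits):
    """Split an address into (offset, index, tag) bit widths."""
    offset_bits = line_bytes.bit_length() - 1      # log2(line size)
    num_sets = cache_bytes // (line_bytes * ways)  # lines grouped into sets
    index_bits = num_sets.bit_length() - 1         # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return offset_bits, index_bits, tag_bits

offset, index, tag = cache_fields(64 * 1024, 64, 4, 32)
print(f"Offset = {offset}, Index = {index}, Tag = {tag}")
```

Here 64KB / (64 B × 4 ways) gives 256 sets, so the split is 6 offset bits, 8 index bits, and 18 tag bits.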
Producer: 200MHz, 80% duty cycle. Consumer: 150MHz continuous. Burst: 1000 items.
$$ \text{Depth} = \text{Burst} \times (1 - \frac{\text{Consumer Rate}}{\text{Producer Rate}}) $$
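Plugging the numbers into the formula is sketched below. Which producer rate to use depends on how the 80% duty cycle is interpreted, so both readings are shown as assumptions: a back-to-back burst at the full 200MHz, and an averaged rate of 200MHz × 0.8 = 160MHz.

```python
import math

def fifo_depth(burst, producer_hz, consumer_hz):
    """Depth = burst * (1 - consumer rate / producer rate), rounded up."""
    return math.ceil(burst * (1 - consumer_hz / producer_hz))

BURST = 1000
CONSUMER_HZ = 150_000_000

# Reading 1 (assumption): the whole burst arrives back-to-back at 200 MHz
worst_case = fifo_depth(BURST, 200_000_000, CONSUMER_HZ)

# Reading 2 (assumption): writes are spread out, averaging 200 MHz * 0.8
averaged = fifo_depth(BURST, 160_000_000, CONSUMER_HZ)

print(f"back-to-back burst: {worst_case} entries")
print(f"averaged producer:  {averaged} entries")
```

Sizing for the back-to-back case is the conservative choice; in practice the result is usually rounded up to the next power of two so Gray-code pointers can be used.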
For sensitive analog/RF circuits (PLLs, ADCs) that need clean voltage (low noise), or when $V_{in}$ is very close to $V_{out}$ (high efficiency region).
For high-power digital cores (CPU/GPU) where efficiency (>90%) is critical and noise is manageable.
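The "close to $V_{out}$" point follows from the first-order LDO efficiency, roughly $V_{out}/V_{in}$ when quiescent current is ignored. A small sketch with hypothetical rail voltages and load current:

```python
def ldo_efficiency(v_in, v_out):
    """First-order LDO efficiency, ignoring quiescent current."""
    return v_out / v_in

def ldo_dissipation_watts(v_in, v_out, load_amps):
    """Power burned in the pass element as heat."""
    return (v_in - v_out) * load_amps

# Hypothetical rails: a 1.2 V output fed from 3.3 V versus from 1.5 V
for v_in in (3.3, 1.5):
    eff = ldo_efficiency(v_in, 1.2)
    heat = ldo_dissipation_watts(v_in, 1.2, 0.5)
    print(f"Vin = {v_in} V: efficiency = {eff:.0%}, heat = {heat:.2f} W")
```

With a small input-to-output drop the LDO is competitive on efficiency; with a large drop most of the input power becomes heat, which is where a switching regulator wins.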
Recursively find all .v files and replace "module old_name" with
"module new_name".
import os
import re

root_dir = "./design"
# \b keeps names such as "old_name_v2" from matching
pattern = re.compile(r"\bmodule\s+old_name\b")

for dirpath, _, filenames in os.walk(root_dir):
    for fname in filenames:
        if fname.endswith(".v"):
            path = os.path.join(dirpath, fname)
            with open(path, "r") as f:
                content = f.read()
            new_content = pattern.sub("module new_name", content)
            # Only rewrite files that actually changed
            if new_content != content:
                with open(path, "w") as f:
                    f.write(new_content)
                print(f"Updated {path}")
Regression output results.csv has columns: TestName, Status
(PASS/FAIL), Runtime.
import pandas as pd
df = pd.read_csv("results.csv")
# 1. Counts
counts = df['Status'].value_counts()
print(counts)
# 2. Max Runtime
slowest_test = df.loc[df['Runtime'].idxmax()]
print(f"Slowest: {slowest_test['TestName']} ({slowest_test['Runtime']}s)")
import csv

# Standard-library alternative (no pandas)
max_runtime = 0.0
slowest = ""
counts = {"PASS": 0, "FAIL": 0}
with open("results.csv", "r", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        counts[row['Status']] += 1
        curr_time = float(row['Runtime'])
        if curr_time > max_runtime:
            max_runtime = curr_time
            slowest = row['TestName']
print(counts)
print(f"Slowest: {slowest} ({max_runtime}s)")
Read a Verilog module header and print an instantiation template.
module my_design (
input clk,
input rst_n,
input [31:0] data_in,
output reg done
);
import re
text = """...verilog code string..."""
# Regex to capture direction, width (optional), and name
# Groups: 1=dir, 2=width, 3=name
pattern = r"\s*(input|output)\s+(?:reg\s+)?(?:(\[[^\]]+\])\s+)?(\w+)"
matches = re.findall(pattern, text)
print("my_design u_dut (")
for i, (direction, width, name) in enumerate(matches):
    comma = "," if i < len(matches)-1 else ""
    print(f" .{name}({name}){comma}")
print(");")
with open("log.txt", "r") as f:
    for line in f:
        if "ERROR" in line:
            print(line.strip())
import re
import mmap

# Memory-mapped file: the OS pages the log in on demand
# ("rb" is enough because the mapping is read-only)
with open("log.txt", "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mmapped:
        for line in iter(mmapped.readline, b""):
            if b"ERROR" in line:
                print(line.decode().strip())

# Or use a grep-like approach with regex
# (note: f.read() loads the whole file into memory)
pattern = re.compile(rb'.*ERROR.*')
with open("log.txt", "rb") as f:
    for match in pattern.finditer(f.read()):
        print(match.group().decode())
For very large logs, consider command-line tools such as grep or awk, which are optimized for text processing.
import re
text = "Address: 0x1A2B3C4D, Data: 0xDEADBEEF"
addresses = re.findall(r'0x[0-9A-Fa-f]+', text)
print(addresses) # ['0x1A2B3C4D', '0xDEADBEEF']
# Match specific bit widths (\b stops a longer literal from matching partially)
addr_32bit = re.findall(r'\b0x[0-9A-Fa-f]{8}\b', text) # Exactly 8 hex digits (32-bit)
addr_64bit = re.findall(r'\b0x[0-9A-Fa-f]{16}\b', text) # Exactly 16 hex digits (64-bit)
# Match with optional underscore separators (as Verilog allows inside literals)
verilog_hex = re.findall(r"0x[0-9A-Fa-f_]+", text)
# Capture either a sized Verilog literal (e.g. 32'hDEAD_BEEF) or a C-style 0x literal
# Groups: 1=bit width (empty for 0x form), 2=hex digits
full_pattern = re.findall(r"(?:(\d+)'h|0x)([0-9A-Fa-f_]+)", text)
# Match SystemVerilog hex literals
sv_hex = re.findall(r"\d+'[hH][0-9A-Fa-f_]+", text)
0x matches literal "0x", [0-9A-Fa-f] matches any hex digit, + means one or more.
bool hasCycle(ListNode *head) {
if (!head) return false;
ListNode *slow = head;
ListNode *fast = head;
while (fast && fast->next) {
slow = slow->next; // Move 1 step
fast = fast->next->next; // Move 2 steps
if (slow == fast) {
return true; // Cycle detected
}
}
return false;
}
Use a sliding window [left, right] and a hash map to store the last
seen index of characters.
int lengthOfLongestSubstring(string s) {
vector<int> map(128, -1); // last seen index of each ASCII character
int left = 0, maxLen = 0;
for (int right = 0; right < s.length(); right++) {
char c = s[right];
if (map[c] >= left) {
left = map[c] + 1; // Shrink window
}
map[c] = right; // Update char index
maxLen = max(maxLen, right - left + 1);
}
return maxLen;
}
Given range [m, n], find the bitwise AND of all numbers in this range.
The result is the common prefix of m and n. All bits to the right of the common prefix will eventually flip to 0.
int rangeBitwiseAnd(int m, int n) {
int shift = 0;
while (m < n) {
m >>= 1;
n >>= 1;
shift++;
}
return m << shift;
}
// Precomputed reverse of 0-255
static const unsigned char table[] = { 0x00, 0x80, ... };
unsigned char reverse(unsigned char n) {
return table[n];
}
unsigned char reverse(unsigned char n) {
n = (n & 0xF0) >> 4 | (n & 0x0F) << 4; // Swap nibbles
n = (n & 0xCC) >> 2 | (n & 0x33) << 2; // Swap pairs
n = (n & 0xAA) >> 1 | (n & 0x55) << 1; // Swap bits
return n;
}
Allocate memory such that the address is a multiple of `alignment` (power of 2).
void* aligned_malloc(size_t size, size_t alignment) {
// 1. Allocate extra bytes: size + alignment + metadata size
void* p1 = malloc(size + alignment + sizeof(void*));
if (!p1) return NULL;
// 2. Calculate aligned address (uintptr_t from <stdint.h> is the integer type meant to hold pointers)
uintptr_t addr = (uintptr_t)p1 + alignment + sizeof(void*);
void* p2 = (void*)(addr - (addr % alignment));
// 3. Store original pointer before p2 for free()
((void**)p2)[-1] = p1;
return p2;
}
void aligned_free(void* p2) {
void* p1 = ((void**)p2)[-1];
free(p1);
}
int countSetBits(int n) {
int count = 0;
while(n) {
n &= (n - 1); // Clear rightmost set bit
count++;
}
return count;
}
Example: n = 12 (binary: 1100)
Iteration 1: n = 1100, n-1 = 1011
n & (n-1) = 1000, count = 1
Iteration 2: n = 1000, n-1 = 0111
n & (n-1) = 0000, count = 2
Result: 2 set bits
O(k) where k = number of set bits (not O(log n)!)
// Lookup table method (fastest for repeated calls)
int popcount_lookup[256]; // Pre-computed
// Built-in compiler intrinsic
int count = __builtin_popcount(n); // GCC/Clang
// SWAR (SIMD Within A Register) - parallel counting
int count = n;
count = (count & 0x55555555) + ((count >> 1) & 0x55555555);
count = (count & 0x33333333) + ((count >> 2) & 0x33333333);
count = (count & 0x0F0F0F0F) + ((count >> 4) & 0x0F0F0F0F);
count = (count & 0x00FF00FF) + ((count >> 8) & 0x00FF00FF);
count = (count & 0x0000FFFF) + ((count >> 16) & 0x0000FFFF);
bool isPowerOfTwo(int n) {
return (n > 0) && ((n & (n - 1)) == 0);
}
Powers of 2 have exactly one bit set:
1 = 0001
2 = 0010
4 = 0100
8 = 1000
16 = 10000
For power of 2: n = 1000
n-1 = 0111
n & (n-1) = 0000
For non-power: n = 1010
n-1 = 1001
n & (n-1) = 1000 (non-zero!)
// Get next power of 2
int nextPowerOf2(int n) {
n--;
n |= n >> 1;
n |= n >> 2;
n |= n >> 4;
n |= n >> 8;
n |= n >> 16;
return n + 1;
}
// Check if power of 4
bool isPowerOfFour(int n) {
return (n > 0) && ((n & (n-1)) == 0) && ((n & 0xAAAAAAAA) == 0);
}
// Isolate rightmost set bit
int isolateRightmost(int n) {
return n & (-n);
}
Prevents the compiler from optimizing out repeated reads to a variable that might be changed by hardware outside the program flow.
// BAD: Compiler may optimize this to infinite loop
int *status_reg = (int*)0x40000000;
while (*status_reg == 0) {
// Compiler thinks: "status_reg never changes in this loop"
// May optimize to: if (*status_reg == 0) while(1);
}
// GOOD: Compiler always reads from memory
volatile int *status_reg = (volatile int*)0x40000000;
while (*status_reg == 0) {
// Compiler generates actual memory read each iteration
}
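// Real-world example: UART status register
// (addresses are illustrative; take the real ones from your MCU's reference manual)
#include <stdint.h>
#define UART_STATUS (*(volatile uint32_t*)0x40001000)
#define UART_DATA (*(volatile uint32_t*)0x40001004) // assumed data-register address
#define TX_READY (1 << 5)
void uart_send(char c) {
while (!(UART_STATUS & TX_READY)) {
// Wait for transmitter ready
}
UART_DATA = c;
}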
❌ volatile does NOT provide atomicity
❌ volatile does NOT provide memory barriers
❌ volatile is NOT a substitute for mutexes
✅ volatile only prevents compiler optimization
// For ISR communication
volatile bool data_ready = false;
void ISR_handler() {
data_ready = true; // Set by interrupt
}
void main_loop() {
while (!data_ready) { // Must be volatile!
// Wait
}
process_data();
}
// For thread safety, use atomics instead
#include <atomic>
std::atomic<bool> thread_safe_flag(false);
Essential resources for cracking placement interviews at top tech companies
Comprehensive video lectures covering all quantitative aptitude topics including number systems, percentages, ratios, time & work, and logical reasoning for technical placements.
The definitive guide for quantitative aptitude preparation. Comprehensive coverage of all topics with practice questions and solutions for placement exams.
Curated collection of frequently asked interview questions for hardware engineering roles. Practice behavioral, technical and HR rounds effectively.
Placement Season: Start your aptitude preparation early! Most companies test quantitative aptitude and logical reasoning heavily in their first screening rounds. Combine these resources with your technical preparation.