Top 20 Python Interview Questions and Answers

Ekene Eze


Python interviews don’t just test syntax. They test how you think.

You might expect a Python interview to be a straight run-through of language features. But most of the time, interviewers don’t care about how many methods you’ve memorized. They care about how you solve problems with Python as your tool.

The hardest questions are sometimes the simplest ones. You’ll be asked to use lists, dictionaries, or loops, but what they’re really checking is how clear your logic is, how clean your code looks, and how well you use Python’s strengths.

In this guide, you’ll:

  • Practice with 20 Python interview questions and answers

  • Review code examples and learn from common mistakes

  • Explore core syntax and advanced topics like decorators, memory management, and Pandas

  • Strengthen your knowledge with flashcards and self-testing exercises

  • Follow a Python roadmap to continue learning and improve your Python skills

Now that you know what this guide covers, it’s time to get started. Let’s go through how to prepare for your Python interview step by step.

Preparing for your Python interview

To succeed in a Python interview, keep the following tips in mind:

  • Lock down the fundamentals: Get solid with data types, functions, list comprehensions, and OOP. Focus on tricky areas like scope, mutability, and memory management.

  • Prep based on your career path: If you’re interviewing for a data role, focus on NumPy, Pandas, and data visualization. For web development, know Flask, Django, and REST APIs. For backend or systems work, study concurrency, multithreading, and performance tuning.

  • Practice with purpose: Platforms like LeetCode or HackerRank are great for algorithms, but also build small projects that prove you can write clean, production-ready code.

  • Show your reasoning, not just your code: Interviewers want to see how you think. Walk through your logic, check edge cases, and explain trade-offs instead of aiming for a “perfect” solution.

  • Know the company’s stack: Find out if they use Django or FastAPI, Pandas or Spark, AWS or GCP. Tailor your prep so your answers and questions resonate.

  • Use this guide actively: Don’t just skim this guide: type out the code, experiment with variations, and use the flashcards to test yourself.

Test yourself with Flashcards

You can test yourself with the interactive flashcards on this page, or jump to the questions list section below to see everything in list format.


Questions List

If you prefer to see the questions in a list format, you can find them below.

Python basics and core concepts

What is Python, and why is it called a high-level interpreted programming language?

Python is considered a high-level programming language because it handles low-level details, such as memory management and hardware interactions, for you. It allows you to write code that is clear, readable, and closer to human language than machine instructions.

It’s interpreted because the Python interpreter executes code line by line at runtime, instead of compiling everything into machine code first.

Common pitfalls and tips:

  • Assuming "interpreted" means Python can't be compiled. Tools like PyInstaller or Cython can package Python code.

  • Over-explaining technical details. This is often a warm-up question to test communication skills, so keep your answer crisp and clear.
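
In fact, CPython first compiles source code to bytecode and then interprets that bytecode. A quick way to see this is the standard dis module:

python
import dis

def add(a, b):
    return a + b

# Show the bytecode CPython compiled this function into
# (exact opcodes vary between Python versions)
dis.dis(add)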

What are mutable vs. immutable data types in Python? Give examples.

In Python, mutable data types can be changed after they’re created. You can update, remove, and add to them. Examples include list, dict, and set.

Immutable objects cannot be altered once created; any change results in a new object. Examples include str, tuple, int, and float.

python
# Mutable examples
my_list = [1, 2, 3]
my_list.append(4)  # Modifies the original list
print(my_list)     # [1, 2, 3, 4]

my_dict = {'a': 1}
my_dict['b'] = 2   # Modifies the original dictionary
print(my_dict)     # {'a': 1, 'b': 2}

# Immutable examples
my_string = "hello"
new_string = my_string.upper()  # Creates a new string
print(my_string)   # "hello" (unchanged)
print(new_string)  # "HELLO"

my_tuple = (1, 2, 3)
# my_tuple[0] = 4  # This would raise a TypeError

Common pitfalls and tips:

  • Confusing rebinding a variable with mutating an object. For example, x = x + [4] creates a new list and rebinds x to it, while x.append(4) modifies the existing list in place.

  • Not understanding why immutability matters. Immutable objects can be dictionary keys and are hashable, while mutable objects cannot.
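
A short sketch of both pitfalls:

python
# Rebinding vs. mutating
x = [1, 2, 3]
y = x            # y refers to the same list
x = x + [4]      # rebinding: x now points to a NEW list
print(y)         # [1, 2, 3] - unchanged

x = [1, 2, 3]
y = x
x.append(4)      # mutating: modifies the shared list in place
print(y)         # [1, 2, 3, 4] - changed

# Why immutability matters: only hashable objects can be dict keys
locations = {(0, 0): 'origin'}    # tuple key works
# locations[[0, 0]] = 'origin'    # TypeError: unhashable type: 'list'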

What is the difference between global scope, local scope, and nonlocal variables?

Python follows the LEGB rule for scope resolution: Local → Enclosing → Global → Built-in.

  • Local scope: Variables defined inside a function

  • Global scope: Variables defined at the module level

  • Nonlocal: Variables in an enclosing function's scope

python
global_var = "I'm global"def outer_function():    enclosing_var = "I'm in enclosing scope"        def inner_function():        local_var = "I'm local"                # Accessing different scopes        print(local_var)      # Local        print(enclosing_var)  # Enclosing        print(global_var)     # Global                # Modifying enclosing scope        nonlocal enclosing_var        enclosing_var = "Modified from inner"                # Modifying global scope        global global_var        global_var = "Modified globally"        inner_function()    print(enclosing_var)  # "Modified from inner"outer_function()print(global_var)  # "Modified globally"

Common pitfalls and tips:

  • Overusing global variables can reduce code readability and make debugging more challenging. Mention this as a best practice in interviews.

  • Forgetting to use global or nonlocal keywords when trying to modify variables from outer scopes — without them, you create new local variables instead.

What are Python's built-in data structures, and when would you use each?

Python has four main built-in data structures: lists, tuples, dictionaries, and sets. Each has different trade-offs depending on whether you need order, mutability, uniqueness, or fast lookups.

  • Lists: Ordered, mutable collections. Use when you need to maintain order and add/remove items.

python
shopping_list = ['eggs', 'milk', 'bread']
shopping_list.append('cheese')
  • Tuples: Ordered, immutable collections. Use for fixed data that shouldn’t change.

python
coordinates = (10.5, 20.3)
rgb_color = (255, 128, 0)
  • Dictionaries: Key-value pairs, mutable. Use for fast lookups, mappings, and checking if a key exists with O(1) average time complexity.

python
person = {'name': 'Alice', 'age': 30, 'city': 'New York'}
  • Sets: Unordered collections of unique items, mutable. Use for membership testing and removing duplicates.

python
unique_numbers = {1, 2, 3, 4, 5}
seen_users = set()

Common pitfalls and tips:

  • For uniqueness, prefer sets over lists — list membership tests are O(n), while set lookups are O(1).

  • Before Python 3.7, dictionaries didn’t guarantee insertion order.

  • Don’t use a list for lookups when a dictionary or set is more efficient.
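
A small benchmark sketch of the list-vs-set difference (timings vary by machine):

python
import timeit

items_list = list(range(100_000))
items_set = set(items_list)

# Worst case for the list: the item is at the very end
list_time = timeit.timeit(lambda: 99_999 in items_list, number=1_000)
set_time = timeit.timeit(lambda: 99_999 in items_set, number=1_000)

print(f"list membership: {list_time:.4f}s")  # O(n) linear scan
print(f"set membership:  {set_time:.4f}s")   # O(1) hash lookup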

What is the difference between a shallow copy and a deep copy?

The difference between a shallow copy and a deep copy is how they handle object references:

  • A shallow copy creates a new container, but nested objects are still shared references.

  • A deep copy makes a fully independent copy, including all nested objects.

python
import copy

original = [[1, 2], [3, 4]]

# Assignment - same object
assigned = original

# Shallow copy - new object, shared nested objects
shallow = copy.copy(original)
# or shallow = original.copy()
# or shallow = list(original)

# Deep copy - completely independent
deep = copy.deepcopy(original)

# Modify the original
original[0][0] = 99

print("Original:", original)  # [[99, 2], [3, 4]]
print("Assigned:", assigned)  # [[99, 2], [3, 4]] - same object
print("Shallow:", shallow)    # [[99, 2], [3, 4]] - shared nested objects
print("Deep:", deep)          # [[1, 2], [3, 4]] - independent copy

Common pitfalls and tips:

  • Confusing assignment (=) with copying. Assignment creates a new reference to the same object, not a copy.

  • Using a shallow copy with nested mutable objects and not expecting shared references; this can lead to unexpected mutations.

  • Not understanding the performance cost. A deep copy can be expensive for large, deeply nested structures.

How is memory allocation and garbage collection handled in Python?

Python manages memory automatically in three ways:

  • Memory allocation: Python uses a private heap to store objects. The Python memory manager handles allocation and deallocation automatically.

  • Reference counting: Each object tracks how many references point to it. When the count reaches zero, the object is immediately deallocated.

  • Cycle detection: A cyclic garbage collector handles circular references that reference counting can't handle.

You generally don't need to manage memory manually, but understanding this helps with performance optimization and debugging memory leaks.

python
import gc

# Creating objects
x = [1, 2, 3]
y = x  # Reference count for the list is now 2

# Check if object is tracked by the garbage collector
print(gc.is_tracked(x))  # True

# Delete a reference
del x  # Reference count decreases to 1

# Force garbage collection
collected = gc.collect()
print(f"Collected {collected} objects")

# Check garbage collection stats
print(gc.get_stats())

Common pitfalls and tips:

  • Thinking del immediately frees memory. It only decreases reference count; the garbage collector decides when to actually free memory.

  • Not being aware of circular references can prevent automatic cleanup and require the cyclic garbage collector.

  • Don’t manually call gc.collect() except in memory-intensive applications (like machine learning pipelines), where forcing a collection can free memory held by complex objects that are no longer needed.
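
To see why the cyclic collector exists, here’s a minimal sketch of a reference cycle that reference counting alone can’t reclaim:

python
import gc

class Node:
    def __init__(self):
        self.other = None

# Build a cycle: a -> b -> a
a = Node()
b = Node()
a.other = b
b.other = a

# Drop our references; the reference counts never reach zero
del a, b

# The cyclic garbage collector still finds and frees the pair
collected = gc.collect()
print(f"Collected {collected} objects")  # > 0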

How are arguments passed in Python: by value or by reference?

Python uses pass-by-object-reference (also called “pass by assignment”): a function receives a reference to the object, but whether changes stick depends on mutability:

  • If the object is immutable (like int, str, tuple), reassignment inside the function won’t change the original.

  • If the object is mutable (like list, dict), in-place modifications will affect the original.

The key is understanding that you're passing the reference to the object, not the variable itself.

python
def modify_immutable(x):
    x = x + 10  # Creates new object, doesn't affect original
    return x

def modify_mutable(lst):
    lst.append(4)  # Modifies the original object
    return lst

def reassign_mutable(lst):
    lst = [7, 8, 9]  # Creates new local reference, doesn't affect original
    return lst

# Immutable example
num = 5
result = modify_immutable(num)
print(num)     # 5 (unchanged)
print(result)  # 15

# Mutable example - modification
my_list = [1, 2, 3]
modify_mutable(my_list)
print(my_list)  # [1, 2, 3, 4] (changed)

# Mutable example - reassignment
my_list2 = [1, 2, 3]
result = reassign_mutable(my_list2)
print(my_list2)  # [1, 2, 3] (unchanged)
print(result)    # [7, 8, 9]

Common pitfalls and tips:

  • Expecting immutable objects to change when passed to functions — they create new objects instead of modifying originals.

  • Being surprised when mutable objects are modified inside functions — the function receives a reference to the same object.

  • Confusing reassignment with modification — lst = [1, 2, 3] creates a new local reference, while lst.append(4) modifies the original object.

What is dictionary comprehension in Python?

Dictionary comprehension is a compact way to create dictionaries from an existing iterable or transform existing dictionaries. Instead of writing a loop, you define both the key and the value in one expression. It’s basically list comprehension, but it produces key-value pairs.

python
# Basic dictionary comprehension
squares = {x: x**2 for x in range(5)}
print(squares)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# With condition
even_squares = {x: x**2 for x in range(10) if x % 2 == 0}
print(even_squares)  # {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}

# Transform existing dictionary
prices = {'apple': 0.5, 'banana': 0.3, 'orange': 0.6}
price_increase = {item: price * 1.1 for item, price in prices.items()}

# Swap keys and values
reversed_dict = {v: k for k, v in prices.items()}

Common pitfalls and tips:

  • Creating dictionaries with duplicate keys (last value wins).

  • Overcomplicating comprehensions. If it's not readable, use a regular loop.
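
The duplicate-key pitfall in action, using a throwaway example:

python
# Duplicate keys: the last value written wins
words = ['apple', 'avocado', 'banana']
by_first_letter = {w[0]: w for w in words}
print(by_first_letter)  # {'a': 'avocado', 'b': 'banana'} - 'apple' is gone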

These questions test whether you understand Python beyond the basics. Hiring managers want to see if you can work with functions, classes, and Python's unique features like decorators and generators.

Intermediate Python interview questions

What is the difference between a normal function and a lambda function?

Normal functions are defined with def and can contain multiple statements, have a name, and include documentation. Lambda functions are anonymous, single-expression functions that return the result of that expression.

Lambda functions are best for simple operations that can be expressed in one line, especially when you need a quick function for map(), filter(), or sort() operations. Normal functions are better when you need multiple statements, complex logic, or want to reuse the function multiple times.

python
# Normal function
def square(x):
    return x ** 2

# Lambda function - single expression only
square_lambda = lambda x: x ** 2

print(square(5))        # 25
print(square_lambda(5)) # 25

# Where lambdas shine - with higher-order functions
numbers = [1, 2, 3, 4, 5]

# Using lambda with map
squared = list(map(lambda x: x ** 2, numbers))
print(squared)  # [1, 4, 9, 16, 25]

# Using lambda with filter
evens = list(filter(lambda x: x % 2 == 0, numbers))
print(evens)  # [2, 4]

# Using lambda with sorted
students = [
    {'name': 'Alice', 'grade': 85},
    {'name': 'Bob', 'grade': 92},
    {'name': 'Charlie', 'grade': 78}
]
sorted_students = sorted(students, key=lambda s: s['grade'])
print(sorted_students[0]['name'])  # Charlie (lowest grade)

# When NOT to use lambda
# Bad - too complex for a lambda
process = lambda x: x.strip().lower().replace(' ', '_') if x else ''

# Good - use a normal function for clarity
def process_string(x):
    if not x:
        return ''
    return x.strip().lower().replace(' ', '_')

Common pitfalls and tips:

  • Trying to include multiple statements in a lambda (they can only contain expressions).

  • Using a lambda when a normal function would be clearer.

  • Assigning lambdas to variables defeats their purpose of being anonymous.

How do you define a Python function with a variable number of arguments?

Python provides two special operators to handle variable numbers of arguments: *args for positional arguments and **kwargs for keyword arguments. This lets your functions accept flexible inputs without knowing exactly how many arguments will be passed.

python
# *args collects positional arguments into a tuple
def sum_all(*args):
    total = 0
    for num in args:
        total += num
    return total

print(sum_all(1, 2, 3))        # 6
print(sum_all(1, 2, 3, 4, 5))  # 15

# **kwargs collects keyword arguments into a dictionary
def print_info(**kwargs):
    for key, value in kwargs.items():
        print(f"{key}: {value}")

print_info(name="Alice", age=30, city="NYC")
# Output:
# name: Alice
# age: 30
# city: NYC

# Combining regular args, *args, and **kwargs
def flexible_function(required, *args, **kwargs):
    print(f"Required: {required}")
    print(f"Extra positional: {args}")
    print(f"Keyword arguments: {kwargs}")

flexible_function("hello", 1, 2, 3, name="Bob", age=25)
# Output:
# Required: hello
# Extra positional: (1, 2, 3)
# Keyword arguments: {'name': 'Bob', 'age': 25}

# Unpacking arguments when calling functions
def multiply(a, b, c):
    return a * b * c

numbers = [2, 3, 4]
print(multiply(*numbers))  # 24 - unpacks list into positional arguments

config = {'a': 2, 'b': 3, 'c': 4}
print(multiply(**config))  # 24 - unpacks dict into keyword arguments

Common pitfalls and tips:

  • Order matters: regular args, then *args, then keyword args, then **kwargs.

  • *args creates a tuple, not a list.

  • Forgetting to unpack when passing lists/dicts to functions.

What is the yield keyword, and how does it work?

The yield keyword turns a function into a generator, which produces values one at a time instead of returning all values at once. When a function contains yield, it returns a generator object that can be iterated over, pausing and resuming execution at each yield point.

Generators are memory-efficient because they don't store all values in memory. They calculate and produce each value on demand, making them perfect for large datasets or infinite sequences.

python
# Regular function returns all values at once
def get_squares_list(n):
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result  # Returns entire list

# Generator function yields values one at a time
def get_squares_generator(n):
    for i in range(n):
        yield i ** 2  # Pauses here, returns one value

# Memory difference
import sys
list_squares = get_squares_list(1000)
gen_squares = get_squares_generator(1000)
print(sys.getsizeof(list_squares))  # ~8856 bytes
print(sys.getsizeof(gen_squares))   # ~112 bytes

# How yield works - function state is preserved
def countdown(n):
    print(f"Starting countdown from {n}")
    while n > 0:
        yield n  # Pause and return n
        n -= 1
    print("Countdown finished!")

# Using the generator
counter = countdown(3)
print(next(counter))  # "Starting countdown from 3" then returns 3
print(next(counter))  # Returns 2
print(next(counter))  # Returns 1
# next(counter)  # "Countdown finished!" then raises StopIteration

Common pitfalls and tips:

  • Generators are one-time use; once exhausted, you need to create a new one.

  • Using list() on a generator defeats its memory efficiency purpose.

  • Forgetting that generators are lazy and don't execute until iterated.
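
A quick sketch of the one-time-use pitfall:

python
def squares(n):
    for i in range(n):
        yield i ** 2

gen = squares(3)
print(list(gen))  # [0, 1, 4] - consumes the generator
print(list(gen))  # [] - exhausted; create a new generator to iterate again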

What are Python's magic (dunder) methods?

Magic methods (also called dunder methods for "double underscore") are special methods that Python calls automatically in certain situations. They let you define how your objects behave with built-in operations like addition, comparison, string representation, and iteration.

python
class Book:
    def __init__(self, title, author, pages):
        self.title = title
        self.author = author
        self.pages = pages

    # String representation
    def __str__(self):
        return f"{self.title} by {self.author}"

    def __repr__(self):
        return f"Book('{self.title}', '{self.author}', {self.pages})"

    # Comparison methods
    def __eq__(self, other):
        if not isinstance(other, Book):
            return False
        return self.title == other.title and self.author == other.author

    def __lt__(self, other):
        return self.pages < other.pages

    # Arithmetic operations
    def __add__(self, other):
        return Book(
            f"{self.title} & {other.title}",
            f"{self.author} & {other.author}",
            self.pages + other.pages
        )

    # Container behavior
    def __len__(self):
        return self.pages

    def __contains__(self, keyword):
        return keyword.lower() in self.title.lower()

# Using magic methods
book1 = Book("Python Tricks", "Dan Bader", 300)
book2 = Book("Fluent Python", "Luciano Ramalho", 750)

print(str(book1))         # Python Tricks by Dan Bader
print(repr(book1))        # Book('Python Tricks', 'Dan Bader', 300)
print(book1 == book2)     # False
print(book1 < book2)      # True (300 < 750)
print(len(book1))         # 300
print("python" in book1)  # True

Common pitfalls and tips:

  • Implementing __eq__ without __hash__ makes instances unhashable, which breaks dictionary/set usage (see the sketch below).

  • Not returning NotImplemented for unsupported operations.

  • Making repr output that can't recreate the object.
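
A trimmed-down sketch showing how to address the first two pitfalls, pairing __eq__ with __hash__ and returning NotImplemented for unsupported operands:

python
class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author

    def __eq__(self, other):
        if not isinstance(other, Book):
            return NotImplemented  # let Python try the other operand
        return (self.title, self.author) == (other.title, other.author)

    # Defining __eq__ sets __hash__ to None; restore hashability explicitly
    def __hash__(self):
        return hash((self.title, self.author))

shelf = {Book('Python Tricks', 'Dan Bader')}  # works: instances are hashable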

Advanced concepts

What is the Global Interpreter Lock (GIL), and why does it matter?

The Global Interpreter Lock is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecode simultaneously. Only one thread can execute Python code at a time, even on multi-core systems.

The GIL exists because Python's memory management isn't thread-safe. It prevents race conditions when reference counting and garbage collection occur. While the GIL simplifies Python's implementation, it means CPU-bound tasks don't benefit from threading.

python
import threading
import time

# CPU-bound task affected by the GIL
def cpu_bound_task(n):
    result = 0
    for i in range(n * 1000000):
        result += i ** 2
    return result

# I/O-bound task not affected by the GIL
def io_bound_task():
    time.sleep(1)  # Releases the GIL during sleep
    return "Done"

# Threading with CPU-bound tasks (limited by the GIL)
start = time.time()
threads = []
for i in range(4):
    t = threading.Thread(target=cpu_bound_task, args=(100,))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print(f"Threading CPU-bound: {time.time() - start:.2f}s")

# Multiprocessing bypasses the GIL
# (on Windows/macOS, run this under an `if __name__ == "__main__":` guard)
from multiprocessing import Pool

start = time.time()
with Pool(4) as pool:
    pool.map(cpu_bound_task, [100] * 4)
print(f"Multiprocessing CPU-bound: {time.time() - start:.2f}s")

Common pitfalls and tips:

  • Thinking threading never helps in Python. It's great for I/O-bound tasks since the GIL releases during I/O operations.

  • Using threading for CPU-intensive work when multiprocessing would be better.

How does memory allocation work in Python?

Python manages memory automatically using a private heap where all objects and data structures live. The Python Memory Manager controls this heap and decides when to allocate or free space. Underneath, the operating system provides the low-level memory.

Python also uses object pooling for certain immutable objects (like small integers and short strings) so they can be reused instead of recreated, which improves performance.

python
import sys

# Small integers are cached (-5 to 256)
a = 100
b = 100
print(a is b)  # True (same object in memory)

# Larger integers usually create new objects
c = 1000
d = 1000
print(c is d)  # False in a REPL; may be True in a script, where the
               # compiler can reuse identical constants

# String interning for efficiency
str1 = "hello"
str2 = "hello"
print(str1 is str2)  # True (interned)

# Check object size in memory
my_list = [1, 2, 3, 4, 5]
print(f"List size: {sys.getsizeof(my_list)} bytes")

# Memory growth example
small_list = []
print(f"Empty list: {sys.getsizeof(small_list)} bytes")
small_list.extend(range(100))
print(f"100 items: {sys.getsizeof(small_list)} bytes")

# Python pre-allocates more space than needed for efficiency
small_list = [1]
print(f"One item: {sys.getsizeof(small_list)} bytes")

Common pitfalls and tips:

  • Assuming Python immediately frees memory when you delete references. The garbage collector decides when to actually free memory.

  • Not understanding that Python over-allocates space for dynamic structures to avoid frequent reallocations.

What is multithreading and multiprocessing in Python?

Multithreading uses threads within a single process and shares memory, but is limited by the GIL for CPU-bound tasks. Multiprocessing uses separate processes with independent memory spaces, bypassing the GIL but with higher overhead for communication.

python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_intensive_task(n):
    # CPU-bound task
    result = sum(i * i for i in range(n))
    return result

def io_intensive_task(delay):
    # I/O-bound task
    time.sleep(delay)
    return f"Task completed after {delay}s"

# Threading example (good for I/O-bound tasks)
def test_threading():
    start = time.time()
    with ThreadPoolExecutor(max_workers=4) as executor:
        # I/O-bound tasks benefit from threading
        futures = [executor.submit(io_intensive_task, 1) for _ in range(4)]
        results = [f.result() for f in futures]
    print(f"Threading I/O: {time.time() - start:.2f}s")

    start = time.time()
    with ThreadPoolExecutor(max_workers=4) as executor:
        # CPU-bound tasks limited by the GIL
        futures = [executor.submit(cpu_intensive_task, 1000000) for _ in range(4)]
        results = [f.result() for f in futures]
    print(f"Threading CPU: {time.time() - start:.2f}s")

# Multiprocessing example (good for CPU-bound tasks)
def test_multiprocessing():
    start = time.time()
    with ProcessPoolExecutor(max_workers=4) as executor:
        # CPU-bound tasks benefit from multiprocessing
        futures = [executor.submit(cpu_intensive_task, 1000000) for _ in range(4)]
        results = [f.result() for f in futures]
    print(f"Multiprocessing CPU: {time.time() - start:.2f}s")

if __name__ == "__main__":
    test_threading()
    test_multiprocessing()  # ProcessPoolExecutor needs this guard on spawn platforms

Common pitfalls and tips:

  • Using threading for CPU-bound tasks expecting parallelism.

  • Not understanding the overhead of creating processes compared to threads.

  • Forgetting that processes don't share memory by default.
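
Because processes don’t share memory by default, data must be passed explicitly, for example through a multiprocessing.Queue. A minimal sketch:

python
from multiprocessing import Process, Queue

def worker(q):
    # Runs in a separate process with its own memory space
    q.put('result from child')

if __name__ == '__main__':
    q = Queue()  # explicit channel for passing data between processes
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # 'result from child'
    p.join()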

How do you handle objects in memory efficiently?

Object handling involves understanding Python's memory model, using appropriate data structures, minimizing object creation, reusing objects when possible, and being aware of memory leaks from circular references.

python
import sys
import array
from weakref import WeakValueDictionary

# Object pooling for reuse
class ObjectPool:
    def __init__(self, factory):
        self._factory = factory
        self._pool = []

    def acquire(self):
        return self._pool.pop() if self._pool else self._factory()

    def release(self, obj):
        self._pool.append(obj)

# Memory-efficient data structures
int_list = [1, 2, 3, 4, 5]
int_array = array.array('i', [1, 2, 3, 4, 5])
print(f"List: {sys.getsizeof(int_list)} bytes")
print(f"Array: {sys.getsizeof(int_array)} bytes")

# Efficient string operations
def build_string_bad(items):
    result = ""
    for item in items:
        result += str(item) + ", "
    return result.rstrip(", ")

def build_string_good(items):
    return ", ".join(str(item) for item in items)

# Weak references for caching
class CacheManager:
    def __init__(self):
        self._cache = WeakValueDictionary()

    def get_object(self, key, factory):
        obj = self._cache.get(key)
        if obj is None:
            obj = factory()
            self._cache[key] = obj
        return obj

Common pitfalls and tips:

  • Creating unnecessary objects in loops instead of reusing them.

  • Using lists for homogeneous numeric data when arrays would be more efficient.

  • Not being aware of circular references that prevent garbage collection.
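
One technique the answer above doesn’t show is __slots__, which drops the per-instance __dict__ to save memory when you create many small objects. A minimal sketch:

python
import sys

class PointDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    __slots__ = ('x', 'y')  # fixed attributes, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

p1 = PointDict(1, 2)
p2 = PointSlots(1, 2)
print(sys.getsizeof(p1) + sys.getsizeof(p1.__dict__))  # instance + its dict
print(sys.getsizeof(p2))  # smaller: no __dict__ at all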

These questions are hands-on challenges that test your problem-solving skills and your ability to write clean, efficient Python code under pressure. Hiring managers use these to see how you approach problems, handle edge cases, and optimize solutions. They're not just looking for working code; they want readable, maintainable solutions that demonstrate strong coding skills and an understanding of data structures.

Coding Challenges

Implement an LRU cache using data structures

An LRU (Least Recently Used) cache evicts the least recently used item when the cache is full. The challenge is maintaining O(1) time complexity for both get and put operations.

python
from collections import OrderedDict

# Simple approach using OrderedDict
class LRUCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return -1
        # Move to end (most recent)
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # Remove least recently used (first item)
            self.cache.popitem(last=False)

# Usage example
cache = LRUCache(2)
cache.put(1, 1)
cache.put(2, 2)
print(cache.get(1))  # Returns 1
cache.put(3, 3)      # Evicts key 2
print(cache.get(2))  # Returns -1 (not found)

Common pitfalls and tips:

  • Not maintaining both data structures (hash map and linked list) properly.

  • Forgetting to update pointers when moving nodes.

  • Not handling the case when updating an existing key.
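
The OrderedDict version is usually acceptable, but the pitfalls above refer to the classic manual design. Here’s a sketch of the hash map plus doubly linked list implementation (class and method names are illustrative):

python
class Node:
    __slots__ = ('key', 'value', 'prev', 'next')

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None

class ManualLRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.map = {}  # key -> Node, for O(1) lookup
        # Sentinel head/tail nodes avoid edge-case pointer checks
        self.head, self.tail = Node(), Node()
        self.head.next, self.tail.prev = self.tail, self.head

    def _remove(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _add_front(self, node):
        # Most recently used items live right after the head
        node.next, node.prev = self.head.next, self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key):
        if key not in self.map:
            return -1
        node = self.map[key]
        self._remove(node)
        self._add_front(node)
        return node.value

    def put(self, key, value):
        if key in self.map:
            self._remove(self.map[key])  # handle updating an existing key
        node = Node(key, value)
        self.map[key] = node
        self._add_front(node)
        if len(self.map) > self.capacity:
            lru = self.tail.prev  # least recently used sits before the tail
            self._remove(lru)
            del self.map[lru.key]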

What is the Pandas library, and how does it compare to NumPy?

Pandas is built on top of NumPy and provides high-level data structures and tools for data analysis. Use NumPy for fast mathematical operations and scientific computing on homogeneous numerical arrays; use Pandas for data cleaning, exploration, time series analysis, and other tabular data workflows (including quick plotting for visualization).

python
import pandas as pd
import numpy as np

# NumPy: homogeneous numerical arrays
np_array = np.array([[1, 2, 3], [4, 5, 6]])
print(np_array * 2)  # Element-wise multiplication

# Pandas: labeled, heterogeneous data
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000]
})
print(df.head())

# Key differences demonstrated
# 1. Indexing
# NumPy uses integer positions
print(np_array[0, 1])  # 2

# Pandas uses labels
df.set_index('name', inplace=True)
print(df.loc['Alice', 'age'])  # 25

# 2. Missing data handling
# NumPy doesn't handle missing data well
np_with_nan = np.array([1, 2, np.nan, 4])
print(np_with_nan.mean())  # nan

# Pandas handles missing data gracefully
series = pd.Series([1, 2, None, 4])
print(series.mean())  # 2.333... (ignores NaN)

# 3. Data alignment
series1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
series2 = pd.Series([4, 5, 6], index=['b', 'c', 'd'])
print(series1 + series2)  # Aligns by index automatically

# 4. Time series functionality
dates = pd.date_range('2024-01-01', periods=5)
ts = pd.Series(np.random.randn(5), index=dates)
print(ts.resample('2D').mean())  # Resample to 2-day periods

Common pitfalls and tips:

  • Using Pandas for pure numerical computations (NumPy is faster).

  • Not understanding that Pandas Series/DataFrame contain NumPy arrays.

  • Overlooking Pandas' memory overhead for small datasets.

How do you merge two or more DataFrames in Pandas?

Merging DataFrames is essential for combining data from different sources. Pandas provides several methods depending on how you want to combine the data.

python
import pandas as pd

# Create sample DataFrames
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'department': ['Sales', 'IT', 'HR', 'Sales']
})
df2 = pd.DataFrame({
    'id': [2, 3, 4, 5],
    'salary': [60000, 55000, 62000, 58000],
    'bonus': [5000, 3000, 4000, 3500]
})
df3 = pd.DataFrame({
    'department': ['Sales', 'IT', 'HR'],
    'budget': [100000, 150000, 80000]
})

# 1. merge() - Database-style joins
# Inner join (default) - only matching records
inner_merge = pd.merge(df1, df2, on='id')
print(inner_merge)  # Only ids 2, 3, 4

# Left join - all from left DataFrame
left_merge = pd.merge(df1, df2, on='id', how='left')
print(left_merge)  # All from df1, NaN for missing df2 data

# Right join - all from right DataFrame
right_merge = pd.merge(df1, df2, on='id', how='right')

# Outer join - all from both
outer_merge = pd.merge(df1, df2, on='id', how='outer')

# Merge on a different key column
df_merged = pd.merge(df1, df3, on='department')

# Different column names
df2_renamed = df2.rename(columns={'id': 'emp_id'})
merged = pd.merge(df1, df2_renamed, left_on='id', right_on='emp_id')

# 2. join() - Index-based joining
df1_indexed = df1.set_index('id')
df2_indexed = df2.set_index('id')
joined = df1_indexed.join(df2_indexed)

# 3. concat() - Stack DataFrames
# Vertical concatenation (stack rows)
df_same_cols = pd.DataFrame({
    'id': [5, 6],
    'name': ['Eve', 'Frank'],
    'department': ['IT', 'HR']
})
vertical_concat = pd.concat([df1, df_same_cols], ignore_index=True)

# Horizontal concatenation (side by side)
horizontal_concat = pd.concat([df1, df2], axis=1)

# Merge multiple DataFrames that share the join key
from functools import reduce
dfs = [df1, df2]  # df3 has no 'id' column, so it can't join on 'id'
merged_all = reduce(lambda left, right: pd.merge(left, right, on='id', how='outer'), dfs)

# Handling duplicate column names
df_merged = pd.merge(df1, df2, on='id', suffixes=('_left', '_right'))

Common pitfalls and tips:

  • Not specifying join type explicitly (defaults to inner).

  • Forgetting to handle duplicate column names.

  • Memory issues when merging large DataFrames.

How do you read and process a text file in Python?

Reading and processing a text file is a fundamental skill tested in many coding challenges and practical scenarios.

python
# Method 1: Using with statement (recommended)
with open('data.txt', 'r') as file:
    content = file.read()  # Read entire file
    # File automatically closed after this block

# Method 2: Read line by line (memory efficient)
with open('data.txt', 'r') as file:
    for line in file:
        process_line(line.strip())  # process_line is a placeholder for your own handler

# Method 3: Read all lines into a list
with open('data.txt', 'r') as file:
    lines = file.readlines()  # List of lines with \n

# Method 4: Read specific number of characters
with open('data.txt', 'r') as file:
    chunk = file.read(1024)  # Read 1024 characters

# Writing to a file
with open('output.txt', 'w') as file:
    file.write("Hello, World!\n")

# Append to a file
with open('log.txt', 'a') as file:
    file.write(f"Log entry: {timestamp}\n")  # timestamp defined elsewhere

# Handle encoding
with open('data.txt', 'r', encoding='utf-8') as file:
    text = file.read()

# Word frequency counter (common interview question)
word_count = {}
with open('text.txt', 'r') as file:
    for line in file:
        words = line.strip().split()
        for word in words:
            word_count[word] = word_count.get(word, 0) + 1

Common pitfalls and tips:

  • Forgetting to close files (use the with statement).

  • Loading huge files entirely into memory instead of processing line by line.

  • Not handling encoding issues with non-ASCII text.

File handling questions are common for data analyst positions where processing CSV files and logs is routine.
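
Since CSV processing comes up so often, here’s a minimal sketch using the standard csv module (the file name and column names are hypothetical):

python
import csv

# Hypothetical file sales.csv with a "product,amount" header row
total = 0.0
with open('sales.csv', newline='') as f:
    reader = csv.DictReader(f)  # maps each row to a dict keyed by the header
    for row in reader:
        total += float(row['amount'])
print(f"Total sales: {total:.2f}")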

Final thoughts: Prepare smarter, not just harder

Whether you’re aiming for your first junior dev role or leveling up into a senior position, Python interviews test more than syntax. They assess how well you think, solve problems, and apply the language to build clean, efficient solutions.

No matter where you are in your career, we’re here to help. We highly recommend that you check our Python developer roadmap. Take this as a compass, a useful tool that will help you navigate the complexities of the Python universe, from the most basic syntax to advanced concepts.

But you won’t be alone in your learning journey. Our AI tutor will follow you and guide your prep. It adapts to your skill level, gives you tailored challenges, and helps you avoid common pitfalls. Think of it as your practice partner; ask it questions when you’re stuck, try out different examples, and push yourself with new code until the concepts click. That way, you’ll step into your interviews with absolute confidence.
