Data Analyst Interview Questions 2026: Top 50 Questions with Answers
The data analyst role is one of the most in-demand jobs in India right now — and for good reason. Companies like TCS, Accenture, Wipro, Amazon, Deloitte, and hundreds of startups are all hiring data analysts in 2026. But the interview process can feel overwhelming: SQL queries, Python scripts, statistics questions, case studies, and behavioral rounds, all in the same interview loop.
This guide covers the top 50 data analyst interview questions actually asked at Indian and global companies in 2026, organized by category, with sample answers. Whether you're a fresher applying to your first data analyst role or an experienced professional targeting FAANG, this is the most complete preparation resource you'll find.
What You'll Learn: Top SQL questions with solutions • Python and Pandas questions • Statistics and probability questions • Case study and business problem questions • Behavioral questions with STAR answers • Company-specific tips for TCS, Accenture, Wipro, and Amazon • How to use AI to prepare for your data analyst interview in real time.
The Data Analyst Interview Process in 2026
Before diving into the questions, understand the typical interview structure. Most data analyst interviews in India follow this pattern:
- Round 1 — Online Assessment: Aptitude, basic SQL queries, logical reasoning (common at TCS, Wipro, Accenture for mass hiring)
- Round 2 — Technical Interview 1: SQL, Excel, basic statistics, and business understanding
- Round 3 — Technical Interview 2: Python/Pandas, advanced SQL, data modeling, case studies
- Round 4 — Managerial/Case Round: Business problem-solving, dashboard interpretation, stakeholder communication
- Round 5 — HR Round: Behavioral questions, salary discussion, culture fit
Product companies (Amazon, Flipkart, Swiggy) tend to be more rigorous in rounds 2–4, while IT services companies (TCS, Wipro, Infosys) place more weight on round 1 and the HR round. Use an AI interview copilot to get real-time assistance across all of these rounds.
SQL Interview Questions for Data Analysts
SQL is the most tested skill in any data analyst interview. You will be asked to write queries live, explain query execution, and optimize slow queries. Here are the most commonly asked SQL questions:
1. What is the difference between WHERE and HAVING?
Sample Answer: WHERE filters rows before grouping (before GROUP BY), while HAVING filters groups after aggregation (after GROUP BY). WHERE cannot reference aggregate functions; HAVING can. For example: SELECT department, COUNT(*) FROM employees WHERE salary > 50000 GROUP BY department HAVING COUNT(*) > 5 — here WHERE removes low-salary employees first, then HAVING removes departments with fewer than 5 remaining employees.
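The WHERE-then-HAVING sequence above can be verified end to end with Python's built-in sqlite3 module. This is a minimal sketch with an in-memory database; the table and all values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("A", "Sales", 60000), ("B", "Sales", 70000), ("C", "Sales", 40000),
     ("D", "HR", 55000), ("E", "HR", 52000)],
)

# WHERE filters individual rows first; HAVING filters the groups afterwards.
rows = conn.execute("""
    SELECT department, COUNT(*) AS cnt
    FROM employees
    WHERE salary > 50000          -- drops C (40000) before grouping
    GROUP BY department
    HAVING COUNT(*) >= 2          -- keeps departments with 2+ remaining rows
    ORDER BY department
""").fetchall()
print(rows)  # [('HR', 2), ('Sales', 2)]
```

Swapping the HAVING threshold to `>= 3` would drop both departments, which is an easy way to convince an interviewer you understand the two filtering stages.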
2. Write a query to find the second highest salary in a table.
Sample Answer: SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees). Alternatively using DENSE_RANK: SELECT salary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) as rnk FROM employees) t WHERE rnk = 2. The window function approach is preferred because it handles ties cleanly and generalizes to the Nth highest salary by changing rnk = N.
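Both approaches from the answer can be run side by side in sqlite3 (window functions need SQLite 3.25+). The salary values below are invented; note the tie at 200 to show that both queries still return the correct second-highest value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?)",
                 [(100,), (200,), (200,), (300,)])

# Approach 1: nested MAX
second_max = conn.execute(
    "SELECT MAX(salary) FROM employees "
    "WHERE salary < (SELECT MAX(salary) FROM employees)"
).fetchone()[0]

# Approach 2: DENSE_RANK window function (generalizes to the Nth highest)
second_rank = conn.execute("""
    SELECT DISTINCT salary FROM (
        SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
        FROM employees
    ) WHERE rnk = 2
""").fetchone()[0]

print(second_max, second_rank)  # 200 200
```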
3. What is the difference between INNER JOIN, LEFT JOIN, and FULL OUTER JOIN?
Sample Answer: INNER JOIN returns only rows where there is a match in both tables. LEFT JOIN returns all rows from the left table and matched rows from the right (NULL for non-matches). FULL OUTER JOIN returns all rows from both tables, with NULLs where there is no match. In data analysis, LEFT JOIN is most commonly used when you want to preserve all records from your primary table (e.g., all customers) even if they have no matching transactions.
4. How do you find duplicate records in a table?
Sample Answer: SELECT email, COUNT(*) as count FROM users GROUP BY email HAVING COUNT(*) > 1. To delete duplicates while keeping one: use a CTE with ROW_NUMBER() partitioned by the duplicate columns, then delete rows where row_number > 1.
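Here is a runnable sketch of both steps in sqlite3 with made-up emails. SQLite cannot delete directly from a CTE the way SQL Server can, so this version keys the delete off SQLite's implicit rowid; in Postgres or SQL Server you would wrap the ROW_NUMBER() in a CTE and delete from that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
conn.executemany("INSERT INTO users VALUES (?)",
                 [("a@x.com",), ("b@x.com",), ("a@x.com",), ("a@x.com",)])

# Step 1: find duplicates
dupes = conn.execute("""
    SELECT email, COUNT(*) AS cnt FROM users
    GROUP BY email HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [('a@x.com', 3)]

# Step 2: delete duplicates, keeping one copy per email, via ROW_NUMBER()
conn.execute("""
    DELETE FROM users WHERE rowid IN (
        SELECT rowid FROM (
            SELECT rowid,
                   ROW_NUMBER() OVER (PARTITION BY email ORDER BY rowid) AS rn
            FROM users
        ) WHERE rn > 1
    )
""")
remaining = sorted(r[0] for r in conn.execute("SELECT email FROM users"))
print(remaining)  # ['a@x.com', 'b@x.com']
```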
5. What is a window function? Give an example.
Sample Answer: A window function performs a calculation across a set of rows related to the current row without collapsing the result into a group. Unlike GROUP BY, it preserves individual rows. Example: SELECT name, salary, AVG(salary) OVER (PARTITION BY department) as dept_avg FROM employees — this shows each employee's salary alongside their department's average salary, on the same row.
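The key point — each row is preserved, with the group-level value attached — is easy to demonstrate in sqlite3. Names, departments, and salaries below are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("A", "Sales", 100.0), ("B", "Sales", 200.0), ("C", "HR", 300.0)])

# Unlike GROUP BY, every employee row survives; dept_avg is attached to each.
rows = conn.execute("""
    SELECT name, salary,
           AVG(salary) OVER (PARTITION BY department) AS dept_avg
    FROM employees
    ORDER BY name
""").fetchall()
print(rows)  # [('A', 100.0, 150.0), ('B', 200.0, 150.0), ('C', 300.0, 300.0)]
```

A GROUP BY version of the same query would collapse this to two rows (one per department), which is exactly the contrast interviewers want you to articulate.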
6. Write a query to calculate 7-day rolling average of daily sales.
Sample Answer: SELECT date, sales, AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as rolling_7day_avg FROM daily_sales. This is a common question at product companies like Swiggy, Zomato, and Amazon.
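The rolling-average query can be sanity-checked in sqlite3. The sales figures are made up; note that the first six rows average over fewer than seven days because the ROWS frame simply has fewer preceding rows available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (date TEXT, sales REAL)")
conn.executemany("INSERT INTO daily_sales VALUES (?, ?)",
                 [(f"2026-01-0{d}", float(d * 10)) for d in range(1, 9)])

rows = conn.execute("""
    SELECT date, sales,
           AVG(sales) OVER (ORDER BY date
                            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7d
    FROM daily_sales
""").fetchall()

# Day 1 averages only itself; day 7 averages days 1-7; day 8 averages days 2-8.
print(rows[0][2], rows[6][2], rows[7][2])  # 10.0 40.0 50.0
```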
7. What is a CTE and when would you use it?
Sample Answer: A Common Table Expression (CTE) is a temporary named result set defined with the WITH clause. Use it when: (1) a subquery is referenced multiple times, (2) you need to break a complex query into readable steps, or (3) writing recursive queries. CTEs improve readability significantly and are preferred over nested subqueries in production code.
8. Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER().
Sample Answer: All three assign numbers to rows. ROW_NUMBER() gives a unique number to every row (1,2,3,4). RANK() gives the same number to ties but skips the next rank (1,2,2,4). DENSE_RANK() gives the same number to ties without skipping (1,2,2,3). For finding the Nth highest value, use DENSE_RANK() to avoid skipping.
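A single query over a toy table with a tie makes the three numbering patterns concrete. The scores below are invented to produce exactly the (1,2,3,4) / (1,2,2,4) / (1,2,2,3) sequences described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?)", [(90,), (80,), (80,), (70,)])

rows = conn.execute("""
    SELECT score,
           ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM scores
    ORDER BY row_num
""").fetchall()

# (score, ROW_NUMBER, RANK, DENSE_RANK) — note how the tie at 80 is handled
print(rows)  # [(90, 1, 1, 1), (80, 2, 2, 2), (80, 3, 2, 2), (70, 4, 4, 3)]
```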
Python and Pandas Interview Questions
Python is increasingly expected even in non-FAANG data analyst roles. Companies want to see that you can handle real datasets beyond what SQL can do easily.
9. How do you handle missing values in a Pandas DataFrame?
Sample Answer: First identify: df.isnull().sum(). Then decide the strategy: drop with df.dropna(), fill with mean/median (df.fillna(df.mean(numeric_only=True))), forward-fill with df.ffill() (the older df.fillna(method='ffill') form is deprecated in recent pandas versions), or use domain knowledge to fill with a specific value. The choice depends on context — for time series, forward-fill makes sense; for a categorical column, filling with the mode may be appropriate.
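A short sketch of the context-dependent choices above, on a made-up DataFrame (the column names and values are invented): forward-fill for the numeric time-series-like column, mode-fill for the categorical one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temp": [20.0, np.nan, 22.0, np.nan],      # numeric, ordered readings
    "city": ["Pune", "Pune", None, "Delhi"],   # categorical
})

print(df.isnull().sum())  # temp: 2 missing, city: 1 missing

filled = df.copy()
filled["temp"] = filled["temp"].ffill()        # carry the last reading forward
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])  # fill with mode

print(filled["temp"].tolist())  # [20.0, 20.0, 22.0, 22.0]
print(filled["city"].tolist())  # ['Pune', 'Pune', 'Pune', 'Delhi']
```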
10. What is the difference between apply(), map(), and applymap() in Pandas?
Sample Answer: map() works on a Series, applying a function element-wise. apply() works on a Series or DataFrame — on a Series it is element-wise, on a DataFrame it applies along an axis (rows or columns). applymap() works element-wise on the entire DataFrame (it was renamed to DataFrame.map() in pandas 2.1, with applymap() deprecated). For most transformations, apply() on a DataFrame column is the most commonly used.
11. How do you merge two DataFrames in Pandas?
Sample Answer: pd.merge(df1, df2, on='customer_id', how='left'). The how parameter mirrors SQL joins: 'inner', 'left', 'right', 'outer'. Use df1.join(df2) for index-based merges. For concatenating along rows: pd.concat([df1, df2], ignore_index=True).
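The SQL-join analogy is worth demonstrating with two tiny invented DataFrames: the left merge keeps every customer, including one with no orders at all, whose amount becomes NaN.

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["A", "B", "C"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3], "amount": [100, 50, 75]})

# how='left' mirrors SQL LEFT JOIN: all customers survive, even B with no orders
merged = pd.merge(customers, orders, on="customer_id", how="left")
print(len(merged))  # 4 rows: customer 1 matches twice, B appears once with NaN
```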
12. Write Python code to find the top 3 categories by revenue from a sales DataFrame.
Sample Answer: df.groupby('category')['revenue'].sum().nlargest(3).reset_index(). This groups by category, sums revenue, then uses nlargest(3) to get the top 3. Alternatively: df.groupby('category')['revenue'].sum().sort_values(ascending=False).head(3).
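The same one-liner, run against a small invented sales DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["books", "toys", "books", "games", "toys", "music"],
    "revenue":  [100, 300, 200, 150, 250, 50],
})

# Group, sum, then take the three largest totals
top3 = df.groupby("category")["revenue"].sum().nlargest(3)
print(top3.index.tolist())  # ['toys', 'books', 'games']
print(top3.tolist())        # [550, 300, 150]
```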
13. What is the difference between loc and iloc?
Sample Answer: loc uses label-based indexing — you reference rows and columns by their actual labels. iloc uses integer position-based indexing. Example: if your DataFrame has index [10, 20, 30], df.loc[10] returns the row labeled 10, while df.iloc[0] returns the first row regardless of its label.
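The non-default index example from the answer, as runnable code (index and values are illustrative):

```python
import pandas as pd

# Index labels [10, 20, 30] deliberately do not match positions [0, 1, 2]
df = pd.DataFrame({"value": ["a", "b", "c"]}, index=[10, 20, 30])

print(df.loc[10, "value"])   # 'a' — label-based: the row labeled 10
print(df.iloc[0, 0])         # 'a' — position-based: the first row
print(df.loc[20, "value"])   # 'b'
print(df.iloc[2, 0])         # 'c' — third row by position, labeled 30
```

A good interview follow-up: `df.loc[0]` here raises a KeyError, because no row carries the label 0.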
Statistics and Probability Questions
Statistics questions separate candidates who truly understand data from those who just know the tools. These are especially common at product companies and analytics-focused roles.
14. What is the difference between mean, median, and mode? When would you use each?
Sample Answer: Mean is the arithmetic average — sensitive to outliers. Median is the middle value when sorted — robust to outliers. Mode is the most frequent value — used for categorical data. Use median over mean when data is skewed (e.g., salary data, where a few very high salaries would distort the mean). Use mode for categorical data like "most popular product category."
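Python's built-in statistics module makes the outlier argument concrete. The salary figures below are made up to show one extreme value dragging the mean far from the typical value while leaving the median untouched.

```python
import statistics

# Skewed "salary" data (in thousands): one outlier at 500
salaries = [30, 35, 40, 45, 500]

print(statistics.mean(salaries))    # 130 — distorted by the outlier
print(statistics.median(salaries))  # 40  — robust to the outlier

# Mode works on categorical data too
categories = ["toys", "books", "toys", "games"]
print(statistics.mode(categories))  # 'toys' — most frequent category
```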
15. Explain p-value in simple terms.
Sample Answer: The p-value is the probability of observing results at least as extreme as the current results, assuming the null hypothesis is true. A p-value of 0.03 means there is a 3% chance of seeing a result at least this extreme if there were actually no effect. If p < 0.05 (the conventional threshold), we reject the null hypothesis. A low p-value does not tell you the effect is large or practically significant — it just says it is unlikely to be random.
16. What is the difference between Type I and Type II errors?
Sample Answer: Type I error (false positive) — you reject the null hypothesis when it is actually true. Type II error (false negative) — you fail to reject the null hypothesis when it is actually false. In a medical test analogy: Type I = diagnosing a healthy person as sick. Type II = missing a disease in a sick person. In A/B testing: Type I = declaring a winning variant when there is no real difference. Type II = missing a real improvement.
17. What is A/B testing? Walk me through how you would set one up.
Sample Answer: A/B testing is a controlled experiment comparing two versions (A and B) to measure which performs better on a metric. Setup: (1) Define hypothesis and success metric, (2) Calculate required sample size based on desired statistical power (80%) and significance level (0.05), (3) Randomly assign users to control (A) and treatment (B), (4) Run the test for the calculated duration without peeking, (5) Analyze results using a statistical test (t-test for means, chi-square for proportions), (6) Make the decision based on p-value and confidence interval.
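Step (5) can be sketched with a two-proportion z-test in plain Python, using only the math module (the normal CDF is built from math.erf). The conversion counts below are invented for illustration; in practice you would use a library such as scipy or statsmodels.

```python
import math

def two_prop_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test comparing two conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via erf; two-sided p-value
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Control: 200/2000 convert (10%); treatment: 260/2000 convert (13%)
z, p = two_prop_ztest(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(round(z, 2), round(p, 4))  # z near 2.97, p well below 0.05
```

With p below the 0.05 threshold, you would reject the null hypothesis and conclude the treatment's lift is statistically significant — then still sanity-check the confidence interval and practical impact before shipping.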
18. What is the Central Limit Theorem and why does it matter?
Sample Answer: The CLT states that the sampling distribution of the sample mean approaches a normal distribution as sample size increases, regardless of the population's distribution. It matters because it allows us to use normal distribution-based statistical tests (t-tests, z-tests) even when the underlying data is not normally distributed, as long as the sample size is large enough (typically n > 30).
19. What is correlation vs. causation? Give a business example.
Sample Answer: Correlation means two variables move together. Causation means one variable directly causes a change in another. Example: ice cream sales and drowning rates are correlated — both increase in summer — but ice cream does not cause drowning. The confounding variable is temperature. In business: a company might see that users who receive marketing emails have higher retention. But if they only email engaged users, the emails are not causing retention — engagement is causing both. A/B testing or causal inference methods are needed to establish causation.
Case Study and Business Questions
These questions test your ability to think like an analyst, not just write code. They are common in round 3 and 4 at companies like Amazon, Deloitte, and EY.
20. A key metric dropped by 15% last week. How would you investigate?
Sample Answer: Structured approach: (1) Define the metric — is it orders, revenue, sessions, or conversion rate? Each has different root causes. (2) Check data pipeline — is the drop real or is it a tracking issue? (3) Segment the drop — by platform (mobile/desktop), geography, product category, traffic source, user type (new vs. returning). (4) Check for external events — holidays, competitor promotions, payment gateway issues. (5) Correlate with product changes — any deployments, pricing changes, or feature launches that week? (6) Form a hypothesis and validate with data before concluding.
21. How would you measure the success of a new feature launch?
Sample Answer: Define the primary metric tied to the feature's goal (e.g., for a recommendation feature, it is click-through rate on recommendations). Define guardrail metrics that should not degrade (e.g., session length, bounce rate). Run an A/B test to isolate the feature's impact. Look at short-term metrics (day 1–7) and longer-term metrics (day 30) as some features take time to show impact. Check for segment-level effects — a feature might help one user group and hurt another.
22. We are losing customers. How would you identify why and what to do?
Sample Answer: (1) Define "losing customers" — is it churn rate, reduced purchase frequency, or shrinking active users? (2) Segment churned customers by cohort, plan type, tenure, geography, and usage pattern. (3) Compare churned vs. retained users — what is different about their behavior in the 30 days before churn? (4) Survey churned users if possible. (5) Look at the competitive landscape — is a competitor offering a better deal? (6) Identify the top 2–3 hypotheses with data, then propose targeted interventions (win-back campaigns, feature improvements, pricing changes).
Behavioral Questions for Data Analysts
Do not underestimate behavioral rounds. Especially at IT services companies (TCS, Accenture, Wipro), the HR round is heavily weighted. Use the STAR method for all behavioral answers.
23. Tell me about a time you found a significant insight from data.
Sample Answer (STAR): Situation: My team was analyzing customer support ticket data. Task: Identify patterns to reduce support load. Action: I noticed 40% of tickets came from one specific onboarding step. I built a funnel analysis showing exactly where users got stuck. Result: The product team fixed that step, reducing support tickets by 35% in one month. The insight came from segmenting tickets by user journey stage — something the team had not tried before.
24. How do you explain complex data findings to non-technical stakeholders?
Sample Answer: Lead with the business implication, not the methodology. Instead of "the regression shows a 0.73 R-squared," say "sales can be predicted with 73% accuracy using just two variables." Use visuals over numbers — a bar chart communicates faster than a table. Anticipate questions and have the supporting data ready. Always ask yourself: "What decision does this person need to make?" and structure findings around that decision.
25. Describe a situation where your analysis led to a wrong decision. What did you learn?
Sample Answer: I once recommended expanding to a new city based on strong search volume data. We later found that the search volume was driven by a one-time event, not sustained interest. I learned to always validate directional data with multiple signals, set a clear success metric before launch, and recommend a pilot rather than a full rollout when data is limited. Now I explicitly distinguish between "this data shows X" and "therefore we should definitely do Y."
Company-Specific Tips
TCS Data Analyst Interview
TCS interviews are structured and process-oriented. Expect basic to intermediate SQL (JOINs, GROUP BY, subqueries), Excel questions (VLOOKUP, pivot tables), and questions about your project experience. The HR round at TCS is very important — they assess communication skills, willingness to relocate, and culture fit strongly. Be ready to explain your college projects clearly.
Accenture Data Analyst Interview
Accenture focuses heavily on business understanding alongside technical skills. Expect case study questions like "how would you measure customer satisfaction?" alongside SQL and Python. The communication round tests your ability to present data findings. Accenture values candidates who can bridge the gap between technical analysis and business recommendations.
Wipro Data Analyst Interview
Wipro's interview process typically includes an online coding test (SQL and Python basics), followed by a technical interview and HR round. They commonly ask about ETL processes, data warehousing concepts (star schema, snowflake schema), and basic machine learning concepts. Freshers should be especially prepared for aptitude questions in the first round.
Amazon Data Analyst Interview
Amazon's data analyst interview is significantly more rigorous. Expect advanced SQL (window functions, CTEs, recursive queries), Python for data manipulation, statistics (A/B testing, distributions), and case studies with Amazon's metrics (click-through rate, conversion rate, customer lifetime value). Behavioral questions at Amazon are tied to the 16 Leadership Principles — every answer should demonstrate one or more principles. Prepare STAR stories for each principle relevant to a data analyst role.
Pro Tip for Amazon: Amazon's most common data analyst question is asking you to investigate a metric drop. Practice the structured framework: define the metric → check data integrity → segment by dimensions → correlate with changes → form hypothesis. Do this out loud so the interviewer can follow your thinking.
How to Prepare with AI
Reading question lists is useful, but the real preparation happens when you practice answering questions out loud under pressure — which is exactly what interviews are like.
Use Chiku AI as your interview preparation and real-time support tool:
- Practice SQL and Python questions out loud — Start a Chiku AI session, set the company as "TCS" and role as "Data Analyst," and practice with AI-generated hints and solutions when you get stuck.
- Run mock interviews — Use Chiku AI in a practice Google Meet call. The AI listens and provides instant answer frameworks, helping you structure responses under pressure.
- Use it in the actual interview — Chiku AI's desktop app runs invisibly during live interviews, giving you real-time support when an unexpected question catches you off guard. Invisible during screen shares, works on Zoom, Google Meet, and Teams.
Start Free: Every new Chiku AI account includes 10 minutes of free AI interview assistance — no credit card required. Try it in a mock data analyst interview before your real one. Paid plans start at ₹471 (tax inclusive), with credits that never expire. See pricing
Quick Reference: 25 Must-Know Questions
Review these the night before your interview:
- Difference between WHERE and HAVING
- Second highest salary query
- INNER JOIN vs LEFT JOIN vs FULL OUTER JOIN
- Find duplicate records
- What is a window function?
- 7-day rolling average query
- What is a CTE?
- RANK vs DENSE_RANK vs ROW_NUMBER
- Handle missing values in Pandas
- apply() vs map() in Pandas
- Merge two DataFrames
- Top 3 categories by revenue in Python
- loc vs iloc
- Mean vs median vs mode
- Explain p-value simply
- Type I vs Type II error
- How to set up an A/B test
- Central Limit Theorem
- Correlation vs causation
- Investigating a 15% metrics drop
- Measuring success of a new feature
- Identifying reasons for customer churn
- Tell me about a data insight you found
- Explaining complex data to non-technical stakeholders
- A time your analysis led to a wrong decision
Data analyst interviews are winnable with the right preparation. The companies hiring in India in 2026 — TCS, Accenture, Wipro, Amazon, Deloitte, and hundreds of funded startups — all want the same core skills: SQL fluency, Python comfort, statistical thinking, and the ability to translate data into business decisions.
Master those four areas, practice out loud, and use every tool available — including Chiku AI for real-time support when you need it most. Ready to practice? Start your free trial on Chiku AI — 10 minutes free, no credit card required.
