Top 200 Artificial Intelligence Interview Questions & Answers
Basic Questions (1-80)
Q1. What is Artificial Intelligence?
Artificial Intelligence (AI) is a branch of computer science that enables machines to simulate human cognitive functions such as learning, reasoning, problem-solving, perception, and language understanding. AI systems are designed to perform tasks that typically require human intelligence, ranging from visual perception to decision-making.
Q2. What is the difference between AI, Machine Learning, and Deep Learning?
AI is the broad field of making machines intelligent. Machine Learning (ML) is a subset of AI where systems learn patterns from data without explicit programming. Deep Learning is a subset of ML using multi-layered neural networks to automatically learn hierarchical representations from large datasets.
Q3. What are the main types of AI?
AI is commonly categorized into three levels: Narrow AI (ANI), which performs specific tasks like facial recognition; General AI (AGI), which could perform any intellectual task a human can; and Super AI (ASI), which would surpass human intelligence. Currently, all deployed AI systems are Narrow AI.
Q4. What is a search algorithm in AI?
Search algorithms are methods used by AI agents to explore a problem space to find a solution path. They include uninformed search (BFS, DFS, UCS) that have no domain knowledge and informed search (A*, Greedy Best-First) that use heuristics to guide the search toward the goal efficiently.
Q5. What is Breadth-First Search (BFS)?
BFS is an uninformed search algorithm that explores all nodes at the current depth before moving to nodes at the next depth level. It uses a queue data structure and guarantees finding the shortest path in terms of number of steps. Its time and space complexity is O(b^d) where b is branching factor and d is depth.
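A minimal Python sketch of BFS shortest-path search; the adjacency-list graph below is purely illustrative:

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Return the shortest path (fewest edges) from start to goal, or None."""
    queue = deque([[start]])      # FIFO queue of partial paths
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Toy graph as an adjacency list (illustrative example)
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(bfs_shortest_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```

Because nodes are expanded level by level, the first time the goal is dequeued the path found uses the fewest edges.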
Q6. What is Depth-First Search (DFS)?
DFS is an uninformed search algorithm that explores as far as possible along each branch before backtracking. It uses a stack (or recursion) and has space complexity of O(b*m) where m is the maximum depth, making it more memory efficient than BFS but not guaranteed to find the shortest path.
Q7. What is the A* algorithm?
A* is an informed search algorithm that finds the shortest path by combining the actual cost g(n) from the start and a heuristic estimate h(n) to the goal: f(n) = g(n) + h(n). It is complete and optimal when the heuristic is admissible (never overestimates the true cost).
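A compact A* sketch over a small weighted graph; the graph and heuristic values are made up for illustration, with h admissible (never overestimating):

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search on a weighted graph; h maps node -> admissible heuristic."""
    # Priority queue of (f, g, node, path), ordered by f(n) = g(n) + h(n)
    open_set = [(h[start], 0, start, [start])]
    best_g = {start: 0}
    while open_set:
        f, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path, g
        for neighbor, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(
                    open_set,
                    (new_g + h[neighbor], new_g, neighbor, path + [neighbor]),
                )
    return None, float("inf")

# Illustrative weighted graph and heuristic table
graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)], "B": [("G", 1)]}
h = {"S": 4, "A": 2, "B": 1, "G": 0}
path, cost = a_star(graph, h, "S", "G")
```

Here the search finds S→A→B→G with total cost 4, cheaper than the direct S→A→G route.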
Q8. What is a heuristic function?
A heuristic function estimates the cost or distance from a given node to the goal state. An admissible heuristic never overestimates the true cost, and a consistent heuristic satisfies the triangle inequality. Good heuristics reduce the search space significantly in informed search algorithms.
Q9. What is knowledge representation in AI?
Knowledge representation is the process of encoding knowledge about the world into a form that AI systems can use for reasoning. Common methods include semantic networks, frames, ontologies, production rules, and logic-based representations like propositional and predicate logic.
Q10. What is propositional logic?
Propositional logic is a formal system where statements (propositions) are either true or false, connected by logical operators: AND (∧), OR (∨), NOT (¬), IMPLIES (→), and BICONDITIONAL (↔). It forms the basis for automated reasoning in AI systems though it lacks the ability to represent relationships between objects.
Q11. What is predicate logic (first-order logic)?
First-order logic (FOL) extends propositional logic by introducing predicates, variables, functions, and quantifiers (∀ universal, ∃ existential). It allows AI systems to represent complex relationships and reason about objects and their properties, making it far more expressive than propositional logic.
Q12. What is an expert system?
An expert system is a rule-based AI program that emulates the decision-making ability of a human expert in a specific domain. It consists of a knowledge base (facts and rules), an inference engine that applies rules, and an explanation interface. Examples include medical diagnosis systems and financial advisory tools.
Q13. What is a production rule system?
A production rule system uses IF-THEN rules to represent knowledge and make inferences. The inference engine applies rules through forward chaining (data-driven reasoning from facts to conclusions) or backward chaining (goal-driven reasoning from a goal back to supporting facts).
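The forward-chaining half of this can be sketched in a few lines of Python; the animal-classification rules below are invented for illustration:

```python
def forward_chain(facts, rules):
    """Forward chaining: repeatedly fire IF-THEN rules whose premises all
    hold, adding each conclusion as a new fact until nothing changes."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Each rule: (set of premises, conclusion) — illustrative toy knowledge base
rules = [
    ({"has_fur", "gives_milk"}, "mammal"),
    ({"mammal", "eats_meat"}, "carnivore"),
]
derived = forward_chain({"has_fur", "gives_milk", "eats_meat"}, rules)
```

Starting from the raw observations, the engine first derives "mammal" and then chains that into "carnivore" — data-driven reasoning from facts to conclusions.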
Q14. What is the Turing Test?
The Turing Test, proposed by Alan Turing in 1950, is a test of machine intelligence where a human evaluator interacts via text with both a human and a machine without knowing which is which. If the evaluator cannot reliably distinguish the machine from the human, the machine is considered to have passed the test.
Q15. What is natural language processing (NLP)?
NLP is a branch of AI that enables computers to understand, interpret, and generate human language. It involves tasks such as tokenization, part-of-speech tagging, named entity recognition, sentiment analysis, machine translation, and text summarization using both rule-based and machine learning approaches.
Q16. What is computer vision?
Computer vision is an AI field that enables machines to interpret and understand visual information from images and videos. Key tasks include image classification, object detection, image segmentation, facial recognition, and optical character recognition (OCR), often powered by convolutional neural networks (CNNs).
Q17. What is machine learning?
Machine learning is a subset of AI where systems learn from data to improve their performance on tasks without being explicitly programmed. It includes supervised learning (labeled data), unsupervised learning (unlabeled data), semi-supervised learning, and reinforcement learning (reward-based feedback).
Q18. What is supervised learning?
Supervised learning is a type of ML where a model is trained on labeled input-output pairs to learn a mapping function. Common algorithms include linear regression, logistic regression, decision trees, SVMs, and neural networks. The model is evaluated on unseen test data using metrics like accuracy, precision, and recall.
Q19. What is unsupervised learning?
Unsupervised learning uses unlabeled data to discover hidden patterns, structures, or groupings. Common techniques include clustering (K-means, DBSCAN, hierarchical clustering), dimensionality reduction (PCA, t-SNE, autoencoders), and association rule mining (Apriori algorithm).
Q20. What is reinforcement learning?
Reinforcement learning (RL) is an ML paradigm where an agent learns by interacting with an environment, receiving rewards for correct actions and penalties for incorrect ones. The agent aims to maximize cumulative reward through a policy. Key algorithms include Q-learning, SARSA, and Deep Q-Networks (DQN).
Q21. What is a neural network?
A neural network is a computational model inspired by the human brain, consisting of layers of interconnected nodes (neurons). It includes an input layer, one or more hidden layers, and an output layer. Each connection has a weight that is adjusted during training via backpropagation to minimize prediction error.
Q22. What is the difference between AI planning and search?
Search algorithms find paths through a state space to reach a goal, while AI planning focuses on generating a sequence of actions that transitions from an initial state to a goal state. Planning uses formal action representations (STRIPS, PDDL) and handles complex domains with many interdependent actions.
Q23. What is constraint satisfaction?
Constraint satisfaction problems (CSPs) involve finding assignments for variables that satisfy a set of constraints. AI solvers use techniques like backtracking search, arc consistency (AC-3), and constraint propagation. Examples include map coloring, scheduling, and Sudoku solving.
Q24. What is the Minimax algorithm?
Minimax is a decision-making algorithm used in two-player zero-sum games where one player maximizes the score and the other minimizes it. It recursively evaluates game states, creating a game tree. Alpha-beta pruning is an optimization that eliminates branches that cannot affect the final decision.
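A sketch of minimax with alpha-beta pruning over an abstract game tree; the two-level toy tree is illustrative, with leaf nodes carrying their own scores:

```python
def minimax(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax with alpha-beta pruning over an abstract game tree."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        value = float("-inf")
        for child in kids:
            value = max(value, minimax(child, depth - 1, alpha, beta,
                                       False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: MIN will never allow this branch
        return value
    else:
        value = float("inf")
        for child in kids:
            value = min(value, minimax(child, depth - 1, alpha, beta,
                                       True, children, evaluate))
            beta = min(beta, value)
            if alpha >= beta:
                break  # alpha cutoff: MAX will never choose this branch
        return value

# Toy tree: MAX at the root, MIN below it, integer leaves as scores
tree = {"root": ["L", "R"], "L": [3, 5], "R": [2, 9]}
children = lambda n: tree.get(n, [])
evaluate = lambda n: n if isinstance(n, int) else 0
best = minimax("root", 2, float("-inf"), float("inf"), True, children, evaluate)
```

MIN reduces L to 3 and R to 2, so MAX picks 3; while exploring R, alpha-beta prunes the leaf 9 because R's value can no longer exceed 3.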
Q25. What is Monte Carlo Tree Search (MCTS)?
MCTS is a heuristic search algorithm for decision-making in games and planning problems. It builds a search tree through repeated random simulations (rollouts) and uses the UCT (Upper Confidence bounds applied to Trees) formula to balance exploration and exploitation of the game tree.
Q26. What is the difference between deductive, inductive, and abductive reasoning?
Deductive reasoning draws certain conclusions from general premises (top-down). Inductive reasoning generalizes patterns from specific observations (bottom-up, probabilistic). Abductive reasoning infers the most likely explanation for an observation. AI systems use all three forms in different reasoning tasks.
Q27. What is an AI agent?
An AI agent is an autonomous entity that perceives its environment through sensors and takes actions through actuators to achieve goals. Agents are classified by their rationality and structure: simple reflex, model-based reflex, goal-based, utility-based, and learning agents.
Q28. What is a PEAS description in AI?
PEAS stands for Performance measure, Environment, Actuators, and Sensors — a framework for specifying an AI agent's task environment. For example, a self-driving car has safety as its performance measure, roads as its environment, steering and brakes as actuators, and cameras and LIDAR as sensors.
Q29. What are the properties of task environments in AI?
Task environments are characterized as: fully vs partially observable, deterministic vs stochastic, episodic vs sequential, static vs dynamic, discrete vs continuous, single-agent vs multi-agent, and known vs unknown. These properties influence the choice of AI agent architecture.
Q30. What is Bayesian reasoning in AI?
Bayesian reasoning applies Bayes' theorem — P(H|E) = P(E|H)*P(H)/P(E) — to update the probability of a hypothesis given new evidence. It is foundational to probabilistic AI, enabling systems to reason under uncertainty, update beliefs incrementally, and make decisions based on degrees of belief.
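A worked example of the update in code; the test sensitivity, false-positive rate, and prevalence below are illustrative numbers:

```python
def bayes_update(prior, likelihood, evidence):
    """Posterior P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence

# Illustrative diagnostic-test numbers (not real data):
p_h = 0.01               # prior P(H): 1% prevalence
p_e_given_h = 0.99       # P(E|H): test sensitivity
p_e_given_not_h = 0.05   # P(E|not H): false-positive rate
# Total probability of evidence: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
posterior = bayes_update(p_h, p_e_given_h, p_e)
```

Despite the test's 99% sensitivity, the posterior P(H|E) is only about 0.17 — the low prior dominates, a classic illustration of why base rates matter.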
Q31. What is a Bayesian network?
A Bayesian network (Bayes net) is a probabilistic graphical model that represents dependencies among variables using a directed acyclic graph (DAG). Nodes represent variables and edges represent conditional dependencies. They are used for diagnosis, prediction, and causal reasoning in uncertain domains.
Q32. What is a Markov Decision Process (MDP)?
An MDP is a mathematical framework for sequential decision-making under uncertainty. It consists of states (S), actions (A), transition probabilities P(s'|s,a), rewards R(s,a), and typically a discount factor γ that weights future rewards. AI agents use MDPs to find optimal policies using value iteration or policy iteration algorithms.
Q33. What is fuzzy logic?
Fuzzy logic is an AI technique that handles imprecise or vague information by allowing truth values between 0 and 1 (rather than strict true/false). It uses membership functions to represent degrees of truth and is widely used in control systems, appliances, and decision systems where crisp boundaries are impractical.
Q34. What is a genetic algorithm?
Genetic algorithms (GAs) are evolutionary optimization techniques inspired by natural selection. They maintain a population of candidate solutions, apply selection, crossover (recombination), and mutation operators to evolve better solutions over generations. GAs are used for optimization problems where gradient-based methods are impractical.
Q35. What is swarm intelligence?
Swarm intelligence is a collective behavior AI approach inspired by social organisms like ants, bees, and birds. Algorithms such as Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) use decentralized, self-organizing agents to solve optimization problems like routing and scheduling.
Q36. What is the difference between weak AI and strong AI?
Weak AI (Narrow AI) is designed for specific tasks and cannot generalize beyond its programmed domain. Strong AI (General AI/AGI) refers to a hypothetical machine capable of performing any intellectual task that a human can do, with genuine understanding and consciousness rather than mere simulation.
Q37. What is transfer learning?
Transfer learning is a technique where a model trained on one task is reused as the starting point for a different but related task. Pre-trained models like BERT, GPT, and ResNet are fine-tuned for specific downstream tasks, saving significant training time and data requirements.
Q38. What is tokenization in NLP?
Tokenization is the process of splitting text into smaller units called tokens (words, subwords, or characters). Word tokenization splits on spaces and punctuation. Subword tokenization (BPE, WordPiece, SentencePiece) handles rare and unknown words by breaking them into common subword units.
Q39. What is named entity recognition (NER)?
NER is an NLP task that identifies and classifies named entities in text into predefined categories such as persons (PER), organizations (ORG), locations (LOC), dates, and monetary values. Modern NER systems use sequence labeling models like BiLSTM-CRF or transformer-based models like BERT.
Q40. What is sentiment analysis?
Sentiment analysis is an NLP task that determines the emotional tone or opinion expressed in text, classifying it as positive, negative, or neutral. It is widely used in brand monitoring, product reviews, social media analysis, and customer feedback systems using both lexicon-based and deep learning approaches.
Q41. What is word embedding?
Word embeddings are dense vector representations of words that capture semantic meaning based on context. Popular methods include Word2Vec (CBOW and Skip-gram), GloVe, and FastText. Unlike one-hot encoding, embeddings place semantically similar words close together in vector space.
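Semantic similarity between embeddings is usually measured with cosine similarity; the 3-dimensional vectors below are hypothetical toy embeddings (real models use hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy embeddings — chosen so related words point the same way
king = np.array([0.9, 0.8, 0.1])
queen = np.array([0.85, 0.82, 0.15])
apple = np.array([0.1, 0.2, 0.9])
sim_kq = cosine_similarity(king, queen)   # high: related concepts
sim_ka = cosine_similarity(king, apple)   # low: unrelated concepts
```

Under one-hot encoding every pair of distinct words has similarity 0; embeddings instead place "king" far closer to "queen" than to "apple".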
Q42. What is an ontology in AI?
An ontology is a formal representation of knowledge as a set of concepts, relationships, and constraints within a domain. AI systems use ontologies (like OWL — Web Ontology Language) for knowledge sharing, semantic web applications, and reasoning about domain-specific facts and relationships.
Q43. What is computer vision object detection?
Object detection identifies and locates objects within images or videos by drawing bounding boxes around them. Modern approaches include single-stage detectors (YOLO, SSD) and two-stage detectors (R-CNN, Faster R-CNN). YOLO is especially popular for real-time detection due to its speed.
Q44. What is image segmentation?
Image segmentation divides an image into meaningful regions. Semantic segmentation assigns a class label to each pixel. Instance segmentation identifies each individual object instance. Panoptic segmentation combines both. U-Net and Mask R-CNN are widely used architectures for segmentation tasks.
Q45. What is principal component analysis (PCA)?
PCA is a dimensionality reduction technique that projects high-dimensional data onto lower-dimensional principal components that maximize variance. It decorrelates features by transforming them into orthogonal components ordered by the amount of variance they explain, aiding visualization and reducing computational cost.
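A minimal PCA via SVD of the centered data; the four 2-D points below are an illustrative dataset lying almost on a line, so one component captures nearly all the variance:

```python
import numpy as np

def pca(X, n_components):
    """Project centered data onto its top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_var = (S ** 2) / (len(X) - 1)  # variance along each component
    return X_centered @ components.T, components, explained_var[:n_components]

# Illustrative points near the line y = x
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]])
Z, comps, var = pca(X, 1)
```

The single retained component explains well over 95% of the total variance here, which is why the 2-D data can be compressed to 1-D with little information loss.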
Q46. What is K-means clustering?
K-means clustering partitions n data points into k clusters where each point belongs to the cluster with the nearest centroid. The algorithm iteratively assigns points to clusters and updates centroids until convergence. The elbow method or silhouette score helps determine the optimal number of clusters.
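A plain NumPy sketch of the assign-then-update loop; the two well-separated blobs are illustrative data:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain K-means: assign points to the nearest centroid, recompute
    centroids as cluster means, repeat until centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Euclidean distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated blobs (illustrative data)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
labels, centroids = kmeans(X, k=2)
```

Production code would also guard against empty clusters and run several random restarts, since K-means only finds a local optimum.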
Q47. What is AI bias?
AI bias occurs when an AI system produces systematically unfair or prejudiced results due to biased training data, flawed algorithm design, or biased human feedback. Types include data bias, algorithmic bias, and confirmation bias. Responsible AI practices include fairness auditing, diverse datasets, and bias mitigation techniques.
Q48. What is explainable AI (XAI)?
Explainable AI refers to methods and techniques that make AI model decisions interpretable and understandable to humans. Techniques include LIME (Local Interpretable Model-agnostic Explanations), SHAP (SHapley Additive exPlanations), attention visualization, and saliency maps for neural networks.
Q49. What is the STRIPS planning language?
STRIPS (Stanford Research Institute Problem Solver) is a formal language for representing AI planning problems using initial states, goal states, and actions with preconditions and effects. PDDL (Planning Domain Definition Language) is its modern successor used by AI planning systems and competitions.
Q50. What is path planning in robotics AI?
Path planning involves computing a collision-free path from a start to a goal position in an environment with obstacles. AI techniques include grid-based search (A*, Dijkstra), sampling-based planners (RRT, PRM), and potential field methods. Dynamic path planning handles moving obstacles in real time.
Q51. What is the frame problem in AI?
The frame problem refers to the difficulty of efficiently representing what does NOT change when an action is taken in a knowledge representation system. Without solving the frame problem, AI systems must explicitly state every unchanged fact after each action, leading to combinatorial explosion.
Q52. What is the closed-world assumption in AI?
The closed-world assumption (CWA) states that anything not known to be true is assumed false. It is used in logic programming (Prolog) and databases. The open-world assumption (OWA), used in semantic web and description logics, states that unknown facts may be true or false.
Q53. What is resolution refutation in AI?
Resolution refutation is a proof technique in logic-based AI that proves a statement by showing that its negation leads to a contradiction. It converts logical statements to Conjunctive Normal Form (CNF) and applies the resolution rule to derive the empty clause, confirming the original statement is true.
Q54. What is the difference between model-based and model-free RL?
Model-based RL uses an internal model of the environment's dynamics to plan ahead before acting, enabling more sample-efficient learning. Model-free RL learns directly from interactions with the environment without a model, using methods like Q-learning and policy gradients, which are simpler but less sample-efficient.
Q55. What is Q-learning?
Q-learning is a model-free RL algorithm that learns the value Q(s,a) — the expected cumulative reward of taking action a in state s. It uses the Bellman equation to update Q-values: Q(s,a) ← Q(s,a) + α[r + γ*max Q(s',a') - Q(s,a)]. It converges to optimal Q-values under sufficient exploration.
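The update rule can be demonstrated on a tiny 1-D corridor environment (invented for illustration: the agent starts at the left end and earns reward 1 for reaching the rightmost state). Because Q-learning is off-policy, the sketch explores with uniformly random actions yet still learns the greedy optimal values:

```python
import random

def q_learning(n_states=5, episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a 1-D corridor; state n_states-1 is terminal."""
    random.seed(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s, steps = 0, 0
        while s != n_states - 1 and steps < 100:
            a = random.randrange(2)                 # random behavior policy
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Bellman update: Q(s,a) += alpha * [r + gamma*max Q(s',.) - Q(s,a)]
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            steps += 1
    return Q

Q = q_learning()
greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(4)]
```

After training, the greedy policy moves right in every state, and Q-values decay geometrically with distance from the goal (1.0, 0.9, 0.81, ... with γ = 0.9).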
Q56. What is the exploration-exploitation tradeoff in RL?
In RL, exploration involves trying new actions to discover potentially better strategies, while exploitation uses known good actions to maximize immediate reward. The ε-greedy strategy balances this by choosing a random action with probability ε and the best-known action otherwise. UCB and Thompson sampling are other approaches.
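The ε-greedy strategy is easiest to see on a multi-armed bandit; the arm reward means below are illustrative:

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, eps=0.1, seed=0):
    """ε-greedy bandit: with prob ε pull a random arm (explore),
    otherwise pull the arm with the best current estimate (exploit)."""
    random.seed(seed)
    n = len(true_means)
    counts = [0] * n
    estimates = [0.0] * n
    for _ in range(steps):
        if random.random() < eps:
            arm = random.randrange(n)                        # explore
        else:
            arm = max(range(n), key=lambda i: estimates[i])  # exploit
        reward = random.gauss(true_means[arm], 1.0)          # noisy reward
        counts[arm] += 1
        # Incremental running mean of the arm's observed rewards
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates

# Three arms with hidden means 0.1, 0.5, 0.9 (illustrative)
counts, estimates = epsilon_greedy_bandit([0.1, 0.5, 0.9])
```

Exploration lets the agent discover that the third arm is best; exploitation then concentrates most pulls on it, so counts[2] dominates.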
Q57. What is imitation learning?
Imitation learning trains an AI agent to mimic expert behavior from demonstrations rather than learning purely from reward signals. Behavioral cloning trains a policy directly on expert state-action pairs. DAgger (Dataset Aggregation) improves on this by interactively collecting expert corrections during training.
Q58. What is multi-agent AI?
Multi-agent AI involves systems with multiple interacting AI agents that can be cooperative, competitive, or both. Research areas include communication protocols, game theory, auction mechanisms, and emergent collective behaviors. Multi-agent systems are used in robotics swarms, trading systems, and multiplayer game AI.
Q59. What is cognitive computing?
Cognitive computing refers to AI systems that simulate human thought processes to solve complex problems. IBM Watson is a well-known example. These systems use ML, NLP, computer vision, and reasoning to analyze unstructured data, understand context, and interact naturally with humans.
Q60. What are common AI application domains?
AI is applied in healthcare (diagnosis, drug discovery), finance (fraud detection, algorithmic trading), transportation (autonomous vehicles), retail (recommendation systems), manufacturing (predictive maintenance), agriculture (precision farming), education (personalized learning), and cybersecurity (threat detection).
Q61. What is the difference between classification and regression?
Classification predicts discrete category labels (e.g., spam/not spam, disease/no disease), while regression predicts continuous numerical values (e.g., house price, temperature). Classification uses algorithms like logistic regression, SVM, and decision trees; regression uses linear regression, regression trees, and neural networks.
Q62. What is overfitting and how is it prevented?
Overfitting occurs when a model learns training data too precisely, including noise, causing poor generalization to new data. Prevention techniques include regularization (L1/L2), dropout, early stopping, cross-validation, data augmentation, reducing model complexity, and gathering more training data.
Q63. What is a confusion matrix?
A confusion matrix is a table used to evaluate classification model performance. It shows true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Derived metrics include accuracy = (TP+TN)/(TP+TN+FP+FN), precision = TP/(TP+FP), recall = TP/(TP+FN), and F1-score = 2*precision*recall/(precision+recall).
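These formulas translate directly into code; the label lists below are an illustrative example:

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative predictions: TP=2, TN=2, FP=1, FN=1
m = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
```

With 2 true positives, 2 true negatives, 1 false positive, and 1 false negative, all four metrics come out to 2/3 here.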
Q64. What is the ROC curve and AUC?
The ROC (Receiver Operating Characteristic) curve plots true positive rate (recall) against false positive rate at various classification thresholds. AUC (Area Under the Curve) summarizes the ROC curve into a single value between 0 and 1. A perfect classifier has AUC=1; random guessing gives AUC=0.5.
Q65. What is cross-validation?
Cross-validation is a model evaluation technique that partitions data into k folds, training on k-1 folds and validating on the remaining fold, rotating k times. K-fold cross-validation provides a more reliable performance estimate than a single train-test split by reducing dependence on a particular data partition.
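The fold-splitting logic can be sketched as a generator of index pairs (in practice one would typically shuffle first, as scikit-learn's KFold does):

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous, near-equal folds,
    yielding one (train_indices, val_indices) pair per fold."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

folds = list(k_fold_indices(10, 3))
```

Every sample appears in exactly one validation fold and in the training set of the other k-1 rotations, so each data point is validated on exactly once.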
Q66. What is data preprocessing in AI?
Data preprocessing transforms raw data into a clean, suitable format for ML models. Steps include handling missing values (imputation, deletion), encoding categorical variables (one-hot, label encoding), feature scaling (normalization, standardization), outlier treatment, and feature engineering to improve model performance.
Q67. What is feature engineering?
Feature engineering is the process of using domain knowledge to create, transform, or select input features that improve model performance. It includes creating polynomial features, interaction terms, log transforms, binning continuous variables, and extracting domain-specific features like day-of-week from timestamps.
Q68. What is a decision tree?
A decision tree is a supervised learning algorithm that splits data based on feature values to create a tree structure for classification or regression. Each internal node tests a feature, branches represent outcomes, and leaf nodes contain predictions. Trees are interpretable but prone to overfitting without pruning.
Q69. What is a random forest?
A random forest is an ensemble method that trains multiple decision trees on random bootstrap samples of data and random subsets of features, then averages their predictions (regression) or takes a majority vote (classification). It reduces overfitting compared to single trees and handles high-dimensional data well.
Q70. What is the difference between bagging and boosting?
Bagging (Bootstrap Aggregating) trains multiple models in parallel on random subsets of data and averages predictions to reduce variance. Boosting trains models sequentially, each correcting errors of the previous one, reducing bias. Random Forest uses bagging; AdaBoost, Gradient Boosting, and XGBoost use boosting.
Q71. What is a support vector machine (SVM)?
SVM is a supervised learning algorithm that finds the optimal hyperplane maximizing the margin between classes in feature space. The kernel trick (linear, RBF, polynomial kernels) maps data to higher dimensions to handle non-linear boundaries. SVMs are effective in high-dimensional spaces but computationally expensive on large datasets.
Q72. What is the k-nearest neighbors (KNN) algorithm?
KNN is a non-parametric, lazy learning algorithm that classifies a new point based on the majority class of its k nearest neighbors in feature space, using distance metrics like Euclidean or Manhattan distance. KNN requires no training phase but is slow at prediction time for large datasets.
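A brute-force KNN classifier fits in a few lines; the training points below are an illustrative two-cluster dataset:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points
    (Euclidean distance)."""
    dists = sorted(
        (math.dist(p, x), label) for p, label in zip(train_X, train_y)
    )
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Two illustrative clusters of labeled points
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
pred = knn_predict(train_X, train_y, (0.5, 0.5), k=3)
```

The sort over all training points is what makes naive KNN slow at prediction time; libraries speed this up with KD-trees or approximate nearest-neighbor indexes.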
Q73. What is naive Bayes classification?
Naive Bayes is a probabilistic classifier based on Bayes' theorem with the naive assumption that all features are conditionally independent given the class. Despite this simplification, it performs well on text classification (spam detection, sentiment analysis) and is computationally efficient on large datasets.
Q74. What is logistic regression?
Logistic regression is a classification algorithm that models the probability of a binary outcome using the sigmoid function: P(y=1) = 1/(1+e^(-z)) where z is a linear combination of input features. It outputs probabilities that are thresholded to make class predictions and is widely used for binary classification.
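The prediction step looks like this in code; the weights and input below are arbitrary illustrative values (a trained model would learn them from data):

```python
import math

def sigmoid(z):
    """Logistic function mapping any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x, threshold=0.5):
    """Linear score z = w·x + b, squashed to P(y=1), then thresholded."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    p = sigmoid(z)
    return p, int(p >= threshold)

# Illustrative weights and input: z = 2*1 + (-1)*3 + 0.5 = -0.5
p, label = predict([2.0, -1.0], 0.5, [1.0, 3.0])
```

Here z is negative, so the probability falls below 0.5 and the predicted class is 0; raising or lowering the threshold trades precision against recall.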
Q75. What is backpropagation?
Backpropagation is the algorithm used to train neural networks by computing gradients of the loss function with respect to each weight using the chain rule of calculus. Gradients are propagated backward from the output layer to input layers, allowing gradient descent to update weights and minimize prediction error.
Q76. What is gradient descent?
Gradient descent is an optimization algorithm that minimizes a loss function by iteratively moving parameters in the direction of the negative gradient. Variants include batch gradient descent (uses all data), stochastic gradient descent (one sample at a time), and mini-batch gradient descent (small batches), with Adam and RMSprop as popular adaptive variants.
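The core loop is a one-liner repeated; minimizing a simple quadratic makes the mechanics visible:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Vanilla gradient descent: x <- x - lr * grad(x), repeated."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3); minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Each step multiplies the distance to the minimum by (1 - 2·lr) = 0.8, so the error shrinks geometrically; too large a learning rate would instead make the iterates diverge.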
Q77. What is the vanishing gradient problem?
The vanishing gradient problem occurs in deep neural networks when gradients become extremely small during backpropagation through many layers, causing early layers to learn very slowly. Solutions include ReLU activation functions, batch normalization, skip connections (ResNet), and LSTM/GRU architectures for sequential data.
Q78. What is regularization in neural networks?
Regularization techniques prevent overfitting in neural networks. L2 (weight decay) adds a penalty proportional to squared weights. Dropout randomly deactivates neurons during training. Batch normalization normalizes layer inputs. Early stopping halts training when validation loss stops improving.
Q79. What is an activation function?
Activation functions introduce non-linearity into neural networks, enabling them to learn complex patterns. Common functions include Sigmoid (outputs 0-1, used in binary classification output), Tanh (outputs -1 to 1), ReLU (max(0,x), most popular), Leaky ReLU, and Softmax (used in the output layer for multi-class classification).
Q80. What is batch normalization?
Batch normalization normalizes the inputs to each layer by subtracting the batch mean and dividing by batch standard deviation, then applying learned scale (γ) and shift (β) parameters. It stabilizes and accelerates training, allows higher learning rates, reduces sensitivity to initialization, and provides mild regularization.
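The training-time forward pass can be sketched in NumPy (inference uses running statistics instead of batch statistics, and γ, β would be learned rather than fixed as here):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mean = x.mean(axis=0)             # per-feature batch mean
    var = x.var(axis=0)               # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta

# Illustrative mini-batch of 3 samples with 2 features each
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
out = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
```

With γ = 1 and β = 0 the output has (approximately) zero mean and unit standard deviation per feature; the small eps guards against division by zero for constant features.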
Intermediate Questions (81-150)
Q81. What is a convolutional neural network (CNN)?
A CNN is a deep learning architecture specialized for processing grid-structured data like images. It uses convolutional layers (local feature extraction), pooling layers (spatial downsampling), and fully connected layers. Key operations include convolution, ReLU activation, and max/average pooling. Famous architectures include VGG, ResNet, and EfficientNet.
Q82. What is a recurrent neural network (RNN)?
An RNN is a neural network designed for sequential data where connections form directed cycles, allowing the network to maintain a hidden state that captures temporal context. RNNs process sequences step-by-step but suffer from vanishing gradient issues, addressed by LSTM and GRU variants.
Q83. What is LSTM?
Long Short-Term Memory (LSTM) is an RNN variant that addresses vanishing gradients using gating mechanisms: input gate (what new info to store), forget gate (what to discard from memory), and output gate (what to output). The cell state acts as a long-term memory highway, making LSTMs effective for long-range dependencies.
Q84. What is the Transformer architecture?
The Transformer, introduced in 'Attention Is All You Need' (2017), uses self-attention mechanisms instead of recurrence to process sequences in parallel. It consists of encoder and decoder stacks with multi-head self-attention, positional encoding, and feed-forward layers. It forms the basis of BERT, GPT, and T5.
Q85. What is the attention mechanism?
The attention mechanism allows a model to focus on relevant parts of the input when producing each output. Scaled dot-product attention computes: Attention(Q,K,V) = softmax(QK^T/√d_k)V where Q (queries), K (keys), and V (values) are linear projections of the input. Multi-head attention runs attention multiple times in parallel.
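A single-head version of this formula in NumPy; the tiny Q, K, V matrices below are illustrative (one query attending over two keys):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # similarity of each query to each key
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# One query aligned with the first of two keys (illustrative values)
Q = np.array([[1.0, 0.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0]])
V = np.array([[10.0, 0.0], [0.0, 10.0]])
out, w = scaled_dot_product_attention(Q, K, V)
```

The query matches the first key more strongly, so its attention weight is larger and the output is pulled toward the first value vector; multi-head attention simply runs several such computations in parallel on different projections.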
Q86. What is BERT?
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained NLP model that uses bidirectional transformer encoders. Pre-trained on Masked Language Modeling (MLM) and Next Sentence Prediction (NSP), BERT can be fine-tuned for tasks like question answering, NER, and text classification.
Q87. What is GPT?
GPT (Generative Pre-trained Transformer) is a language model that uses unidirectional (left-to-right) transformer decoders, pre-trained on next token prediction using large text corpora. GPT models excel at text generation, completion, and few-shot learning. GPT-3 and GPT-4 demonstrate remarkable emergent capabilities at scale.
Q88. What is generative AI?
Generative AI refers to AI models that can generate new content — text, images, audio, video, code, or 3D models — that resembles training data. Key approaches include large language models (LLMs), diffusion models (Stable Diffusion, DALL-E), GANs, and VAEs. Applications include content creation, code generation, and synthetic data production.
Q89. What is a GAN (Generative Adversarial Network)?
A GAN consists of two neural networks: a generator that creates synthetic data and a discriminator that distinguishes real from synthetic data. They are trained adversarially — the generator tries to fool the discriminator and the discriminator tries to detect fakes — resulting in increasingly realistic generated outputs.
Q90. What is a variational autoencoder (VAE)?
A VAE is a generative model that encodes inputs into a latent space distribution (mean and variance) rather than a fixed point, then samples from this distribution to decode generated outputs. The loss combines reconstruction error and KL divergence, enabling controlled generation by sampling and interpolating in latent space.
Q91. What is the difference between discriminative and generative models?
Discriminative models learn P(y|x) — the conditional probability of a label given input features — and focus on classification boundaries. Generative models learn P(x,y) or P(x) — the joint or marginal distribution of data — and can generate new samples. Logistic regression is discriminative; Naive Bayes and GANs are generative.
Q92. What is knowledge distillation?
Knowledge distillation is a model compression technique where a smaller 'student' model is trained to mimic the output distribution (soft labels) of a larger pre-trained 'teacher' model. This transfers the teacher's knowledge into a more efficient model, achieving near-teacher performance with significantly fewer parameters.
Q93. What is federated learning?
Federated learning is a distributed ML approach where models are trained across multiple decentralized devices holding local data without sharing raw data with a central server. Each device trains locally and only sends model updates (gradients or weights) for aggregation, preserving data privacy while enabling collaborative learning.
Q94. What is self-supervised learning?
Self-supervised learning generates supervisory signals from the input data itself without manual labels. The model solves pretext tasks (predicting masked words, colorizing images, predicting next frame) as proxies for learning useful representations. BERT's masked language modeling and SimCLR's contrastive learning are examples.
Q95. What is contrastive learning?
Contrastive learning trains models to bring representations of similar (positive) pairs close together and push dissimilar (negative) pairs apart in embedding space. SimCLR, MoCo, and CLIP use contrastive objectives to learn powerful visual and multimodal representations without labels.
Q96. What is the AI alignment problem?
AI alignment refers to the challenge of ensuring AI systems behave in accordance with human values and intentions. As AI becomes more capable, misalignment between the system's objective and intended human goals could lead to harmful outcomes. Research areas include reward modeling, RLHF (RL from Human Feedback), and Constitutional AI.
Q97. What is reinforcement learning from human feedback (RLHF)?
RLHF is a technique used to align language models with human preferences. A reward model is trained on human preference data (which output is better), then the main model is fine-tuned using RL (PPO) to maximize the learned reward. ChatGPT and Claude were trained using variants of RLHF.
Q98. What is a recommendation system?
Recommendation systems predict user preferences for items based on past behavior and/or item attributes. Approaches include collaborative filtering (user-user or item-item similarity), content-based filtering (item attribute matching), and hybrid methods. Matrix factorization (e.g., SVD++) is a classic collaborative filtering technique, while neural collaborative filtering is a popular deep learning approach.
Q99. What is the cold start problem in recommendation systems?
The cold start problem occurs when a recommendation system has insufficient data for new users or items to make good recommendations. Solutions include asking for explicit preferences during onboarding, using demographic data, popularity-based recommendations for new users, and content-based filtering for new items without interaction history.
Q100. What is a knowledge graph?
A knowledge graph is a structured representation of real-world entities and the relationships between them, stored as triples (subject, predicate, object). Examples include Google Knowledge Graph, Wikidata, and DBpedia. They power semantic search, question answering, and entity linking in modern NLP systems.
Q101. What is semantic search?
Semantic search understands the intent and contextual meaning of a query rather than matching keywords literally. It uses dense vector embeddings from models like BERT or sentence transformers to find semantically similar documents using vector similarity search (cosine similarity, FAISS, Pinecone).
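A toy sketch of embedding-based retrieval in plain Python; the 3-dimensional hand-made vectors and document titles are illustrative stand-ins for real model embeddings (typically 384-1536 dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity = dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical document "embeddings" for illustration only.
documents = {
    "intro to neural networks": [0.9, 0.1, 0.0],
    "gardening tips": [0.0, 0.2, 0.9],
    "deep learning basics": [0.8, 0.3, 0.1],
}

def semantic_search(query_vec, docs, top_k=2):
    """Rank documents by cosine similarity to the query embedding."""
    scored = sorted(docs.items(),
                    key=lambda kv: cosine_similarity(query_vec, kv[1]),
                    reverse=True)
    return [title for title, _ in scored[:top_k]]

print(semantic_search([1.0, 0.2, 0.0], documents))
# ['intro to neural networks', 'deep learning basics']
```

Production systems replace the linear scan with approximate nearest neighbor indexes (FAISS, HNSW) for scale.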
Q102. What is an LLM (Large Language Model)?
An LLM is a neural language model with billions of parameters trained on massive text corpora using self-supervised objectives. LLMs exhibit emergent capabilities like few-shot learning, reasoning, code generation, and instruction following. Examples include GPT-4, PaLM 2, LLaMA 2, Claude, and Gemini.
Q103. What is prompt engineering?
Prompt engineering is the practice of designing effective input prompts to guide LLM behavior toward desired outputs. Techniques include zero-shot prompting, few-shot prompting (providing examples), chain-of-thought prompting (asking the model to reason step-by-step), and role prompting to establish context.
Q104. What is RAG (Retrieval-Augmented Generation)?
RAG combines retrieval and generation: given a query, relevant documents are retrieved from a vector database (using semantic search) and provided as context to an LLM for answer generation. This grounds LLM responses in factual retrieved content, reducing hallucinations and enabling knowledge updates without retraining.
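A minimal sketch of the RAG flow in plain Python. The word-overlap retriever and example passages are illustrative stand-ins for embedding search against a vector database; the assembled prompt would be sent to an LLM:

```python
def retrieve(query, corpus, top_k=2):
    """Toy lexical retriever: rank passages by word overlap with the query.
    Real RAG systems rank by embedding similarity instead."""
    q_words = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_rag_prompt(query, corpus):
    """Assemble retrieved passages plus the question into an LLM prompt."""
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQuestion: {query}")

corpus = [
    "The Eiffel Tower is located in Paris.",
    "Python is a programming language.",
    "The Louvre museum is located in Paris.",
]
print(build_rag_prompt("Where is the Eiffel Tower located?", corpus))
```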
Q105. What is AI hallucination?
AI hallucination refers to when a generative AI model produces confident-sounding but factually incorrect or entirely fabricated information. It occurs because LLMs optimize for plausible-sounding text rather than factual accuracy. Mitigation strategies include RAG, grounding, fine-tuning, and using tools like web search during generation.
Q106. What is the difference between AI training and inference?
Training is the process of learning model parameters from data by optimizing a loss function through many iterations — computationally expensive and done offline. Inference (prediction) applies trained model parameters to new inputs to generate outputs — must be fast and efficient for production deployment.
Q107. What is model quantization?
Quantization reduces model size and speeds up inference by representing weights and activations with lower precision (FP16, INT8, INT4 instead of FP32). Post-training quantization applies to already-trained models; quantization-aware training incorporates quantization error during training. It enables deployment of large models on edge devices.
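A sketch of symmetric post-training INT8 quantization in plain Python (toy weights; real frameworks quantize per-tensor or per-channel and fuse the scale into kernels):

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the integers and the scale."""
    return [v * scale for v in q]

weights = [0.52, -1.27, 0.03, 0.89]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)  # [52, -127, 3, 89]
```

The round trip loses at most half the scale per weight, which is why quantization trades a small accuracy drop for a 4x size reduction versus FP32.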
Q108. What is AutoML?
AutoML (Automated Machine Learning) automates the end-to-end ML pipeline including data preprocessing, feature engineering, algorithm selection, and hyperparameter tuning. Tools like Google AutoML, H2O.ai, TPOT, and AutoKeras use techniques like neural architecture search (NAS) and Bayesian optimization to find optimal pipelines.
Q109. What is hyperparameter tuning?
Hyperparameter tuning searches for the best configuration of model parameters that are set before training (learning rate, number of layers, batch size). Methods include grid search (exhaustive), random search (stochastic), Bayesian optimization (model-guided), and early stopping to avoid training sub-optimal configurations.
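A sketch of random search in plain Python; `validation_score` is a hypothetical stand-in for an actual train-and-validate run, with a made-up optimum near lr=0.01:

```python
import random

def validation_score(lr, batch_size):
    """Stand-in for a real train-and-validate run (invented objective)."""
    return -((lr - 0.01) ** 2) * 1e4 - ((batch_size - 32) ** 2) * 1e-3

def random_search(n_trials=50, seed=0):
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {"lr": 10 ** rng.uniform(-4, -1),   # learning rate on a log scale
               "batch_size": rng.choice([16, 32, 64, 128])}
        score = validation_score(cfg["lr"], cfg["batch_size"])
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best_cfg, best_score = random_search()
```

Sampling the learning rate on a log scale is the standard trick, since plausible values span several orders of magnitude.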
Q110. What is data augmentation?
Data augmentation artificially expands the training dataset by applying transformations to existing samples. For images: rotation, flipping, cropping, color jitter, and mixup. For text: back-translation, synonym replacement, and paraphrasing. It improves model generalization and reduces overfitting, especially with limited training data.
Q111. What is a convolutional layer?
A convolutional layer applies learnable filters (kernels) to the input by sliding them across the input and computing dot products, producing feature maps that detect local patterns. Key parameters include filter size (3×3, 5×5), number of filters (depth), stride (step size), and padding (border handling).
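A minimal plain-Python sketch of the sliding-window operation (as in most deep learning frameworks, this is technically cross-correlation, since the kernel is not flipped):

```python
def conv2d(image, kernel, stride=1):
    """Valid (no-padding) 2D convolution on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(0, out_h * stride, stride):
        row = []
        for j in range(0, out_w * stride, stride):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

# A vertical-edge detector on a 4x4 image with a dark-to-bright boundary.
image = [[0, 0, 9, 9]] * 4
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
print(conv2d(image, kernel))  # [[27, 27, ...], ...]: strong edge response
```

Every output is large because each 3x3 window straddles the dark-to-bright boundary, which is exactly the local pattern this kernel detects.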
Q112. What is pooling in CNNs?
Pooling layers reduce the spatial dimensions of feature maps, decreasing computation and providing translation invariance. Max pooling takes the maximum value in each window, preserving prominent features. Average pooling takes the mean. Global average pooling reduces each feature map to a single value for classification.
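Max pooling over non-overlapping 2x2 windows can be sketched in a few lines of plain Python:

```python
def max_pool2d(fmap, size=2, stride=2):
    """Take the maximum over each window of a 2D feature map."""
    out = []
    for i in range(0, len(fmap) - size + 1, stride):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, stride):
            row.append(max(fmap[i + a][j + b]
                           for a in range(size) for b in range(size)))
        out.append(row)
    return out

fmap = [[1, 3, 2, 0],
        [4, 6, 1, 1],
        [0, 2, 8, 5],
        [1, 0, 3, 7]]
print(max_pool2d(fmap))  # [[6, 2], [2, 8]]
```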
Q113. What is ResNet?
ResNet (Residual Network) introduced skip connections that add the input of a block directly to its output (identity shortcut), enabling training of very deep networks (up to 1000+ layers) by addressing the vanishing gradient problem. ResNet-50 and ResNet-101 remain popular backbone architectures for vision tasks.
Q114. What is YOLO in computer vision?
YOLO (You Only Look Once) is a real-time object detection algorithm that treats detection as a single regression problem, predicting bounding boxes and class probabilities from a full image in one pass. YOLOv8 and YOLO-NAS are recent versions offering improved accuracy-speed tradeoffs for production deployment.
Q115. What is semantic segmentation?
Semantic segmentation classifies each pixel in an image into a predefined class, producing a pixel-wise labeled map. Encoder-decoder architectures like U-Net (medical imaging), DeepLab (scene understanding), and FCN (Fully Convolutional Networks) are standard approaches. Dilated convolutions capture multi-scale context without losing resolution.
Q116. What is optical character recognition (OCR)?
OCR converts images of printed or handwritten text into machine-readable digital text. Modern OCR systems use CNNs for feature extraction and RNNs with CTC loss for sequence decoding. Tools like Tesseract, EasyOCR, and cloud OCR APIs (AWS Textract, Google Vision API) are widely used in document processing pipelines.
Q117. What is face recognition?
Face recognition identifies or verifies individuals from facial images using deep learning. The pipeline includes face detection, alignment, feature extraction (FaceNet, ArcFace), and matching. Distance metrics (L2, cosine) compare face embeddings. Applications include authentication, surveillance, and attendance systems.
Q118. What is speech recognition?
Speech recognition (ASR) converts spoken audio to text using acoustic and language models. Modern end-to-end systems like DeepSpeech, Wav2Vec 2.0, and Whisper use deep learning to transcribe speech directly. CTC (Connectionist Temporal Classification) loss enables learning from unsegmented audio without frame-level alignment.
Q119. What is text-to-speech (TTS)?
TTS converts text to natural-sounding speech. Modern neural TTS systems (Tacotron 2, FastSpeech 2) generate mel spectrograms from text, then convert them to audio waveforms using vocoders (WaveNet, HiFi-GAN). Voice cloning systems can replicate specific voices from a few seconds of reference audio.
Q120. What is a diffusion model?
Diffusion models are generative models that learn to reverse a noising process. During training, Gaussian noise is progressively added to data; the model learns to denoise step-by-step. During generation, the model starts from pure noise and denoises to produce high-quality samples. Stable Diffusion and DALL-E 2/3 use diffusion for image generation.
Q121. What is multimodal AI?
Multimodal AI processes and generates content across multiple modalities — text, images, audio, and video — simultaneously. Models like GPT-4V, CLIP, Flamingo, and Gemini can reason across modalities. CLIP aligns image and text embeddings in a shared space using contrastive learning on image-caption pairs.
Q122. What is autonomous driving AI?
Autonomous driving AI integrates perception (object detection, lane detection using cameras and LiDAR), localization (HD maps, GPS fusion), prediction (trajectory forecasting of other agents), and planning (path planning, control) subsystems. Sensor fusion and end-to-end learning approaches are active research areas.
Q123. What is a Markov chain?
A Markov chain is a stochastic process where the next state depends only on the current state (Markov property). Markov chains model sequential processes in NLP (bigram/trigram language models), queueing theory, and reinforcement learning. MCMC (Markov Chain Monte Carlo) methods use Markov chains for probabilistic sampling.
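A toy two-state weather chain in plain Python (transition probabilities invented for illustration), showing the state distribution converging to its stationary values:

```python
def step(dist, transition):
    """One Markov step: new_dist[j] = sum_i dist[i] * P(j | i)."""
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]

# P(sunny -> sunny) = 0.9, P(rainy -> sunny) = 0.5.
P = [[0.9, 0.1],
     [0.5, 0.5]]
dist = [1.0, 0.0]           # start in 'sunny' with certainty
for _ in range(50):
    dist = step(dist, P)    # repeated steps approach the stationary distribution
print(dist)                 # approaches [5/6, 1/6]
```

The stationary distribution solves pi = pi P; here pi = (5/6, 1/6), and the Markov property means the 50-step result depends only on P and the start state.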
Q124. What is hidden Markov model (HMM)?
An HMM is a statistical model representing a system with hidden states that generate observable outputs according to emission probabilities. The Viterbi algorithm decodes the most likely state sequence; Baum-Welch trains HMM parameters. HMMs were foundational for speech recognition and biological sequence analysis.
Q125. What is word2vec?
Word2Vec is a shallow neural network for learning word embeddings from text using two architectures: CBOW (predicts a word from surrounding context) and Skip-gram (predicts context from a word). Trained on large corpora, it captures semantic relationships: vector(king) - vector(man) + vector(woman) ≈ vector(queen).
Q126. What is a language model?
A language model (LM) assigns probabilities to sequences of words: P(w1, w2, ..., wn). N-gram LMs use conditional probabilities from historical n-grams. Neural LMs (RNN-LM, GPT) learn continuous representations. LMs are used for text generation, speech recognition, machine translation, and autocompletion.
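A maximum-likelihood bigram model can be sketched in plain Python (toy corpus for illustration; real n-gram LMs add smoothing for unseen pairs):

```python
from collections import Counter

def bigram_probs(corpus):
    """MLE bigram model: P(w2 | w1) = count(w1 w2) / count(w1)."""
    tokens = corpus.lower().split()
    unigrams = Counter(tokens[:-1])          # every token that has a successor
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {pair: c / unigrams[pair[0]] for pair, c in bigrams.items()}

probs = bigram_probs("the cat sat on the mat")
print(probs[("the", "cat")])  # 0.5: 'the' is followed by 'cat' once out of twice
```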
Q127. What is machine translation?
Machine translation (MT) automatically translates text between languages. Statistical MT used phrase-based models. Neural MT (NMT) with sequence-to-sequence models and attention vastly improved quality. Transformer-based models (Google Translate, DeepL) achieve near-human quality for high-resource language pairs.
Q128. What is the difference between precision and recall?
Precision = TP/(TP+FP) measures what fraction of predicted positives are actually positive (minimizes false alarms). Recall = TP/(TP+FN) measures what fraction of actual positives are correctly detected (minimizes missed detections). The F1-score is their harmonic mean, balancing both metrics.
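The definitions map directly to code (this sketch assumes at least one predicted and one actual positive, so no zero-division handling):

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # precision 0.5, recall ~0.67
```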
Q129. What is mean average precision (mAP) in object detection?
mAP is the standard evaluation metric for object detection. For each class, average precision (AP) is computed as the area under the precision-recall curve at a given IoU threshold, and mAP averages AP across all classes. COCO mAP further averages over IoU thresholds from 0.5 to 0.95, making it more stringent than PASCAL VOC's AP@0.5.
Q130. What is IoU (Intersection over Union)?
IoU measures the overlap between a predicted bounding box and a ground truth bounding box: IoU = Area of Intersection / Area of Union. An IoU ≥ 0.5 is typically considered a correct detection. IoU is also used in semantic segmentation as mean IoU (mIoU) to evaluate per-class pixel accuracy.
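A direct implementation for axis-aligned boxes in (x1, y1, x2, y2) format:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # 0 if boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ~= 0.143
```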
Q131. What is BLEU score?
BLEU (Bilingual Evaluation Understudy) is an automatic metric for evaluating machine translation quality by comparing n-gram overlaps between generated and reference translations. BLEU ranges from 0 to 1 (or 0-100). Despite its limitations (no semantic understanding), it remains the standard MT evaluation metric.
Q132. What is ROUGE score?
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measures the quality of text summarization by computing recall of n-gram overlaps between generated and reference summaries. ROUGE-N measures n-gram recall, ROUGE-L measures longest common subsequence, and ROUGE-SU includes skip-bigrams.
Q133. What is zero-shot learning?
Zero-shot learning enables models to classify or reason about categories not seen during training by leveraging semantic descriptions or attribute vectors. In zero-shot classification, models generalize to new classes using textual descriptions. CLIP performs zero-shot image classification by matching image embeddings to class name embeddings.
Q134. What is few-shot learning?
Few-shot learning enables models to generalize from very few labeled examples per class (1-shot, 5-shot). Meta-learning approaches (Model-Agnostic Meta-Learning/MAML, Prototypical Networks) train models to quickly adapt to new tasks. LLMs perform few-shot learning via in-context learning from prompt examples.
Q135. What is catastrophic forgetting in neural networks?
Catastrophic forgetting occurs when a neural network trained on new tasks forgets previously learned knowledge. Continual/lifelong learning research addresses this with techniques like Elastic Weight Consolidation (EWC), progressive neural networks, replay mechanisms, and parameter-isolation methods.
Q136. What is active learning?
Active learning is an ML paradigm where the model queries a human oracle to label the most informative unlabeled examples, minimizing labeling cost. Query strategies include uncertainty sampling (most uncertain predictions), query by committee, and expected model change. It is valuable when labeled data is expensive to obtain.
Q137. What is the no free lunch theorem in ML?
The No Free Lunch theorem states that no single ML algorithm performs best on all possible problems — every algorithm's gains on some problems are offset by performance losses on others. This motivates trying multiple algorithms and selecting the best based on cross-validation performance for a specific task.
Q138. What is dimensionality reduction?
Dimensionality reduction projects high-dimensional data into lower-dimensional representations. Linear methods (PCA, LDA) find linear projections. Non-linear methods (t-SNE, UMAP) preserve local neighborhood structure for visualization. Autoencoders learn compact latent representations. Reduction combats the curse of dimensionality and improves model efficiency.
Q139. What is the curse of dimensionality?
The curse of dimensionality refers to phenomena that arise in high-dimensional spaces where data becomes increasingly sparse, distances become uniform, and models require exponentially more data to generalize. Dimensionality reduction, feature selection, and regularization are strategies to mitigate its effects.
Q140. What is an autoencoder?
An autoencoder is an unsupervised neural network that learns compressed representations (encoding) of input data by training to reconstruct the input from its bottleneck representation. Variants include denoising autoencoders, sparse autoencoders, variational autoencoders (VAEs), and contractive autoencoders for different representation learning goals.
Q141. What is the bias-variance tradeoff?
The bias-variance tradeoff describes the balance between underfitting (high bias — model too simple) and overfitting (high variance — model too complex). Total expected error = Bias² + Variance + Irreducible Noise. Increasing model complexity reduces bias but increases variance; regularization and ensemble methods help balance both.
Q142. What is ensemble learning?
Ensemble learning combines predictions from multiple models to produce better generalization than any single model. Methods include bagging (Random Forest), boosting (XGBoost, LightGBM, AdaBoost), stacking (meta-learner combining base model outputs), and voting classifiers. Diverse, independent models produce the best ensembles.
Q143. What is XGBoost?
XGBoost (Extreme Gradient Boosting) is a highly efficient gradient boosting framework that builds trees sequentially, each correcting residual errors of the previous. It uses second-order Taylor expansion for loss optimization, L1/L2 regularization, column subsampling, and parallel tree building, making it a top performer on tabular data.
Q144. What is LightGBM?
LightGBM is Microsoft's gradient boosting framework that uses leaf-wise (best-first) tree growth instead of level-wise, reducing training time and memory. It uses Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) to speed up training on large datasets while maintaining competitive accuracy.
Q145. What is AI in healthcare?
AI in healthcare applications include medical image analysis (tumor detection in CT/MRI scans), drug discovery (molecular property prediction, protein folding with AlphaFold), clinical decision support, EHR analysis, personalized medicine, robotic surgery assistance, and pandemic response modeling (epidemiological forecasting).
Q146. What is AI ethics?
AI ethics covers principles for developing and deploying AI responsibly: fairness (avoiding discrimination), transparency (explainability), accountability (who is responsible for AI decisions), privacy (data protection), safety (avoiding harmful outputs), and sustainability (environmental impact of training large models).
Q147. What is differential privacy in AI?
Differential privacy is a mathematical framework for protecting individual data privacy in ML. By adding carefully calibrated random noise to data or gradients, it ensures that the inclusion or exclusion of any single individual's data has negligible effect on model outputs. DP-SGD enables differentially private neural network training.
Q148. What is a vector database?
A vector database stores, indexes, and queries high-dimensional embedding vectors efficiently using approximate nearest neighbor (ANN) search algorithms (HNSW, IVF). Examples include Pinecone, Weaviate, Qdrant, Milvus, and pgvector. They are essential infrastructure for RAG systems and semantic search applications.
Q149. What is LangChain?
LangChain is an open-source framework for building LLM-powered applications with composable chains of prompts, memory, tools, and agents. It provides abstractions for RAG, document loaders, vector stores, LLM wrappers, and agent orchestration, accelerating the development of chatbots, QA systems, and AI workflows.
Q150. What is an AI agent in the context of LLMs?
An LLM-based AI agent uses a language model as its reasoning engine, equipped with tools (web search, code execution, databases) and memory to autonomously complete multi-step tasks. ReAct (Reasoning + Acting) is a prompting pattern where the agent alternates between reasoning steps and tool-calling actions.
Advanced Questions (151-200)
Q151. What is the transformer positional encoding?
Positional encoding adds information about the position of tokens in a sequence since transformers have no inherent notion of order. Sinusoidal positional encodings use sine and cosine functions of different frequencies. Learned positional embeddings (used in BERT, GPT) are an alternative. RoPE (Rotary Position Embedding) and ALiBi are newer relative position encoding methods.
Q152. What is the context window in LLMs?
The context window is the maximum number of tokens an LLM can process in a single inference. It limits the amount of text the model can 'see' at once. Extending context windows (GPT-4 Turbo: 128K, Claude: 200K) enables processing long documents. Techniques like sliding window attention, sparse attention, and RoPE scaling help extend context length.
Q153. What is fine-tuning in LLMs?
Fine-tuning adapts a pre-trained LLM to a specific downstream task or domain by continuing training on task-specific data. Full fine-tuning updates all parameters (expensive). Parameter-efficient fine-tuning (PEFT) methods like LoRA, Adapter layers, and Prefix tuning update only a small fraction of parameters while achieving competitive performance.
Q154. What is LoRA (Low-Rank Adaptation)?
LoRA is a PEFT method that freezes pre-trained model weights and injects trainable low-rank decomposition matrices into transformer layers. Instead of updating a weight matrix W, it learns two small matrices A and B such that ΔW = BA. This dramatically reduces trainable parameters (e.g., 0.1% of full model) while achieving near-full fine-tuning performance.
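A quick calculation of the parameter savings for one hypothetical 4096x4096 attention weight matrix at rank 8:

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters: full update of W vs. low-rank delta W = B @ A,
    where A is (rank, d_in) and B is (d_out, rank)."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora

full, lora = lora_param_counts(4096, 4096, rank=8)
print(full, lora, lora / full)  # 16777216 65536 ~0.0039 (about 0.4%)
```

Because the pre-trained W stays frozen, at inference time B @ A can be merged back into W, so LoRA adds no latency.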
Q155. What is neural architecture search (NAS)?
NAS automates the design of neural network architectures by searching the space of possible architectures for the best-performing one for a given task. Methods include reinforcement learning-based NAS (Zoph & Le), evolutionary algorithms, and differentiable NAS (DARTS). NAS discovered EfficientNet and MobileNet architectures.
Q156. What is graph neural network (GNN)?
GNNs process graph-structured data by iteratively aggregating information from neighboring nodes. Key architectures include GCN (Graph Convolutional Network), GAT (Graph Attention Network), and GraphSAGE. Applications include molecular property prediction, social network analysis, recommendation systems, and knowledge graph completion.
Q157. What is a mixture of experts (MoE)?
MoE is a neural network architecture where input is routed to a subset of 'expert' sub-networks by a gating mechanism, allowing much larger total model capacity with similar compute. Mixtral uses sparse MoE layers (each token activates 2 of 8 experts), and GPT-4 and Gemini are widely reported to use MoE as well, enabling very large total parameter counts at modest per-token compute.
Q158. What is causal AI?
Causal AI moves beyond correlation to understand cause-and-effect relationships. Causal models (Structural Causal Models, DAGs) enable interventional reasoning (what happens if I do X?) and counterfactual reasoning (what would have happened if X had not occurred?). Judea Pearl's do-calculus formalizes causal inference in AI systems.
Q159. What is neuro-symbolic AI?
Neuro-symbolic AI combines neural network learning with symbolic reasoning to achieve the benefits of both: data-driven pattern recognition from neural networks and interpretable, compositional reasoning from symbolic systems. Approaches include Logic Tensor Networks, Neural Theorem Provers, and AlphaGeometry for mathematical reasoning.
Q160. What is AI model evaluation at scale?
Evaluating large AI models involves benchmarks like MMLU (multitask language understanding), HellaSwag (commonsense reasoning), HumanEval (code generation), and MT-Bench (conversational quality). Chatbot Arena (LMSYS) uses human preference voting for comparative LLM evaluation at scale.
Q161. What is catastrophic interference in deep learning?
Catastrophic interference (forgetting) in deep learning occurs when training a neural network on new tasks disrupts its performance on previously learned tasks by overwriting synaptic weights. Continual learning solutions include replay buffers (retaining samples from old tasks), EWC (penalizing changes to important weights), and progressive neural network architectures.
Q162. What is neural scaling law?
Neural scaling laws (Kaplan et al., 2020) describe power-law relationships between model performance and scale factors: model size (parameters), dataset size (tokens), and compute budget. These laws predict that larger models trained on more data with more compute consistently improve, guiding the design of frontier models.
Q163. What is chain-of-thought (CoT) prompting?
Chain-of-thought prompting elicits step-by-step reasoning from LLMs by including reasoning examples in the prompt. 'Let's think step by step' (zero-shot CoT) or providing explicit reasoning traces (few-shot CoT) enables models to solve complex multi-step problems in math, logic, and commonsense reasoning more accurately.
Q164. What is tool use in AI agents?
Tool use allows AI agents to call external APIs, execute code, query databases, and browse the web to extend their capabilities beyond static knowledge. Function calling in GPT-4 and Claude allows models to output structured JSON to invoke predefined tools. ReAct, Toolformer, and OpenAI Assistants API implement tool-augmented reasoning.
Q165. What is AI safety research?
AI safety research addresses risks from increasingly capable AI systems. Key research areas include robustness (adversarial attacks), scalable oversight (supervising AI systems smarter than humans), interpretability (understanding model internals), specification gaming prevention, and corrigibility (ensuring AI systems accept human correction).
Q166. What is model interpretability vs explainability?
Interpretability refers to the degree to which humans can understand the internal mechanics of a model (inherently interpretable models like decision trees and linear models). Explainability provides post-hoc explanations for black-box model predictions using tools like SHAP, LIME, integrated gradients, and concept activation vectors (TCAV).
Q167. What is an adversarial attack on AI models?
Adversarial attacks craft imperceptible perturbations to inputs that cause AI models to make incorrect predictions with high confidence. FGSM (Fast Gradient Sign Method), PGD, and Carlini-Wagner attacks target image classifiers. Certified defenses (randomized smoothing, adversarial training) provide robustness guarantees.
Q168. What is model distillation vs pruning vs quantization?
Distillation transfers knowledge from a large teacher to a smaller student model. Pruning removes weights or neurons below a magnitude threshold, creating sparse models. Quantization reduces numerical precision of weights. All three compress models for efficient deployment, and they are often applied together (e.g., DistilBERT uses distillation).
Q169. What is AI deployment on edge devices?
Edge AI deploys models directly on devices (phones, IoT sensors, embedded systems) rather than in the cloud, reducing latency, bandwidth, and privacy risks. Frameworks like TensorFlow Lite, ONNX Runtime, and PyTorch Mobile enable deployment, while quantization, pruning, and NAS produce models meeting tight edge hardware constraints.
Q170. What is multi-task learning in deep learning?
Multi-task learning (MTL) trains a single model on multiple related tasks simultaneously, sharing representations across tasks. This improves generalization through task regularization and enables efficient inference with one model. T5 and MT-DNN demonstrate MTL for NLP; MTI-Net demonstrates MTL for computer vision tasks.
Q171. What is gradient checkpointing?
Gradient checkpointing (activation checkpointing) trades compute for memory by not storing all intermediate activations during the forward pass, recomputing them during backpropagation. This can reduce activation memory from linear in network depth to roughly its square root, enabling training of larger models within a fixed memory budget.
Q172. What is model parallelism?
Model parallelism distributes large model layers across multiple GPUs/TPUs when a model is too large for a single device. Tensor parallelism splits individual layer computations across devices. Pipeline parallelism assigns different layers to different devices. Megatron-LM and DeepSpeed implement efficient hybrid parallelism for training models with hundreds of billions of parameters.
Q173. What is the role of the Transformer decoder in generative models?
The Transformer decoder generates sequences autoregressively by attending to previously generated tokens (causal/masked self-attention) and optionally to encoder outputs (cross-attention in encoder-decoder models). GPT-style decoder-only models generate text by predicting the next token based on all preceding context.
Q174. What is the perplexity metric for language models?
Perplexity measures how well a language model predicts a test corpus: PP(W) = P(w1...wN)^(-1/N). Lower perplexity indicates better model performance. It equals 2^(cross-entropy loss in bits). Perplexity is used to compare LMs trained on the same data distribution but does not directly measure downstream task performance.
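A sketch using the natural-log form, exp of the average negative log-likelihood (equivalent to the base-2 formulation above); the inputs are the probabilities the model assigned to the actual next tokens:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(average negative log-likelihood of the true tokens)."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0: like guessing among 4 options
print(perplexity([0.9, 0.8, 0.95]))          # low: a confident, accurate model
```

The uniform case shows the intuition: perplexity is the effective number of equally likely choices the model is "confused" between at each step.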
Q175. What is constitutional AI?
Constitutional AI (CAI) is Anthropic's approach to AI alignment where a set of principles (a 'constitution') guides model behavior. The model critiques and revises its own outputs against the constitution using RLAIF (RL from AI Feedback) instead of solely human feedback, reducing reliance on human labelers for harmlessness training.
Q176. What is a world model in AI?
A world model is an internal representation of the environment that an AI agent uses to simulate and predict outcomes of actions without directly interacting with the world. Dyna-Q, DreamerV3, and MuZero use learned world models for efficient planning. World models are central to model-based RL and cognitive AI architectures.
Q177. What is the Bellman equation in reinforcement learning?
The Bellman equation expresses the value of a state as the expected immediate reward plus the discounted value of successor states; in its optimality form, V(s) = R(s) + γ · max_a Σ_s' P(s'|s,a) · V(s'). It is the foundation for value-based RL algorithms including value iteration, policy iteration, Q-learning, and deep Q-networks.
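A minimal value-iteration sketch on a toy deterministic 3-state MDP (rewards and transitions invented for illustration; actions are identified with their successor states):

```python
def value_iteration(rewards, transitions, gamma=0.9, iters=100):
    """Repeatedly apply the Bellman optimality backup on a deterministic MDP:
    V(s) = R(s) + gamma * max over reachable next states s' of V(s')."""
    n = len(rewards)
    V = [0.0] * n
    for _ in range(iters):
        V = [rewards[s] + gamma * max(V[s2] for s2 in transitions[s])
             for s in range(n)]
    return V

# State 2 is a self-absorbing goal collecting reward 10 on every visit.
rewards = [0.0, 0.0, 10.0]
transitions = [[0, 1], [0, 2], [2]]   # possible next states from each state
V = value_iteration(rewards, transitions)
# V converges to approximately [81, 90, 100] with gamma = 0.9
```

The fixed point checks out by hand: V(2) = 10/(1 - 0.9) = 100, V(1) = 0.9 * 100 = 90, V(0) = 0.9 * 90 = 81.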
Q178. What is policy gradient reinforcement learning?
Policy gradient methods directly optimize the policy π(a|s) by computing gradients of expected cumulative reward with respect to policy parameters. REINFORCE, Actor-Critic (A2C, A3C), PPO (Proximal Policy Optimization), and SAC (Soft Actor-Critic) are key policy gradient algorithms used in game playing, robotics, and LLM alignment.
Q179. What is PPO (Proximal Policy Optimization)?
PPO is a policy gradient RL algorithm that constrains policy updates to prevent destabilizing large changes. It uses a clipped surrogate objective that limits the probability ratio between new and old policies within a trust region [1-ε, 1+ε]. PPO is sample-efficient, stable, and widely used in robotics, game AI, and RLHF for LLM alignment.
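The clipped surrogate for a single sample can be sketched directly (ratio and advantage values here are illustrative):

```python
def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """PPO per-sample objective: min(ratio * A, clip(ratio, 1-eps, 1+eps) * A),
    where ratio = pi_new(a|s) / pi_old(a|s) and A is the advantage estimate."""
    clipped = max(1 - epsilon, min(1 + epsilon, ratio))
    return min(ratio * advantage, clipped * advantage)

# Positive advantage: gains from pushing the ratio above 1 + eps are clipped away.
print(ppo_clipped_objective(1.5, advantage=2.0))   # 2.4, not 3.0
# Negative advantage: the min keeps the more pessimistic (clipped) value.
print(ppo_clipped_objective(0.5, advantage=-1.0))  # -0.8
```

The clipping removes the incentive for any single update to move the policy far outside the trust region, which is what makes PPO stable in practice.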
Q180. What is symbolic AI vs connectionist AI?
Symbolic AI (GOFAI) manipulates explicit symbols and rules to represent knowledge and perform reasoning — interpretable but brittle and requiring manual knowledge engineering. Connectionist AI (neural networks) learns distributed representations from data — flexible and scalable but less interpretable. Modern AI increasingly blends both (neuro-symbolic AI).
Q181. What is AI for drug discovery?
AI accelerates drug discovery through molecular property prediction (graph neural networks on molecular graphs), de novo drug design (generative models for novel molecule generation), protein structure prediction (AlphaFold 2 achieved near-experimental accuracy), virtual screening, and clinical trial optimization through patient stratification.
Q182. What is AlphaFold?
AlphaFold 2, developed by DeepMind, predicts protein 3D structure from amino acid sequence with near-experimental accuracy, solving a 50-year grand challenge in biology. It uses a novel neural architecture (Evoformer) combining multiple sequence alignment with pair representation processing. The AlphaFold Database contains predicted structures for ~200 million proteins.
Q183. What is AI for code generation?
AI code generation models (GitHub Copilot, CodeWhisperer, StarCoder, Codex) are LLMs fine-tuned on large code corpora that autocomplete, generate, explain, and debug code from natural language descriptions or partial code. HumanEval and MBPP benchmarks measure functional correctness of generated code.
Q184. What is the AI regulation landscape?
AI regulation frameworks include the EU AI Act (risk-based categorization: unacceptable, high, limited, minimal risk), the US AI Executive Order (safety testing for frontier models), China's generative AI regulations, and sector-specific guidance (FDA for medical AI, FFIEC for financial AI). Voluntary frameworks include NIST AI RMF.
Q185. What is responsible AI development?
Responsible AI development encompasses principles and practices ensuring AI systems are safe, fair, transparent, accountable, and privacy-preserving throughout their lifecycle. It includes ethics review boards, impact assessments, bias auditing, red-teaming (adversarial testing), model cards for documentation, and ongoing monitoring in deployment.
Q186. What is AI for climate change?
AI applications for climate change include weather and climate forecasting (Google DeepMind's GraphCast), renewable energy optimization (wind/solar output prediction), smart grid management, climate model emulation for rapid simulation, carbon footprint optimization in logistics, and species distribution modeling for biodiversity conservation.
Q187. What is the AI 'race to the top' concern?
The underlying concern is competitive race dynamics: AI developers may prioritize speed over safety to gain market advantage, which could lead to deployment of insufficiently tested systems (sometimes called a 'race to the bottom' on safety). A 'race to the top' is the hoped-for opposite dynamic, in which labs compete on safety standards. Governance frameworks, compute governance, and international AI safety coordination aim to shift incentives toward the latter.
Q188. What is symbolic regression?
Symbolic regression automatically discovers mathematical equations that best fit data by searching the space of mathematical expressions. Genetic programming and neural methods (EQL, AI Feynman, SRBench) evolve or learn symbolic formulas. It produces interpretable models and has rediscovered fundamental physics equations from data.
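The core idea, searching a space of candidate expressions for the best fit, can be illustrated with a drastically simplified sketch (a fixed candidate pool scored by mean squared error; real systems evolve expression trees):

```python
# Toy symbolic-regression search: pick the candidate expression with the
# lowest squared error on the data. Hidden ground truth: y = x^2 + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [x**2 + 1 for x in xs]

candidates = {
    "x + 1":    lambda x: x + 1,
    "2 * x":    lambda x: 2 * x,
    "x**2 + 1": lambda x: x**2 + 1,
    "x**3":     lambda x: x**3,
}

def mse(f):
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

best = min(candidates, key=lambda name: mse(candidates[name]))
print(best)  # → x**2 + 1
```

Genetic programming replaces the fixed pool with mutation and crossover over expression trees, but the fitness evaluation is the same.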
Q189. What is test-time compute scaling in AI?
Test-time compute scaling (inference-time scaling) improves model performance by spending more computation at inference time rather than only at training time. Techniques include chain-of-thought reasoning, best-of-N sampling, tree-of-thought search, and o1/o3-style 'thinking' tokens that allow models to reason longer for harder problems.
Q190. What is sparse attention in transformers?
Sparse attention reduces the O(n²) computational complexity of full self-attention by restricting attention to subsets of tokens. Longformer uses local sliding window + global attention; BigBird combines random, window, and global attention. FlashAttention optimizes dense attention through memory-efficient IO-aware computation on GPU.
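A Longformer-style sparsity pattern can be sketched as a boolean mask, where a local sliding window is combined with a few global tokens (illustrative function, not a library API):

```python
import numpy as np

def sliding_window_mask(n, window, global_tokens=()):
    """Boolean attention mask: True where token i may attend to token j.
    Local band of half-width `window`, plus Longformer-style global tokens."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    mask = np.abs(i - j) <= window          # local sliding window
    for g in global_tokens:
        mask[g, :] = True                   # global token attends everywhere
        mask[:, g] = True                   # ...and is attended by everyone
    return mask

m = sliding_window_mask(6, window=1, global_tokens=(0,))
print(int(m.sum()))  # far fewer allowed pairs than the 36 of full attention
```

For long sequences the allowed-pair count grows as O(n·window) rather than O(n²), which is the entire point of sparse attention.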
Q191. What is in-context learning?
In-context learning (ICL) allows LLMs to adapt to new tasks by conditioning on a few examples provided in the prompt at inference time, without gradient updates. Unlike fine-tuning, ICL requires no parameter changes. GPT-3 demonstrated that larger models exhibit stronger ICL capabilities as an emergent behavior.
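The mechanics are purely prompt construction: the "training examples" live in the context window and no weights change. A minimal few-shot prompt for sentiment classification (made-up examples, any LLM API would consume the resulting string):

```python
# Few-shot in-context learning prompt: demonstrations followed by a query.
examples = [("The movie was fantastic", "positive"),
            ("I wasted two hours of my life", "negative")]
query = "A thoroughly enjoyable read"

prompt = "\n".join(f"Review: {t}\nSentiment: {s}" for t, s in examples)
prompt += f"\nReview: {query}\nSentiment:"
print(prompt)  # the LLM is expected to continue with " positive"
```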
Q192. What is AI model safety evaluation?
Safety evaluation tests AI models for harmful outputs, jailbreaking vulnerabilities, bias, and factual accuracy. Methods include red-teaming (adversarial prompt testing), automated benchmarks (TruthfulQA, ToxiGen, BBQ), constitutional testing, and human evaluation of helpfulness, harmlessness, and honesty (HHH).
Q193. What is the AI inference optimization stack?
Inference optimization involves: hardware (GPU/TPU/NPU selection), model compression (quantization, pruning, distillation), batching strategies (continuous batching, dynamic batching), serving frameworks (TensorRT, vLLM, TGI), caching (KV-cache for LLMs), and speculative decoding (using a smaller draft model to propose tokens for the main model).
Q194. What is synthetic data generation in AI?
Synthetic data is artificially generated data used to train AI models when real data is scarce, expensive, or privacy-sensitive. Techniques include GANs, VAEs, diffusion models, LLM-based generation (Self-Instruct, Alpaca), and physics-based simulation. Microsoft's Phi series (Phi-1, Phi-2) demonstrates that high-quality synthetic 'textbook' data can train competitive small models.
Q195. What is speculative decoding for LLM inference?
Speculative decoding accelerates LLM generation by using a small, fast 'draft' model to propose multiple tokens in parallel, then having the large 'verification' model accept or reject them in a single forward pass. When the draft model is accurate, this achieves 2-3x speedup with identical output distribution to the original model.
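The control flow can be illustrated with a toy greedy variant (the full algorithm accepts draft tokens probabilistically via a min(1, p/q) rule to preserve the target's exact sampling distribution; here both models are deterministic). `draft_next` and `target_next` are hypothetical stand-ins that map a token sequence to the next token:

```python
def speculate(draft_next, target_next, prefix, k=4):
    """Toy greedy speculative decoding: the draft model proposes k tokens,
    the target model keeps the longest prefix it agrees with, then appends
    one token of its own (so progress is always at least one token)."""
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx = ctx + [t]
    accepted = list(prefix)
    for t in proposal:
        if target_next(accepted) == t:      # target agrees: accept for free
            accepted.append(t)
        else:
            break                           # first disagreement stops the run
    accepted.append(target_next(accepted))  # target's own next token
    return accepted

target_seq = ["a", "a", "b", "a"]
out = speculate(lambda ctx: "a", lambda ctx: target_seq[len(ctx)], [])
print(out)  # → ['a', 'a', 'b']
```

The speedup comes from the target model verifying several draft tokens per forward pass instead of generating one token at a time.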
Q196. What is the softmax function and its role in AI?
Softmax converts a vector of raw scores (logits) into a probability distribution where all values sum to 1: softmax(z_i) = exp(z_i) / Σ_j exp(z_j). It is used in multi-class classification output layers, attention score normalization in transformers, and policy distributions in RL. Temperature scaling controls output distribution sharpness.
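A numerically stable implementation subtracts the maximum logit before exponentiating (this leaves the result unchanged but avoids overflow), with temperature applied to the logits first:

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Stable softmax; lower temperature sharpens the distribution,
    higher temperature flattens it toward uniform."""
    z = np.asarray(z, dtype=float) / temperature
    z = z - z.max()            # shift for numerical stability; softmax is
    e = np.exp(z)              # invariant to adding a constant to all logits
    return e / e.sum()

p = softmax([2.0, 1.0, 0.1])
print(p.sum())  # probabilities sum to 1
```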
Q197. What is model governance in enterprise AI?
Model governance manages the lifecycle of AI models in organizations: documentation (model cards, datasheets), version control, performance monitoring, drift detection, audit trails, access controls, and decommissioning policies. MLflow, Weights & Biases, and SageMaker Model Registry provide model governance infrastructure.
Q198. What is the Kolmogorov complexity perspective on intelligence?
From an algorithmic information theory perspective, intelligence can be measured by the ability to compress and predict. Solomonoff induction defines optimal Bayesian prediction using Kolmogorov complexity. AIXI is a theoretical model of optimal intelligence combining Solomonoff prediction with RL decision-making, though it is computationally intractable.
Q199. What is AI-driven scientific discovery?
AI is accelerating scientific discovery across domains: AlphaFold for structural biology, GNoME for materials discovery (predicted 2.2M new crystal structures), AI for fusion energy control (DeepMind's tokamak plasma control), AI for mathematics (FunSearch, AlphaProof), and drug-target interaction prediction using graph neural networks.
Q200. What are the key challenges in achieving AGI?
Key challenges toward AGI include: sample efficiency (learning from few examples), common sense reasoning (physical and social intuition), compositional generalization (combining concepts in novel ways), causal understanding (not just correlation), robust transfer across domains, long-horizon planning, and ensuring safe behavior under open-ended goals.