The course teaches students comprehensive and specialised subjects in computer science: cutting-edge engineering skills to solve real-world problems using computational thinking and tools, as well as soft skills in communication, collaboration, and project management that enable students to succeed in real-world business environments. Most of this program is case- or project-based, where students learn by solving real-world problems end to end. The program has core courses that focus on computational thinking and problem solving from first principles. The core courses are followed by specialization courses that teach various aspects of building real-world systems. These are followed by more advanced courses that focus on research-level topics and cover state-of-the-art methods. The program also has a capstone project at the end, wherein students can either build end-to-end solutions to real-world problems or work on a research topic. The program also focuses on teaching students the “ability to learn” so that they can be lifelong learners who constantly upgrade their skills. Students can choose from a spectrum of courses to specialize in a specific sub-area of Computer Science such as Artificial Intelligence and Machine Learning, Cloud Computing, Software Engineering, or Data Science.
This is a core and foundational course which aims to equip the student with the ability to model, design, implement and query relational database systems for real-world data storage and processing needs. Students start with diagrammatic tools (ER diagrams) to map a real-world data storage problem into entities, relationships and keys. They then learn to translate the ER diagram into a relational model with tables. SQL is then introduced as the de facto tool to create, modify, append, delete, query and manipulate data in a relational database. Given SQL’s popularity, the course spends considerable time building the ability to write optimized and complex queries for various data manipulation tasks. The module exposes students to various real-world SQL examples to build solid practical knowledge. Students then move on to understanding various trade-offs in modern relational databases, such as the one between storage space and latency. Designing a database requires a solid understanding of normal forms to minimize data duplication, indexing for speed-up, and flattening tables to avoid complex joins in low-latency environments. These real-world database design strategies are discussed with practical examples from various domains. Most of this course uses the open-source MySQL database and cloud-hosted relational databases (like Amazon RDS) to help students apply the concepts learned on real databases via assignments.
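As a minimal, illustrative sketch of these ideas (tables for entities and relationships, an index trading storage space for query latency, and a join query), the snippet below uses Python's built-in sqlite3 module; the course itself works with MySQL and Amazon RDS, and the table and column names here are invented for the example.

```python
import sqlite3

# Illustrative only: the course uses MySQL / Amazon RDS, but the same SQL
# concepts (tables, keys, indexes, joins) can be tried locally with SQLite.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Map an ER entity "student" and a relationship "enrollment" to tables.
cur.execute("""CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
)""")
cur.execute("""CREATE TABLE enrollment (
    student_id INTEGER REFERENCES student(student_id),
    course     TEXT,
    grade      REAL
)""")

# An index trades extra storage space for lower query latency.
cur.execute("CREATE INDEX idx_enrollment_student ON enrollment(student_id)")

cur.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
cur.executemany("INSERT INTO enrollment VALUES (?, ?, ?)",
                [(1, "SQL", 3.8), (1, "ML", 3.9), (2, "SQL", 3.5)])

# A join query of the kind practised throughout the module.
cur.execute("""SELECT s.name, AVG(e.grade)
               FROM student s JOIN enrollment e ON s.student_id = e.student_id
               GROUP BY s.name""")
print(cur.fetchall())
```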
The ability to solve problems is a skill, and just like any other skill, the more one practices, the better one gets. So how exactly does one practice problem solving? Learning about different problem-solving strategies and when to use them provides a good start. Problem solving is a process, and most strategies provide steps that help you identify the problem and choose the best solution. Building a toolbox of problem-solving strategies improves problem-solving skills. With practice, students will be able to recognize and choose among multiple strategies to find the most appropriate one for a complex problem. The course focuses on developing problem-solving strategies such as abstraction, modularity, recursion, iteration, bisection, and exhaustive enumeration. It also introduces arrays and some of their real-world applications, such as prefix sums, carry forward, subarrays, and 2-dimensional matrices. Examples include industry-relevant problems and dive deeply into building their solutions with various approaches, recognizing the limitations of each (i.e., when to use a given data structure and when not to). By the end of this course, students will be able to devise a strategy that optimizes both time and space complexity by choosing the data structure best suited to a given problem.
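For instance, one of the array techniques named above, the prefix sum, precomputes running totals so that any range-sum query can be answered in constant time; the short sketch below (with made-up data) illustrates the idea.

```python
# A prefix-sum array lets many range-sum queries be answered in O(1) after
# O(n) preprocessing, instead of O(n) per query with naive summation.
def build_prefix_sums(values):
    prefix = [0]
    for v in values:
        prefix.append(prefix[-1] + v)
    return prefix

def range_sum(prefix, left, right):
    """Sum of values[left:right+1] using the precomputed prefix array."""
    return prefix[right + 1] - prefix[left]

data = [3, 1, 4, 1, 5, 9, 2, 6]
prefix = build_prefix_sums(data)
print(range_sum(prefix, 2, 5))  # 4 + 1 + 5 + 9 = 19
```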
This course helps students translate advanced mathematical, statistical and scientific concepts into code. It is a module for writing code to solve real-world problems. It introduces programming concepts (such as control structures, recursion, classes and objects) assuming no prior programming knowledge, to make the course accessible to advanced professionals from scientific fields like Biology, Physics, Medicine, Chemistry, and Civil and Mechanical Engineering. After building a strong foundation for converting scientific knowledge into programming concepts, the course advances to a deep dive into Object-Oriented Programming and its methodologies. It also covers when and how to use built-in data structures like 1-dimensional and 2-dimensional arrays before introducing the concepts of computational complexity, helping students write optimized code using appropriate data structures and algorithmic design methods. The module can be taught using a modern programming language such as Java or Python. The course equips students to identify and solve computer programming problems in scientific fields at a graduate level.
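As a small illustration of the object-oriented style taught here (shown in Python, though the module may equally use Java), the sketch below wraps a scientific concept, a 2-dimensional grid of sensor readings, in a class; all names and values are hypothetical.

```python
# Hypothetical example: modelling a 2-D grid of sensor readings as a class,
# combining classes/objects with a 2-dimensional array.
class SensorGrid:
    def __init__(self, rows, cols):
        self.readings = [[0.0] * cols for _ in range(rows)]  # 2-D array

    def record(self, row, col, value):
        self.readings[row][col] = value

    def row_average(self, row):
        return sum(self.readings[row]) / len(self.readings[row])

grid = SensorGrid(rows=2, cols=3)
grid.record(0, 1, 4.2)
print(grid.row_average(0))  # 1.4
```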
Mathematics and computer science are closely related fields. Problems in computer science are often formalized and solved with mathematical methods. It is likely that many important problems currently facing computer scientists will be solved by researchers skilled in algebra, analysis, combinatorics, logic, and/or probability theory, as well as computer science. This course covers discrete mathematics for computer science and engineering. Topics may include asymptotic notation and growth of functions; permutations and combinations; counting principles; discrete probability. Further selected topics may also be covered, such as recursive definition and structural induction; state machines and invariants; recurrences; generating functions. Students will be able to explain and apply the basic methods of discrete (noncontinuous) mathematics in computer science. They will be able to use these methods in subsequent courses in the design and analysis of algorithms, computability theory, software engineering, and computer systems. The focus of the course is real-world problems and applications often found in business and industry.
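As a concrete example of the counting principles and recurrences mentioned, the snippet below computes a binomial coefficient and verifies Pascal's recurrence C(n, k) = C(n-1, k-1) + C(n-1, k); the values are arbitrary.

```python
from math import comb

# Counting example: the number of ways to choose k items from n,
# and a check of Pascal's recurrence.
n, k = 10, 4
print(comb(n, k))                           # 210
print(comb(n - 1, k - 1) + comb(n - 1, k))  # 210, confirming the recurrence
```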
This course teaches students how to analyse the ways users engage with a service. This method, called product analytics, helps businesses track and analyse user data. Students will learn in greater depth what is required to move a product from idea to implementation, through launch, and on to iterative improvements. The course teaches how to measure progress, validate or update product hypotheses, and present product learnings. Students will also gain experience in making informed decisions, presenting findings, and making an analytics-informed business case to win support for a product.
This advanced graduate class addresses a unique topic on a rotating basis in order to keep the program at the forefront of scholarly research and industry practice. Every year the academic staff member will approve a new topic to be covered. The bibliography will contain not fewer than 8 peer-reviewed articles or scholarly publications reflecting the current topic. Though the exact topic will vary, the emphasis of this module is practical, domain-specific issues in data science. Topics might include data handling, big data management systems, optimization, sparse signal recovery, principal component analysis, or deeper explorations of text mining, natural language processing, computer vision, or other topics introduced in other modules. Often, Further Studies in Data Science and Data Analytics will extend, complicate, or otherwise deepen the topic taken on in its predecessor course, Studies in Data Science and Data Analytics, allowing students who elect this sequence to develop genuine expertise in a specific domain.
This advanced graduate class addresses a unique topic on a rotating basis in order to keep the program at the forefront of scholarly research and industry practice. Every year the academic staff member will approve a new topic to be covered. The bibliography will contain not fewer than 8 peer-reviewed articles or scholarly publications reflecting the current topic. Though the exact topic will vary, the emphasis of this module is practical, domain-specific issues in data science. Topics might include data handling, big data management systems, optimization, sparse signal recovery, principal component analysis, or deeper explorations of text mining, natural language processing, computer vision, or other topics introduced in other modules.
This course focuses on building basic classification and regression models and understanding these models rigorously with both a mathematical and an applicative focus. It opens with a basic introduction to the high-dimensional geometry of points, distance metrics, hyperplanes and hyperspheres. Then, it introduces the mathematical formulation of logistic regression to find a separating hyperplane. Vector calculus and gradient descent (GD)-based algorithms are explored to solve the optimization problem, including computational variations of GD like mini-batch and stochastic gradient descent. The course also covers other popular classification and regression methods like k-Nearest Neighbours, Naive Bayes, Decision Trees and Linear Regression, showing how each of these techniques performs under various real-world situations such as the presence of outliers, imbalanced data and multi-class classification. Lectures on the bias-variance tradeoff and various techniques to avoid overfitting and underfitting are incorporated. Algorithms are taught from a Bayesian viewpoint along with geometric intuition. This course is heavily hands-on, and students apply all these classical techniques to real-world problems.
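As a hedged sketch of the core optimization idea (logistic regression fitted with batch gradient descent), the snippet below uses synthetic data and arbitrary hyper-parameters purely for illustration.

```python
import numpy as np

# Fit a separating hyperplane with logistic regression via batch gradient
# descent; data, learning rate and iteration count are made up.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # linearly separable labels

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))     # sigmoid predictions
    grad_w = X.T @ (p - y) / len(y)            # gradient of the log-loss w.r.t. w
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # parameters of the learned separating hyperplane
```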
This course builds on introductory programming courses to illustrate object-oriented programming concepts, database design in Python, and the basics of Machine Learning with Python libraries. Students will learn how to solve problems in Python, develop design patterns in Python code, develop internet applications with Python, and collaborate with other students to implement projects. The course introduces advanced features such as decorators and generators, as well as a thorough exploration of the Python development environment. This course is designed to prepare students for an entry-level developer position.
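To give a flavour of two of the advanced features mentioned, decorators and generators, the short sketch below defines a hypothetical timing decorator and applies it to a function that consumes a generator of squares.

```python
import functools
import time

# Hypothetical example combining a decorator and a generator.
def timed(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.6f}s")
        return result
    return wrapper

def squares(n):
    for i in range(n):      # generator: values are produced lazily
        yield i * i

@timed
def sum_of_squares(n):
    return sum(squares(n))

print(sum_of_squares(100_000))
```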
This module focuses on representing statistical techniques in code, and may be conducted in Python, R, or another relevant language. Such languages provide libraries that handle a wide variety of statistical techniques, such as linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering and graphical methods, and are highly extensible. Learning to work in statistically oriented programming language environments can equip students with the following skills, among many others:
● Effective handling of data (using arrays, for example) and storage of data in a structured manner.
● Expertise in diverse tools and libraries for data analysis.
● The ability to present complex data in graphical and visual formats for easy understanding of the data and further solutions.
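A brief, hypothetical illustration of such an environment (Python with SciPy is used here, though the module may equally be conducted in R): a classical two-sample t-test and a simple linear model on synthetic data.

```python
import numpy as np
from scipy import stats

# Synthetic data for a classical statistical test and a simple linear model.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=5.0, scale=1.0, size=30)
group_b = rng.normal(loc=5.5, scale=1.0, size=30)

t_stat, p_value = stats.ttest_ind(group_a, group_b)      # two-sample t-test
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

x = np.arange(30)
y = 2.0 * x + rng.normal(scale=3.0, size=30)
slope, intercept, r, p, stderr = stats.linregress(x, y)  # simple linear model
print(f"slope = {slope:.2f}, r^2 = {r**2:.2f}")
```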
A distributed system is an application that executes a collection of protocols to coordinate the actions of multiple processes on a network, such that all components cooperate to perform a single task or a small set of related tasks. Goals of a distributed system:
● Transparency: the end user does not know what lies behind the system or how it works internally.
● Scalability: refers to the growth of the system.
● Availability: refers to the system's uptime.
The module will carefully examine three case studies, with attention to such topics as:
● Basics of High Level System Design and Consistent Hashing (a small, illustrative consistent-hashing sketch follows this list)
● Caching
● CAP Theorem
● Replication and Master-Slave
● NoSQL
● Differences between SQL and NoSQL
● Multi Master
● Apache Zookeeper & Apache Kafka
● Case Study on ElasticSearch
● AWS S3 and Quad Trees
● Design Distributed Crawler
● Microservices and Containerisation
● Hotstar & IRCTC System design
This core course equips the student with knowledge of database management systems, operating systems and computer networks. At the end of the course, students will have a critical understanding of the architecture of computers and networks, as well as how programs interact with these. Students begin by mapping data storage problems (as they had done in Relational Databases) to understand how data is stored in a distributed network, and related issues such as concurrency. Subsequently, students cover operating systems with an overview of process scheduling, process synchronisation and memory management techniques along with disk scheduling. The module concludes with computer networks, where we discuss all of the computer network layers and their protocols in detail.
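As one small, hypothetical illustration of the operating-systems material (round-robin process scheduling with a fixed time quantum), the sketch below simulates the order in which three made-up processes would receive the CPU.

```python
from collections import deque

# Toy round-robin scheduler: each process runs for at most `quantum` units,
# then rejoins the back of the ready queue if work remains.
def round_robin(bursts, quantum=2):
    queue = deque(bursts.items())
    order = []
    while queue:
        name, remaining = queue.popleft()
        order.append(name)
        if remaining > quantum:
            queue.append((name, remaining - quantum))
    return order

print(round_robin({"P1": 5, "P2": 3, "P3": 1}))
# ['P1', 'P2', 'P3', 'P1', 'P2', 'P1']
```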
This course aims to build a strong foundational knowledge of the data structures (DS) used extensively in computing. The module starts by introducing time and space complexity notations and estimation for code snippets. This helps students make trade-offs between various data structures while solving real-world computational problems. The module introduces the most widely used basic data structures, such as dynamic arrays, multi-dimensional arrays, lists, strings, hash tables, binary trees, balanced binary trees, priority queues and graphs. It discusses multiple implementation variations for each of these data structures, along with the trade-offs in space and time for each implementation. In this course, students implement these data structures from scratch to gain a solid understanding of their inner workings. Students are also introduced to the built-in data structures available in various programming languages and libraries like Python/NumPy/C++ STL/Java/JavaScript. Students solve real-world problems where they must use an optimal DS for the computational problem at hand.
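In the spirit of the from-scratch implementations described above, here is a minimal, illustrative (not prescribed) hash table using separate chaining.

```python
# Minimal from-scratch hash table with separate chaining.
class ChainedHashTable:
    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

table = ChainedHashTable()
table.put("apples", 3)
print(table.get("apples"))  # 3
```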
This is a foundational and mandatory course which aims to build students’ ability to apply various algorithmic design methods to provide optimal solutions to computational problems. The course starts with time and space complexity analysis of divide-and-conquer algorithms using recursion-tree based methods and the Master theorem. Students also learn about amortized time and space complexity analysis for randomized/probabilistic algorithms. Various algorithmic design strategies are introduced via real-world examples and problems. Students learn when, where and how to optimally use Divide and Conquer, Dynamic Programming (top-down and bottom-up), Greedy, Backtracking and Randomization strategies, with examples. The module uses various practical examples from array manipulation, sorting, searching, string manipulation, tree and graph traversals, graph path-finding, spanning trees etc., to show the above algorithmic strategies in action. Students implement many of these algorithmic design methods from scratch as part of the assignments. The module also shows how some of these popular algorithms are readily available via popular libraries in various programming languages.
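To illustrate the contrast between the top-down (memoized) and bottom-up dynamic-programming styles mentioned above, the sketch below solves the classic minimum-coin-change problem both ways; the coin denominations and amount are arbitrary.

```python
from functools import lru_cache

# Bottom-up: fill a table best[0..amount]; top-down: recurse with memoization.
def min_coins_bottom_up(coins, amount):
    best = [0] + [float("inf")] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a:
                best[a] = min(best[a], best[a - c] + 1)
    return best[amount]

def min_coins_top_down(coins, amount):
    @lru_cache(maxsize=None)
    def solve(a):
        if a == 0:
            return 0
        return min((solve(a - c) + 1 for c in coins if c <= a),
                   default=float("inf"))
    return solve(amount)

coins = (1, 5, 10)
print(min_coins_bottom_up(coins, 27), min_coins_top_down(coins, 27))  # 5 5
```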
This is a foundational course on building server-side (or backend) applications using popular JavaScript runtime environments like Node.js. Students will learn event-driven programming for building scalable backends for web applications. The module teaches various aspects of Node.js such as setup, the package manager, client-server programming, and connecting to various databases and REST APIs. Most of these concepts are covered in a hands-on manner with real-world examples and applications built from scratch using Node.js on Linux servers. The course also provides an introduction to Linux server administration and scripting, with a special focus on web development and networking. Students learn to use Linux monitoring tools (like Monit) to track the health of servers. The module also provides an introduction to Express.js, a popular lightweight framework for Node.js applications. Given the practical nature of this course, students build actual website backends via assignments/projects for e-commerce, online learning and/or photo sharing.
This course builds upon the introductory JavaScript course to acquaint students with popular, modern frameworks for building the front end. We focus on three very popular frameworks/libraries in use: React.js, jQuery and AngularJS. We start with React.js, one of the most popular and advanced of the three. Students learn about components and data flow in order to architect real-world front ends using React.js. This is achieved via multiple code examples and code walk-throughs from scratch. We also dive into React Native, a cross-platform framework for building native mobile and smart-TV apps using JavaScript, which helps students build applications for various platforms using only JavaScript. jQuery is one of the oldest and most widely used JavaScript libraries, and students cover it in detail, focusing specifically on how jQuery can simplify event handling, AJAX, HTML DOM tree manipulation and the creation of CSS animations. We also provide a hands-on introduction to AngularJS to architect model-view-controller (MVC) based dynamic web pages.
This course focuses on modelling sequences (text, music, time series, genes) using deep-learning models. We start with a simple Recurrent Neural Network and its limitations with long sequences. Students learn LSTMs and GRUs, which can handle significantly longer sequences, to model sequence data like text, music, gene sequences and time-series data. We study variations of the LSTM such as bi-directional LSTMs and encoder-decoder architectures. This is followed by a detailed study of the attention mechanism and Transformer-based models, which are currently the state of the art for NLP and sequence modelling. The module teaches encoder-decoder Transformers, BERT and its variations, and the GPT-1, 2 and 3 models from architectural, mathematical and practical viewpoints. Students implement many of these complex models from scratch (using TensorFlow 2 and Keras) to gain a deeper understanding of how they work internally. Students will study popular applications of deep learning in NLP such as parts-of-speech tagging, question-answering systems, conversational engines (chatbots), and low-latency semantic search. For each of these problems, students will study cutting-edge deep-learning models along with code implementations.
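As a hedged sketch of the kind of sequence model built with TensorFlow 2 and Keras in this course, the snippet below stacks an embedding layer, an LSTM and a classifier head; the vocabulary size, sequence length, layer sizes and binary task are placeholder choices.

```python
import tensorflow as tf

# Small sequence model: embedding -> LSTM -> binary classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100,)),                  # sequences of 100 token ids
    tf.keras.layers.Embedding(input_dim=10_000, output_dim=64),
    tf.keras.layers.LSTM(64),                             # handles longer sequences than a plain RNN
    tf.keras.layers.Dense(1, activation="sigmoid"),       # one label per sequence
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```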
This course provides a comprehensive overview of computer vision problems and how they can be tackled using various Convolutional Neural Networks (CNNs). Students start with classical image processing operations like edge detection, convolution, shape detectors and colour space conversions. This is followed by a foundational understanding of deep convolutional neural networks and how their training and evaluation work. We introduce CNN-specific layers like pooling layers and upsampling layers, as well as various data augmentation techniques that are very helpful for image-related problems. This is followed by a deep dive into the internals of popular CNN architectures such as AlexNet, VGGNet and ResNet. Students also learn how to use these architectures practically for transfer learning. Students will study how various computer-vision tasks like image segmentation, image generation, object detection and localization, and contrastive learning can be performed using state-of-the-art algorithms for each of these tasks. Most of these techniques are studied directly from the original research papers and the open-source code provided by the authors. Students also implement some of these algorithms from scratch in this course.
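The snippet below is a minimal, illustrative CNN in TensorFlow 2 / Keras showing the CNN-specific layers discussed (convolution and pooling) feeding a classifier head; the input shape and number of classes are arbitrary placeholders.

```python
import tensorflow as tf

# Small convolutional network: convolution and pooling layers plus a classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),        # pooling layer
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),   # e.g. 10 image classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```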
This course provides a strong mathematical and applicative introduction to Deep Learning. The module starts with the perceptron model as an oversimplified approximation of a biological neuron. We motivate the need for a network of neurons and show how they can be connected to form a Multi-Layer Perceptron (MLP). This is followed by a rigorous treatment of the back-propagation algorithm and its limitations from the 1980s. Students study how modern deep learning took off with improved computational tools and datasets. We teach more modern activation units (like ReLU and SELU) and how they overcome the problems of the classical Sigmoid and Tanh units. Students learn weight initialization methods, regularization via dropout, batch normalization etc., to ensure that deep MLPs can be trained successfully. The module teaches variants of Gradient Descent that have been specifically designed to work well for deep learning systems, such as Adam, AdaGrad and RMSProp. Students also learn AutoEncoders, VAEs and Word2Vec as unsupervised, encoding deep-learning architectures. We apply all of the foundational theory learned to various real-world problems using TensorFlow 2 and Keras. Students also come to understand how TensorFlow 2 works internally, with specific focus on computational graph processing.
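As an illustrative (not prescribed) example combining several of these training techniques, ReLU activations, He initialization, dropout, batch normalization and the Adam optimizer, here is a small Keras MLP with made-up layer sizes.

```python
import tensorflow as tf

# A deep MLP assembled from the training techniques named above.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(256, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),                       # regularization by dropout
    tf.keras.layers.Dense(128, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```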
This course aims to help learners understand various techniques and algorithms to visualize, analyse and understand high-dimensional data, which is very common in Data Science and ML. The module starts with linear algebraic methods like Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) for obtaining linear projections of high-dimensional data. This is followed by more advanced, nonlinear, state-of-the-art techniques like t-SNE and UMAP for visualizing high-dimensional data. Each of these techniques is covered in full mathematical detail from first principles and applied to real-world datasets from NLP, genomics and the internet. Students will also study how PCA and SVD are related to general matrix factorization techniques. To analyse and understand high-dimensional unlabelled data, students learn clustering techniques like K-Means, Gaussian Mixture Models, Hierarchical Clustering and DBSCAN. The module shows how some of these techniques are mathematically related to matrix factorization. Students study various outlier detection techniques based on density, proximity, factorization and cluster analysis.
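A short, hypothetical illustration of two of these techniques on synthetic data: a linear PCA projection of 10-dimensional points down to two dimensions, followed by K-Means clustering (scikit-learn is used here for brevity).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Two synthetic 10-D clusters, projected to 2-D and then clustered.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, size=(100, 10)),
               rng.normal(loc=3.0, size=(100, 10))])

X_2d = PCA(n_components=2).fit_transform(X)                    # linear projection
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_2d)
print(X_2d.shape, np.bincount(labels))                         # (200, 2) and cluster sizes
```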
This course focuses on building basic classification and regression models and understanding these models rigorously with both a mathematical and an applicative focus. The module starts with a basic introduction to the high-dimensional geometry of points, distance metrics, hyperplanes and hyperspheres. We build on top of this to introduce the mathematical formulation of logistic regression to find a separating hyperplane. Students learn to solve the optimization problem using vector calculus and gradient descent (GD) based algorithms. The module introduces computational variations of GD like mini-batch and stochastic gradient descent. Students also learn other popular classification and regression methods like k-Nearest Neighbours, Naive Bayes, Decision Trees and Linear Regression, and how each of these techniques performs under various real-world situations such as the presence of outliers, imbalanced data and multi-class classification. Students learn the bias-variance trade-off and various techniques to avoid overfitting and underfitting. Students also study these algorithms from a Bayesian viewpoint along with geometric intuition. This module is hands-on, and students apply all these classical techniques to real-world problems.
This course helps students translate mathematical, statistical and scientific concepts into code. It is a foundational course for writing code to solve Data Science, ML and AI problems. It introduces basic programming concepts (like control structures, recursion, classes and objects) from scratch, assuming no prerequisites, to make the course accessible to students from non-computational scientific fields like Biology, Physics, Medicine, Chemistry, and Civil and Mechanical Engineering. After building a strong foundation, the course advances to a deep dive into core mathematical libraries like NumPy, SciPy and Pandas. Students also learn when and how to use built-in data structures like Lists, Dicts, Sets and Tuples. The module introduces the concepts of computational complexity to help students write optimized code using appropriate data structures and algorithmic design methods. It does not dive deep into data structures and algorithm design methods – that material is covered in the ‘Data Structures and Algorithms’ module. This course is valuable for all students specializing in mathematical sub-areas of CS like ML, Data Science and Scientific Computing.
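A tiny, hypothetical illustration of the core libraries named above: a vectorized NumPy computation and a Pandas group-by on a made-up table.

```python
import numpy as np
import pandas as pd

# Vectorized statistics with NumPy, then a small Pandas group-by.
measurements = np.array([1.2, 3.4, 2.2, 5.1])
print(measurements.mean(), measurements.std())

df = pd.DataFrame({"subject": ["bio", "bio", "physics"],
                   "score":   [70, 80, 90]})
print(df.groupby("subject")["score"].mean())
```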
This course is a follow-up to Introduction to Problem-Solving Techniques: Part 1, and as part of their academic planning process with Woolf staff, students will ordinarily take that course first. Part 2 deepens the approach to data structures by including topics such as stacks, queues, linked lists, and trees, and by discussing in detail the real-world applications of each, along with their comparative strengths and limitations (i.e., when to use a given data structure and when not to). The course also includes hashing techniques along with recursion and subset problems. It features rigorous homework and assignments to support the introduction of more than four data structures. By the end of this course, students will be able to devise a strategy that optimizes both time and space complexity by choosing the data structure best suited to a given problem.
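As one small example of these structures in action, the sketch below uses a stack, one of the structures introduced here, to check whether the brackets in an expression are balanced; the expressions are invented.

```python
# A stack makes bracket-matching a one-pass check.
def is_balanced(expression: str) -> bool:
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in expression:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack

print(is_balanced("a[i] * (b + c[j])"))  # True
print(is_balanced("f(x[0)]"))            # False
```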
This is a hands-on course covering JavaScript from the basics to advanced concepts in detail, using multiple examples. We start with basic programming concepts like variables, control statements, loops, classes and objects. Students also learn basic data structures like strings, arrays and dates, and learn to debug their code and handle errors gracefully. We cover popular style guides and good coding practices to build readable, reusable code that is also highly performant. We then learn how web browsers execute JavaScript code, using the V8 engine as an example, and cover concepts like JIT compilation, which helps JS code run faster. This is followed by slightly more advanced concepts like the DOM, async functions, Web APIs and AJAX, which are widely used in modern front-end development. We learn how to optimize JavaScript code to run on mobile apps and mobile browsers as well as desktop browsers, and as desktop apps via ElectronJS. Most of this course is covered via real-world examples and by learning from the JS code of popular open-source websites and libraries.
This is a hands-on course on designing responsive, modern, and lightweight UIs for web, mobile, and desktop applications using HTML5, CSS, and frameworks like Bootstrap 4. The course starts with an introduction to how web browsers, mobile apps, and web servers work. We then dive into the nitty-gritty details of HTML5 to build webpages, starting with simple web pages and graduating to more complex layouts and features of HTML such as forms, iFrames, multimedia playback, and using web APIs. We then go on to learn stylesheets based on modern CSS and how browsers interpret CSS files to render web pages. Once again, we use multiple real-world example web pages to learn the internals of CSS. We learn popular good practices for writing responsive HTML and CSS code that is also interoperable across mobile browsers, apps, and desktop apps. We introduce students to building desktop apps with HTML and CSS using toolkits like Electron. We also study popular frameworks for front-end development like Bootstrap 4, which can speed up UI development significantly.
This course introduces more advanced ML techniques such as ensembles: bagging, boosting, cascading and stacking classifiers and regressors. It covers both the theoretical foundations and the applicative details of these techniques, along with popular boosting implementations like LightGBM, CatBoost and XGBoost. Students also delve into kernel methods, with a specific focus on SVMs for classification and regression. Students will study state-of-the-art model-agnostic feature importance and model-interpretability techniques like LIME and SHAP. Students also study classical NLP-based text encoding methods like Bag-of-Words and TF-IDF. The module teaches various classical methods in time-series analysis and forecasting like ARMA and ARIMA, and students learn how to pose time-series forecasting problems as regression and classification problems to leverage well-studied ML techniques. This is followed by various domain- and problem-specific feature engineering techniques that are often helpful in real-world problem solving. Students will study methods like error analysis and ablative analysis to debug and understand why and where a model performs well and where it does not, which in turn helps in designing appropriate features. Students study model calibration techniques like Platt scaling and isotonic regression. Later in the course, we cover how to build recommender systems using content-based and collaborative filtering methods. The module also teaches the detailed solution of the Netflix Prize (2009) and various recent advances in RecSys.
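A brief, hypothetical illustration of two of these topics, TF-IDF text encoding and a boosted-tree classifier, on a handful of made-up reviews; scikit-learn's gradient boosting is used here for brevity, while the course itself also covers XGBoost, LightGBM and CatBoost.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier

# Invented reviews and labels, encoded with TF-IDF and fed to boosted trees.
docs = ["great product, works well", "terrible, stopped working",
        "works as expected", "bad quality, would not recommend"]
labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
model = GradientBoostingClassifier(n_estimators=50).fit(X.toarray(), labels)
print(model.predict(vectorizer.transform(["works great"]).toarray()))
```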
This course aims to build the core competency of designing real-world, end-to-end ML systems and deploying them into production for a variety of problems and scenarios. Students learn about a variety of ML systems, ranging from high-throughput, low-latency internet-scale systems to low-compute, energy-constrained IoT devices like smart watches. Students study the ML lifecycle and its various components in detail. We also use real-world ML platforms like Google's Kubeflow, TensorFlow Lite, and Amazon's SageMaker to implement real-world systems and understand the engineering trade-offs and challenges. Students also learn relevant technologies and tools like containerization (Docker), container orchestration (Kubernetes), and Git, which are used extensively in real-world scalable ML systems. This is a hands-on course in which we solve multiple real-world cases and discuss solutions built by various companies and organizations to provide students with a comprehensive understanding of varied systems and design choices.
This course is aimed at equipping students with the skills to architect the high-level design (a.k.a. system design) of software and data systems. We start with the desirable properties of large, complex software systems, such as scalability, reliability, availability and consistency. The module teaches the patterns and design choices available to satisfy each of these properties. We then go on to understand key components of system design like load balancers, microservices, reverse proxies and content-delivery networks. Students learn how each of them works internally, along with real-world implementations of each. We study various NoSQL data stores, their internal architectures, and which one to use where, with real-world examples. Students also learn popular data encoding schemes like XML and JSON. We learn how to build data pipelines using batch and stream processing systems, and work on multiple real-world cases of architecting on the cloud using popular open-source libraries and tools. Students will study design documents and the high-level design of popular internet applications and services like video conferencing, recommender systems, peer-to-peer chat and voice assistants.
This course aims to build a strong foundational knowledge of the data analytics tools used extensively in the Data Science field. There are now powerful data visualisation tools used in the business analytics industry to process and visualise raw business data in a presentable and understandable format. A good example is Tableau, widely used by data analytics teams across many fields for its ease of use and efficiency. Tableau uses relational databases, Online Analytical Processing cubes, spreadsheets and cloud databases to generate graphical visualisations. The course starts with visualisations and moves to an in-depth look at the different chart and graph functions, calculations, mapping and other functionality. Students will be taught quick table calculations, reference lines, different types of visualisations, bands and distributions, parameters, motion charts, trends and forecasting, formatting, stories, performance recording and advanced mapping. At the end of this course, students will be prepared, if they desire, to earn industry desktop certifications such as Tableau Desktop Specialist, Tableau Certified Associate, or Tableau Certified Professional.
This course introduces basic probability theory, statistical methods and computational algorithms to perform mathematically rigorous data analysis. The course starts with foundational concepts of random variables, histograms, and various plots (PMF, PDF and CDF). Students learn popular discrete and continuous distributions like Bernoulli, Binomial, Poisson, Gaussian, Exponential, Pareto and log-normal, both mathematically and from an applicative perspective. Students learn various measures like mean, median, percentiles, quantiles, variance and interquartile range, along with the pros and cons of each metric and when and how to use them in practice. Students will learn conditional probability and Bayes' theorem in the applied context of real-world problems in medicine and healthcare. The module teaches the foundations of non-parametric statistics and applies them to solve problems using computational tools. Students learn various methods to rigorously determine correlations in data. This is followed by an applied and mathematical understanding of the statistics underlying control-treatment (A/B) experiments and hypothesis testing. The module engages computational tools in modern statistics like bootstrapping, Monte-Carlo methods and RANSAC.
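As a hedged sketch of one of these computational tools, bootstrapping, the snippet below estimates a 95% confidence interval for the mean of a small synthetic sample.

```python
import numpy as np

# Bootstrap: resample with replacement many times and read off percentiles.
rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=50)

boot_means = [rng.choice(sample, size=len(sample), replace=True).mean()
              for _ in range(10_000)]
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {sample.mean():.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```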
This is a project-based course with the aim of building the skills required for creating web-based software systems. The course covers the entire lifecycle of building software projects, from requirement gathering and scope definition from a product document, to designing the architecture of the system, and all the way to delivery and maintenance of the software system. The course covers both the frontend, that is, building browser-based interfaces for users using frontend web frameworks, and the backend, that is, the server running an API to serve information to the frontend, backed by an SQL or similar database management system for storage.
All aspects of delivering a software project are covered, including security, user authentication and authorisation, monitoring and analytics, and maintaining the project. The course also covers aspects of project maintenance such as using a version control system, setting up continuous integration and deployment pipelines, and using bug trackers. The Applied Computer Science Project will focus on how computer science and its associated tools, as a field of study, can benefit other fields of study, or how such tools can be applied to modify existing processes for better outcomes.
This course focuses on modelling sequences (text, music, time series, genes) using deep-learning models. We start with a simple Recurrent Neural Network and its limitations with long sequences. Students learn LSTMs and GRUs, which can handle significantly longer sequences, to model sequence data like text, music, gene sequences and time-series data. We study variations of the LSTM such as bi-directional LSTMs and encoder-decoder architectures. This is followed by a detailed study of the attention mechanism and Transformer-based models, which are currently the state of the art for NLP and sequence modelling. The module teaches encoder-decoder Transformers, BERT and its variations, and the GPT-1, 2 and 3 models from architectural, mathematical and practical viewpoints. Students implement many of these complex models from scratch (using TensorFlow 2 and Keras) to gain a deeper understanding of how they work internally. Students will study popular applications of deep learning in NLP such as parts-of-speech tagging, question-answering systems, conversational engines (chatbots), and low-latency semantic search. For each of these problems, students will study cutting-edge deep-learning models along with code implementations.
This course provides a comprehensive overview of computer vision problems and how they can be tackled using various Convolutional Neural Networks (CNNs). Students start with classical image processing operations like edge detection, convolution, shape detectors and colour space conversions. This is followed by a foundational understanding of deep convolutional neural networks and how their training and evaluation work. We introduce CNN-specific layers like pooling layers and upsampling layers, as well as various data augmentation techniques that are very helpful for image-related problems. This is followed by a deep dive into the internals of popular CNN architectures such as AlexNet, VGGNet and ResNet. Students also learn how to use these architectures practically for transfer learning. Students will study how various computer-vision tasks like image segmentation, image generation, object detection and localization, and contrastive learning can be performed using state-of-the-art algorithms for each of these tasks. Most of these techniques are studied directly from the original research papers and the open-source code provided by the authors. Students also implement some of these algorithms from scratch in this course.
This course provides a strong mathematical and applicative introduction to Deep Learning. The module starts with the perceptron model as an oversimplified approximation of a biological neuron. We motivate the need for a network of neurons and show how they can be connected to form a Multi-Layer Perceptron (MLP). This is followed by a rigorous treatment of the back-propagation algorithm and its limitations from the 1980s. Students study how modern deep learning took off with improved computational tools and datasets. We teach more modern activation units (like ReLU and SELU) and how they overcome the problems of the classical Sigmoid and Tanh units. Students learn weight initialization methods, regularization via dropout, batch normalization etc., to ensure that deep MLPs can be trained successfully. The module teaches variants of Gradient Descent that have been specifically designed to work well for deep learning systems, such as Adam, AdaGrad and RMSProp. Students also learn AutoEncoders, VAEs and Word2Vec as unsupervised, encoding deep-learning architectures. We apply all of the foundational theory learned to various real-world problems using TensorFlow 2 and Keras. Students also come to understand how TensorFlow 2 works internally, with specific focus on computational graph processing.
This course aims to help learners understand various techniques and algorithms to visualize, analyse and understand high-dimensional data, which is very common in Data Science and ML. The module starts with linear algebraic methods like Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) for obtaining linear projections of high-dimensional data. This is followed by more advanced, nonlinear, state-of-the-art techniques like t-SNE and UMAP for visualizing high-dimensional data. Each of these techniques is covered in full mathematical detail from first principles and applied to real-world datasets from NLP, genomics and the internet. Students will also study how PCA and SVD are related to general matrix factorization techniques. To analyse and understand high-dimensional unlabelled data, students learn clustering techniques like K-Means, Gaussian Mixture Models, Hierarchical Clustering and DBSCAN. The module shows how some of these techniques are mathematically related to matrix factorization. Students study various outlier detection techniques based on density, proximity, factorization and cluster analysis.
This course focuses on building basic classification and regression models and understanding these models rigorously with both a mathematical and an applicative focus. The module starts with a basic introduction to the high-dimensional geometry of points, distance metrics, hyperplanes and hyperspheres. We build on top of this to introduce the mathematical formulation of logistic regression to find a separating hyperplane. Students learn to solve the optimization problem using vector calculus and gradient descent (GD) based algorithms. The module introduces computational variations of GD like mini-batch and stochastic gradient descent. Students also learn other popular classification and regression methods like k-Nearest Neighbours, Naive Bayes, Decision Trees and Linear Regression, and how each of these techniques performs under various real-world situations such as the presence of outliers, imbalanced data and multi-class classification. Students learn the bias-variance trade-off and various techniques to avoid overfitting and underfitting. Students also study these algorithms from a Bayesian viewpoint along with geometric intuition. This module is hands-on, and students apply all these classical techniques to real-world problems.
This course helps students translate mathematical, statistical and scientific concepts into code. It is a foundational course for writing code to solve Data Science, ML and AI problems. It introduces basic programming concepts (like control structures, recursion, classes and objects) from scratch, assuming no prerequisites, to make the course accessible to students from non-computational scientific fields like Biology, Physics, Medicine, Chemistry, and Civil and Mechanical Engineering. After building a strong foundation, the course advances to a deep dive into core mathematical libraries like NumPy, SciPy and Pandas. Students also learn when and how to use built-in data structures like Lists, Dicts, Sets and Tuples. The module introduces the concepts of computational complexity to help students write optimized code using appropriate data structures and algorithmic design methods. It does not dive deep into data structures and algorithm design methods – that material is covered in the ‘Data Structures and Algorithms’ module. This course is valuable for all students specializing in mathematical sub-areas of CS like ML, Data Science and Scientific Computing.
A distributed system is an application that executes a collection of protocols to coordinate the actions of multiple processes on a network, such that all components cooperate to perform a single task or a small set of related tasks. Goals of a distributed system:
● Transparency: the end user does not know what lies behind the system or how it works internally.
● Scalability: refers to the growth of the system.
● Availability: refers to the system's uptime.
The module will carefully examine three case studies, with attention to such topics as:
● Basics of High Level System Design and Consistent Hashing
● Caching
● CAP Theorem
● Replication and Master-Slave
● NoSQL
● Differences between SQL and NoSQL
● Multi Master
● Apache Zookeeper & Apache Kafka
● Case Study on ElasticSearch
● AWS S3 and Quad Trees
● Design Distributed Crawler
● Microservices and Containerisation
● Hotstar & IRCTC System design
This core course equips the student with knowledge of database management systems, operating systems and computer networks. At the end of the course, students will have a critical understanding of the architecture of computers and networks, as well as how programs interact with these. Students begin by mapping data storage problems (as they had done in Relational Databases) to understand how data is stored in a distributed network, and related issues such as concurrency. Subsequently, students cover operating systems with an overview of process scheduling, process synchronisation and memory management techniques along with disk scheduling. The module concludes with computer networks, where we discuss all of the computer network layers and their protocols in detail.
This course aims to build a strong foundational knowledge of the data structures (DS) used extensively in computing. The module starts by introducing time and space complexity notations and estimation for code snippets. This helps students make trade-offs between various data structures while solving real-world computational problems. The module introduces the most widely used basic data structures, such as dynamic arrays, multi-dimensional arrays, lists, strings, hash tables, binary trees, balanced binary trees, priority queues and graphs. It discusses multiple implementation variations for each of these data structures, along with the trade-offs in space and time for each implementation. In this course, students implement these data structures from scratch to gain a solid understanding of their inner workings. Students are also introduced to the built-in data structures available in various programming languages and libraries like Python/NumPy/C++ STL/Java/JavaScript. Students solve real-world problems where they must use an optimal DS for the computational problem at hand.
This is a foundational and mandatory course which aims to build students’ ability to apply various algorithmic design methods to provide optimal solutions to computational problems. The course starts with time and space complexity analysis of divide-and-conquer algorithms using recursion-tree based methods and the Master theorem. Students also learn about amortized time and space complexity analysis for randomized/probabilistic algorithms. Various algorithmic design strategies are introduced via real-world examples and problems. Students learn when, where and how to optimally use Divide and Conquer, Dynamic Programming (top-down and bottom-up), Greedy, Backtracking and Randomization strategies, with examples. The module uses various practical examples from array manipulation, sorting, searching, string manipulation, tree and graph traversals, graph path-finding, spanning trees etc., to show the above algorithmic strategies in action. Students implement many of these algorithmic design methods from scratch as part of the assignments. The module also shows how some of these popular algorithms are readily available via popular libraries in various programming languages.
This is a foundational course on building server-side (or backend) applications using popular JavaScript runtime environments like Node.js. Students will learn event-driven programming for building scalable backends for web applications. The module teaches various aspects of Node.js such as setup, the package manager, client-server programming, and connecting to various databases and REST APIs. Most of these concepts are covered in a hands-on manner with real-world examples and applications built from scratch using Node.js on Linux servers. The course also provides an introduction to Linux server administration and scripting, with a special focus on web development and networking. Students learn to use Linux monitoring tools (like Monit) to track the health of servers. The module also provides an introduction to Express.js, a popular lightweight framework for Node.js applications. Given the practical nature of this course, students build actual website backends via assignments/projects for e-commerce, online learning and/or photo sharing.
This course builds upon the introductory JavaScript course to acquaint students with popular, modern frameworks for building the front end. We focus on three very popular frameworks/libraries in use: React.js, jQuery and AngularJS. We start with React.js, one of the most popular and advanced of the three. Students learn about components and data flow in order to architect real-world front ends using React.js. This is achieved via multiple code examples and code walk-throughs from scratch. We also dive into React Native, a cross-platform framework for building native mobile and smart-TV apps using JavaScript, which helps students build applications for various platforms using only JavaScript. jQuery is one of the oldest and most widely used JavaScript libraries, and students cover it in detail, focusing specifically on how jQuery can simplify event handling, AJAX, HTML DOM tree manipulation and the creation of CSS animations. We also provide a hands-on introduction to AngularJS to architect model-view-controller (MVC) based dynamic web pages.
This course teaches students how to analyse the ways users engage with a service. This method, called product analytics, helps businesses track and analyse user data. Students will learn in greater depth what is required to move a product from idea to implementation, through launch, and on to iterative improvements. The course teaches how to measure progress, validate or update product hypotheses, and present product learnings. Students will also gain experience in making informed decisions, presenting findings, and making an analytics-informed business case to win support for a product.
This advanced graduate class addresses a unique topic on a rotating basis in order to keep the program at the forefront of scholarly research and industry practice. Every year the academic staff member will approve a new topic to be covered. The bibliography will contain not fewer than 8 peer-reviewed articles or scholarly publications reflecting the current topic. Though the exact topic will vary, the emphasis of this module is practical, domain-specific issues in data science. Topics might include data handling, big data management systems, optimization, sparse signal recovery, principal component analysis, or deeper explorations of text mining, natural language processing, computer vision, or other topics introduced in other modules. Often, Further Studies in Data Science and Data Analytics will extend, complicate, or otherwise deepen the topic taken on in its predecessor course, Studies in Data Science and Data Analytics, allowing students who elect this sequence to develop genuine expertise in a specific domain.
This advanced graduate class addresses a unique topic on a rotating basis in order to keep the program at the forefront of scholarly research and industry practice. Every year the academic staff member will approve a new topic to be covered. The bibliography will contain not fewer than 8 peer-reviewed articles or scholarly publications reflecting the current topic. Though the exact topic will vary, the emphasis of this module is practical, domain-specific issues in data science. Topics might include data handling, big data management systems, optimization, sparse signal recovery, principal component analysis, or deeper explorations of text mining, natural language processing, computer vision, or other topics introduced in other modules.
This course focuses on building basic classification and regression models and understanding these models rigorously with both a mathematical and an applicative focus. It opens with a basic introduction to the high-dimensional geometry of points, distance metrics, hyperplanes and hyperspheres. Then, it introduces the mathematical formulation of logistic regression to find a separating hyperplane. Vector calculus and gradient descent (GD)-based algorithms are explored to solve the optimization problem, including computational variations of GD like mini-batch and stochastic gradient descent. The course also covers other popular classification and regression methods like k-Nearest Neighbours, Naive Bayes, Decision Trees and Linear Regression, showing how each of these techniques performs under various real-world situations such as the presence of outliers, imbalanced data and multi-class classification. Lectures on the bias-variance tradeoff and various techniques to avoid overfitting and underfitting are incorporated. Algorithms are taught from a Bayesian viewpoint along with geometric intuition. This course is heavily hands-on, and students apply all these classical techniques to real-world problems.
This course builds on introductory programming courses to illustrate object-oriented programming concepts, database design in Python, and the basics of Machine Learning with Python libraries. Students will learn how to solve problems in Python, develop design patterns in Python code, develop internet applications with Python, and collaborate with other students to implement projects. The course introduces advanced features such as decorators and generators, as well as a thorough exploration of the Python development environment. This course is designed to prepare students for an entry-level developer position.
This module focuses on representing statistical techniques in code, and may be conducted in Python, R, or another relevant language. Such languages provide libraries that handle a wide variety of statistical techniques, such as linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering and graphical methods, and are highly extensible. Learning to work in statistically oriented programming language environments can equip students with the following skills, among many others:
● Effective handling of data (using arrays, for example) and storage of data in a structured manner.
● Expertise in diverse tools and libraries for data analysis.
● The ability to present complex data in graphical and visual formats for easy understanding of the data and further solutions.