TDT4265 - Computer Vision and Deep Learning

About

Examination arrangement

Examination arrangement: Aggregate score
Grade: Letter grades

Evaluation	Weighting	Duration	Grade deviation	Examination aids
Assignment	40/100
School exam	60/100	4 hours		D

Course content

Modern computer vision (CV), driven by deep learning (DL), increasingly known as visual intelligence (VI), allows machines to interpret and understand visual data. This technology, crucial today in fields like autonomous driving and medical image computing, is expected to revolutionize various industries by enabling more accurate and efficient visual analysis.

The course covers the mathematical and computational foundations essential for deep learning-based CV, alongside key neural architectures and their training mechanisms, including supervised, self-supervised, unsupervised, and reinforcement-based learning. It addresses the central computer vision tasks, highlighting influential and state-of-the-art models for each task. It also surveys the principal frameworks and tools in the field and explores the application domains that are driving advancements in computer vision.

Some more details about the course content:

DL fundamentals: From neurons/units to neural networks (NNs). Ground truth (GT) data, parameters (weights and biases), activation functions, and loss functions. Computational graphs, update rule, gradients, and supervised learning. Forward and backward pass in shallow NNs, matrix notation. Normalization (data/batch) and initialization (parameters). Hyperparameter tuning and gradient descent optimization (from simple to SOTA optimizers). Generalization and regularization.

Architectures: Fully Connected (Dense) NNs (FCNNs). Convolutional NNs (CNNs) and different types of convolutions (incl. Residual NNs and Capsule Nets). Recurrent NNs (RNNs, LSTMs, GRUs) for CV (e.g., sequences of frames in a video). Transformers and the self-attention mechanism. Vision Transformers. Graph NNs (GNNs) for CV. Retentive Networks (RetNets).

CV tasks:
Supervised: Image classification, object detection, segmentation (semantic, instance, panoptic), depth estimation, pose estimation, etc. Object tracking (e.g., keeping the same ID on an object throughout a video sequence).
Self-Supervised Learning (SSL): Large Vision Models and multi-modal (incl. images, video) Foundation Models.
Unsupervised learning: Autoencoders (AE) and Variational Autoencoders (VAE). Generative Adversarial Networks (GANs). Normalizing flows. Diffusion models.
Reinforcement learning in the context of CV: Value-based methods, policy gradient methods, and actor-critic methods.
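As a rough illustration of the "forward and backward pass in shallow NNs, matrix notation" and "gradient descent" items in the course content, the following is a minimal sketch and not course material. It assumes NumPy is available; the network size, the tanh activation, the mean squared error loss, the synthetic data, and all function names (`forward`, `backward`, `sgd_step`, `mse_loss`) are assumptions chosen for the example.

```python
import numpy as np


def forward(params, X):
    """Forward pass of a one-hidden-layer network in matrix notation."""
    W1, b1, W2, b2 = params
    Z1 = X @ W1 + b1        # pre-activations, shape (n, hidden)
    A1 = np.tanh(Z1)        # hidden activations
    Y_hat = A1 @ W2 + b2    # linear output layer, shape (n, 1)
    return Z1, A1, Y_hat


def mse_loss(Y_hat, Y):
    """Mean squared error between predictions and ground truth."""
    return float(np.mean((Y_hat - Y) ** 2))


def backward(params, X, Y, cache):
    """Backward pass: gradients of the MSE loss w.r.t. every parameter."""
    _, _, W2, _ = params
    _, A1, Y_hat = cache
    n = X.shape[0]
    dY = 2.0 * (Y_hat - Y) / n          # dL/dY_hat
    dW2 = A1.T @ dY
    db2 = dY.sum(axis=0)
    dA1 = dY @ W2.T
    dZ1 = dA1 * (1.0 - A1 ** 2)         # derivative of tanh is 1 - tanh^2
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)
    return [dW1, db1, dW2, db2]


def sgd_step(params, grads, lr):
    """Plain gradient descent update rule: p <- p - lr * dL/dp."""
    return [p - lr * g for p, g in zip(params, grads)]


# Synthetic regression data (an assumption for this example only).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
Y = np.sin(X[:, :1]) + 0.5 * X[:, 1:]

# Small random weights and zero biases.
params = [
    rng.normal(scale=0.5, size=(2, 8)), np.zeros(8),
    rng.normal(scale=0.5, size=(8, 1)), np.zeros(1),
]

losses = []
for step in range(200):
    cache = forward(params, X)
    losses.append(mse_loss(cache[2], Y))
    grads = backward(params, X, Y, cache)
    params = sgd_step(params, grads, lr=0.05)

print(f"loss at step 0: {losses[0]:.4f}, at step 199: {losses[-1]:.4f}")
```

In practice, a DL framework with automatic differentiation would compute the gradients that `backward` derives by hand here; writing them out once mainly serves to show how the chain rule maps onto matrix products.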

Learning outcome

Knowledge:

Understand the fundamental concepts and mathematical principles behind deep learning algorithms and their application to modern computer vision.
Recognize the structure and functionality of various neural network architectures (FCNNs, CNNs, Vision Transformers etc.), as well as their roles in addressing specific computer vision tasks.
Comprehend the theoretical aspects of learning mechanisms such as supervised, self-supervised, unsupervised, and reinforcement learning, and how they contribute to the field of visual intelligence.

Skills:

Apply knowledge of deep learning to construct and train neural networks for a range of computer vision tasks, such as image classification, object detection, segmentation, depth estimation, pose estimation and generative AI for vision tasks.
Employ state-of-the-art optimization techniques, normalization processes, and regularization methods to enhance the generalization of neural network models.
Utilize principal frameworks and tools established in the field to implement and evaluate computer vision models.

General competences:

Analyze and critically assess different neural network models and architectures, and select the most appropriate one for a given visual intelligence task.
Integrate advanced computer vision solutions in various application domains, such as autonomous driving and medical image computing, to improve accuracy and efficiency.
Exhibit problem-solving abilities by tuning hyperparameters and adjusting network architectures to optimize performance for computer vision tasks.

Learning methods and activities

Lectures, self-study, assignments, and a real-world mini-project.

Lectures will be given in English.

Developing practical skills (tools, key DL frameworks, etc.) is an important part of the course.

Compulsory assignments

Exercises

Further on evaluation

The final grade is based on two parts: a real-world mini-project (40%) and a digital school exam (60%). Each part is assigned a letter grade, and the two grades are then weighted and combined to form the final letter grade in the course. Both parts must be passed individually in the same semester in order to pass the course.

The examination papers will be given in English only.

If there is a re-sit examination, the examination form may change from written to oral.

A student who retakes the course, whether to improve a grade or after failing, must redo both parts of the course.

Traditional assignments are considered compulsory activities, and a certain amount of this work must be approved before a student is allowed to sit the exam.

For group work, differentiated grades may be given if the work effort within the group has been unevenly distributed.

Recommended previous knowledge

Some experience in Python programming.

Basic knowledge related to linear algebra, calculus and statistics.

TDT4195 Visual Computing Fundamentals or equivalent.

Course materials

Book: Understanding Deep Learning, Simon J.D. Prince (online)
Book: Neural Networks and Deep Learning, Michael Nielsen (online)
Book: Deep Learning, Ian Goodfellow et al. (online)
Supplementary material will be handed out as needed.

Credit reductions

Course code	Reduction	From	To
SIF8066	7.5

Examination

Examination arrangement: Aggregate score

Term	Status code	Evaluation	Weighting	Examination aids	Date	Time	Examination system	Room *
Spring	ORD	School exam	60/100	D			INSPERA
Spring	ORD	Assignment	40/100
Summer	UTS	School exam	60/100	D			INSPERA

* The location (room) for a written examination is published 3 days before the examination date. If more than one room is listed, you will find your room on Studentweb.

Examination

For more information regarding registration for examination and examination procedures, see "Innsida - Exams"
