CSCE 5218 - Deep Learning

CSCE 6280 – Advanced Topics in Artificial Intelligence

Fall 2024

Basic information:

Instructor: Heng Fan (heng.fan@unt.edu)
Office: Discovery Park F284
Office hours: Friday 9:00 am - 11:00 am or by appointment

Lecture time: Friday 11:00 am - 1:50 pm
Classroom: NTDP B190

TA (half): Xiaoqiong Liu (xiaoqiongliu@my.unt.edu), by appointment

Course description

This is a research-oriented course that introduces the latest frontiers in artificial intelligence (AI). It covers advanced approaches in AI, with a focus on recent topics in computer vision and multimodal learning, such as prompt learning, multimodal vision-language learning, visual generation, and video understanding. Through extensive in-class paper presentations and discussions, students are expected to understand and digest a range of advanced AI topics.

Textbooks

This course does not follow any textbook closely. However, the following textbooks will be useful:

Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016. online version
Computer Vision: Algorithms and Applications (the second edition), by Rick Szeliski, 2022. online version
Dive into Deep Learning, by Aston Zhang, Zack C. Lipton, Mu Li, and Alex J. Smola, 2019. online version
(A lot of examples are provided to practice deep learning.)
Neural Networks and Deep Learning, by Michael Nielsen, 2019. online version
Introduction to Deep Learning, by Eugene Charniak, 2019. link

In addition to the textbooks, you are strongly encouraged to read related papers.

Papers to review and due dates

There are 11 papers to review over the semester. Please submit each paper review before its due date. Late submissions will NOT be accepted.

CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification, ICCV, 2021.
Due: 8/30/2024

VICRegL: Self-Supervised Learning of Local Visual Features, NeurIPS, 2022.
Due: 9/6/2024

Siamese Masked Autoencoders, NeurIPS, 2023.
Due: 9/13/2024

RegionCLIP: Region-based Language-Image Pretraining, CVPR, 2022.
Due: 9/20/2024

MaPLe: Multi-modal Prompt Learning, CVPR, 2023.
Due: 9/27/2024

Vision Transformer Adapter for Dense Predictions, ICLR, 2023.
Due: 10/4/2024

Scaling Open-Vocabulary Object Detection, NeurIPS, 2023.
Due: 10/11/2024

Learning Concise and Descriptive Attributes for Visual Recognition, ICCV, 2023.
Due: 10/18/2024

MixFormerV2: Efficient Fully Transformer Tracking, NeurIPS, 2023.
Due: 10/25/2024

Segment Anything, ICCV, 2023.
Due: 11/1/2024

Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection, CVPR, 2022.
Due: 11/8/2024

Schedule (subject to change)

Date	Topic
Week 1 (8/23)	Introduction to basics in deep learning
Week 2 (8/30)	Transformers
	An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR, 2021
	Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, ICCV, 2021
	Presenter: Joseph Caldwell
Week 3 (9/6)	Video-language learning: visual grounding
	TransVG: End-to-End Visual Grounding with Transformers, ICCV, 2021
	Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection, arXiv, 2023
	Presenter: Syed Ali
Week 4 (9/13)	Prompt learning
	Visual Prompt Tuning, ECCV, 2022
	Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model, CVPR, 2023
	Presenter: Mingchen Li
Week 4 (9/13)	Efficient learning
	Parameter-Efficient Transfer Learning for NLP, ICML, 2019
	LoRA: Low-Rank Adaptation of Large Language Models, ICLR, 2022
	Presenter: Piyush Hemnani
Week 5 (9/20)	Open-world learning: object detection
	Simple open-vocabulary object detection, ECCV, 2022
	YOLO-World: Real-time open-vocabulary object detection, CVPR, 2024
	Presenter: Donger Chen
Week 5 (9/20)	Open-world learning: action recognition and tracking
	Evidential Deep Learning for Open Set Action Recognition, ICCV, 2021
	OVTrack: Open-Vocabulary Multiple Object Tracking, CVPR, 2023
	Presenter: Minghao Li
Week 6 (9/27)	Mamba: architecture
	Mamba: Linear-Time Sequence Modeling with Selective State Spaces, COLM, 2024
	Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality, ICML, 2024
	Presenter: Bing Fan
Week 6 (9/27)	Mamba: application
	Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model, CVPR, 2024
	VideoMamba: State Space Model for Efficient Video Understanding, ECCV, 2024
	Presenter: Bizhan Alipour Pijani
Week 7 (10/4)	Learning by description
	Visual Classification via Description from Large Language Models, ICLR, 2023
	Evolving Interpretable Visual Classifiers with Large Language Models, ECCV, 2024
	Presenter: Dawei Gao
Week 7 (10/4)	Video-language learning: contrastive learning
	Learning Transferable Visual Models From Natural Language Supervision, ICML, 2021
	BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, ICML, 2023
	Presenter: Suman Pandey (rescheduled to Week 12)
Week 8 (10/11)	Motion and tracking: optical flow
	RAFT: Recurrent all-pairs field transforms for optical flow, ECCV, 2020
	GMFlow: Learning optical flow via global matching, CVPR, 2022
	Presenter: Morui Zhu
Week 8 (10/11)	Motion and tracking: depth
	Multi-Frame Self-Supervised Depth with Transformers, CVPR, 2022
	Unifying flow, stereo and depth estimation, IEEE TPAMI, 2023
	Presenter: Michael Oluwole
Week 9 (10/18)	Motion and tracking: single object tracking
	Joint feature learning and relation modeling for tracking: A one-stream framework, ECCV, 2022
	Autoregressive visual tracking, CVPR, 2023
	Presenter: Yen Pham
Week 9 (10/18)	Motion and tracking: multi-object tracking
	Global Tracking Transformers, CVPR, 2022
	TrackFormer: Multi-object tracking with transformers, CVPR, 2022
	Presenter: Yihao Zhu
Week 10 (10/25)	Generative models
	High-Resolution Image Synthesis with Latent Diffusion Models, CVPR, 2022
	NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, ECCV, 2020
	Presenter: Ruixiao Huang
Week 11 (11/1)	Large models
	Visual instruction tuning, NeurIPS, 2023
	VisionLLM: Large language model is also an open-ended decoder for vision-centric tasks, NeurIPS, 2023
	Presenter: Paul Phillips (cancelled due to student's absence)
Week 12 (11/8)	Security and privacy
	Learning Transferable Visual Models From Natural Language Supervision, ICML, 2021
	BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, ICML, 2023
	Presenter: Suman Pandey
Week 13 (11/15)	Security and privacy
	DPatch: An adversarial patch attack on object detectors, arXiv, 2018
	Dual Attention Suppression Attack: Generate Adversarial Camouflage in Physical World, CVPR, 2021
	Presenter: Shengping Bi
Week 14 (11/22)	Project presentation
	Presentation order:
	1. Joseph Caldwell
	2. Shengping Bi and Ruixiao Huang
	3. Morui Zhu, Donger Chen, and Yihao Zhu
	4. Minghao Li, Mingchen Li, and Dawei Gao
	5. Yen Pham
	6. Bing Fan
	7. Bizhan Alipour Pijani and Suman Pandey
	8. Piyush Deepak Hemnani
	9. Michael Oluwole
	10. Syed Ali
	Project report due: 11/22
Week 15 (11/29)	Thanksgiving break (no class)
Week 16 (12/6)	Reading day (no class)

Grading policy

Grading will be based on the following components:

Paper presentation: 30%
Paper review: 30%
Project: 30%
In-class discussion: 10%