CSCE 6280 – Advanced Topics in Artificial Intelligence
Fall 2024
Basic information:
-
Instructor: Heng Fan (heng.fan@unt.edu)
-
Office: Discovery Park F284
-
Office hours: Friday 9:00 am - 11:00 am or by appointment
-
Lecture time: Friday 11:00 am - 1:50 pm
-
Classroom: NTDP B190
-
TA (half): Xiaoqiong Liu (xiaoqiongliu@my.unt.edu), by appointment
Course description
This is a research-oriented course that introduces the latest frontiers in artificial intelligence (AI).
It covers advanced approaches in AI, with a focus on recent topics in computer vision and multimodal learning,
such as prompt learning, multimodal vision-language learning, visual generation, and video understanding. Through
extensive in-class paper presentations and discussions, students are expected to understand and digest a range of
advanced AI topics.
Textbooks
This course does not follow any textbook closely. However, the following textbooks will be useful:
-
Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016. online version
-
Computer Vision: Algorithms and Applications (the second edition), by Rick Szeliski, 2022. online version
-
Dive into Deep Learning, by Aston Zhang, Zack C. Lipton, Mu Li, and Alex J. Smola, 2019. online version
(It provides many examples for practicing deep learning.)
-
Neural Networks and Deep Learning, by Michael Nielsen, 2019. online version
-
Introduction to Deep Learning, by Eugene Charniak, 2019. link
In addition to the textbooks, you're highly encouraged to read more related papers.
Papers to review and due dates
There are 11 papers in total to review throughout the whole semester. Please submit your paper review before the corresponding due date. Late submissions will NOT be accepted.
-
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification, ICCV, 2021.
Due: 8/30/2024
-
VICRegL: Self-Supervised Learning of Local Visual Features, NeurIPS, 2022.
Due: 9/6/2024
-
Siamese Masked Autoencoders, NeurIPS, 2023.
Due: 9/13/2024
-
RegionCLIP: Region-based Language-Image Pretraining, CVPR, 2022.
Due: 9/20/2024
-
MaPLe: Multi-modal Prompt Learning, CVPR, 2023.
Due: 9/27/2024
-
Vision Transformer Adapter for Dense Predictions, ICLR, 2023.
Due: 10/4/2024
-
Scaling Open-Vocabulary Object Detection, NeurIPS, 2023.
Due: 10/11/2024
-
Learning Concise and Descriptive Attributes for Visual Recognition, ICCV, 2023.
Due: 10/18/2024
-
MixFormerV2: Efficient Fully Transformer Tracking, NeurIPS, 2023.
Due: 10/25/2024
-
Segment Anything, ICCV, 2023.
Due: 11/1/2024
-
Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection, CVPR, 2022.
Due: 11/8/2024
Schedule (subject to change)
Date
|
Topic
|
Week 1 (8/23)
|
Introduction to basics in deep learning
|
Week 2 (8/30)
|
Transformers
-
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR, 2021
-
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, ICCV, 2021
Presenter: Joseph Caldwell
|
Week 3 (9/6)
|
Video-language learning: visual grounding
-
TransVG: End-to-End Visual Grounding with Transformers, ICCV, 2021
-
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection, arXiv, 2023
Presenter: Syed Ali
|
Week 4 (9/13)
|
Prompt learning
-
Visual Prompt Tuning, ECCV, 2022
-
Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model, CVPR, 2022
Presenter: Mingchen Li
|
Efficient learning
-
Parameter-Efficient Transfer Learning for NLP, ICML, 2019
-
LoRA: Low-Rank Adaptation of Large Language Models, ICLR, 2022
Presenter: Piyush Hemnani
|
Week 5 (9/20)
|
Open-world learning: object detection
-
Simple Open-Vocabulary Object Detection, ECCV, 2022
-
YOLO-World: Real-Time Open-Vocabulary Object Detection, CVPR, 2024
Presenter: Donger Chen
|
Open-world learning: action recognition and tracking
-
Evidential Deep Learning for Open Set Action Recognition, ICCV, 2021
-
OVTrack: Open-Vocabulary Multiple Object Tracking, CVPR, 2023
Presenter: Minghao Li
|
Week 6 (9/27)
|
Mamba: architecture
-
Mamba: Linear-Time Sequence Modeling with Selective State Spaces, COLM, 2024
-
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, ICML, 2024
Presenter: Bing Fan
|
Mamba: application
-
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model, ICML, 2024
-
VideoMamba: State Space Model for Efficient Video Understanding, ECCV, 2024
Presenter: Bizhan Alipour Pijani
|
Week 7 (10/4)
|
Learning by description
-
Visual Classification via Description from Large Language Models, ICLR, 2023
-
Evolving Interpretable Visual Classifiers with Large Language Models, ECCV, 2024
Presenter: Dawei Gao
|
Video-language learning: contrastive learning
-
Learning Transferable Visual Models From Natural Language Supervision, ICML, 2021
-
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, ICML, 2023
Presenter: Suman Pandey (rescheduled to Week 12)
|
Week 8 (10/11)
|
Motion and tracking: optical flow
-
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow, ECCV, 2020
-
GMFlow: Learning Optical Flow via Global Matching, CVPR, 2022
Presenter: Morui Zhu
|
Motion and tracking: depth
-
Multi-Frame Self-Supervised Depth with Transformers, CVPR, 2022
-
Unifying Flow, Stereo and Depth Estimation, IEEE TPAMI, 2023
Presenter: Michael Oluwole
|
Week 9 (10/18)
|
Motion and tracking: single object tracking
-
Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework, ECCV, 2022
-
Autoregressive Visual Tracking, CVPR, 2023
Presenter: Yen Pham
|
Motion and tracking: multi-object tracking
-
Global Tracking Transformers, CVPR, 2022
-
TrackFormer: Multi-Object Tracking with Transformers, CVPR, 2022
Presenter: Yihao Zhu
|
Week 10 (10/25)
|
Generative models
-
High-Resolution Image Synthesis with Latent Diffusion Models, CVPR, 2022
-
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, ECCV, 2020
Presenter: Ruixiao Huang
|
Week 11 (11/1)
|
Large models
-
Visual Instruction Tuning, NeurIPS, 2023
-
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks, NeurIPS, 2023
Presenter: Paul Phillips (cancelled due to student's absence)
|
Week 12 (11/8)
|
Video-language learning: contrastive learning
-
Learning Transferable Visual Models From Natural Language Supervision, ICML, 2021
-
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, ICML, 2023
Presenter: Suman Pandey
|
Week 13 (11/15)
|
Security and privacy
-
DPatch: An Adversarial Patch Attack on Object Detectors, arXiv, 2018
-
Dual Attention Suppression Attack: Generate Adversarial Camouflage in Physical World, CVPR, 2021
Presenter: Shengping Bi
|
Week 14 (11/22)
|
Project presentation
Presentation order:
-
Joseph Caldwell
-
Shengping Bi and Ruixiao Huang
-
Morui Zhu, Donger Chen, and Yihao Zhu
-
Minghao Li, Mingchen Li, and Dawei Gao
-
Yen Pham
-
Bing Fan
-
Bizhan Alipour Pijani and Suman Pandey
-
Piyush Deepak Hemnani
-
Michael Oluwole
-
Syed Ali
Project report due: 11/22
|
Week 15 (11/29)
|
Thanksgiving break (no class)
|
Week 16 (12/6)
|
Reading day (no class)
|
Grading policy
Grading will be based on the following components:
-
Paper presentation: 30%
-
Paper review: 30%
-
Project: 30%
-
In-class discussion: 10%