CSCE 6260 – Advanced Topics in Pattern Recognition and Image Processing
Spring 2026
Basic information:
- Instructor: Heng Fan (heng.fan@unt.edu)
- Office: Discovery Park F284
- Office hours: Wednesday 12:30 - 2:30 pm or by appointment
- Lecture time: Wednesday 2:30 - 5:20 pm
- Classroom: NTDP F285
Course description
This is a research-oriented course that aims to present the latest frontiers in computer vision, pattern recognition, multimodal learning, large models, and artificial intelligence (AI). It covers advanced approaches in AI, with a focus on recent topics such as prompt learning, multimodal vision-language learning, multimodal large language models, visual generation, and video understanding. Through this course, students are expected to understand and digest a range of advanced AI topics through extensive in-class paper presentations and discussions.
Textbooks
This course does not closely follow any textbook. However, the following books will be useful:
- Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016. online version
- Computer Vision: Algorithms and Applications (second edition), by Richard Szeliski, 2022. online version
- Dive into Deep Learning, by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola, 2019. online version (Many examples are provided for practicing deep learning.)
- Neural Networks and Deep Learning, by Michael Nielsen, 2019. online version
- Introduction to Deep Learning, by Eugene Charniak, 2019. link
In addition to the textbooks, you are highly encouraged to read further related papers.
Papers to review and due dates
The following are the papers to review throughout the semester. Please submit each paper review before the corresponding due date. Late submissions will NOT be accepted.
- CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification, ICCV, 2021. Due: 1/28/2026
- Learning Concise and Descriptive Attributes for Visual Recognition, ICCV, 2023. Due: 2/8/2026
- VICRegL: Self-Supervised Learning of Local Visual Features, NeurIPS, 2022. Due: 2/18/2026
- Siamese Masked Autoencoders, NeurIPS, 2023. Due: 3/4/2026
- RegionCLIP: Region-based Language-Image Pretraining, CVPR, 2022. Due: 3/25/2026
- Vision Transformer Adapter for Dense Predictions, ICLR, 2023. Due: 4/1/2026
Schedule (subject to updates)
Week 1 (1/14)
Review of Basic Tasks in Computer Vision

Week 2 (1/21)
Transformers
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR, 2021
- Vision Transformer with Deformable Attention, CVPR, 2022
Presenter: Lyuzhou Ye
Prompt learning
- Visual Prompt Tuning, ECCV, 2022
- MaPLe: Multi-modal Prompt Learning, CVPR, 2023
Presenter: Qi Cai

Week 3 (1/28)
Winter Storm (no class)

Week 4 (2/4)
Parameter-efficient learning
- Parameter-Efficient Transfer Learning for NLP, ICML, 2019
- LoRA: Low-Rank Adaptation of Large Language Models, ICLR, 2022
Presenter: Zefeng He

Week 5 (2/11)
Vision-language learning
- Learning Transferable Visual Models From Natural Language Supervision, ICML, 2021
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, ICML, 2023
Presenter: Mingchen Li
Learning by description
- Visual Classification via Description from Large Language Models, ICLR, 2023
- Evolving Interpretable Visual Classifiers with Large Language Models, ECCV, 2024
Presenter: Xin Gao

Week 6 (2/18)
Vision-language understanding
- TubeDETR: Spatio-Temporal Video Grounding with Transformers, CVPR, 2022
- Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection, ECCV, 2024
Presenter: Dominic Ani

Week 7 (2/25)
Image generation
- Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction, NeurIPS, 2024
- Mean Flows for One-step Generative Modeling, NeurIPS, 2025
Presenter: Abhignya Jagathpally

Week 8 (3/4)
Segmentation
- Segment Anything, ICCV, 2023
- SAM 2: Segment Anything in Images and Videos, ICLR, 2025
Presenter: Gaoyi Chen

Week 9 (3/11)
Spring Break (no class)

Week 10 (3/18)
Motion and tracking: optical flow
- RAFT: Recurrent All-Pairs Field Transforms for Optical Flow, ECCV, 2020
- MemFlow: Optical Flow Estimation and Prediction with Memory, CVPR, 2024
Presenter: Mohit

Week 11 (3/25)
Motion and tracking: depth estimation
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data, CVPR, 2024
- Depth Anything V2, NeurIPS, 2024
Presenter: Hanzhi Zhang (note: presentation moved to Week 13)

Week 12 (4/1)
Motion and tracking: single object tracking
- Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework, ECCV, 2022
- A Distractor-Aware Memory for Visual Object Tracking with SAM2, CVPR, 2025
Presenter: Yan Qiao
Motion and tracking: multi-object tracking
- TrackFormer: Multi-Object Tracking with Transformers, CVPR, 2022
- MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking, ICCV, 2023
Presenter: Sumera

Week 13 (4/8)
Motion and tracking: open-vocabulary tracking
- OVTrack: Open-Vocabulary Multiple Object Tracking, CVPR, 2023
- Matching Anything by Segmenting Anything, CVPR, 2024
Presenter: Divya

Week 14 (4/15)
Large models
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS, 2022
- Visual Instruction Tuning, NeurIPS, 2023
Presenter: Runze Liu

Week 15 (4/22)
Mamba: architecture
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces, COLM, 2024
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, ICML, 2024
Presenter: Gautam Galada
Mamba: application
- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model, CVPR, 2024
- VideoMamba: State Space Model for Efficient Video Understanding, ECCV, 2024
Presenter: Lang Zhou

Week 16 (4/29)
Diffusion
- High-Resolution Image Synthesis with Latent Diffusion Models, CVPR, 2022
- Scalable Diffusion Models with Transformers, ICCV, 2023
Presenter: Xinpeng Xie
Spatial intelligence: visual search
- V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs, CVPR, 2024
- Re-thinking Temporal Search for Long-Form Video Understanding, CVPR, 2025
Presenter: Lang Zhou
Grading policy
Grading will be based on the following components:
- Paper presentation: 40%
- Paper review: 40%
- In-class discussion: 20%