CSCE 6260 – Advanced Topics in Pattern Recognition and Image Processing
Spring 2026
Basic information:
- Instructor: Heng Fan (heng.fan@unt.edu)
- Office: Discovery Park F284
- Office hours: Wednesday 12:30 - 2:30 pm or by appointment
- Lecture time: Wednesday 2:30 - 5:20 pm
- Classroom: NTDP F285
Course description
This is a research-oriented course that aims to present the latest frontiers in computer vision, pattern recognition, multimodal learning, large models, and artificial intelligence (AI). It covers advanced approaches in AI, with a focus on recent topics such as prompt learning, multimodal vision-language learning, multimodal large language models, visual generation, and video understanding. Through this course, students are expected to understand and digest a range of advanced AI topics through extensive in-class paper presentations and discussions.
Textbooks
This course does not closely follow any textbook. However, the following books will be useful:
- Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016. online version
- Computer Vision: Algorithms and Applications (second edition), by Richard Szeliski, 2022. online version
- Dive into Deep Learning, by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola, 2019. online version (Many examples are provided for practicing deep learning.)
- Neural Networks and Deep Learning, by Michael Nielsen, 2019. online version
- Introduction to Deep Learning, by Eugene Charniak, 2019. link
In addition to the textbooks, you are highly encouraged to read further related papers.
Papers to review and due dates
The following are the papers to review throughout the semester. Please submit each paper review before the corresponding due date. Late submissions will NOT be accepted.
- CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification, ICCV, 2021. Due: 1/28/2026
- Learning Concise and Descriptive Attributes for Visual Recognition, ICCV, 2023. Due: 2/8/2026
- VICRegL: Self-Supervised Learning of Local Visual Features, NeurIPS, 2022. Due: 2/18/2026
- Siamese Masked Autoencoders, NeurIPS, 2023. Due: 3/4/2026
- RegionCLIP: Region-based Language-Image Pretraining, CVPR, 2022. Due: 3/25/2026
- Vision Transformer Adapter for Dense Predictions, ICLR, 2023. Due: 4/1/2026
Schedule (subject to updates)
Week 1 (1/14)
Review of Basic Tasks in Computer Vision

Week 2 (1/21)
Transformers
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR, 2021
- Vision Transformer with Deformable Attention, CVPR, 2022
Presenter: Lyuzhou Ye
Prompt learning
- Visual Prompt Tuning, ECCV, 2022
- MaPLe: Multi-modal Prompt Learning, CVPR, 2023
Presenter: Qi Cai

Week 3 (1/28)
Winter Storm (no class)

Week 4 (2/4)
Parameter-efficient learning
- Parameter-Efficient Transfer Learning for NLP, ICML, 2019
- LoRA: Low-Rank Adaptation of Large Language Models, ICLR, 2022
Presenter: Zefeng He

Week 5 (2/11)
Vision-language learning
- Learning Transferable Visual Models From Natural Language Supervision, ICML, 2021
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, ICML, 2023
Presenter: Mingchen Li
Learning by description
- Visual Classification via Description from Large Language Models, ICLR, 2023
- Evolving Interpretable Visual Classifiers with Large Language Models, ECCV, 2024
Presenter: Xin Gao

Week 6 (2/18)
Vision-language understanding
- TubeDETR: Spatio-Temporal Video Grounding with Transformers, CVPR, 2022
- Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection, ECCV, 2024
Presenter: Dominic Ani

Week 7 (2/25)
Image generation
- Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction, NeurIPS, 2024
- Mean Flows for One-step Generative Modeling, NeurIPS, 2025
Presenter: Abhignya Jagathpally

Week 8 (3/4)
Segmentation
- Segment Anything, ICCV, 2023
- SAM 2: Segment Anything in Images and Videos, ICLR, 2025
Presenter: Gaoyi Chen

Week 9 (3/11)
Spring Break (no class)

Week 10 (3/18)
Motion and tracking: optical flow
- RAFT: Recurrent All-Pairs Field Transforms for Optical Flow, ECCV, 2020
- MemFlow: Optical Flow Estimation and Prediction with Memory, CVPR, 2024
Presenter: Mohit

Week 11 (3/25)
Motion and tracking: depth estimation
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data, CVPR, 2024
- Depth Anything V2, NeurIPS, 2024
Presenter: Hanzhi Zhang (note: presentation moved to Week 13)

Week 12 (4/1)
Motion and tracking: single object tracking
- Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework, ECCV, 2022
- A Distractor-Aware Memory for Visual Object Tracking with SAM2, CVPR, 2025
Presenter: Yan Qiao
Motion and tracking: multi-object tracking
- TrackFormer: Multi-Object Tracking with Transformers, CVPR, 2022
- MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking, ICCV, 2023
Presenter: Sumera

Week 13 (4/8)
Motion and tracking: open-vocabulary tracking
- OVTrack: Open-Vocabulary Multiple Object Tracking, CVPR, 2023
- Matching Anything by Segmenting Anything, CVPR, 2024
Presenter: Divya

Week 14 (4/15)
Large models
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS, 2022
- Visual Instruction Tuning, NeurIPS, 2023
Presenter: Runze Liu

Week 15 (4/22)
Mamba: architecture
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces, COLM, 2024
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, ICML, 2024
Presenter: Gautam Galada
Mamba: application
- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model, CVPR, 2024
- VideoMamba: State Space Model for Efficient Video Understanding, ECCV, 2024
Presenter: Lang Zhou

Week 16 (4/29)
Diffusion
- High-Resolution Image Synthesis with Latent Diffusion Models, CVPR, 2022
- Scalable Diffusion Models with Transformers, ICCV, 2023
Presenter: Xinpeng Xie
Spatial intelligence: visual search
- V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs, CVPR, 2024
- Re-thinking Temporal Search for Long-Form Video Understanding, CVPR, 2025
Presenter: Lang Zhou
Grading policy
Grading will be based on the following components:
- Paper presentation: 40%
- Paper review: 40%
- In-class discussion: 20%