CSCE 6260 – Advanced Topics in Pattern Recognition and Image Processing

Spring 2026    


Basic information:


Course description

This research-oriented course surveys the latest frontiers in computer vision, pattern recognition, multimodal learning, large models, and artificial intelligence (AI). It covers advanced approaches in AI, with a focus on recent topics such as prompt learning, multimodal vision-language learning, multimodal large language models, visual generation, and video understanding. Students are expected to gain a working understanding of these topics through extensive in-class paper presentations and discussion.


Textbooks

This course does not follow any textbooks closely. However, the following textbooks will be useful for this course:

In addition to the textbooks, you're highly encouraged to read more related papers.


Papers to review and due dates

The papers to review during the semester are listed below. Please submit each paper review before its due date. Late submissions will NOT be accepted.

  1. CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification, ICCV, 2021.
    Due: 1/28/2026

  2. Learning Concise and Descriptive Attributes for Visual Recognition, ICCV, 2023.
    Due: 2/8/2026

  3. VICRegL: Self-Supervised Learning of Local Visual Features, NeurIPS, 2022.
    Due: 2/18/2026

  4. Siamese Masked Autoencoders, NeurIPS, 2023.
    Due: 3/4/2026


Schedule (subject to change)

Date Topic
Week 1 (1/14) Review of Basic Tasks in Computer Vision
Week 2 (1/21) Transformers
  • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR, 2021
  • Vision Transformer with Deformable Attention, CVPR, 2022.
Presenter: Lyuzhou Ye
Prompt learning
  • Visual Prompt Tuning, ECCV, 2022
  • MaPLe: Multi-modal Prompt Learning, CVPR, 2023
Presenter: Qi Cai
Week 3 (1/28) Winter Storm (no class)
Week 4 (2/4) Parameter-efficient learning
  • Parameter-Efficient Transfer Learning for NLP, ICML, 2019
  • LoRA: Low-Rank Adaptation of Large Language Models, ICLR, 2022
Presenter: Zefeng He
Week 5 (2/11) Vision-language learning
  • Learning Transferable Visual Models From Natural Language Supervision, ICML, 2021
  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, ICML, 2023
Presenter: Mingchen Li
Learning by description
  • Visual Classification via Description from Large Language Models, ICLR, 2023
  • Evolving Interpretable Visual Classifiers with Large Language Models, ECCV, 2024
Presenter: Xin Gao
Week 6 (2/18) Visual-language understanding
  • TubeDETR: Spatio-Temporal Video Grounding with Transformers, CVPR, 2022
  • Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection, ECCV, 2024
Presenter: Dominic Ani
Week 7 (2/25) Image generation
  • Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction, NeurIPS, 2024
  • Mean Flows for One-step Generative Modeling, NeurIPS, 2025
Presenter: Abhignya Jagathpally
Week 8 (3/4) Segmentation
  • Segment Anything, ICCV, 2023
  • SAM 2: Segment Anything in Images and Videos, ICLR, 2025
Presenter: Gaoyi Chen
Week 9 (3/11) Spring Break (no class)
Week 10 (3/18) Motion and tracking: optical flow
  • RAFT: Recurrent All-Pairs Field Transforms for Optical Flow, ECCV, 2020
  • MemFlow: Optical Flow Estimation and Prediction with Memory, CVPR, 2024
Presenter: Mohit
Week 11 (3/25) Motion and tracking: depth estimation
  • Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data, CVPR, 2024
  • Depth Anything V2, NeurIPS, 2024
Presenter: Hanzhi Zhang
Week 12 (4/1) Motion and tracking: single object tracking
  • Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework, ECCV, 2022
  • A Distractor-Aware Memory for Visual Object Tracking with SAM2, CVPR, 2025
Presenter: Yan Qiao
Motion and tracking: multi-object tracking
  • TrackFormer: Multi-Object Tracking with Transformers, CVPR, 2022
  • MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking, ICCV, 2023
Presenter: Sumera
Week 13 (4/8) Motion and tracking: open-vocabulary tracking
  • OVTrack: Open-Vocabulary Multiple Object Tracking, CVPR, 2023
  • Matching Anything by Segmenting Anything, CVPR, 2024
Presenter: Divya
Week 14 (4/15) Large models
  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS, 2022
  • Visual Instruction Tuning, NeurIPS, 2023
Presenter: Runze Liu
Week 15 (4/22) Mamba: architecture
  • Mamba: Linear-Time Sequence Modeling with Selective State Spaces, COLM, 2024
  • Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, ICML, 2024
Presenter: Gautam Galada
Mamba: application
  • Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model, ICML, 2024
  • VideoMamba: State Space Model for Efficient Video Understanding, ECCV, 2024
Presenter: Lang Zhou
Week 16 (4/29) Diffusion
  • High-Resolution Image Synthesis With Latent Diffusion Models, CVPR, 2022
  • Scalable Diffusion Models with Transformers, ICCV, 2023
Presenter: Xinpeng Xie
Spatial Intelligence: visual search
  • V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs, CVPR, 2024
  • Re-thinking Temporal Search for Long-Form Video Understanding, CVPR, 2025
Presenter: Lang Zhou

Grading policy

Grading will be based on the following components: