Basic information:
Course description
This is a research-oriented course that covers the latest frontiers in computer vision, pattern recognition, multimodal learning, large models, and artificial intelligence (AI). It describes advanced approaches in AI, with a focus on recent topics such as prompt learning, multimodal vision-language learning, multimodal large language models, visual generation, and video understanding. Through extensive in-class paper presentations and discussion, students are expected to understand and digest a range of advanced AI topics.
Textbooks
This course does not follow any textbook closely. However, the following textbooks will be useful:
Papers to review and due dates
The papers to review throughout the semester are listed below. Please submit each paper review before its due date. Late submissions will NOT be accepted.
Due: 1/28/2026
Due: 2/8/2026
Due: 2/18/2026
Due: 3/4/2026
Schedule (subject to updates)
| Date | Topic |
|---|---|
| Week 1 (1/14) | Review of Basic Tasks in Computer Vision |
| Week 2 (1/21) | Transformers; Prompt learning |
| Week 3 (1/28) | Winter Storm (no class) |
| Week 4 (2/4) | Parameter-efficient learning |
| Week 5 (2/11) | Vision-language learning; Learning by description |
| Week 6 (2/18) | Visual-language understanding |
| Week 7 (2/25) | Image generation |
| Week 8 (3/4) | Segmentation |
| Week 9 (3/11) | Spring Break (no class) |
| Week 10 (3/18) | Motion and tracking: optical flow |
| Week 11 (3/25) | Motion and tracking: depth estimation |
| Week 12 (4/1) | Motion and tracking: single object tracking; Motion and tracking: multi-object tracking |
| Week 13 (4/8) | Motion and tracking: open-vocabulary tracking |
| Week 14 (4/15) | Large models |
| Week 15 (4/22) | Mamba: architecture; Mamba: application |
| Week 16 (4/29) | Diffusion; Mamba: application; Spatial Intelligence: visual search |
Grading policy
Grading will be based on the following components: