CSCE 6280 – Advanced Topics in Artificial Intelligence

Fall 2024


Basic information:


Course description

This research-oriented course surveys the latest frontiers in artificial intelligence (AI). It covers advanced approaches in AI, with a focus on recent topics in computer vision and multimodal learning, such as prompt learning, multimodal vision-language learning, visual generation, and video understanding. Students are expected to understand and digest a range of advanced AI topics through extensive in-class paper presentations and discussion.


Textbooks

This course does not follow any textbooks closely. However, the following textbooks will be useful for this course:

In addition to the textbooks, you're highly encouraged to read more related papers.


Papers to review and due dates

There are 11 papers in total to review throughout the whole semester. Please submit your paper review before the corresponding due date. Late submissions will NOT be accepted.

  1. CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification, ICCV, 2021.
    Due: 8/30/2024

  2. VICRegL: Self-Supervised Learning of Local Visual Features, NeurIPS, 2022.
    Due: 9/6/2024

  3. Siamese Masked Autoencoders, NeurIPS, 2023.
    Due: 9/13/2024

  4. RegionCLIP: Region-based Language-Image Pretraining, CVPR, 2022.
    Due: 9/20/2024

  5. MaPLe: Multi-modal Prompt Learning, CVPR, 2023.
    Due: 9/27/2024

  6. Vision Transformer Adapter for Dense Predictions, ICLR, 2023.
    Due: 10/4/2024

  7. Scaling Open-Vocabulary Object Detection, NeurIPS, 2023.
    Due: 10/11/2024

  8. Learning Concise and Descriptive Attributes for Visual Recognition, ICCV, 2023.
    Due: 10/18/2024

  9. MixFormerV2: Efficient Fully Transformer Tracking, NeurIPS, 2023.
    Due: 10/25/2024

  10. Segment Anything, ICCV, 2023.
    Due: 11/1/2024

  11. Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection, CVPR, 2022.
    Due: 11/8/2024


Schedule (subject to change)

Date Topic
Week 1 (8/23) Introduction to basics in deep learning
Week 2 (8/30) Transformers
  • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR, 2021
  • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, ICCV, 2021
Presenter: Joseph Caldwell
Week 3 (9/6) Video-language learning: visual grounding
  • TransVG: End-to-End Visual Grounding with Transformers, ICCV, 2021
  • Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection, arXiv, 2023
Presenter: Syed Ali
Week 4 (9/13) Prompt learning
  • Visual Prompt Tuning, ECCV, 2022
  • Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model, CVPR, 2022
Presenter: Mingchen Li
Efficient learning
  • Parameter-Efficient Transfer Learning for NLP, ICML, 2019
  • LoRA: Low-Rank Adaptation of Large Language Models, ICLR, 2022
Presenter: Piyush Hemnani
Week 5 (9/20) Open-world learning: object detection
  • Simple Open-Vocabulary Object Detection, ECCV, 2022
  • YOLO-World: Real-Time Open-Vocabulary Object Detection, CVPR, 2024
Presenter: Donger Chen
Open-world learning: action recognition and tracking
  • Evidential Deep Learning for Open Set Action Recognition, ICCV, 2021
  • OVTrack: Open-Vocabulary Multiple Object Tracking, CVPR, 2023
Presenter: Minghao Li
Week 6 (9/27) Mamba: architecture
  • Mamba: Linear-Time Sequence Modeling with Selective State Spaces, COLM, 2024
  • Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, ICML, 2024
Presenter: Bing Fan
Mamba: application
  • Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model, ICML, 2024
  • VideoMamba: State Space Model for Efficient Video Understanding, ECCV, 2024
Presenter: Bizhan Alipour Pijani
Week 7 (10/4) Learning by description
  • Visual Classification via Description from Large Language Models, ICLR, 2023
  • Evolving Interpretable Visual Classifiers with Large Language Models, ECCV, 2024
Presenter: Dawei Gao
Video-language learning: contrastive learning
  • Learning Transferable Visual Models From Natural Language Supervision, ICML, 2021
  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, ICML, 2023
Presenter: Suman Pandey (rescheduled to Week 12)
Week 8 (10/11) Motion and tracking: optical flow
  • RAFT: Recurrent All-Pairs Field Transforms for Optical Flow, ECCV, 2020
  • GMFlow: Learning Optical Flow via Global Matching, CVPR, 2022
Presenter: Morui Zhu
Motion and tracking: depth
  • Multi-Frame Self-Supervised Depth with Transformers, CVPR, 2022
  • Unifying Flow, Stereo and Depth Estimation, IEEE TPAMI, 2023
Presenter: Michael Oluwole
Week 9 (10/18) Motion and tracking: single object tracking
  • Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework, ECCV, 2022
  • Autoregressive Visual Tracking, CVPR, 2023
Presenter: Yen Pham
Motion and tracking: multi-object tracking
  • Global Tracking Transformers, CVPR, 2022
  • TrackFormer: Multi-Object Tracking with Transformers, CVPR, 2022
Presenter: Yihao Zhu
Week 10 (10/25) Generative models
  • High-Resolution Image Synthesis with Latent Diffusion Models, CVPR, 2022
  • NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, ECCV, 2020
Presenter: Ruixiao Huang
Week 11 (11/1) Large models
  • Visual Instruction Tuning, NeurIPS, 2023
  • VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks, NeurIPS, 2023
Presenter: Paul Phillips (cancelled due to student's absence)
Week 12 (11/8) Security and privacy
  • Learning Transferable Visual Models From Natural Language Supervision, ICML, 2021
  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, ICML, 2023
Presenter: Suman Pandey
Week 13 (11/15) Security and privacy
  • DPatch: An Adversarial Patch Attack on Object Detectors, arXiv, 2018
  • Dual Attention Suppression Attack: Generate Adversarial Camouflage in Physical World, CVPR, 2021
Presenter: Shengping Bi
Week 14 (11/22) Project presentation
Presentation order:
  • Joseph Caldwell
  • Shengping Bi and Ruixiao Huang
  • Morui Zhu, Donger Chen, and Yihao Zhu
  • Minghao Li, Mingchen Li, and Dawei Gao
  • Yen Pham
  • Bing Fan
  • Bizhan Alipour Pijani and Suman Pandey
  • Piyush Deepak Hemnani
  • Michael Oluwole
  • Syed Ali

Project report due: 11/22
Week 15 (11/29) Thanksgiving break (no class)
Week 16 (12/6) Reading day (no class)

Grading policy

Grading will be based on the following components: