Syllabus
Note
This site is a work-in-progress and is actively being developed. Please check back frequently for updates.
Description
This seminar prepares students to make original research contributions at evaluation-focused venues such as CLEF. It runs in a dual-track format. In one track, participants critically analyze the applied AI/ML/IR research landscape (Kaggle, KDD, NeurIPS, TREC, CLEF) to identify viable shared tasks, form teams, and begin research proposals. In parallel, a hands-on track develops the skills needed to use Georgia Tech's PACE supercomputing cluster, including SLURM, Apptainer, and building ML/IR pipelines with PyTorch and Hugging Face.
- Track A (Applied Research Competition Discussion): Analyze research competition platforms (e.g., Kaggle, CLEF, KDD Cup, NeurIPS Competitions, TREC), dissect the methodologies and evaluation strategies in competition papers and reports, identify research gaps, and finish by forming teams and developing a preliminary proposal for participation in an upcoming CLEF shared task.
- Track B (PACE & ML/IR Pipeline Development): Gain practical experience using the Georgia Tech PACE HPC environment (OnDemand, SLURM, Apptainer), build and evaluate a core ML/IR pipeline involving embeddings, transfer learning, fine-tuning, and semantic search, and utilize essential Python libraries (e.g., PyTorch, Hugging Face, scikit-learn, FAISS) and workflow tools.
By the end of the seminar, students will be equipped to propose and execute original research for competitive academic workshops.
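The semantic-search step named in Track B can be sketched in a few lines: represent the query and each document as an embedding vector, then rank documents by cosine similarity. This is a minimal sketch, assuming hand-written toy vectors in place of real model embeddings; the function names `cosine` and `semantic_search` are hypothetical, not from any course material or library.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_search(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 3):
    """Brute-force search: score every document, return the top-k (doc_id, score) pairs."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical values; a real pipeline would
# produce them with an encoder model, e.g. one loaded through Hugging Face).
docs = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.1, 0.9, 0.0],
    "d3": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

for doc_id, score in semantic_search(query, docs, k=2):
    print(doc_id, round(score, 3))
```

Brute-force scoring costs one similarity computation per document for every query. Libraries such as FAISS provide approximate nearest-neighbour (ANN) indexes that trade a small amount of recall for much faster lookup on large collections.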
Learning Outcomes
Upon successful completion of this seminar, students will be able to:
- Critically Evaluate Research and Platforms: Analyze the structure, methodologies, and evaluation paradigms of applied AI/ML platforms (e.g., CLEF, Kaggle, NeurIPS), and critique diverse research outputs to identify strengths, weaknesses, and research opportunities.
- Design Research Proposals: Develop structured research proposals for shared tasks, including problem framing, methodology, evaluation plans, and collaboration strategies.
- Apply Core ML/IR Concepts: Explain and apply the key components of ML and information retrieval pipelines, such as embeddings and transfer learning, and evaluate them with metrics such as MAP and NDCG.
- Leverage HPC and Engineering Tools: Use the PACE HPC environment and foundational tools (SLURM, Apptainer, MLflow/WandB) for efficient, reproducible experimentation.
- Collaborate and Communicate Effectively: Use Git/GitHub for project collaboration and present technical findings clearly in both written and oral formats.
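The MAP and NDCG metrics named in the outcomes above can be computed by hand for a small example. This is a minimal sketch, assuming binary relevance for average precision and linear graded gains with a log2(rank + 1) discount for NDCG (another common variant uses a gain of 2^rel - 1); the function names and the example rankings are hypothetical.

```python
import math


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """AP: sum of precision@i at each rank i holding a relevant doc, divided by the number of relevant docs."""
    hits, precision_sum = 0, 0.0
    for i, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """MAP: mean of AP over queries; each run is a (ranked list, relevant set) pair."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked: list[str], gains: dict[str, int], k: int) -> float:
    """NDCG@k with linear graded gains and a log2(rank + 1) position discount."""
    def dcg(scores: list[int]) -> float:
        return sum(g / math.log2(i + 1) for i, g in enumerate(scores, start=1))

    actual = [gains.get(d, 0) for d in ranked[:k]]
    ideal = sorted(gains.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(actual) / ideal_dcg if ideal_dcg > 0 else 0.0


# Two hypothetical queries with system rankings and relevance judgments.
runs = [
    (["d3", "d1", "d2", "d4"], {"d1", "d4"}),  # AP = (1/2 + 2/4) / 2 = 0.5
    (["d5", "d6"], {"d5"}),                    # AP = 1.0
]
print(mean_average_precision(runs))  # 0.75

# Graded judgments for the first query: d1 is highly relevant, d4 marginally.
print(round(ndcg_at_k(["d3", "d1", "d2", "d4"], {"d1": 3, "d4": 1}, k=4), 4))  # 0.6399
```

AP divides by the total number of relevant documents, so relevant documents the system never retrieves still lower the score; NDCG instead normalizes by the best achievable ordering of the judged gains.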
Prerequisites and Expectations
- Background: Familiarity with machine learning and information retrieval concepts, typically from courses such as Machine Learning, Deep Learning, Natural Language Processing, or Computer Vision.
- Programming: Intermediate proficiency in Python programming is required, including experience with libraries like NumPy, Pandas, and ideally some exposure to PyTorch or TensorFlow. Familiarity with basic command-line operations in a Linux environment is expected for PACE usage.
- Time Commitment: This is a seminar-style course requiring active participation. Expect to spend approximately 3-4 hours per week, including a 1-hour synchronous online meeting and 2-3 hours of asynchronous hands-on work, readings, and assignments. This aligns with typical OMSCS course expectations.
Required Materials & Technology
- Hardware: A reliable laptop or desktop computer meeting Georgia Tech's minimum requirements for online programs. Access to a stable, high-speed internet connection.
- Software:
  - A modern web browser (Chrome or Firefox recommended).
  - VS Code with the Remote - SSH extension.
  - Access to Georgia Tech's PACE HPC environment (provided).
  - A GitHub account.
- Readings: Course materials will primarily consist of online documentation, research papers (provided or accessed via GT Library), competition descriptions, and solution write-ups. No mandatory textbook purchase is required.
Schedule
Track A: Applied Research Competition Discussion
Date | Week # | Track A Topic | Deliverables / Notes |
---|---|---|---|
2025-08-18 | 1 | The "Why" of Applied Research & Initial Exploration | |
2025-08-25 | 2 | Deeper Dive into Research Platforms & Task Analysis | |
2025-09-01 | 3 | Analyzing Research Papers from Competitions | Labor Day |
2025-09-08 | 4 | Kaggle Solution Deconstruction & Strategy | CLEF Madrid |
2025-09-15 | 5 | CLEF & Academic Competition Methodology Review | |
2025-09-22 | 6 | Identifying Research Gaps & Opportunities Across Platforms | |
2025-09-29 | 7 | Initial CLEF Task Brainstorming & Focus | |
2025-10-06 | 8 | Fall Break | |
2025-10-13 | 9 | CLEF Task Shortlisting & Focused Literature Reviewing | |
2025-10-20 | 10 | CLEF Team Formation Dynamics & Roles | |
2025-10-27 | 11 | CLEF Proposal Structuring & Methodology Brainstorming | |
2025-11-03 | 12 | CLEF Proposal Peer Review Workshop & Refinement | |
2025-11-10 | 13 | CLEF Proposal Intensive & Finalization | |
2025-11-17 | 14 | CLEF Team Proposal Presentations | |
2025-11-24 | 15 | Thanksgiving | |
2025-12-01 | 16 | ARC spring team formation | |
2025-12-08 | 17 | End of term | |
Track B: PACE & ML/IR Pipeline Development
Date | Week # | Track B Topic | Deliverables / Notes |
---|---|---|---|
2025-08-18 | 1 | Git/GitHub & Initial PACE Onboarding | |
2025-08-25 | 2 | VSCode Remote to PACE & Scientific Python Essentials | |
2025-09-01 | 3 | Embeddings/Representations & Introduction to SLURM | Labor Day |
2025-09-08 | 4 | EDA on Embeddings & Advanced SLURM Usage | CLEF Madrid |
2025-09-15 | 5 | ||
2025-09-22 | 6 | Transfer Learning with PyTorch & Hugging Face Trainer | |
2025-09-29 | 7 | ||
2025-10-06 | 8 | Fall Break | |
2025-10-13 | 9 | Parameter-Efficient Fine-Tuning (PEFT) in Practice | |
2025-10-20 | 10 | Semantic Search, IR Metrics (MAP/NDCG), ANN & Reranking | |
2025-10-27 | 11 | Apptainer for Advanced & Multimodal Workloads | |
2025-11-03 | 12 | Experiment Tracking (WandB/MLflow) & Workflow Management | |
2025-11-10 | 13 | HPC Job Monitoring (GPU), Debugging & PyTorch Memory | |
2025-11-17 | 14 | Compiling Module-wise Report & Presentation Preparation | |
2025-11-24 | 15 | Thanksgiving | |
2025-12-01 | 16 | ARC spring team formation | |
2025-12-08 | 17 | End of term | |