
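Course Assignments

The goals and philosophy of the assignments mainly follow the CMU CS11747 course. Students should not only understand the key concepts behind large language models, but also implement some important techniques and develop the ability to pose scientific research questions and to innovate.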

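Assignment Requirements

Teamwork

The course has four assignments. Assignment 1 must be completed individually; Assignments 2, 3, and 4 must be completed in teams of 2-3 people. If you have difficulty forming a team, please contact the instructor or the TAs.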

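Submission

Submit each assignment through Canvas as a single zip archive containing the following:

  • Code (for example, all placed in a folder named "code")
  • System outputs (Assignments 1 and 2; the format is specified in each task)
  • An English PDF report (Assignments 2, 3, and 4; named "report.pdf")
    • The PDF must be written in English. We recommend editing it with Overleaf (LaTeX); recommended template
    • AI tools such as ChatGPT may be used to help polish the writing
    • The recommended length is about 5 pages for Assignment 2 and about 7 pages for Assignments 3 and 4 (excluding the Appendix and References in both cases)
  • A txt file containing the GitHub link (for example, named github.txt)
    • This requirement is meant to help everyone become familiar with using GitHub for code and project management.
  • Slides (PPT) and a poster (Assignment 4 only)
    • Recommended poster template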
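
As a rough illustration of the archive layout listed above, the following Python sketch checks a zip file before upload. It is only a sketch under stated assumptions: the helper name missing_items is hypothetical, the entry names "code/", "report.pdf", and "github.txt" are the suggested example names rather than enforced ones, and the files are assumed to sit at the top level of the archive.

```python
import io
import zipfile


def missing_items(zip_file, needs_report=True):
    """Return the expected entries that are absent from the archive.

    Hypothetical helper: the names checked here are the example names
    suggested in the submission list, assumed to be at the archive root.
    """
    with zipfile.ZipFile(zip_file) as zf:
        names = zf.namelist()

    missing = []
    # Any entry under "code/" counts as having a code folder.
    if not any(name.startswith("code/") for name in names):
        missing.append("code/")
    # The report is only required for Assignments 2, 3, and 4.
    if needs_report and "report.pdf" not in names:
        missing.append("report.pdf")
    if "github.txt" not in names:
        missing.append("github.txt")
    return missing


if __name__ == "__main__":
    # Build a small in-memory archive that lacks the report.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("code/main.py", "print('hello')\n")
        zf.writestr("github.txt", "placeholder link\n")
    print(missing_items(buf))
```

For Assignment 1, which does not require a report, the same check could be run with needs_report=False. System outputs, slides, and the poster are not checked here because their file names are not fixed in the list above.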

Late Policy

  • The late days used across Assignments 2, 3, and 4 must not exceed 5 days in total

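Assignment Details

Assignment 1: Build Your Own LLaMa

  • Motivation: In current research on large models, many ready-made libraries are available, and implementing our own functionality often amounts to piecing together and lightly modifying existing code, so we miss the chance to patiently study how the network architecture works. Yet the more carefully one studies the code, the more room there is for imagination when it comes to innovating later. For this reason, we hope students will learn and implement the basic network architecture through Assignment 1.
  • Assignment (individual): We currently use the Build Your Own LLaMa project from CMU 11-711.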

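Assignment 2: Build Your Own Lima

  • Motivation: A large language model that has not been aligned is of little value and is also unsafe, so learning how to align a large model with human values is an essential skill. This assignment asks students to implement instruction synthesis, instruction fine-tuning, and model evaluation.
  • Assignment (team): Align a large model by constructing your own data, fine-tuning the model, and evaluating it.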

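Assignment 3: Project Proposal and State-of-the-art Reimplementation

Assignment 3 follows the CMU 11-711 assignment format and grading criteria. We will provide some suggested topics, which will be released later.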

Assignment 3 will involve two parts. (1) You will perform a literature survey on a topic of interest, and propose a project topic based on this literature survey. (2) You will reproduce the evaluation numbers of a competitive baseline model for a task related to this project topic (not necessarily the same). In other words, you must get the same numbers as the previous paper on the same dataset.

In your report, perform an analysis of what remaining errors this model makes (ideally with concrete examples of failure cases), and describe how what you plan to do in the final project will improve on this. If you are interested in tackling a task that does not have a neural baseline in the final project, you may also describe how you plan to adapt this model to the new task and, based on your error analysis, what difficulties you predict in doing so.

The grading rubric for the project proposal component is as follows:

A+: Exceptional or surprising. Goes far beyond most other submissions.
A: A survey that covers all the major relevant papers in the field and a well-grounded project proposal based on this survey.
A-: The survey has a good analysis but is missing a few pieces of relevant related work, or is quite complete but is lacking in critical analysis or forward directions.
B+: The survey is either quite lacking in coverage or analysis, or is decent but not complete in both aspects.
B or B-: The survey is lacking in both coverage and analysis, but does make an attempt to cover some related research.
C+ or below: Clear lack of effort or incompleteness.

The grading rubric for the reproduction component is as follows:

A+: Exceptional or surprising. Goes far beyond most other submissions.
A: Numbers that meet or exceed the previously reported results. A comprehensive analysis of the results, and forward-looking plans for further development.
A-: Similarly, a complete re-implementation with competitive result numbers, but less analysis or forward-looking plans for development than assignments awarded an A.
B+: An implementation and evaluation numbers exist, but they do not match previous work in the field. Or the analysis or forward-looking plans may be seriously lacking.
B or B-: Two or more of the above three elements are lacking.
C+ or below: Clear lack of effort or incompleteness.

Assignment 4: Final Project

Assignment 4 follows the CMU 11-711 assignment format and grading criteria.

The final project work will be expected to be a novel contribution that (1) introduces new techniques for one of the existing tasks in the assignment, using a significant amount of technical sophistication and one of the more advanced techniques introduced in the class, (2) tackles a new NLP task with an NLP model that is motivated by the unique problems posed by the application domain, or (3) applies an existing NLP method to a new language or domain with improvements specifically tailored to the unique challenges posed by that language or domain. The grading rubric is as follows:

A+: Exceptional or surprising. Goes far beyond most other submissions.
A: A respectable research contribution that is novel and effective, and could be submitted largely as-is as a paper to an academic conference.
A-: A respectable research contribution that has some small incomplete parts, but is largely complete and promising.
B+: An idea that is novel, but the results may not be there yet, or the analysis is short.
B or B-: Results, analysis, or novelty are lacking.
C+ or below: Clear lack of effort or incompleteness.

Negative Results: Sometimes experiments don’t work as planned. If you try hard to get positive results but are not successful, you may still get a good grade by clearly describing why you thought your methods would work, and then performing an analysis of why your initial assumptions were incorrect, leading to results that did not match your initial expectations. The bar for paper writing, experimentation, and analysis will be a bit higher in these cases, as we want to make sure that you really made a serious effort.

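If, after completing Assignment 4, you would like to further improve your experiments and report for submission to a top-tier conference, please contact the TAs or the instructor; we are happy to offer whatever guidance and suggestions we can.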