Changes between Version 31 and Version 32 of Other/Summer/2024/lLM


Timestamp: Aug 6, 2024, 5:01:31 PM
Author: talati

**Slideshow Link**:
https://docs.google.com/presentation/d/1Opw-jvDWLsCzsMahu0Ax9Z1Hh6cPiQextrTsCcV5_hI/edit#slide=id.g2ee2c68907a_0_0

**What we did this week:**
1. **Research Paper**: We read and annotated a Google DeepMind paper on Weight Averaged Reward Models (WARM), a novel approach to training reward models that mitigates reward hacking. The paper discusses the advantages of WARM over more traditional methods such as ensembling. An ensemble runs several individual reward models and averages their outputs, whereas WARM averages the weights (parameters) of multiple reward models into a single model that produces a single output. We plan to present this paper to Dr. Ortiz and team at our weekly meeting next Tuesday.
     
== Week 5
**What we did this week:**
1. **Maestro Feature**: Began adding a feature for downloading the logs of a specific Maestro. This feature will help maintain system health and give future developers better diagnostic tools for addressing issues with the system. It will be exposed through an API endpoint connected directly to a frontend interface so that users can access the logs easily. We chose this design to keep the system robust.
2. **Research Paper**: Presented the paper on Weight Averaged Reward Models (WARM) to Professor Ortiz and team. The paper discusses the benefits of WARM over traditional ensembling methods and reports that WARM mitigates reward hacking more effectively while also being more efficient, since only one merged model has to be run at inference time. During the presentation, Professor Ortiz gave us valuable advice on academic presentations and taught us how to interpret visual data and explain it concisely to an audience.
3. **Final Presentation**: We worked on the final presentation for the WINLAB Open House on August 7th. It will cover the high-level objectives of our project and describe in detail the contributions our research group made during our 5 weeks at WINLAB. In addition to the slides, we designed a poster for our research project, which will be displayed at WINLAB.
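The difference between ensembling and WARM described in the research paper entries above can be illustrated with a small sketch. This is not the paper's code. It uses a hypothetical toy reward model (`RewardModel`, a one-hidden-layer tanh network standing in for an LLM-based reward model) and hypothetical helpers `ensemble_reward` and `weight_average`. It shows only the averaging step and assumes, as WARM does, that all models share one architecture and were fine-tuned from a shared initialization. The paper's actual fine-tuning procedure is not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RewardModel:
    """Hypothetical toy reward model: r(x) = w2 . tanh(W1 x + b1) + b2."""
    W1: List[List[float]]
    b1: List[float]
    w2: List[float]
    b2: float

    def __call__(self, x: List[float]) -> float:
        # Hidden layer: tanh of each row's dot product with x plus its bias.
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.W1, self.b1)]
        # Scalar reward from the linear output layer.
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


def _mean(values: Iterable[float]) -> float:
    values = list(values)
    return sum(values) / len(values)


def ensemble_reward(models: List[RewardModel], x: List[float]) -> float:
    """Prediction ensembling: run all N models (N forward passes) and average the rewards."""
    return _mean(m(x) for m in models)


def weight_average(models: List[RewardModel]) -> RewardModel:
    """WARM-style merge: average every parameter element-wise into ONE model.

    Assumes identical architectures and a shared initialization.
    """
    first = models[0]
    return RewardModel(
        W1=[[_mean(m.W1[i][j] for m in models) for j in range(len(first.W1[i]))]
            for i in range(len(first.W1))],
        b1=[_mean(m.b1[i] for m in models) for i in range(len(first.b1))],
        w2=[_mean(m.w2[i] for m in models) for i in range(len(first.w2))],
        b2=_mean(m.b2 for m in models),
    )


# Usage: two hypothetical fine-tunes that drifted slightly from a shared base model.
base = RewardModel(W1=[[0.5, -0.2], [0.1, 0.3]], b1=[0.0, 0.1], w2=[1.0, -0.5], b2=0.2)
rm_a = RewardModel(W1=[[0.6, -0.2], [0.1, 0.4]], b1=[0.0, 0.1], w2=[1.1, -0.5], b2=0.2)
rm_b = RewardModel(W1=[[0.4, -0.2], [0.1, 0.2]], b1=[0.0, 0.1], w2=[0.9, -0.5], b2=0.2)

x = [1.0, 2.0]
merged = weight_average([rm_a, rm_b])
print("ensemble reward:", ensemble_reward([rm_a, rm_b], x))
print("weight-averaged reward:", merged(x))
```

Because the network is nonlinear, the two numbers differ. Weight averaging is not simply a shortcut for averaging predictions, which is why WARM relies on the models staying close in weight space through a shared initialization. The practical gain is that the merged model needs one forward pass at inference instead of N.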