Changes between Version 27 and Version 28 of Other/Summer/2024/lLM


Timestamp:
Jul 30, 2024, 4:10:25 PM (4 months ago)
Author:
talati
3. **Termination Control**: We also worked on adding more specific termination control for the Maestros. Previously, running the end_experiment script would attempt to terminate all Maestros: even if only one was running, it would try to connect to every Maestro, of which there are more than 30 in total, which was inefficient. We therefore wanted a feature that shuts down only the specified Maestros, giving the user more control and a more efficient workflow. It is still a work in progress and should be finalized in the upcoming week.
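The targeted-termination idea above could be sketched as follows. This is a minimal illustration only: the registry of hosts, the port, and the `TERMINATE` message are all assumptions, not taken from the actual end_experiment script.

```python
# Hypothetical sketch of targeted Maestro termination.
# MAESTRO_HOSTS, the port, and the TERMINATE command are assumed names,
# not the real end_experiment implementation.
import argparse
import socket

# Assumed registry of the 30+ Maestros and a common control port.
MAESTRO_HOSTS = {f"maestro{i:02d}": 9000 for i in range(1, 31)}


def terminate(host: str, port: int, timeout: float = 2.0) -> bool:
    """Try to send a shutdown command to one Maestro; return success."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"TERMINATE\n")
        return True
    except OSError:
        # Unreachable or refused: skip instead of blocking on every host.
        return False


def main() -> None:
    parser = argparse.ArgumentParser(description="Terminate selected Maestros")
    parser.add_argument("names", nargs="*",
                        help="Maestros to stop (default: all of them)")
    args = parser.parse_args()
    # Only contact the Maestros the user named, not all 30+.
    targets = args.names or list(MAESTRO_HOSTS)
    for name in targets:
        ok = terminate(name, MAESTRO_HOSTS[name])
        print(f"{name}: {'terminated' if ok else 'unreachable'}")


if __name__ == "__main__":
    main()
```

The key design point is that unreachable Maestros are skipped after a short timeout rather than stalling the whole run.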
4. **GitHub Documentation**: We worked on finishing installing the dependencies for a Maestro and documenting the process. Even though imaging was considered the best method for giving all Maestros the required dependencies, the other method, installing dependencies on a fresh Raspberry Pi SD card, is still important for understanding how the working Maestros came to be and as a reference for creating your own working Maestro for experimentation.
== Week 4
**Slideshow Link**:
https://docs.google.com/presentation/d/1Opw-jvDWLsCzsMahu0Ax9Z1Hh6cPiQextrTsCcV5_hI/edit#slide=id.g2ee2c68907a_0_0
**What we did this week:**
1. **Research Paper**: We read and annotated a Google DeepMind paper on Weight Averaged Reward Models (WARM), a novel approach to training reward models that mitigates reward hacking. The paper discusses the advantages of WARM over more traditional methods such as ensembling: ensembling averages the outputs of several individual models, whereas WARM averages the weights of multiple fine-tuned models to produce a single model with a single output. We aim to present this paper to Dr. Ortiz and the team at our weekly meeting next Tuesday.
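The ensembling-vs-weight-averaging distinction can be illustrated with a toy example. This is not the paper's setup: the linear reward heads and random data below are invented purely to contrast the two averaging strategies.

```python
# Toy contrast of output ensembling vs WARM-style weight averaging,
# using invented linear reward models (not the paper's architecture).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                  # 5 candidate responses, 4 features
W = [rng.normal(size=4) for _ in range(3)]   # three "fine-tuned" reward heads

# Ensembling: run every model, then average their reward outputs.
ensemble_rewards = np.mean([X @ w for w in W], axis=0)

# WARM-style: average the weights once, then run a single model.
w_avg = np.mean(W, axis=0)
warm_rewards = X @ w_avg

# For linear models the two coincide exactly; WARM's argument is that
# for deep reward models, averaging weights still yields one model with
# a single forward pass and more reliable rewards than output ensembling.
print(np.allclose(ensemble_rewards, warm_rewards))  # True for linear models
```

The efficiency point is visible even in the toy: ensembling needs one forward pass per model at inference time, while the weight-averaged model needs only one in total.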

2.
== Week 5