Changes between Version 27 and Version 28 of Other/Summer/2024/lLM
Timestamp: Jul 30, 2024, 4:10:25 PM
3. **Termination Control**: We also worked on adding more specific termination control for the Maestros. Previously, running the end_experiment script would terminate (or at least attempt to terminate) all Maestros: even if only one was running, the script would try to connect to every Maestro, of which there are more than 30 in total, which was inefficient. We therefore wanted to add a feature that shuts off only the specified Maestros, giving the user more control and a more efficient workflow when interacting with them. It is still a work in progress and should be finalized in the upcoming week.

4. **GitHub Documentation**: We worked on finishing installing the dependencies for a Maestro and documenting the process. Even though imaging was considered the best method for getting all Maestros running with the required dependencies, the other method, installing the dependencies on a fresh Raspberry Pi SD card, is still important for understanding how the working Maestros came to be and for having a reference on how to create your own working Maestro for experimentation.

== Week 4

**Slideshow Link**:
https://docs.google.com/presentation/d/1Opw-jvDWLsCzsMahu0Ax9Z1Hh6cPiQextrTsCcV5_hI/edit#slide=id.g2ee2c68907a_0_0

**What we did this week:**
1. **Research Paper**: We read and annotated a Google DeepMind paper on Weight Averaged Reward Models (WARM), a novel approach to developing and training reward models that mitigates reward hacking. The paper discusses the advantages of WARM over more traditional methods such as ensembling: ensembling averages the outputs of several individual models, whereas WARM averages their weights and produces a single output from the resulting single model (see the short illustrative sketch at the bottom of this page). We aim to present this paper to Dr. Ortiz and the team at our weekly meeting next Tuesday.

2.

== Week 5
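**Addendum (Week 4, Research Paper)**: To make the ensembling-versus-weight-averaging distinction above concrete, here is a minimal, hypothetical sketch in Python. It is our own illustration rather than code from the WARM paper; the tiny linear "reward models", their parameters, and the feature vector are all made up for demonstration purposes.

```python
# Hypothetical sketch (not code from the WARM paper): contrast output
# ensembling with WARM-style weight averaging on two toy linear reward models.
import numpy as np

rng = np.random.default_rng(0)

# Two "reward models" with identical architecture (a single linear layer).
# In WARM these would be fine-tuned from a shared pre-trained initialization.
w1, b1 = rng.normal(size=4), 0.1
w2, b2 = rng.normal(size=4), -0.2

x = rng.normal(size=4)  # made-up features for one (prompt, response) pair

# Ensembling: run every model separately, then average their outputs
# (one forward pass per model at inference time).
ensemble_reward = np.mean([w1 @ x + b1, w2 @ x + b2])

# Weight averaging (WARM-style): average the parameters once, then score
# with the single merged model (one forward pass total).
w_avg, b_avg = (w1 + w2) / 2, (b1 + b2) / 2
warm_reward = w_avg @ x + b_avg

print(f"ensemble reward:        {ensemble_reward:.4f}")
print(f"weight-averaged reward: {warm_reward:.4f}")
```

For these linear toy models the two numbers coincide exactly; the practical difference the paper emphasizes appears with deep, non-linear reward models, where averaging the weights yields one model to serve (one forward pass) that the authors argue is more efficient and more robust to reward hacking than maintaining an ensemble.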