Changes between Version 27 and Version 28 of Other/Summer/2024/lLM
Timestamp: Jul 30, 2024, 4:10:25 PM
3. **Termination Control**: We also worked on adding more specific termination control for the Maestros. Previously, running the end_experiment script would terminate (or at least attempt to terminate) all Maestros: even if only one was running, the script would try to connect to every Maestro, of which there are more than 30 in total, which was inefficient. We therefore wanted to add a feature that shuts off only the specified Maestros, giving the user more control and a more efficient workflow when interacting with them. It is still a work in progress and should be finalized in the upcoming week.

4. **GitHub Documentation**: We worked on finishing installing the dependencies for a Maestro and documenting the process. Even though imaging was considered the best method for getting all Maestros running with the required dependencies, the other method, installing the dependencies on a fresh Raspberry Pi SD card, is still important for understanding how the working Maestros came to be and for having a reference on how to create your own working Maestro for experimentation.

== Week 4

**Slideshow Link**:
https://docs.google.com/presentation/d/1Opw-jvDWLsCzsMahu0Ax9Z1Hh6cPiQextrTsCcV5_hI/edit#slide=id.g2ee2c68907a_0_0

**What we did this week:**
1. **Research Paper**: We read and annotated a Google DeepMind paper on Weight Averaged Reward Models (WARM), a novel approach to developing and training reward models that mitigates reward hacking. The paper discusses the advantages of WARM over more traditional methods such as ensembling: ensembling averages the outputs of several individual models, whereas WARM averages their weights and produces a single output from the resulting single model (see the short illustrative sketch at the bottom of this page). We aim to present this paper to Dr. Ortiz and the team at our weekly meeting next Tuesday.

2.

== Week 5
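**Addendum (Week 4, Research Paper)**: To make the ensembling-versus-weight-averaging distinction above concrete, here is a minimal, hypothetical sketch in Python. It is our own illustration rather than code from the WARM paper; the tiny linear "reward models", their parameters, and the feature vector are all made up for demonstration purposes.

```python
# Hypothetical sketch (not code from the WARM paper): contrast output
# ensembling with WARM-style weight averaging on two toy linear reward models.
import numpy as np

rng = np.random.default_rng(0)

# Two "reward models" with identical architecture (a single linear layer).
# In WARM these would be fine-tuned from a shared pre-trained initialization.
w1, b1 = rng.normal(size=4), 0.1
w2, b2 = rng.normal(size=4), -0.2

x = rng.normal(size=4)  # made-up features for one (prompt, response) pair

# Ensembling: run every model separately, then average their outputs
# (one forward pass per model at inference time).
ensemble_reward = np.mean([w1 @ x + b1, w2 @ x + b2])

# Weight averaging (WARM-style): average the parameters once, then score
# with the single merged model (one forward pass total).
w_avg, b_avg = (w1 + w2) / 2, (b1 + b2) / 2
warm_reward = w_avg @ x + b_avg

print(f"ensemble reward:        {ensemble_reward:.4f}")
print(f"weight-averaged reward: {warm_reward:.4f}")
```

For these linear toy models the two numbers coincide exactly; the practical difference the paper emphasizes appears with deep, non-linear reward models, where averaging the weights yields one model to serve (one forward pass) that the authors argue is more efficient and more robust to reward hacking than maintaining an ensemble.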