Changes between Version 29 and Version 30 of Other/Summer/2024/lLM
Timestamp: Jul 30, 2024, 6:50:34 PM
1. **Research Paper**: We read and annotated a Google DeepMind paper on Weight Averaged Reward Models (WARM), a novel approach to training reward models that mitigates reward hacking. The paper discusses the advantages of WARM over more traditional methods such as ensembling: an ensemble averages the outputs of several individual reward models, whereas WARM averages the weights of those models to produce a single model with a single output. We aim to present this paper to Dr. Ortiz and team at our weekly meeting next Tuesday.

2. **Sensor Testing**: We also spent a considerable portion of our time testing the Maestro sensor suite and verifying that all the sensors can transmit meaningful data to the Testbed server. We tested the following sensors: Accelerometer, Humidity, Temperature, Air Quality, Infrared Motion, RGB, and Audio. All of the sensors worked as expected except audio, which kept returning noisy, high-entropy values. We tested the audio sensors thoroughly by introducing extremely loud stimuli for short periods and maintaining a near-silent environment between the stimuli. However, the streamed values showed no significant shift between the loud and silent periods. We are still investigating this issue.

3. **Loguru Logging**: We also integrated the Loguru library into our Testbed server to produce more organized script output. Previously, our output logs were unorganized, which made it slow to find specific log entries. With Loguru, we now have an organized, timestamped way to log and debug the scripts. We also added a function that downloads old log files locally and clears them from the server once they are no longer needed.

4. **GitHub**: We completed the installation of all Maestro dependencies and wrote the necessary documentation. The documentation was an important focus this week because it lets newcomers understand how the working Maestros were set up. It also gives others in the community a reference for replicating this experiment.

== Week 5