Changes between Version 28 and Version 29 of Other/Summer/2024/lLM
Timestamp: Jul 30, 2024, 4:36:37 PM
1. **Research Paper**: We read and annotated a Google DeepMind paper on Weight Averaged Reward Models (WARM), an approach to training reward models that mitigates reward hacking. The paper discusses the advantages of WARM over more traditional methods such as ensembling: an ensemble averages the outputs of several individual models, whereas WARM averages the weights and biases of multiple models into a single model, which produces a single output. We aim to present this paper to Dr. Ortiz and team at our weekly meeting next Tuesday.

2. **Sensor Testing**: We also spent a considerable portion of our time testing the Maestro sensor suite and verifying that all the sensors can transmit meaningful data to the Testbed server. We tested the following sensors: Accelerometer, Humidity, Temperature, Air Quality, Infrared Motion, RGB, and Audio. All of the sensors worked as expected except Audio, which kept returning high-entropy values. We tested the audio sensors thoroughly by introducing extremely loud stimuli for short periods and maintaining a near-silent environment between stimuli. However, the streamed values showed no significant shift. This is currently a work in progress.

3. **Loguru Logging**:

== Week 5