Changes between Version 29 and Version 30 of Other/Summer/2024/lLM


Timestamp:
Jul 30, 2024, 6:50:34 PM (4 months ago)
Author:
talati
Comment:

  • Other/Summer/2024/lLM

    v29 v30  
    125 125 1. **Research Paper**: We read and annotated a Google DeepMind paper on Weight Averaged Reward Models (WARM), an approach to training reward models that is designed to mitigate reward hacking. The paper compares WARM with prediction ensembling, which runs several individual reward models and averages their outputs. WARM instead averages the weights of several fine-tuned reward models to form a single model, so it produces one output from one forward pass. We aim to present this paper to Dr. Ortiz and the team at our weekly meeting next Tuesday.
    126 126
    127 2. **Sensor Testing**: We also spent a considerable portion of our time testing out the Maestro sensor suite, and verifying that all the sensors can transmit meaningful data to the Testbed server. We tested the following sensors: Accelerometer, Humidity, Temperature, Air Quality, Infrared Motion, RGB, as well as Audio. All of the sensors were working optimally, except audio, which kept returning values containing high entropy. We thoroughly tested the audio sensors by introducing extremely loud stimuli for a short period of time, and maintaining a near silent environment in between the stimuli. However, the streamed values showed no significant shift. This is currently a work in progress.
     127 2. **Sensor Testing**: We also spent a considerable portion of our time testing the Maestro sensor suite and verifying that every sensor can transmit meaningful data to the Testbed server. We tested the following sensors: Accelerometer, Humidity, Temperature, Air Quality, Infrared Motion, RGB, and Audio. All of the sensors worked as expected except audio, which kept returning noisy, high-entropy values. We tested the audio sensors by introducing extremely loud stimuli for short periods and keeping the environment nearly silent between them. However, the streamed values showed no significant shift. This is still a work in progress.
    128 128
    129 3. **Loguru Logging**:
     129 3. **Loguru Logging**: We also integrated the Loguru library into our Testbed server to produce more organized script output. Previously, our output logs were unorganized, which made it slow to find specific log entries. With Loguru, we now have an organized, timestamped way to log and debug the scripts. We also added a function that downloads old log files locally and clears them once they are no longer needed.
     130
     131 4. **GitHub**: We completed the installation of all Maestro dependencies and wrote the necessary documentation. The documentation was an important focus this week because it helps newcomers understand how the working Maestros came to be. It also gives others in the community a reference for replicating this experiment.
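
To make the contrast in item 1 concrete, here is a minimal sketch of prediction ensembling versus weight averaging. It is not the paper's code: the reward models are toy linear models, and `predict`, `ensemble_reward`, and `weight_average` are hypothetical names chosen for illustration.

```python
from statistics import fmean

# Toy reward model (assumption): a linear model stored as (weights, bias).

def predict(model, x):
    """Return the scalar reward of one model for feature vector x."""
    weights, bias = model
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def ensemble_reward(models, x):
    """Prediction ensembling: run every model, then average the outputs."""
    return fmean(predict(m, x) for m in models)

def weight_average(models):
    """Weight averaging: average the parameters into one single model."""
    weights = [fmean(ws) for ws in zip(*(m[0] for m in models))]
    bias = fmean(m[1] for m in models)
    return weights, bias

models = [([1.0, 2.0], 0.5), ([3.0, 0.0], -0.5)]
x = [2.0, 1.0]
merged = weight_average(models)
print(ensemble_reward(models, x), predict(merged, x))
```

For linear models the two results coincide, which is why the toy example prints the same number twice. For nonlinear networks they generally differ, and the practical difference is that ensembling needs one forward pass per model while the weight-averaged model needs only one.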
     132
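
The loud-versus-silent audio test in item 2 could be automated along the following lines. This is only a sketch under stated assumptions: the audio stream is assumed to arrive as a list of numeric amplitude samples, and `rms`, `responds_to_stimulus`, and the `min_ratio` threshold are hypothetical, not part of the Maestro code.

```python
import math

def rms(samples):
    """Root-mean-square deviation of the samples around their mean."""
    mean = sum(samples) / len(samples)
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))

def responds_to_stimulus(quiet, loud, min_ratio=2.0):
    """Return True if the loud window is clearly stronger than the quiet one.

    min_ratio is an assumed threshold; a real sensor would need calibration.
    """
    quiet_level = rms(quiet)
    loud_level = rms(loud)
    if quiet_level == 0:
        # A perfectly flat quiet window: any variation at all counts as a response.
        return loud_level > 0
    return loud_level / quiet_level >= min_ratio

quiet_window = [0, 1, -1, 0]
loud_window = [10, -10, 12, -12]
print(responds_to_stimulus(quiet_window, loud_window))
```

A sensor that only streams noise would give similar levels for both windows, so the check would return False, matching the "no significant shift" observation above.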
    130 133
    131 134 == Week 5
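
The log clean-up step in item 3 might look roughly like the sketch below. It uses only the Python standard library and does not reproduce our actual function: `archive_and_clear_logs`, the `*.log` pattern, the archive directory, and the seven-day default are all assumptions made for illustration.

```python
import shutil
import time
from pathlib import Path

def archive_and_clear_logs(log_dir, archive_dir, max_age_days=7, now=None):
    """Move *.log files older than max_age_days from log_dir to archive_dir.

    Returns the names of the files that were moved. The optional `now`
    argument (seconds since the epoch) exists to make the function testable.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(log_dir).glob("*.log")):
        if path.stat().st_mtime < cutoff:
            # Moving keeps a local copy and removes the file from log_dir.
            shutil.move(str(path), str(archive / path.name))
            moved.append(path.name)
    return moved
```

Loguru itself can also handle part of this: `logger.add` accepts `rotation` and `retention` arguments, so a manual function like this is mainly useful when old logs should be kept in a separate location instead of being deleted.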