{{{ }}}
== **AI For Behavioral Discovery**\\
**Team**: Adarsh Narayanan^UG^, Benjamin Yu^UG^, Elias Xu^HS^, Shreyas Musuku^HS^
**Advisors**: Dr. Richard Martin and Dr. Richard Howard
----
**Project Description & Goals**:\\
The past 40 years have seen enormous increases in man-made Radio Frequency (RF) radiation. However, the possible subtle and long-term impacts of RF radiation are not well understood. This project seeks to discover whether RF exposure affects animal behavior. In this experimental paradigm, animals are subjected to RF exposure while their behaviors are video recorded. Deep Neural Networks (DNNs) are then tasked with classifying whether a video contains RF exposure or not. This uses DNNs as powerful pattern-discovery tools, in contrast to their traditional uses in classification and generation. The project involves evaluating the accuracies of a number of DNN architectures on pre-recorded videos, as well as describing any behavioral patterns found by the DNNs.
----
**Weekly Progress**:\\
**Week 1** - https://docs.google.com/presentation/d/1oZaaNaLMyTjMO3_yzCU-_ruVrwQr0rTuVrc27HG6WAo/edit?usp=share_link
- Created synthetic data to train a model to perform binary classification based on the linear vs. curved path (due to the presence of a distortion field) of a single bee's flight home.
- Gathered further insight into how data preparation and model training can work for the real dataset.
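The Week 1 synthetic setup can be sketched roughly as below. The function name, the perpendicular-offset form of the distortion, and all parameter values are illustrative assumptions, not the project's actual generator; the idea is just that class 1 paths bend while class 0 paths stay straight.

```python
import numpy as np

# Hypothetical sketch: a bee flies from a start point to home; when the
# distortion field is on, the path bends perpendicular to the flight line.
def bee_trajectory(start, home, n_steps=50, field_on=False, bend=0.4):
    """Return an (n_steps, 2) path from start to home; curved when field_on."""
    start, home = np.asarray(start, float), np.asarray(home, float)
    t = np.linspace(0.0, 1.0, n_steps)[:, None]
    path = (1 - t) * start + t * home          # straight-line baseline
    if field_on:
        direction = home - start
        perp = np.array([-direction[1], direction[0]])
        perp = perp / np.linalg.norm(perp)     # unit vector perpendicular to flight
        # offset vanishes at both endpoints, peaks mid-flight
        path = path + bend * (4 * t * (1 - t)) * perp
    return path

straight = bee_trajectory((0, 0), (10, 0), field_on=False)  # class 0 sample
curved = bee_trajectory((0, 0), (10, 0), field_on=True)     # class 1 sample
```

Rendering many such paths (with randomized starts and bend magnitudes) into frames would produce the kind of labeled video dataset described above.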
----
**Week 2** - https://docs.google.com/presentation/d/1BebSXbCDB7Z3yCCVYtAP1WkcEWYFKNNOK1TcxBEtSrs/edit?usp=share_link
[[Image(Untitled drawing.png, 20%)]] [[Image(dummy.0.0.gif, 20%)]] [[Image(dummy.8.0.gif, 20%)]]
----
**Week 3** - https://docs.google.com/presentation/d/1gv2Mb9vWc3VottF-gdk-rGUH5jPb285WYxMsCsw6OnI/edit?usp=share_link
1 frame per sample: [[Image(Figure_1.png, 20%)]] [[Image(Figure_2.png, 20%)]]
4 frames per sample: [[Image(Figure_1 (1).png, 20%)]] [[Image(Figure_2 (1).png, 20%)]]
Confusion matrices (biased toward class 1, where the field is on). The model overfit, which was expected: the scenario the data emulates is oversimplified relative to the model's capacity. [[Image(Screenshot 2024-06-13 at 2.18.28 PM.png, 20%)]] [[Image(Screenshot 2024-06-13 at 2.18.47 PM.png, 20%)]]
----
**Week 4** - https://docs.google.com/presentation/d/1v5lVYUB6YdxdCAED8_bN5KRCGSDFeQI9UTT7IYCR_as/edit?usp=sharing
Calculated a hypothetical decision boundary. If our hypothetical recall roughly matches the actual recall, the hypothetical boundary may approximate the model's true decision boundary.
Class 0 hypothetical recall (between curves / total actual class 0): 0.6070519810977826\\
Actual class 0 recall (tested on 500 samples): ~0.896\\
[[Image(unnamed.png, 20%)]]
Gathered Grad-CAM heat maps, which gave us insight and confirmation that the home and the bee were being used as features. [[Image(unnamed-2.png, 20%)]]
----
**Week 5** - https://docs.google.com/presentation/d/1zLVgiYbtt4TZ1SGsb9uq4mg7eFy8aJ9Eb-sj8c6GtM4/edit?usp=sharing
Week 4's dataset results (left - 1 frame/sample, right - 4 frames/sample) [[Image(unnamed-3.png, 20%)]] [[Image(unnamed-4.png, 20%)]]
First iteration of the radial dataset - radial entries, 200 entries per side, normalized vectors, fixed center home, fixed field magnitude, 4 possible field directions -> significantly improved the location distribution of both classes (left - trajectory map, right - sped-up video of the radial simulation) [[Image(unnamed-5.png,
20%)]] [[Image(2024-01-01070811.9.0-ezgif.com-video-to-gif-converter.gif, 20%)]]
First radial iteration's training results (left - 1 frame/sample, right - 4 frames/sample) [[Image(unnamed-6.png, 20%)]] [[Image(unnamed-7.png, 20%)]]
Grad-CAM plot for one of the layers - the model seems to focus on the background instead of the bee. [[Image(unnamed-8.png, 30%)]]
-> Similar accuracies between 1 and 4 frames/sample suggest that the model may not be learning motion across the sequence of frames.
----
**Week 6** - https://docs.google.com/presentation/d/1KhvYrNr8cXDZuFGPOcjqMmSxNhW-MUsw3jaT43cRWME/edit?usp=sharing
Discovered underlying issues with the simulation data that add noise and may mislead the model: numerous frames without the bee present (left), and multi-frame batches where a sample contains parts of 2 trajectories rather than a single trajectory (right). [[Image(unnamed-9.png, 20%)]] [[Image(Screenshot 2024-07-22 at 11.34.16 AM.png, 42%)]]
Solutions: remove erroneous frames (where the bee is not present), then find the frame count per trajectory and shorten it to a multiple of the sample frame length. E.g., if there are 4 frames per sample, shorten each trajectory to a multiple of 4 so that a sample never overlaps two trajectories.
----
**Week 7** - https://docs.google.com/presentation/d/190Q4dk5QiZkNhdKbVkTFXgQYj4Q9lRnhFJMPzQQJzss/edit?usp=sharing
Preparing the Raspberry Pi and scripts to begin collecting camera data for ants & honeybees. Created a script that uses multiprocessing & OpenCV to quickly convert .h264 to .mp4 and create the dataset.csv.
Created the 2nd iteration of the radial simulation, with several major improvements: increased frames/sample from 4 to 7 (more frames for the model to learn curvature), removed frames where the bee was not present, made each trajectory a multiple of the frames/sample (no overlapping trajectories), and improved the overall trajectory distribution (see trajectory plot). [[Image(unnamed-10.png, 30%)]]
Poor/abnormal training results - severe under-fitting.
The testing accuracy is higher than the training accuracy.
Second radial iteration's training results (left - 1 frame/sample, right - 7 frames/sample) [[Image(unnamed-11.png, 30%)]] [[Image(unnamed-12.png, 30%)]]
----
**Week 8** - https://docs.google.com/presentation/d/1KR-OetxvyUYUJDXgIzFuEu50IlTqxxUanz4t6_TT648/edit?usp=sharing
The setup for data collection. Features: Helmholtz coils to generate a uniform magnetic field, overhead camera positioning, a Raspberry Pi, all inside a small room with steady/constant lighting conditions. [[Image(unnamed-2.jpg, 25%)]] [[Image(unnamed.jpg, 25%)]]
Data collection of ants (grayscale): [[Image(unnamed-13.png, 23%)]]
Created the 3rd iteration of the radial simulation. Improvements: reduced banding within class trajectories, randomized speeds (3 options), and greater distortion -> improved overall distribution to reduce potential shortcut learning. (top - 2nd iteration, bottom - 3rd iteration (latest)) [[Image(Screenshot 2024-07-22 at 12.12.52 PM.png, 45%)]]
Training details: used an 80-20 split instead of K-fold to avoid data leakage. There is still a degree of under-fitting. (left - AlexNet, right - ResNet18) [[Image(unnamed-14.png, 34%)]] [[Image(unnamed-15.png, 37%)]]
Worst-classified frames (below). The model seems to confuse regions where the bee moves in a relatively linear path, for both classes.
----
**Week 9** - https://docs.google.com/presentation/d/1ND2mShKl7sqPFndrb6Z6vwcD20V-8ldVLMqPlqIxgOg/edit?usp=sharing
Completed the new video sampler script. It now runs as a single sbatch job and can process a 49 GB tar (354 videos) in about 10 hours.
Ran a control training test (compass data). Obtained high accuracy -> successful workflow test.
Ran training on the ant data. Suspiciously high accuracy; possible overfitting. Grad-CAM plots suggest the model is looking at the background instead of the ants. There could be a lighting issue in the apparatus.
[[Image(Screenshot 2024-07-29 at 2.49.53 PM.png, 34%)]] [[Image(Screenshot 2024-07-29 at 2.48.26 PM.png, 64%)]]
Added background subtraction to the honeybee data. This better matches the simulated dataset, and the subtraction should remove noise -> improved model accuracy. [[Image(no-background-subtract-ezgif.com-video-to-gif-converter.gif, 40%)]] [[Image(background-subtract-ezgif.com-video-to-gif-converter.gif, 40%)]]
Last week's high-distribution simulation data training results: [[Image(unnamed-17.png, 50%)]]
Compared to the less-distributed data (~90% accuracy), the model seems to be learning location rather than motion, since accuracy drops when the location distribution improves. -> Performed a control test (see below for results, ~55%): used a single pixel to represent the bee instead of a blue ellipse. This removes the orientation angle and other potential features, so ideally only location and motion remain as learnable features. [[Image(unnamed-20.png, 45%)]]
Further control tests with the single-pixel data: Naive Bayes probability classifier (~55%; given x, y coordinates only), decision tree model (~97%; given x, y coordinates only). The decision tree's decision boundary is plotted below: [[Image(unnamed-19.png, 45%)]]
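The background-subtraction step added to the honeybee data above can be illustrated with a minimal median-frame approach. This is only a sketch under the assumption of a static camera (so the per-pixel temporal median approximates the background); the actual pipeline may use OpenCV's built-in subtractors instead.

```python
import numpy as np

# Minimal background subtraction sketch: estimate the static background as the
# per-pixel median over time, then keep only the absolute deviation from it.
def subtract_background(frames: np.ndarray) -> np.ndarray:
    """frames: (T, H, W) grayscale video; returns foreground frames."""
    background = np.median(frames, axis=0)  # moving animal rarely occupies a pixel
    fg = np.abs(frames.astype(np.int16) - background.astype(np.int16))
    return fg.astype(np.uint8)

# Toy check: one bright "bee" pixel moving across an otherwise constant scene.
video = np.full((10, 8, 8), 100, dtype=np.uint8)
for t in range(10):
    video[t, 4, t % 8] = 200  # moving bright spot on row 4
fg = subtract_background(video)
# In fg, only the moving pixel remains nonzero in each frame.
```

A design note: the median is preferred over the mean here because a briefly occupied pixel barely shifts the median, so the moving animal does not bleed into the background estimate.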
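The single-pixel control tests above can be recreated in spirit with a small sketch. The dataset below is an assumption standing in for the banded simulator output: class membership depends only on position (which quadrant the point lies in), a pattern that Naive Bayes' independent per-axis Gaussians cannot separate but a shallow decision tree splits easily. This mirrors how one model class can exploit positional structure that another misses; the project's actual accuracies (~55% vs ~97%) came from the real simulated coordinates, not this toy.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical single-pixel control: models see only (x, y) coordinates.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(4000, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)  # class depends on quadrant

Xtr, ytr, Xte, yte = X[:3000], y[:3000], X[3000:], y[3000:]
nb_acc = GaussianNB().fit(Xtr, ytr).score(Xte, yte)          # near chance
tree_acc = DecisionTreeClassifier(max_depth=3).fit(Xtr, ytr).score(Xte, yte)
```

If location alone yields high decision-tree accuracy on coordinate-only inputs, that is evidence of positional shortcut learning rather than motion learning, which is the conclusion the control test above supports.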