
AI For Behavioral Discovery

Team: Adarsh Narayanan (UG), Benjamin Yu (UG), Elias Xu (HS), Shreyas Musuku (HS)

Advisors: Dr. Richard Martin and Dr. Richard Howard


Project Description & Goals:

The past 40 years have seen an enormous increase in man-made Radio Frequency (RF) radiation. However, the possible small and long-term impacts of RF radiation are not well understood. This project seeks to discover whether RF exposure impacts animal behavior. In this experimental paradigm, animals are subjected to RF exposure while their behaviors are video recorded. Deep Neural Networks (DNNs) are then tasked with correctly classifying whether a video contains exposure to RF. This uses DNNs as powerful pattern-discovery tools, in contrast to their traditional uses in classification and generation. The project involves evaluating the accuracy of a number of DNN architectures on pre-recorded videos, as well as describing any behavioral patterns found by the DNNs.


Weekly Progress:

Week 1 - https://docs.google.com/presentation/d/1oZaaNaLMyTjMO3_yzCU-_ruVrwQr0rTuVrc27HG6WAo/edit?usp=share_link

  • Created synthetic data to train a model to perform binary classification of a single bee's flight home, based on a linear vs. curved path (curvature indicating the presence of a distortion field); see the sketch below.
    • Gathered further insight into how data preparation and model training can work for the real dataset.
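
A minimal sketch of how such synthetic trajectories can be generated, assuming a straight flight to a home at the origin that the distortion field bends into a curve; the function name and amplitudes are hypothetical, not the project code:

{{{
import numpy as np

def make_trajectory(n_points=50, field_on=False, rng=None):
    """Hypothetical generator: one synthetic bee flight from a random
    start to the home at the origin. Field off -> straight line;
    field on -> a perpendicular sinusoidal bow curves the path."""
    rng = rng or np.random.default_rng()
    start = rng.uniform(-1.0, 1.0, size=2)        # random start position
    t = np.linspace(0.0, 1.0, n_points)[:, None]  # progress along the flight
    path = (1.0 - t) * start                      # straight line toward home (0, 0)
    if field_on:
        perp = np.array([-start[1], start[0]])    # direction perpendicular to the flight
        perp /= np.linalg.norm(perp) + 1e-9
        path += 0.3 * np.sin(np.pi * t) * perp    # smooth bow, zero at both ends
    return path                                   # shape (n_points, 2)

# Label 1 = field on (curved path), label 0 = field off (straight path)
labels = np.random.randint(0, 2, size=200)
samples = [(make_trajectory(field_on=bool(y)), y) for y in labels]
}}}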

Week 2 - https://docs.google.com/presentation/d/1BebSXbCDB7Z3yCCVYtAP1WkcEWYFKNNOK1TcxBEtSrs/edit?usp=share_link

(Images: software pipeline; simulation without distortion field; simulation with distortion field)


Week 3 - https://docs.google.com/presentation/d/1gv2Mb9vWc3VottF-gdk-rGUH5jPb285WYxMsCsw6OnI/edit?usp=share_link

1 frame per sample:

(Images: fold 1 and fold 2 results, 1 frame per sample)

4 frames per sample:

(Images: fold 1 and fold 2 results, 4 frames per sample)

Confusion matrices (biased toward class 1, where the field is on). The model overfit, which is expected because the scenario the data emulates is oversimplified for such a complex model:

(Images: confusion matrices)
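
The 1 vs. 4 frames-per-sample comparison above groups consecutive video frames into fixed-length samples. A minimal sketch of that grouping, assuming frames arrive as a single array (the helper name is ours, not project code):

{{{
import numpy as np

def frames_to_samples(frames, frames_per_sample=4):
    """Group consecutive video frames into fixed-length samples.

    frames: array of shape (n_frames, H, W). Trailing frames that do
    not fill a complete sample are dropped."""
    n = (len(frames) // frames_per_sample) * frames_per_sample
    if n == 0:
        raise ValueError("not enough frames for a single sample")
    # -> shape (n_samples, frames_per_sample, H, W)
    return np.stack(np.split(frames[:n], n // frames_per_sample))
}}}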


Week 4 - https://docs.google.com/presentation/d/1v5lVYUB6YdxdCAED8_bN5KRCGSDFeQI9UTT7IYCR_as/edit?usp=sharing

Calculated a hypothetical decision boundary.

If our hypothetical recall ≈ actual recall ⇒ the hypothetical decision boundary might actually represent the model's decision boundary.

Class 0 hypothetical recall (between curves / total actual class 0): ~0.607
Actual class 0 recall (tested on 500 samples): ~0.896

Since the two values differ substantially, the hypothesized boundary likely does not capture what the model actually learned.
(Image: up/down trajectory plot)

Gathered Grad-CAM heat maps.

→ Gave us insight and confirmation that the home and the bee were being used as features (a minimal sketch of the technique follows the heat map).

(Image: Grad-CAM heat map)
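
Grad-CAM weights a convolutional layer's activation maps by the gradient of the target class score, highlighting the input regions the model relies on. A minimal PyTorch sketch, assuming a CNN classifier; the function and its defaults are ours, not the exact project code:

{{{
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, class_idx):
    """Minimal Grad-CAM: capture a conv layer's activations and the
    gradient of the class score, average gradients into channel
    weights, then ReLU the weighted sum and upsample to input size.

    x: input batch of shape (1, C, H, W)."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    try:
        score = model(x)[0, class_idx]
        model.zero_grad()
        score.backward()
    finally:
        h1.remove(); h2.remove()
    w = grads["g"].mean(dim=(2, 3), keepdim=True)           # per-channel weights
    cam = F.relu((w * acts["a"]).sum(dim=1, keepdim=True))  # weighted activation sum
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-9)).squeeze()             # normalized heat map

# e.g., for a ResNet18: heat = grad_cam(model, img, model.layer4[-1], pred_class)
}}}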


Week 5 - https://docs.google.com/presentation/d/1zLVgiYbtt4TZ1SGsb9uq4mg7eFy8aJ9Eb-sj8c6GtM4/edit?usp=sharing

Week 4's dataset results (left - 1 frame/sample, right - 4 frames/sample)

(Images: week 4 results, 1 frame/sample and 4 frames/sample)

First iteration of the radial dataset: radial entries, 200 entries per side, normalized vectors, a fixed center home, fixed field magnitude, and 4 possible field directions → significantly improved the location distribution of both classes (a generation sketch follows the images).

(left - trajectory map, right - sped-up video of radial simulation)

(Images: radial trajectory map; sped-up simulation GIF)
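
A minimal sketch of the radial generation idea under the constraints listed above (normalized start vectors on a circle, fixed home at the center, fixed field magnitude, 4 field directions); names and magnitudes are hypothetical, not the project's simulation code:

{{{
import numpy as np

FIELD_DIRS = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)

def radial_trajectory(n_points=60, field_on=False, field_mag=0.25, rng=None):
    """One radial entry: the bee flies inward from a random point on the
    unit circle to the home at the center. With the field on, a fixed-
    magnitude push along one of 4 directions distorts the path."""
    rng = rng or np.random.default_rng()
    angle = rng.uniform(0.0, 2.0 * np.pi)
    start = np.array([np.cos(angle), np.sin(angle)])  # normalized start vector
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    path = (1.0 - t) * start                          # straight radial approach
    if field_on:
        d = FIELD_DIRS[rng.integers(len(FIELD_DIRS))]
        path += field_mag * np.sin(np.pi * t) * d     # bowed displacement
    return path

# Illustrative balanced dataset: 200 entries for each class
dataset = [(radial_trajectory(field_on=bool(c)), c) for c in [0, 1] * 200]
}}}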

First radial iteration's training results (left - 1 frame/sample, right - 4 frames/sample)

(Images: first radial iteration results, 1 frame/sample and 4 frames/sample)

Grad-CAM plot for one of the layers - the model seems to be focused on the background instead of the bee

(Image: first radial iteration Grad-CAM)

→ Similar accuracies between 1 and 4 frames/sample could suggest that the model is not learning motion / the sequence of frames


Week 6 - https://docs.google.com/presentation/d/1KhvYrNr8cXDZuFGPOcjqMmSxNhW-MUsw3jaT43cRWME/edit?usp=sharing

Discovered underlying issues with the simulation data that contribute noise and possibly mislead the model.

Issues include: numerous frames without the bee present (left) and multi-frame batches where a sample contains parts of 2 trajectories, rather than just a single trajectory (right).

(Images: a blank frame without the bee; a sample spanning two trajectories)

Solutions: remove erroneous frames (where the bee is not present), then find the frame count per trajectory and shorten it to a multiple of the sample frame length. E.g., if frames per sample is 4, trim each trajectory to a multiple of 4 so that no sample ever overlaps two trajectories (see the sketch below).
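
A minimal sketch of that cleanup, assuming trajectories arrive as lists of frames; `bee_present` is a hypothetical detector stub, not the project's actual check:

{{{
import numpy as np

def bee_present(frame, thresh=10):
    """Hypothetical check: any sufficiently bright pixel in the
    (background-subtracted) frame."""
    return np.any(frame > thresh)

def trim_trajectories(trajectories, frames_per_sample=4):
    """Drop bee-less frames, then trim each trajectory to a multiple of
    frames_per_sample so that no sample straddles two trajectories."""
    cleaned = []
    for traj in trajectories:
        frames = [f for f in traj if bee_present(f)]         # remove blank frames
        n = (len(frames) // frames_per_sample) * frames_per_sample
        if n:                                                # keep only whole samples
            cleaned.append(frames[:n])
    return cleaned
}}}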


Week 7 - https://docs.google.com/presentation/d/190Q4dk5QiZkNhdKbVkTFXgQYj4Q9lRnhFJMPzQQJzss/edit?usp=sharing

Prepared the Raspberry Pi and scripts to begin camera data collection for ants & honeybees. Created a script that uses multiprocessing & OpenCV to quickly convert .h264 recordings to .mp4 and create the dataset.csv (a sketch of the idea follows).
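
The actual script is not shown on this page; a minimal sketch of the same idea, with a hypothetical recordings/ directory and CSV columns, using one worker per video via a process pool:

{{{
import csv
import multiprocessing as mp
from pathlib import Path

import cv2

def convert(h264_path):
    """Re-container one .h264 recording as .mp4 and return a metadata row."""
    src = Path(h264_path)
    dst = src.with_suffix(".mp4")
    cap = cv2.VideoCapture(str(src))
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0            # raw .h264 may not report fps
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(str(dst), cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    n = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out.write(frame)
        n += 1
    cap.release(); out.release()
    return [dst.name, n, fps]

if __name__ == "__main__":
    videos = sorted(Path("recordings").glob("*.h264"))  # hypothetical directory
    with mp.Pool() as pool:
        rows = pool.map(convert, map(str, videos))
    with open("dataset.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "frames", "fps"])
        writer.writerows(rows)
}}}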

Created the 2nd iteration of the radial simulation, with many major improvements.

Improvements include: increased frames/sample from 4 to 7 (more frames for the model to learn the curve), removed frames where the bee was not present, made each trajectory a multiple of the frames/sample (no overlapping trajectories), and improved the overall trajectory distribution (see trajectory plot).

(Image: 2nd-iteration radial trajectory plot)

Poor/abnormal training results: severe under-fitting, with testing accuracy higher than training accuracy.

Second radial iteration's training results (left - 1 frame/sample, right - 7 frames/sample)

(Images: second radial iteration results, 1 frame/sample and 7 frames/sample)


Week 8 - https://docs.google.com/presentation/d/1KR-OetxvyUYUJDXgIzFuEu50IlTqxxUanz4t6_TT648/edit?usp=sharing

The setup for data collection. Features: Helmholtz coils to generate a uniform magnetic field, overhead camera positioning, a Raspberry Pi, and a small room with steady/constant lighting conditions.

(Images: data collection setup)
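
For reference, the standard textbook expression (not taken from the project page) for the nearly uniform field at the center of a Helmholtz pair, where each coil has N turns carrying current I and the coil radius R equals the coil separation:

{{{
B = \left(\frac{4}{5}\right)^{3/2} \frac{\mu_0 N I}{R}
}}}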

Data collection of ants (grayscale)

(Image: grayscale ant recording)

Created the 3rd iteration of the radial simulation. Improvements: reduced banding within class trajectories, randomized speeds (3 options), and greater distortion → improved overall distribution to reduce potential shortcut learning. (top - 2nd iteration, bottom - 3rd iteration (latest))

3rd iteration radial trajectory

Training details: used an 80-20 split instead of K-fold to avoid data leakage (a split sketch follows the plots). There is still a degree of under-fitting. (left - AlexNet, right - ResNet18)

(Images: week 8 AlexNet and ResNet18 results)
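
A minimal sketch of a leakage-safe 80-20 split, assuming each sample carries a trajectory ID so that near-identical frames from one trajectory can never land on both sides; the helper is hypothetical, not the project code:

{{{
import numpy as np

def split_by_trajectory(samples, traj_ids, test_frac=0.2, rng=None):
    """80-20 split at the trajectory level: every sample from a given
    trajectory lands on the same side of the split.

    samples: array of shape (n_samples, ...); traj_ids: n_samples IDs."""
    rng = rng or np.random.default_rng(0)
    ids = np.unique(traj_ids)
    test_ids = set(rng.choice(ids, size=int(len(ids) * test_frac), replace=False))
    test_mask = np.array([t in test_ids for t in traj_ids])
    return samples[~test_mask], samples[test_mask]  # train, test
}}}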

Worst classification frames (below). The model seems to confuse areas where the bee moves in a relatively linear path, in both classes.


Week 9 - https://docs.google.com/presentation/d/1ND2mShKl7sqPFndrb6Z6vwcD20V-8ldVLMqPlqIxgOg/edit?usp=sharing

Completed the new video sampler script. It now runs as a single sbatch job and can process a 49 GB tar of 354 videos in about 10 hours.

Ran a control training test (compass data). Obtained high accuracy → successful workflow test

Ran training on the ant data. Suspiciously high accuracy suggests possible overfitting; Grad-CAM plots suggest the model is looking at the background instead of the ants. There could be a lighting issue in the apparatus.

(Images: ant setup; ant Grad-CAM)

Added background subtraction to the honeybee data, which better matches the simulated dataset. Subtraction should remove noise → improve model accuracy (a sketch follows the clips).

(Images: original bee clip; background-subtracted bee clip)
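
A minimal sketch of OpenCV background subtraction as applied to a bee clip; the filename and MOG2 parameters are illustrative, not the project's settings:

{{{
import cv2

# MOG2 models the static background and flags moving pixels as foreground
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=False)

cap = cv2.VideoCapture("bee.mp4")   # hypothetical filename
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                 # foreground (moving bee) mask
    fg = cv2.bitwise_and(frame, frame, mask=mask)  # bee on a black background
cap.release()
}}}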

Last week's high-distribution simulation data training results:

(Image: varying-speed simulation results)

Compared to the less-distributed data (~90% accuracy), the model seems to be learning by location rather than by motion, as accuracy drops when the location distribution improves.

→ Performed a control test (see below for results, ~55%). Used a single pixel to represent the bee instead of a blue ellipse. This removes the orientation angle and other potential features, so that ideally only location and motion remain as learnable features.

(Image: single-pixel control results)

Further control tests with the single-pixel data: Naive Bayes probability classifier (~55%; given x, y coordinates only) and decision tree model (~97%; given x, y coordinates only). The decision tree's decision boundary is plotted below:

(Image: decision tree decision boundary)

→ Suggests that the location decision boundary is more complex than we thought: either the model is using a different feature, or the decision boundary for location is highly complex (a sketch of the control comparison follows).
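
A minimal sketch of the control comparison, assuming only (x, y) coordinates as input; the arrays here are random placeholders standing in for the real single-pixel data, so the printed scores will not match the reported ~55% / ~97%:

{{{
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: replace with the real (x, y) positions and field labels
X = np.random.uniform(-1, 1, size=(1000, 2))
y = np.random.randint(0, 2, size=1000)

for clf in (GaussianNB(), DecisionTreeClassifier()):
    clf.fit(X[:800], y[:800])                          # 80-20 split
    print(type(clf).__name__, clf.score(X[800:], y[800:]))
}}}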
