Real time, robust and reliable (R3) machine learning over wireless networks

Group Members: Akshar Vedantham, Kirthana Ram, Varun Kota

Advisor: Anand Sarwate

Project Overview

As machine learning applications continue to be developed, more and more computationally intensive tasks will have to be performed on mobile devices such as phones, cars, and drones. Mobile devices often offload data to the cloud to help execute these applications. However, offloading this work to the cloud can introduce delays and added latency.

To reduce latency when working with the cloud, several methods have been proposed. The two that we will focus on are split computing, which divides a neural network between the device and a server, and early exiting, which lets a confident prediction leave the network at an intermediate layer instead of running it to the end. Our goal is to construct AI/ML algorithms, implement them on ORBIT nodes using split computing and early exiting, and build a documented codebase while evaluating the efficiency of these algorithms.
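
As a rough illustration of these two ideas, below is a minimal PyTorch sketch of an early-exit network whose first block could run on the device and whose remaining layers could run on a server (split computing). The class name, layer sizes, and threshold are illustrative only, not the project's final architecture.

{{{#!python
# Minimal early-exit / split-computing sketch (illustrative only; the class
# name, layer sizes, and threshold are our own choices, not the final design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        self.threshold = threshold
        # "Head" layers that could run on the mobile device.
        self.head = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                  nn.MaxPool2d(2))
        self.exit1 = nn.Linear(16 * 16 * 16, num_classes)   # early-exit classifier
        # "Tail" layers that could run on the server / cloud node.
        self.tail = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                                  nn.MaxPool2d(2))
        self.exit2 = nn.Linear(32 * 8 * 8, num_classes)      # final classifier

    def forward(self, x):                                    # x: (batch, 3, 32, 32)
        h = self.head(x)
        early_logits = self.exit1(h.flatten(1))
        confidence = F.softmax(early_logits, dim=1).max(dim=1).values
        # If the early branch is confident enough, skip the tail entirely.
        if confidence.min() >= self.threshold:
            return early_logits, True
        return self.exit2(self.tail(h).flatten(1)), False
}}}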

Weekly Progress

Week 1 (5/28 - 5/30)

  • Phones, cars, and other mobile devices will increasingly want to run ML/AI applications
  • These devices can leverage the cloud to help execute them
  • Issues: latency and security
  • Possible solution: early exiting

Week 2 (6/03 - 6/06)

  • Familiarized ourselves with machine learning concepts: PyTorch, neural network architecture, gradient descent, cost functions, and weights and biases (a small gradient-descent sketch follows this list)
  • Met with our advisors, learned about their work, and discussed what projects we wanted to work on
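
As a refresher on the gradient descent, cost function, and weight/bias ideas above, here is a tiny PyTorch sketch in the spirit of what we practiced (not project code): it fits a line with one weight and one bias.

{{{#!python
# Tiny gradient-descent example: fit y = 2x + 1 with one weight and one bias.
import torch

x = torch.linspace(0, 1, 50).unsqueeze(1)
y = 2 * x + 1

w = torch.zeros(1, requires_grad=True)   # weight
b = torch.zeros(1, requires_grad=True)   # bias
lr = 0.1                                 # learning rate

for step in range(500):
    cost = ((x * w + b - y) ** 2).mean() # mean-squared-error cost function
    cost.backward()                      # autograd computes d(cost)/dw, d(cost)/db
    with torch.no_grad():
        w -= lr * w.grad                 # gradient-descent update
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(w.item(), b.item())                # should approach 2 and 1
}}}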

Week 3 (6/10 - 6/13)

  • Created a neural network (NN) using the MNIST dataset (a training sketch follows this list)
  • Achieved an overall network accuracy of 98.17%
  • Worked on an NN for classifying fashion outfits via image recognition
  • Read several research papers given to us
  • Worked with ORBIT to familiarize ourselves with communication between nodes
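
The sketch below shows the kind of fully connected MNIST classifier we built in PyTorch; the layer sizes and hyperparameters are illustrative and may differ from the exact run that reached 98.17%.

{{{#!python
# Sketch of a fully connected MNIST classifier (illustrative hyperparameters).
import torch
import torch.nn as nn
from torchvision import datasets, transforms

train_data = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Flatten(),
                      nn.Linear(28 * 28, 128), nn.ReLU(),
                      nn.Linear(128, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
}}}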

Week 4 (6/17 - 6/20)

  • Created an NN using the CIFAR-10 dataset
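
For CIFAR-10 the inputs are 32x32 color images, so a small convolutional model is the natural choice. The sketch below is illustrative of that setup; the layer sizes are assumptions, and the training loop mirrors the MNIST sketch above.

{{{#!python
# Sketch of a small CNN for CIFAR-10 (layer sizes are illustrative).
import torch.nn as nn
from torchvision import datasets, transforms

cifar = datasets.CIFAR10("data", train=True, download=True,
                         transform=transforms.ToTensor())

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32 x 16 x 16
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 64 x 8 x 8
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10))
}}}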

Week 5 (6/24 - 6/27)

  • Compared the mean accuracy and standard deviation for different confidence thresholds (a sweep sketch follows this list)
  • Compared the mean number of early exits and standard deviation for different thresholds
  • Set up NVIDIA CUDA on ORBIT nodes
  • Trained and tested AI models on the nodes (an alternative to Google Colab)
  • Encountered hardware issues with measuring latency via the Precision Time Protocol (PTP)
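
Below is a sketch of how such a threshold sweep can be run on a CUDA-enabled node. The model and loader names are placeholders, the model is assumed to return both early-exit and final logits for each batch, and the reported means and standard deviations come from repeating the sweep over several runs.

{{{#!python
# Threshold-sweep sketch: for each confidence threshold, tally accuracy and the
# fraction of samples that exited early (model/loader names are placeholders).
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # CUDA on the node

def evaluate(model, loader, threshold):
    correct, exits, total = 0, 0, 0
    model.eval().to(device)
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            early_logits, final_logits = model(x)        # assumed two-exit model
            conf, pred = early_logits.softmax(1).max(1)
            take_early = conf >= threshold
            pred[~take_early] = final_logits.argmax(1)[~take_early]
            correct += (pred == y).sum().item()
            exits += take_early.sum().item()
            total += y.numel()
    return correct / total, exits / total

# Example sweep; means/standard deviations come from repeating with different
# random seeds or data splits:
# for t in [0.5, 0.7, 0.9, 0.95]:
#     acc, exit_rate = evaluate(model, test_loader, t)
}}}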

Week 6 (7/01 - 7/03)

  • Used the COSMOS sandboxes SB1 and SB2 and the main testbed (Bed)
  • Got PTP working on Bed, but it does not have a wireless connection
  • We need both PTP and a wireless connection

Week 7 (7/08 - 7/11)

  • Experimented with new exit criteria: feature variance and entropy (a sketch follows this list)
  • Feature variance represents the diversity of features detected by CNN layers (e.g., edges, textures, shapes)
  • Entropy measures the amount of disorder or uncertainty in a probability distribution (here, the model's output)
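
Below is a short sketch of how these two quantities can be computed in PyTorch; the function names are ours for illustration.

{{{#!python
# Sketch of the two extra exit criteria (function names are illustrative).
import torch
import torch.nn.functional as F

def feature_variance(feature_map):
    # feature_map: (batch, channels, H, W) activations from a CNN layer.
    # Higher variance suggests more diverse detected features
    # (edges, textures, shapes).
    return feature_map.var(dim=(1, 2, 3))

def prediction_entropy(logits):
    # Entropy of the softmax distribution: low entropy means a confident,
    # ordered prediction; high entropy means uncertainty/disorder.
    p = F.softmax(logits, dim=1)
    return -(p * torch.log(p + 1e-12)).sum(dim=1)
}}}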

Week 8 (7/15 - 7/18)

  • Evaluated the results of our normal confidence model and our feature variance entropy (FVE) model
  • Got PTP working on ORBIT, but not on COSMOS
  • Got nodes to synchronize with each other and recorded latency (a measurement sketch follows this list)
  • Was able to send and run code from one node to another via SSH
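
The sketch below shows one way to record one-way latency between two PTP-synchronized nodes: the sender timestamps each UDP packet and the receiver subtracts directly, which only makes sense because the clocks are synchronized. The IP address and port are placeholders.

{{{#!python
# One-way latency sketch between two PTP-synchronized nodes
# (IP address and port are placeholders).
import socket, struct, time

def send_probe(receiver_ip="10.0.0.2", port=5005):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(struct.pack("d", time.time()), (receiver_ip, port))

def receive_probe(port=5005):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    data, _ = sock.recvfrom(1024)
    sent_at = struct.unpack("d", data)[0]
    # Valid only because both clocks are synchronized via PTP.
    print(f"one-way latency: {(time.time() - sent_at) * 1e3:.3f} ms")
}}}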

Week 9 (7/22 - 7/25)

  • Separate models to run on multiple nodes
  • Train the models (FVE and confidence) before setting them up on the nodes
  • Resolve overfitting
  • Used SSH to have nodes access each other - very slow, and a large security/privacy risk for the client/server setup
  • Began using a RESTful API interface, which sends data over the network as JSON (a server/client sketch follows this list)
  • Orders of magnitude faster than the SSH approach
  • Models stay running and send/receive data quickly
  • More secure
  • The device cannot run or access programs on the server
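
Below is a sketch of what the REST interface can look like using Flask (the framework choice, endpoint name, and tail model here are our illustrative assumptions): the server node keeps the tail half of the split model loaded, the device posts intermediate features as JSON, and the prediction comes back in the response.

{{{#!python
# RESTful split-computing sketch (Flask, endpoint, and tail model are
# illustrative assumptions, not the exact project code).
import torch
from flask import Flask, request, jsonify

app = Flask(__name__)
# Placeholder for the server-side half of the split network.
tail_model = torch.nn.Linear(128, 10)

@app.route("/infer", methods=["POST"])
def infer():
    # The device posts intermediate features as a JSON list of floats.
    features = torch.tensor(request.get_json()["features"])
    with torch.no_grad():
        logits = tail_model(features)
    return jsonify({"prediction": logits.argmax(-1).tolist()})

# Client side (on the device node), roughly:
#   import requests
#   requests.post("http://server-node:5000/infer", json={"features": feats.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
}}}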