My Portfolio

Monocular Depth

Monocular depth estimation using a single image

Monocular depth estimation from a single input image using deep encoder/decoder models, with the result projected as a point cloud using three.js. An indoor model and an outdoor model are available (see the Model setting under the Settings section); feel free to experiment with both, depending on the input image.

Note: Since these models run locally on the device, any uploaded pictures stay safely on the device. The first run will be much slower than subsequent runs because TensorFlow.js has to load the necessary files. (This is a work in progress, so there may be bugs!)

Settings:

Model:

Depth quality:

Select an image:	Upload an image:	Enter URL:

Prediction:

3D point cloud:

Details

This demo uses a monocular depth estimation model (an encoder-decoder architecture based on this paper) that is trained on sequences of single-camera views.

The training is fully self-supervised. I mostly use open datasets, augmented with data I collected myself and data from other sources. Since I sometimes do not have the camera intrinsics (a vital requirement for training) for the image sequences I am using, part of the model also predicts the camera intrinsics (using ideas based on this paper).

The output of the model is then converted to a 3D point cloud using the three.js library. At the moment only single images are used for inference, so the currently deployed models cannot predict camera intrinsics (intrinsics prediction coming soon!); hence the 3D point cloud is only very approximate.
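To illustrate why the point cloud is only approximate without real intrinsics, here is a minimal sketch of back-projecting a depth map with a pinhole camera model. It is not the demo's actual code: the `Intrinsics` type, the `guessIntrinsics` and `depthToPoints` helpers, and the 60 degree default field of view are all hypothetical choices made for illustration.

```typescript
// Pinhole camera intrinsics in pixels. Without calibration these are guesses.
interface Intrinsics {
  fx: number; // focal length along x
  fy: number; // focal length along y
  cx: number; // principal point x
  cy: number; // principal point y
}

// Hypothetical helper: guess intrinsics from the image size and an ASSUMED
// horizontal field of view. A wrong guess stretches or squashes the cloud.
function guessIntrinsics(width: number, height: number, hfovDeg = 60): Intrinsics {
  const fx = width / 2 / Math.tan((hfovDeg * Math.PI) / 360);
  return { fx, fy: fx, cx: width / 2, cy: height / 2 };
}

// Back-project a row-major depth map into a flat [x0, y0, z0, x1, ...] array.
function depthToPoints(
  depth: Float32Array,
  width: number,
  height: number,
  k: Intrinsics
): Float32Array {
  if (depth.length !== width * height) {
    throw new Error("depth map size does not match width * height");
  }
  const points = new Float32Array(width * height * 3);
  for (let v = 0; v < height; v++) {
    for (let u = 0; u < width; u++) {
      const i = v * width + u;
      const z = depth[i];
      points[3 * i] = ((u - k.cx) / k.fx) * z;     // x
      points[3 * i + 1] = ((v - k.cy) / k.fy) * z; // y (image rows grow downward)
      points[3 * i + 2] = z;                       // z
    }
  }
  return points;
}
```

In three.js, a flat array like this can be attached to a `THREE.BufferGeometry` as a `position` attribute via `new THREE.BufferAttribute(points, 3)` and rendered with `THREE.Points`. The y axis may need flipping, since image rows grow downward while three.js uses y-up.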

All models were trained and optimised in PyTorch and then converted for deployment with TensorFlow.js. For deployment, all layers that could be fused were fused to give the final optimised model.
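As an illustration of what layer fusion means, a common case is folding a batch normalisation layer into the convolution that precedes it, so that a single layer does the work of two at inference time. The sketch below assumes that case; the text above does not say which layers were actually fused, and the `ConvBn` type and `fuseConvBn` function are hypothetical names used only for this example.

```typescript
// Parameters of a convolution followed by batch norm, per output channel.
// Kernel weights are flattened per channel to keep the sketch short.
interface ConvBn {
  weights: number[][]; // [outChannels][flattened kernel weights]
  bias: number[];      // [outChannels]
  gamma: number[];     // batch norm scale
  beta: number[];      // batch norm shift
  mean: number[];      // batch norm running mean
  variance: number[];  // batch norm running variance
  eps: number;         // numerical stability constant
}

// Fold batch norm into the convolution:
//   scale = gamma / sqrt(variance + eps)
//   w'    = w * scale
//   b'    = (b - mean) * scale + beta
function fuseConvBn(p: ConvBn): { weights: number[][]; bias: number[] } {
  const weights: number[][] = [];
  const bias: number[] = [];
  for (let c = 0; c < p.weights.length; c++) {
    const scale = p.gamma[c] / Math.sqrt(p.variance[c] + p.eps);
    weights.push(p.weights[c].map((w) => w * scale));
    bias.push((p.bias[c] - p.mean[c]) * scale + p.beta[c]);
  }
  return { weights, bias };
}
```

The fused layer produces the same output as the original pair in inference mode, while saving one pass over the activations.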

Currently working on

Deploying models that predict camera intrinsics from any video/image sequence
Building a 3D map from any video/image sequence
On-device refinement of depth estimates using any video/image sequence
Deploying the demo as a React Native app
On-device training of models using available video/image sequences (privacy-preserving training/federated learning)

Credits

This demo was put together by Zawar Qureshi but could not have been done without the following:

I'd love to hear from people interested in making tools/apps using these models!