Publications

Text-image Alignment for Diffusion-based Perception

Published in arXiv preprint, 2023

We use automatically generated captions to improve the text-image alignment of a diffusion backbone for downstream visual tasks such as semantic segmentation, depth estimation, and object detection. Our method also improves the state of the art (SOTA) in both single-domain and cross-domain settings.

Recommended citation: Neehar Kondapaneni, Markus Marks, Manuel Knott, Rogério Guimarães, & Pietro Perona. (2023). Text-image Alignment for Diffusion-based Perception. https://arxiv.org/abs/2310.00031

Method and System for an End-to-End Deep Learning Based Optical Coherence Tomography (OCT) Multi Retinal Layer Segmentation

Published in US Patent, 2023

We use a Transformer-based model to segment retinal layers from OCT scans. Instead of processing each scan as a 2D image, we process it as a 1D sequence of A-scans and treat each A-scan as a token, which is more computationally efficient.
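To make the A-scan tokenization concrete, here is a minimal Python sketch. It is not the patented implementation. It assumes a B-scan stored as a list of rows (depth) by columns (A-scans). The function names `ascan_tokens`, `attention_pairs`, and `patch_token_count` are hypothetical, and the 16 x 16 patch baseline is an assumed ViT-style comparison, not something taken from the patent.

```python
from typing import List

# Hypothetical layout: bscan[row][col], rows = depth samples, cols = A-scans.
Image = List[List[float]]


def ascan_tokens(bscan: Image) -> List[List[float]]:
    """Turn an H x W B-scan into W tokens, one per A-scan (column).

    Each token is the full depth profile of one column, of length H.
    """
    height = len(bscan)
    width = len(bscan[0])
    return [[bscan[row][col] for row in range(height)] for col in range(width)]


def attention_pairs(num_tokens: int) -> int:
    """Full self-attention scores every token against every token: N^2 scores."""
    return num_tokens * num_tokens


def patch_token_count(height: int, width: int, patch: int) -> int:
    """Token count if the image were split into non-overlapping patch x patch tiles.

    This is an assumed ViT-style baseline used only for comparison.
    """
    return (height // patch) * (width // patch)


if __name__ == "__main__":
    # Hypothetical 512 x 512 B-scan filled with zeros.
    h, w = 512, 512
    scan = [[0.0] * w for _ in range(h)]
    tokens = ascan_tokens(scan)
    print("A-scan tokens:", len(tokens), "attention scores:", attention_pairs(len(tokens)))
    n_patch = patch_token_count(h, w, 16)
    print("16x16 patch tokens:", n_patch, "attention scores:", attention_pairs(n_patch))
    print("per-pixel tokens:", h * w)
```

For this hypothetical 512 x 512 scan, A-scan tokenization yields 512 tokens (262,144 attention scores), compared with 1,024 tokens for 16 x 16 patches and 262,144 tokens if every pixel were a token. The actual savings depend on the image size and on which 2D baseline is used.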