I am a computer vision research engineer with a PhD and a strong background in image processing and deep learning. My thesis focused on remote sensing and 3D vision tasks (reconstruction, calibration, co-registration, change detection, etc.). My latest work focuses on neural rendering techniques applied to remote sensing images, the digitization of cultural heritage, and the exploration of generative AI.
I am from Barcelona (born 1995) and a former student of Universitat Pompeu Fabra. In Barcelona I completed a BSc in Audiovisual Systems Engineering with honors, followed by a specialized MSc in Computer Vision. I moved to Paris in October 2018 to pursue my PhD under the supervision of Gabriele Facciolo at the Centre Borelli (ENS Paris-Saclay). I defended my thesis, "Applications of multi-image remote sensing", in December 2022. In January 2024 I joined Eurecat, where I contribute to the coordination and development of AI and computer vision projects in Catalonia and Europe.
I am interested in computer vision, machine learning, optimization, and image processing. I am particularly passionate about 3D models and the whole process behind their acquisition, generation and assessment.
S-EO: A Large-Scale Dataset for Geometry-Aware Shadow Detection in Remote Sensing Applications
Elías Masquil, Roger Marí, Thibaud Ehret, Enric Meinhardt-Llopis, Pablo Musé, Gabriele Facciolo
CVPR Workshops, 2025
project page / paper / data
This new dataset comprises multi-view satellite images (PAN, RGB), corresponding vegetation and shadow masks, bundle-adjusted RPC camera models and ground-truth DSMs for 702 geographic areas of 500 × 500 m each, spread across three US cities.
Latent Diffusion Approaches for Conditional Generation of Aerial Imagery: A Study
Roger Marí, Rafael Redondo
IPOL, 2025
paper / demo / code
We evaluate the fidelity and realism of different architectural variations of a latent diffusion model used to generate RGB aerial images conditioned on semantic maps.
Pseudo Pansharpening NeRF for Satellite Image Collections
Emilie Pic, Thibaud Ehret, Gabriele Facciolo, Roger Marí
IGARSS, 2024
paper
EO-NeRF is extended to handle high-resolution panchromatic (PAN) and low-resolution multispectral (MS) inputs, eliminating the need for a separate pansharpening step. The resulting model can render pansharpened image surrogates with high-resolution color information for each input viewpoint.
A Generic and Flexible Regularization Framework for NeRFs
Thibaud Ehret, Roger Marí, Gabriele Facciolo
WACV, 2024
paper / code / poster
We propose a generic regularization framework for NeRF based on differential geometry that outperforms previous state-of-the-art methods when only three input views are available. We compare our approach with RegNeRF (CVPR 2022).
Multi-Date Earth Observation NeRF: The Detail Is in the Shadows
Roger Marí, Gabriele Facciolo, Thibaud Ehret
CVPR Workshops, 2023
project page / paper / code / poster
We present EO-NeRF, which reveals scene geometry from multi-date satellite images with an unprecedented level of detail. We propose a geometrically consistent shadow model and a radiometric decomposition of the scene adapted to pansharpened satellite images.
Disparity Estimation Networks for Aerial and High-Resolution Satellite Images: A Review
Roger Marí, Thibaud Ehret, Gabriele Facciolo
IPOL, 2022
paper / demo
We evaluate the performance of the deep learning architectures PSM (CVPR 2018) and HSM (CVPR 2019) for disparity estimation on multiple pairs of high-resolution satellite images.
Sat-NeRF: Learning Multi-View Satellite Photogrammetry With Transient Objects and Shadow Modeling Using RPC Cameras
Roger Marí, Gabriele Facciolo, Thibaud Ehret
CVPR Workshops, 2022
project page / paper / code / poster
Sat-NeRF is the first work in neural rendering for multi-date satellite images to demonstrate quantitatively convincing results in terms of surface reconstruction.
L1B+: A Perfect Sensor Localization Model for Simple Satellite Stereo Reconstruction from Push-Frame Image Strips
Roger Marí, Thibaud Ehret, Jérémy Anger, Carlo de Franchis, Gabriele Facciolo
ISPRS Annals, 2022
paper / poster
We emulate a perfect sensor to generate a single image from a fragmented push-frame strip. The resulting product simplifies large-scale 3D modeling from push-frame imagery.
A Generic Bundle Adjustment Methodology for Indirect RPC Model Refinement of Satellite Imagery
Roger Marí, Carlo de Franchis, Enric Meinhardt-Llopis, Jérémy Anger, Gabriele Facciolo
IPOL, 2021
paper / demo / code
We propose a generic bundle adjustment method for satellite multi-view stereo pipelines. The RPC camera models of the input views are refined with a rotation that compensates for localization errors related to the attitude angles encoding the satellite orientation.
Automatic Stockpile Volume Monitoring Using Multi-View Stereo from SkySat Imagery
Roger Marí, Carlo de Franchis, Enric Meinhardt-Llopis, Gabriele Facciolo
IGARSS, 2021
paper
The RPC camera models of a time series of SkySat acquisitions are refined and used to compute a surface model for each date, from which the stockpile volume is measured.
Robust Rational Polynomial Camera Modelling for SAR and Pushbroom Imaging
Roland Akiki, Roger Marí, Carlo de Franchis, Jean-Michel Morel, Gabriele Facciolo
IGARSS, 2021
paper / code
We describe a terrain-independent algorithm that accurately derives the RPC camera model from a set of 3D-2D point correspondences using a regularized least-squares fit.
To Bundle Adjust or Not: A Comparison of Relative Geolocation Correction Strategies for Satellite Multi-View Stereo
Roger Marí, Carlo de Franchis, Enric Meinhardt-Llopis, Gabriele Facciolo
ICCV Workshops, 2019
project page / paper / poster
We investigate and compare different relative geolocation correction techniques for satellite multi-view stereo pipelines and assess their impact on the output geometry.
Deep Single Image Camera Calibration with Radial Distortion
Manuel López-Antequera, Roger Marí, Pau Gargallo, Yubin Kuang, Javier Gonzalez-Jimenez, Gloria Haro
CVPR, 2019
paper / supp
We present a deep learning method to predict extrinsic (tilt and roll) and intrinsic (focal length and radial distortion) parameters from a single image. Rather than regressing the camera parameters directly, we use a parameterization that is better suited for learning.