Computer Vision Research Engineer (PhD) with a strong background in image processing and deep learning. My doctoral research focused on remote sensing and 3D vision tasks (reconstruction, calibration, co-registration, change detection). More recently, my work has centered on neural rendering for remote-sensing imagery, cultural heritage digitization, and the development and application of generative AI methods.
Born in Barcelona (1995), I studied at Universitat Pompeu Fabra, where I completed, with honors, a BSc in Audiovisual Systems Engineering and a specialized MSc in Computer Vision. I moved to Paris in October 2018 to pursue a PhD at Centre Borelli (ENS Paris-Saclay) under the supervision of Gabriele Facciolo. I defended my thesis, "Applications of multi-image remote sensing", in December 2022. In January 2024 I joined Eurecat, where I currently contribute to Computer Vision and AI projects in Catalonia and Europe.
I am passionate about scientific writing and communicating research through peer-reviewed publications. Selected publication highlights are listed below.
ShinyNeRF advances NeRF-based 3D digitization of specular surfaces by physically modeling isotropic and anisotropic reflections based on interpretable material/geometric parameters (normals, tangents, ASG anisotropy), enabling anisotropic material editing.
S-EO: A Large-Scale Dataset for Geometry-Aware Shadow Detection in Remote Sensing Applications
Elías Masquil, Roger Marí, Thibaud Ehret, Enric Meinhardt-Llopis, Pablo Musé, Gabriele Facciolo
CVPR Workshops, 2025
project page / paper / data / doi: 10.1109/CVPRW67362.2025.00224
This new dataset comprises multi-view satellite images (PAN, RGB), corresponding vegetation and shadow masks, bundle-adjusted RPC camera models and ground-truth DSMs for 702 different geographic areas of 500x500 m each across three different US cities.
Latent Diffusion Approaches for Conditional Generation of Aerial Imagery: A Study
Roger Marí, Rafael Redondo
IPOL, 2025
paper / demo / code / doi: 10.5201/ipol.2025.580
We evaluate the fidelity and realism of different architectural variations of a latent diffusion model, which is used to generate RGB aerial images conditioned on semantic maps.
Pseudo Pansharpening NeRF for Satellite Image Collections
Emilie Pic, Thibaud Ehret, Gabriele Facciolo, Roger Marí
IGARSS, 2024
paper / doi: 10.1109/IGARSS53475.2024.10641439
EO-NeRF is extended to handle high-res panchromatic (PAN) and low-res multispectral (MS) inputs, eliminating the need for separate pansharpening. The resulting model can render pansharpened image surrogates with high-res color information for each input viewpoint.
We propose a generic regularization framework for NeRF based on differential geometry that outperforms previous state-of-the-art methods with only three input views. We compare our approach with RegNeRF (CVPR 2022).
We present EO-NeRF, which reveals scene geometry from multi-date satellite images with an unprecedented level of detail. We propose a geometrically consistent shadow model and a radiometric decomposition of the scene adapted to pansharpened satellite images.
Disparity Estimation Networks for Aerial and High-Resolution Satellite Images: A Review
Roger Marí, Thibaud Ehret, Gabriele Facciolo
IPOL, 2022
paper / demo / doi: 10.5201/ipol.2022.435
We evaluate the performance of the deep learning architectures PSM (CVPR 2018) and HSM (CVPR 2019) for disparity estimation on multiple pairs of high-resolution satellite images.
Sat-NeRF: Learning Multi-View Satellite Photogrammetry With Transient Objects and Shadow Modeling Using RPC Cameras
Roger Marí, Gabriele Facciolo, Thibaud Ehret
CVPR Workshops, 2022
project page / paper / code / poster / doi: 10.1109/CVPRW56347.2022.00137
Sat-NeRF is the first work in neural rendering for multi-date satellite images to demonstrate quantitatively convincing results in terms of surface reconstruction.
L1B+: A Perfect Sensor Localization Model for Simple Satellite Stereo Reconstruction from Push-Frame Image Strips
Roger Marí, Thibaud Ehret, Jérémy Anger, Carlo de Franchis, Gabriele Facciolo
ISPRS Annals, 2022
paper / poster / doi: 10.5194/isprs-annals-V-1-2022-137-2022
We emulate a perfect sensor to generate a single image from a fragmented push-frame strip. The resulting product simplifies large-scale 3D modeling from push-frame imagery.
A Generic Bundle Adjustment Methodology for Indirect RPC Model Refinement of Satellite Imagery
Roger Marí, Carlo de Franchis, Enric Meinhardt-Llopis, Jérémy Anger, Gabriele Facciolo
IPOL, 2021
paper / demo / code / doi: 10.5201/ipol.2021.352
We propose a generic bundle adjustment method for multi-view stereo pipelines for satellite images. The RPC camera models of the input views are refined with a rotation that compensates for localization errors related to the attitude angles encoding the satellite orientation.
Automatic Stockpile Volume Monitoring Using Multi-View Stereo from SkySat Imagery
Roger Marí, Carlo de Franchis, Enric Meinhardt-Llopis, Gabriele Facciolo
IGARSS, 2021
paper / doi: 10.1109/IGARSS47720.2021.9554482
The RPC camera models of a time series of SkySat acquisitions are refined and used to compute a surface model for each date, which is used to measure the stockpile volume.
Robust Rational Polynomial Camera Modelling for SAR and Pushbroom Imaging
Roland Akiki, Roger Marí, Carlo de Franchis, Jean-Michel Morel, Gabriele Facciolo
IGARSS, 2021
paper / code / doi: 10.1109/IGARSS47720.2021.9554583
We describe a terrain-independent algorithm that accurately derives the RPC camera model from a set of 3D-2D point correspondences using a regularized least squares fit.
To Bundle Adjust or Not: A Comparison of Relative Geolocation Correction Strategies for Satellite Multi-View Stereo
Roger Marí, Carlo de Franchis, Enric Meinhardt-Llopis, Gabriele Facciolo
ICCV Workshops, 2019
project page / paper / poster / doi: 10.1109/ICCVW.2019.00274
This work investigates and compares different relative geolocation correction techniques for multi-view stereo pipelines for satellite images. We assess the impact on the output geometry.
Deep Single Image Camera Calibration with Radial Distortion
Manuel López-Antequera, Roger Marí, Pau Gargallo, Yubin Kuang, Javier Gonzalez-Jimenez, Gloria Haro
CVPR, 2019
paper / supp / doi: 10.1109/CVPR.2019.01209
We present a deep learning method to predict extrinsic (tilt and roll) and intrinsic (focal length and radial distortion) parameters from a single image. We use a parameterization that is better suited for learning than directly predicting the camera parameters.