I build large-scale distributed data systems and research natural language interfaces for structured data. My engineering work focuses on high-throughput, mission-critical data platforms, distributed storage, real-time pipelines, and federated query systems operating at scale.
My research at the University of São Paulo (USP) sits at the intersection of NLP and databases: specifically, how large language models can be fine-tuned to understand complex data schemas and translate natural language into executable SQL. I work on problems at the boundary of computational linguistics, neural language modeling, and production data infrastructure.
- Natural Language Processing & Computational Linguistics — formal and statistical representations of language for machine reasoning
- Natural Language Interfaces for Databases — making structured data accessible through language
- LLM Fine-Tuning — adapting language models to specialized structured data domains (PEFT, LoRA, AdaLoRA)
- Semantic Parsing — mapping linguistic structure to executable queries
- Low-Resource NLP — Portuguese language support in text-to-SQL and related tasks
🔭 Natural Language Interfaces for Databases: fine-tuning and evaluation of LLMs for Text-to-SQL tasks, with a focus on complex schemas, cross-domain generalization, and Portuguese language support.
👉 github.com/datafromlopes/geo-nlq-to-sql
7+ years designing and operating production-grade distributed data systems:
- High-throughput transactional platforms (Apache Cassandra, billions of writes, microsecond latency)
- Lakehouse architecture (Apache Iceberg on AWS, federated queries with Trino)
- Real-time and batch ELT pipelines at scale
- Data platform reliability and performance engineering
Languages: Python · C++ · SQL · Scala
Databases & Storage: PostgreSQL/PostGIS · Apache Cassandra · MongoDB · MySQL
Distributed Systems & Processing: Apache Spark · Kafka · Trino · Hive · Hadoop
Data Platform: Apache Airflow · Apache Iceberg · dbt · AWS · Docker
ML & NLP: PyTorch · HuggingFace Transformers · PEFT (LoRA, AdaLoRA, IA³)
- Natural language interfaces for relational and non-relational databases
- LLM fine-tuning and evaluation for specialized or low-resource domains
- NLP datasets and benchmarks in Portuguese
- Large-scale data platform architecture
⚡ Private pilot, aviation and aircraft systems enthusiast

