TL;DR
Portugal’s €5.5 million AMÁLIA large language model is now operational and outperforms several models on Portuguese benchmarks. However, structural questions about its openness, the sufficiency of its native-language data, and its optimization target remain unresolved, which raises concerns about the broader European sovereign-LLM movement.
Developed by a consortium of approximately 60 researchers from Portugal’s leading research institutions, AMÁLIA was publicly announced in December 2024 and became operational in late 2025. The model, which handles text in European Portuguese, is built as a continuation of the EuroLLM multilingual foundation rather than trained from scratch. It has achieved superior performance on Portuguese benchmarks, surpassing models like Qwen 3-8B in most tasks, though it still trails on certain benchmarks like ALBA.
According to the technical report by Vieira et al. (2026), the extended pre-training used 107 billion tokens, of which only about 5.8 billion came from Portuguese web archives, roughly 5.5% of the total. Supervised fine-tuning used approximately 17-18% Portuguese data, so the native-language emphasis was limited at that stage as well. The final version is expected in June 2026, and ongoing development may address the current gaps.
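The native-language share can be checked with a few lines of arithmetic. The sketch below is illustrative only: the two constants are the rounded token counts quoted above, and the names `TOTAL_TOKENS_B`, `PT_PT_TOKENS_B`, and `share_percent` are hypothetical helpers, not anything from the AMÁLIA technical report. With these rounded inputs the ratio comes out near 5.4%, which is consistent with the roughly 5.5% cited once rounding of the inputs is taken into account.

```python
# Rounded figures quoted in the article, in billions of tokens (assumed inputs).
TOTAL_TOKENS_B = 107.0   # total tokens in extended pre-training
PT_PT_TOKENS_B = 5.8     # tokens from Portuguese web archives


def share_percent(part: float, total: float) -> float:
    """Return `part` as a percentage of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


native_share = share_percent(PT_PT_TOKENS_B, TOTAL_TOKENS_B)
print(f"European Portuguese share of pre-training tokens: {native_share:.1f}%")
```

The same helper can be applied to any other data split a project discloses, which is the kind of simple data accounting the second question asks for.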
Despite these advancements, public analysis by Duarte O.Carmo has raised three critical questions about the project’s openness, native data sufficiency, and strategic objectives, questions that are central to the broader European sovereign-LLM landscape and remain unanswered at this stage.
AMÁLIA: The three hard questions.
Portugal spent €5.5M to build a European Portuguese LLM. The base version is operational, and it beats Qwen 3-8B on most pt-PT benchmarks. So why are the most important questions still unanswered?
Last month, Duarte O.Carmo published the sharpest public analysis of AMÁLIA — Portugal’s state-funded European Portuguese large language model. He prefaces his critique with the necessary diplomatic apparatus before doing what almost nobody else in the European-sovereign-LLM discourse has been willing to do publicly: asking hard questions about whether the work, as released, actually does what it set out to do. This piece is a structural extension of his analysis. The AMÁLIA case study exposes three hard questions every national LLM effort needs to answer publicly — and the broader European sovereign-LLM movement has been operating without explicit answers to any of them.
Three questions every national LLM effort needs to answer publicly.
Duarte O.Carmo’s framing maps cleanly onto the structural argument, and each question applies directly to AMÁLIA.
The three questions depend on one another. Q3 (optimization target) determines Q2 (data volume needed), which in turn conditions Q1 (whether openness is sufficient for community contribution). The European sovereign-LLM movement as a whole benefits when these questions become standard methodology disclosure rather than exceptional critique.
107 billion tokens. 5.8 billion clearly pt-PT.
This is the most tractable of the three questions, and its answer is surprising. For a model whose stated purpose is to prioritize European Portuguese, the native-language share of extended pre-training is roughly 5.5%. The implications cascade into every other question.
The Olmo standard. AMÁLIA’s current state.
The Allen Institute for AI’s Olmo project defines what “fully open” requires in operational terms. Olmo doesn’t lead frontier benchmarks. That’s not the point. The point is to be the structural reference for openness. AMÁLIA’s “fully open source” claim should be measured against that operational standard.
Four strategic positions. AMÁLIA between two and three.
Roughly €100M or more in publicly disclosed European sovereign-LLM funding has been committed across the major initiatives. The structural question every project faces: what competitive position are you actually staking out? There are four options, none mutually exclusive, but each requires different commitments.
Three standards. For AMÁLIA and the movement.
The structural critique generalizes beyond AMÁLIA. Italy, France, Germany, Switzerland, the OpenEuroLLM consortium, and every subsequent national project benefit from public discourse holding national LLM efforts to operational standards on openness, data accounting, and strategic positioning.
The European sovereign-AI agenda is a serious strategic project that deserves serious public discourse. O.Carmo’s analysis is what serious public discourse looks like. Appropriately diplomatic. Structurally rigorous. Willing to ask the hard questions in public when the public investment justifies it. More of this is needed — across every European sovereign-LLM project, not just AMÁLIA.
Implications for European Sovereign-LLM Strategies
The development of AMÁLIA exemplifies the broader European effort to build sovereign-language large models, with significant public funding and institutional involvement. Its performance indicates progress in Portuguese NLP, but the unresolved structural questions highlight ongoing debates about transparency, data adequacy, and strategic focus. These issues matter because they influence how European nations approach AI sovereignty, model openness, and language-specific AI development, shaping future policies and investments.
European Sovereign-Language Model Initiatives Face Core Challenges
Across Europe, several countries have launched or announced efforts to develop native-language large language models, including Italy’s Minerva project and commercial developers such as Germany’s Aleph Alpha and France’s Mistral AI. These efforts aim to foster AI sovereignty and reduce reliance on US or Chinese models. However, most projects are still in early stages, with ongoing debates about training approaches (from scratch or by continuing an existing model) and about the transparency of data sources and model openness. The case of AMÁLIA is notable because it involves significant public investment and aims to serve Portugal’s academic and governmental sectors, while also exemplifying persistent structural questions about openness and data sufficiency in national models.
“The questions about openness, native data, and strategic goals are not just technical—they are foundational for the future of European AI sovereignty.”
— Duarte O.Carmo
Unanswered Questions About Model Openness and Strategy
It is not yet clear how open AMÁLIA will ultimately be—whether the model and training data will be fully accessible to the public—and how the project’s strategic priorities might evolve before the final release in June 2026. Additionally, the broader implications for European AI sovereignty remain under discussion, with some stakeholders questioning whether current approaches are sufficient to ensure independence and transparency.
Next Milestones and Ongoing Evaluations
The final version of AMÁLIA is scheduled for release in June 2026, which will likely include more detailed evaluations and possibly increased transparency. Over the next 12-24 months, the project team may address current gaps, expand native-language data, and clarify openness policies. Broader European initiatives will also continue to evolve, with policymakers and researchers closely monitoring these developments to shape future AI strategies.
Key Questions
What is the current status of AMÁLIA?
The base version is operational, publicly available to academic users, and has demonstrated superior performance on Portuguese benchmarks. The final version is expected in June 2026.
How does AMÁLIA compare to other models?
It outperforms models such as Qwen 3-8B on most Portuguese benchmarks but still trails on some, such as ALBA. Its development as a continuation of a multilingual foundation distinguishes it from models trained from scratch.
What are the main concerns about AMÁLIA?
Key concerns include the limited amount of native Portuguese data used in training, questions about how open the model and data will be, and whether strategic goals align with broader European sovereignty aims.
What are the broader implications for Europe’s AI efforts?
AMÁLIA exemplifies the challenges faced by European countries in balancing performance, transparency, and sovereignty. Its development will influence future policies on open models and native-language AI research across Europe.
Source: ThorstenMeyerAI.com