AI Under Pressure: The Model Race, AI Agent Security Gaps, and New Alliances with Hollywood

The past seven days in the global artificial intelligence market have revealed how fast the model race is accelerating, the growing scale of security threats, and how capital and major corporations are rapidly shifting towards the new algorithm economy. Central to these events were the debut of OpenAI’s GPT-5.2, the launch of Google’s Gemini 3 Flash, a record report on attacks against AI agents, and Disney’s spectacular investment in OpenAI^[16] combined with a license to use iconic characters in the video model Sora.

Rapid Advances and Strategic Alliances in AI

Alongside the technological sprint, strategic alliances are tightening: Accenture is building the largest corporate partnership around AI agents, OpenAI has acquired the Polish startup Neptune.ai, and open-source models from China and the USA are challenging the established technological order. Meanwhile, regulators in Europe and the United States are adopting divergent regulatory strategies, directly influencing the costs and directions of AI solution development, including in Poland.

Behind these spectacular moves, funding is rising sharply (estimated at over 210 billion dollars^[31] this year already), and signals of a possible speculative bubble are becoming clearer. Investors keep raising valuations despite weak conditions in other sectors, betting that artificial intelligence will become a tool to reduce costs and drive productivity amid global economic uncertainty.

The launch of GPT-5.2 marked the strongest signal of a new phase in the competition between OpenAI and Google. OpenAI released the model on December 11, following several weeks of internal “code red” status declared after the November debut of Gemini 3. This internal narrative shows the model race has entered a phase of constant mobilization—each major release by one player promptly triggers a counterreaction by the other.

According to internal tests, the new GPT-5.2 generates 38% fewer hallucinations^[1] than GPT-5.1 and achieves 98.7% reliability in tool handling on the Tau2 benchmark. In long-context tests (up to 256,000 tokens), it approaches full accuracy, meaning the system can process very extensive technical documents, contracts, or complex project correspondence without losing track. OpenAI also aggressively competes on price: it introduces token caching, which reduces the cost of reusing the same prompt fragments by up to 90%, directly impacting competitors’ margins.

The standard rate for GPT-5.2 is $1.75 per million input tokens, a price aimed at customers demanding the highest stability. The system runs on Microsoft Azure infrastructure and the latest NVIDIA GPUs: H100, H200, and the GB200-NVL72 platform. The scale and cost of this infrastructure will determine how long OpenAI retains its price-quality advantage.
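To make the pricing concrete, the following is a minimal sketch of how the input cost might be estimated. The $1.75 rate and the "up to 90%" caching discount are the figures quoted above; the workload size, the cached share, and the function name are hypothetical, and actual billing rules may differ.

```python
# Illustrative input-cost estimate with prompt caching.
# The rate and the best-case discount come from the figures in the text;
# the token counts used in the example are hypothetical.

PRICE_PER_MILLION_INPUT_USD = 1.75
CACHE_DISCOUNT = 0.90  # best case: cached tokens cost 10% of the normal rate


def input_cost_usd(total_tokens: int, cached_tokens: int = 0) -> float:
    """Return the input cost in USD, pricing cached tokens at the discounted rate."""
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    fresh_tokens = total_tokens - cached_tokens
    per_token = PRICE_PER_MILLION_INPUT_USD / 1_000_000
    return fresh_tokens * per_token + cached_tokens * per_token * (1 - CACHE_DISCOUNT)


if __name__ == "__main__":
    # Hypothetical workload: 10 million input tokens, 8 million of them repeated.
    print(f"no caching:   ${input_cost_usd(10_000_000):.2f}")
    print(f"with caching: ${input_cost_usd(10_000_000, 8_000_000):.2f}")
```

Under these assumptions, the more of a prompt that repeats between calls (system instructions, shared documents), the closer the effective price moves toward the discounted rate.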

The market consequences are clear: shortening release cycles from months to weeks, stronger pressure on price reductions, and product differentiation into “thinking” models—slower but more accurate—and fast instant versions. For Polish business users, this means a conscious choice between price and reasoning quality, and for Microsoft, the risk that any OpenAI delay or technological failure could jeopardize multibillion-dollar cloud infrastructure investments.

Google’s response was swift. On December 17, it launched Gemini 3 Flash Preview^[4], a new model variant designed for tasks prioritizing speed and cost. This marked the public debut of the flash Gemini 3 variant, designed as a direct competitor to fast models from OpenAI and other providers.

According to Google’s announcements, Gemini 3 Flash offers “frontier-class performance at a fraction of the cost”^[5], delivering capabilities close to the most powerful models but with lower resource consumption. The model advances multimodal capabilities, with better handling of images, spatial relations, and visual material analysis, and introduces so-called “vibe coding”, in which the model directly generates application interfaces, including the choice of output format.

Simultaneously, Google develops Gemini 3 Deep Think, available since December 4 for top-tier AI Ultra subscribers. This variant focuses on iterative reasoning and longer inference chains, aiming to compete with OpenAI’s “thinking” approach. Together, the Flash and Deep Think models form a toolset intended to cover most corporate scenarios—from simple chatbots to complex analytic systems.

The result is further market fragmentation into three main categories^[7]: the most powerful frontier models (GPT-5.2, Gemini 3 Pro, and successors), fast and cheaper variants (Gemini Flash, GPT mini, Haiku models), and lightweight models optimized for low operational costs. For smaller businesses, this offers real possibilities to diversify away from a single platform, selecting models tailored to specific use cases without permanent vendor lock-in.

Multimodality—once a desirable add-on—is becoming standard. In the MMMU benchmark, which tests models’ ability to answer complex multi-domain questions based on text and images, leading systems like Claude or GPT-4o still don’t fully match domain experts but achieve around 77%, with human experts scoring between 76% and 88%. This clearly indicates AI more often complements human skills than fully replaces them.

Beyond the giants’ rivalry, open-source models, especially those developed outside the United States and the EU, are gaining prominence. The Chinese company DeepSeek and its DeepSeek-R1 model symbolize this shift. The first version, released in late January 2025, and subsequent iterations such as R1 V3.1, refined over the year, demonstrate that peak performance is achievable beyond Silicon Valley.

DeepSeek-R1 uses a mixture-of-experts architecture in which the overall parameter count reaches hundreds of billions, but only a smaller subset, such as 37 billion parameters, is active at any moment. Training relies mainly on reinforcement learning rather than on classical supervised fine-tuning. As a result, its reasoning performance approaches that of closed-source models like OpenAI o1.
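The mixture-of-experts idea described above, where only a subset of parameters is active for a given input, can be illustrated with a toy top-k router. This is a sketch under stated assumptions, not the model's actual implementation: the experts here are simple scalar functions, the gate scores are fixed by hand rather than learned, and the names `route_top_k` and `moe_forward` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so that the selected experts sum to 1.
    """
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, gate_scores, k):
    """Combine the outputs of only the selected experts; the rest are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))


if __name__ == "__main__":
    # Four toy "experts"; in a real model each would be a large feed-forward block.
    experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
    gate_scores = [0.1, 2.0, 0.3, 1.5]  # hypothetical router output for one token
    print(route_top_k(gate_scores, k=2))
    print(moe_forward(1.0, experts, gate_scores, k=2))
```

With k=2 out of four experts, half of the expert parameters stay idle for this input, which is the same principle that lets a model with hundreds of billions of parameters run with only tens of billions active per token.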

The trade-off for this approach is higher token consumption, initially five to fifteen times greater than competitors, but the DeepSeek team has gradually released distilled variants ranging from 1.5 to 70 billion parameters. These smaller models lower computation costs and enable deployment on modest infrastructure, including servers of medium-sized firms outside major cloud centers.

DeepSeek-R1’s security level raises significant concerns. Tests show approximately 12% of its responses may contain harmful content, while one of OpenAI’s newest lighter models, o3-mini, records about 1.2%. The model’s open nature makes enforcing corrections or implementing compliance mechanisms difficult. Companies and regulators face a new question: how to enforce compliance requirements on a model whose weights are downloadable online but which lacks a single authoritative publisher?

The implications go beyond technology. Open-weight models like DeepSeek or China’s Qwen3, alongside Olmo or Gemma from the USA and Europe, pressure commercial vendors to reduce prices and improve quality. Asia, especially China and India, is strengthening its position in the global balance of power. For European and Polish companies, this means more choices but also greater uncertainty about ecosystem stability and solution security without unified vendor control.

Running parallel to the model race is a newly revealed, highly concrete frontline: AI agent security. The latest global red-teaming benchmark, which collected 1.8 million attack attempts on agents^[13], exposes the scale of the issue. In over 60,000 cases, prompt injection attacks successfully breached security policies, and nearly all tested systems were compromised within a few to several dozen queries.

The test covered 22 different models across 44 scenarios—from simple data modification commands to complex task chains where agents had access to external tools like CRM systems, databases, or APIs. It became clear there is no simple correlation between model size and resilience. Some larger, more advanced models proved easier to deceive than their lighter counterparts, debunking the convenient myth that bigger models automatically mean better security.

This context is confirmed by the Akto report on corporate AI agent security published December 14. It shows that 69% of companies have deployed or piloted AI agents in processes ranging from customer service to finance and HR support. Yet only 21% have a full inventory of agents in use, while 79% lack transparency regarding agent permissions in internal systems. Around 80% of firms have no formal AI agent management policies or audit procedures.

The risk shifts from what the model says to what the agent does. An agent capable of modifying database entries, sending emails, initiating payments, creating helpdesk tickets, or altering cloud infrastructure settings becomes a new attack vector. A single successful malicious command injection can trigger cascades of actions that are hard to reconstruct and audit without precise monitoring.

The growing number of incidents and test results lead to the conclusion that AI agents require isolation and sandboxing, where every action is strictly limited and logged. For Polish companies, this means building new competences—from security engineers specialized in prompt protection to teams overseeing AI agent behavior continuously. Real-time monitoring, full interaction logging, penetration tests using prompt injection, and permission audits become essential elements of corporate governance.
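The principle that every agent action should be strictly limited and logged can be sketched as a thin wrapper around tool calls. This is a minimal illustration, assuming a simple per-agent allowlist; the `AgentSandbox` class, the tool functions, and the log format are all hypothetical and do not correspond to any specific vendor's framework.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class AgentSandbox:
    """Hypothetical wrapper: each tool call is checked against an allowlist and logged."""
    agent_id: str
    allowed_tools: frozenset
    audit_log: list = field(default_factory=list)

    def call(self, tool_name, tool_fn, **kwargs):
        allowed = tool_name in self.allowed_tools
        # Log before executing, so denied attempts are recorded as well.
        self.audit_log.append({
            "ts": time.time(),
            "agent": self.agent_id,
            "tool": tool_name,
            "args": kwargs,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{self.agent_id} may not call {tool_name}")
        return tool_fn(**kwargs)


# Stand-in tools; real ones would talk to a helpdesk or payment system.
def create_ticket(subject):
    return f"ticket:{subject}"


def send_payment(amount):
    return f"paid:{amount}"


if __name__ == "__main__":
    sandbox = AgentSandbox("helpdesk-bot", frozenset({"create_ticket"}))
    print(sandbox.call("create_ticket", create_ticket, subject="VPN down"))
    try:
        sandbox.call("send_payment", send_payment, amount=100)
    except PermissionError as exc:
        print("blocked:", exc)
    print(json.dumps(sandbox.audit_log, indent=2))
```

The design choice worth noting is that the permission check lives outside the model: even if a prompt injection convinces the agent to request a payment, the call is refused and the attempt remains in the audit log.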

Prompt injection—once considered an academic curiosity by some—is now the most common vulnerability exploited in modern AI systems. The attack involves injecting commands into the model input designed to override or bypass hidden system instructions. In agents capable of acting on external systems, the effectiveness of such attacks rises dramatically.

Analysis of the 1.8 million attack attempts found that, with properly crafted task chains, prompt injection success rates can reach between 91% and 96%. Architectures built around function calling show higher attack success rates than those using more limited data exchange protocols. The Model Context Protocol, widely promoted as a secure way to link models with external tools, also requires independent auditing and rigorous security testing.

Researchers from MIT, Stanford, Google Research, and Microsoft are developing new defenses. Alongside classical input/output filters and blocklists, advanced techniques like Attention Tracker have emerged—a tool that tracks which prompt fragments the model focuses on most without costly extra inference. Its aim is to detect attempts to sneak malicious instructions hidden in seemingly innocent text fragments.

In the coming years, prompt security is expected to become a distinct specialization within cybersecurity teams. Companies will establish internal prompt injection testing procedures before deploying any new agent, with test results potentially becoming mandatory compliance documentation—on par with traditional network and web application penetration tests.

The week’s loudest partnership was the collaboration between Walt Disney Company and OpenAI. Under the deal, Disney will invest about 1 billion dollars in OpenAI shares while licensing over 200 iconic characters for use in the video generation model Sora. Characters include Mickey Mouse, Frozen heroes, Star Wars universe figures such as Yoda and stormtroopers, and Avengers characters.

AI Agents’ Security Challenges and Industry Responses

The agreement also includes deploying ChatGPT Enterprise for Disney employees to speed up internal processes and content production. However, strict limits were imposed: models cannot generate portraits or voices of specific actors, and the system must block content deemed inappropriate when linked with recognizable characters. Some creatives, including animator John Attanasio, warn against a “costless production chain” where studios gain vast video material with minimal professional creator input.

For competitors, this creates pressure to reach similar deals with generative model providers. Other film studios and platforms like YouTube or TikTok will likely seek ways to integrate AI models with their own universes and communities. Sora is moving from beta to a commercial phase, and users are growing accustomed to high-quality video generation as a regular feature of streaming platforms.

At the corporate level, one of the most significant events is the strategic agreement between Accenture and OpenAI. The consulting firm, employing hundreds of thousands worldwide, becomes OpenAI’s main enterprise partner, building a new business transformation offer based on AI agents. Announced in early December, the key element is deploying ChatGPT Enterprise for tens of thousands of Accenture workers globally.

Accenture gains access to AgentKit, an OpenAI framework for creating and managing proprietary agents integrated with client systems. The new Flagship AI client program aims to help companies automate finance, HR, logistics, and customer service processes while preparing teams to work in environments where AI agents take on operational tasks. Accenture promises a massive expansion of OpenAI training and certification for its consultants.

Financial markets reacted quickly: Accenture’s pre-market trading prices rose by several tens of percentage points. For Polish integrators and service firms, this signals that global players are turning into distribution channels for frontier AI technologies, much as they did during the cloud computing wave. This presents opportunities to build niche competencies and local partnerships but also carries the risk of losing talent to international corporations offering global projects and certifications.

Accenture’s model assumes companies will need to restructure employment. CEO Julie Sweet admitted some workers will leave, while others will undergo intensive reskilling to meet new demands. Practically, there is growing demand for specialists combining business acumen with skills to design and oversee AI agents and integrate them with existing IT infrastructure.

Another sign of change is OpenAI’s acquisition of Polish startup Neptune.ai. Originating from Warsaw-based Deepsense, Neptune.ai has specialized since 2018 in MLOps tools for monitoring and managing machine learning model training. Its clients included Samsung, Roche, and HP, with total funding exceeding $18 million.

Announced in early December, the deal is valued below $400 million in stock, with details undisclosed. Neptune’s tools were already used internally by OpenAI to monitor training of large models. The acquisition aims for full integration of Neptune solutions into OpenAI’s training infrastructure, allowing tighter control over the entire model creation and improvement chain—from input data to security and quality metrics monitoring.

For the Polish tech scene, this is an important example of a successful deeptech exit. Neptune proves Poland can build world-class infrastructure products crucial for industry giants. At the same time, it signals that closed, highly specialized MLOps tools increasingly end up with tech giants, complicating smaller firms’ access to enterprise-grade solutions.

Against this backdrop, a new regulatory landscape is taking shape. The European Union is finalizing implementation of the AI Act, which introduces a risk hierarchy for AI systems and steep financial penalties: up to 35 million euros or 7% of global turnover, whichever is higher. The first rules banning selected practices, such as social credit systems for citizens, have applied since February 2025. Rules for general-purpose AI (GPAI) systems took effect six months later, with full enforcement scheduled for August 2, 2026.
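The "whichever is higher" rule is easy to misread, so a short worked example may help. The sketch below only encodes the ceiling quoted above (35 million euros or 7% of global turnover); the function name and the sample turnover figures are hypothetical, and actual fines depend on the type of infringement and the regulator's assessment.

```python
FIXED_CAP_EUR = 35_000_000
TURNOVER_SHARE = 0.07  # 7% of global annual turnover


def max_ai_act_fine_eur(global_turnover_eur: float) -> float:
    """Upper bound of the top penalty tier: the higher of the two amounts."""
    return max(float(FIXED_CAP_EUR), TURNOVER_SHARE * global_turnover_eur)


if __name__ == "__main__":
    # Hypothetical companies with 100 million and 2 billion euros of turnover.
    for turnover in (100_000_000, 2_000_000_000):
        print(f"turnover {turnover:>13,} EUR -> ceiling {max_ai_act_fine_eur(turnover):,.0f} EUR")
```

The two amounts are equal at a turnover of 500 million euros: below that, the fixed 35 million euro figure sets the ceiling, and above it the percentage does.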

Simultaneously, the United States issued a presidential executive order centralizing AI governance at the federal level and limiting states’ ability to enforce separate rules, such as those adopted by California or Colorado. The Department of Justice plans to sue states attempting autonomous AI regulation. This means many U.S. AI startups will operate in a notably less compliance-burdened environment than their European rivals.

This poses a serious challenge for the European ecosystem. The AI Act is meant to boost social trust and prevent the riskiest applications. However, overly strict and costly requirements might drive innovative firms to relocate R&D outside Europe or delay new product launches within the EU. For Poland, aiming to build its own AI startup ecosystem, this necessitates balancing security demands and regulatory competitiveness.

Financially, 2025 will close with an impressive wave of AI company investments. By November, cumulative funding surpassed 210 billion dollars, noticeably higher than most other tech sectors. December saw further large rounds: PolyAI raised $86 million in seed funding for neuromorphic hardware, and Gradium secured $70 million for voice solutions.

Meanwhile, valuations of the biggest players are rising. OpenAI is valued around $500 billion, startup xAI about $200 billion, and Anthropic above $30 billion. Many younger companies receive hundreds of millions in funding despite limited revenue and unclear business models. For instance, FieldAI, developing humanoid robots, raised over $400 million before presenting a stable commercial product.

Analysts increasingly talk about an investment bubble. If by 2027 AI fails to deliver measurable returns at the economy-wide level—in cost savings or new revenues—some companies may need to reduce valuations in subsequent rounds or acquisitions. This could spur consolidation waves, with major players acquiring the most promising but underfunded early-stage assets.

For Europe, including Poland, this is both a risk and an opportunity. The local venture capital market rarely competes with American or Asian funds in scale. Yet examples like Neptune.ai show Polish deeptech companies can achieve successful exits through acquisitions by global leaders. Building firms at the infrastructure layer—tools essential regardless of dominant models or providers—will be crucial.

Aside from the main current, several weaker but notable signals may bear long-term significance. HSBC and French startup Mistral announced a multi-year partnership deploying Mistral models in the bank’s self-hosted environment, confirming the “private cloud” trend and preference for European providers in heavily regulated sectors.

Thomson Reuters and Imperial College launched a five-year AI research lab focusing on safety, reliability, and social impact. The Allen Institute for AI develops the open-source Olmo 3 line, offering full checkpoint transparency and training data openness, counterbalancing DeepSeek’s open models. Google DeepMind released the Gemma 3 series with models from 4 to 27 billion parameters, distributed via Hugging Face and competing in performance with closed models like Claude 3.7 Sonnet.

NVIDIA presented the Alpamayo-R1 model, combining image, language, and action processing for autonomous vehicles, and released it as open source. ByteDance started deploying its voice assistant Doubao on ZTE phones, boosting the mobile voice AI trend in Asia. Meanwhile, the Pindrop Security report highlights a surge in deepfake attacks, from one attack per client monthly in 2023 to an average of seven daily in 2025, including a high-profile case where a Hong Kong employee was tricked by a deepfake call into transferring $25 million.

Anthropic remains the third market force with the Claude 4.5 series available on Amazon Bedrock. Alibaba develops the Qwen3 line, which uses mixture-of-experts architectures and achieves results approaching GPT-4o and DeepSeek in many benchmarks. This all paints a picture of an increasingly multipolar AI world, where dominance by only one or two providers grows less likely.

What does this dynamic landscape mean for the Polish reader: managers, IT professionals, regulators, investors, or educators? Firstly, the IT and service sectors face a growing market: AI agent security. The scale of the gap (79% of organizations lack transparency regarding agent permissions) creates huge demand for auditing, red-teaming, sandbox building, and continuous behavioral monitoring services. Polish integrators might draw on Accenture’s model, building certification practices and competencies around platforms like OpenAI or Google.

Secondly, regulators and data protection authorities, including the Personal Data Protection Office, must extend focus from classic cybersecurity to monitoring LLM logs and agent behavior. Red-teaming benchmarks clearly show model size isn’t a reliable safety measure; risk assessments require insights into system architecture, permissions, and supervision procedures.

With full AI Act enforcement from August 2026, European supervisory bodies will need not only lawyers but engineers and data analysts capable of interpreting safety test results, bias reports, and algorithmic audits. This mandates immediate investments in skilled personnel—both public and private sectors.

Investors and startups should view Neptune.ai as proof that Polish deeptech can succeed globally through acquisitions. However, with funding exceeding 200 billion dollars annually, a bubble might form if some companies fail to show credible profitability paths. Careful segment selection—focusing on infrastructure, open source, security tools, and compliance—offers more sustainable value than generic chatbots.

Education systems and HR teams must prepare for massive skill transformations. Accenture’s example, investing in OpenAI certifications alongside workforce restructuring, signals reskilling will become mandatory, not optional. Global training programs from model providers will likely become de facto qualification standards in Poland’s job market.

Finally, global threats carry local impact. A rapid rise in deepfake attacks—growing several hundred percent year-on-year—hits identity verification processes in banking and public administration. The tech race between the US and China, alongside Europe’s more cautious stance, will influence availability and costs of key components like GPUs and advanced models. Rising price competition for tokens and inference costs will force Polish firms to carefully calculate total AI solution ownership costs—from deciding between open and closed models, optimizing prompts and caching, to localizing infrastructure.

From a Polish organizational perspective, a simple but essential set of control questions arises: do we know how many AI agents already operate in our company and what permissions they have? Have they undergone any red-teaming or penetration tests? Have we compared Gemini 3 Flash and GPT-5.2 capabilities against real business cases instead of mechanically choosing one platform? Does our compliance department know exact AI Act enforcement dates and have an implementation plan? Can we calculate the real cost of closed models versus open solutions like DeepSeek? And finally, are our KYC and voice verification procedures robust against increasingly convincing deepfakes?

Answers to these questions in the coming months will determine whether Polish companies leverage the global AI revolution’s pace as a chance to ascend the value chain, or remain stuck firefighting issues caused by technology deployed without reflection on safety, cost, and responsibility.


Sources