
Agent S: An Open Agentic Framework that Uses Computers Like a Human

February 27, 2025

Saaket Agashe*, Jiuzhou Han*, Shuyu Gan, Jiachen Yang, Ang Li, Xin Eric Wang

Hey! A few months ago I gave a talk at Princeton University about my thoughts on agents and Simular. I figured I should write up a summary as a blog post.

State-of-the-Art Performance

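My first job was as a research scientist at Google DeepMind. A key part of my role there was working with various Google product teams to find opportunities to apply cutting-edge AI technology. However, a completely unrelated question from a Googler may have been what ultimately led me to leave DeepMind and start Simular.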

Agent S is a new agentic framework designed to enable using a computer as intuitively as a human would.

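We introduce experience-augmented hierarchical planning. This method leverages Online Web Knowledge for up-to-date information on frequently changing software and websites, and Narrative Memory to draw on high-level experience from past interactions. By breaking complex tasks into manageable subtasks and using Episodic Memory for step-by-step guidance, Agent S continually refines its actions, learns from experience, and achieves adaptable and effective task planning.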

Abstract

We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI), aimed at transforming human-computer interaction by automating complex, multi-step tasks.

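To this end, Agent S introduces experience-augmented hierarchical planning, which learns from external knowledge search and internal experience retrieval at multiple levels, facilitating efficient task planning and subtask execution.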

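In addition, it employs an Agent-Computer Interface (ACI) to better elicit the reasoning and control capabilities of GUI agents based on Multimodal Large Language Models (MLLMs). Evaluation on the OSWorld benchmark shows that Agent S outperforms the baseline by 9.37% in success rate (an 83.6% relative improvement) and achieves a new state of the art. Comprehensive analysis highlights the effectiveness of individual components and provides insights for future improvements.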

Furthermore, Agent S demonstrates broad generalizability to different operating systems on the newly released WindowsAgentArena benchmark.

Agent S addresses three key challenges in automating computer tasks.

Task Instruction

Help me remove the account "anonym-x2024@outlook.com"

Overview of the Agent S Framework

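Given a task T_u and an initial environment observation o_0, the Manager conducts experience-augmented hierarchical planning using web knowledge and narrative memory to produce subtasks s_0, ..., s_n. For each subtask s_i, a Worker w_i draws from episodic memory to generate an action a_t at time t. The ACI executes this action and returns the next immediate observation o_{t+1}. A self-evaluation module closes the loop by storing the summarized subtask and full-task trajectories in narrative and episodic memory.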
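The control flow in this overview can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (`Memory`, `ACI`, `Manager`, `Worker`, `run_task`) is hypothetical, the planner returns a fixed plan instead of querying an MLLM or the web, and we assume subtask summaries go to episodic memory while full-task summaries go to narrative memory.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical key-value store standing in for narrative or episodic memory."""
    entries: dict = field(default_factory=dict)

    def retrieve(self, key):
        return self.entries.get(key, "")

    def store(self, key, summary):
        self.entries[key] = summary


class ACI:
    """Stand-in Agent-Computer Interface: runs an action, returns the next observation."""

    def execute(self, action):
        return f"screen after {action}"


class Manager:
    """Plans subtasks from the task, the observation, and retrieved experience."""

    def __init__(self, narrative_memory):
        self.narrative_memory = narrative_memory

    def plan(self, task, observation):
        # Retrieved high-level experience; a real planner would fuse it with
        # web knowledge and prompt an MLLM. Unused in this fixed-plan stand-in.
        _experience = self.narrative_memory.retrieve(task)
        return [f"{task}: step {i}" for i in range(2)]


class Worker:
    """Generates one action per time step for a single subtask."""

    def __init__(self, episodic_memory):
        self.episodic_memory = episodic_memory

    def act(self, subtask, observation):
        # Retrieved step-by-step guidance; unused in this stand-in.
        _hint = self.episodic_memory.retrieve(subtask)
        return f"do({subtask})"


def run_task(task, initial_observation, steps_per_subtask=1):
    narrative, episodic = Memory(), Memory()
    manager, worker, aci = Manager(narrative), Worker(episodic), ACI()

    observation = initial_observation                  # o_0
    trajectory = []
    for subtask in manager.plan(task, observation):    # s_0, ..., s_n
        for _ in range(steps_per_subtask):
            action = worker.act(subtask, observation)  # a_t
            observation = aci.execute(action)          # o_{t+1}
            trajectory.append((subtask, action, observation))
        # Self-evaluation stand-in (assumed routing): subtask summary -> episodic memory.
        episodic.store(subtask, f"summary of {subtask}")
    # Assumed routing: full-task summary -> narrative memory.
    narrative.store(task, f"summary of {len(trajectory)} steps")
    return trajectory, narrative, episodic


trajectory, narrative, episodic = run_task("delete account", "desktop")
print(len(trajectory), narrative.retrieve("delete account"))
```

The point of the sketch is the shape of the loop: planning happens once per task, acting happens once per time step, and memory is written only after a subtask or the whole task has finished.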

Pipeline of Memory Construction and Update

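The pipeline of memory construction and update contains two phases: self-supervised exploration and continual memory update. The initial narrative and episodic memory is constructed from a set of randomly curated tasks during the exploration phase, and is then updated continually based on the inference tasks.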
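As a rough sketch of these two phases, the snippet below seeds a memory from randomly chosen exploration tasks and then keeps updating it as inference tasks arrive. The helper names (`summarize`, `run_agent`, `build_and_update_memory`) and the dictionary-based memory are assumptions made for illustration; a real rollout and summarizer would replace the stand-ins.

```python
import random


def summarize(task, trajectory):
    """Stand-in for the step that condenses a trajectory into a short experience."""
    return f"{task}: {len(trajectory)} steps"


def run_agent(task, memory):
    """Stand-in for one agent rollout; returns (action, observation) pairs."""
    return [(f"action-{i}", f"obs-{i}") for i in range(3)]


def build_and_update_memory(exploration_pool, inference_tasks, n_explore, seed=0):
    memory = {}
    rng = random.Random(seed)

    # Phase 1: self-supervised exploration on randomly chosen tasks.
    for task in rng.sample(exploration_pool, n_explore):
        memory[task] = summarize(task, run_agent(task, memory))

    # Phase 2: continual memory update while serving inference tasks.
    for task in inference_tasks:
        memory[task] = summarize(task, run_agent(task, memory))

    return memory


memory = build_and_update_memory(["a", "b", "c", "d"], ["x", "y"], n_explore=2)
print(len(memory))
```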

Pipeline of Memory Construction and Update

Main Results

The table compares the performance of Agent S and the baseline models, evaluated on the full OSWorld test set. With GPT-4o, Agent S achieves an overall success rate of 20.58%, nearly double that of the best corresponding baseline (GPT-4o at 11.21%).
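The gap between these two figures can be checked with a few lines of arithmetic. The variable names are ours; the two input numbers are the success rates quoted above, and the result matches the 9.37-point absolute and 83.6% relative improvement stated in the abstract.

```python
agent_s = 20.58   # overall success rate of Agent S with GPT-4o (%)
baseline = 11.21  # best corresponding GPT-4o baseline (%)

absolute_gain = agent_s - baseline              # percentage points
relative_gain = absolute_gain / baseline * 100  # percent of the baseline
ratio = agent_s / baseline                      # "nearly double"

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative gain: {relative_gain:.1f}%")
print(f"ratio: {ratio:.2f}x")
```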

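Agent S consistently outperforms the baselines on "Daily" and "Professional" tasks, reaching success rates of 27.06% and 36.73%, respectively, compared with the best baseline results of 12.33% and 14.29%. These tasks are either common in daily life or involve knowledge-intensive professional applications, so they benefit more from the retrieval augmentation of Agent S. With either Claude-3.5-Sonnet or GPT-4o as the backbone, Agent S outperforms the corresponding baseline on the majority of tasks. Claude-3.5-Sonnet even outperforms GPT-4o on "Daily" and "Professional" tasks.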

These results demonstrate the enhanced ability of Agent S to handle diverse and complex tasks more effectively than the baseline approaches.

Main results of success rate (%) on the OSWorld full test set of all 369 test examples

Analysis

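To demonstrate the effectiveness of the individual modules of Agent S, we used stratified sampling to draw a subset of 65 instances, test_sub, from the full test set for the ablation study. Considering the inference cost, we used GPT-4o as the LLM backbone for all ablation studies, for both the baseline and Agent S.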
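For readers unfamiliar with stratified sampling, here is one way such a subset could be drawn. This is a sketch, not the procedure the authors used: the function name `stratified_sample`, the largest-remainder allocation, and the category names and sizes in the usage example are all assumptions (the sizes were chosen only so that they sum to 369).

```python
import random
from collections import defaultdict


def stratified_sample(items, stratum_of, k, seed=0):
    """Sample k items so each stratum is represented roughly in proportion to its size."""
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)

    total = len(items)
    # Largest-remainder allocation so the per-stratum quotas sum to exactly k.
    exact = {name: k * len(members) / total for name, members in strata.items()}
    quotas = {name: int(share) for name, share in exact.items()}
    leftover = k - sum(quotas.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - quotas[name], reverse=True)
    for name in by_remainder[:leftover]:
        quotas[name] += 1

    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, quotas[name]))
    return sample


# Hypothetical category sizes for illustration only.
counts = {"OS": 24, "Office": 117, "Daily": 78, "Professional": 49, "Workflow": 101}
tasks = [(name, i) for name, n in counts.items() for i in range(n)]
subset = stratified_sample(tasks, stratum_of=lambda t: t[0], k=65)
print(len(subset))
```

Proportional allocation keeps small categories in the subset, which matters when per-category success rates are compared across ablations.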

Learning from experience enhances the domain knowledge of GUI agents


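Learning from the universal experience available as web knowledge lets Agent S make informed plans across a wide range of tasks, and it has the most significant impact. Learning from narrative and episodic memory synergizes effectively with web retrieval, and the results detail how ablating each of them affects the agent's ability to handle complex tasks, underscoring the value of experiential learning. These results show that each component plays a critical role in enhancing the agent's domain knowledge. Removing all three components (w/o All) degrades performance significantly, revealing the importance of learning from experience in the design.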

ACI elicits better reasoning abilities of LLMs and supports better agentic learning

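Comparing the baseline with Agent S (ACI-only) highlights the improvement in reasoning ability achieved by incorporating the ACI. We also examined the impact of the ACI on agentic learning by integrating the experiential learning process. For the baseline, adding experiential learning improved overall performance only slightly. When added to Agent S (ACI-only), however, it improved performance significantly, demonstrating the effectiveness of the ACI in enhancing agentic learning.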

Hierarchical planning supports long-horizon workflows

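The (ACI-only + Experiential learning) setup shows the performance of Agent S without hierarchical planning. The performance drop relative to the full Agent S (26.15% to 20.00%) underscores the importance of hierarchical planning in modeling long-horizon workflows. The effect of the hierarchical formulation becomes pronounced in the presence of experiential learning, because the Manager can generate more detailed and accurate plans in the subtask planning stage.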
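As a quick sanity check, and assuming these two rates are measured on the 65-instance ablation subset (our inference; the text does not give raw counts), they are consistent with 17 and 13 solved tasks. The helper `success_rate` is ours.

```python
def success_rate(successes, total):
    """Success rate as a percentage, rounded to two decimals."""
    return round(successes / total * 100, 2)


subset_size = 65  # instances in the ablation subset
for successes in (17, 13):
    print(f"{successes}/{subset_size} = {success_rate(successes, subset_size):.2f}%")
```

Read this way, removing hierarchical planning corresponds to a difference of four tasks on the subset, so small differences between ablation rows should be interpreted with that granularity in mind.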

Exploration, continual memory update, and self-evaluation are indispensable for memory construction

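Removing exploration limits memory updates to the inference phase only. Removing the continual memory update means that only the memory obtained during the exploration phase is used, with no subsequent updates. Removing the Self-Evaluator means replacing the summarized experiences with the original full trajectories. The results show that ablating either the continual memory update or the self-supervised exploration lowers performance, with self-supervised exploration having a much larger impact. The ablation of the Self-Evaluator further shows the benefit of using summarized trajectories rather than full trajectory exemplars for planning.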

Generalization to different operating systems

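We test the Agent S framework without modification on WindowsAgentArena, a Windows OS benchmark released contemporaneously with our work. We compare Agent S against a similar configuration that uses GPT-4o as the MLLM backbone, Accessibility Tree + Image as the input, and OCR for parsing. As shown in the table, Agent S outperforms the Navi agent without any adaptation to the new Windows environment.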

Results of success rate (%) on WindowsAgentArena using GPT-4o and Image + Accessibility Tree input on the full test set of all 154 test examples


BibTeX

@misc{agent,
  title={Agent S: An Open Agentic Framework that Uses Computers Like a Human},
  author={Saaket Agashe* and Jiuzhou Han* and Shuyu Gan and Jiachen Yang and Ang Li and Xin Eric Wang},
  year={2024},
  eprint={},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}

Understanding the AI Agentic Framework

The AI agentic framework is a modern approach that combines artificial intelligence (AI) with agent-based modeling. This combination aims to improve decision-making processes. With this framework, intelligent agents can work on their own within a system, which makes workflows smoother and promotes collaboration. By using machine learning and automation, the agentic framework creates a solid foundation for developing multi-agent systems that adjust to various situations.

Here are some key components of this framework:

Intelligent Agents: These software entities can take independent actions to achieve specific goals.
Decision-Making Algorithms: These algorithms help agents make informed choices based on the information they receive.
Agent Systems: This refers to groups of interconnected agents collaborating to complete complex tasks.

Microsoft and other tech leaders are using this framework to create smarter applications that need less human involvement.

Key Concepts of the Agentic Framework

The agentic framework includes several important concepts that are essential for its successful application:

Agent-Based Framework: A setup where individual agents work together to accomplish tasks, boosting efficiency.
Agentic Approach: This method encourages agents to act independently and highlights their ability to learn and adapt.
Workflows: Built into AI workplace assistants, these are the planned paths that agents follow to streamline processes and ensure smooth task execution.
Human-Agent Interaction: This is how humans communicate and guide the agents.

By incorporating languages like Python, developers can effectively use design patterns, adaptive agents, and debugging methods. This integration helps create better feedback loops and improves the overall performance of the system.

Applications of AI Agentic Framework

The applications of the AI agentic framework are broad and relevant across various fields:

AI Framework Variations: Different types can be adjusted to meet specific industry needs, ensuring flexibility.
AI Solutions: From virtual assistants to intricate management systems, these solutions expand operational possibilities.
Agent Orchestration: This involves coordinating multiple agents to achieve unified results.
Security and Management: The framework helps boost organizational efficiency while upholding security standards.

Prominent examples include platforms like GitHub and tools such as LangChain, which show how agentic AI can be implemented in real-world settings. These applications illustrate how intelligent systems can reshape business functions and enhance user experiences.

Benefits of Using an Agentic Framework

Using an agentic framework comes with many advantages:

Efficiency: It increases productivity by automating repetitive tasks, reducing the need for manual work.
Quality Management: The framework ensures consistent quality in results through structured processes.
Continuous Integration: Updates and improvements become easier, keeping the systems current and effective.
Cooperative Agents: It encourages collaboration among different agents, leading to improved problem-solving abilities.

This framework also addresses ethical concerns in AI, promoting transparency and responsible use of self-learning agents.

Challenges in Implementing Agentic Frameworks

While there are clear benefits, organizations may face a few challenges when adopting agentic frameworks:

Data Privacy: Protecting sensitive data is critical when implementing intelligent systems.
AI Governance: Setting regulations is necessary to manage the proper use and oversight of AI technologies.
Agent Performance Metrics: Finding suitable metrics to measure how well agents perform their tasks is essential.
Real-Time Agents: Managing agents in fast-paced environments requires advanced strategies and resilient systems.

Tackling these challenges is vital for successfully integrating the AI agentic model into existing systems to ensure safety and trustworthiness.

Conclusion

The AI agentic framework shows promise in the realm of artificial intelligence by providing a structured way to effectively utilize intelligent systems. By grasping its core concepts, applications, benefits, and challenges, organizations can better leverage AI to foster innovation and enhance efficiency.

Feel free to explore more about the AI agentic framework or share your opinions in the comments! Your questions and insights are valuable as we move forward in this exciting field.

Understanding the AI Agentic Framework

The AI agentic framework is a collection of ideas and methods aimed at creating intelligent systems that can act and make decisions on their own. This framework enhances collaboration between human users and artificial intelligence (AI) agents, promoting smooth workflows and effective automation.

Key aspects of the agent-based framework include:

Intelligent Agent Frameworks: These form the foundation for developing AI solutions that function in real-time.
Collaboration Mechanisms: Good communication among multiple agents boosts system performance.
Human-Agent Interaction: This part emphasizes how people can work alongside cognitive agents, leading to better experiences.
Multi-Agent Systems: Different agents work together to accomplish complex tasks, which may be too much for a single agent to handle.

You can see real-world applications of this framework in areas like healthcare, finance, and logistics, where AI applications enhance processes, lower mistakes, and improve results.

Key Components of an Agentic Approach

An agentic approach consists of essential components that define how it works and its effectiveness.

Agent Autonomy: The level of independence an agent has is crucial for effective automation.
Decision-Making Algorithms: These allow agents to evaluate situations and make smart choices based on current data.
Agent-Based Modeling: This method helps simulate interactions within a system, improving understanding and optimization.
Design Patterns: Established design patterns assist with programming agent systems, making them easier to maintain and scale.
Agent Cooperation: Successful implementation depends on agents working together smoothly.

A strong agentic model includes these components, enabling powerful agent technologies that drive innovation across various sectors.

Applications of the AI Agentic Framework

The AI agent framework has many applications across different sectors, highlighting its flexibility and effectiveness.

Some noteworthy examples are:

Project Management: AI agents improve project workflows, ensuring tasks are completed quickly and on time.
Data Privacy: Intelligent agents help manage sensitive data while ensuring compliance with regulations like GDPR.
Autonomous Agents: These self-operating agents take care of repetitive tasks, such as data entry, so that humans can concentrate on strategic work.
Task-Oriented Agents: Designed to perform specific functions, these agents carry out tasks with great accuracy.

Leading companies like Microsoft and Nvidia utilize the agentic AI framework, showing how AI capabilities can be integrated effectively into their operations.

Benefits of Implementing Agentic Systems

Implementing agentic systems brings a variety of benefits that can boost efficiency and effectiveness in organizations:

Automation: Cuts down on manual work, speeding up task completion.
Ease of Use: Built with user experience in mind, making acceptance simple.
Real-Time Analytics: Offers instant feedback, supporting data-driven decisions.
AI Ethics: Complies with ethical standards, building trust with users.
Performance Metrics: Measures agent effectiveness, promoting continuous improvement.

These benefits explain why many organizations are adopting agentic systems to stay competitive in their fields.

Challenges and Considerations

While the agentic framework offers many chances for improvement, it also presents challenges that businesses should think about:

Security Risks: Protecting data and systems from cyber threats is crucial.
Complexity: Creating and implementing multi-agent systems can be intricate and time-consuming.
Data Governance: Organizations must follow regulations and best practices for data management.
AI Accountability: Figuring out who is responsible when AI makes decisions is an important concern.

Addressing these challenges requires a solid grasp of the framework's varieties and the underlying technologies, along with effective governance and accountability strategies in distributed AI systems.

Call to Action

Are you interested in exploring the potential of the AI agentic framework? Join the conversation below, share your thoughts, or learn more about how Simular AI can assist you in embracing intelligent automation.

Understanding the AI Agentic Framework

The AI Agentic Framework marks a significant change in how we design and use artificial intelligence (AI) systems. This framework aims to create intelligent systems that can make decisions on their own, work together with other agents, and adjust to changing environments. It serves as a foundational structure for cognitive agents to interact, manage workflows, and respond to dynamic situations effectively.

Key aspects include:

Agent-based Approach: This involves using independent entities that act according to specific guidelines and goals.
Multi-Agent Systems: These systems enable various agents to collaborate, which boosts overall efficiency and effectiveness.
Decision-Making Algorithms: These sophisticated algorithms help agents make informed choices by analyzing available data and context.

By leveraging this framework, AI can perform tasks more like humans do, leading to increased productivity and innovative applications across various fields.

Key Components of Agentic AI Systems

To build successful agentic AI systems, several key components need to be considered:

Management Tools: These tools help streamline coordination among agents to ensure smooth operation.
Automation Features: Automation minimizes the need for manual input, which enhances process efficiency.
Reasoning Capabilities: Intelligent agents utilize strong reasoning skills to evaluate situations and make sound decisions.
Design Patterns: By implementing established design patterns, developers can effectively structure complex agent systems.
Debugging Tools: These tools are vital for maintaining system reliability by quickly identifying and fixing issues.
Agent Collaboration Mechanisms: Encouraging cooperation among agents is essential for achieving complex objectives.

Together, these components work to enhance the effectiveness of the agentic approach, paving the way for advanced AI solutions.

Applications of the Agentic Framework in AI

The agentic framework supports a wide range of applications that can greatly benefit different industries:

Virtual Agents: Often used in customer support, these agents provide 24/7 assistance, improving user satisfaction.
Autonomous Agents: In logistics and supply chain management, these agents optimize delivery processes.
Human-Agent Interaction: The framework helps improve user interfaces for better engagement and accessibility when used to build AI agent apps such as AI browser automation.
Data Integration: It enables seamless connectivity between various data sources, which enriches decision-making.
Feedback Mechanisms: These allow agents to learn from interactions, enhancing their capabilities over time.

This broad versatility illustrates how the framework adapts to different sectors, from finance to healthcare.

Challenges and Considerations

While the AI agent framework holds great potential, it also brings along certain challenges:

Data Privacy Concerns: With the increase in data usage, protecting personal information becomes essential.
Security Risks: Addressing vulnerabilities is crucial to safeguarding against cyber threats.
Ethical Considerations: The deployment of AI must follow ethical standards to prevent misuse.
Project Management Complexity: Coordinating multiple agent systems requires effective leadership and clear guidelines.
Performance Metrics: Setting performance metrics for agents is important for measuring success and adjusting strategies.

Tackling these challenges is important for the successful rollout of agentic systems, ensuring they remain efficient, secure, and ethically sound.

Overall, the AI Agentic Framework lays a solid foundation for developing advanced AI systems. By focusing on collaborative, intelligent agents, organizations can reach new heights in efficiency and creativity. As you explore the potential applications of this framework, keep in mind its benefits and the challenges that may arise to maintain a balanced approach to AI deployment.

If you found this information useful or have questions, feel free to share your thoughts below or distribute this article to others interested in the evolving landscape of AI.

Ready to use your computer in a similar way?

Share and organize memories, and personalize your tasks.

Try Sai
