Agent S: An Open Agentic Framework that Uses Computers Like a Human

February 27, 2025

Saaket Agashe*,

Jiuzhou Han*,

Hello! A few months ago I gave a talk at Princeton University about my thoughts on agents and Simular. I figured I should put together a summary and turn it into a blog post.

State-of-the-Art Performance

My first job was as a research scientist at Google DeepMind, where a key part of my role was working with various Google product teams to identify opportunities to apply our cutting-edge AI technology. One Googler, however, asked me a completely different question, one that may ultimately have led to my decision to leave DeepMind and found Simular.

Agent S is a new agentic framework designed to let computers be used as intuitively as a human would use them.

We introduce an experience-augmented hierarchical planning method. This method uses online web knowledge for up-to-date information on frequently changing software and websites, along with Narrative Memory to leverage high-level experiences from past interactions. Agent S breaks complex tasks into manageable subtasks and uses Episodic Memory for step-by-step guidance, continually refining its actions and learning from experience to achieve adaptable and effective task planning.

Abstract

We present Agent S, an open agentic framework that enables autonomous interaction with computers through a Graphical User Interface (GUI), aimed at transforming human-computer interaction by automating complex, multi-step tasks.

To this end, Agent S introduces experience-augmented hierarchical planning, which learns from external knowledge search and internal experience retrieval at multiple levels, enabling efficient task planning and subtask execution.

In addition, it employs an Agent-Computer Interface (ACI) to better elicit the reasoning and control capabilities of GUI agents based on Multimodal Large Language Models (MLLMs). Evaluation on the OSWorld benchmark shows that Agent S outperforms the baseline by 9.37% on success rate (an 83.6% relative improvement) and achieves a new state of the art. Comprehensive analysis highlights the effectiveness of individual components and provides insights for future improvements.

Furthermore, Agent S demonstrates broad generalizability to different operating systems on the newly released WindowsAgentArena benchmark.

Agent S addresses three key challenges in automating computer tasks:

Task Instruction

Help me to remove the account "anonym-x2024@outlook.com"

Overview of the Agent S Framework

Given the task T_u and the initial environment observation o_0, the Manager performs experience-augmented hierarchical planning, using web knowledge and narrative memory to produce the subtasks s_0, ..., s_n. For each s_i, Worker w_i draws on episodic memory to generate an action at time t, which is executed by the ACI to return the next immediate observation o_{t+1}. A self-evaluation module closes the loop by storing the summarized subtask and full-task trajectories in narrative and episodic memory.
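The control flow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Agent S implementation: the names `Memory`, `ToyACI`, `manager_plan`, `worker_act`, and `run_task` are hypothetical, the Manager splits a semicolon-separated hint string instead of prompting an MLLM, and each Worker takes exactly one action per subtask.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Toy stand-in for narrative (task-level) and episodic (subtask-level) memory."""
    narrative: dict = field(default_factory=dict)
    episodic: dict = field(default_factory=dict)


class ToyACI:
    """Hypothetical Agent-Computer Interface: runs an action, returns the next observation."""

    def __init__(self):
        self.log = []

    def execute(self, action):
        self.log.append(action)
        return f"screen after '{action}'"


def manager_plan(task, web_knowledge, memory):
    """Manager: fuse web knowledge with narrative memory into an ordered subtask list."""
    past_experience = memory.narrative.get(task, "")
    steps = web_knowledge.split(";") + past_experience.split(";")
    cleaned = [step.strip() for step in steps if step.strip()]
    # Drop repeated steps while preserving their first-seen order.
    return list(dict.fromkeys(cleaned))


def worker_act(subtask, observation, memory):
    """Worker: choose an action for one subtask, reusing episodic memory when present."""
    return memory.episodic.get(subtask, f"do: {subtask}")


def run_task(task, initial_observation, web_knowledge, memory, aci):
    observation = initial_observation
    subtasks = manager_plan(task, web_knowledge, memory)
    for subtask in subtasks:
        action = worker_act(subtask, observation, memory)
        observation = aci.execute(action)
        # Self-evaluation (simplified): remember what worked for this subtask.
        memory.episodic[subtask] = action
    # Self-evaluation (simplified): remember the plan used for the whole task.
    memory.narrative[task] = "; ".join(subtasks)
    return observation


memory = Memory()
aci = ToyACI()
final_observation = run_task(
    task="remove account",
    initial_observation="desktop",
    web_knowledge="open settings; select account; click remove",
    memory=memory,
    aci=aci,
)
print(aci.log)
```

Running the same task a second time would reuse the stored actions and plan, which is the point of closing the loop through memory.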

Pipeline of Memory Construction and Update

The memory construction and update pipeline has two phases: self-supervised exploration and continual memory update. The initial narrative and episodic memory is built from a set of randomly curated tasks during the exploration phase and is then continually updated based on the inference tasks.
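A minimal sketch of the two phases, assuming a dictionary-backed memory. The helpers `run_episode`, `summarize`, `update_memory`, and `build_memory`, as well as the task names, are hypothetical placeholders; a real agent would roll out each task in the environment instead of generating dummy steps.

```python
import random


def run_episode(task):
    """Hypothetical rollout; a real agent would act in the environment here."""
    return [{"action": f"{task}:step{i}"} for i in range(2)]


def summarize(trajectory):
    """Hypothetical self-evaluator: compress a full trajectory into a short summary."""
    return " -> ".join(step["action"] for step in trajectory)


def update_memory(memory, task):
    trajectory = run_episode(task)
    # Narrative memory keeps a task-level summary.
    memory["narrative"][task] = summarize(trajectory)
    # Episodic memory keeps finer-grained, step-level guidance.
    memory["episodic"][task] = [step["action"] for step in trajectory]


def build_memory(exploration_pool, num_exploration_tasks, seed=0):
    """Phase 1: self-supervised exploration on randomly chosen tasks."""
    rng = random.Random(seed)
    memory = {"narrative": {}, "episodic": {}}
    for task in rng.sample(exploration_pool, num_exploration_tasks):
        update_memory(memory, task)
    return memory


memory = build_memory(["t1", "t2", "t3", "t4"], num_exploration_tasks=2)
size_after_exploration = len(memory["narrative"])

# Phase 2: continual update, where memory keeps growing as inference tasks arrive.
for inference_task in ["user_task_a", "user_task_b"]:
    update_memory(memory, inference_task)

print(size_after_exploration, len(memory["narrative"]))
```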

Main Results

The table shows the performance comparison between Agent S and the baseline models, evaluated on the full OSWorld test set. With the GPT-4o model, Agent S achieves an overall success rate of 20.58%, nearly doubling the performance of the best corresponding baseline (GPT-4o at 11.21%).

Agent S consistently outperforms the baselines on the "Daily" and "Professional" tasks, reaching success rates of 27.06% and 36.73%, respectively, compared with the best baseline results of 12.33% and 14.29%. These tasks are common in everyday use or involve knowledge-intensive professional applications, which benefit more from Agent S's retrieval augmentation. Both the Claude-3.5-Sonnet and GPT-4o variants outperform the baseline versions on most tasks. Claude-3.5-Sonnet even performs better than GPT-4o on the "Daily" and "Professional" tasks.

The results demonstrate the improved ability of Agent S to handle diverse and complex tasks more effectively than the baseline approaches.
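The headline figures can be cross-checked with a little arithmetic. The sketch below uses only the two GPT-4o overall success rates quoted in this section; the variable names are illustrative.

```python
baseline = 11.21  # best GPT-4o baseline success rate (%), as quoted in the text
agent_s = 20.58   # Agent S with GPT-4o success rate (%), as quoted in the text

absolute_gain = agent_s - baseline               # difference in percentage points
relative_gain = absolute_gain / baseline * 100   # gain relative to the baseline

print(f"absolute gain: {absolute_gain:.2f} points")  # absolute gain: 9.37 points
print(f"relative gain: {relative_gain:.1f}%")        # relative gain: 83.6%
```

Both values match the 9.37% and 83.6% reported in the abstract, which confirms that the former is an absolute difference in success rate and the latter is measured relative to the baseline.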

Main results of Success Rate (%) on the full OSWorld test set of all 369 test examples

Analysis

To demonstrate the effectiveness of the individual modules of Agent S, we drew a test subset of 65 instances from the full test set by stratified sampling for the ablation study. Considering the inference cost, we used GPT-4o as the LLM backbone for all ablation studies, for both the baseline and Agent S.
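For readers unfamiliar with stratified sampling, here is one way such a subset could be drawn. This is a sketch under assumptions: the source does not say which strata or allocation rule were used, so the proportional allocation with largest-remainder rounding, the function `stratified_sample`, and the group names and sizes below are all hypothetical. Only the totals (369 examples, 65 sampled) come from the text.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(items, key, k, seed=0):
    """Sample k items so each stratum's share roughly matches its share of the full set."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)

    total = len(items)
    # Proportional quota per stratum, floored; leftover slots go to the largest remainders.
    exact = {name: k * len(members) / total for name, members in strata.items()}
    quota = {name: int(share) for name, share in exact.items()}
    leftover = k - sum(quota.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - quota[name], reverse=True)
    for name in by_remainder[:leftover]:
        quota[name] += 1

    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, quota[name]))
    return sample


# Made-up strata sizes that sum to 369; they are NOT the real OSWorld categories.
group_sizes = {"group_a": 100, "group_b": 200, "group_c": 69}
tasks = [
    {"id": f"{group}-{i}", "group": group}
    for group, size in group_sizes.items()
    for i in range(size)
]

subset = stratified_sample(tasks, key=lambda task: task["group"], k=65)
per_group = Counter(task["group"] for task in subset)
print(len(subset), dict(per_group))
```

Fixing the seed keeps the subset reproducible, so every ablation runs on the same 65 instances.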

Learning from experience enhances the domain knowledge of GUI agents

Learning from universal experience available as web knowledge allows Agent S to make informed plans across a wide range of tasks, and it has the largest impact. Learning from narrative and episodic memories synergizes effectively with web retrieval. The results detail how ablating them affects the agent's ability to handle complex tasks, underscoring the value of experiential learning. They also show that each component plays a critical role in expanding the agent's domain knowledge. When all three components are removed (w/o All), performance degrades significantly, which shows how important learning from experience is for planning.

ACI elicits better reasoning abilities from LLMs and supports better agentic learning

Comparing the baseline with Agent S (ACI-only) highlights the improved reasoning abilities achieved by incorporating the ACI. We also examined the impact of the ACI on agentic learning by integrating the experiential learning process. For the baseline, adding experiential learning slightly improved overall performance. When added to Agent S (ACI-only), however, performance improved significantly, demonstrating the ACI's effectiveness in enhancing agentic learning.

Hierarchical Planning supports long-horizon workflows

The ACI-only + Experiential Learning setup shows the performance of Agent S without hierarchical planning. The observed performance drop (26.15% to 20.00%) compared with the full Agent S underscores the importance of hierarchical planning in modeling long-horizon workflows. The effect of the hierarchical formulation becomes pronounced in the presence of experiential learning, because the Manager can generate more detailed and accurate plans in the subtask planning stage.

Exploration, Continual Memory Update, and Self-Evaluator are indispensable for memory construction

Removing exploration restricts memory updates to the inference phase only. Removing the continual memory update means we use only the memory obtained during the exploration phase, without subsequent updates. Removing the self-evaluator means replacing summarized experiences with the original full trajectories. The results show that ablating either the continual memory update or the self-supervised exploration phase leads to a performance drop, with self-supervised exploration having a much larger impact. The ablation of the Self-Evaluator further shows the benefit of using summarized trajectories instead of full trajectory exemplars for planning.

Generalization to Different Operating Systems

We test the Agent S framework without modification on WindowsAgentArena, a Windows OS benchmark released contemporaneously with our work. We compare Agent S against a similar configuration that uses GPT-4o as the MLLM backbone, Accessibility Tree + Image as input, and parsing with OCR. As shown in the table, Agent S outperforms the Navi agent without any adaptation to the new Windows environment.

Results of Success Rate (%) on WindowsAgentArena using GPT-4o and Image + Accessibility Tree input on the full test set of all 154 test examples

BibTeX

@misc{Agenten,
  title={Agent S: An Open Agentic Framework that Uses Computers Like a Human},
  author={Saaket Agashe and Jiuzhou Han and Shuyu Gan and Jiachen Yang and Ang Li and Xin Eric Wang},
  year={2024},
  eprint={},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}

Understanding the AI Agentic Framework

The AI agentic framework is a modern approach that combines artificial intelligence (AI) with agent-based modeling. This combination aims to improve decision-making processes. With this framework, intelligent agents can work on their own within a system, which makes workflows smoother and promotes collaboration. By using machine learning and automation, the agentic framework creates a solid foundation for developing multi-agent systems that adjust to various situations.

Here are some key components of this framework:

Intelligent Agents: These software entities can take independent actions to achieve specific goals.
Decision-Making Algorithms: These algorithms help agents make informed choices based on the information they receive.
Agent Systems: This refers to groups of interconnected agents collaborating to complete complex tasks.

Microsoft and other tech leaders are using this framework to create smarter applications that need less human involvement.

Key Concepts of the Agentic Framework

The agentic framework includes several important concepts that are essential for its successful application:

Agent-Based Framework: A setup where individual agents work together to accomplish tasks, boosting efficiency.
Agentic Approach: This method encourages agents to act independently and highlights their ability to learn and adapt.
Workflows: Built into AI workplace assistants, these are the planned paths that agents follow to streamline processes and ensure smooth task execution.
Human-Agent Interaction: This is how humans communicate and guide the agents.

By incorporating languages like Python, developers can effectively use design patterns, adaptive agents, and debugging methods. This integration helps create better feedback loops and improves the overall performance of the system.
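As a small illustration of an adaptive agent with a feedback loop in Python, consider the toy sketch below. The `AdaptiveAgent` class, its threshold rule, and the numbers are hypothetical and chosen only to show the pattern; they do not come from any particular framework.

```python
class AdaptiveAgent:
    """Toy agent (hypothetical): acts when a reading exceeds its threshold."""

    def __init__(self, threshold, step=1.0):
        self.threshold = threshold
        self.step = step

    def decide(self, reading):
        return "act" if reading > self.threshold else "wait"

    def feedback(self, reading, should_act):
        """Feedback loop: nudge the threshold whenever the decision was wrong."""
        acted = self.decide(reading) == "act"
        if should_act and not acted:
            self.threshold -= self.step  # too cautious, lower the bar
        elif acted and not should_act:
            self.threshold += self.step  # too eager, raise the bar


agent = AdaptiveAgent(threshold=10.0)
before = agent.decide(8.0)
for _ in range(5):
    # A human or an evaluator reports that the agent should have acted on this reading.
    agent.feedback(8.0, should_act=True)
after = agent.decide(8.0)
print(before, after, agent.threshold)
```

The agent starts out too cautious and, after a few rounds of corrective feedback, lowers its threshold far enough to act on the same input.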

Applications of AI Agentic Framework

The applications of the AI agentic framework are broad and relevant across various fields:

AI Framework Variations: Different types can be adjusted to meet specific industry needs, ensuring flexibility.
AI Solutions: From virtual assistants to intricate management systems, these solutions expand operational possibilities.
Agent Orchestration: This involves coordinating multiple agents to achieve unified results.
Security and Management: The framework helps boost organizational efficiency while upholding security standards.

Prominent examples include platforms like GitHub and tools such as LangChain, showcasing how agentic AI can be implemented in real-world settings. These applications illustrate how intelligent systems can reshape business functions and enhance user experiences.

Benefits of Using an Agentic Framework

Using an agentic framework comes with many advantages:

Efficiency: It increases productivity by automating repetitive tasks, reducing the need for manual work.
Quality Management: The framework ensures consistent quality in results through structured processes.
Continuous Integration: Updates and improvements become easier, keeping the systems current and effective.
Cooperative Agents: It encourages collaboration among different agents, leading to improved problem-solving abilities.

This framework also addresses ethical concerns in AI, promoting transparency and responsible use of self-learning agents.

Challenges in Implementing Agentic Frameworks

While there are clear benefits, organizations may face a few challenges when adopting agentic frameworks:

Data Privacy: Protecting sensitive data is critical when implementing intelligent systems.
AI Governance: Setting regulations is necessary to manage the proper use and oversight of AI technologies.
Agent Performance Metrics: Finding suitable metrics to measure how well agents perform their tasks is essential.
Real-Time Agents: Managing agents in fast-paced environments requires advanced strategies and resilient systems.

Tackling these challenges is vital for successfully integrating the AI agentic model into existing systems to ensure safety and trustworthiness.

Conclusion

The AI agentic framework shows promise in the realm of artificial intelligence by providing a structured way to effectively utilize intelligent systems. By grasping its core concepts, applications, benefits, and challenges, organizations can better leverage AI to foster innovation and enhance efficiency.

Feel free to explore more about the AI agentic framework or share your opinions in the comments! Your questions and insights are valuable as we move forward in this exciting field.

Understanding the AI Agentic Framework

The AI agentic framework is a collection of ideas and methods aimed at creating intelligent systems that can act and make decisions on their own. This framework enhances collaboration between human users and artificial intelligence (AI) agents, promoting smooth workflows and effective automation.

Key aspects of the agent-based framework include:

Intelligent Agent Frameworks: These form the foundation for developing AI solutions that function in real-time.
Collaboration Mechanisms: Good communication among multiple agents boosts system performance.
Human-Agent Interaction: This part emphasizes how people can work alongside cognitive agents, leading to better experiences.
Multi-Agent Systems: Different agents work together to accomplish complex tasks, which may be too much for a single agent to handle.

You can see real-world applications of this framework in areas like healthcare, finance, and logistics, where AI applications enhance processes, lower mistakes, and improve results.

Key Components of an Agentic Approach

An agentic approach consists of essential components that define how it works and its effectiveness.

Agent Autonomy: The level of independence an agent has is crucial for effective automation.
Decision-Making Algorithms: These allow agents to evaluate situations and make smart choices based on current data.
Agent-Based Modeling: This method helps simulate interactions within a system, improving understanding and optimization.
Design Patterns: Established design patterns assist with programming agent systems, making them easier to maintain and scale.
Agent Cooperation: Successful implementation depends on agents working together smoothly.

A strong agentic model includes these components, enabling powerful agent technologies that drive innovation across various sectors.

Applications of the AI Agentic Framework

The AI agent framework has many applications across different sectors, highlighting its flexibility and effectiveness.

Some noteworthy examples are:

Project Management: AI agents improve project workflows, ensuring tasks are completed quickly and on time.
Data Privacy: Intelligent agents help manage sensitive data while ensuring compliance with regulations like GDPR.
Autonomous Agents: These self-operating agents take care of repetitive tasks, such as data entry, so that humans can concentrate on strategic work.
Task-Oriented Agents: Designed to perform specific functions, these agents carry out tasks with great accuracy.

Leading companies like Microsoft and Nvidia utilize the agentic AI framework, showing how AI capabilities can be integrated effectively into their operations.

Benefits of Implementing Agentic Systems

Implementing agentic systems brings a variety of benefits that can boost efficiency and effectiveness in organizations:

Automation: Cuts down on manual work, speeding up task completion.
Ease of Use: Built with user experience in mind, making acceptance simple.
Real-Time Analytics: Offers instant feedback, supporting data-driven decisions.
AI Ethics: Complies with ethical standards, building trust with users.
Performance Metrics: Measures agent effectiveness, promoting continuous improvement.

These benefits explain why many organizations are adopting agentic variations to stay competitive in their fields.

Challenges and Considerations

While the agentic framework offers many chances for improvement, it also presents challenges that businesses should think about:

Security Risks: Protecting data and systems from cyber threats is crucial.
Complexity: Creating and implementing multi-agent systems can be intricate and time-consuming.
Data Governance: Organizations must follow regulations and best practices for data management.
AI Accountability: Figuring out who is responsible when AI makes decisions is an important concern.

Addressing these challenges requires a solid grasp of the framework's varieties and the underlying technologies, along with effective governance and accountability strategies in distributed AI systems.

Call to Action

Are you interested in exploring the potential of the AI agentic framework? Join the conversation below, share your thoughts, or learn more about how Simular AI can assist you in embracing intelligent automation.

Understanding the AI Agentic Framework

The AI Agentic Framework marks a significant change in how we design and use artificial intelligence (AI) systems. This framework aims to create intelligent systems that can make decisions on their own, work together with other agents, and adjust to changing environments. It serves as a foundational structure for cognitive agents to interact, manage workflows, and respond to dynamic situations effectively.

Key aspects include:

Agent-based Approach: This involves using independent entities that act according to specific guidelines and goals.
Multi-Agent Systems: These systems enable various agents to collaborate, which boosts overall efficiency and effectiveness.
Decision-Making Algorithms: These sophisticated algorithms help agents make informed choices by analyzing available data and context.

By leveraging this framework, AI can perform tasks more like humans do, leading to increased productivity and innovative applications across various fields.

Key Components of Agentic AI Systems

To build successful agentic AI systems, several key components need to be considered:

Management Tools: These tools help streamline coordination among agents to ensure smooth operation.
Automation Features: Automation minimizes the need for manual input, which enhances process efficiency.
Reasoning Capabilities: Intelligent agents utilize strong reasoning skills to evaluate situations and make sound decisions.
Design Patterns: By implementing established design patterns, developers can effectively structure complex agent systems.
Debugging Tools: These tools are vital for maintaining system reliability by quickly identifying and fixing issues.
Agent Collaboration Mechanisms: Encouraging cooperation among agents is essential for achieving complex objectives.

Together, these components work to enhance the effectiveness of the agentic approach, paving the way for advanced AI solutions.

Applications of the Agentic Framework in AI

The agentic framework supports a wide range of applications that can greatly benefit different industries:

Virtual Agents: Often used in customer support, these agents provide 24/7 assistance, improving user satisfaction.
Autonomous Agents: In logistics and supply chain management, these agents optimize delivery processes.
Human-Agent Interaction: The framework helps improve user interfaces for better engagement and accessibility when used to build AI agent apps such as AI browser automation.
Data Integration: It enables seamless connectivity between various data sources, which enriches decision-making.
Feedback Mechanisms: These allow agents to learn from interactions, enhancing their capabilities over time.

This broad versatility illustrates how the framework adapts to different sectors, from finance to healthcare.

Challenges and Considerations

While the AI agent framework holds great potential, it also brings along certain challenges:

Data Privacy Concerns: With the increase in data usage, protecting personal information becomes essential.
Security Risks: Addressing vulnerabilities is crucial to safeguarding against cyber threats.
Ethical Considerations: The deployment of AI must follow ethical standards to prevent misuse.
Project Management Complexity: Coordinating multiple agent systems requires effective leadership and clear guidelines.
Performance Metrics: Setting performance metrics for agents is important for measuring success and adjusting strategies.

Tackling these challenges is important for the successful rollout of agentic systems, ensuring they remain efficient, secure, and ethically sound.

Overall, the AI Agentic Framework lays a solid foundation for developing advanced AI systems. By focusing on collaborative, intelligent agents, organizations can reach new heights in efficiency and creativity. As you explore the potential applications of this framework, keep in mind its benefits and the challenges that may arise to maintain a balanced approach to AI deployment.

If you found this information useful or have questions, feel free to share your thoughts below or distribute this article to others interested in the evolving landscape of AI.

Ready to use your computer in a similar way?

Share and organize your memory, and personalize your tasks.

Try Sai
