Agent S2: An Open, Modular, and Scalable Framework for Computer Use Agents

March 12, 2025

Agenten zur Computernutzungsind autonome KI-Agenten, die Aufgaben für menschliche Benutzer beobachten, begründen und ausführen, indem sie direkt mit grafischen Benutzeroberflächen (GUI), einschließlich Desktops, Mobilgeräten, Browsern und verschiedener Software, interagieren. Sie fungieren auf die intuitivste Art und Weise als intelligente Vermittler zwischen menschlichen Benutzern und ihren digitalen Tools — mit Maus- und Tastatursteuerung, genau wie ein Mensch. Diese menschenähnliche Fähigkeit, Software zu navigieren und zu steuern, stellt einen grundlegenden Sprung in der KI dar und bereitet die Voraussetzungen für die nächste Ära des technologischen Fortschritts, der von autonomen Computerbenutzern angetrieben wird.  

Today we are excited to announce our next leap forward in computer use agents: Agent S2, the second generation of our agentic framework. Building on our initial successes, Agent S2 delivers more performance and modularity by combining frontier foundation models with specialized models. Agent S2 achieves new state-of-the-art results, scales well with more steps, and, most importantly, is fully open!

State-of-the-Art Performance

Agent S2 with Claude 3.7 + UI-TARS on the OSWorld benchmark

Agent S2 delivers superior computer and smartphone use, as reflected in significant gains on key benchmarks.

For computer use, Agent S2 achieves state-of-the-art results on OSWorld in both the 15-step and the 50-step evaluations (the two most practical settings for real-world deployment). This shows that our agentic framework takes more precise actions and generates better plans for a task, while also being able to self-correct and improve over a long horizon. In particular, Agent S2 reaches 34.5% accuracy in the 50-step evaluation, surpassing the previous SOTA (OpenAI CUA/Operator at 32.6%). This shows how agentic frameworks can scale beyond a single trained model.

For smartphone use, Agent S2 reaches 50% accuracy on AndroidWorld, surpassing the previous SOTA (UI-TARS at 46.8%). This shows that agentic frameworks generalize across different visual user-interface environments.
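For reference, the margins over the previous state of the art quoted in the two paragraphs above work out as follows (absolute gain in percentage points, and relative gain as a share of the previous result):

```latex
\begin{aligned}
\text{OSWorld (50 steps):}\quad & 34.5\% - 32.6\% = 1.9\ \text{pp}, & \tfrac{1.9}{32.6} &\approx 5.8\%\ \text{relative} \\
\text{AndroidWorld:}\quad & 50\% - 46.8\% = 3.2\ \text{pp}, & \tfrac{3.2}{46.8} &\approx 6.8\%\ \text{relative}
\end{aligned}
```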

After publishing this blog post, we obtained stronger results on AndroidWorld while preparing our paper. We have updated this table to reflect the latest performance. See the paper for full details.

Why Modular Frameworks Matter: Inspiration from the Human Brain

The human brain is a remarkable example of modular design, a network of specialized components that work together. Different regions excel at different tasks: the left hemisphere is commonly associated with analytical thinking and the right with creativity, while motor and sensory areas control physical coordination. This modular structure, optimized for collaboration, inspires how we approach the design of AI agents for computer use.

At Simular, we believe the most effective AI agents should follow a similar principle: modular frameworks that seamlessly orchestrate different models instead of relying on a single monolithic system. Our first agentic framework, Agent S, launched on October 11, 2024, embodies this vision. With experience-augmented hierarchical planning at its core, Agent S achieved better overall performance than any model or framework available at the time.

Our latest research further shows that a well-designed modular framework can outperform the best standalone model even when its individual models are suboptimal. Why? Because different models excel in different areas, and each has its own strengths and weaknesses. A robust framework optimizes the orchestration across these modules so that each model contributes where it performs best, leading to superior overall results. In the rapidly evolving landscape of foundation models, modularity is key. Our next-generation agentic framework, Agent S2, achieves significantly better perception, planning, and fine-grained control thanks to its improved modularity and flexibility.

Agent S2: How It Works

Agent S2 is designed to handle complex digital tasks through a modular and scalable approach. Its framework emphasizes four key design principles:

Proactive Hierarchical Planning

Agent S2 follows a natural task hierarchy, combining specialized models for low-level execution with generalist models for high-level planning. Low-level tasks, such as selecting user-interface elements or highlighting text, demand high precision and domain-specific expertise, whereas high-level tasks require broader adaptability and strategic control. Another key advance in Agent S2 is its shift from reactive to proactive planning. Instead of replanning only after errors occur, which would take more steps to backtrack and risk accumulating further errors, Agent S2 dynamically updates its plan after every subtask. This proactive approach improves adaptability to real-time changes, continuity from one subtask to the next, and the optimality of future steps.
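The shift from reactive to proactive planning can be pictured as a control loop. The Python below is a minimal sketch under stated assumptions, not Agent S2's implementation: `plan` and `execute` are hypothetical callables standing in for the high-level planner and the low-level executor. What it shows is that the planner is consulted after every subtask, successful or not, rather than only once an error surfaces.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical interfaces (assumptions for this sketch, not Agent S2's API):
# a planner maps (task, completed subtasks) to the subtasks still to do,
# and an executor attempts one subtask and reports whether it succeeded.
Planner = Callable[[str, list], list]
Executor = Callable[[str], bool]


@dataclass
class RunLog:
    completed: list = field(default_factory=list)
    plan_calls: int = 0


def run_proactive(task: str, plan: Planner, execute: Executor, max_steps: int = 20) -> RunLog:
    """Replan after every subtask instead of only after a failure."""
    log = RunLog()
    for _ in range(max_steps):
        log.plan_calls += 1
        # Refresh the plan with the latest progress before every step.
        remaining = plan(task, log.completed)
        if not remaining:
            break  # the planner reports nothing left to do
        subtask = remaining[0]
        if execute(subtask):
            log.completed.append(subtask)
        # On failure there is no separate error path: the next iteration
        # simply replans from the current state.
    return log
```

With a toy planner that returns whichever steps are not yet done, a subtask that fails once is replanned and retried on the next iteration. A reactive variant would call `plan` only inside a failure branch, which is cheaper per step but notices problems later.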

Visual Grounding for Precise Interaction

Agent S2 enables high-precision interaction with graphical user interfaces (GUIs) through specialized visual grounding models. Unlike its predecessor, which relied on accessibility trees to understand the user interface, Agent S2 works exclusively with raw screenshots as input, eliminating the need for structured accessibility data. By delegating visual understanding to specialized models, Agent S2 can precisely locate and manipulate UI elements such as buttons, text, images, and cells. This enables fine-grained control that was previously constrained by the limitations of accessibility data.
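One practical detail of screenshot-only grounding is turning a model's predicted point into an actual click position. The sketch below assumes, hypothetically, that the grounding model returns (x, y) on a 0 to 1000 grid independent of screen resolution; `to_pixels` is an illustrative helper, not part of Agent S2 or of any specific grounding model's API.

```python
def to_pixels(point, screen_w, screen_h, scale=1000):
    """Map a point on a hypothetical scale x scale grid to screen pixel coordinates.

    Assumption: the grounding model returns (x, y) in [0, scale] regardless of
    the screenshot's resolution. Out-of-range values are clamped to the grid.
    """
    x_norm, y_norm = point
    x_norm = min(max(x_norm, 0), scale)
    y_norm = min(max(y_norm, 0), scale)
    # Map 0 to the first pixel and `scale` to the last, rounding to the nearest pixel.
    x = round(x_norm / scale * (screen_w - 1))
    y = round(y_norm / scale * (screen_h - 1))
    return x, y
```

For example, `to_pixels((500, 500), 1920, 1080)` returns `(960, 540)`, the centre of a 1080p screen. Keeping the model's output resolution-independent means the same prediction works on any display, and only this final mapping depends on the screen.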

Agent-Computer Interface with Expert Modules

Agent S2 improves its Agent-Computer Interface (ACI) by offloading complex low-level tasks, such as text highlighting, to specialized expert modules. This reduces the cognitive load on the foundation models, allowing them to focus solely on high-level planning and strategic decision-making.
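As an illustration of what an expert module can take off the planner's plate, the sketch below turns a request like "highlight 'brave new'" into the two drag points a mouse action needs. It assumes, hypothetically, that an OCR step has already produced word-level bounding boxes in reading order on a single line; `Word` and `highlight_span` are illustrative names, not Agent S2's code, and a real module would also have to handle phrases that wrap across lines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    """One OCR'd word with its bounding box in screen pixels (assumed input)."""
    text: str
    x0: int
    y0: int
    x1: int
    y1: int


def highlight_span(words: list, phrase: str) -> Optional[tuple]:
    """Return (drag_start, drag_end) covering `phrase`, or None if not found.

    Matching is exact on whitespace-separated tokens.
    """
    target = phrase.split()
    if not target:
        return None
    texts = [w.text for w in words]
    n = len(target)
    for i in range(len(words) - n + 1):
        if texts[i:i + n] == target:
            first, last = words[i], words[i + n - 1]
            # Press at the left edge of the first word and release at the right
            # edge of the last word, both at the vertical centre of the box.
            start = (first.x0, (first.y0 + first.y1) // 2)
            end = (last.x1, (last.y0 + last.y1) // 2)
            return start, end
    return None
```

The planner only has to say which text to select; the pixel-level work of finding it and computing the drag is handled in one place.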

Agentic Memory Mechanism

Agent S2 uses a continual learning mechanism that lets it evolve with experience and become more efficient over time. Experience from previously completed tasks is retained, so Agent S2 can recall past actions and refine future strategies based on historical successes and failures. This adaptive learning capability allows Agent S2 to get better with each use, laying a foundation for long-term adaptive intelligence and personalized automation.
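A memory like this can be sketched as a store of past task summaries that is queried by similarity before planning a new task. The sketch below is hypothetical: the `Experience` fields are assumptions, and Jaccard word overlap is used only as a simple stand-in for the learned embedding similarity a production system would more plausibly use. Failures are stored alongside successes so that the planner can also see what to avoid.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    task: str
    summary: str      # short note on what worked or went wrong
    succeeded: bool


def _words(text: str) -> set:
    return set(text.lower().split())


def _jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class ExperienceMemory:
    entries: list = field(default_factory=list)

    def add(self, task: str, summary: str, succeeded: bool) -> None:
        self.entries.append(Experience(task, summary, succeeded))

    def recall(self, task: str, k: int = 1) -> list:
        """Return up to k past experiences whose task text is most similar to `task`."""
        query = _words(task)
        ranked = sorted(self.entries, key=lambda e: _jaccard(query, _words(e.task)), reverse=True)
        return ranked[:k]
```

Recalled entries would then be added to the planner's context for the new task.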

This modular architecture also makes scaling and customization effortless. New modules built on foundation or expert models can easily be integrated, removed, or swapped, so Agent S2 can adapt quickly to new task domains.
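The plug-in property described above can be pictured as a registry keyed by capability. This is a hypothetical sketch, not Agent S2's code: the capability names and the string-in, string-out handler signature are assumptions chosen to keep the example short.

```python
from typing import Callable, Dict, Optional

# Assumed handler signature for this sketch: request text in, result text out.
Handler = Callable[[str], str]


class ModuleRegistry:
    """Map each capability name to the module currently serving it."""

    def __init__(self) -> None:
        self._modules: Dict[str, Handler] = {}

    def register(self, capability: str, handler: Handler) -> Optional[Handler]:
        """Add a module; if one already serves this capability, swap it out and return it."""
        previous = self._modules.get(capability)
        self._modules[capability] = handler
        return previous

    def remove(self, capability: str) -> None:
        self._modules.pop(capability, None)

    def call(self, capability: str, request: str) -> str:
        handler = self._modules.get(capability)
        if handler is None:
            raise KeyError(f"no module registered for {capability!r}")
        return handler(request)
```

Because callers address a capability rather than a specific model, replacing, say, a grounding model changes one `register` call and nothing else.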

Agent S2 in Action

Computer Use

Download an image from Google Drive and compress it with GIMP

Copy Image into Document

Copy an image from GIMP into a LibreOffice Writer document, then export the document

Set Up Web Extension

Set up a web extension

Remove Video Subtitles

Remove subtitles from a video and export the new video

Calculate Profit

Calculate the profit in a LibreOffice Calc sheet

Strike Through Paragraph

Strike through the last paragraph in a LibreOffice Writer document

Agent S2 on Your Smartphone

Fill Out Forms

Task: Go to the new contact screen and enter the following details: First name: Grace, Last name: Taylor, Phone: 799-802-1530, Phone label: Work. Do NOT tap Save.

Organize Files

Task: Move the file holiday_photos.jpg from Podcasts in the sdk_gphone_x86_64 storage area to DCIM within the same sdk_gphone_x86_64 storage area of the Android file system.

What are the Key Features of a Computer Use Agent?

What functionalities does a computer use agent offer beyond basic automation?

A computer use agent goes beyond basic automation. Instead of following a fixed script, it uses AI models to interpret what is on the screen and decide what to do next, so it can handle complex, multi-step jobs. Because it works across the applications users already have, it can automate entire workflows efficiently. Its AI capabilities also let it analyze data, anticipate results, and adjust its strategy as conditions change, which boosts productivity.

How does modularity enhance the capabilities of a computer use agent?

Modularity makes a computer use agent flexible and scalable. Because each module has its own function, users can update, customize, or upgrade individual parts to fit specific needs without affecting the whole system. This adaptability helps agents keep pace with changing technology, supporting growth and efficiency.

Can a computer use agent adapt to different operating systems and software applications?

Adaptability is vital for a computer use agent, because it needs to work well across different operating systems and software. Agents that operate through screenshots and mouse and keyboard input, as Agent S2 does, are less tied to any one platform and can in principle run on Windows, macOS, or Linux, as well as on smartphones. This broad compatibility lets organizations use them across diverse IT systems with fewer integration hurdles.
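Part of that portability comes from acting through the same screen, mouse, and keyboard a person uses, but small platform differences still have to be absorbed somewhere. The sketch below is a hypothetical illustration: `shortcut` and its action table are assumptions, not part of any product. It maps one logical intent to Cmd-based shortcuts on macOS and Ctrl-based ones on Windows and Linux.

```python
import platform

# Hypothetical logical actions and the letter key each one uses.
_ACTION_KEYS = {"copy": "c", "paste": "v", "save": "s", "select_all": "a"}


def shortcut(action: str, system: str) -> list:
    """Return the key combination for a logical action on one OS.

    `system` follows platform.system() naming: "Darwin" (macOS), "Windows", "Linux".
    """
    if action not in _ACTION_KEYS:
        raise ValueError(f"unknown action: {action!r}")
    modifier = "command" if system == "Darwin" else "ctrl"
    return [modifier, _ACTION_KEYS[action]]


# The same intent resolves differently depending on where the agent runs.
print(shortcut("copy", platform.system()))
```

The returned list has the shape a keyboard-automation call such as PyAutoGUI's `hotkey(*keys)` accepts, which uses the key name "command" for the macOS Cmd key.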

How Does a Computer Use Agent Learn and Improve?

What learning mechanisms enable a computer use agent to adapt to user needs?

Computer use agents use machine learning to learn and adjust to user needs. They process large data sets to find patterns and make predictions, which helps them tailor their behavior to each user's habits and preferences. Through feedback loops, they refine their operations and become more accurate over time, which lets them keep up with changing user demands.

How does a computer use agent handle unexpected situations or errors?

To manage unexpected situations or errors, computer use agents rely on robust error handling. AI models help them spot anomalies, such as an unexpected dialog or an action that had no effect, and respond quickly, which keeps small issues from becoming serious problems. Automated troubleshooting lets them recover from errors and keep operations running smoothly even in unforeseen circumstances, strengthening user trust and system reliability.
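A common pattern behind this kind of error handling is to verify each action's effect and retry a bounded number of times before escalating. The sketch below is a generic illustration, not a description of any specific agent: `action`, `verify`, and `fallback` are hypothetical callables (for example, a click, a screenshot check that the expected dialog appeared, and a hand-off back to the planner).

```python
import time
from typing import Callable, Optional


def act_with_retry(
    action: Callable[[], None],
    verify: Callable[[], bool],
    attempts: int = 3,
    delay_s: float = 0.0,
    fallback: Optional[Callable[[], None]] = None,
) -> bool:
    """Perform `action`, confirm it with `verify`, and retry a bounded number of times.

    Returns True as soon as verification passes. If every attempt fails, runs
    `fallback` (if given) and returns False.
    """
    for attempt in range(1, attempts + 1):
        action()
        if verify():
            return True
        if delay_s > 0 and attempt < attempts:
            time.sleep(delay_s)  # give the UI time to settle before retrying
    if fallback is not None:
        fallback()
    return False
```

Bounding the retries matters: an unbounded loop on a click that never takes effect would stall the agent instead of escalating.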

What data privacy measures are implemented in a computer use agent?

Data privacy is crucial for computer use agents. Well-designed agents use strict security protocols to protect sensitive data and comply with regulations. Typical measures include encrypting data in transit and at rest to guard against unauthorized access, along with regular monitoring and updating of security systems to maintain data integrity. With such measures in place, users can be more confident that their information is secure and handled properly.

What are the Practical Applications of Computer Use Agents?

Computer use agents, powered by artificial intelligence and automation, play a significant role in managing complex workflows. These systems do more than automate simple tasks: with their reasoning abilities, they can handle intricate processes. Autonomous AI agents can coordinate multiple tasks, use resources efficiently, and keep workflows running smoothly. Through capabilities such as browser automation and AI workplace assistants, this technology is changing industries and helping businesses operate more efficiently and innovate.

Beyond automating simple tasks, what complex workflows can a computer use agent manage?

Computer use agents are deployed to handle complex workflows that need coordination and decision-making. These agents use artificial intelligence and automation to improve operations. Their advanced reasoning lets them assess situations, predict outcomes, and make decisions. This is useful in fields like logistics, finance, and healthcare that require quick adaptation to changing conditions.

How can a computer use agent improve productivity in specific professional contexts?

Using computer use agents in daily operations can substantially increase productivity and efficiency. In areas like project management or customer service, these agents automate routine tasks, freeing employees to focus on more strategic work. Streamlining workflows this way also leaves more room for innovation and can give organizations a competitive edge.

What emerging technologies are integrated with advanced computer use agents?

The development of computer use agents is closely tied to advances in emerging technologies. Machine learning and foundation models, such as OpenAI's GPT models and Anthropic's Claude, are crucial for these agents. These technologies allow agents to learn from data, adapt to new information, and improve over time. Continually integrating newer models helps keep computer use agents effective across different domains.

Simular AI excels in this field by providing advanced solutions that use these technological developments. By keeping up with emerging technologies, Simular AI ensures its computer use agents are optimized for varied applications, offering great value to clients.

What are the Potential Limitations of Computer Use Agents?

What are the challenges in building truly reliable and trustworthy computer use agents?

Building reliable and trustworthy computer use agents involves several challenges. Integrating AI into these systems can create unexpected issues. This makes solid development protocols essential. It's important to ensure that automation aligns with human values and safety standards. To build trustworthy AI, we need to address biases, improve transparency, and perform thorough tests. Ongoing development and learning from real-world applications help enhance reliability.

How can the risk of errors and malfunctions be minimized in computer use agents?

Minimizing errors and malfunctions in computer use agents requires careful engineering practices. Agents should check the outcome of each action, for example by inspecting a fresh screenshot, so that errors are detected and corrected early. Rigorous testing and simulation before deployment can help identify potential risks, and continuous monitoring after deployment allows for quick fixes. Effective risk management also includes fallback systems and regular software updates that patch vulnerabilities and improve operational stability.

What are the ethical considerations regarding the development and deployment of computer use agents?

When developing and deploying computer use agents, ethics are a major consideration. Protecting user data privacy and security is crucial. Developers must promote responsible AI use by setting transparent guidelines to prevent misuse or bias. Ethical considerations include assessing AI's impact on society and addressing job displacement concerns. Continuous dialogue among stakeholders is important for responsible development that aligns with societal values and legal standards.

How Can I Get Started with a Computer Use Agent?

What are the available options for accessing and utilizing computer use agent technology?

You have several options for accessing and using a computer use agent. They include open-source frameworks such as Agent S2, which can be customized, and commercial products from companies like Microsoft and Google. Many of these options provide APIs for integration into your existing systems. The right choice depends on your needs and how well it fits your current technology setup.

What factors should be considered when choosing a computer use agent solution?

When picking a computer use agent, consider the following:

Functionality: Make sure the agent fulfills your operational needs.
Integration: Verify compatibility with your existing systems and services, such as the OpenAI API.
Cost: Look at both initial and ongoing costs.
Support and Security: Check the availability of support and the strength of security features.
User Interface: Ensure it's easy to use and intuitive for users.
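If you are comparing several candidates, one simple way to apply these five criteria is a weighted score. The weights, the 1 to 5 rating scale, and the two candidates below are illustrative assumptions, not a recommendation; adjust them to your organization's priorities.

```python
# Illustrative weights only; they sum to 1.0 here but need not.
WEIGHTS = {
    "functionality": 0.30,
    "integration": 0.25,
    "cost": 0.15,
    "support_security": 0.20,
    "user_interface": 0.10,
}


def weighted_score(ratings: dict, weights: dict = WEIGHTS) -> float:
    """Combine per-criterion ratings (e.g. 1 to 5) into a weighted average."""
    missing = weights.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    total = sum(weights.values())
    return sum(weights[c] * ratings[c] for c in weights) / total


# Two hypothetical candidates rated on the five criteria above.
candidate_a = {"functionality": 5, "integration": 4, "cost": 2,
               "support_security": 4, "user_interface": 3}
candidate_b = {"functionality": 3, "integration": 3, "cost": 5,
               "support_security": 3, "user_interface": 5}
print(round(weighted_score(candidate_a), 2), round(weighted_score(candidate_b), 2))
```

Here candidate A scores about 3.9 and candidate B about 3.5: B's lower cost and friendlier interface do not outweigh A's stronger functionality under these particular weights.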

By assessing these points, you can find a solution that meets your organization's goals and technical needs.

Where can I find resources and support for learning more about and using computer use agents?

To learn more about using computer use agents, consider these resources:

Tutorials and Documentation: You can find guides on GitHub and the official websites of platforms like Microsoft Azure and OpenAI.
Community Forums: Join forums to gain insights and practical knowledge from other users.
Training Programs: Participate in training sessions by providers or external educators for hands-on experience.
Learning Resources: Many online platforms offer courses and materials focusing on different aspects of computer use agents.

These resources will help you fully utilize your chosen computer use agent technology.

What are the benefits of using an autonomous agent for computer use?

Autonomous agents can automate and optimize computer use, improving efficiency. They help manage complex tasks, reduce human errors, and increase productivity by adapting to specific user needs.

How can a computer agent improve cyberattack management?

A computer agent analyzes suspicious behavior in real time to reinforce security against cyberattacks. It monitors networks, detects anomalies, and initiates quick responses to minimize risks.
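Anomaly detection can be as simple as comparing each new measurement with a recent baseline. The rolling z-score below is one generic, illustrative technique, not a description of how any particular agent or Simular product works; the window size, the threshold, and the idea of counting events such as failed logins per minute are all assumptions.

```python
from statistics import mean, stdev


def find_anomalies(counts: list, window: int = 5, threshold: float = 3.0) -> list:
    """Return indices whose value deviates sharply from the trailing window.

    Each point is compared with the mean and sample standard deviation of the
    `window` points before it (a rolling z-score).
    """
    if window < 2:
        raise ValueError("window must contain at least two points")
    anomalies = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline: treat any change at all as anomalous.
            if counts[i] != mu:
                anomalies.append(i)
        elif abs(counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical failed-login counts per minute, with one spike.
print(find_anomalies([10, 12, 11, 9, 10, 11, 50, 10]))  # → [6]
```

Using only the trailing window keeps the check cheap enough to run on every new data point. One trade-off: a spike inflates the baseline for the next few points, which makes the detector less sensitive immediately afterwards.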

What is the role of artificial intelligence in computer foundation models?

Foundation models are themselves a form of artificial intelligence: large models trained on broad data that can be adapted to many tasks. Their natural language understanding helps agents interpret instructions and process and analyze complex data, which supports the development of innovative, user-friendly solutions.

Why is it important to consider user interfaces when implementing computer agents?

User interfaces ensure smooth interaction between computer agents and users. They make functionalities accessible and understandable, thereby improving user experience and operational efficiency.

How do navigation agents affect computer software usage?

Navigation agents simplify software interaction by guiding users through complex processes. They facilitate task customization and automation, optimizing software use.

What role do usage agents play in improving organizational workflows?

Usage agents automate redundant tasks and integrate various tools to create efficient workflows. They enable smooth operation management, reduce response times, and boost productivity in organizations.

How can computer agent usage strategies adapt to current work environments?

Computer agent strategies adapt by integrating emerging technologies like federated learning and multimodal models. These approaches make systems more flexible and responsive to changing work environment dynamics.

What are advanced language models, and how do they influence interaction with computer agents?

Advanced language models, such as large language models (LLMs), enhance computer agents' ability to understand and generate human language. This leads to more natural and effective interactions and a better overall user experience.

Key Insights

We offer advanced solutions for computer use agents in regions such as California, Canada, Florida, New York, Texas, the United Kingdom, and Washington.
Our platform uses state-of-the-art computer use software to help people get more out of their computers, positioning it as a leading intelligent system.
Experience seamless interaction with autonomous agents that navigate graphical user interfaces, offering a new way to operate software.
Work alongside industry leaders like Bill Gates and Sam Altman on groundbreaking projects in AI technology and sandbox environments.
Engage with foundation models and large language models (LLMs) for top performance, backed by notable researchers like Ilya Sutskever.
Gain insights into AI with resources from IEEE, MIT Technology Review, and publications on trustworthy AI practices.
Prioritize secure computing environments to shield against cyber attacks, using tools and protocols for strong data integrity.
Innovate with AI-driven solutions through platforms like OpenAI's ChatGPT and DALL-E, supported by pioneers such as Kai-Fu Lee.
Access our suite of tools, including apps, plugins, and AI technologies that transform user interaction.
Improve efficiency with AIOps for incident management and federated learning, optimizing organizational strategies and workflows.
Enhance emotional well-being and decision-making through effective AI applications in business scenarios.
Use advanced reasoning and adaptive systems for solving complex problems and strategic initiatives.
Stay updated with the latest developments in AI, focusing on practical applications and emerging technologies.
Apply machine learning across sectors, ensuring compliance and ethical standards in deployment.
Explore uses of GPT and AGI in real-world situations through demonstrations, evaluations, and scholarly citations.
Navigate the AI landscape with expertise, using resources like stargate datasets, deepfake prevention, and open-source contributions.
Integrate AI seamlessly into existing agentic frameworks with reference implementations and best practice guidelines.
Encourage AI innovation in computing, fostering community engagement and collaborative growth in tech.
Enhance potential with AI-powered solutions for data analysis, automation, and process optimization across industries.
Connect with professionals on platforms like LinkedIn to explore trends, job opportunities, and advancements in AI.
Optimize your digital experience with tools designed for efficient browsing, communication, and information management.
Maintain robust cybersecurity measures and safe data practices in all AI deployments and integrations.
