{{Short description|Software engineering paradigm}}
{{Multiple issues|
{{essay-like|date=December 2008}}
{{No footnotes|date=April 2009}}
}}
'''Agent-oriented software engineering''' ('''AOSE''') is a software engineering [[paradigm]] that arose to apply best practice to the development of complex [[Multi-agent systems|multi-agent systems]] (MAS) by focusing on agents, and on organizations (communities) of agents, as the main abstractions. The field of [[Product Family Engineering|software product lines]] (SPL) covers the software development lifecycle needed to build a family of products in which concrete products are derived systematically and rapidly.
== Commentary ==
With the advent of biologically inspired, pervasive, and [[autonomic computing]], the advantages and the necessity of agent-based technologies and MASs have become apparent{{Citation needed|date=December 2008}}. Current AOSE methodologies, however, are dedicated to developing a single MAS at a time, even though many MASs use substantially the same techniques, adaptations, and approaches. The field is thus ripe for exploiting the benefits of SPLs, namely reduced costs and improved time to market, and for making agent technology more industrially applicable.

Multi-agent systems product lines (MAS-PL) is a research field devoted to combining the two approaches, applying the SPL philosophy to building MASs. This affords the advantages of SPLs and makes MAS development more practical, as the sketch below illustrates.
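The following minimal sketch (all feature and agent names are hypothetical, not drawn from any published MAS-PL methodology) shows how concrete MAS products could be derived systematically from a shared core plus optional features:

<syntaxhighlight lang="python">
# Minimal MAS-PL sketch: a core MAS architecture plus optional features,
# from which concrete "products" are derived by feature selection.
# All names here are hypothetical illustrations, not a published methodology.

CORE_AGENTS = ["coordinator", "monitor"]        # shared by every product

OPTIONAL_FEATURES = {                           # feature -> agents it adds
    "negotiation": ["negotiator"],
    "self_healing": ["diagnoser", "repairer"],
    "logistics": ["planner", "dispatcher"],
}

def derive_product(selected_features):
    """Derive one concrete MAS product from the product line."""
    agents = list(CORE_AGENTS)
    for feature in selected_features:
        agents += OPTIONAL_FEATURES[feature]    # reuse, not re-implementation
    return agents

# Two products derived systematically from the same core:
print(derive_product(["negotiation"]))                 # e-commerce MAS
print(derive_product(["self_healing", "logistics"]))   # autonomic logistics MAS
</syntaxhighlight>

The point of the sketch is that each product reuses the core and the feature implementations, so deriving a new MAS reduces to selecting a valid feature combination rather than rebuilding the system.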
== Benchmarks ==
Several benchmarks have been developed to evaluate the capabilities of AI coding agents and large language models on software engineering tasks. Notable examples include:
{|class="wikitable"
|+ Agentic software engineering benchmarks
! Benchmark !! Description
|-
| [https://www.swebench.com/ SWE-bench]
| Assesses the ability of AI models to resolve real-world software engineering issues sourced from GitHub repositories (see the sketch below this table). The benchmark involves:
* Providing agents with a code repository and issue description
* Challenging them to generate a patch that resolves the described problem
* Evaluating the generated patch against unit tests
|-
| [https://github.com/snap-stanford/MLAgentBench MLAgentBench]
| Evaluates AI agents on end-to-end machine learning experimentation tasks, such as improving a model's performance on a given dataset
|-
| [https://github.com/sierra-research/tau-bench τ-Bench]
| Developed by Sierra AI to evaluate AI agent performance and reliability in real-world settings. It focuses on:
* Testing agents on complex tasks with dynamic user and tool interactions
* Assessing the ability to follow ___domain-specific policies
* Measuring consistency and reliability at scale
|-
| [https://github.com/web-arena-x/webarena WebArena]
| Evaluates AI agents in a simulated web environment. The benchmark tasks include:
* Navigating complex websites to complete user-driven tasks
* Extracting relevant information from the web
* Testing the adaptability of agents to diverse web-based challenges
|-
| [https://github.com/THUDM/AgentBench AgentBench]
| A benchmark designed to assess the capabilities of LLM-based agents across diverse interactive environments. The key areas of evaluation include:
* Reasoning and decision-making in operating-system, database, and knowledge-graph environments
* Performance in game-like and simulated household environments
* Web browsing and web shopping tasks
|-
| [https://github.com/aryopg/mmlu-redux MMLU-Redux]
| A manually re-annotated subset of the MMLU benchmark that identifies and corrects errors in the original dataset, spanning a broad range of academic subjects. It measures:
* Subject matter expertise across multiple disciplines
* Ability to handle complex problem-solving tasks
* Consistency in providing accurate answers across topics
|-
| [https://github.com/MCEVAL/McEval McEval]
| A massively multilingual coding benchmark designed to test AI models' ability to solve programming challenges. The benchmark evaluates:
* Code correctness and efficiency
* Coverage of roughly 40 programming languages
* Performance across tasks such as code generation, completion, and explanation
|-
| [https://csbench.github.io/ CS-Bench]
| A specialized benchmark for evaluating AI performance in computer science-related tasks. The key focus areas include:
* Algorithms and data structures
* Computational complexity and optimization
* Theoretical and applied computer science concepts
|-
| [https://github.com/allenai/WildBench WildBench]
| Evaluates AI models on challenging tasks drawn from real-world user queries. It emphasizes:
* Handling noisy, open-ended, and unstructured requests
* Reflecting the diversity of tasks users pose in practice
* Automated, checklist-based scoring of responses
|-
| [https://huggingface.co/datasets/baharef/ToT Test of Time]
| A benchmark that evaluates AI models' ability to reason about temporal sequences and events over time. It assesses:
* Understanding of temporal logic and sequence prediction
* Ability to make decisions based on time-dependent data
* Performance on tasks combining temporal semantics and arithmetic
|}
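The resolve-and-test loop that SWE-bench describes can be sketched as follows. This is an illustrative outline only, assuming a hypothetical <code>run_agent</code> callable and a local checkout of the target repository; it is not SWE-bench's actual harness:

<syntaxhighlight lang="python">
import subprocess

def evaluate_instance(repo_dir, issue_text, run_agent, test_cmd):
    """Rough sketch of a SWE-bench-style evaluation step: give the agent a
    repository and an issue description, apply the patch it proposes, then
    run the repository's unit tests. `run_agent` is a hypothetical callable
    standing in for any agent system under evaluation."""
    patch = run_agent(repo_dir, issue_text)   # agent returns a unified diff

    # Apply the proposed patch to the checked-out repository.
    applied = subprocess.run(
        ["git", "apply", "-"],
        cwd=repo_dir, input=patch, text=True, capture_output=True,
    )
    if applied.returncode != 0:
        return "patch failed to apply"

    # A resolved instance must pass the designated unit tests.
    tests = subprocess.run(test_cmd, cwd=repo_dir, capture_output=True)
    return "resolved" if tests.returncode == 0 else "tests failed"
</syntaxhighlight>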
== Software engineering agent systems ==
Several software engineering (SWE) agent systems are in development. The table below lists examples together with the backend large language model (LLM) each uses; a sketch of the common scaffold-plus-backend pattern follows the table.
{| class="wikitable"
|+ List of SWE agent systems
! SWE agent system !! Backend LLM
|-
| [https://salesforce-research-dei-agents.github.io/ Salesforce Research DEIBASE-1] || GPT-4o
|-
| [https://cosine.sh/ Cosine Genie] || Fine-tuned OpenAI GPT
|-
| [https://aide.dev/ CodeStory Aide] || GPT-4o + Claude 3.5 Sonnet
|-
| [https://mentat.ai/blog/mentatbot-sota-coding-agent AbanteAI MentatBot] || GPT-4o
|-
| Salesforce Research DEIBASE-2 || GPT-4o
|-
| Salesforce Research DEI-Open || GPT-4o
|-
| [https://www.marscode.com/ Bytedance MarsCode] || GPT-4o
|-
| [https://arxiv.org/abs/2406.01422 Alibaba Lingma] || gpt-4-1106-preview
|-
| [https://www.factory.ai/ Factory Code Droid] || Anthropic + OpenAI
|-
| [https://autocoderover.dev/ AutoCodeRover] || GPT-4o
|-
| [https://aws.amazon.com/q/developer/ Amazon Q Developer] || (unknown)
|-
| [https://github.com/NL2Code/CodeR CodeR] || gpt-4-1106-preview
|-
| [https://github.com/masai-dev-agent/masai MASAI] || (unknown)
|-
| [https://github.com/swe-bench/experiments/tree/main/evaluation/lite/20240706_sima_gpt4o SIMA] || GPT-4o
|-
| [https://github.com/OpenAutoCoder/Agentless Agentless] || GPT-4o
|-
| [https://github.com/aorwall/moatless-tools Moatless Tools] || Claude 3.5 Sonnet
|-
| [https://github.com/swe-bench/experiments/tree/main/evaluation/lite/20240612_IBM_Research_Agent101 IBM Research Agent] || (unknown)
|-
| [https://github.com/paul-gauthier/aider Aider] || GPT-4o + Claude 3 Opus
|-
| [https://docs.all-hands.dev/ OpenDevin + CodeAct] || GPT-4o
|-
| [https://github.com/FSoft-AI4Code/AgileCoder AgileCoder] || (various)
|-
| [https://chatdev.ai/ ChatDev] || (unknown)
|-
| [https://github.com/geekan/MetaGPT MetaGPT] || GPT-4o
|}
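As the two columns above suggest, these systems generally pair an agent scaffold with a backend LLM. The following minimal sketch illustrates that split; the <code>complete</code> callable and the prompt are hypothetical placeholders, not any listed system's actual implementation:

<syntaxhighlight lang="python">
def propose_patch(complete, issue_text, file_contents):
    """Skeleton of the scaffold/backend-LLM split seen in the table above.
    `complete` is a hypothetical stand-in for any backend LLM's
    text-completion call; the scaffold only builds the prompt and
    extracts a diff from the reply."""
    prompt = (
        "You are a software engineering agent.\n"
        f"Issue:\n{issue_text}\n\n"
        f"Relevant file:\n{file_contents}\n\n"
        "Reply with a unified diff that fixes the issue."
    )
    reply = complete(prompt)            # the backend LLM does the reasoning
    # Scaffold-side post-processing: keep only the diff portion of the reply.
    start = reply.find("--- ")
    return reply[start:] if start != -1 else reply
</syntaxhighlight>

In this framing, swapping the backend LLM (the second column of the table) leaves the scaffold unchanged, which is why many of the listed systems report results with several different models.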
== External links ==
* ''Agent-Oriented Software Engineering: Reflections on Architectures, Methodologies, Languages, and Frameworks'' {{ISBN|978-3642544316}}
== References ==
* Michael Winikoff and Lin Padgham. ''Agent Oriented Software Engineering''. Chapter 15 (pages 695–757) in G. Weiss (ed.), [http://mitpress.mit.edu/multiagentsystems Multiagent Systems], 2nd edition, MIT Press. {{ISBN|978-0-262-01889-0}} (a survey of the field)
* Site of the MaCMAS methodology, which applies MAS-PL: https://web.archive.org/web/20100922120209/http://james.eii.us.es/MaCMAS/index.php/Main_Page
* MAS Product Lines site: https://web.archive.org/web/20140518122645/http://mas-productlines.org/
* {{cite journal | last1 = Peña | first1 = Joaquin | last2 = Hinchey | first2 = Michael G. | last3 = Resinas | first3 = Manuel | last4 = Sterritt | first4 = Roy | last5 = Rash | first5 = James L. | title = Designing and Managing Evolving Systems using a MAS-Product-Line Approach | doi = 10.1016/j.scico.2006.10.007 | journal = Science of Computer Programming | year = 2007 | volume = 66 | pages = 71–86 | url = https://pure.ulster.ac.uk/en/publications/0a91f377-9421-4585-957b-77060a458644 | doi-access = free }}
* Joaquin Peña, Michael G. Hinchey, Antonio Ruiz-Cortés, and Pablo Trinidad. Building the Core Architecture of a NASA Multiagent System Product Line. In 7th International Workshop on Agent-Oriented Software Engineering (AOSE 2006), Hakodate, Japan, May 2006. LNCS. https://doi.org/10.1007/978-3-540-70945-9_13
* Joaquin Peña, Michael G. Hinchey, Manuel Resinas, Roy Sterritt, James L. Rash. Managing the Evolution of an Enterprise Architecture using a MAS-Product-Line Approach. 5th Int. Workshop on System/Software Architectures (IWSSA’06). Nevada, USA. 2006
* Soe-Tsyr Yuan. MAS Building Environments with Product-Line-Architecture Awareness.
* [https://web.archive.org/web/20070517214904/http://www.cs.iastate.edu/~dehlinge/publications.html Josh Dehlinger] (publications page)
* [https://web.archive.org/web/20091231195122/http://james.eii.us.es/MaCMAS/images/6/69/Current-Research-MAS-PL-TF4-Lisbon.pdf Current research on MAS-PL] (presentation, Lisbon)
[[Category:Software project management]]
{{software-eng-stub}}