{{Short description|Software}}
{{Multiple issues|
{{essay-like|date=December 2008}}
}}
'''Agent-oriented software engineering''' is the application of agent abstractions, such as autonomous agents and multi-agent systems, to the analysis, design, and implementation of software.
==Commentary==
Multiagent Systems Product Lines (MAS-PL) is a research field devoted to combining software product lines (SPLs) and multi-agent systems (MAS): applying the SPL philosophy of planned reuse and variability to the construction of a MAS. The aim is to carry the advantages of SPLs over to MAS development and thereby make it more practical; the sketch below illustrates the idea.
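In code, the MAS-PL idea amounts to deriving a concrete multi-agent system from a product line by selecting features and instantiating the agents those features contribute. The following is a minimal illustrative sketch; all names (<code>Feature</code>, <code>AgentSpec</code>, <code>derive_mas</code>) are hypothetical and do not come from any published MAS-PL framework.
<syntaxhighlight lang="python">
from dataclasses import dataclass, field

@dataclass
class Feature:
    """One selectable feature of the product line."""
    name: str
    requires: set = field(default_factory=set)   # names of prerequisite features
    agents: list = field(default_factory=list)   # agent roles this feature adds

@dataclass
class AgentSpec:
    role: str
    provided_by: str  # the feature that contributed this agent

def derive_mas(features: dict, selection: set) -> list:
    """Validate a feature selection against the feature model, then
    collect the agent roles contributed by every selected feature."""
    for name in selection:
        missing = features[name].requires - selection
        if missing:
            raise ValueError(f"{name} requires unselected features: {missing}")
    return [AgentSpec(role=role, provided_by=name)
            for name in sorted(selection)
            for role in features[name].agents]

# Example product line: a monitoring MAS with an optional alerting feature.
line = {
    "monitoring": Feature("monitoring", agents=["sensor-agent", "aggregator-agent"]),
    "alerting": Feature("alerting", requires={"monitoring"}, agents=["alert-agent"]),
}
print(derive_mas(line, {"monitoring", "alerting"}))
</syntaxhighlight>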
==Benchmarks==
Several benchmarks have been developed to evaluate the capabilities of AI coding agents and large language models on software engineering tasks. Key benchmarks include:
{|class="wikitable"
|+ Agentic software engineering benchmarks
! Benchmark !! Description
|-
| [https://www.swebench.com/ SWE-bench]
| Assesses the ability of AI models to resolve real-world software engineering issues sourced from GitHub repositories (a minimal sketch of this protocol appears after the table). The benchmark involves:
* Providing agents with a code repository and issue description
* Challenging them to generate a patch that resolves the described problem
* Evaluating the generated patch against unit tests
|-
| [https://github.com/snap-stanford/MLAgentBench MLAgentBench]
| Designed to evaluate AI agents on machine learning experimentation tasks, such as improving model performance on a given dataset
|-
| [https://github.com/sierra-research/tau-bench τ-Bench]
| Developed by Sierra to evaluate AI agent performance and reliability in real-world settings. It focuses on:
* Testing agents on complex tasks with dynamic user and tool interactions
* Assessing the ability to follow ___domain-specific policies
* Measuring consistency and reliability at scale
|-
| [https://github.com/web-arena-x/webarena WebArena]
| Evaluates AI agents in a simulated web environment. The benchmark tasks include:
* Navigating complex websites to complete user-driven tasks
* Extracting relevant information from the web
* Testing the adaptability of agents to diverse web-based challenges
|-
| [https://github.com/THUDM/AgentBench AgentBench]
| A benchmark designed to assess large language models acting as agents in a range of interactive environments. The key areas of evaluation include:
* Operating-system, database, and knowledge-graph interaction
* Web browsing and web shopping tasks
* Game-style environments requiring planning and reasoning
|-
| [https://github.com/aryopg/mmlu-redux MMLU-Redux]
| A manually re-annotated subset of the MMLU benchmark in which mislabelled and erroneous questions are identified and corrected. It measures:
* Subject matter expertise across multiple disciplines
* Ability to handle complex problem-solving tasks
* Consistency in providing accurate answers across topics
|-
| [https://github.com/MCEVAL/McEval McEval]
| A multilingual coding benchmark designed to test AI models' ability to solve coding challenges across dozens of programming languages. The benchmark evaluates:
* Code correctness and efficiency
* Ability to handle diverse programming languages
* Performance across different coding paradigms and tasks
|-
| [https://csbench.github.io/ CS-Bench]
| A specialized benchmark for evaluating AI performance in computer science-related tasks. The key focus areas include:
* Algorithms and data structures
* Computational complexity and optimization
* Theoretical and applied computer science concepts
|-
| [https://github.com/allenai/WildBench WildBench]
| Tests AI models on challenging tasks drawn from real user queries collected in the wild. It emphasizes:
* Handling noisy, open-ended requests rather than templated prompts
* Covering the diverse task mix of real usage
* Grading responses against fine-grained, task-specific checklists
|-
| [https://huggingface.co/datasets/baharef/ToT Test of Time]
| A benchmark that focuses on evaluating AI models' ability to reason about temporal sequences and events over time. It assesses:
* Understanding of temporal logic and sequence prediction
* Ability to make decisions based on time-dependent data
* Performance in tasks requiring long-term planning and foresight
|}
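The SWE-bench protocol above reduces to a short loop: check out the repository at the issue's base commit, ask the agent for a patch, apply it, and run the project's tests. The following is a minimal sketch of that loop; the <code>propose_patch</code> callable and the pytest test command are assumptions for illustration, and the real harness additionally pins per-repository dependencies and the set of tests that must pass.
<syntaxhighlight lang="python">
import subprocess

def run(cmd: list, cwd: str) -> bool:
    """Run a command in the given directory; True if it exits cleanly."""
    return subprocess.run(cmd, cwd=cwd, capture_output=True).returncode == 0

def evaluate(repo_dir: str, base_commit: str, issue_text: str, propose_patch) -> bool:
    # 1. Reset the repository to the commit the issue was filed against.
    run(["git", "checkout", "-f", base_commit], cwd=repo_dir)
    # 2. Ask the agent (hypothetical callable) for a unified diff.
    patch = propose_patch(repo_dir, issue_text)
    with open(f"{repo_dir}/agent.patch", "w") as f:
        f.write(patch)
    # 3. A patch that does not apply counts as a failure.
    if not run(["git", "apply", "agent.patch"], cwd=repo_dir):
        return False
    # 4. The issue counts as resolved only if the repository's tests pass
    #    (assumed here to be runnable with pytest).
    return run(["python", "-m", "pytest", "-q"], cwd=repo_dir)
</syntaxhighlight>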
== Software engineering agent systems ==
Several software engineering (SWE) agent systems are under development; examples include:
{| class="wikitable"
|+ List of SWE Agent Systems
! SWE Agent System !! Backend LLM
|-
| [https://salesforce-research-dei-agents.github.io/ Salesforce Research DEIBASE-1] || GPT-4o
|-
| [https://cosine.sh/ Cosine Genie] || Fine-tuned OpenAI GPT
|-
| [https://aide.dev/ CodeStory Aide] || GPT-4o + Claude 3.5 Sonnet
|-
| [https://mentat.ai/blog/mentatbot-sota-coding-agent AbanteAI MentatBot] || GPT-4o
|-
| Salesforce Research DEIBASE-2 || GPT-4o
|-
| Salesforce Research DEI-Open || GPT-4o
|-
| [https://www.marscode.com/ ByteDance MarsCode] || GPT-4o
|-
| [https://arxiv.org/abs/2406.01422 Alibaba Lingma] || gpt-4-1106-preview
|-
| [https://www.factory.ai/ Factory Code Droid] || Anthropic + OpenAI models
|-
| [https://autocoderover.dev/ AutoCodeRover] || GPT-4o
|-
| [https://aws.amazon.com/q/developer/ Amazon Q Developer] || (unknown)
|-
| [https://github.com/NL2Code/CodeR CodeR] || gpt-4-1106-preview
|-
| [https://github.com/masai-dev-agent/masai MASAI] || (unknown)
|-
| [https://github.com/swe-bench/experiments/tree/main/evaluation/lite/20240706_sima_gpt4o SIMA] || GPT-4o
|-
| [https://github.com/OpenAutoCoder/Agentless Agentless] || GPT-4o
|-
| [https://github.com/aorwall/moatless-tools Moatless Tools] || Claude 3.5 Sonnet
|-
| [https://github.com/swe-bench/experiments/tree/main/evaluation/lite/20240612_IBM_Research_Agent101 IBM Research Agent] || (unknown)
|-
| [https://github.com/paul-gauthier/aider Aider] || GPT-4o + Claude 3 Opus
|-
| [https://docs.all-hands.dev/ OpenDevin + CodeAct] || GPT-4o
|-
| [https://github.com/FSoft-AI4Code/AgileCoder AgileCoder] || (various)
|-
| [https://chatdev.ai/ ChatDev] || (unknown)
|-
| [https://github.com/geekan/MetaGPT MetaGPT] || GPT-4o
|}
== External links ==
* ''Agent-Oriented Software Engineering: Reflections on Architectures, Methodologies, Languages, and Frameworks'' {{ISBN|978-3642544316}}
== References ==
* Michael Winikoff and Lin Padgham. ''Agent Oriented Software Engineering''. Chapter 15 (pages 695–757) in G. Weiss (ed.), [http://mitpress.mit.edu/multiagentsystems Multiagent Systems], 2nd edition. MIT Press. {{ISBN|978-0-262-01889-0}} (a survey of the field)
* Site of the MaCMAS methodology, which applies MAS-PL: https://web.archive.org/web/20100922120209/http://james.eii.us.es/MaCMAS/index.php/Main_Page
* MAS Product Lines site: https://web.archive.org/web/20140518122645/http://mas-productlines.org/
* {{cite journal | last1 = Peña | first1 = Joaquin | last2 = Hinchey | first2 = Michael G. | last3 = Resinas | first3 = Manuel | last4 = Sterritt | first4 = Roy | last5 = Rash | first5 = James L. | title = Designing and Managing Evolving Systems using a MAS-Product-Line Approach | doi = 10.1016/j.scico.2006.10.007 | journal = Journal of Science of Computer Programming | year = 2007 | volume = 66| pages = 71–86| url = https://pure.ulster.ac.uk/en/publications/0a91f377-9421-4585-957b-77060a458644 | doi-access = free }}
* Joaquin Peña, Michael G. Hinchey, Antonio Ruiz-Cortés, and Pablo Trinidad. Building the Core Architecture of a NASA Multiagent System Product Line. In 7th International Workshop on Agent Oriented Software Engineering (AOSE 2006), Hakodate, Japan, May 2006. LNCS. https://doi.org/10.1007/978-3-540-70945-9_13
* Joaquin Peña, Michael G. Hinchey, Manuel Resinas, Roy Sterritt, James L. Rash. Managing the Evolution of an Enterprise Architecture using a MAS-Product-Line Approach. 5th Int. Workshop on System/Software Architectures (IWSSA’06). Nevada, USA. 2006
* Soe-Tsyr Yuan. MAS Building Environments with Product-Line-Architecture Awareness.
* [https://web.archive.org/web/20070517214904/http://www.cs.iastate.edu/~dehlinge/publications.html Josh Dehlinger's publications page]
* [https://web.archive.org/web/20091231195122/http://james.eii.us.es/MaCMAS/images/6/69/Current-Research-MAS-PL-TF4-Lisbon.pdf MAS-PL -- Current research]. In [http://www.irit.fr/ACTIVITES/EQ_SMI/SMAC/TFG4_CFP.html THE FOURTH TECHNICAL FORUM (TF4) of AgentLink]. December 2006.
[[Category:Software project management]]
{{software-eng-stub}}