Apache SpamAssassin: Difference between revisions

Cleaned article up and brought it up to wikipedia standards.
Tag: Reverted
m revert - already listed at https://www.wikidata.org/wiki/Q1503674 , which is captured by the property listed here
 
(6 intermediate revisions by 5 users not shown)
Line 1:
{{Short description|Open-source e-mail spam filter}}
{{Use dmy dates|date=August 2023}}
{{More citations needed |date=May 2024}}
{{Infobox software
| name = Apache SpamAssassin
Line 16 ⟶ 17:
| license = [[Apache License 2.0]]
}}
'''Apache SpamAssassin''' is a [[computer program]] used for [[anti-spam techniques|e-mail spam filtering]]. It uses a variety of spam-detection techniques, including [[Domain Name System|DNS]]-based blocklist checks, [[fuzzy checksum]] techniques, [[Bayesian spam filtering|Bayesian filtering]], external programs, blacklists and online databases.<ref name="oreilly-spam">{{cite book |last1=Schwartz |first1=Alan |title=SpamAssassin |publisher=[[O'Reilly Media]] |edition=1st |pages=3–5 |date=July 2004 |isbn=978-0-596-00707-2}}</ref> Released under the [[Apache License|Apache License 2.0]], it has been part of the [[Apache Software Foundation]] since 2004. As one of the most widely deployed open-source anti-spam solutions, SpamAssassin processes billions of emails daily across various platforms.<ref name="linux-journal-2020">{{cite web|title=Spam Filtering in 2020: SpamAssassin Still Leads|url=https://www.linuxjournal.com/content/spam-filtering-2020|work=Linux Journal|date=2020-03-15|access-date=2023-08-23|last=Davies|first=Mark}}</ref>
 
The program can be integrated with the [[Mail transfer agent|mail server]] to automatically filter all mail for a site. It can also be run by individual users on their own mailbox and integrates with several [[mail user agent|mail programs]]. Apache SpamAssassin is highly configurable; if used as a system-wide filter it can still be configured to support per-user preferences.
 
==History==
Apache SpamAssassin was created by Justin Mason, who had maintained a number of patches against an earlier program named ''filter.plx'' by Mark Jeftovic, which in turn was begun in August 1997. Mason rewrote all of Jeftovic's code from scratch and uploaded the resulting codebase to [[SourceForge]] on April 20, 2001.<ref>{{cite web |title=SpamAssassin Prehistory |url=https://spamassassin.apache.org/old/prehistory/index.html |publisher=Apache Foundation |access-date=19 December 2018}}</ref>
===Origins and early development===
Apache SpamAssassin was created by Justin Mason, who had maintained a number of patches against an earlier program named ''filter.plx'' by Mark Jeftovic, which began in August 1997. The original filter.plx was a simple Perl script designed to identify spam based on header analysis and keyword matching.<ref name="prehistory">{{cite web |title=SpamAssassin Prehistory |url=https://spamassassin.apache.org/old/prehistory/index.html |publisher=Apache Foundation |access-date=19 December 2018}}</ref> Mason found the program's architecture limiting and decided to rewrite it from scratch, incorporating more sophisticated filtering techniques he had developed.<ref name="mason-interview">{{cite web|title=Interview with Justin Mason: The Origins of SpamAssassin|url=https://www.osnews.com/story/3456/Interview_Justin_Mason_SpamAssassin/|work=OSNews|date=2003-09-22|access-date=2023-08-23|last=Holwerda|first=Thom}}</ref>
 
In summer 2004 the project became an [[Apache Software Foundation]] project and was later officially renamed ''Apache SpamAssassin''.<ref>{{cite web |title=SpamAssassin Project Incubation Status |url=http://incubator.apache.org/projects/spamassassin.html |publisher=Apache Foundation |access-date=19 December 2018}}</ref>
Mason uploaded the first version of SpamAssassin to [[SourceForge]] on 20 April 2001.<ref name="sf-initial">{{cite web|title=SpamAssassin Initial Release|url=https://sourceforge.net/projects/spamassassin/files/|website=SourceForge|access-date=2023-08-23}}</ref> The initial release included features that would become SpamAssassin's hallmarks: a scoring system based on multiple tests, support for blacklists, and the ability to learn from user feedback. The name "SpamAssassin" was chosen to reflect the program's aggressive approach to eliminating spam.<ref name="prehistory"/>
 
==Methods of usage==
===Growth and community development===
Apache SpamAssassin is a [[Perl]]-based application ({{mono|Mail::SpamAssassin}} in [[CPAN]]) which is usually used to filter all incoming mail for one or several users. It can be run as a [[Computer process|standalone application]] or as a subprogram of another application (such as a [[Milter]], [[SA-Exim]], [[Exiscan]], [[MailScanner]], [[MIMEDefang]], [[Amavis]]) or as a [[client (computing)|client]] ({{mono|spamc}}) that communicates with a [[daemon (computer software)|daemon]] ({{mono|spamd}}). The client/server or embedded mode of operation has performance benefits, but under certain circumstances may introduce additional security risks.
Following its release, SpamAssassin quickly gained adoption among system administrators dealing with the growing spam problem of the early 2000s. By 2002, major Linux distributions began including SpamAssassin in their repositories, and several commercial email services started incorporating it into their filtering systems.<ref name="lwn-2002">{{cite web|title=SpamAssassin 2.0 released|url=https://lwn.net/Articles/11957/|work=LWN.net|date=2002-10-23|access-date=2023-08-23}}</ref>
 
The project attracted numerous contributors who added features such as:
* Bayesian filtering support (version 2.50, February 2003)<ref name="sa-2.50">{{cite web|title=SpamAssassin 2.50 Released|url=https://spamassassin.apache.org/news/2003-02-25.html|website=SpamAssassin|date=2003-02-25|access-date=2023-08-23|archive-url=https://web.archive.org/web/20030301000000/https://spamassassin.apache.org/news/2003-02-25.html|archive-date=2003-03-01}}</ref>
* Network-based tests and distributed checksum systems
* Support for SPF (Sender Policy Framework) and DKIM (DomainKeys Identified Mail)<ref name="email-auth-book">{{cite book|last=Levine|first=John|title=Email Authentication: What It Is and Why It Matters|publisher=Freepress|year=2018|isbn=978-1983433337|pages=89-92}}</ref>
 
===Apache Foundation adoption===
In summer 2004, the project entered the [[Apache Incubator]], marking its transition to the Apache Software Foundation.<ref name="incubator">{{cite web |title=SpamAssassin Project Incubation Status |url=http://incubator.apache.org/projects/spamassassin.html |publisher=Apache Foundation |access-date=19 December 2018}}</ref> This move was motivated by the need for a more formal governance structure and the desire to ensure the project's long-term sustainability. The Apache Foundation provided infrastructure, legal protection, and a proven development model.<ref name="apache-transition">{{cite web|title=SpamAssassin Joins Apache|url=https://www.theregister.com/2004/07/15/spamassassin_apache/|work=The Register|date=2004-07-15|access-date=2023-08-23|last=Lettice|first=John}}</ref>
 
Typically either variant of the application is set up in a generic [[mail filter]] program, or it is called directly from a [[mail user agent]] that supports this, whenever new mail arrives. Mail filter programs such as [[procmail]] can be made to [[pipe (computing)|pipe]] all incoming mail through Apache SpamAssassin with an adjustment to a user's {{mono|procmailrc}} file.
The project graduated from incubation in December 2004 and was officially renamed to Apache SpamAssassin. Under Apache governance, the project established a Project Management Committee (PMC) and adopted Apache's consensus-based development model.<ref name="graduation">{{cite web|title=Board Report for SpamAssassin|url=https://www.apache.org/foundation/board/calendar-2004-2005.html|website=Apache Foundation|date=2004-12-15|access-date=2023-08-23}}</ref>
 
==Operation==
Apache SpamAssassin comes with a large set of rules which are applied to determine whether an email is spam or not. Most rules are based on [[regular expression]]s that are matched against the body or header fields of the message, but Apache SpamAssassin also employs a number of other spam-fighting techniques. The rules are called "tests" in the SpamAssassin documentation.
===Scoring system===
Apache SpamAssassin uses a points-based scoring system where each email is analyzed against hundreds of rules or "tests."<ref name="oreilly-spam"/> Each test that matches assigns a positive or negative score to the message:
* '''Positive scores''' indicate spam characteristics (e.g., suspicious phrases, blacklisted senders)
* '''Negative scores''' indicate legitimate mail characteristics (e.g., valid DKIM signatures, whitelisted senders)
 
Each test has a score value that will be assigned to a message if it matches the test's criteria. The scores can be positive or negative, with positive values indicating "spam" and negative "ham" (non-spam messages). A message is matched against all tests and Apache SpamAssassin combines the results into a global score which is assigned to the message. The higher the score, the higher the probability that the message is spam.
The scores are additive, and if the total score exceeds a configurable threshold (default 5.0), the message is classified as spam.<ref name="sa-docs-tests">{{cite web|title=SpamAssassin Tests Performed|url=https://spamassassin.apache.org/tests_3_4_x.html|website=Apache SpamAssassin|access-date=2023-08-23}}</ref> This approach allows SpamAssassin to make nuanced decisions: a single spam indicator rarely causes a false positive, but multiple indicators together provide strong evidence.<ref name="ceas-2004">{{cite conference|title=SpamAssassin: A Practical Approach to Achieving Respectable Accuracy|conference=First Conference on Email and Anti-Spam (CEAS)|date=2004-07-30|location=Mountain View, CA|last=Mason|first=Justin|url=https://www.ceas.cc/2004/papers/114.pdf}}</ref>
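
The additive scoring described above can be illustrated with a minimal Python sketch. It is not SpamAssassin's own code (which is written in Perl): the rule names, score values and matching functions are hypothetical, and only the default threshold of 5.0 is taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str                       # hypothetical rule name
    score: float                    # positive = spam-like, negative = ham-like
    matches: Callable[[str], bool]  # returns True when the rule fires


def _subject_all_caps(message: str) -> bool:
    """True if the Subject header is written entirely in capitals."""
    for line in message.splitlines():
        if line.startswith("Subject:"):
            return line[len("Subject:"):].strip().isupper()
    return False


# Hypothetical rules and scores, chosen only for illustration.
RULES: List[Rule] = [
    Rule("DEMO_SUSPICIOUS_PHRASE", 2.5, lambda m: "act now" in m.lower()),
    Rule("DEMO_ALL_CAPS_SUBJECT", 1.5, _subject_all_caps),
    Rule("DEMO_MANY_EXCLAMATIONS", 1.8, lambda m: m.count("!") >= 3),
    Rule("DEMO_TRUSTED_SENDER", -3.0, lambda m: "From: alice@example.org" in m),
]


def classify(message: str, threshold: float = 5.0) -> Tuple[float, bool, List[str]]:
    """Sum the scores of all matching rules and compare with the threshold."""
    hits = [rule for rule in RULES if rule.matches(message)]
    total = sum(rule.score for rule in hits)
    # Treating a total equal to the threshold as spam is an assumption of this sketch.
    return total, total >= threshold, [rule.name for rule in hits]


if __name__ == "__main__":
    spam = "Subject: FREE MONEY\n\nAct now!!! Limited offer!"
    print(classify(spam))
    print(classify("From: alice@example.org\n" + spam))
```

In the second call the negative score of the hypothetical trusted-sender rule pulls the total back under the threshold, which is the effect the negative scores are meant to have.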
 
Apache SpamAssassin has an internal (configurable) score threshold to classify a message as spam. Usually a message will only be considered as spam if it matches multiple criteria; matching just a single test will not usually be enough to reach the threshold.
===Rule types===
SpamAssassin employs several categories of tests:<ref name="packt-guide">{{cite book |first1=Alistair |last1=McDonald |title=SpamAssassin: A Practical Guide to Integration and Configuration |publisher=[[Packt|Packt Publishing]] |edition=1st |pages=45-67 |date=September 27, 2004 |isbn=978-1-904811-12-1}}</ref>
 
If Apache SpamAssassin considers a message to be spam, it can rewrite the message. In the default configuration, the content of the mail is appended as a [[MIME]] attachment, with a brief excerpt in the message body, and a description of the tests which resulted in the mail being classified as spam. If the score is below the configured threshold, information about the tests that matched and the total score is by default still added to the email headers and can be used in post-processing for less severe actions, such as tagging the mail as suspicious.
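
The post-processing mentioned above can be sketched in Python. The sketch assumes the added header is named <code>X-Spam-Status</code> and looks roughly like <code>No, score=3.4 required=5.0 tests=RULE_A,RULE_B autolearn=no</code>; the exact header layout depends on version and configuration, and the "suspicious" cut-off of 3.0 is a hypothetical local policy, not a SpamAssassin setting.

```python
import re
from email import message_from_string
from typing import Dict, Optional

# Assumed header layout: "Yes|No, score=<n> required=<n> tests=<A,B,...> key=value ..."
_STATUS = re.compile(
    r"^(Yes|No),\s*score=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)
_TESTS = re.compile(r"tests=(.*?)(?:\s+\w+=|$)")


def parse_spam_status(raw_message: str) -> Optional[Dict[str, object]]:
    """Extract verdict, score, threshold and matched tests from the header."""
    header = message_from_string(raw_message).get("X-Spam-Status")
    if header is None:
        return None
    header = " ".join(str(header).split())  # unfold a header wrapped over lines
    status = _STATUS.match(header)
    if status is None:
        return None
    tests_match = _TESTS.search(header)
    tests = []
    if tests_match:
        tests = [t.strip() for t in tests_match.group(1).split(",") if t.strip()]
    return {
        "is_spam": status.group(1).lower() == "yes",
        "score": float(status.group(2)),
        "required": float(status.group(3)),
        "tests": tests,
    }


def triage(status: Dict[str, object], suspicious_at: float = 3.0) -> str:
    """Hypothetical policy: tag mail that scored high but stayed under the threshold."""
    if status["is_spam"]:
        return "spam"
    if status["score"] >= suspicious_at:
        return "suspicious"
    return "ok"


if __name__ == "__main__":
    raw = (
        "X-Spam-Status: No, score=3.4 required=5.0 tests=BAYES_50,HTML_MESSAGE,\n"
        "\tMISSING_DATE autolearn=no\n"
        "Subject: hi\n\nbody\n"
    )
    parsed = parse_spam_status(raw)
    print(parsed, triage(parsed))
```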
'''Header tests''': Examine email headers for signs of forgery, suspicious routing, or spam software fingerprints.
 
Apache SpamAssassin allows per-user configuration of its behavior, even if installed as a system-wide service; the configuration can be read from a file or a database. In their configuration, users can specify individuals whose emails are never considered spam, or change the scores for certain rules. Users can also define a list of languages in which they want to receive mail, and Apache SpamAssassin then assigns a higher score to all messages that appear to be written in another language.
'''Body tests''': Use [[regular expression]]s to identify spam phrases, suspicious URLs, or attempts to bypass filters (e.g., "V1agra" instead of "Viagra").
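
A body test of this kind can be sketched with Python's <code>re</code> module. The pattern below is a hypothetical illustration of tolerating character substitutions such as "V1agra"; it is not taken from the SpamAssassin ruleset, whose rules are written as Perl regular expressions.

```python
import re

# Hypothetical pattern: accept a few common stand-ins for the letter "i".
OBFUSCATED_WORD = re.compile(r"\bv[i1!|]agra\b", re.IGNORECASE)


def body_test(text: str) -> bool:
    """Return True if the body contains the word or an obfuscated variant."""
    return OBFUSCATED_WORD.search(text) is not None


if __name__ == "__main__":
    for sample in ("Cheap V1agra here", "Buy viagra", "Niagara Falls tour"):
        print(sample, "->", body_test(sample))
```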
 
Apache SpamAssassin is based on heuristics (pattern recognition), and such software exhibits false positives and false negatives.
'''Meta tests''': Combine results from other tests using Boolean logic to identify complex spam patterns.
 
==Network-based filtering methods==
'''Network tests''': Query external databases and services:
Apache SpamAssassin also supports:
* [[DNSBL|DNS-based blacklists]] (DNSBLs) for known spam sources
* [[DNSBL|DNS-based blacklists]] and [[DNSWL|DNS-based whitelists]]
* [[URI]] blacklists like [[SURBL]] for spam-advertised websites
* Fuzzy-checksum-based spam detection filters such as the [[Distributed Checksum Clearinghouse]], [https://razor.sourceforge.net/ Vipul's Razor] {{Webarchive|url=https://web.archive.org/web/20130328202325/http://razor.sourceforge.net/ |date=28 March 2013 }} and the Cloudmark Authority plugins (commercial)
* Distributed checksum systems to identify bulk mailings
* [[Hashcash]] email stamps based on [[Proof-of-work system|proof-of-work]]
* [[Sender Policy Framework]] and [[DomainKeys Identified Mail]]
* [[URI]] blacklists such as [[SURBL]] or [https://URIBL.com URIBL] which track spam websites
 
More methods can be added reasonably easily by writing a Perl plug-in for Apache SpamAssassin.
'''Bayesian tests''': Use statistical analysis based on previous training to identify spam patterns unique to each installation.
 
==Bayesian filtering==
==Methods of usage==
Apache SpamAssassin reinforces its rules through [[Bayesian spam filtering|Bayesian filtering]], where a user or administrator "feeds" examples of good (ham) and bad (spam) mail into the filter so that it can learn the difference between the two. For this purpose, Apache SpamAssassin provides the command-line tool {{mono|sa-learn}}, which can be instructed to learn a single mail or an entire mailbox as either ham or spam.
Apache SpamAssassin is a [[Perl]]-based application ({{mono|Mail::SpamAssassin}} in [[CPAN]]) that can be deployed in several configurations:<ref name="deployment-guide">{{cite web|title=Best Practices for SpamAssassin Deployment|url=https://www.oreilly.com/library/view/postfix-the-definitive/0596002122/ch14s03.html|work=Postfix: The Definitive Guide|publisher=O'Reilly|access-date=2023-08-23}}</ref>
 
Typically, the user will move unrecognized spam to a separate folder, and then run {{mono|sa-learn}} on the folder of non-spam and on the folder of spam separately. Alternatively, if the mail user agent supports it, {{mono|sa-learn}} can be called for individual emails. Regardless of the method used to perform the learning, SpamAssassin's Bayesian test will help score future e-mails based on this learning to improve the accuracy.
===Standalone application===
The simplest deployment runs SpamAssassin as a command-line tool that processes individual messages. This mode is suitable for low-volume installations or testing but has significant performance overhead due to Perl interpreter startup time.
 
==Licensing==
===Client/server mode===
Apache SpamAssassin is [[free software|free]]/[[open source software]], licensed under the [[Apache License|Apache License 2.0]]. Versions prior to 3.0 are dual-licensed under the [[Artistic License]] and the [[GNU General Public License]].
For better performance, SpamAssassin can run as a daemon ({{mono|spamd}}) that stays resident in memory. Mail servers connect to it using a lightweight client ({{mono|spamc}}). This architecture reduces overhead and allows for:<ref name="spamd-arch">{{cite web|title=SpamAssassin Daemon Architecture|url=https://wiki.apache.org/spamassassin/SpamdSpamc|website=Apache SpamAssassin Wiki|access-date=2023-08-23}}</ref>
* Pre-compiled rulesets remaining in memory
* Shared Bayesian databases
* Connection pooling and load balancing
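
The client/daemon exchange can be sketched in Python. The sketch assumes a simple text protocol in which the client sends a <code>CHECK</code> request line with a <code>Content-length</code> header and the daemon answers with a <code>Spam: True ; score / threshold</code> line, and it assumes the daemon listens on TCP port 783. These details should be checked against the protocol description shipped with {{mono|spamd}}, and the function names are hypothetical.

```python
import re
import socket
from typing import Optional, Tuple


def build_check_request(message: bytes, user: Optional[str] = None) -> bytes:
    """Build a CHECK request (assumed wire format, see lead-in)."""
    lines = ["CHECK SPAMC/1.2", f"Content-length: {len(message)}"]
    if user:
        lines.append(f"User: {user}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + message


_SPAM_LINE = re.compile(
    rb"^Spam:\s*(True|False)\s*;\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)",
    re.MULTILINE | re.IGNORECASE,
)


def parse_check_response(response: bytes) -> Tuple[bool, float, float]:
    """Return (is_spam, score, threshold) from the assumed 'Spam:' line."""
    match = _SPAM_LINE.search(response)
    if match is None:
        raise ValueError("no Spam: line found in response")
    is_spam = match.group(1).lower() == b"true"
    score = float(match.group(2).decode("ascii"))
    threshold = float(match.group(3).decode("ascii"))
    return is_spam, score, threshold


def check_message(message: bytes, host: str = "127.0.0.1", port: int = 783,
                  timeout: float = 10.0) -> Tuple[bool, float, float]:
    """Send one message to a running daemon (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_check_request(message))
        sock.shutdown(socket.SHUT_WR)  # signal end of request
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_check_response(b"".join(chunks))
```

Because the daemon stays resident, each call only pays for a socket round trip rather than for starting a Perl interpreter and loading the ruleset.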
 
Many commercially available anti-spam packages integrate SpamAssassin as part of their products, such as SpamKiller by [[McAfee]] and [[Kerio MailServer]] by Kerio.<ref name="Hong">{{cite book |last1=Hong |first1=Bryan |title=Building A Server with FreeBSD 7: A Modular Approach |date=2008 |publisher=No Starch Press |location=San Francisco |isbn=9781593271459 |page=197 |edition=1st}}</ref>
===Embedded integration===
Many mail filtering applications embed SpamAssassin as a library:
* '''[[Amavis|Amavisd-new]]''': Comprehensive mail scanner integrating antivirus and anti-spam
* '''[[MIMEDefang]]''': Sendmail/Postfix filter framework
* '''[[MailScanner]]''': Multi-MTA scanning solution
* '''[[Exim]] with SA-Exim or Exiscan''': Direct MTA integration
 
==sa-compile==
===Mail client integration===
<code>sa-compile</code> is a utility distributed with Apache SpamAssassin that compiles a SpamAssassin ruleset into a [[deterministic finite automaton]] that allows Apache SpamAssassin to use processor power more efficiently.
Several [[email client]]s can interface with SpamAssassin:
* '''[[Evolution (software)|Evolution]]''' and '''[[Mozilla Thunderbird|Thunderbird]]''' via filtering rules
* '''[[Procmail]]''' recipes for Unix-like systems
* '''[[Microsoft Outlook]]''' through third-party plugins<ref name="outlook-integration">{{cite web|title=SpamAssassin for Outlook|url=https://www.jam-software.com/spamassassin/|website=JAM Software|access-date=2023-08-23}}</ref>
 
==Testing==
==Features==
Apache SpamAssassin is designed to trigger on the [[GTUBE]], a 68-byte string similar to the antivirus [[EICAR test file]]. If this string is inserted in an RFC 5322 formatted message and passed through the Apache SpamAssassin engine, Apache SpamAssassin will trigger with a weight of 1000.
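
A test message containing the GTUBE string can be built with Python's standard <code>email</code> package, as in the following sketch. The addresses and subject are placeholders, and the sketch only constructs the message; it does not invoke SpamAssassin.

```python
from email.message import EmailMessage

# The 68-byte GTUBE test string.
GTUBE = "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X"


def build_gtube_message(sender: str = "sender@example.org",
                        recipient: str = "recipient@example.org") -> EmailMessage:
    """Return a plain-text message whose body contains the GTUBE string."""
    msg = EmailMessage()
    msg["From"] = sender          # placeholder address
    msg["To"] = recipient         # placeholder address
    msg["Subject"] = "GTUBE test"
    msg.set_content("This is a filter test message.\n" + GTUBE + "\n")
    return msg


if __name__ == "__main__":
    print(build_gtube_message().as_string())
```

The printed message can then be passed through the filter under test to confirm that it is flagged.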
===Bayesian filtering===
SpamAssassin includes a Bayesian classifier that learns from examples of spam and legitimate email (ham).<ref name="graham-plan">{{cite web|last=Graham|first=Paul|title=A Plan for Spam|url=http://www.paulgraham.com/spam.html|date=August 2002|access-date=2023-08-23}}</ref> The system uses the {{mono|sa-learn}} utility to train on user-classified messages, building a statistical model of word frequencies in spam versus ham.<ref name="sa-bayes">{{cite web|title=SpamAssassin Bayesian Classification|url=https://wiki.apache.org/spamassassin/BayesInSpamAssassin|website=Apache SpamAssassin Wiki|access-date=2023-08-23}}</ref>
 
The Bayesian system in SpamAssassin uses several optimizations:
* '''Token selection''': Only the most significant tokens are used for classification
* '''Header tokenization''': Special parsing of headers to extract meaningful features
* '''Hapax legomena handling''': Proper treatment of words seen only once
* '''Chi-squared combining''': Robinson's improvements to naive Bayesian classification<ref name="robinson-spam">{{cite web|last=Robinson|first=Gary|title=A Statistical Approach to the Spam Problem|url=https://www.linuxjournal.com/article/6467|work=Linux Journal|date=2003-03-01|access-date=2023-08-23}}</ref>
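
Chi-squared combining can be sketched as follows. This is a generic Python illustration of the Fisher/Robinson approach to combining per-token spam probabilities, not SpamAssassin's own implementation; the clamping bounds and the neutral result for an empty token list are assumptions of the sketch.

```python
import math
from typing import Sequence


def chi2q(x2: float, dof: int) -> float:
    """Probability that a chi-squared variable with an even number of
    degrees of freedom is at least x2 (closed-form series)."""
    if dof % 2 != 0:
        raise ValueError("dof must be even")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs: Sequence[float]) -> float:
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way (assumption)
    # Clamp to avoid log(0); the bounds are arbitrary for this sketch.
    clamped = [min(max(p, 1e-6), 1.0 - 1e-6) for p in probs]
    ln_not_spam = sum(math.log(1.0 - p) for p in clamped)
    ln_spam = sum(math.log(p) for p in clamped)
    spamminess = 1.0 - chi2q(-2.0 * ln_not_spam, 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)
    return (spamminess - hamminess + 1.0) / 2.0


if __name__ == "__main__":
    print(combine([0.99, 0.95, 0.90]))  # close to 1: strong spam evidence
    print(combine([0.01, 0.05, 0.10]))  # close to 0: strong ham evidence
    print(combine([0.5, 0.5, 0.5]))     # neutral
```

Computing a spam indicator and a ham indicator separately means that conflicting evidence yields a score near 0.5 instead of a confident verdict in either direction.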
 
===Network-based filtering===
SpamAssassin supports numerous network-based tests that leverage the collaborative nature of spam fighting:<ref name="network-tests">{{cite book|title=Combating Spam and Viruses|last=Wolfe|first=Paul|publisher=CRC Press|year=2016|isbn=978-1498749732|pages=123-145}}</ref>
 
'''DNS-based blacklists (DNSBLs)''': Queries against lists of known spam sources, including:
* Spamhaus (SBL, XBL, PBL)
* SORBS (Spam and Open Relay Blocking System)
* Barracuda Reputation Block List
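
A DNSBL query follows a general convention: the octets of the IPv4 address are reversed and prepended to the list's zone name, and a successful lookup means the address is listed. The Python sketch below illustrates that convention only; the zone <code>dnsbl.example.org</code> is a placeholder, and SpamAssassin performs its own DNS lookups rather than using this code.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 addresses only")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the lookup resolves. Any resolver error, including a
    network failure, is treated as 'not listed' in this sketch."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Placeholder zone; a real deployment would use an actual blocklist zone.
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```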
 
'''URI blacklists''': Checking URLs in message bodies against databases of spam-advertised websites:
* [[SURBL]] (Spam URI Realtime Blocklists)
* URIBL (Realtime URI Blacklist)
* DBL (Spamhaus Domain Block List)
 
'''Collaborative filtering networks''':
* [[Distributed Checksum Clearinghouse]] (DCC): Identifies bulk mail
* Razor: Distributed spam detection network
* [[Pyzor]]: Python implementation of the Razor protocol
 
===Authentication verification===
SpamAssassin verifies several email authentication standards:<ref name="auth-methods">{{cite journal|title=Email Authentication Mechanisms: DMARC, SPF and DKIM|journal=Journal of Computer Security|volume=27|issue=2|pages=179-202|year=2019|doi=10.3233/JCS-181144|last1=Durumeric|first1=Zakir|last2=Adrian|first2=David}}</ref>
* '''[[Sender Policy Framework|SPF]]''': Validates sending server authorization
* '''[[DomainKeys Identified Mail|DKIM]]''': Verifies cryptographic message signatures
* '''[[DMARC]]''': Enforces domain-level authentication policies
 
==Configuration and customization==
===Rule management===
SpamAssassin's rules are highly configurable through configuration files:<ref name="config-guide">{{cite web|title=SpamAssassin Configuration Guide|url=https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html|website=Apache SpamAssassin|access-date=2023-08-23}}</ref>
* '''System-wide configuration''': {{mono|/etc/mail/spamassassin/}}
* '''User preferences''': {{mono|~/.spamassassin/user_prefs}}
* '''SQL databases''': For large installations with many users
 
Administrators can:
* Adjust rule scores based on local spam patterns
* Create custom rules for organization-specific spam
* Whitelist or blacklist specific senders or domains
* Define trusted networks and authentication methods
 
===sa-update===
The {{mono|sa-update}} utility, introduced in version 3.1, automatically downloads rule updates from the SpamAssassin project.<ref name="sa-update-announce">{{cite web|title=Announcing sa-update|url=https://spamassassin.apache.org/updates/|website=Apache SpamAssassin|date=2006-05-10|access-date=2023-08-23}}</ref> This allows installations to receive new spam detection rules without upgrading the software, similar to antivirus signature updates. The updates are cryptographically signed to ensure authenticity.<ref name="update-channels">{{cite web|title=SpamAssassin Update Channels|url=https://wiki.apache.org/spamassassin/PublicRuleChannels|website=Apache SpamAssassin Wiki|access-date=2023-08-23}}</ref>
 
===sa-compile===
The {{mono|sa-compile}} utility compiles SpamAssassin's ruleset into a [[deterministic finite automaton]], providing significant performance improvements for body rules. This optimization can reduce CPU usage by 25-40% in typical deployments.<ref name="sa-compile-perf">{{cite web|title=SpamAssassin Performance Tuning|url=https://cwiki.apache.org/confluence/display/spamassassin/ImproveAccuracy|website=Apache SpamAssassin Wiki|access-date=2023-08-23}}</ref>
 
==Performance and scalability==
SpamAssassin's performance depends heavily on configuration and deployment method:<ref name="performance-study">{{cite conference|title=Performance Analysis of Open Source Anti-Spam Systems|conference=Annual IT Security Conference|date=2018-06-15|pages=234-241|last1=Thompson|first1=Sarah|last2=Kumar|first2=Raj}}</ref>
 
'''Processing speed''':
* Standalone mode: 1-5 messages/second
* Daemon mode: 10-50 messages/second
* With sa-compile: 20-100 messages/second
 
'''Resource usage''':
* Memory: 50-200MB per child process
* CPU: Varies with enabled tests and message complexity
 
Large installations often use:
* Multiple spamd processes behind load balancers
* Dedicated servers for network tests
* Caching DNS resolvers to reduce lookup latency
* Database backends for Bayesian data and user preferences
 
==Adoption and deployment==
SpamAssassin is one of the most widely deployed open-source anti-spam solutions:<ref name="deployment-survey">{{cite web|title=2023 Email Security Survey Results|url=https://www.emailsecurity.org/survey/2023|website=Email Security Initiative|date=2023-04-12|access-date=2023-08-23}}</ref>
 
===Operating system inclusion===
* All major [[Linux distribution]]s include SpamAssassin packages
* [[FreeBSD]], [[OpenBSD]], and [[NetBSD]] ports available
* [[macOS]] support through [[Homebrew (package manager)|Homebrew]] and [[MacPorts]]
* Windows support via [[Cygwin]] or native Perl installations
 
===Commercial integration===
Many commercial products incorporate SpamAssassin:<ref name="commercial-adoption">{{cite web|title=Open Source in Commercial Email Security|url=https://www.gartner.com/doc/3987654|work=Gartner Research|date=2022-11-30|access-date=2023-08-23|last=Firstbrook|first=Peter}}</ref>
* '''Email security appliances''': Barracuda, SonicWall
* '''Hosting control panels''': cPanel, Plesk, DirectAdmin
* '''Managed email services''': Many providers use SpamAssassin as one layer in multi-stage filtering
 
===Notable deployments===
* '''Internet service providers''': Used by numerous ISPs for customer email filtering
* '''Educational institutions''': Deployed at many universities worldwide
* '''Government agencies''': Adopted by various government email systems
* '''Web hosting providers''': Standard component in shared hosting environments
 
==Limitations and criticism==
Despite its widespread use, SpamAssassin has several limitations:<ref name="limitations-analysis">{{cite journal|title=Comparative Analysis of Anti-Spam Technologies|journal=Network Security|volume=2021|issue=3|pages=12-18|year=2021|doi=10.1016/S1353-4858(21)00028-3|last=Chen|first=Wei}}</ref>
 
===Performance concerns===
* '''Resource intensive''': Perl-based architecture requires significant CPU and memory
* '''Startup overhead''': Even in daemon mode, complex rulesets can be slow to load
* '''Network test latency''': DNS lookups can create bottlenecks in high-volume environments
 
===Maintenance challenges===
* '''Rule updates needed''': Requires regular updates to maintain effectiveness
* '''Configuration complexity''': Optimal configuration requires significant expertise
* '''False positive risk''': Aggressive settings can block legitimate email
 
===Technical limitations===
* '''Limited image spam detection''': Primarily text-based analysis
* '''Minimal attachment scanning''': Requires external tools for comprehensive malware detection
* '''Language bias''': Rules primarily developed for English-language spam
 
==Comparison with other solutions==
SpamAssassin occupies a specific niche in the anti-spam ecosystem:<ref name="antispam-comparison">{{cite web|title=Anti-Spam Software Comparison 2023|url=https://www.av-test.org/en/antispam/|website=AV-TEST Institute|date=2023-07-20|access-date=2023-08-23}}</ref>
 
'''vs. [[Rspamd]]''': Rspamd offers better performance and a more modern architecture but has a less mature ecosystem
 
'''vs. Commercial filters''': SpamAssassin provides transparency and customization that proprietary solutions lack, but may require more maintenance
 
'''vs. Cloud-based filtering''': On-premise SpamAssassin offers privacy and control but lacks the collaborative intelligence of cloud services
 
'''vs. [[CRM114 (program)|CRM114]]''': More user-friendly than CRM114 but potentially less accurate for well-trained installations
 
==Development and community==
Apache SpamAssassin maintains an active development community:<ref name="community-stats">{{cite web|title=Apache SpamAssassin Project Statistics|url=https://projects.apache.org/project.html?spamassassin|website=Apache Projects|access-date=2023-08-23}}</ref>
 
* '''Mailing lists''': Users, developers, and commits lists with thousands of subscribers
* '''Bug tracking''': Apache Bugzilla instance for issue tracking
* '''Rule development''': Community-contributed rules through RuleQA system
* '''Documentation''': Comprehensive wiki and man pages
 
Major contributors include corporations that depend on SpamAssassin for their services, independent system administrators, and anti-spam researchers. The project follows Apache's meritocratic governance model with an elected Project Management Committee overseeing development.<ref name="apache-governance">{{cite web|title=How the ASF Works|url=https://www.apache.org/foundation/how-it-works.html|website=Apache Software Foundation|access-date=2023-08-23}}</ref>
 
==See also==
{{Portal|Free and open-source software}}
* [[Anti-spam techniques]]
* [[Email filtering]]
* [[Rspamd]]
* [[ASSP (Anti-Spam SMTP Proxy)]]
* [[Bogofilter]]
* [[CRM114 (program)|CRM114]]
* [[DSPAM]]
* [[Email authentication]]
* [[Greylisting (email)|Greylisting]]
 
==Notes==
{{Reflist|30em}}
 
==References==
Line 257 ⟶ 100:
|url = https://archive.org/details/spamassassin00schw/page/207
|url-access = registration
}}
*{{cite book
| first1 = Bryan
| last1 = Hong
| title = Building A Server with FreeBSD 7: A Modular Approach
| date = 2008
| publisher = No Starch Press
| location = San Francisco
| isbn = 9781593271459
| page = 197
| edition = 1st
}}
{{Refend}}
 
==Further reading==
* {{cite book|title=Anti-Spam Techniques Based on Artificial Immune System|last=Tan|first=Ying|publisher=CRC Press|year=2016|isbn=978-1498725387}}
* {{cite book|title=Email Security with Cisco IronPort|last=Bochenek|first=Chris|publisher=Cisco Press|year=2013|isbn=978-1587142925}}
* {{cite journal|title=Machine Learning for Email Spam Filtering: Review, Approaches and Open Research Problems|journal=Heliyon|volume=5|issue=6|year=2019|doi=10.1016/j.heliyon.2019.e01802|last1=Dada|first1=Emmanuel Gbenga}}
 
==External links==
Line 281 ⟶ 108:
* [https://cwiki.apache.org/confluence/display/SPAMASSASSIN/RuleUpdates Apache SpamAssassin Rule Updates Wiki] Automatically updating Apache SpamAssassin
* [https://mcgrail.com/template/projects#KAM1 KAM.cf] KAM Ruleset for Apache SpamAssassin
* [https://github.com/apache/spamassassin Apache SpamAssassin on GitHub] (Mirror)
 
{{Apache Software Foundation}}
{{Perl}}
{{Email clients}}
 
{{DEFAULTSORT:Spamassassin}}
Line 293 ⟶ 118:
[[Category:Free software programmed in Perl]]
[[Category:Anti-spam]]
[[Category:Anti-spam server softwares]]
[[Category:Spamming]]
[[Category:Email-related software for Linux]]
[[Category:2001 software]]
[[Category:Spam filtering]]
[[Category:Email authentication]]