Apache SpamAssassin

Developer(s)	Apache Software Foundation
Initial release	April 20, 2001
Stable release	4.0.2 / 30 August 2025
Repository	SpamAssassin Repository
Written in	Perl, C
Operating system	Cross-platform
Type	Spam filter
License	Apache License 2.0
Website	spamassassin.apache.org

Apache SpamAssassin is a computer program used for email spam filtering. It uses a variety of spam-detection techniques, including DNS-based blocklist checks, fuzzy checksums, Bayesian filtering, external programs, blacklists, and online databases.^[4] Released under the Apache License 2.0, it has been part of the Apache Software Foundation since 2004. As one of the most widely deployed open-source anti-spam solutions, SpamAssassin processes billions of emails daily across various platforms.^[5]

The program can be integrated with mail servers to automatically filter all mail for a site. It can also be run by individual users on their own mailbox and integrates with several mail programs. Apache SpamAssassin is highly configurable; if used as a system-wide filter it can still be configured to support per-user preferences.

History

Origins and early development

Apache SpamAssassin was created by Justin Mason, who had maintained a number of patches against filter.plx, an earlier program that Mark Jeftovic began in August 1997. The original filter.plx was a simple Perl script designed to identify spam based on header analysis and keyword matching.^[6] Mason found the program's architecture limiting and decided to rewrite it from scratch, incorporating more sophisticated filtering techniques he had developed.^[7]

Mason uploaded the first version of SpamAssassin to SourceForge on 20 April 2001.^[8] The initial release included features that would become SpamAssassin's hallmarks: a scoring system based on multiple tests, support for blacklists, and the ability to learn from user feedback. The name "SpamAssassin" was chosen to reflect the program's aggressive approach to eliminating spam.^[6]

Growth and community development

Following its release, SpamAssassin quickly gained adoption among system administrators dealing with the growing spam problem of the early 2000s. By 2002, major Linux distributions had begun including SpamAssassin in their repositories, and several commercial email services had started incorporating it into their filtering systems.^[9]

The project attracted numerous contributors who added features such as:

Bayesian filtering support (version 2.50, February 2003)^[10]
Network-based tests and distributed checksum systems
Support for SPF (Sender Policy Framework) and DKIM (DomainKeys Identified Mail)^[11]

Apache Foundation adoption

In summer 2004, the project entered the Apache Incubator, marking its transition to the Apache Software Foundation.^[12] This move was motivated by the need for a more formal governance structure and the desire to ensure the project's long-term sustainability. The Apache Foundation provided infrastructure, legal protection, and a proven development model.^[13]

The project graduated from incubation in December 2004 and was officially renamed Apache SpamAssassin. Under Apache governance, the project established a Project Management Committee (PMC) and adopted Apache's consensus-based development model.^[14]

Operation

Scoring system

Apache SpamAssassin uses a points-based scoring system where each email is analyzed against hundreds of rules or "tests."^[4] Each test that matches assigns a positive or negative score to the message:

Positive scores indicate spam characteristics (e.g., suspicious phrases, blacklisted senders)
Negative scores indicate legitimate mail characteristics (e.g., valid DKIM signatures, whitelisted senders)

The scores are additive: if the total exceeds a configurable threshold (5.0 by default), the message is classified as spam.^[15] This approach allows SpamAssassin to make nuanced decisions: a single spam indicator rarely causes a false positive, but several indicators together provide strong evidence.^[16]
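The additive scheme can be illustrated with a minimal Python sketch. It is not SpamAssassin's own code (which is written in Perl); the rule names and scores are hypothetical, and treating a total equal to the threshold as spam is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class RuleHit:
    """One matched test and the score it contributes."""
    name: str      # hypothetical rule name, for illustration only
    score: float   # positive = spam-like, negative = ham-like


def classify(hits, threshold=5.0):
    """Sum the scores of all matched tests and compare with the threshold.

    Returns (total, is_spam). A total equal to the threshold counts as
    spam here, which is an assumption of this sketch.
    """
    total = sum(hit.score for hit in hits)
    return total, total >= threshold


hits = [
    RuleHit("HYP_SUSPICIOUS_PHRASE", 2.5),
    RuleHit("HYP_LISTED_SENDER", 3.1),
    RuleHit("HYP_VALID_SIGNATURE", -0.1),
]
total, is_spam = classify(hits)
print(f"score={total:.1f} spam={is_spam}")
```

No single hit in this example reaches the threshold on its own; the message is classified as spam only because two positive indicators outweigh the negative one.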

Rule types

SpamAssassin employs several categories of tests:^[17]

Header tests: Examine email headers for signs of forgery, suspicious routing, or spam software fingerprints.

Body tests: Use regular expressions to identify spam phrases, suspicious URLs, or attempts to bypass filters (e.g., "V1agra" instead of "Viagra").

Meta tests: Combine results from other tests using Boolean logic to identify complex spam patterns.

Network tests: Query external databases and services:

DNS-based blacklists (DNSBLs) for known spam sources
URI blacklists like SURBL for spam-advertised websites
Distributed checksum systems to identify bulk mailings

Bayesian tests: Use statistical analysis based on previous training to identify spam patterns unique to each installation.
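How body tests and meta tests relate can be sketched in Python. The rule names and patterns below are hypothetical and far simpler than the rules SpamAssassin ships; the sketch only shows regular-expression matching followed by a Boolean combination of the results.

```python
import re

# Hypothetical body rules: each name maps to a compiled regular expression.
BODY_RULES = {
    # Matches "Viagra" and simple obfuscations such as "V1agra" or "V!agra".
    "HYP_OBFUSCATED_DRUG": re.compile(r"\bv[i1!|]agra\b", re.IGNORECASE),
    "HYP_URGENT_OFFER": re.compile(r"\bact now\b", re.IGNORECASE),
}


def run_body_rules(body):
    """Return the set of body rule names whose pattern occurs in the text."""
    return {name for name, pattern in BODY_RULES.items() if pattern.search(body)}


def meta_drug_pitch(hits):
    """Hypothetical meta test: fires only when both body rules matched."""
    return "HYP_OBFUSCATED_DRUG" in hits and "HYP_URGENT_OFFER" in hits


hits = run_body_rules("Buy V1agra today, act now!")
print(sorted(hits), meta_drug_pitch(hits))
```

A meta test of this kind lets a filter assign a higher score to a combination of indicators than to either indicator alone.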

Methods of usage

Apache SpamAssassin is a Perl-based application (Mail::SpamAssassin in CPAN) that can be deployed in several configurations:^[18]

Standalone application

The simplest deployment runs SpamAssassin as a command-line tool that processes individual messages. This mode is suitable for low-volume installations or testing, but it carries significant overhead because every invocation must start the Perl interpreter and load the ruleset again.

Client/server mode

For better performance, SpamAssassin can run as a daemon (spamd) that stays resident in memory. Mail servers connect to it using a lightweight client (spamc). This architecture reduces overhead and allows for:^[19]

Pre-compiled rulesets remaining in memory
Shared Bayesian databases
Connection pooling and load balancing
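The exchange between client and daemon is a short text protocol. The following Python sketch builds a CHECK request and parses the reply without opening a network connection. The protocol version string and the exact header layout are assumptions based on the publicly documented spamd protocol and should be checked against the installed version.

```python
def build_check_request(message, user=None):
    """Build a CHECK request for spamd (layout assumed, see lead-in)."""
    headers = [b"CHECK SPAMC/1.5", b"Content-length: %d" % len(message)]
    if user:
        # Optional header used to select per-user preferences.
        headers.append(b"User: " + user.encode("ascii"))
    return b"\r\n".join(headers) + b"\r\n\r\n" + message


def parse_check_response(response):
    """Parse a reply such as 'Spam: True ; 15.0 / 5.0'.

    Returns (is_spam, score, threshold); raises ValueError otherwise.
    """
    head = response.split(b"\r\n\r\n", 1)[0].decode("ascii", "replace")
    lines = head.split("\r\n")
    status = lines[0].split(None, 2)  # e.g. ['SPAMD/1.1', '0', 'EX_OK']
    if len(status) < 2 or status[1] != "0":
        raise ValueError("spamd reported an error: " + lines[0])
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "spam":
            verdict, _, scores = value.partition(";")
            score, _, threshold = scores.partition("/")
            is_spam = verdict.strip().lower() in ("true", "yes")
            return is_spam, float(score), float(threshold)
    raise ValueError("no Spam header in response")


request = build_check_request(b"Subject: test\r\n\r\nhello\r\n")
reply = b"SPAMD/1.1 0 EX_OK\r\nSpam: True ; 15.0 / 5.0\r\n\r\n"
print(parse_check_response(reply))
```

In a real deployment the request would be written to the spamd socket (TCP port 783 by convention) and the reply read back; the spamc client performs the same exchange.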

Embedded integration

Many mail filtering applications embed SpamAssassin as a library:

Amavisd-new: Comprehensive mail scanner integrating antivirus and anti-spam
MIMEDefang: Sendmail/Postfix filter framework
MailScanner: Multi-MTA scanning solution
Exim with SA-Exim or Exiscan: Direct MTA integration

Mail client integration

Several email clients can interface with SpamAssassin:

Evolution and Thunderbird via filtering rules
Procmail recipes for Unix-like systems
Microsoft Outlook through third-party plugins^[20]

Features

Bayesian filtering

SpamAssassin includes a Bayesian classifier that learns from examples of spam and legitimate email (ham).^[21] The system uses the sa-learn utility to train on user-classified messages, building a statistical model of word frequencies in spam versus ham.^[22]

The Bayesian system in SpamAssassin uses several optimizations:

Token selection: Only the most significant tokens are used for classification
Header tokenization: Special parsing of headers to extract meaningful features
Hapax legomena handling: Proper treatment of words seen only once
Chi-squared combining: Robinson's improvements to naive Bayesian classification^[23]
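Chi-squared combining can be sketched as follows. This is a generic Python illustration of the Fisher/Robinson method as popularised by open-source Bayesian filters, not a transcription of SpamAssassin's implementation; the per-token probabilities are assumed to have been computed already and to lie strictly between 0 and 1.

```python
import math


def chi2q(x2, dof):
    """Probability that a chi-squared variable with an even number of
    degrees of freedom is at least x2."""
    if dof % 2 != 0:
        raise ValueError("degrees of freedom must be even")
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(token_probs):
    """Combine per-token spam probabilities into one score in [0, 1].

    Values near 1 indicate spam, values near 0 indicate ham, and values
    near 0.5 mean the evidence is weak or contradictory.
    """
    n = len(token_probs)
    if n == 0:
        return 0.5
    spam_stat = -2.0 * sum(math.log(1.0 - p) for p in token_probs)
    ham_stat = -2.0 * sum(math.log(p) for p in token_probs)
    s = 1.0 - chi2q(spam_stat, 2 * n)   # strength of the spam evidence
    h = 1.0 - chi2q(ham_stat, 2 * n)    # strength of the ham evidence
    return (s - h + 1.0) / 2.0


print(combine([0.99, 0.95, 0.98, 0.90]))   # strongly spam-like tokens
print(combine([0.99, 0.01, 0.99, 0.01]))   # contradictory tokens
```

One property of this scheme is that strongly conflicting evidence yields a result near 0.5 rather than a confident verdict in either direction.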

Network-based filtering

SpamAssassin supports numerous network-based tests that leverage the collaborative nature of spam fighting:^[24]

DNS-based blacklists (DNSBLs): Queries against lists of known spam sources, including:

Spamhaus (SBL, XBL, PBL)
SORBS (Spam and Open Relay Blocking System)
Barracuda Reputation Block List

URI blacklists: Checking URLs in message bodies against databases of spam-advertised websites:

SURBL (Spam URI Realtime Blocklists)
URIBL (Realtime URI Blacklist)
DBL (Spamhaus Domain Block List)

Collaborative filtering networks:

Distributed Checksum Clearinghouse (DCC): Identifies bulk mail
Razor: Distributed spam detection network
Pyzor: Python-based collaborative checksum network, similar in purpose to Razor but using its own protocol
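A DNSBL query of the kind listed above is an ordinary DNS lookup: the octets of the IPv4 address are reversed and prepended to the list's zone, and an answer (conventionally in 127.0.0.0/8) means the address is listed. The Python sketch below assumes IPv4 only and uses a placeholder zone name; real zones have their own usage policies and return codes.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for invalid input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the zone answers for this address.

    Requires network access. A failed lookup (for example NXDOMAIN) is
    treated as "not listed", which is a simplification.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# "dnsbl.example" is a placeholder zone, not a real blocklist.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example"))
```

Because each lookup is a DNS round trip, a local caching resolver has a direct effect on the latency of these tests.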

Authentication verification

SpamAssassin verifies several email authentication standards:^[25]

SPF: Validates sending server authorization
DKIM: Verifies cryptographic message signatures
DMARC: Enforces domain-level authentication policies
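As an illustration of what "sending server authorization" means for SPF, the following Python sketch evaluates a deliberately small subset of an SPF record: only the ip4 and all mechanisms, with their qualifiers. Real SPF evaluation also involves DNS lookups for mechanisms such as a, mx and include, which this sketch ignores; it does not reflect how SpamAssassin's SPF plugin is implemented.

```python
import ipaddress

# SPF qualifiers and the result each one produces when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_ip4_result(record, sender_ip):
    """Evaluate the ip4 and all mechanisms of an SPF record, in order.

    Mechanisms other than ip4 and all are skipped (a simplification).
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        lowered = term.lower()
        if lowered.startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:
                return QUALIFIERS[qualifier]
        elif lowered == "all":
            return QUALIFIERS[qualifier]
    return "neutral"


record = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_ip4_result(record, "192.0.2.10"))    # inside the authorized range
print(spf_ip4_result(record, "198.51.100.1"))  # outside it
```

A filter would typically translate such results into scores, for example a small negative score for a pass and a positive one for a fail.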

Configuration and customization

Rule management

SpamAssassin's rules are highly configurable through configuration files:^[26]

System-wide configuration: /etc/mail/spamassassin/
User preferences: ~/.spamassassin/user_prefs
SQL databases: For large installations with many users

Administrators can:

Adjust rule scores based on local spam patterns
Create custom rules for organization-specific spam
Whitelist or blacklist specific senders or domains
Define trusted networks and authentication methods
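The layering of system-wide settings and per-user preferences can be sketched in Python. The sketch recognises only two directives, required_score and score, in a simplified form (the real score directive accepts further values), and the rule name is hypothetical; user settings override system settings.

```python
def parse_prefs(text):
    """Parse a simplified subset of SpamAssassin-style configuration text."""
    settings = {"required_score": None, "scores": {}}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        if parts[0] == "required_score" and len(parts) == 2:
            settings["required_score"] = float(parts[1])
        elif parts[0] == "score" and len(parts) >= 3:
            # Only the first score value is used in this sketch.
            settings["scores"][parts[1]] = float(parts[2])
    return settings


def effective_settings(system_text, user_text, default_threshold=5.0):
    """Merge system-wide and per-user settings; the user layer wins."""
    system, user = parse_prefs(system_text), parse_prefs(user_text)
    threshold = default_threshold
    for layer in (system, user):
        if layer["required_score"] is not None:
            threshold = layer["required_score"]
    return threshold, {**system["scores"], **user["scores"]}


system_cf = "required_score 5.0\nscore HYP_LOCAL_RULE 2.0\n"
user_prefs = "# stricter filtering for this mailbox\nrequired_score 4.0\n"
print(effective_settings(system_cf, user_prefs))
```

Here the user lowers the spam threshold for their own mailbox while inheriting the rule score set by the administrator.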

sa-update

The sa-update utility, introduced in version 3.1, automatically downloads rule updates from the SpamAssassin project.^[27] This allows installations to receive new spam detection rules without upgrading the software, similar to antivirus signature updates. The updates are cryptographically signed to ensure authenticity.^[28]

sa-compile

The sa-compile utility compiles SpamAssassin's body rules into a deterministic finite automaton, which significantly speeds up body-rule matching. This optimization can reduce CPU usage by 25-40% in typical deployments.^[29]

Performance and scalability

SpamAssassin's performance depends heavily on configuration and deployment method:^[30]

Processing speed:

Standalone mode: 1-5 messages/second
Daemon mode: 10-50 messages/second
With sa-compile: 20-100 messages/second

Resource usage:

Memory: 50-200MB per child process
CPU: Varies with enabled tests and message complexity

Large installations often use:

Multiple spamd processes behind load balancers
Dedicated servers for network tests
Caching DNS resolvers to reduce lookup latency
Database backends for Bayesian data and user preferences
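The figures above lend themselves to a rough sizing calculation. The Python sketch below assumes a per-process throughput chosen by the operator (the article's rates are not stated per process) and uses the 50-200 MB range as a memory bound; because child processes can share memory pages, the result is an upper estimate rather than a measurement.

```python
import math


def spamd_children_needed(peak_msgs_per_sec, per_child_msgs_per_sec):
    """Number of scanning processes needed to sustain a peak message rate."""
    if per_child_msgs_per_sec <= 0:
        raise ValueError("per-child throughput must be positive")
    return math.ceil(peak_msgs_per_sec / per_child_msgs_per_sec)


def memory_bounds_mb(children, low_mb=50, high_mb=200):
    """Lower and upper memory estimate for a given number of children."""
    return children * low_mb, children * high_mb


# Assumed workload: 120 messages/second at peak, 25 messages/second per child.
children = spamd_children_needed(120, 25)
print(children, memory_bounds_mb(children))
```

Under these assumptions five children are required, corresponding to an estimated 250 to 1000 MB of memory.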

Adoption and deployment

SpamAssassin is one of the most widely deployed open-source anti-spam solutions:^[31]

Operating system inclusion

All major Linux distributions include SpamAssassin packages
FreeBSD, OpenBSD, and NetBSD ports available
macOS support through Homebrew and MacPorts
Windows support via Cygwin or native Perl installations

Commercial integration

Many commercial products incorporate SpamAssassin:^[32]

Email security appliances: Barracuda, SonicWall
Hosting control panels: cPanel, Plesk, DirectAdmin
Managed email services: Many providers use SpamAssassin as one layer in multi-stage filtering

Notable deployments

Internet service providers: Used by numerous ISPs for customer email filtering
Educational institutions: Deployed at many universities worldwide
Government agencies: Adopted by various government email systems
Web hosting providers: Standard component in shared hosting environments

Limitations and criticism

Despite its widespread use, SpamAssassin has several limitations:^[33]

Performance concerns

Resource intensive: Perl-based architecture requires significant CPU and memory
Startup overhead: Even in daemon mode, complex rulesets can be slow to load
Network test latency: DNS lookups can create bottlenecks in high-volume environments

Maintenance challenges

Rule updates needed: Requires regular updates to maintain effectiveness
Configuration complexity: Optimal configuration requires significant expertise
False positive risk: Aggressive settings can block legitimate email

Technical limitations

Limited image spam detection: Primarily text-based analysis
Minimal attachment scanning: Requires external tools for comprehensive malware detection
Language bias: Rules primarily developed for English-language spam

Comparison with other solutions

SpamAssassin occupies a specific niche in the anti-spam ecosystem:^[34]

vs. Rspamd: Rspamd offers better performance and a more modern architecture but has a less mature ecosystem

vs. Commercial filters: SpamAssassin provides transparency and customization that proprietary solutions lack, but may require more maintenance

vs. Cloud-based filtering: On-premise SpamAssassin offers privacy and control but lacks the collaborative intelligence of cloud services

vs. CRM114: SpamAssassin is more user-friendly than CRM114 but potentially less accurate than a well-trained CRM114 installation

Development and community

Apache SpamAssassin maintains an active development community:^[35]

Mailing lists: Users, developers, and commits lists with thousands of subscribers
Bug tracking: Apache Bugzilla instance for issue tracking
Rule development: Community-contributed rules through RuleQA system
Documentation: Comprehensive wiki and man pages

Major contributors include corporations that depend on SpamAssassin for their services, independent system administrators, and anti-spam researchers. The project follows Apache's meritocratic governance model with an elected Project Management Committee overseeing development.^[36]

Notes

^ "Project Management Committee". The Apache Software Foundation. 2022. Retrieved 23 August 2023.
^ https://lists.apache.org/thread/vdmwnh6f05fnj9ddz93t70f9gy00ys0b
^ https://marc.info/?l=spamassassin-announce&m=175656347700657&w=2
^ Schwartz, Alan (July 2004). SpamAssassin (1st ed.). O'Reilly Media. pp. 3–5. ISBN 978-0-596-00707-2.
^ Davies, Mark (15 March 2020). "Spam Filtering in 2020: SpamAssassin Still Leads". Linux Journal. Retrieved 23 August 2023.
^ "SpamAssassin Prehistory". Apache Foundation. Retrieved 19 December 2018.
^ Holwerda, Thom (22 September 2003). "Interview with Justin Mason: The Origins of SpamAssassin". OSNews. Retrieved 23 August 2023.
^ "SpamAssassin Initial Release". SourceForge. Retrieved 23 August 2023.
^ "SpamAssassin 2.0 released". LWN.net. 23 October 2002. Retrieved 23 August 2023.
^ "SpamAssassin 2.50 Released". SpamAssassin. 25 February 2003. Archived from the original on 1 March 2003. Retrieved 23 August 2023.
^ Levine, John (2018). Email Authentication: What It Is and Why It Matters. Freepress. pp. 89–92. ISBN 978-1983433337.
^ "SpamAssassin Project Incubation Status". Apache Foundation. Retrieved 19 December 2018.
^ Lettice, John (15 July 2004). "SpamAssassin Joins Apache". The Register. Retrieved 23 August 2023.
^ "Board Report for SpamAssassin". Apache Foundation. 15 December 2004. Retrieved 23 August 2023.
^ "SpamAssassin Tests Performed". Apache SpamAssassin. Retrieved 23 August 2023.
^ Mason, Justin (30 July 2004). SpamAssassin: A Practical Approach to Achieving Respectable Accuracy (PDF). First Conference on Email and Anti-Spam (CEAS). Mountain View, CA.
^ McDonald, Alistair (27 September 2004). SpamAssassin: A Practical Guide to Integration and Configuration (1st ed.). Packt Publishing. pp. 45–67. ISBN 978-1-904811-12-1.
^ "Best Practices for SpamAssassin Deployment". Postfix: The Definitive Guide. O'Reilly. Retrieved 23 August 2023.
^ "SpamAssassin Daemon Architecture". Apache SpamAssassin Wiki. Retrieved 23 August 2023.
^ "SpamAssassin for Outlook". JAM Software. Retrieved 23 August 2023.
^ Graham, Paul (August 2002). "A Plan for Spam". Retrieved 23 August 2023.
^ "SpamAssassin Bayesian Classification". Apache SpamAssassin Wiki. Retrieved 23 August 2023.
^ Robinson, Gary (1 March 2003). "A Statistical Approach to the Spam Problem". Linux Journal. Retrieved 23 August 2023.
^ Wolfe, Paul (2016). Combating Spam and Viruses. CRC Press. pp. 123–145. ISBN 978-1498749732.
^ Durumeric, Zakir; Adrian, David (2019). "Email Authentication Mechanisms: DMARC, SPF and DKIM". Journal of Computer Security. 27 (2): 179–202. doi:10.3233/JCS-181144 (inactive 29 May 2025).
^ "SpamAssassin Configuration Guide". Apache SpamAssassin. Retrieved 23 August 2023.
^ "Announcing sa-update". Apache SpamAssassin. 10 May 2006. Retrieved 23 August 2023.
^ "SpamAssassin Update Channels". Apache SpamAssassin Wiki. Retrieved 23 August 2023.
^ "SpamAssassin Performance Tuning". Apache SpamAssassin Wiki. Retrieved 23 August 2023.
^ Thompson, Sarah; Kumar, Raj (15 June 2018). Performance Analysis of Open Source Anti-Spam Systems. Annual IT Security Conference. pp. 234–241.
^ "2023 Email Security Survey Results". Email Security Initiative. 12 April 2023. Retrieved 23 August 2023.
^ Firstbrook, Peter (30 November 2022). "Open Source in Commercial Email Security". Gartner Research. Retrieved 23 August 2023.
^ Chen, Wei (2021). "Comparative Analysis of Anti-Spam Technologies". Network Security. 2021 (3): 12–18. doi:10.1016/S1353-4858(21)00028-3.
^ "Anti-Spam Software Comparison 2023". AV-TEST Institute. 20 July 2023. Retrieved 23 August 2023.
^ "Apache SpamAssassin Project Statistics". Apache Projects. Retrieved 23 August 2023.
^ "How the ASF Works". Apache Software Foundation. Retrieved 23 August 2023.

References

McDonald, Alistair (27 September 2004). SpamAssassin: A Practical Guide to Integration and Configuration (1st ed.). Packt Publishing. p. 240. ISBN 978-1-904811-12-1.
Schwartz, Alan (July 2004). SpamAssassin (1st ed.). O'Reilly Media. p. 207. ISBN 978-0-596-00707-2.
Hong, Bryan (2008). Building A Server with FreeBSD 7: A Modular Approach (1st ed.). San Francisco: No Starch Press. p. 197. ISBN 9781593271459.

External links

Official website
Apache SpamAssassin Wiki
Apache SpamAssassin Rule Updates Wiki: automatically updating Apache SpamAssassin
KAM.cf: KAM Ruleset for Apache SpamAssassin
Apache SpamAssassin on GitHub (Mirror)
