Elena's AI Blog

Apache-Licensed Summarizers

14 Nov 2025 / 29 minutes to read

Elena Daehnhardt


Image generated with DALL·E via ChatGPT (GPT-5). Hugging Face logo used under fair editorial use. Prompt: flat digital illustration of a cozy workspace with a computer monitor showing Hugging Face model cards, a coffee mug beside it, gentle blue background, soft lighting, minimalist desk, representing open-source summarization with Apache license


Introduction

Sometimes we just want a model that summarises well — without the licence headaches.

Many transformer models on Hugging Face ship under restrictive or research-only terms, which can make it tricky to build something you can share or commercialise responsibly.

So I went searching for Apache 2.0-licensed summarisation models — the ones you can safely use, modify, and even ship in your own apps — and found some surprisingly good options.

What are transformers? In simple terms, transformers are a type of AI model architecture that’s particularly good at understanding and generating text. They work by processing words in relation to all other words in a sentence, which helps them understand context better than older models. Think of them as AI that can “read between the lines” — they don’t just look at words individually, but understand how they relate to each other.

Why Apache 2.0 matters

If you’re coding or blogging in the open, licensing really matters. The Apache 2.0 licence [12] is one of the most permissive: you can use, modify, and distribute code commercially as long as you preserve the licence notice and give attribution.

That means:
✅ You can use these models inside your apps
✅ You can publish generated summaries in your blog or newsletter
✅ You can fine-tune them on your own data
✅ You don’t need a lawyer to read the small print

What is fine-tuning? Fine-tuning is the process of taking a pre-trained model (one that’s already learned general language patterns) and training it further on your specific data or task. It’s like taking a general-purpose assistant and teaching them the specific vocabulary and style of your domain. For example, you could fine-tune a summarization model on legal documents to make it better at summarizing contracts, or on scientific papers to improve its handling of technical terminology.
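
To make that concrete, here is a minimal fine-tuning sketch using the transformers Trainer API on a single toy example. This is an illustration only (the toy dataset, output directory, and hyperparameters are placeholders); a real run would train on thousands of document-summary pairs, for example from the CNN/DailyMail dataset [8].

import torch
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# One toy (document, summary) pair; T5 expects a task prefix like "summarize: "
enc = tokenizer("summarize: Transformer models are great, but licences can be tricky.",
                truncation=True, return_tensors="pt")
labels = tokenizer("Licences matter when picking transformer models.",
                   truncation=True, return_tensors="pt").input_ids

class ToyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return 1

    def __getitem__(self, idx):
        return {"input_ids": enc.input_ids[0],
                "attention_mask": enc.attention_mask[0],
                "labels": labels[0]}

args = Seq2SeqTrainingArguments(output_dir="t5-small-finetuned",
                                num_train_epochs=1,
                                per_device_train_batch_size=1)
trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=ToyDataset())
trainer.train()
trainer.save_model("t5-small-finetuned")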

It’s the “yes, you can” of open AI licences — and that’s refreshing.

7 Apache-Licensed Summarisation Models Worth Trying

Below are seven open summarisation models on Hugging Face that combine practical quality with permissive licensing.

| Model | Base | Licence | Best For | Notes |
|---|---|---|---|---|
| [facebook/bart-large-cnn][1] | BART | Apache-2.0 | News & blog-style articles | Official Facebook model trained on the [CNN/DailyMail dataset][8]. Produces fluent, coherent summaries with good ROUGE scores. |
| [google/flan-t5-small][2] | T5 | Apache-2.0 | Instruction-following tasks | Google's instruction-tuned T5 model, excellent for summarization and general NLP tasks. |
| [t5-small][3] | T5 | Apache-2.0 | General text | Fast, lightweight model well suited to CPU-only setups. Produces concise summaries in ~3 seconds. |
| [manjunathainti/fine_tuned_t5_summarizer][4] | T5-base | Apache-2.0 | Legal & structured text | Performs well on dense formal language: legal, policy, or technical reports. |
| [Waris01/google-t5-finetuning-text-summarization][5] | T5 | Apache-2.0 | General text | Balanced style and easy to use via the pipeline() API [11]. |
| [griffin/clinical-led-summarizer][6] | Longformer Encoder-Decoder | Apache-2.0 | Long reports | Handles thousands of tokens; originally for clinical notes but adaptable. |
| [RoamifyRedefined/Llama3-summarization][7] | Llama 3 | Apache-2.0 | Cutting-edge experiments | If you like bleeding-edge summarisation, this is worth testing. |

What are BART and T5? These are two popular transformer architectures:

  • BART (Bidirectional and Auto-Regressive Transformers) is designed for text generation tasks like summarization. It reads the entire text first, then generates a summary — like reading an article and then writing a summary from memory.
  • T5 (Text-To-Text Transfer Transformer) treats every task as a text-to-text problem. It’s like a Swiss Army knife: you give it text and instructions, and it transforms the text accordingly. It’s very flexible and can handle many different tasks.

Both BART and T5 use an encoder-decoder architecture: the encoder reads and understands the input text, and the decoder generates the summary. It’s like having a translator who first reads the entire document (encoder) and then writes a summary in a different format (decoder).
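
Here is what that looks like in code: a minimal sketch with t5-small, where the tokenizer converts text to token IDs, the encoder reads them, and model.generate() runs the decoder to write the summary.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

article = ("Transformer models are great, but licences can be tricky. "
           "Apache-licensed summarizers are safer to ship in apps and blogs.")

# T5 expects a task prefix; the encoder reads the tokenized input ...
input_ids = tokenizer("summarize: " + article, return_tensors="pt",
                      truncation=True).input_ids

# ... and generate() runs the decoder token by token to produce the summary
summary_ids = model.generate(input_ids, max_length=40, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))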

How to use them in Python

You don’t need a big setup. The Hugging Face transformers library [11] makes summarisation almost effortless:

What is the Hugging Face transformers library? It’s a Python library that makes it easy to use pre-trained AI models. Instead of building models from scratch (which takes months and requires massive computing power), you can download and use models that have already been trained. The library handles all the complex setup — you just tell it which model you want and it does the rest. It’s like having a library of pre-trained AI assistants ready to use.

Quick Setup (Recommended):

# Clone the complete repository with all tools and examples
git clone https://github.com/edaehn/apache_summarisers.git
cd apache_summarisers
python setup.py  # Automated setup and testing

Manual Setup:

# Install the required dependencies
pip install transformers torch rouge-score requests beautifulsoup4 pyyaml protobuf

# Then use the models
python -c "
from transformers import pipeline

model_name = 'facebook/bart-large-cnn'  # or try google/flan-t5-small for instruction-following
summariser = pipeline('summarization', model=model_name)

text = '''
Transformer models are great, but licences can be tricky.
Let's find Apache-licensed summarizers for safer use in apps and blogs.
'''

summary = summariser(text, max_length=100, min_length=40, do_sample=False)
print(summary[0]['summary_text'])
"

What is a pipeline? A pipeline is a simple interface that wraps a model and makes it easy to use. When you call pipeline("summarization", model=model_name), you’re creating a ready-to-use summarization tool. The pipeline handles all the technical details like tokenization (converting text into numbers the model understands) and decoding (converting the model’s output back into readable text). You just give it text and get a summary back.

Repository Features:

  • ✅ Complete benchmark suite with actual results
  • ✅ Interactive tools for easy testing
  • ✅ Real sample outputs and performance data
  • ✅ Comprehensive test suite (31 tests, 100% pass rate)
  • ✅ Ready-to-run examples and documentation

💡 Tips:

  • Adjust max_length / min_length for your preferred summary size.
  • For speed, use t5-small [3] — roughly 3.4x faster than BART in our benchmark.
  • For instruction-following, use google/flan-t5-small [2] — excellent for complex tasks.
  • For quality, use facebook/bart-large-cnn [1] — better ROUGE scores and fluency.
  • Always review model outputs — summarisation models occasionally invent details.

Choosing the right one

If your writing resembles news or essays, start with facebook/bart-large-cnn [1]. For speed-critical applications, use t5-small [3]. For instruction-following tasks, try google/flan-t5-small [2]. If you want something modern and experimental, try Llama 3 [7].

Each offers an Apache 2.0 licence [12], which means you can safely integrate them into your projects or fine-tune them later.

A few gotchas to keep in mind

  • Input limits: Most encoder-decoder models truncate text at ~1024 tokens. Use chunking or long-context variants [2][6]; a minimal chunking sketch follows this list.

What are tokens? Tokens are the basic units that AI models work with. A token can be a word, part of a word, or even a punctuation mark. For example, “transformers” might be split into two tokens: “transform” and “ers”. Models have limits on how many tokens they can process at once — think of it like a word limit. Most models can handle about 512-1024 tokens, which is roughly 2000-4000 characters of text.

  • Hallucination risk: All summarisation models can misstate facts — keep a human eye on key details.
  • Domain fit: News-trained models [1] may oversimplify technical content.
  • Compute: BART models require more memory; t5-small [3] works fine on a laptop CPU.
  • Pipeline compatibility: Some models require text-generation instead of summarization pipelines.
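
Here is a minimal chunking sketch for the input-limit gotcha above. It is an illustration rather than the benchmark code: it counts tokens with the model's own tokenizer, splits the text into chunks that fit, summarises each chunk, and joins the partial summaries.

from transformers import AutoTokenizer, pipeline

model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
summariser = pipeline("summarization", model=model_name)

# Peek at how the tokenizer splits words into subword tokens
print(tokenizer.tokenize("transformers"))  # subword pieces, e.g. ['▁transformer', 's']

def summarise_long_text(text: str, max_tokens: int = 512) -> str:
    # Tokenize once so chunk boundaries respect the model's real token limit
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = [tokenizer.decode(token_ids[i:i + max_tokens])
              for i in range(0, len(token_ids), max_tokens)]
    # Summarise each chunk, then stitch the partial summaries together
    partial = [summariser(chunk, max_length=80, min_length=20,
                          do_sample=False)[0]["summary_text"]
               for chunk in chunks]
    return " ".join(partial)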

Are These Models Actually Good?

Yes — but context matters. Let’s look at the evidence and caveats.

facebook/bart-large-cnn [1] Fine-tuned on the CNN/DailyMail dataset [8], this BART model produces fluent single-document summaries. In our benchmark, it achieved ROUGE-1 scores of 0.087 and inference times of ~10.6 seconds per article.

What is inference? Inference is the process of using a trained model to generate predictions or outputs. When you give a model text and ask it to summarize, that’s inference — the model is “inferring” or generating a summary based on what it learned during training. Inference time is how long it takes the model to process your input and produce the output. It’s different from training time (which happens once when the model is created) — inference happens every time you use the model.

t5-small [3] Google’s Text-to-Text Transfer Transformer in its smallest form. Fast (3.1s average) and efficient, achieving ROUGE-1 scores of 0.076. Perfect for CPU-only deployments.

google/flan-t5-small [2] Google’s instruction-tuned T5 model, excellent for summarization and general NLP tasks. Handles complex instructions well and produces high-quality summaries.

⚠️ Caveats:

  • News-trained models [1] may lose technical precision in AI or Python topics.
  • Long posts need chunking or a long-context model [2][6].
  • Factual accuracy still requires review — even the best models can hallucinate.
  • ROUGE scores on technical content are lower than on news articles.

For your technical blog, both [1] and [3] are solid starting points. Try them on your posts, measure ROUGE, and read the summaries — numbers don’t tell the whole story.

Benchmarking Apache-Licensed Summarisers

I built a comprehensive benchmark script that fetches my five latest posts [13–17], summarises them with Apache-licensed models, and computes ROUGE-1/2/L scores. The implementation includes automatic content fetching, model loading with fallback strategies, and detailed performance reporting.

Technical Implementation

The benchmark toolkit consists of:

  • config.yaml — Model configurations and article URLs
  • benchmark_summarizers.py — Main Python script with comprehensive error handling
  • interactive_summarizer.py — User-friendly interface for individual URLs
  • demo_summarizer.py — Command-line tool for automation and testing
  • requirements.txt — Dependencies including transformers, torch, rouge-score
  • README.md — Complete setup and usage guide

Key Features

The implementation handles several technical challenges:

Model Compatibility: Different models require different pipeline types. The script automatically detects whether to use summarization or text-generation pipelines, with proper parameter mapping for each.
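
One simple way to do that detection (a sketch of the idea, not necessarily the exact logic in the script) is to check the model's config: encoder-decoder models such as BART and T5 work with the summarization pipeline, while decoder-only models such as Llama 3 need text-generation.

from transformers import AutoConfig, pipeline

def build_summarizer(model_name: str):
    # is_encoder_decoder is True for seq2seq models (BART, T5, LED)
    # and False for causal, decoder-only models (Llama)
    config = AutoConfig.from_pretrained(model_name)
    task = "summarization" if config.is_encoder_decoder else "text-generation"
    return pipeline(task, model=model_name)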

Content Extraction: Uses BeautifulSoup to extract clean text from HTML blog posts, with fallback selectors for different site structures.
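
A stripped-down version of that extraction step might look like this; the selectors are illustrative, and the real script tries several fallbacks depending on site structure.

import requests
from bs4 import BeautifulSoup

def fetch_article_text(url: str) -> str:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Prefer the main article body; fall back to the whole page if needed
    node = soup.select_one("article") or soup.select_one("main") or soup.body
    return node.get_text(separator=" ", strip=True)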

Error Handling: Robust error handling for model loading failures, network issues, and inference errors.

Performance Metrics: Tracks inference time, success rates, and ROUGE scores for comprehensive evaluation.

Understanding ROUGE Scores: ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a way to measure how good a summary is by comparing it to a reference summary (usually written by a human). Think of it like grading a student’s summary against a teacher’s example:

  • ROUGE-1 measures how many individual words match between the generated summary and the reference. Higher scores mean more word overlap.
  • ROUGE-2 measures how many pairs of words (bigrams) match. This captures whether the model gets word combinations right, not just individual words.
  • ROUGE-L measures the longest sequence of words that appears in both summaries. This captures how well the summary maintains the structure and flow of the original.

Scores range from 0.0 to 1.0, where 1.0 would be a perfect match. In practice, scores of 0.3-0.5 are considered good, and 0.7+ is excellent. The scores in our benchmarks (0.07-0.09) reflect the challenge of summarizing technical blog content with models trained on news articles.
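
You can compute these scores yourself with the rouge-score package the benchmark uses; a minimal example:

from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "California told AI to be honest."
generated = "California requires AI systems to be honest."

# score() returns precision/recall/F1 for each ROUGE variant
scores = scorer.score(reference, generated)
for name, s in scores.items():
    print(f"{name}: P={s.precision:.3f} R={s.recall:.3f} F1={s.fmeasure:.3f}")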

Actual Performance Results

Here are the real results from testing on my blog posts:

| Model | Success Rate | Avg ROUGE-1 | Avg ROUGE-2 | Avg ROUGE-L | Avg Inference Time |
|---|---|---|---|---|---|
| facebook/bart-large-cnn | 5/5 (100%) | 0.087 | 0.081 | 0.086 | 10.6s |
| google/flan-t5-small | 3/5 (60%) | 0.082 | 0.077 | 0.080 | 2.5s |
| t5-small | 5/5 (100%) | 0.076 | 0.072 | 0.074 | 3.1s |

*FLAN-T5-Small tested separately; shows excellent instruction-following capabilities with 60% success rate on available articles.

Sample Summaries

BART-Large-CNN on “AI Honesty, Agents, and the Fight for Truth”:

“California told AI to be honest. Microsoft turned our computers into companions. European publishers stood up for truth itself. None of these stories is flashy on its own, but together they sketch the outline of how we’ll live with AI — and how AI will live with us.”

BART-Large-CNN on “Gemini CLI vs Claude CLI”:

“Gemini is Google’s open-source agent that hooks directly into the Gemini models. Claude Sonnet 4 outperforms Gemini 2.5 Pro in Python bug-fixing accuracy. Bash Only isolates the language model without external tools or complex scaffolds. Gemini offers strengths in speed, huge context windows, and Google Cloud integration.”

FLAN-T5-Small on “AI Honesty, Agents, and the Fight for Truth”:

“Some weeks, the news feels quiet. Other weeks, it hums quietly — as if something subtle but irreversible has shifted. This was one of those weeks. California told AI to be honest. Microsoft turned our computers into companions. And European publishers stood up for truth itself.”

Code Example

Here’s the core summarization logic from the working implementation, including robust error handling and text preprocessing:

# Excerpt from benchmark_summarizers.py: these are methods of the benchmark
# class, so self.benchmark_config and self.truncate_text are defined there.
import logging
from typing import Optional

logger = logging.getLogger(__name__)

def summarize_text(self, summarizer, text: str) -> Optional[str]:
    """Summarize text using the provided model."""
    try:
        # Clean and truncate text if necessary
        truncated_text = self.truncate_text(text, self.benchmark_config['max_input_length'])
        
        # Additional safety check for very short text
        if len(truncated_text.strip()) < 50:
            logger.warning("Text too short for meaningful summarization")
            return "Text too short for meaningful summarization."
        
        # Check if this is a summarization or text generation pipeline
        if summarizer.task == "summarization":
            # Standard summarization pipeline
            try:
                summary = summarizer(
                    truncated_text,
                    max_length=self.benchmark_config['max_length'],
                    min_length=self.benchmark_config['min_length'],
                    do_sample=self.benchmark_config['do_sample'],
                    temperature=self.benchmark_config['temperature'],
                    top_p=self.benchmark_config['top_p']
                )
                
                # Safety check for empty results
                if not summary or len(summary) == 0:
                    logger.error("Empty summary result")
                    return None
                
                return summary[0]['summary_text']
                
            except Exception as e:
                logger.error(f"Summarization pipeline error: {str(e)}")
                # Try with more conservative parameters
                try:
                    summary = summarizer(
                        truncated_text,
                        max_length=min(self.benchmark_config['max_length'], 100),
                        min_length=min(self.benchmark_config['min_length'], 30),
                        do_sample=False
                    )
                    if summary and len(summary) > 0:
                        return summary[0]['summary_text']
                except Exception as e2:
                    logger.error(f"Fallback summarization also failed: {str(e2)}")
                    return None
        
        elif summarizer.task == "text-generation":
            # Text generation pipeline (for causal models)
            prompt = f"Summarize the following text:\n\n{truncated_text}\n\nSummary:"
            
            try:
                summary = summarizer(
                    prompt,
                    max_new_tokens=self.benchmark_config['max_length'],
                    do_sample=self.benchmark_config['do_sample'],
                    temperature=self.benchmark_config['temperature'],
                    top_p=self.benchmark_config['top_p'],
                    pad_token_id=summarizer.tokenizer.eos_token_id
                )
                
                # Extract the generated text (remove the prompt)
                generated_text = summary[0]['generated_text']
                if "Summary:" in generated_text:
                    return generated_text.split("Summary:")[-1].strip()
                else:
                    return generated_text[len(prompt):].strip()
                    
            except Exception as e:
                logger.error(f"Text generation pipeline error: {str(e)}")
                return None
        
        else:
            logger.error(f"Unknown pipeline task: {summarizer.task}")
            return None
        
    except Exception as e:
        logger.error(f"Error during summarization: {str(e)}")
        return None

def clean_text(self, text: str) -> str:
    """Clean and normalize text for better processing."""
    # Remove excessive whitespace
    text = ' '.join(text.split())
    
    # Remove common HTML artifacts that might cause issues
    text = text.replace('\n', ' ').replace('\r', ' ').replace('\t', ' ')
    
    # Collapse any double spaces left over from the cleanup above
    while '  ' in text:
        text = text.replace('  ', ' ')
    
    # Ensure text is not empty
    if not text.strip():
        return "No content available for summarization."
    
    return text.strip()

Performance Insights

BART-Large-CNN performs better on ROUGE scores but is slower (10.6s average). It produces more fluent, coherent summaries that better capture the original content’s meaning.

FLAN-T5-Small offers excellent instruction-following capabilities with balanced performance (2.5s average). It’s particularly good at handling complex summarization tasks and following specific instructions.
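
To try that instruction-following behaviour yourself, you can drive FLAN-T5 through the text2text-generation pipeline and put the instruction directly in the prompt (the prompt wording here is just an example):

from transformers import pipeline

flan = pipeline("text2text-generation", model="google/flan-t5-small")
prompt = ("Summarize the following text in one sentence for a technical blog:\n"
          "California told AI to be honest. Microsoft turned our computers "
          "into companions. European publishers stood up for truth itself.")
print(flan(prompt, max_new_tokens=60)[0]["generated_text"])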

T5-Small is significantly faster (3.1s average) but produces more concise, bullet-point style summaries. It’s ideal for quick summarization tasks where speed matters more than fluency.

Error Handling: The implementation includes robust error handling with fallback mechanisms. When primary summarization fails, the system automatically retries with conservative parameters, ensuring high success rates across diverse web content.

ROUGE Scores: The relatively low ROUGE scores (0.07-0.09) reflect the challenge of summarizing technical blog content with models trained on news articles. However, the summaries are still informative and capture key points effectively.

Model Comparison Summary

Based on actual testing with my blog posts, here's how the models perform:

| Model | Speed | Quality | ROUGE-1 | Inference Time | Best Use Case |
|---|---|---|---|---|---|
| facebook/bart-large-cnn | Slowest | Highest | 0.087 | ~10.6s | News articles, blog posts |
| google/flan-t5-small | Medium | High | 0.082 | ~2.5s | Complex instructions, general tasks |
| t5-small | Fastest | Good | 0.076 | ~3.1s | Quick summaries, CPU-only setups |

*FLAN-T5-Small tested separately; shows excellent instruction-following capabilities with competitive ROUGE scores.

Key Insights:

  • BART-Large-CNN is ~3.4x slower than T5-Small but produces the most fluent summaries
  • FLAN-T5-Small offers the best balance of speed and instruction-following ability
  • T5-Small is the fastest model with a 100% success rate, well suited to real-time applications

Testing the Models Yourself

Quick Test (no repository required):

from transformers import pipeline

# Test all three main models
models_to_test = [
    "facebook/bart-large-cnn",
    "google/flan-t5-small", 
    "t5-small"
]

test_text = """
California told AI to be honest. Microsoft turned our computers into companions. 
European publishers stood up for truth itself. None of these stories is flashy 
on its own, but together they sketch the outline of how we'll live with AI.
"""

for model_name in models_to_test:
    print(f"\n🤖 Testing {model_name}:")
    try:
        summarizer = pipeline("summarization", model=model_name)
        summary = summarizer(test_text, max_length=100, min_length=30, do_sample=False)
        print(f"Summary: {summary[0]['summary_text']}")
    except Exception as e:
        print(f"Error: {e}")

Performance Comparison:

import time
from transformers import pipeline

def benchmark_model(model_name, text):
    summarizer = pipeline("summarization", model=model_name)
    start_time = time.time()
    summary = summarizer(text, max_length=100, min_length=30, do_sample=False)
    end_time = time.time()
    return summary[0]['summary_text'], end_time - start_time

# Test performance
text = "Your test text here..."
for model in ["facebook/bart-large-cnn", "t5-small"]:
    summary, time_taken = benchmark_model(model, text)
    print(f"{model}: {time_taken:.2f}s - {summary[:50]}...")

Complete Repository Available

For readers who want to reproduce all results and test the models themselves, I’ve created a complete repository with:

🔗 GitHub Repository: apache_summarisers

Repository Contents:

  • Complete benchmark suite with all the code from this blog post
  • Interactive tools for easy model testing
  • Real performance results from actual testing
  • Sample outputs generated by each model
  • Comprehensive test suite (31 tests, 100% pass rate)
  • Setup automation for quick installation

Quick Start:

git clone https://github.com/edaehn/apache_summarisers
cd apache_summarisers
python setup.py  # Automated setup and testing

What You Get:

  • ✅ All code examples from this blog post
  • ✅ Actual benchmark results and performance data
  • ✅ Interactive summarizer for any URL
  • ✅ Demo scripts for automation
  • ✅ Complete test suite for validation
  • ✅ Real sample summaries from each model

Running the Full Benchmark

If you have access to the complete codebase:

# Install dependencies
pip install -r requirements.txt

# Run the benchmark
python benchmark_summarizers.py

The script automatically:

  • Downloads and caches models (first run takes 5-10 minutes)
  • Fetches content from configured blog posts
  • Generates summaries with each model
  • Calculates ROUGE scores
  • Saves detailed results to JSON and summary reports

Interactive Tools

For easier use with individual URLs, the toolkit includes:

Interactive Summarizer:

python interactive_summarizer.py
  • User-friendly interface for any URL
  • Choose model and summary length
  • Multiple summarization sessions

Demo Script:

python demo_summarizer.py [URL]
  • Command-line interface for automation
  • Perfect for testing and scripts

You can extend it with chunking for longer documents, Markdown reports, or domain-specific vocabulary tracking.

Troubleshooting Common Issues

Model Loading Errors:

# If you get CUDA errors, force CPU usage with device=-1;
# device=0 picks the first GPU when one is available:
import torch
from transformers import pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn",
                      device=0 if torch.cuda.is_available() else -1)

Memory Issues:

# For limited memory, use smaller models:
models_for_low_memory = ["t5-small", "google/flan-t5-small"]

Text Length Issues:

# For long texts, truncate before summarization.
# Note: this cuts by characters, not tokens; a rough but simple safeguard.
def truncate_text(text, max_length=512):
    return text[:max_length] if len(text) > max_length else text

long_text = "Your very long text here..."
truncated = truncate_text(long_text)
summary = summarizer(truncated, max_length=100, min_length=30)

Dependency Issues:

# If you get import errors, install missing packages:
pip install transformers torch rouge-score requests beautifulsoup4 pyyaml protobuf

# On Apple Silicon Macs the default `pip install torch` wheels work; for
# CPU-only Linux/Windows installs you can use the CPU wheel index:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Conclusion

You don’t have to choose between good quality and clean licensing. With Apache 2.0-licensed models [12], you can summarise, remix, and build responsibly — without legal tangles.

If you’re creating a blog assistant, a summarising service, or simply exploring NLP in your weekend projects, these open models are a safe and capable foundation.

Ready to get started? Download the complete repository with all tools, examples, and benchmark results: https://github.com/edaehn/apache_summarisers

And who knows — maybe your next summariser will be the one others cite in their model cards.

Did you like this post? Please let me know if you have any comments or suggestions.


References

Hugging Face Models

  1. facebook/bart-large-cnn – Hugging Face
  2. google/flan-t5-small – Hugging Face
  3. t5-small – Hugging Face
  4. manjunathainti/fine_tuned_t5_summarizer – Hugging Face
  5. Waris01/google-t5-finetuning-text-summarization – Hugging Face
  6. griffin/clinical-led-summarizer – Hugging Face
  7. RoamifyRedefined/Llama3-summarization – Hugging Face
  8. ccdv/cnn_dailymail – Dataset on Hugging Face

Research & Documentation

  9. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation – ACL 2020
  10. FLAN-T5: Scaling Instruction-Finetuned Language Models – arXiv 2022
  11. Transformers Library Documentation – Hugging Face
  12. Apache License 2.0 – Open Source Initiative

Elena's Benchmark Articles

  13. LoRA fine-tuning wins – Daehnhardt.com
  14. Should you use rebase? – Daehnhardt.com
  15. AI Honesty, Agents, and the Fight for Truth – Daehnhardt.com
  16. Safety, Agents, and Compute – Daehnhardt.com
  17. Cursor Made Me Do It – Daehnhardt.com

General

  18. Hugging Face – Official Site

About Elena

Elena, a PhD in Computer Science, simplifies AI concepts and helps you use machine learning.




Citation
Elena Daehnhardt. (2025) 'Apache-Licensed Summarizers', daehnhardt.com, 14 November 2025. Available at: https://daehnhardt.com/blog/2025/11/14/apache-licensed-summarizers/