
Refactoring: rereading the canon

A re-reading of Fowler, Martin, and the Gang of Four — what each book was reacting to, what holds up after twenty years, and where the cargo cult has drifted from the source.

May 18, 2026 · 10 min read

There is no shortage of blog posts about refactoring. Most are listicles — ten code smells, five SOLID letters, three rules from "Clean Code." They are not wrong, exactly. But they are also not what the people who wrote those books were arguing for.

This post is a re-reading. The goal is not "here is what to do." The goal is to look at where each of these ideas came from, what their authors were reacting against, where they hold up after twenty years, and where the cargo cult version has drifted from the source.

A short genealogy

Refactoring as a named practice starts with William Opdyke's 1992 PhD dissertation at the University of Illinois. Opdyke was working on tooling for C++ and codified behavior-preserving transformations on object-oriented programs. Kent Beck and Ward Cunningham had been doing the same kind of thing in Smalltalk for years; the term "refactoring" comes from that scene.

Martin Fowler's "Refactoring" (1999, 2nd ed 2018) put a name and a catalog on the practice for a mainstream audience. The book is structured as a reference: a list of named transformations, each with a motivation, a mechanics section, and an example.

Robert C. Martin's "Clean Code" (2008) is something different. Where Fowler is a catalog of transformations, Martin is a prescription of style. The distinction matters: one is a tool, the other is an aesthetic.

The Gang of Four's "Design Patterns" (1994) predates both, but it sits adjacent. It catalogs structures that recur in object-oriented programs — typically structures that exist to work around limitations in the language being used.

These three books, with Opdyke's dissertation behind them, form the canon. Most contemporary refactoring discourse is downstream of them, often without acknowledgment.

SOLID, letter by letter

Robert Martin assembled the five principles in the early 2000s; the acronym itself is usually credited to Michael Feathers. The ideas did not all originate with Martin (LSP comes from Barbara Liskov, OCP from Bertrand Meyer), but the synthesis is his.

S — Single Responsibility

The original phrasing was: a class should have one reason to change. That is a precise claim: requests to change the class should all come from one place in the business. In a later refinement ("Clean Architecture," 2017), Martin reframed it in terms of a single actor: one stakeholder, one role, one source of change requests.

The cargo cult version is "a class should do one thing." This is too vague to be useful. One thing at what level? A function that parses a CSV does one thing; a class that orchestrates an entire HTTP request handler also does one thing.

In practice the principle is most useful as a check against accidental fan-in. If a single class has methods called by three different teams for three different reasons, that is the smell — not the line count.

O — Open/Closed

Meyer's original (in "Object-Oriented Software Construction," 1988) was about inheritance: you should be able to extend a class without modifying it. Martin reframed it via polymorphism — depend on abstractions, swap implementations.

This is where the cargo cult most actively harms working programmers. Over-applying OCP leads to speculative generality: abstract base classes and strategy hierarchies for variation that never materializes. The cost of a wrong abstraction is, as Sandi Metz puts it, far higher than the cost of duplication.

A modern reading: write the concrete code first. When the second use case arrives, refactor then. OCP is a property to aim for once the axes of variation are visible — not a prophylactic to apply up front.

L — Liskov Substitution

LSP is the most rigorous of the five. Liskov introduced the idea in her 1987 keynote, and her 1994 paper with Jeannette Wing defines behavioral subtyping in formal terms: a subtype must not strengthen the supertype's preconditions, must not weaken its postconditions, and must preserve its invariants.

In day-to-day code this collapses to: if your subclass overrides a method in a way that breaks the caller's assumptions, your inheritance is wrong. The classic example is Square extends Rectangle — setting width on a Square has side effects a Rectangle caller does not expect.

This principle survives without nuance. If you obey it, your inheritance hierarchies stay sane.
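The Square and Rectangle case is easy to reproduce. The sketch below uses hypothetical classes (none of these names come from a library) and a caller, widenTo, that relies on the Rectangle contract: setting the width leaves the height alone.

```typescript
// Hypothetical classes for illustration only.
class Rectangle {
  protected width: number;
  protected height: number;

  constructor(width: number, height: number) {
    this.width = width;
    this.height = height;
  }

  setWidth(width: number): void {
    this.width = width;
  }

  setHeight(height: number): void {
    this.height = height;
  }

  area(): number {
    return this.width * this.height;
  }
}

class Square extends Rectangle {
  constructor(side: number) {
    super(side, side);
  }

  // To stay square, each setter also changes the other dimension.
  // That side effect is something the Rectangle contract never promised.
  setWidth(width: number): void {
    this.width = width;
    this.height = width;
  }

  setHeight(height: number): void {
    this.width = height;
    this.height = height;
  }
}

// A caller written against Rectangle. It assumes setWidth leaves height alone.
function widenTo(rect: Rectangle, newWidth: number, knownHeight: number): boolean {
  rect.setHeight(knownHeight);
  rect.setWidth(newWidth);
  return rect.area() === newWidth * knownHeight;
}

const holdsForRectangle = widenTo(new Rectangle(1, 1), 4, 5); // true: area is 20
const holdsForSquare = widenTo(new Square(1), 4, 5); // false: area is 16
```

The type checker accepts Square wherever a Rectangle is expected; the violation is behavioral, which is why it only shows up when the caller's assumption is exercised.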

I — Interface Segregation

Martin's ISP was a reaction to fat C++ interfaces. Do not force a client to depend on methods it does not use.

In modern statically typed languages with structural typing (Go, TypeScript) or rich generics, ISP often emerges naturally: interfaces tend to be small because they describe the minimum a consumer needs. In languages that lean heavily on nominal inheritance hierarchies, the principle still bites.
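A minimal TypeScript sketch of how this falls out of structural typing. InMemoryUserStore, NameLookup, and greet are made-up names; the point is only that the consumer declares the one method it needs and the larger class satisfies it without being told to.

```typescript
// A store with several capabilities (hypothetical, for illustration).
class InMemoryUserStore {
  private names = new Map<string, string>();

  save(id: string, displayName: string): void {
    this.names.set(id, displayName);
  }

  findName(id: string): string | undefined {
    return this.names.get(id);
  }

  remove(id: string): void {
    this.names.delete(id);
  }

  count(): number {
    return this.names.size;
  }
}

// The consumer declares only what it uses. InMemoryUserStore satisfies this
// structurally; it never has to name NameLookup in an implements clause.
interface NameLookup {
  findName(id: string): string | undefined;
}

function greet(lookup: NameLookup, id: string): string {
  const displayName = lookup.findName(id);
  return displayName === undefined ? "Hello, stranger" : `Hello, ${displayName}`;
}

const store = new InMemoryUserStore();
store.save("u1", "Ada");

const fromStore = greet(store, "u1");
// A stand-in only has to provide the single method the consumer depends on.
const fromStub = greet({ findName: () => "Grace" }, "anything");
```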

D — Dependency Inversion

DIP says high-level modules should not depend on low-level modules; both should depend on abstractions. This is not the same as dependency injection. DI is one common implementation; the principle is broader.

The most common misreading is "introduce an interface for everything you might want to mock." That is testability, not DIP. The real test of DIP is whether the high-level policy module can be compiled, and reasoned about, without knowing about the low-level mechanism.
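One way to picture the direction of the dependency, as a sketch with invented names (Clock, OverduePolicy, SystemClock, FixedClock). The policy owns the small abstraction it needs; the mechanisms are written against it, not the other way around.

```typescript
// The abstraction belongs to the high-level policy (hypothetical names).
interface Clock {
  now(): Date;
}

// High-level policy: decides when an invoice counts as overdue.
// It knows nothing about system time, databases, or HTTP.
class OverduePolicy {
  private readonly clock: Clock;
  private readonly graceDays: number;

  constructor(clock: Clock, graceDays: number) {
    this.clock = clock;
    this.graceDays = graceDays;
  }

  isOverdue(dueDate: Date): boolean {
    const msPerDay = 24 * 60 * 60 * 1000;
    const deadline = dueDate.getTime() + this.graceDays * msPerDay;
    return this.clock.now().getTime() > deadline;
  }
}

// Low-level mechanism: reads the real system time.
class SystemClock implements Clock {
  now(): Date {
    return new Date();
  }
}

// Another mechanism: a fixed instant. The policy is unchanged.
class FixedClock implements Clock {
  private readonly instant: Date;

  constructor(instant: Date) {
    this.instant = instant;
  }

  now(): Date {
    return this.instant;
  }
}

const policy = new OverduePolicy(new FixedClock(new Date("2024-03-10T00:00:00Z")), 3);
const lateInvoice = policy.isOverdue(new Date("2024-03-01T00:00:00Z")); // true
const recentInvoice = policy.isOverdue(new Date("2024-03-08T00:00:00Z")); // false
```

Passing the clock through the constructor happens to be dependency injection, but the DIP property is the shape of the imports: OverduePolicy would still compile if SystemClock were deleted.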

"Clean Code": what survives, what does not

"Clean Code" is the book most working programmers have read. It is also the book most working programmers, after ten years, mostly disagree with.

The enduring parts:

Names matter, a lot. Variables, functions, classes. The single highest-leverage thing you can do to a codebase is rename things to mean what they actually do.
Functions should do one thing at one level of abstraction. This is a real and useful constraint, even when the "one thing" is fuzzy.
Tests are first-class code. They get the same care, naming, and structure as production code.

The contested parts:

"Functions should be four lines." This is the part Martin took the most heat for, deservedly. It produces codebases of tiny single-call functions that destroy locality of reasoning — you have to chase across ten files to understand a single behavior. Casey Muratori's "Clean Code, Horrible Performance" talk is the standard counterargument from the performance side; DHH's critiques are from the readability side.
"Comments are failures." Sometimes. But there is no rename that explains why a piece of code does what it does. Comments that document intent are not failures.
The rigid OOP framing. Much of the book assumes a Java-shaped world where everything is a class. The advice does not transfer cleanly to functional, dynamic, or systems-level code.

Read "Clean Code" the way you would read a 2008 book on web development: take the durable principles, discard the era-specific orthodoxy.

Design patterns: which ones still earn their keep

The GoF book documents 23 patterns. About a third of them still come up in modern code. The rest were workarounds for what specific languages could not do at the time.

Patterns that survive:

Strategy. Still common, though in languages with first-class functions it collapses to "pass a function." In TypeScript, Strategy is often just a function-typed parameter.
Observer. Alive and well as event emitters, pub/sub, RxJS, React state subscriptions. The pattern outlasted the OOP framing.
Decorator. Middleware in HTTP frameworks (Express, Koa), higher-order components, function wrappers. Still useful when behavior composes.
Adapter. Universal. Any time two systems with mismatched interfaces need to talk, you write an adapter. The pattern does not need a class to be a pattern.
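As a rough illustration of Strategy collapsing to "pass a function," here is a sketch with hypothetical names (PricingStrategy, checkoutTotal) and amounts in integer cents. No interface, no class hierarchy, just a function-typed parameter.

```typescript
// The whole "strategy interface" is a function type (hypothetical names).
type PricingStrategy = (subtotalCents: number) => number;

const fullPrice: PricingStrategy = (subtotalCents) => subtotalCents;

const tenPercentOff: PricingStrategy = (subtotalCents) =>
  subtotalCents - Math.floor(subtotalCents / 10);

// The context takes the strategy as an ordinary argument.
function checkoutTotal(subtotalCents: number, applyPricing: PricingStrategy): number {
  return applyPricing(subtotalCents);
}

const regularTotal = checkoutTotal(20000, fullPrice); // 20000
const saleTotal = checkoutTotal(20000, tenPercentOff); // 18000
// A one-off strategy can be written inline at the call site.
const flatFeeTotal = checkoutTotal(20000, (cents) => cents + 500); // 20500
```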

Patterns that have aged poorly:

Singleton. Almost always an anti-pattern in modern code. Hides dependencies, breaks testability, replaces explicit scope with implicit global state.
Abstract Factory. Overkill in most languages with first-class types and DI containers. The naked Factory pattern survives in narrow forms; AbstractFactoryFactoryBean does not.
Template Method. Usually loses to Strategy plus composition. Composition over inheritance, two decades later, still holds.

The pattern catalog is a reading list, not a checklist. Knowing the names helps you communicate with other engineers — "this is basically a Visitor." It does not mean every problem must be solved by reaching for one.

Code smells: heuristics, not rules

Kent Beck coined "code smell" as an explicit hedge. A smell is a hint that something might be wrong, not a finding. Fowler's catalog lists the famous ones: long method, large class, duplicate code, feature envy, primitive obsession, shotgun surgery.

The cargo cult applies them like compiler errors. Method over 20 lines? Refactor. Class over 200 lines? Split. This misses the point: a smell is a prompt to look, not a verdict.

A useful reframe: each smell is a hypothesis about cost.

Long method hypothesizes that maintenance will be cheaper if the method is split. Sometimes true. Sometimes the linear, top-to-bottom long method is easier to read than five short functions you have to jump between.
Duplicate code hypothesizes that extracting the duplication will reduce future change cost. Metz's counterpoint: a wrong abstraction is more expensive than duplication. Wait for the third occurrence before extracting.
Large class hypothesizes that the class has multiple responsibilities. Sometimes true. Sometimes the class is the right size because the domain concept it represents is that large.
Primitive obsession — passing strings and ints where a domain type would be clearer — is often a real smell. The fix (value objects, branded types, newtypes) is one of the highest-leverage refactorings in a typed codebase.
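For the primitive obsession fix, one common TypeScript idiom is the branded type. The sketch below is an assumption about how you might set it up; UserId, OrderId, and the toUserId/toOrderId helpers are invented for the example.

```typescript
// Branded types: plain strings at runtime, distinct types at compile time.
// The __brand property exists only in the type system (hypothetical names).
type UserId = string & { readonly __brand: "UserId" };
type OrderId = string & { readonly __brand: "OrderId" };

// The only way to obtain a branded value is through a validating function.
function toUserId(raw: string): UserId {
  if (raw.length === 0) throw new Error("UserId must not be empty");
  return raw as UserId;
}

function toOrderId(raw: string): OrderId {
  if (raw.length === 0) throw new Error("OrderId must not be empty");
  return raw as OrderId;
}

function describeOrder(userId: UserId, orderId: OrderId): string {
  return `order ${orderId} for user ${userId}`;
}

const userId = toUserId("u-42");
const orderId = toOrderId("o-7");
const description = describeOrder(userId, orderId);

// describeOrder(orderId, userId) is rejected by the type checker,
// even though both values are ordinary strings at runtime.
```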

Smells are useful when they prompt the question "is this actually costing us?" — and dangerous when they answer it preemptively.

The refactoring techniques themselves

Fowler's catalog has dozens of named transformations. A handful do most of the work.

Extract Method

The most universally beneficial refactoring. Pulling a coherent block of code out into a named function gives you two things at once: a name (documentation), and a unit of reuse.

The caveat: the function you extract should be independently nameable. If the best name you can come up with is step2 or doIt, the block is not coherent — you are slicing arbitrarily. In that case the long inline code is more honest.
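A small before-and-after, as a sketch. The cart domain, the 5% member discount, and every identifier here are invented for the example; what matters is that each extracted block has a name that stands on its own.

```typescript
// Hypothetical domain: amounts are integer cents.
interface CartItem {
  unitPriceCents: number;
  quantity: number;
}

const MEMBER_DISCOUNT_THRESHOLD_CENTS = 10000;
const MEMBER_DISCOUNT_PERCENT = 5;

// Before: summing and discounting are interleaved in one body.
function cartTotalBefore(items: CartItem[], isMember: boolean): number {
  let subtotal = 0;
  for (const item of items) {
    subtotal += item.unitPriceCents * item.quantity;
  }
  let discount = 0;
  if (isMember && subtotal > MEMBER_DISCOUNT_THRESHOLD_CENTS) {
    discount = Math.floor((subtotal * MEMBER_DISCOUNT_PERCENT) / 100);
  }
  return subtotal - discount;
}

// After: each extracted function is independently nameable.
function subtotalCents(items: CartItem[]): number {
  return items.reduce((sum, item) => sum + item.unitPriceCents * item.quantity, 0);
}

function memberDiscountCents(subtotal: number, isMember: boolean): number {
  if (!isMember || subtotal <= MEMBER_DISCOUNT_THRESHOLD_CENTS) return 0;
  return Math.floor((subtotal * MEMBER_DISCOUNT_PERCENT) / 100);
}

function cartTotal(items: CartItem[], isMember: boolean): number {
  const subtotal = subtotalCents(items);
  return subtotal - memberDiscountCents(subtotal, isMember);
}

const sampleCart: CartItem[] = [
  { unitPriceCents: 3000, quantity: 2 },
  { unitPriceCents: 5000, quantity: 1 },
];
const memberTotal = cartTotal(sampleCart, true); // 11000 - 550 = 10450
const guestTotal = cartTotal(sampleCart, false); // 11000
```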

Extract Class

Extract Class is Extract Method's bigger sibling, and a riskier operation. You are not just naming a block; you are committing to a new boundary in the system. Get this wrong and you create a class that exists to satisfy a refactoring rule, not to represent a domain concept.

A heuristic: if you can describe the new class's purpose in one sentence using domain language ("InvoicePricer computes line totals for an invoice given a tax policy"), extract. If you find yourself saying "it is a helper that does some of the stuff in the original class," don't.
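To make the one-sentence test concrete, here is one possible shape for the InvoicePricer from the sentence above. Only the class name and its purpose come from the text; InvoiceLine, TaxPolicy, the Invoice class, and the 20% policy are assumptions made for the sketch.

```typescript
// Hypothetical supporting types; amounts are integer cents.
interface InvoiceLine {
  description: string;
  unitPriceCents: number;
  quantity: number;
}

// A tax policy maps a net amount to the tax owed on it.
type TaxPolicy = (netCents: number) => number;

// Extracted class: computes line totals for an invoice given a tax policy.
class InvoicePricer {
  private readonly taxPolicy: TaxPolicy;

  constructor(taxPolicy: TaxPolicy) {
    this.taxPolicy = taxPolicy;
  }

  lineTotalCents(line: InvoiceLine): number {
    const net = line.unitPriceCents * line.quantity;
    return net + this.taxPolicy(net);
  }
}

// The original class keeps the lines and delegates pricing.
class Invoice {
  private readonly lines: InvoiceLine[];
  private readonly pricer: InvoicePricer;

  constructor(lines: InvoiceLine[], pricer: InvoicePricer) {
    this.lines = lines;
    this.pricer = pricer;
  }

  totalCents(): number {
    return this.lines.reduce((sum, line) => sum + this.pricer.lineTotalCents(line), 0);
  }
}

// An illustrative flat 20% policy, not a real tax rule.
const flatTwentyPercent: TaxPolicy = (netCents) => Math.round((netCents * 20) / 100);

const invoice = new Invoice(
  [
    { description: "widget", unitPriceCents: 1000, quantity: 2 },
    { description: "gadget", unitPriceCents: 500, quantity: 1 },
  ],
  new InvoicePricer(flatTwentyPercent),
);
const invoiceTotal = invoice.totalCents(); // 2400 + 600 = 3000
```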

Rename

The most undervalued refactoring. data, info, obj, temp, result — these names give you nothing. A rename is a 30-second change with permanent positive return.

Two rules of thumb:

Variables should be named for what they mean in the domain, not for their type. userId over id. pendingInvoices over list.
Booleans should read as a question. isAdmin, hasFlag, shouldRetry. The line if (admin) and the line if (isAdmin) read differently in a way that compounds across a codebase.

Remove dead code

Dead code is more expensive than it looks. It costs reading time for every engineer who encounters it; it costs cognitive load to ask "is this still used?"; it confuses static analysis, search, and refactoring tools.

If your VCS will remember it (and it will), delete it. The "I will leave it commented out in case we need it" instinct is residue from a pre-Git era.

Simplify conditional expressions

Two specific moves carry most of the value:

Guard clauses. Replace nested if/else chains with early returns. if (!user) return null; collapses three levels of nesting at the top of a function.
Decompose conditional. if (isWeekend(date) && isBusinessHours(date)) reads better than if ((date.getDay() === 0 || date.getDay() === 6) && date.getHours() >= 9 && date.getHours() < 17).
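Both moves, sketched in TypeScript. isWeekend and isBusinessHours are written to match the inline condition quoted above; the Account type and the contactEmail functions are hypothetical.

```typescript
// Decompose conditional: each predicate names one half of the condition.
function isWeekend(date: Date): boolean {
  const day = date.getDay(); // 0 is Sunday, 6 is Saturday (local time)
  return day === 0 || day === 6;
}

function isBusinessHours(date: Date): boolean {
  const hour = date.getHours(); // local time, 0 to 23
  return hour >= 9 && hour < 17;
}

// Hypothetical type for the guard-clause example.
interface Account {
  active: boolean;
  email: string | null;
}

// Before: the happy path is buried three levels deep.
function contactEmailNested(account: Account | null): string | null {
  if (account) {
    if (account.active) {
      if (account.email !== null) {
        return account.email.toLowerCase();
      } else {
        return null;
      }
    } else {
      return null;
    }
  } else {
    return null;
  }
}

// After: guard clauses dispose of the exceptional cases first.
function contactEmail(account: Account | null): string | null {
  if (!account) return null;
  if (!account.active) return null;
  if (account.email === null) return null;
  return account.email.toLowerCase();
}

const normalized = contactEmail({ active: true, email: "Ada@Example.com" }); // "ada@example.com"
```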

The deeper move is to replace runtime conditionals with types: sum types, discriminated unions, and switches that the type checker verifies are exhaustive. That is not a refactoring; it is a redesign. But it is frequently the right answer when you find yourself simplifying the same conditional repeatedly.
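In TypeScript that redesign might look like the sketch below. PaymentState and describePayment are hypothetical; the never assignment in the default branch is a common idiom for having the compiler flag an unhandled variant.

```typescript
// A discriminated union: the kind field tells the checker which shape it has.
type PaymentState =
  | { kind: "pending" }
  | { kind: "settled"; settledAt: Date }
  | { kind: "failed"; reason: string };

function describePayment(state: PaymentState): string {
  switch (state.kind) {
    case "pending":
      return "Waiting for the processor";
    case "settled":
      // settledAt is only accessible because kind has been narrowed.
      return `Settled on ${state.settledAt.toISOString().slice(0, 10)}`;
    case "failed":
      return `Failed: ${state.reason}`;
    default: {
      // If a variant is added to PaymentState and not handled above,
      // this assignment stops type-checking.
      const unreachable: never = state;
      return unreachable;
    }
  }
}

const settledMessage = describePayment({
  kind: "settled",
  settledAt: new Date("2024-05-01T12:00:00Z"),
}); // "Settled on 2024-05-01"
```

States that used to be encoded as combinations of booleans and nullable fields become unrepresentable, which is why the repeated conditional goes away instead of being tidied.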

Replace magic numbers with constants

The cargo cult version: every literal gets a named constant. That gives you const ZERO = 0; and similar absurdities.

The useful form: replace numbers that encode domain meaning with constants. MAX_RETRIES = 5 over a bare 5 in retry logic. SECONDS_PER_DAY = 86400 over the literal. Tax rates, port numbers, magic offset bytes — all benefit from a name.

Numbers that are mathematically meaningful in context (the 2 in x * 2, the 0 in a loop initializer) gain nothing from a name and lose readability.
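A short sketch of the distinction. MAX_RETRIES and SECONDS_PER_DAY are the constants named above; attemptWithRetries, daysToSeconds, and double are invented for the example.

```typescript
// Domain meaning gets a name.
const MAX_RETRIES = 5;
const SECONDS_PER_DAY = 86400;

// Runs the operation until it succeeds or MAX_RETRIES attempts have failed,
// then rethrows the last error.
function attemptWithRetries<T>(operation: () => T): T {
  let lastError: unknown;
  // The 0 here is a plain loop start; naming it would add nothing.
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    try {
      return operation();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

function daysToSeconds(days: number): number {
  return days * SECONDS_PER_DAY;
}

// The 2 is the meaning of "double"; a constant called TWO would obscure it.
function double(x: number): number {
  return x * 2;
}

const weekInSeconds = daysToSeconds(7); // 604800
```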

Improve error handling

The biggest single readability win in error handling is to stop using exceptions for control flow. Result types (Result<T, E> in Rust, Either in functional languages, discriminated unions in TypeScript) make failure paths visible at the type level.
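A hand-rolled Result in TypeScript can be as small as the sketch below. This is an assumed shape, not a standard library type, and parsePort and describePort are made-up functions that exist only to show the failure path in the signature.

```typescript
// A minimal Result as a discriminated union (not a built-in type).
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

type ParsePortError = "not-an-integer" | "out-of-range";

// The signature says this can fail, and how. Nothing is thrown.
function parsePort(raw: string): Result<number, ParsePortError> {
  if (!/^\d+$/.test(raw)) return { ok: false, error: "not-an-integer" };
  const port = Number(raw);
  if (port < 1 || port > 65535) return { ok: false, error: "out-of-range" };
  return { ok: true, value: port };
}

// The caller has to check ok before the checker allows access to value.
function describePort(raw: string): string {
  const result = parsePort(raw);
  if (result.ok === false) return `invalid port (${result.error})`;
  return `listening on ${result.value}`;
}

const goodPort = describePort("8080"); // "listening on 8080"
const badPort = describePort("abc"); // "invalid port (not-an-integer)"
```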

For codebases stuck with exceptions: handle at the layer that has enough context to do something meaningful. Catching and re-throwing with no added context is worse than not catching.

How to actually decide

The canon is not a rulebook. It is a body of accumulated experience, written by careful people who were reacting to specific problems in specific contexts. Most of the time, the right move when reading any of it is to ask three questions:

What problem was the author actually solving?
Do I have that problem?
What does applying their fix here cost me?

Refactoring is a discipline of trade-offs, not a list of commandments. The senior engineer's superpower is being able to look at code that violates ten rules and recognize that it is still the right code for its context — and, conversely, to look at a textbook-clean codebase and notice the speculative abstractions weighing it down.

The books are worth reading. Reading them once and pattern-matching against your codebase forever is the trap.