CSSemgrep – Grep but CSS

This really is an April Fools experiment. I parse JavaScript with Babel, turn it into a DOM tree, and search it with CSS selectors.

Think of it as a cheap knockoff of Semgrep for searching over code as a tree instead of text. It really is a silly idea and I know it. But it makes me giggle and also... I don’t know... ponder whether we could take advantage of existing, well-optimized engines rather than rolling our own.

Mini playground

A tiny taste of the nonsense before the full demo.

sample.js
const x = a + b;
eval(userInput);
element.innerHTML = html;
$("#out").html(location.hash.slice(1));
new Function(code);
rule.css
call-expression > callee > identifier[data-name="eval"]

The joke here is this: Semgrep searches code by structure, and it is so much better than grep because of that. I wondered: could I build something similar but simpler by leveraging the browser? Specifically, what would happen if I turned a JavaScript AST into a DOM tree and then searched it with CSS selectors? The answer turned out to be, as you would expect, something ridiculous but fun.

I built this on April Fools’ Day and am publishing it late, which feels on brand. We do April Fools’ Day wrong: April 1 is for building the absurd thing. A week later is when you show it off to your tens of friends.

Technically I am not just a week late. Years ago, when Semgrep was just getting going, I joked with the founders about this exact kind of nonsense. It was a joke in the way terrible jokes are. But it does show off the core value of searching by structure rather than text, and of using a language you already know. Designing a new domain-specific language (a DSL) is hard. Designing one people actually want to learn and use is harder. This was my “what if we just abuse CSS?” moment.


On April Fools?

April Fools is a great excuse to build strange little contraptions that make you laugh first and think a beat later. I think the best joke projects are tiny toy models of real problems. You make a silly thing, and then halfway through laughing at it you realise it... well, you know.

That is what I wanted here. What if CSS were a static analysis language? Code really does have structure and searching that structure is useful. Building a language and getting people to adopt that language — the way Semgrep has — has a lot of value but also takes a lot of work.


What Semgrep is doing

Semgrep matches patterns against the structure of code rather than treating everything as glorified text search. That matters because a lot of interesting bugs are not “does this string appear somewhere?” problems. They are shape problems. Semgrep does a lot more, and if you don’t know it, you should check it out. I am just focused on the structured search aspect.

For example, this:

Example: eval()

The interesting part is not the letters. It is the fact that this is a call expression whose callee is the identifier eval.

eval(userInput);
call-expression > callee > identifier[data-name="eval"]

Example: innerHTML

This is interesting because it is an assignment to a particular property, not because the string innerHTML happens to exist somewhere in the file.

element.innerHTML = html;
assignment-expression > left > member-expression > property > identifier[data-name="innerHTML"]

That is the whole move: parse the code, get a tree, and match the tree rather than squinting at raw text.


What this project does

This project parses JavaScript with Babel in the browser, walks the AST, turns each node into a DOM element, and decorates it with attributes like data-name, data-value, and data-operator. At that point the AST has quietly become a DOM tree, which means CSS selectors can go wandering through it looking for trouble.
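Here is a minimal sketch of that walk, not the playground's actual code. To keep it runnable without a browser or Babel, it takes a hand-written AST shaped like a parser's output and emits a markup string instead of building real DOM elements. The names `toMarkup`, `kebab`, and `ATTR_FIELDS` are made up for illustration, and the conventions (kebab-cased node types as tag names, one wrapper element per child field such as `callee`) are assumptions inferred from the selectors shown in this post.

```javascript
// Hypothetical sketch: AST node -> markup string that mirrors the tree.
// Assumed conventions: tag = kebab-cased node type; each child field gets a wrapper element.

const kebab = (s) => s.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase();

const escapeAttr = (v) =>
  String(v).replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");

// Scalar fields that are copied onto the element as data-* attributes.
const ATTR_FIELDS = ["name", "value", "operator"];

// Anything with a string `type` is treated as an AST node.
const isNode = (v) => v !== null && typeof v === "object" && typeof v.type === "string";

function toMarkup(node) {
  const tag = kebab(node.type);
  const attrs = ATTR_FIELDS
    .filter((f) => ["string", "number", "boolean"].includes(typeof node[f]))
    .map((f) => ` data-${f}="${escapeAttr(node[f])}"`)
    .join("");

  let children = "";
  for (const [field, value] of Object.entries(node)) {
    // A field may hold one node or an array of nodes; everything else is skipped.
    const kids = (Array.isArray(value) ? value : [value]).filter(isNode);
    if (kids.length === 0) continue;
    children += `<${field}>${kids.map(toMarkup).join("")}</${field}>`;
  }
  return `<${tag}${attrs}>${children}</${tag}>`;
}

// Hand-written AST for `eval(userInput);`, trimmed to the relevant fields.
const ast = {
  type: "ExpressionStatement",
  expression: {
    type: "CallExpression",
    callee: { type: "Identifier", name: "eval" },
    arguments: [{ type: "Identifier", name: "userInput" }],
  },
};

const markup = toMarkup(ast);
console.log(markup);
```

The printed markup contains `<call-expression><callee><identifier data-name="eval">`, which is exactly the shape the eval selector describes.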

So instead of inventing a whole new pattern language, we can write selectors like:

call-expression > callee > identifier[data-name="eval"]
assignment-expression > left > member-expression > property > identifier[data-name="innerHTML"]
new-expression > callee > identifier[data-name="Function"]
binary-expression[data-operator="+"]
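To show what a selector like these is asking for, here is a deliberately tiny stand-in for the browser's selector engine. It is an illustration only: in a browser you would presumably hand the selector to `querySelectorAll` and let the real engine do the work. This sketch supports nothing but the child combinator (`>`) and a single `[attr="value"]` test, and the helpers `el`, `parseStep`, and `select` are hypothetical names, as is the simplified tree.

```javascript
// Hypothetical stand-in for a selector engine: only `a > b > c[attr="v"]` is supported.

const el = (tag, attrs = {}, children = []) => ({ tag, attrs, children });

function parseStep(step) {
  const m = /^([a-z-]+)(?:\[([a-z-]+)="([^"]*)"\])?$/.exec(step.trim());
  if (!m) throw new Error(`unsupported selector step: ${step}`);
  return { tag: m[1], attr: m[2], value: m[3] };
}

const stepMatches = (node, s) =>
  node.tag === s.tag && (s.attr === undefined || node.attrs[s.attr] === s.value);

function select(root, selector) {
  // Split on " > " with surrounding whitespace so a ">" inside quotes survives.
  const steps = selector.trim().split(/\s+>\s+/).map(parseStep);
  const hits = [];
  (function walk(node, ancestors) {
    const chain = [...ancestors, node];
    // The last N elements of the chain must line up with the N selector steps.
    const tail = chain.slice(-steps.length);
    if (tail.length === steps.length && tail.every((e, i) => stepMatches(e, steps[i]))) {
      hits.push(node);
    }
    node.children.forEach((child) => walk(child, chain));
  })(root, []);
  return hits;
}

// Simplified tree for: eval(userInput); alert(eval);
const tree = el("program", {}, [
  el("call-expression", {}, [
    el("callee", {}, [el("identifier", { "data-name": "eval" })]),
    el("arguments", {}, [el("identifier", { "data-name": "userInput" })]),
  ]),
  el("call-expression", {}, [
    el("callee", {}, [el("identifier", { "data-name": "alert" })]),
    el("arguments", {}, [el("identifier", { "data-name": "eval" })]),
  ]),
]);

const hits = select(tree, 'call-expression > callee > identifier[data-name="eval"]');
console.log(hits.length); // 1
```

Only the first call matches. In `alert(eval)` the identifier `eval` sits under `arguments`, not `callee`, which is the difference a plain text search for "eval" cannot see.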

Example: addition

A tiny example, but useful because it shows operators can become searchable attributes.

const x = a + b;
binary-expression[data-operator="+"]

Example: new Function()

Another case where the structure matters more than plain text search.

new Function(code);
new-expression > callee > identifier[data-name="Function"]

Why CSS works surprisingly well here

CSS selectors already know how to match descendants, direct children, siblings, names, and attributes. That makes them great at querying tree-shaped data. Not only that, the browser ships with an engine for this that is really well optimized.

Example: jQuery html()

Here the selector follows the call shape down to the html property.

$("#out").html(location.hash.slice(1));
call-expression > callee > member-expression > property > identifier[data-name="html"]

Why not invent a new language?

It is very easy to say “just make a DSL.” It is much harder to make one people can learn, remember, debug, and continue using after the novelty wears off. You need syntax, semantics, error messages, good defaults, enough power to be useful, and enough restraint that simple things stay simple.

This project does the opposite: it abuses a language the browser already knows, and that programmers already know. Porting programs is hard but much easier than porting human brains. If CSS can get this far, it tells you two things. First, CSS is a more capable tree query language than people give it credit for. Second, tools like Semgrep are solving an actually hard design problem. A DSL is not just syntax but ergonomics, readability, error messages, composability, and the long slow work of making a pattern language feel natural rather than ritualistic.

There is another aspect of CSS that I really wanted to demonstrate but ran out of time for. CSS is “cascading”. That’s one of the “C”s. It means you can compose rules from the system, the company, the department, individual teams, and individuals. If enough of you poke me about it, I’ll demonstrate that too and, at least in my head, that unbuilt thing is awesome!


Shortcomings

This toy is decent at structural matching. It is bad at scope, bindings, data flow, taint tracking, and all the other awkward realities that appear the moment code stops being a tree and starts being a program.

That is fine. This is not trying to replace Semgrep. It is trying to make the core idea visible enough that you can poke it with a stick: parse code into a tree, then search the tree with patterns.
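To make the scope and bindings limitation concrete, here is a small sketch. The function `isDirectEvalCall` is a hypothetical JavaScript equivalent of the eval selector, and both ASTs are hand-written and trimmed to the relevant fields. A purely structural check catches the direct call and misses the same call made through an alias.

```javascript
// Hypothetical structural check: a call whose callee is the identifier `eval`.
const isDirectEvalCall = (node) =>
  node.type === "CallExpression" &&
  node.callee.type === "Identifier" &&
  node.callee.name === "eval";

// eval(userInput);
const direct = {
  type: "CallExpression",
  callee: { type: "Identifier", name: "eval" },
  arguments: [{ type: "Identifier", name: "userInput" }],
};

// const run = eval; run(userInput);   (only the call expression is shown)
const aliased = {
  type: "CallExpression",
  callee: { type: "Identifier", name: "run" },
  arguments: [{ type: "Identifier", name: "userInput" }],
};

console.log(isDirectEvalCall(direct));  // true
console.log(isDirectEvalCall(aliased)); // false
```

Knowing that `run` refers to `eval` requires resolving bindings across statements, which is information the tree shape alone does not carry.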


Try the toy

The playground includes examples for addition expressions, eval(), innerHTML assignment, jQuery html(), and new Function(). Use the example dropdown, the rule dropdown, or type your own selector and see what lights up.

Semgrep is the real thing. CSSemgrep is the toy one where the AST turns into a DOM tree and CSS goes hunting through it. I still find that sentence deeply silly, which is a good sign.

Playground

Try the fake CSS-powered grep toy here.