Gulp, Backwards

Posted on Aug 21, 2021 by Alexander Morse.

I’ve got a complicated history with Gulp. Back when I first got into web development, it was what I used to make my first grown-up development environment. At the time, it blew my mind how easily I could get features like transpiling, minification, and auto-reload into my project just by following a simple beginner’s tutorial. Unfortunately for me, that simplicity came at the price of actual understanding.

Even though Gulp was the first “real” developer’s tool I ever used, I can’t say that I ever really learned it. I’ve used it for maybe half a dozen projects over the years, and every time I’ve found myself having to re-learn how it works. As a result, I never really use Gulp well, and I tend to over-rely on pre-built plugins to get anything done...a bad habit that’s followed me through most of my development career.

This article might be a strange one. We’re going to cover all of Gulp’s constituent parts, but we won’t do it in the style of your typical quick-start guide. There are plenty of Gulp tutorials (and boilerplate templates) out there already. Rather, we’ll study Gulp for Gulp’s sake, taking a high-level view of what it actually does, and drilling down a bit further into a few of its most important features.

Because we’ll be focusing more on the principle than the practical, I would recommend having some knowledge of Gulp and its basic usage before reading.

Gulp’s Not a Task Runner

...in my opinion, anyway.

I most often hear Gulp referred to as a task-runner, but I don’t think that’s quite right. I think it’s more accurate to call it a collection of individual tools that is well-suited to task-running, but not just task-running. As we’ll see, most of the pieces that make up the library can be used entirely in isolation, and with more flexibility than is obvious from your average five-minute Gulp tutorial.

Setup

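It won’t take much to follow along with this exploration. Just create a new npm project and install the ‘gulp’ package. A few examples also require ‘vinyl’ directly; it normally arrives as a dependency of Gulp, but installing it explicitly means we aren’t relying on that:

mkdir gulp-backwards
cd gulp-backwards
npm init -y
npm install --save-dev gulp vinyl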

Vinyl

Vinyl is a virtual file format, which really isn’t as impressive as it sounds. A Vinyl instance is just a JavaScript object that tracks all the metadata associated with a file. It has a few methods used to keep details like path names and file extensions straight if we change them, but it otherwise doesn’t actually do anything by itself.

Let’s create a simple Vinyl instance to represent a text file: test.txt. We’ll use this a few times throughout the article.

const Vinyl = require("vinyl");

const file = new Vinyl({
  cwd: process.cwd(),
  base: process.cwd(),
  path: `${process.cwd()}/test.txt`,
  contents: Buffer.from("Hello from Vinyl!"),
});

console.log(file.path);     // <cwd>/test.txt
console.log(file.basename); // test.txt
console.log(file.extname);  // .txt

Once again, test.txt doesn’t actually exist anywhere — we didn’t create a permanent file just by making a Vinyl instance. On the other hand, we definitely could if we wanted to. The object already has all the information and content we need to do it. In fact, let’s do it all in one line using createWriteStream:

const { createWriteStream } = require("fs");

const Vinyl = require("vinyl");

const file = new Vinyl({
  cwd: process.cwd(),
  base: process.cwd(),
  path: `${process.cwd()}/test.txt`,
  contents: Buffer.from("Hello from Vinyl!"),
});

// end() writes the contents and then closes the stream.
createWriteStream(file.path).end(file.contents);

If you’re following along, test.txt should now be saved to the root of your npm project.

If we can create a file from a Vinyl instance, it stands to reason that we should be able to create a Vinyl instance from a file. The following example gets the job done, by loading test.txt back into memory from the file system, but it’s a little underwhelming.

const { readFileSync } = require("fs");

const Vinyl = require("vinyl");

const file = new Vinyl({
  cwd: process.cwd(),
  base: process.cwd(),
  path: `${process.cwd()}/test.txt`,
  contents: readFileSync("test.txt"),
});

console.log(file.contents.toString()); // Hello from Vinyl!

(Note: I’m being lazy with my file-system logic here. If you do ever need to load individual Vinyl instances, check out the vinyl-file package.)

By the way, sometimes you might want to clone a Vinyl instance. Do that with the clone method:

const Vinyl = require("vinyl");

const file = new Vinyl({
  cwd: process.cwd(),
  base: "/",
  path: `${process.cwd()}/test.txt`,
  contents: Buffer.from("Hello from Vinyl!"),
});


const file2 = file.clone();
file2.basename = "test2.txt";

console.log(file2.path); // <cwd>/test2.txt
console.log(file2.contents.toString()); // Hello from Vinyl!
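
The path bookkeeping I mentioned at the start of this section is easier to picture with a toy version. To be clear, the class below is a sketch I made up for illustration, not Vinyl’s source: TinyFile is a hypothetical name, and it only models path, basename, extname, and clone.

```javascript
const path = require("path");

// TinyFile is a hypothetical stand-in for Vinyl, for illustration only.
class TinyFile {
  constructor({ path: filePath, contents = null }) {
    this.path = filePath;
    this.contents = contents;
  }

  // basename and extname are derived from path, so they can never drift.
  get basename() {
    return path.basename(this.path);
  }

  set basename(name) {
    this.path = path.join(path.dirname(this.path), name);
  }

  get extname() {
    return path.extname(this.path);
  }

  set extname(ext) {
    const stem = path.basename(this.path, path.extname(this.path));
    this.path = path.join(path.dirname(this.path), stem + ext);
  }

  // Copy Buffer contents so edits to the clone don't touch the original.
  clone() {
    const contents = Buffer.isBuffer(this.contents)
      ? Buffer.from(this.contents)
      : this.contents;
    return new TinyFile({ path: this.path, contents });
  }
}

const original = new TinyFile({
  path: path.join("project", "test.txt"),
  contents: Buffer.from("Hello from Vinyl!"),
});

const copy = original.clone();
copy.extname = ".md";

console.log(original.basename); // test.txt
console.log(copy.basename); // test.md
```

Because everything is computed from a single path string, changing one piece can’t leave the others stale. That is the whole trick.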

Content Types

In the above examples, the contents property of our Vinyl instance has always been the same: a Buffer of data for a short, utf8-encoded string. Usually, this is exactly what we want — most Gulp plugins will only accept Vinyl instances with ‘Buffer’ contents, and throw an error otherwise.

However, contents could be set to two other types. Most simply, it could be set to null, which is useful if we don’t care about a file’s contents. The last type is for situations when a file’s contents are simply too large to fit into memory. In these cases, contents might be set to a readable stream, which allows smaller chunks of the file to be loaded in one at a time.

We’ll focus on Vinyl instances with Buffer contents for the rest of this article.
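
For a rough idea of how code might tell the three cases apart, here’s a small sketch. Real Vinyl instances come with isNull(), isBuffer(), and isStream() methods for this; the contentsType helper below is a hypothetical equivalent that works on plain objects, so it runs without installing anything.

```javascript
const { Readable } = require("stream");

// Classify a file-like object by its contents. Plain objects stand in
// for Vinyl instances here so the sketch has no dependencies.
function contentsType(file) {
  if (file.contents === null) return "null";
  if (Buffer.isBuffer(file.contents)) return "buffer";
  if (file.contents instanceof Readable) return "stream";
  throw new TypeError("Unsupported contents type");
}

console.log(contentsType({ contents: null })); // null
console.log(contentsType({ contents: Buffer.from("hi") })); // buffer
console.log(contentsType({ contents: Readable.from(["hi"]) })); // stream
```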

Vinyl Adapters

We’ve figured out how to work with Vinyl objects by hand, but it’s hard to see what they can actually do for us, since we seem to be doing most of the heavy lifting ourselves. From what we’ve seen, Vinyl isn’t so useful that it’s worth the overhead of manually creating instances every time we want to work with something on the file system.

Vinyl adapters exist to relieve us of that overhead, and they do it in two ways. First, they automatically load and save Vinyl instances for us. Second, they let us switch from working with individual instances to a stream of instances, which, as we’ll see, will speed up our workflow considerably.

At a high level, a Vinyl adapter is anything that exports two main functions: src and dest. Both of these functions return streams: one for fetching Vinyl instances from somewhere, and another for saving Vinyl instances back to the same place.

Usually, the adapter we’ll use is one called vinyl-fs, which loads from and saves to the local file system. Gulp re-exports its src and dest functions, so there’s nothing extra to install. Other adapters exist, but tend to be situational.

(Note: I’d recommend a cursory knowledge of Node.js streams to get the most out of this section. You can still follow along, but some of the stream methods we’ll use might feel a little mysterious. I’ve written A Passable Explanation: Node.js Streams, if you’re interested in learning more.)

`src`

src produces a stream that emits Vinyl instances fetched from some location. We can specify which files to load by passing one or more globs as the function’s first argument (multiple globs may be given as an array). We’ll gloss over globs for the most part, but to give you an idea of what they do: file.txt would select only the one file specified. However, *.txt would select every text file at the project root. We can take advantage of this to manipulate multiple files at once.

Let’s give src a try. Since test.txt should still be sitting in our project root, let’s load it in and read the output. Notice that, since we’ll be dealing with a stream, we need to listen on its data event in order to get the Vinyl instance.

const { src } = require("gulp");

// Note: we could provide a more flexible glob here to load
// multiple files (ex. "src/**/*.js"). Globs are resolved relative
// to the current directory, so there's no leading slash.
const vinylStream = src("test.txt");

vinylStream.on("data", (file) => {
  console.log(file.contents.toString()); // Hello from Vinyl!
});

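If we gave src a more permissive glob, say **/*.txt, the stream would have multiple files to emit, and our listener would fire once for every text file anywhere in the project. That includes any text files inside node_modules, unless we also pass a negated glob such as !node_modules/** to exclude them.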

`dest`

dest is fairly simple: it accepts a path to a specific directory, and then returns a stream that we can write to. Whenever we write a Vinyl instance to it, that instance will automatically be saved to the directory.

Let’s try that now. We’ll create a dest stream that saves Vinyl to some new output/ directory, and then feed it a Vinyl instance:

const { dest } = require("gulp");
const Vinyl = require("vinyl");

const file = new Vinyl({
  cwd: process.cwd(),
  base: process.cwd(),
  path: `${process.cwd()}/customVinyl.txt`,
  contents: Buffer.from("I'm a custom Vinyl instance!"),
});

const destStream = dest("output/");
destStream.write(file);
destStream.end();

Pipelines

When we use Vinyl streams, they typically don’t look like the examples above. Usually, we’ll specify a pipeline of Vinyl instances, allowing us to load, transform, and save files in one continuous sequence.

If we combine the src and dest examples from above, we’ll get this simple pipeline, which copies test.txt into the output/ directory:

const { src, dest } = require("gulp");

src("test.txt").pipe(dest("output/"));
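
One caveat with chained .pipe() calls: an error in one stage isn’t forwarded to the next, so a failing step can slip by unnoticed. Node’s built-in stream.pipeline is a common way around that, since it reports a failure from any stage in one place. Here’s a sketch of the idea using only core streams. The source, failOnB, and sink streams are made-up stand-ins for src, a plugin, and dest, and runPipeline is a hypothetical helper.

```javascript
const { pipeline, Readable, Transform, Writable } = require("stream");

// Stand-ins for src(), a plugin, and dest(), built from core streams.
const source = Readable.from([{ name: "a.txt" }, { name: "b.txt" }]);

const failOnB = new Transform({
  objectMode: true,
  transform(file, _encoding, callback) {
    if (file.name === "b.txt") {
      callback(new Error(`Could not process ${file.name}`));
      return;
    }
    callback(null, file);
  },
});

const sink = new Writable({
  objectMode: true,
  write(_file, _encoding, callback) {
    callback();
  },
});

// Hypothetical helper: wrap pipeline() in a Promise so that a failure
// in any stage shows up as a single rejection.
function runPipeline(...streams) {
  return new Promise((resolve, reject) => {
    pipeline(...streams, (error) => (error ? reject(error) : resolve()));
  });
}

const outcome = runPipeline(source, failOnB, sink).then(
  () => "finished",
  (error) => `failed: ${error.message}`
);

outcome.then((message) => console.log(`Pipeline ${message}`));
// Pipeline failed: Could not process b.txt
```

The same shape should carry over to real Vinyl streams, since they’re ordinary Node streams underneath.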

Other Vinyl Adapters

As I mentioned above, we could use Vinyl adapters to work with files from any source, not just the local file system. S3, Dropbox, and FTP are all possibilities. For example, we could create an FTP adapter with a dest function that saves Vinyl directly to some specified FTP server.

But if you look around npm for a while, you might get the sense that Vinyl adapters as a concept haven’t really caught on, with the exception of vinyl-fs. There could be a few reasons for this, but one might be that the abstraction simply isn’t necessary. We could go to the trouble of writing (and testing, and maintaining) an adapter for S3, but it might just be easier to use pre-existing tools, like a CLI, to upload files after we’ve saved them to disk.
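
Still, it’s worth seeing how little an adapter needs in order to qualify. The sketch below is a made-up adapter that keeps its “files” in a Map instead of on disk. createMemoryAdapter is a hypothetical name, plain objects stand in for Vinyl instances, and unlike the real thing, this src takes no globs and this dest takes no directory.

```javascript
const { Readable, Writable } = require("stream");

// A hypothetical adapter backed by a Map of path -> Buffer. Plain
// objects stand in for Vinyl instances to keep it dependency-free.
function createMemoryAdapter(store = new Map()) {
  return {
    // Emit one file object per stored entry.
    src() {
      const files = Array.from(store, ([path, contents]) => ({
        path,
        contents,
      }));
      return Readable.from(files);
    },

    // Save every file object that gets written to the stream.
    dest() {
      return new Writable({
        objectMode: true,
        write(file, _encoding, callback) {
          store.set(file.path, file.contents);
          callback();
        },
      });
    },
  };
}

const from = createMemoryAdapter(
  new Map([["test.txt", Buffer.from("Hello from Vinyl!")]])
);
const targetStore = new Map();
const to = createMemoryAdapter(targetStore);

// Copy everything from one store to the other.
const copied = new Promise((resolve, reject) => {
  from.src().pipe(to.dest()).on("finish", resolve).on("error", reject);
});

copied.then(() => {
  console.log(targetStore.get("test.txt").toString()); // Hello from Vinyl!
});
```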

Plugins

Plugins are one of Gulp’s biggest draws, and they tend to be the first things that come to mind when I think of it. At first glance, they might as well be magic: you just install the right npm package, stick it into the pipeline of your choice, and it just works.

But there’s nothing magical about plugins. In fact, if you’re familiar with basic stream concepts, you already know them by their other name: Transforms.

As a reminder, a Transform is responsible for transforming data that passes through it. When we’re dealing with Vinyl pipelines, they’ll be operating on Vinyl instances. Often, we’ll be doing something to the instance’s contents property — perhaps compiling it somehow, or linting it for style enforcement.

Let’s make a simple plugin that takes text files and uppercases the contents. Then we’ll insert that plugin into the middle of our pipeline from earlier. Now, the pipeline should load test.txt, uppercase its contents, and then save the result to the output directory.

const { Transform } = require("stream");

const { src, dest } = require("gulp");

const uppercaseTransform = new Transform({
  // Note: Since the plugin reads in and writes out Vinyl
  // objects, we need to turn on objectMode for both halves.
  writableObjectMode: true,
  readableObjectMode: true,
  transform(file, _encoding, callback) {
    const contents = file.contents.toString();
    file.contents = Buffer.from(contents.toUpperCase());

    this.push(file);
    callback(null);
  },
});

src("test.txt")
  .pipe(uppercaseTransform)
  .pipe(dest("output/"));

This is a pretty trivial use case, but the majority of plugins work in exactly this way: read the contents, optionally modify them, and then pass the instance on.
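
Published plugins usually differ from our example in two small ways. They’re exported as functions that build a fresh Transform on every call (a stream instance can only be piped through once), and they pass along files they can’t handle instead of crashing on them. Here’s a sketch of that shape. The uppercase function is my own example, and plain objects stand in for the Vinyl instances that src would emit.

```javascript
const { Readable, Transform } = require("stream");

// Plugin factory: each call returns a new Transform, so the plugin can
// be used in more than one pipeline.
function uppercase() {
  return new Transform({
    objectMode: true,
    transform(file, _encoding, callback) {
      // Pass along anything without Buffer contents untouched.
      if (!Buffer.isBuffer(file.contents)) {
        callback(null, file);
        return;
      }
      file.contents = Buffer.from(file.contents.toString().toUpperCase());
      callback(null, file);
    },
  });
}

// Plain objects stand in for the Vinyl instances src() would emit.
const input = Readable.from([
  { path: "test.txt", contents: Buffer.from("Hello from Vinyl!") },
  { path: "emptyDir", contents: null },
]);

// Collect what comes out of the plugin.
const results = (async () => {
  const seen = [];
  for await (const file of input.pipe(uppercase())) {
    seen.push(file.contents === null ? null : file.contents.toString());
  }
  return seen;
})();

results.then((seen) => console.log(seen)); // [ 'HELLO FROM VINYL!', null ]
```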

Task Functions

At this point, we can take a break from Vinyl and streams. Instead, let’s switch gears to a more Gulp-specific concept: task functions. These will be important to know for the remaining sections, so let’s try to get a grasp on what they are.

Simply put, a task function has two jobs: to do some kind of task, and to let us know when that task is complete. Here’s a simple example. Notice how the function takes a single parameter, a callback function, which it calls after the main work is done:

function myTaskFunction(callback) {
  // Our "task" just logs this message.
  console.log("Doing some work.");

  // The task is marked as completed just as soon
  // as we call this callback.
  callback();
}

Going on the above example, it might not be immediately clear why the callback is necessary when all of the work seems to happen synchronously. But consider this modified example, which wraps the work in a three-second timeout. In this case, the function will finish long before the actual work is done:

function myAsyncTaskFunction(callback) {
  // Wait three seconds before doing the task.
  setTimeout(() => {
    console.log("Doing some asynchronous work.");

    callback();
  }, 3000);

  // At this point, the function call is complete, but the
  // actual *task* is not. Gulp won't consider the task
  // finished until the callback is invoked.
}

If you’re already familiar with callbacks and asynchronous programming, this example might feel a little drawn out. Sorry.

Completion signaling is a big deal, though, because eventually Gulp will want to start composing task functions. For example, it might want to call Task A first, wait for it to finish, and then call Task B. Or maybe Task A and Task B run at the same time, and then Task C only runs once they’re both done. All of this is only possible if Gulp has some way of knowing exactly when “done” is.

There are alternative ways to signal task completion if you’d rather not use the callback. If a task function returns something, Gulp will try to use the return value to figure out when the task completes. Most often, tasks return streams, since streams usually do the bulk of the work anyway. When the stream ends, Gulp knows that the task is complete:

const { src, dest } = require("gulp");

function myStreamTaskFunction() {
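  // Example of a task function returning a stream.
  // Copies every text file in the project into textFiles/,
  // keeping each file's subdirectory. The negated globs skip
  // node_modules and the output folder itself.
  return src(["**/*.txt", "!node_modules/**", "!textFiles/**"])
    .pipe(dest("textFiles/"));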
}

Gulp recognizes a few other return types, too, like Promises (completes on resolution) and EventEmitters (completes on the ‘finish’ event). Check out the documentation on Async Completion for the full story.
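
To get a feel for what “figuring out when the task completes” involves, here’s a rough sketch that folds three of those styles (callback, Promise, stream) into a single Promise. This is not how Gulp does it internally; runTask is a hypothetical helper, and it leans on Node’s stream.finished to notice when a stream is done.

```javascript
const { Readable, finished } = require("stream");

// runTask is a hypothetical helper, not part of Gulp's API. It turns
// three completion styles into one Promise.
function runTask(task) {
  return new Promise((resolve, reject) => {
    const callback = (error) => (error ? reject(error) : resolve());
    const result = task(callback);

    if (result && typeof result.then === "function") {
      // Promise: done when it resolves.
      result.then(() => resolve(), reject);
    } else if (result && typeof result.on === "function") {
      // Stream: done when it ends. Drain it so that it can end.
      finished(result, callback);
      if (typeof result.resume === "function") result.resume();
    }
    // Otherwise, we're waiting for the task to call the callback.
  });
}

function callbackTask(done) {
  setTimeout(done, 10);
}

function promiseTask() {
  return new Promise((resolve) => setTimeout(resolve, 10));
}

function streamTask() {
  return Readable.from(["a", "b"]);
}

const log = [];
const allDone = runTask(callbackTask)
  .then(() => log.push("callback"))
  .then(() => runTask(promiseTask))
  .then(() => log.push("promise"))
  .then(() => runTask(streamTask))
  .then(() => log.push("stream"))
  .then(() => log);

allDone.then((entries) => console.log(entries.join(", ")));
// callback, promise, stream
```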

File Watching

Most Gulp tutorials tend to cover file watching, so I won’t spend too long on it. The process here is pretty simple: pass one or more globs describing which files to watch for changes, followed by a task function that should run whenever a change is detected.

This basic example watches test.txt, and logs a helpful note in the console whenever the file changes:

const { watch } = require("gulp");

watch("test.txt", function (callback) {
  console.log("test.txt changed again, FYI");
  callback();
});

Most importantly, note that the function we provided is a task function. Further changes to the file won’t re-run the task until it signals completion, so make sure not to forget. Here’s another example that returns a stream instead of making use of the callback — whenever test.txt changes, we’ll copy it over to the output/ directory.

const { src, dest, watch } = require("gulp");

watch("test.txt", function () {
  console.log("test.txt was changed, BTW. Copying it to output.");
  return src("test.txt").pipe(dest("output/"));
});
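
The “won’t re-run until it signals completion” behavior is worth a closer look. As I understand the default queue option, changes that arrive while the task is still running collapse into a single queued re-run. The sketch below imitates that logic without any file watching at all. createQueuedRunner is a hypothetical helper of my own, not Gulp’s implementation, and it assumes the task returns a Promise.

```javascript
// createQueuedRunner is a hypothetical helper, not Gulp's implementation.
// The task must return a Promise that settles when the work is done.
function createQueuedRunner(task) {
  let running = false;
  let queued = false;

  async function trigger() {
    if (running) {
      // Remember that something changed, but don't start a second run.
      queued = true;
      return;
    }

    running = true;
    try {
      await task();
    } finally {
      running = false;
    }

    if (queued) {
      queued = false;
      await trigger();
    }
  }

  return trigger;
}

let runs = 0;
const onChange = createQueuedRunner(async () => {
  runs += 1;
  await new Promise((resolve) => setTimeout(resolve, 20));
});

// Simulate three rapid changes: one run starts right away, and the
// other two collapse into a single queued re-run.
const settled = Promise.all([onChange(), onChange(), onChange()]);

settled.then(() => console.log(`Task ran ${runs} times`)); // Task ran 2 times
```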

Task Composition

I mentioned earlier that Gulp would eventually want to take task functions and start composing them. We can specify these compositions with series and parallel, which are higher-order task functions. Each of them accepts one or more task functions as arguments, and produces a new task function that combines them in some way.

series accepts one or more task functions, and then executes them one-by-one, only starting the next task after the current one has finished. Conversely, parallel accepts one or more task functions and runs all of them concurrently. In both cases, the composed task only completes after all of its given tasks have completed.

Here’s an example that shows how composition is typically used.

const { series, parallel } = require("gulp");

// Placeholder Tasks

function task1(cb) {
  cb();
}

function task2(cb) {
  cb();
}

function task3(cb) {
  cb();
}

// Create a task that runs task1, task2,
// and task3 one after the other, then completes.
const seriesTask = series(task1, task2, task3);

// Create a task that runs task1, task2,
// and task3 all at the same time, then completes.
const parallelTask = parallel(task1, task2, task3);

// We can nest compositions.
const nestedTask = series(
  parallel(task1, task2, task3), // Run these tasks all at once...
  series(task1, task2, task3) // ...then run these tasks one-by-one
);
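
Neither function is doing anything exotic. As a rough sketch, here’s how both could be written by hand for callback-style tasks. miniSeries and miniParallel are hypothetical names, and they only understand the callback; the real series and parallel also accept the other completion styles.

```javascript
// Hypothetical re-implementations for callback-style tasks only.
function miniSeries(...tasks) {
  return function composed(done) {
    let index = 0;

    function next(error) {
      // Stop on the first error, or once every task has run.
      if (error || index === tasks.length) {
        done(error);
        return;
      }
      const task = tasks[index];
      index += 1;
      task(next);
    }

    next();
  };
}

function miniParallel(...tasks) {
  return function composed(done) {
    let remaining = tasks.length;
    let failed = false;

    if (remaining === 0) {
      done();
      return;
    }

    // Start everything at once, and count the completions.
    for (const task of tasks) {
      task((error) => {
        if (failed) return;
        if (error) {
          failed = true;
          done(error);
          return;
        }
        remaining -= 1;
        if (remaining === 0) done();
      });
    }
  };
}

const order = [];

const slow = (done) =>
  setTimeout(() => {
    order.push("slow");
    done();
  }, 30);

const fast = (done) =>
  setTimeout(() => {
    order.push("fast");
    done();
  }, 5);

const finishedDemo = new Promise((resolve) => {
  const nested = miniSeries(miniParallel(slow, fast), miniSeries(slow, fast));
  nested(() => resolve(order));
});

finishedDemo.then((result) => console.log(result.join(" ")));
// fast slow slow fast
```

In the parallel half, fast finishes first because both timers start together. In the series half, fast has to wait its turn.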

Gulpfiles and the CLI

Welcome to the point where most Gulp tutorials start. Because we’ve studied all the major pieces already, there isn’t much new to learn here. We’ll keep it brief.

The Gulpfile (usually gulpfile.js in the project’s root directory) exports a number of named task functions, which we’ve seen several times now. Here’s an example of a Gulpfile that exports just one task function (a simple example from earlier) under the ‘mytask’ name.

function myTaskFunction(callback) {
  // Our "task" just logs this message.
  console.log("Doing some work.");

  // The task is marked as completed just as soon
  // as we call this callback.
  callback();
}

exports.mytask = myTaskFunction;

To run this task from the command line, we’ll use the Gulp CLI. We can access this tool in a couple of ways. The official documentation suggests installing the CLI globally, via npm install -g gulp-cli. If you do it this way, the task can be run with the following command:

gulp mytask

Alternatively, the gulp package includes the CLI as a local executable, without us needing to install anything globally. In this case, we use the npx command:

npx gulp mytask

Whichever way we do it, the effect is the same — the CLI will read the Gulpfile, and then attempt to run the specified task. If we don’t specify a task, then Gulp will look for one named ‘default’, and run that instead.

Conclusion

There’s a psychological phenomenon called semantic satiation, in which a word repeated over and over temporarily loses its meaning and is perceived as nothing more than sounds. Until I began writing this article, I hadn’t realized it was possible to experience that sensation with a JavaScript library. I understand Gulp now better than I ever have before, and I don’t expect I’ll ever need to re-learn it. On the other hand, I’ve spent so long studying the theory that any practical use of Gulp (Sass, TypeScript, and so on) just feels weird and foreign to me.

I feel like I’ve accomplished my goal of understanding Gulp independent of context, and hopefully it’s been of some use to you, too. But the fact remains that Gulp is a contextual toolkit. The best way to use it is how all of the tutorials teach you to use it — with pipes and plugins and Gulpfiles. While we’re using them, though, it’s worth taking a moment to remind ourselves that there is no need to feel mystified by these tools. We can always pause to study them, backwards if need be. Just try not to go insane in the process.