MNIST on Deno land 🦕
Famous MNIST dataset ported to Deno land.
Usage
Load MNIST dataset:
import { loadMnist } from "https://deno.land/x/deno-mnist@v1.0.0/mod.ts";
const mnist = await loadMnist();
The dataset is split into two parts: training data (60,000 images) and test data (10,000 images). Each of these arrays is ordered so that the first part contains images that are easier to recognize than those in the second part. The reason is described on Yann LeCun's original page. You will probably want to shuffle the images first, and there is a shuffle util for that:
import {
  loadMnist,
  shuffle,
} from "https://deno.land/x/deno-mnist@v1.0.0/mod.ts";
const mnist = await loadMnist();
const trainData = shuffle(mnist.train);
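To illustrate what shuffling does here, below is a minimal sketch of a Fisher-Yates shuffle. It is not taken from this library: the name shuffleCopy is hypothetical, and returning a shuffled copy (rather than reordering in place) is an assumption.

```typescript
// Hypothetical helper, not the library's own shuffle(): returns a shuffled copy.
function shuffleCopy<T>(
  items: readonly T[],
  random: () => number = Math.random,
): T[] {
  const result = items.slice();
  // Walk backwards, swapping each element with a randomly chosen
  // element at the same or an earlier position (Fisher-Yates).
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

const sample = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
const shuffled = shuffleCopy(sample);
console.log(shuffled.length); // same length as the input; order varies per run
```

Because the copy is made up front, the original array keeps its easy-to-hard ordering in case you need it later.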
Each array consists of pairs: an image and its label. An image is an array of 784 (28×28) integers from 0 to 255, where 0 represents clear paper and 255 the deepest (black) ink. You can normalize these images to values between 0 and 1 using the normalize() utility function:
const trainData = shuffle(mnist.train).map(d => ({label: d.label, image: normalize(d.image)}));
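As a rough illustration of that normalization, the sketch below divides every pixel by 255. The function name normalizePixels is hypothetical, and dividing by 255 is an assumption about what mapping 0..255 to 0..1 means here, not a copy of the library's code.

```typescript
// Hypothetical stand-in for normalize(): scales 0..255 pixels into 0..1.
function normalizePixels(pixels: readonly number[]): number[] {
  return pixels.map((value) => value / 255);
}

const rawPixels = [0, 51, 255];
console.log(normalizePixels(rawPixels)); // values: 0, 0.2, 1
```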
The label is, of course, the digit encoded in the array. You can see what that digit looks like using the printDigit function, e.g.:
console.log(printDigit(mnist.test[3378].image));
console.log(mnist.test[3378].label);
This will output:
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ▓ ▓ ░ ░ ░ ░ ░ ░ ░ ▒ ▓ ▓ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ █ █ ▓ █ █ █ █ █ █ █ █ █ ▓ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ █ █ █ █ █ █ █ █ ▓ █ █ █ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ▒ █ ▓ ░ ░ ░ ░ ░ ░ ▒ █ █ ▓ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ▓ █ ▓ ░ ░ ░ ░ ░ ░ █ █ ▓ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ▓ █ ▒ ░ ░ ░ ░ ░ ▓ █ █ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ▒ █ ░ ░ ░ ░ ░ ░ █ █ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ▒ █ ▓ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ █ █ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ▒ █ █ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ █ █ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ▒ █ █ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ▓ █ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ █ █ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ █ ▓ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ▒ █ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ █ █ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ▒ █ █ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ █ █ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ █ ▒ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
7
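A rendering like the one above can be sketched by mapping each pixel intensity to one of four shade characters. The function name renderDigit, the evenly spaced thresholds (64 intensity steps per shade), and the square-image assumption are all illustrative guesses; the library's printDigit may choose its thresholds differently.

```typescript
// Shade characters from lightest (clear paper) to darkest (black ink).
const SHADES = ["░", "▒", "▓", "█"];

// Hypothetical renderer: turns a flat, square image into rows of shade characters.
function renderDigit(pixels: readonly number[]): string {
  const side = Math.round(Math.sqrt(pixels.length));
  const rows: string[] = [];
  for (let y = 0; y < side; y++) {
    const row = pixels.slice(y * side, (y + 1) * side).map((value) => {
      // Assumed thresholds: 0..63, 64..127, 128..191, 192..255.
      const index = Math.min(SHADES.length - 1, Math.floor(value / 64));
      return SHADES[index];
    });
    rows.push(row.join(" "));
  }
  return rows.join("\n");
}

// A tiny 2×2 "image" with one pixel per shade.
console.log(renderDigit([0, 255, 128, 64]));
```

Since the side length is derived from the array length, the same sketch works for both 28×28 and 14×14 images.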
The only other useful function in utils is downscaleImage, which turns a 784-element (28×28) array into a 196-element (14×14) array:
console.log(downscaleImage(mnist.test[3378].image).length); // -> 196
console.log(printDigit(downscaleImage(mnist.test[3378].image)));
Down-scaled output is:
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ▒ ▓ ▒ ▒ ▓ █ ▓ ░ ░ ░
░ ░ ░ ░ ▓ ▓ ▓ ▓ ▓ █ ▒ ░ ░ ░
░ ░ ░ ░ █ ▒ ░ ░ ▓ ▓ ░ ░ ░ ░
░ ░ ░ ░ ▒ ░ ░ ░ █ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ ▓ ▓ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ░ █ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ▒ ▓ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ ▓ ▒ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ █ ░ ░ ░ ░ ░ ░ ░
░ ░ ░ ░ ░ ░ █ ░ ░ ░ ░ ░ ░ ░
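One plausible way to halve the resolution is to average each 2×2 block of pixels. The sketch below assumes that approach; the name downscaleByAveraging and the rounding step are hypothetical, and the library's downscaleImage may use a different method.

```typescript
// Hypothetical downscaler: averages each 2×2 block of a square image.
function downscaleByAveraging(pixels: readonly number[], side = 28): number[] {
  const half = side / 2;
  const result: number[] = [];
  for (let y = 0; y < half; y++) {
    for (let x = 0; x < half; x++) {
      // Index of the top-left pixel of the 2×2 block, and of the row below it.
      const top = 2 * y * side + 2 * x;
      const bottom = top + side;
      const sum = pixels[top] + pixels[top + 1] +
        pixels[bottom] + pixels[bottom + 1];
      // Round so the result stays an integer in 0..255.
      result.push(Math.round(sum / 4));
    }
  }
  return result;
}

const blank = new Array<number>(784).fill(0);
console.log(downscaleByAveraging(blank).length); // 196
```

Averaging keeps faint strokes visible as intermediate shades, which matches the softer edges in the down-scaled output above.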
The data is packed in gzip files and will be unpacked on the first run, so don't forget to add the --allow-read and --allow-write flags when you first run a program that uses the dataset.