**MNIST** is the "Hello World" of deep learning. This article shows, step by step, how to get started with deep learning from scratch, with complete code and detailed explanations.

### 1. Software Architecture.

- Two very small networks are used to classify the MNIST dataset. The first is a simple fully connected network, and the second is a convolutional network.
- MNIST is an entry-level dataset, so there is no need for image augmentation or for feeding the data from a generator. The whole dataset fits in memory, so a single call to **fit()** trains the model.
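
A rough back-of-the-envelope check of why a generator is unnecessary. The sketch assumes the 55,000 training and 10,000 test images that the `input_data` loader provides, each stored as 28x28 `float32` values; the helper name `images_mib` is made up for illustration.

```python
# Estimate how much RAM the MNIST images need when held as float32 arrays.
# Assumed sizes: 55,000 training and 10,000 test images of 28x28 pixels.
BYTES_PER_FLOAT32 = 4
PIXELS_PER_IMAGE = 28 * 28

def images_mib(num_images):
    """Return the size in MiB of num_images flattened float32 images."""
    return num_images * PIXELS_PER_IMAGE * BYTES_PER_FLOAT32 / 2**20

train_mib = images_mib(55000)
test_mib = images_mib(10000)
print(f"train: {train_mib:.1f} MiB, test: {test_mib:.1f} MiB")
```

Under these assumptions both arrays together stay below 200 MiB, so passing them to **fit()** directly is reasonable on an ordinary machine.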

### 2. How To Install.

- Note that the **tensorflow** version cannot be 2.x.
- The main third-party libraries used are **tensorflow 1.x** and **Keras** (running on TensorFlow); the basic libraries are **NumPy** and **Matplotlib**.
- Installation is simple, for example `pip install numpy`.
- Run the commands below to install TensorFlow and the related Python libraries. The `tensorflow<2` constraint keeps pip from installing a 2.x release; the Keras version must likewise be one that still supports TensorFlow 1.x.

```
pip install numpy
pip install matplotlib
pip install keras
pip install "tensorflow<2"
```
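
Since the scripts only work with the 1.x line, it can help to check the installed version before running them. A minimal sketch; the helper name `is_tf1` is made up for illustration, and the check relies only on the standard `tf.__version__` string.

```python
def is_tf1(version):
    """Return True if a version string such as '1.15.0' has major version 1."""
    return version.split(".")[0] == "1"

try:
    import tensorflow as tf  # only needed for the real check
    status = "OK" if is_tf1(tf.__version__) else "is not 1.x"
    print("TensorFlow", tf.__version__, status)
except ImportError:
    print("TensorFlow is not installed")
```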

### 3. How To Use.

- First, preview the dataset: run **mnistplt.py** to draw four of the training images.
- To train the fully connected network, run **Densemnist.py** to get the saved model **Dense.h5**; to load the model and predict, run **Denseload.py**.
- To train the convolutional network, run **CNNmnist.py** to get the saved model **CNN.h5**; to load the model and predict, run **CNNload.py**.
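
The load scripts are not listed in this article. As a hedged sketch of the prediction step, assuming the loaded model returns one row of 10 softmax probabilities per image (as the networks in section 4 do), the predicted digit is the index of the largest probability. `top_prediction` is a hypothetical helper, not taken from **Denseload.py**.

```python
def top_prediction(probabilities):
    """Return (digit, confidence) for one row of class probabilities."""
    digit = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return digit, float(probabilities[digit])

# Hypothetical usage with the saved model (requires Keras and a prepared
# `image` array holding one normalized 28x28 digit):
#   from keras.models import load_model
#   model = load_model("./Dense.h5")
#   probs = model.predict(image.reshape(1, 784))[0]
#   print(top_prediction(probs))

example = [0.01] * 7 + [0.9, 0.02, 0.01]
print(top_prediction(example))
```

For the CNN model the same helper would apply, with the image reshaped to `(1, 28, 28, 1)` instead of `(1, 784)`.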

### 4. Training Process Python Code.

- Fully connected network training.

```python
"""Multilayer perceptron training"""
from tensorflow.examples.tutorials.mnist import input_data
from keras.models import Sequential
from keras.layers import Dense

# Simulate reading raw grayscale data
img_size = 28
num = 10
mnist = input_data.read_data_sets("./data", one_hot=True)
X_train, y_train = mnist.train.images, mnist.train.labels
X_test, y_test = mnist.test.images, mnist.test.labels
X_train = X_train.reshape(-1, img_size, img_size)
X_test = X_test.reshape(-1, img_size, img_size)
X_train = X_train * 255
X_test = X_test * 255
y_train = y_train.reshape(-1, num)
y_test = y_test.reshape(-1, num)
print(X_train.shape)
print(y_train.shape)

# A fully connected layer only accepts one-dimensional input per sample
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')

# Normalization
X_train = X_train / 255
X_test = X_test / 255

# One-hot encoding is omitted here: read_data_sets(one_hot=True) already did it
#y_train = np_utils.to_categorical(y_train)
#y_test = np_utils.to_categorical(y_test)


# Build the network
def baseline():
    '''
    optimizer: the optimizer, such as Adam.
    loss: the loss function. With categorical_crossentropy, the labels must be
        one-hot encoded. For example, with 10 categories the label of each
        sample is a 10-dimensional vector that is 1 at the index of its class
        and 0 everywhere else.
    metrics: a list of metrics used to evaluate the model during training
        and testing.
    '''
    model = Sequential()
    '''
    First, set the number of inputs: pass input_dim when creating the first
    layer. With 784 input variables it is set to num_pixels.
    A fully connected layer is defined by the Dense class: the first argument
    is the number of neurons in the layer, followed by the initializer and the
    activation function.
    Initializers: 'uniform' draws from a small uniform distribution
    (-0.05 to 0.05 by default) and 'normal' from a Gaussian distribution.
    If no initializer is given, Keras uses glorot_uniform for Dense kernels.
    The initializer sets the starting weights of the layer; biases start at
    zero by default.
    '''
    model.add(Dense(num_pixels, input_dim=num_pixels,
                    kernel_initializer='normal', activation='relu'))
    # softmax normalizes the outputs of all neurons in this layer into a
    # probability distribution
    model.add(Dense(num, kernel_initializer='normal', activation='softmax'))
    # categorical_crossentropy suits multi-class problems and is paired with
    # softmax as the activation function of the output layer
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model


# Train the model
model = baseline()
"""
batch_size: integer, the number of samples per gradient update;
    defaults to 32 if not specified.
epochs: integer, the number of passes over the training data.
verbose: how to display logs, integer value.
    0: no log output to the standard output stream.
    1: show a progress bar.
    2: print one line per epoch.
For example, with 2000 training samples and batch_size=500,
one epoch takes 4 iterations.
"""
model.fit(X_train, y_train, validation_data=(X_test, y_test),
          epochs=10, batch_size=200, verbose=2)

# Print the model summary
model.summary()

# model.evaluate() returns the loss value and the metrics you selected
# (for example, accuracy)
"""
verbose: how to display logs, an integer value.
    verbose = 0: no log output to the standard output stream.
    verbose = 1: show a progress bar.
"""
scores = model.evaluate(X_test, y_test, verbose=0)
print(scores)

# Save the model
model_dir = "./Dense.h5"
model.save(model_dir)
```
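
The batch arithmetic from the comments and the parameter counts that `model.summary()` reports can be reproduced by hand. A small sketch, assuming the 55,000 training samples provided by `mnist.train`; `dense_params` is a helper name made up here.

```python
import math

def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units

# Iterations (gradient updates) per epoch = ceil(samples / batch_size)
example_steps = math.ceil(2000 / 500)   # the example from the comments: 4
steps = math.ceil(55000 / 200)          # this script: 275

hidden = dense_params(784, 784)  # Dense(num_pixels, input_dim=num_pixels)
output = dense_params(784, 10)   # Dense(num)
print(example_steps, steps)
print(hidden, output, hidden + output)  # 615440 7850 623290
```

Almost all of the parameters sit in the 784x784 hidden layer, which is one reason the convolutional network that follows can be smaller while still seeing the full image.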

- CNN training.

```python
"""
Model construction and training.
Sequential model: a linear stack of layers. It is a simple linear structure
with no branches, built by stacking network layers one after another.
A convolutional layer outputs one feature map per filter, so the number of
filters sets the output depth. For a 3-channel RGB image, each filter holds
one small kernel per channel, but it still counts as a single filter.
"""
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
# The Flatten layer "flattens" its input, turning a multi-dimensional input
# into a one-dimensional one. It is commonly used in the transition from the
# convolutional layers to the fully connected layers.
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import MaxPooling2D

# Simulate reading raw grayscale data
img_size = 28
num = 10
mnist = input_data.read_data_sets("./data", one_hot=True)
X_train, y_train = mnist.train.images, mnist.train.labels
X_test, y_test = mnist.test.images, mnist.test.labels
X_train = X_train.reshape(-1, img_size, img_size)
X_test = X_test.reshape(-1, img_size, img_size)
X_train = X_train * 255
X_test = X_test * 255
y_train = y_train.reshape(-1, num)
y_test = y_test.reshape(-1, num)
print(X_train.shape)  # (55000, 28, 28)
print(y_train.shape)  # (55000, 10)

# The shape of the convolution input must match input_shape in the model
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1).astype('float32')
print(X_train.shape)  # (55000, 28, 28, 1)

# Normalization
X_train = X_train / 255
X_test = X_test / 255

# One-hot encoding is omitted here: read_data_sets(one_hot=True) already did it
#y_train = np_utils.to_categorical(y_train)
#y_test = np_utils.to_categorical(y_test)


# Build the CNN
def CNN():
    """
    The first layer is the convolutional layer. It has 32 filters (one
    feature map each) and serves as the input layer of the model, accepting
    input of shape (height, width, channels). Each kernel is 5*5 over
    1 channel, and its output passes through a 'relu' activation.
    The next layer is the pooling layer, using MaxPooling with size 2*2.
    Flatten reshapes the feature maps to one dimension as the input to the
    fully connected layer.
    Next is the fully connected layer, with 128 neurons and a 'relu'
    activation.
    The last layer is the output layer with 10 neurons. Each neuron
    corresponds to a category, and its output is the probability that the
    sample belongs to that category.
    """
    model = Sequential()
    model.add(Conv2D(32, (5, 5), input_shape=(img_size, img_size, 1),
                     activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num, activation='softmax'))
    # Compile
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model


# Train the model
model = CNN()
model.fit(X_train, y_train, validation_data=(X_test, y_test),
          epochs=5, batch_size=200, verbose=1)
model.summary()
scores = model.evaluate(X_test, y_test, verbose=1)
print(scores)

# Save the model
model_dir = "./CNN.h5"
model.save(model_dir)
```
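
The layer shapes described in the `CNN()` docstring can be traced with plain arithmetic. A sketch assuming the Keras defaults used by the script (stride 1 and no padding for `Conv2D`, pooling stride equal to the pool size); the helper names are made up for illustration.

```python
def conv_output(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units

side = conv_output(28, 5)     # 24: feature maps of 24x24x32
pooled = side // 2            # 12: 12x12x32 after 2x2 max pooling
flat = pooled * pooled * 32   # 4608 values after Flatten

total = (conv_params(5, 1, 32)
         + dense_params(flat, 128)
         + dense_params(128, 10))
print(side, pooled, flat, total)  # 24 12 4608 592074
```

The total should match the parameter count printed by `model.summary()` for this network.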