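Handwritten character recognition with a neural network in C++ (porting Python's network.py to C++), Part 1, for VS2017 VC++

This is Part 1 of porting network.py to VC++.

Neural network in C++, Part 1: rewriting the Python code in VC++

Neural network in C++, Part 2: rewriting the Python code in VC++

Neural network in C++, Part 3: saving the weights and biases externally

Neural network in C++, Part 4: adding original training data

Neural network in C++, Part 5: summary and a character recognition app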

The Python neural network I refer to is

network.py

the code presented in Neural Networks and Deep Learning (ニューラルネットワークと深層学習).

This article presents the C++ code that corresponds to it.

The figure below shows the accuracy of the VC++ code when run with a hidden layer of 500 neurons.

>NNmodel.exe 784 500 10

f:id:hatakeka:20170713155848p:plain

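From this, I think the port to C++ has more or less succeeded.

Before running it, place

train-images.idx3-ubyte   image data

train-labels.idx1-ubyte   correct labels for the image data

in the same directory.

See here for details.

The arguments look like this: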

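>NNmodel.exe <input layer> <hidden layer ...> <output layer>

The input layer is fixed at 784,

because the training data consists of 28 x 28 pixel images (28 x 28 = 784).

The output layer is fixed at 10,

because there are only 10 possible answers (the digits 0 to 9).

The hidden layers can have any sizes and there can be any number of them. For example,

>NNmodel.exe 784 30 20 10 10 10

specifies several hidden layers. Be warned that processing becomes extremely slow ( ﾟДﾟ).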
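The argument rules above could be checked before any training starts. The following is only a minimal sketch: parse_layer_sizes is a hypothetical helper that is not part of the NNmodel sources, and it assumes the arguments arrive as strings in the order input, hidden..., output.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not in the original NNmodel sources): converts the
// command-line arguments into layer sizes and checks the fixed input and
// output sizes described in the article.
std::vector<int> parse_layer_sizes(const std::vector<std::string>& args)
{
    if (args.size() < 2)
        throw std::invalid_argument("need at least an input and an output layer");

    std::vector<int> sizes;
    for (const std::string& a : args) {
        int n = std::stoi(a);              // throws if the argument is not a number
        if (n <= 0)
            throw std::invalid_argument("layer sizes must be positive");
        sizes.push_back(n);
    }
    if (sizes.front() != 28 * 28)          // 28 x 28 pixels = 784 inputs
        throw std::invalid_argument("input layer must be 784");
    if (sizes.back() != 10)                // one output per digit 0-9
        throw std::invalid_argument("output layer must be 10");
    return sizes;
}
```

Called with the arguments 784 30 20 10 10 10 it returns six layer sizes; a first argument other than 784 is rejected with an exception.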

Please run it as a Release build.

A Debug build is slow.

To play it safe, running with the 30 hidden neurons used in Neural Networks and Deep Learning

gives the following.

>NNmodel.exe 784 30 10

f:id:hatakeka:20170704202511p:plain

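The other hyperparameters are hard-coded.

A Debug build is extremely slow: a run that takes days in Debug

finishes in an hour or two when compiled in Release mode.

I had no idea the difference was this large.

I had always compiled and run in Debug mode,

so I never paid any attention to the difference between

Debug and Release builds ( ﾟДﾟ)

Example runs

With 60,000 training samples and 10,000 test samples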

>NNmodel.exe 784 30 10

Epoch {0} : {4133} / {10000}
Epoch {1} : {4647} / {10000}
Epoch {2} : {4312} / {10000}
Epoch {3} : {4690} / {10000}
Epoch {4} : {4312} / {10000}
Epoch {5} : {4500} / {10000}
Epoch {6} : {4305} / {10000}
Epoch {7} : {4687} / {10000}
Epoch {8} : {4747} / {10000}
Epoch {9} : {4764} / {10000}
Epoch {10} : {4812} / {10000}
Epoch {11} : {4609} / {10000}
Epoch {12} : {4149} / {10000}
Epoch {13} : {4547} / {10000}
Epoch {14} : {4407} / {10000}
Epoch {15} : {4444} / {10000}
Epoch {16} : {4669} / {10000}
Epoch {17} : {4703} / {10000}
Epoch {18} : {4839} / {10000}
Epoch {19} : {4507} / {10000}
Epoch {20} : {4679} / {10000}
Epoch {21} : {4683} / {10000}
Epoch {22} : {4425} / {10000}
Epoch {23} : {4614} / {10000}
Epoch {24} : {4756} / {10000}
Epoch {25} : {4130} / {10000}
Epoch {26} : {4573} / {10000}
Epoch {27} : {4733} / {10000}
Epoch {28} : {4848} / {10000}
Epoch {29} : {4667} / {10000}

f:id:hatakeka:20170712222701p:plain

With 10,000 training samples and 1,000 test samples

>NNmodel.exe 784 30 10

Epoch {0} : {429} / {1000}
Epoch {1} : {417} / {1000}
Epoch {2} : {422} / {1000}
Epoch {3} : {365} / {1000}
Epoch {4} : {400} / {1000}
Epoch {5} : {404} / {1000}
Epoch {6} : {411} / {1000}
Epoch {7} : {375} / {1000}
Epoch {8} : {401} / {1000}
Epoch {9} : {429} / {1000}
Epoch {10} : {458} / {1000}
Epoch {11} : {428} / {1000}
Epoch {12} : {380} / {1000}
Epoch {13} : {444} / {1000}
Epoch {14} : {430} / {1000}
Epoch {15} : {434} / {1000}
Epoch {16} : {403} / {1000}
Epoch {17} : {390} / {1000}
Epoch {18} : {413} / {1000}
Epoch {19} : {431} / {1000}
Epoch {20} : {391} / {1000}
Epoch {21} : {393} / {1000}
Epoch {22} : {426} / {1000}
Epoch {23} : {436} / {1000}
Epoch {24} : {466} / {1000}
Epoch {25} : {429} / {1000}
Epoch {26} : {434} / {1000}
Epoch {27} : {466} / {1000}
Epoch {28} : {397} / {1000}
Epoch {29} : {408} / {1000}

f:id:hatakeka:20170712222457p:plain

With 10,000 training samples and 1,000 test samples

>NNmodel.exe 784 500 10

Epoch {0} : {346} / {1000}
Epoch {1} : {441} / {1000}
Epoch {2} : {590} / {1000}
Epoch {3} : {670} / {1000}
Epoch {4} : {670} / {1000}
Epoch {5} : {671} / {1000}
Epoch {6} : {685} / {1000}
Epoch {7} : {679} / {1000}
Epoch {8} : {687} / {1000}
Epoch {9} : {766} / {1000}
Epoch {10} : {765} / {1000}
Epoch {11} : {764} / {1000}
Epoch {12} : {751} / {1000}
Epoch {13} : {774} / {1000}
Epoch {14} : {764} / {1000}
Epoch {15} : {761} / {1000}
Epoch {16} : {757} / {1000}
Epoch {17} : {765} / {1000}
Epoch {18} : {770} / {1000}
Epoch {19} : {745} / {1000}
Epoch {20} : {778} / {1000}
Epoch {21} : {781} / {1000}
Epoch {22} : {764} / {1000}
Epoch {23} : {774} / {1000}
Epoch {24} : {770} / {1000}
Epoch {25} : {764} / {1000}
Epoch {26} : {771} / {1000}
Epoch {27} : {764} / {1000}
Epoch {28} : {775} / {1000}
Epoch {29} : {755} / {1000}

f:id:hatakeka:20170713155848p:plain

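>NNmodel.exe 784 30 10

With 60,000 training samples and 10,000 test samples, epoch 6

Epoch {6} : {4305} / {10000}

Accuracy: 43.05%

>NNmodel.exe 784 100 10

With 10,000 training samples and 1,000 test samples, epoch 0

Epoch {0} : {778} / {1000}

Accuracy: 77.8%

>NNmodel.exe 784 30 30 10

With 10,000 training samples and 1,000 test samples, epoch 0

Epoch {0} : {434} / {1000}

Accuracy: 43.4%

>NNmodel.exe 784 3 60 8 10

With 10,000 training samples and 1,000 test samples, epoch 0

Epoch {0} : {122} / {1000}

Accuracy: 12.2%

>NNmodel.exe 784 200 10

With 10,000 training samples and 1,000 test samples, epoch 0

Epoch {0} : {693} / {1000}

Accuracy: 69.3%

In this rough test, the configuration with the highest accuracy appears to be

>NNmodel.exe 784 100 10

with an accuracy of about 80%.

That is a fairly good number, so I think it is fair to say that

the port from the Python code to C++ has more or less succeeded.

(Unless I am misunderstanding something...)

With Python, accuracy gets quite high with 30 hidden neurons, whereas

the VC++ version stays low, in the 40% range, with 30 hidden neurons,

which made me doubt the processing. So I tried 500 hidden neurons,

and the accuracy rose as the number of epochs increased.

What causes this difference?

Accuracy seems to depend on the network configuration,

so it might be worth building a mechanism that finds a good one automatically ( ﾟДﾟ)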
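A mechanical search of that kind might start as simply as trying a list of candidate layouts and keeping the best one. This is only a rough sketch under assumptions: pick_best_layout and its score callback are hypothetical names, and score stands in for "train the network with this layout, then count the correct answers on the test data".

```cpp
#include <functional>
#include <vector>

// Hypothetical helper (not in the original sources): evaluates every
// candidate layout with the supplied callback and returns the layout with
// the highest score. Scores are assumed to be non-negative counts.
std::vector<int> pick_best_layout(
    const std::vector<std::vector<int>>& candidates,
    const std::function<int(const std::vector<int>&)>& score)
{
    std::vector<int> best;
    int best_score = -1;
    for (const std::vector<int>& layout : candidates) {
        int s = score(layout);          // e.g. correct answers on the test set
        if (s > best_score) {
            best_score = s;
            best = layout;
        }
    }
    return best;                        // empty if there were no candidates
}
```

Since every call to score means a full training run, this is only practical with a small candidate list and a Release build.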

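Before presenting the code, here is a rough explanation.

From the training data

train-images.idx3-ubyte   image data

train-labels.idx1-ubyte   correct labels for the image data

we build the 28 x 28 inputs (x) and the correct-label outputs (y), together with

an arbitrary set of hidden layers. The cost function C(w,b), whose variables are

the network's weights (w) and biases (b), is then made to decrease

by repeatedly updating the weights and biases. See here for details.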
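To make "decreasing the cost" a little more concrete, here is a minimal sketch of a single update step of the kind matrix_update_wb performs in the listing: each parameter moves against its summed gradient, scaled by eta divided by the mini-batch size. The function name gradient_step and the flat vector layout are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <vector>

// One gradient-descent step on a flat parameter vector:
//   p[i] <- p[i] - (eta / m) * grad_sum[i]
// grad_sum holds the gradients summed over the m samples of a mini-batch.
// Illustrative helper; the article's code does this per layer and per neuron.
void gradient_step(std::vector<double>& params,
                   const std::vector<double>& grad_sum,
                   double eta, std::size_t m)
{
    const double scale = eta / static_cast<double>(m);
    for (std::size_t i = 0; i < params.size(); ++i)
        params[i] -= scale * grad_sum[i];
}
```

With eta = 3.0 and a mini-batch size of 10, as hard-coded in the listing, each parameter moves by 0.3 times its summed gradient.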

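A weight in the neural network is written as f:id:hatakeka:20170623113805p:plain (that is, w^L_jk), where

L is the layer,

j is the neuron number in layer L, and

k is the neuron number in layer (L-1).

See the figure below.

Looking at a figure is probably the quickest way to understand a neural network.

If it does not click, I strongly recommend actually drawing it on paper.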

f:id:hatakeka:20170623162429p:plain

The activation vector is given by the following formula.

f:id:hatakeka:20170623105558p:plain

For details on the formula, see my earlier posts.

Python（Anaconda3）をインストールしscikit-learnでニューラルネットワーク - barus's diary

Python(Anaconda3)でDeepLearningPython35を使用してニューラルネットワークで手書き数字を認識する - barus's diary
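As I read network.py's feedforward, a = sigmoid(np.dot(w, a) + b), the activation of one layer can be sketched as below. This is an illustration only: layer_forward and the plain nested vectors are assumed names and types, not the functions used in the listing later in this article.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic sigmoid.
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Computes a_out[j] = sigmoid( sum_k w[j][k] * a_in[k] + b[j] ) for one layer.
// w has one row per neuron j of this layer and one column per neuron k of
// the previous layer. Illustrative helper, not part of the original sources.
std::vector<double> layer_forward(const std::vector<std::vector<double>>& w,
                                  const std::vector<double>& b,
                                  const std::vector<double>& a_in)
{
    std::vector<double> a_out;
    a_out.reserve(b.size());
    for (std::size_t j = 0; j < b.size(); ++j) {
        double z = b[j];                         // start from the bias
        for (std::size_t k = 0; k < a_in.size(); ++k)
            z += w[j][k] * a_in[k];              // weighted sum of inputs
        a_out.push_back(sigmoid(z));
    }
    return a_out;
}
```

Feeding the output of one call into the next call, layer by layer, gives the network's output for an input image.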

When writing this in C++, I think the biggest question is what data structure

to use for the biases and the weights f:id:hatakeka:20170706091012p:plain.

I prepared the following structs.

struct WEIGHT
{
	vector<vector<KATA>> J;
};

struct Delta
{
	vector<vector<KATA>> _biases;

	vector<WEIGHT> _weights;
};

For example, with

net = network.Network([784, 3, 4, 10])

that is, an input layer of 784, hidden layers of 3 and 4, and an output layer of 10,

the biases (_biases) are [array[3],array[4],array[10]]

Example:

[array([[ 1.60680811],
[-0.91776427],
[ 2.39356009]]),
array([[-0.36326466],
[-0.77759203],
[ 0.80462692],
[ 0.55610158]]),
array([[-2.89399282],
[ 1.12189747],
[-0.09063474],
[ 0.19409676],
[-0.52739619],
[-1.41494714],
[ 0.83719107],
[ 0.29296509],
[-0.9796348 ],
[-0.9919744 ]])]

so the biases are nested data like the above. The weights (_weights) are a little more complicated:

[[array[3]x784],[array[4]x3],[array[10]x4]]

Example:

[array([
[-0.51721912, -1.32663246, 0.61860472, ..., -0.14808348, 0.47365758, -1.67055342],
[-1.82324755, 2.45671693, -0.4074048 , ..., -1.22299543, -0.60028638, -0.72343723],
[ 0.13751237, -2.20392184, -0.5635813 , ..., 1.47505309, -1.08143278, 0.55418631]]),
array([[ 1.01398068, 0.71675172, -0.98611525],
[-0.91658198, -0.89796351, -0.51228011],
[ 0.53529015, 0.04383624, -0.41019621],
[-0.64914817, 1.29586205, -0.95359815]]),
array([[ 1.49524657, -0.11535072, -0.29128393, 0.60778149],
[ 0.22230999, 0.63995748, -0.43956315, -1.01877885],
[-0.37976193, 0.18921609, -1.20175081, -0.42824686],
[-0.27321251, 0.86174003, 0.51706354, -0.99385186],
[-0.5942666 , -0.72616386, 0.10107404, 1.81730394],
[ 1.72687002, -2.42612697, 1.57296988, -0.50415091],
[-0.36752011, -0.00892462, -1.78011645, -0.98530298],
[-0.26814078, -1.0660397 , -2.05035737, -0.7745915 ],
[ 0.34035524, 0.2669593 , -1.22003159, -0.70460095],
[-2.27451737, 0.94988465, -0.15863418, -2.20939092]])]

This is what the weights look like.
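To check the shapes against the structs, here is a minimal sketch that allocates zero-filled storage for a given list of layer sizes. It assumes KATA is an alias for double (the article does not show its definition), and make_zero_delta is a hypothetical helper, not a function from the listing.

```cpp
#include <cstddef>
#include <vector>

typedef double KATA;   // assumption: the article does not show how KATA is defined

struct WEIGHT {
    std::vector<std::vector<KATA>> J;   // J[j][k]: weight from neuron k (layer L-1) to neuron j (layer L)
};

struct Delta {
    std::vector<std::vector<KATA>> _biases;   // one vector per non-input layer
    std::vector<WEIGHT> _weights;             // one matrix per non-input layer
};

// Hypothetical helper: allocates zero-filled biases and weights whose shapes
// match the given layer sizes, e.g. {784, 3, 4, 10}.
Delta make_zero_delta(const std::vector<int>& sizes)
{
    Delta d;
    for (std::size_t l = 1; l < sizes.size(); ++l) {
        d._biases.push_back(std::vector<KATA>(sizes[l], 0.0));
        WEIGHT w;
        w.J.assign(sizes[l], std::vector<KATA>(sizes[l - 1], 0.0));
        d._weights.push_back(w);
    }
    return d;
}
```

For {784, 3, 4, 10} this yields bias vectors of length 3, 4 and 10 and weight matrices of 3 x 784, 4 x 3 and 10 x 4, matching the Python arrays shown above.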

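Once the data structures are settled, the program is more than half done! ( ﾟДﾟ)

Many reference books do not describe their data structures.

If the author has programming experience,

I think the data structures should be described first.

Neural Networks and Deep Learning, which I am following,

does not describe the data structures either, but it comes with Python code,

so they are self-evident from the code.

Without the Python code, I do not think I would have known what structure

to use for the weights f:id:hatakeka:20170706091012p:plain.

All that is left is to knead this data!

(It is only kneading, but the matrix handling is a real pain ( ﾟДﾟ))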

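Now create a console project (NNmodel) in VC++.

I created the following source and header files.

mnist.cpp mnist.h        ... reads the MNIST dataset

network.cpp network.h    ... network processing

neurons.cpp neurons.h    ... neuron structs

NNmodel.cpp              ... main processing

The resulting project looks roughly like this.

f:id:hatakeka:20170706100552p:plain

First, let me introduce mnist.cpp, which reads the training data.

It was presented in the previous article as an example of reading the training data (the MNIST dataset).

This time, training data (trainingdata) is picked at random from the dataset, about 100 items

from the front are taken as test data (test_data), and those items are then removed from the training data.

The struct and definitions are as follows.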

struct MNIST_compact
{
	int index; // position of this element in the dataset
	vector<vector<Pixel>> images; //< The training images
	vector<Label> labels; //< The training labels
};

vector<MNIST_compact> trainingdata; // training data

vector<MNIST_compact> test_data;    // test data

Both the training data and the test data use the MNIST_compact struct,

which holds the following information:

the position of the image in the dataset (index)

the image data (images)

the label (labels)
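The idea of shuffling, taking the first items as test data and removing them from the training data can be sketched generically as follows. split_off_test_set is a hypothetical helper written for this illustration; the listing below uses the author's own mnist.test_images_rnd and net.delete_test_images instead, whose implementations are not shown in this part.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: shuffles `items`, returns the first n_test elements as
// the test set, and erases them from `items` so that the remainder can be
// used as training data without overlap.
template <typename T>
std::vector<T> split_off_test_set(std::vector<T>& items, std::size_t n_test, unsigned seed)
{
    std::mt19937 rng(seed);                               // fixed seed: reproducible split
    std::shuffle(items.begin(), items.end(), rng);

    n_test = std::min(n_test, items.size());              // never take more than we have
    auto cut = items.begin() + static_cast<std::ptrdiff_t>(n_test);

    std::vector<T> test(items.begin(), cut);              // copy out the test set
    items.erase(items.begin(), cut);                      // keep only training data
    return test;
}
```

With 1,000 items and n_test = 100 this leaves 900 training items and 100 test items, and no item appears in both.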

As you may have noticed, I use vector heavily because it is convenient.

This may be the reason it is so slow,

but I prioritized getting it to work ( ﾟДﾟ)
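One likely contributor to the cost is that a vector passed by value is copied on every call, and several functions in the listing take vectors and structs by value. The toy sketch below only illustrates that effect: the Weights type and both functions are stand-ins invented for this example, not types from the project.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for the network's weight storage that counts its own copies.
struct Weights {
    std::vector<double> values;
    static int copies;
    explicit Weights(std::size_t n) : values(n, 0.0) {}
    Weights(const Weights& other) : values(other.values) { ++copies; }
};
int Weights::copies = 0;

// Pass by value: the whole vector is copied for every call.
double sum_by_value(Weights w)
{
    double s = 0.0;
    for (double v : w.values) s += v;
    return s;
}

// Pass by const reference: no copy, the caller's data is read in place.
double sum_by_reference(const Weights& w)
{
    double s = 0.0;
    for (double v : w.values) s += v;
    return s;
}
```

Calling sum_by_value once increments Weights::copies, while sum_by_reference leaves it unchanged. Whether this dominates the running time here would need profiling, so treat it as a hint rather than a diagnosis.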

Now let's look at the code.

NNmodel.cpp

// NNmodel.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

#include <stdio.h>
#include <stdlib.h> // exit()
#include <fcntl.h>

#include <iostream>  // for debug writing
#include <string>    // useful for reading and writing

#include <fstream>   // ifstream, ofstream
#include <sstream>   // istringstream


#include "mnist.h"
#include "network.h"


void nntest(vector<int> nets);

/*
VC++ for VS2017
Usage:
>NNmodel.exe 784 30 10
*/
int main(int argc, char **argv) {

	vector<int> nets;
	for (int i = 1; argc > i; i++)
	{
		nets.push_back(atoi(argv[i]));
	}

	//----------------------------
	// Neural network
	//----------------------------
	nntest(nets);



}


//----------------------------
// Neural network
//----------------------------
void nntest(vector<int> nets)
{

	//----------------------------
	// Create the network
	//#net = network.Network([784, 30, 10])
	//#net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
	//----------------------------
	network net;
	net.Network(nets);

	cout << "nntest1" << endl;

	net.print_weight(net._neurons[0]._weights);// print the weights
	net.print_biase(net._neurons[0]._biases);// print the biases


	cout << "nntest2" << endl;
	//----------------------------
	// Load the training data
	//----------------------------
	mnist mnist;
	int testdata_n = 1000; // upper limit on the number of training samples
	MNIST_dataset training_data;
	int index = 3;
	cout << "trainingdata reading.. index=" << index << endl;
	training_data.training_images = mnist.read_training_images("train-images.idx3-ubyte", index);
	training_data.training_labels = mnist.read_training_labels("train-labels.idx1-ubyte", index);
	// Assign an index to every image
	for (int i = 0; training_data.training_images.size() > i; i++)training_data.index.push_back(i);
	// Merge images and labels
	vector<MNIST_compact> trainingdata_ = net.marge(&training_data);
	// Shuffle (a default-constructed std::mt19937 always starts from the same seed)
	std::shuffle(trainingdata_.begin(), trainingdata_.end(), std::mt19937());
	// Take the first testdata_n items
	vector<MNIST_compact> trainingdata;
	for (int i = 0; testdata_n > i; i++) trainingdata.push_back(trainingdata_[i]);
	cout << "trainingdata.size=" << trainingdata.size() << endl;


	cout << "nntest3" << endl;
	//----------------------------
	// Create the test data:
	// pick n_test test images at random from the training data
	//----------------------------
	int n_test = 100;// number of test samples
	vector<MNIST_compact> test_data;
	test_data = mnist.test_images_rnd(trainingdata, n_test);
	cout << "nntest3.5" << endl;
	// Remove the selected test data from the training data
	net.delete_test_images(&trainingdata, test_data);

	cout << "nntest4" << endl;
	//----------------------------
	// Neural network
	//Train the neural network using mini-batch stochastic gradient descent.
	//
	//	net.SGD(training_data, 30, 10, 3.0, test_data = test_data)
	//	net.SGD(training_data, 3, 10, 3.0, test_data = test_data)
	//----------------------------
	net.SGD(trainingdata, 30, 10, 3.0, test_data);

}
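One detail of the listing above is worth a small sketch: std::shuffle is given a default-constructed std::mt19937, and every default-constructed engine produces the same sequence, so the shuffle is identical on every run. If a different order per run is wanted, one option is to seed a single engine once and reuse it. The helper names below are hypothetical and only illustrate the idea.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Returns an engine seeded once from std::random_device.
std::mt19937 make_seeded_engine()
{
    std::random_device rd;
    return std::mt19937(rd());
}

// Shuffles with a caller-owned engine, so successive calls continue the same
// random sequence instead of restarting it.
template <typename T>
void shuffle_with(std::vector<T>& items, std::mt19937& rng)
{
    std::shuffle(items.begin(), items.end(), rng);
}

// Shows that two default-constructed engines yield the same first value,
// which is why std::shuffle(..., std::mt19937()) is repeatable.
bool default_engines_agree()
{
    std::mt19937 a, b;
    return a() == b();
}
```

A repeatable shuffle is handy while debugging, so the original behaviour is not wrong; this is simply the alternative when run-to-run variation is desired.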

network.cpp

#include "stdafx.h"
#include "network.h"


network::network()
{

}


network::~network()
{
}


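vector<KATA> makerandom(int howmany)
{
	//------------------------
	// Random number generation
	//------------------------
	std::random_device rnddev;

	// Use the Mersenne Twister
	std::mt19937 mt(rnddev());

	// Uniform random numbers in [-1, 1).
	// Note: network.py initializes with np.random.randn (standard normal),
	// so this initialization differs from the Python original.
	std::uniform_real_distribution<double> rnd(-1, 1);
	
	vector<KATA> data;
	data.reserve(howmany);
	for (int i = 0; howmany > i; i++)data.push_back(rnd(mt));

	return data;

}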

void network::Network(vector<int> nets)
{
	neurons neurons;
	
	int i = 0;
	

	//----------------------
	// Number of layers
	//----------------------
	neurons._num_layers = nets.size();
	for (i = 0; neurons._num_layers > i; i++)cout << "i=" << nets[i] << endl;

	//----------------------
	// Number of neurons in each layer
	//net.network(784, 3, 8, 10);
	//----------------------
	for (i = 0; neurons._num_layers > i; i++)
	{
		neurons._sizes.push_back(nets[i]);
	}
	
	//----------------------
	// Biases: one vector per non-input layer
	//3,   8,   10
	//----------------------
	for (int i=1; neurons._sizes.size()>i; i++)
	{
		vector<KATA> biase_j;
		biase_j = makerandom(neurons._sizes.at(i));
		neurons._biases.push_back(biase_j);
	}
	
	//----------------------
	// Weights: for each non-input layer, j rows (neurons of that layer)
	// by k columns (neurons of the previous layer)
	//              k  k  k     
	//                 j  j  j
	//net.network(784, 3, 8, 10);
	//  j x k
	//3x784, 8x3, 10x8
	//----------------------
	cout << "making weight.." << endl;
	for (int j = 1; neurons._sizes.size() > j; j++)
	{
		WEIGHT weight_j;
		for (int k = 0; neurons._sizes.at(j) > k; k++)
		{
			vector<KATA> weight_k;
			weight_k = makerandom(neurons._sizes.at(j-1));
			weight_j.J.push_back(weight_k);
		}
		neurons._weights.push_back(weight_j);
	}

	_neurons.push_back(neurons);

}



// Print the weights
void network::print_weight(vector<WEIGHT> wei) {

	cout << "Weights" << endl;
	for (int L = 0; wei.size()>L; L++)
	{
		cout << "array([" << endl;
		WEIGHT weight = wei[L];
		for (int j = 0; weight.J.size()>j; j++)
		{
			cout << "[";
			vector<KATA> weight2 = weight.J[j];
			for (int k = 0; weight2.size()>k; k++)
			{
				cout << weight2[k] << ",";
			}
			cout << "]" << endl;
		}
		cout << "])" << endl;
	}

	cout << endl;

}

// Print the biases of the specified layer
void network::print_biase(int Layer, Delta biase) {

	if (Layer == 0)	// 0 means "print every layer"
	{
		print_biase(biase._biases);
		return;
	}

	int d = biase._biases.size();

	if (Layer < 0)d = d + Layer;
	else d = Layer;


	int L = d;//for (int L = 0; biase.size()>L; L++)
	{
		cout << "Biases of layer " << d << " (" << biase._biases[L].size() << " elements)" << endl;
		cout << "array([" << endl;
		for (int j = 0; biase._biases[L].size()>j; j++)
		{
			cout << biase._biases[L][j] << ",";
		}
		cout << "])" << endl;
	}

	cout << endl;

}


// Print the biases of every layer
void network::print_biase(vector<vector<KATA>> biase){
	cout << "Biases" << endl;
	
	for (int L = 0; biase.size()>L; L++)
	{
		cout << "array([" << endl;
		vector<KATA> biases = biase[L];
		for (int j = 0; biases.size()>j; j++)
		{
			cout << biases[j] << ",";
		}
		cout << "])" << endl;
	}

	

	cout << endl;

}


/*
Train the neural network using mini-batch stochastic
gradient descent.  The ``training_data`` is a list of tuples
``(x, y)`` representing the training inputs and the desired
outputs.  The other non-optional parameters are
self-explanatory.  If ``test_data`` is provided then the
network will be evaluated against the test data after each
epoch, and partial progress printed out.  This is useful for
tracking progress, but slows things down substantially.

(Docstring from the Python source.)
*/
void network::SGD(vector<MNIST_compact> training_data,	// training data
	int epochs,											// number of epochs
	int mini_batch_size,								// mini-batch size
	double eta,											// learning rate
	vector<MNIST_compact> test_data)					// test data
{

	//----------------------------------------------
	// Number of training images
	//----------------------------------------------
	int n = training_data.size();
	cout << "number of training images=" << n << endl;

	//----------------------------------------------
	// Number of test images
	//----------------------------------------------
	int n_test = test_data.size();
	cout << "number of test images=" << n_test << endl;


	//----------------------------------------------
	// Training loop (one iteration per epoch)
	//----------------------------------------------
	_strEpochresult[0] = '\0';
	vector<string> rlt;
	for (int i = 0; epochs > i; i++)
	{
		cout << "epochs:" <<  i << "/" << epochs << endl;
		//----------------------------------------------
		// Shuffle
		//----------------------------------------------
		std::shuffle(training_data.begin(), training_data.end(), std::mt19937());

		//----------------------------------------------
		// Step from 0 to n in strides of mini_batch_size and
		// slice out mini_batch_size elements each time
		//mini_batches = [
		//		training_data[k:k + mini_batch_size]
		//		for k in range(0, n, mini_batch_size)]
		//----------------------------------------------
		vector<vector<MNIST_compact>> mini_batches;
		for (int j = 0; n > j; j += mini_batch_size)
		{
			vector<MNIST_compact> mini_batch;
			for (int k = 0; mini_batch_size > k; k++)
			{
				if (k+j > n - 1)break;
				mini_batch.push_back(training_data.at(k + j));
			}
			if(j % 1000 == 0)fprintf(stderr, "make mini_batches %d\r", j);
			mini_batches.push_back(mini_batch);
		}

		cout << "mini_batches.size()=" << mini_batches.size() << endl;
		//----------------------------------------------
		// Update with each mini-batch
		//for mini_batch in mini_batches :
		//	self.update_mini_batch(mini_batch, eta)
		// update_mini_batch() already loops over every sample in the batch,
		// so it is called exactly once per mini-batch, as in network.py.
		// (Calling it once per sample would apply the same update
		// mini_batch.size() times.)
		//----------------------------------------------
		Delta delta;
		for (int j = 0; mini_batches.size() > j; j++)
		{
			//if(j % 50 == 0)fprintf(stderr, "update_mini_batch %d\r", j);
			delta = update_mini_batch(mini_batches.at(j), eta);
		}
		sprintf(_strEpochresult, "Epoch {%d} : {%d} / {%d}\n", i, evaluate(test_data), n_test);
		rlt.push_back(_strEpochresult);
		
		//print_weight(delta._weights);	// print the weights
		//print_biase(delta._biases);		// print the biases
	}

	cout << "complete." << endl;
	for (int i = 0; rlt.size() > i; i++)cout << rlt[i].c_str();

}

/*
Return the number of test inputs for which the neural
network outputs the correct result. Note that the neural
network's output is assumed to be the index of whichever
neuron in the final layer has the highest activation.
(Docstring from the Python source.)

	//np.argmax: returns the index of the largest element
	//y = convert_to_arry(mini.labels.at(i)); // convert the label into a one-hot array

	def evaluate(self, test_data):
		test_results = [(np.argmax(self.feedforward(x)), y)
		                for (x, y) in test_data]
		return sum(int(x == y) for (x, y) in test_results)

	def feedforward(self, a):
		"""Return the output of the network if ``a`` is input."""
		for b, w in zip(self.biases, self.weights):
			a = sigmoid(np.dot(w, a)+b)
		return a

*/
// Evaluate the test data: returns the number of correct answers
int network::evaluate(vector<MNIST_compact> test_data)
{
	vector<TEST_RESLT> test_results;
	for (int i = 0; test_data.size() > i; i++)
		test_results.push_back(feedforward(test_data[i]));
	
	// Count the correct answers
	int sum = 0;
	for (int i = 0; test_results.size() > i; i++)
		if (test_results[i].x == test_results[i].y)sum++;
	return sum;
}

// Feed one test sample through the network and return (prediction, label)
TEST_RESLT network::feedforward(MNIST_compact mini)
{
	TEST_RESLT rlt;
	vector<double> activation; //=x
	//vector<double> y; // correct answer

	// Input data (one sample)
	for (int i = 0; mini.images.size() > i; i++)
	{
		for (int j = 0; mini.images[i].size() > j; j++)
		{
			double data = static_cast<double>(mini.images[i][j]);
			if (data > 0)data = 1;// binarize: any non-zero pixel becomes 1
			else data = 0;
			activation.push_back(data);
		}
	}

	//z = np.dot(w, activation) + b
	vector<vector<double>> activations;
	auto itr_neurons = _neurons.begin();
	neurons neuron = *itr_neurons;
	// Propagate the activation through to the last layer.
	for (int layer = 0; neuron._biases.size() > layer; layer++)
	{
		vector<double> z;
		// the np.dot(w, a) + b part of a = sigmoid(np.dot(w, a) + b)
		matrix_dot(layer, &z, &activation, neuron);
		activation.clear();  // erase everything from first to last
		// the sigmoid(...) part of a = sigmoid(np.dot(w, a) + b)
		matrix_sigmoid(&activation, z);	// sigmoid of z becomes this layer's activation
	}
	
	// Output
	double max = 0;
	int index_y = 0;
	cout << "activation." << endl;

	//-----------------------------------------
	// Print the test sample
	cout << "test data" << endl;
	for (int i = 0; mini.images.size() > i; i++)
	{
		print_images(mini.images.at(i), mini.labels.at(i));
		cout << mini.index << endl;
	}
	cout << "--" << endl;
	//-----------------------------------------
	for (int i = 0; activation.size() > i; i++)
	{
		if (max < activation[i]) { max = activation[i]; index_y = i; }
		printf("%f,", activation[i]);
	}
	cout << endl;
	
	rlt.x = index_y;		// network's prediction (index of the largest activation)
	rlt.y = mini.labels[0]; // correct answer; a test item holds exactly one label

	printf("prediction=%d, answer=%d\n", rlt.x, rlt.y);
	cout << _strEpochresult << endl;

	return rlt;
}

/*
Update the network's weights and biases by applying
gradient descent using backpropagation to a single mini batch.
The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
is the learning rate.
(Docstring from the Python source.)

struct MNIST_compact
{
int index;						// position of this element in the dataset
vector<vector<Pixel>> images;	//< The training images
vector<Label> labels;			//< The training labels
};
*/
Delta network::update_mini_batch(vector<MNIST_compact> mini_batch, double eta)
{

	Delta delta;


	//-------------------------------------
	// Initialize nabla_b and nabla_w to zero
	//nabla_b = [np.zeros(b.shape) for b in self.biases]
	//nabla_w = [np.zeros(w.shape) for w in self.weights]
	//-------------------------------------
	auto itr_neurons = _neurons.begin();
	neurons neuron = *itr_neurons;
	delta._biases  = neuron._biases;
	delta._weights = neuron._weights;
	// Copying gives delta the right shapes; now set every element to 0.
	matrix_zeros(&delta);
	//print_weight(delta._weights);
	//print_biase(delta._biases);

	//-------------------------------------
	// x is the input, y is the desired output
	//for x, y in mini_batch :
	//   delta_nabla_b, delta_nabla_w = self.backprop(x, y)
	//-------------------------------------
	for (int i = 0; mini_batch.size() > i; i++)
	{
		Delta delta_nabla = backprop(mini_batch.at(i));
		matrix_plus(&delta, delta_nabla);
	}

	// Update the network's biases and weights
	matrix_update_wb(delta, eta, mini_batch.size());

	// Print the updated biases and weights
//	print_weight(_neurons[0]._weights);
//	print_biase(_neurons[0]._biases);

	return delta;
}


// Update the network's biases and weights:
//   b <- b - (eta / mini_batch_len) * nabla_b
//   w <- w - (eta / mini_batch_len) * nabla_w
void network::matrix_update_wb(Delta delta, double eta, int mini_batch_len)
{
	for (int i = 0; delta._weights.size() > i; i++) // layer
	{
		for (int j = 0; delta._weights.at(i).J.size() > j; j++) // neuron j of the layer
		{
			double nb = delta._biases[i][j];
			double b  = _neurons[0]._biases[i][j];
			_neurons[0]._biases[i][j] = b - (eta / mini_batch_len)*nb;// update the bias
			for (int k = 0; delta._weights[i].J[j].size() > k; k++)// weight k of neuron j
			{
				double nw = delta._weights[i].J[j][k];
				double w = _neurons[0]._weights[i].J[j][k];
				_neurons[0]._weights[i].J[j][k] = w - (eta / mini_batch_len)*nw;// update the weight
			}
		}
	}
}

Delta network::backprop(MNIST_compact mini)
{
	Delta delta;
	//-------------------------------------
	// Initialize nabla_b and nabla_w to zero
	//nabla_b = [np.zeros(b.shape) for b in self.biases]
	//nabla_w = [np.zeros(w.shape) for w in self.weights]
	//-------------------------------------
	auto itr_neurons = _neurons.begin();
	neurons neuron = *itr_neurons;
	delta._biases = neuron._biases;
	delta._weights = neuron._weights;
	// Copying gives delta the right shapes; now set every element to 0.
	matrix_zeros(&delta);
	

	vector<double> activation;
	vector<double> y; // desired output (one-hot)
	// Training data (one sample)
	for (int i = 0; mini.images.size() > i; i++)
	{
		for (int j = 0; mini.images[i].size() > j; j++)
		{
			double data = static_cast<double>(mini.images[i][j]);
			if (data > 0)data = 1;// binarize: any non-zero pixel becomes 1
			else data = 0;
			activation.push_back(data);
		}
		y = convert_to_arry(mini.labels.at(i));
	}
	/*
	  //-----------------------------------------
	  // Print the training sample
		cout << "mini.images" << endl;
		for (int i = 0; mini.images.size() > i; i++)
		{
			print_images(mini.images.at(i), mini.labels.at(i));
			cout << mini.index << endl;
		}
		cout << "--" << endl;
	  //-----------------------------------------
	*/

	/*
	//-------------------------------------
	//	Forward pass: compute z and the activation for every layer
	//	except the input layer, and store the z value of
	//	layers 2, 3, ... L with zs.append(z)
	//	for b, w in zip(self.biases, self.weights) :
	//		z = np.dot(w, activation) + b
	//		zs.append(z)
	//		activation = sigmoid(z)
	//		activations.append(activation)
	//-------------------------------------
	*/
	vector<vector<double>> zs;				// z values of layer 1 through layer L
	vector<vector<double>> activations;		// activations of the input layer through layer L
	activations.push_back(activation);
	for (int i = 0; neuron._biases.size() > i; i++)
	{
		
		vector<double> z;
		// for layer i: z_j = sum_k(w_jk * a_k) + b_j
		matrix_dot(i, &z, &activation, neuron);
		zs.push_back(z);
		activation.clear();  // erase everything from first to last
		matrix_sigmoid(&activation, z);	// sigmoid of z becomes this layer's activation
		activations.push_back(activation);
	}

	/*
	  //-----------------------------------------
		# backward pass
			delta = self.cost_derivative(activations[-1], y) * sigmoid_prime(zs[-1])
			nabla_b[-1] = delta
			nabla_w[-1] = np.dot(delta, activations[-2].transpose())
	
		y is the correct answer
		Python's transpose() transposes a matrix:
		arr ==
		[[0  1  2  3]
		 [4  5  6  7]
		 [8  9 10 11]
		 [12 13 14 15]]
		arr.transpose() == 
		[[0  4  8 12]
		 [1  5  9 13]
		 [2  6 10 14]
		 [3  7 11 15]]
	  //-----------------------------------------
	*/
	vector<double> out_z;
	vector<double> out_activation;
	vector<double> delta_c, sigmoidprime;

	out_z = matrix_Layer_z(-1, zs); //zs.back();	// z of the last layer
	out_activation = activations.back();			// activation of the last layer
	sigmoidprime = matrix_sigmoid_prime(out_z);

	//delta = self.cost_derivative(activations[-1], y) * sigmoid_prime(zs[-1])
	delta_c = cost_derivative(out_activation, y, sigmoidprime);

	// Set the gradients of the last layer's biases and weights
	//nabla_b[-1] = delta
	matrix_Layer_update_biases(-1, &delta, delta_c); // bias gradient (delta._biases) of the last layer
	
	//nabla_w[-1] = np.dot(delta, activations[-2].transpose())
	vector<double> atctivation_front = matrix_Layer_Atctivation(-2, activations);
	matrix_Layer_update_weights(-1, &delta, delta_c, atctivation_front);// weight gradient (delta._weights) of the last layer

	// Print the updated bias gradient
	//print_biase(-1, delta);
	
	/*
	  //-----------------------------------------
		Note that the variable l in the loop below is used a little
		differently to the notation in Chapter 2 of the book. Here,
		l = 1 means the last layer of neurons, l = 2 is the
		second-last layer, and so on. It's a renumbering of the
		scheme in the book, used here to take advantage of the fact
		that Python can use negative indices in lists.
		(Comment from the Python source.)

		for l in range(2, self.num_layers):
			z = zs[-l]
			sp = sigmoid_prime(z)
			delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
			nabla_b[-l] = delta
			nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
	  //-----------------------------------------	*/

	// The last layer's gradients were set above, so now work
	// backwards from the second-last layer down to the first.
	int len = neuron._biases.size();
	// The error must be propagated through the network's own weights
	// (self.weights[-l+1]), not through the gradients stored in `delta`.
	Delta net_w;
	net_w._weights = neuron._weights;
	for (int layer = 2; len + 1> layer; layer++)
	{
		//cout << "layer=" << layer << endl;
		vector<double> out_z2;
		vector<double> out_activation2;
		vector<double> sigmoidprime2;
		out_z2 = matrix_Layer_z(-layer, zs);
		sigmoidprime2 = matrix_sigmoid_prime(out_z2);

		// Transpose of self.weights[-l+1]
		Delta delta_t = matrix_transpose_weight(-layer + 1, net_w);
		//delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
		matrix_Layer_update_delta_c(-layer + 1, delta_t, &delta_c, sigmoidprime2);// update delta
		//nabla_b[-l] = delta
		matrix_Layer_update_biases(-layer, &delta, delta_c); // bias gradient (delta._biases) of this layer
		//nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
		// The transposed activation is a 1 x N row, so the plain vector is used as is.
		vector<double> atctivation_front2 = matrix_Layer_Atctivation(-layer-1, activations);
		matrix_Layer_update_weights(-layer, &delta, delta_c, atctivation_front2);// weight gradient (delta._weights) of this layer
	}


	return delta;
}


// Resolve a layer index; negative values count from the end, as in Python
int network::pLayer(int Layer)
{
	int d = _neurons[0]._biases.size();

	if (Layer < 0)d = d + Layer;
	else d = Layer;
	return d;

}


// Matrix transpose of the weights of one layer.
// The elements are visited in row-major order with a running index p;
// x = p % c (source column) and y = p / c (source row) are recovered from p
// and the value is stored at [x][y] of the transposed matrix.
// The result is returned in rlt._weights[0].
Delta network::matrix_transpose_weight(int Layer, Delta delta)
{
	
	Delta rlt;
	vector<KATA> dumy;

	int d = pLayer(Layer);// resolved layer index
	
	if (d < 0)return rlt;

	
	// Weights
	int L = d; //for (int L = 0; delta._weights.size() > L; L++)	// layer L
	{
		/*
		for (int j = 0; delta._weights[L].J.size() > j; j++)		// neuron j of layer L
			for (int k = 0; delta._weights[L].J[j].size() > k; k++)	// neuron k of layer L-1
				delta._weights[L].J[j][k];
		*/
		int r = delta._weights[L].J.size();
		int c = delta._weights[L].J[0].size();// every row J[j] has the same length, so use row 0

	//	printf("row=%d, col=%d\n", r, c);

		int maxlen = r;
		//------------------------
		// Allocate memory for the transposed matrix
		//------------------------
		rlt._weights.resize(1);
		rlt._weights[0].J.resize(c);
		for (int j = 0; c > j; j++)		// rows of the transposed matrix
			rlt._weights[0].J[j].resize(r);// columns of the transposed matrix

	//	cout << "weight." << endl;

		int p = 0;
		for (int j = 0; r > j; j++)		// neuron j of layer L
			for (int k = 0; c > k; k++)	// neuron k of layer L-1
			{
				double d = delta._weights[L].J[j][k];
				int x = p % c;
				int y = p / c;
				p++;
			//	printf("p=%d [%02d][%02d]=%f\n", p, x, y, d);
				rlt._weights[0].J[x][y] = d;
			}
	}
	return rlt;
}


// Propagate delta back through the specified layer.
// delta_t holds the transposed weight matrix in delta_t._weights[0] only.
void network::matrix_Layer_update_delta_c(int Layer, Delta delta_t, vector<double> *delta_c, vector<double> sigmoidprime2)
{
	vector<double> rlt_delta_c = *delta_c;	// delta of the following layer
	delta_c->clear();

	/*
		for l in range(2, self.num_layers):
			z = zs[-l]
			sp = sigmoid_prime(z)
			delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
			nabla_b[-l] = delta
			nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
	*/
	// delta_new[j] = ( sum_k w_t[j][k] * delta_old[k] ) * sp[j]
	// The products must be summed over k (that is what np.dot does), giving
	// exactly one value per neuron j of this layer.
	for (int j = 0; delta_t._weights[0].J.size() > j; j++)		// neuron j of this layer
	{
		double s = sigmoidprime2[j];
		double sum = 0.0;
		for (int k = 0; delta_t._weights[0].J[j].size() > k; k++)	// neuron k of the following layer
		{
			double w = delta_t._weights[0].J[j][k];
			double d = rlt_delta_c.at(k);
			sum += w*d;
		}
		delta_c->push_back(sum*s);
	}

}



// Set the weight gradient of the specified layer in delta.
// The activation of layer L-1 (atctivation_front) is used for this.
// Layer 0,1,2,...  counts from the first layer
//      -1,-2,...   counts from the last layer
void network::matrix_Layer_update_weights(int Layer, Delta *delta, vector<double> delta_c, vector<double> atctivation_front)
{
	
	int d = pLayer(Layer);// resolved layer index

	// Weights
	int L = d;//for (int i = 0; delta->_weights.size() > i; i++)	// layer L
	{
		for (int j = 0; delta->_weights[L].J.size() > j; j++)		// neuron j of layer L
		{
			double c = delta_c[j];// delta of neuron j of layer L
			for (int k = 0; delta->_weights[L].J[j].size() > k; k++)	// neuron k of layer L-1
			{
				delta->_weights[L].J[j][k] = c*atctivation_front[k];
			}
		}
	}
}




// Get the z values of the specified layer (negative indices count from the end)
vector<double> network::matrix_Layer_z(int Layer, vector<vector<double>> z)
{

	vector<double> rlt;

	
	int d = z.size();
	if (Layer < 0)d = d + Layer;
	else d = Layer;
	
	int i = d;//for(int i=0; activations.size()>i; i++)	//L層
	{
		//for (int j = 0; activations[i].size() > j; j++)	
		rlt = z[i];
	}

	return rlt;

	
}


// Get the activation of the specified layer
vector<double> network::matrix_Layer_Atctivation(int Layer, vector<vector<double>> activations)
{
	vector<double> rlt;

	rlt = matrix_Layer_Cut(Layer, activations);

	return rlt;
}


// Get the specified layer from a vector<vector<double>> (negative indices count from the end)
vector<double> network::matrix_Layer_Cut(int Layer, vector<vector<double>> activations)
{
	vector<double> rlt;

	int d = activations.size();
	if (Layer < 0)d = d + Layer;
	else d = Layer;

	int i = d;//for(int i=0; activations.size()>i; i++)	//L層
	{
		//for (int j = 0; activations[i].size() > j; j++)	
		rlt = activations[i];
	}

	return rlt;
}



// Set the bias gradient of the specified layer in delta.
// Layer 0,1,2,...  counts from the first layer
//      -1,-2,...   counts from the last layer
void network::matrix_Layer_update_biases(int Layer, Delta *delta, vector<double> delta_c)
{
	
	int d = pLayer(Layer);// resolved layer index

	int L = d;//for (int i = 0; delta->_biases.size() > i; i++) // layer L
	{
		delta->_biases[L] = delta_c;
	}
}


// Output-layer error: (a - y) * sigmoid_prime(z) for each output neuron
vector<double> network::cost_derivative(vector<double> out_activation, vector<double> y, vector<double> sigmoidprime)
{
	vector<double> rlt;

	for (int j = 0; y.size() > j; j++)			// each neuron j of the output layer
	{
		double outy = y[j];				// desired output
		double output_activation = out_activation[j];
	    
		rlt.push_back((output_activation - outy)*sigmoidprime[j]);		// error of neuron j
	}
	return rlt;

}

vector<double> network::matrix_sigmoid_prime(vector<double> zs) {

	vector<double> rlt;
	for (int i = 0; zs.size() > i; i++)			
	{
		double zs_val = sigmoid_prime(zs[i]);
		rlt.push_back(zs_val);
	}
	return rlt;
}

//Convert a training label (0-9) into a one-hot vector<double> of length 10
vector<double> network::convert_to_arry(int label)
{
	vector<double> rlt;
	for (int i = 0; 10 > i; i++)
	{
		if (label == i)rlt.push_back(1);
		else rlt.push_back(0);
	}
	return rlt;
}





/*
sigmoid z=[[-1.38832798]
[-1.66134227]
[ 0.93018146]
[ 0.55425694]]
*/
void network::matrix_sigmoid(vector<double> *activation, vector<double> z) {
	
	for (int j = 0; z.size() > j; j++)	//each neuron j of layer L
	{
		double zz = z[j];						//weighted input z_j
		activation->push_back(sigmoid(zz));//append sigmoid(z_j) as the next activation
	}
	}
	
}

//Compute z_j = sum_k(w_jk * a_k) + b_j for every neuron j of layer l
void network::matrix_dot(int l, vector<double> *z, vector<double> *activation, neurons neuron)
{
	int L = l;//for (int J = 0; weigth->_weights.size() > J; J++) //layer L
	{
		//cout << "layer L=" << L << " has " << neuron._biases[L].size() << " elements" << endl;
		//cout << "number of activations in the previous layer: " << activation->size() << endl;
		for (int j = 0; neuron._biases[L].size() > j; j++) //each neuron j of layer L
		{
			double w, a, b;
			double sum=0.0;
			for (int k = 0; activation->size() > k; k++)	//each activation feeding into neuron j
			{
				w = neuron._weights[L].J[j][k];		//weight k of neuron j in layer L
				a = activation->at(k);				//activation of neuron k in layer L-1
				sum += w*a;
			}
			b = neuron._biases[L][j];			//bias of neuron j in layer L
			z->push_back(sum + b);				//weighted input z_j; sigmoid(z_j) becomes the next activation
		}
	}

}


/*
	def sigmoid_prime(z):
	"""Derivative of the sigmoid function."""
	return sigmoid(z)*(1-sigmoid(z))
*/
double network::sigmoid_prime(double z) {
	return sigmoid(z)*(1 - sigmoid(z));
}


/**
 Compute the sigmoid function

 """The sigmoid function."""
 return 1.0/(1.0+np.exp(-z))
*/
double network::sigmoid(double z) {
	return 1.0 / (1.0 + exp(-z));
}

//Set every element of delta to 0.
void network::matrix_zeros(Delta *delta)
{

	//biases
	for (int i = 0; delta->_biases.size() > i; i++)
		for (int j = 0; delta->_biases[i].size() > j; j++)
			delta->_biases[i][j] = 0;

	//weights
	for (int i = 0; delta->_weights.size() > i; i++)
	{
		for (int j = 0; delta->_weights[i].J.size() > j; j++)
			for (int k = 0; delta->_weights[i].J[j].size() >k; k++)
				delta->_weights[i].J[j][k] = 0;
	}
}


//Add delta_nabla to nabla element by element.
//nabla and delta_nabla are assumed to have identical dimensions.
void network::matrix_plus(Delta *nabla, Delta delta_nabla)
{

	for (int i = 0; nabla->_weights.size() > i; i++) //layer L
	{
		for (int j = 0; nabla->_weights.at(i).J.size() > j; j++) //neuron j of layer L
		{
			nabla->_biases.at(i).at(j) += delta_nabla._biases.at(i).at(j);
			for (int k = 0; nabla->_weights.at(i).J.at(j).size() > k; k++)//weight k of neuron j in layer L
				nabla->_weights.at(i).J.at(j).at(k) += delta_nabla._weights.at(i).J.at(j).at(k);
		}
	}
}




vector<MNIST_compact> network::marge(MNIST_dataset* training_data)
{
	cout << "marge()" << endl;
	//merge image, label, and index into a single struct per sample
	vector <MNIST_compact> trainingdata;
	for (int i = 0; training_data->training_images.size() > i; i++)
	{
		MNIST_compact com;
		com.images.push_back(training_data->training_images.at(i));
		com.labels.push_back(training_data->training_labels.at(i));
		com.index = training_data->index.at(i);
		trainingdata.push_back(com);
		if(i % 1000 == 0)fprintf(stderr, "%d\r", i);
	}

	//clear the source arrays now that everything has been copied
	training_data->training_images.clear();
	training_data->training_labels.clear();
	training_data->index.clear();

	return trainingdata;
}

//Remove the selected test samples from the training data
void network::delete_test_images(vector<MNIST_compact> *traingdata, vector<MNIST_compact> test_data)
{
	cout << "delete_test_images()" << endl;
	for (int i = 0; test_data.size() > i; i++)
	{
		cout << "i=" << i << endl;
		int index = test_data[i].index;
		cout << "index=" << index << endl;
		for (int j = 0; traingdata->size() > j; j++)
		{
			if (traingdata->at(j).index == index) {
				cout << " delete " << index << endl;
				traingdata->erase(traingdata->begin() + j);
				break;
			}
		}
	}
}

network.h

#pragma once
#include "neurons.h"
#include "mnist.h"

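/*
  Builds the neural network.
  network net;
  net.Network({784, 3, 8, 10});
  creates Input 784, hidden layers 3 and 8, Output 10.

  Neurons are created from the second element onward.
  neurons neu;
  neu._num_layers  //number of layers  4
  neu._sizes       //layer sizes       784, 3, 8, 10
  neu._biases      //biases            3,   8,   10
  neu._weights     //weights           784x3, 3x8, 8x10

  neu.zs();		   //creates the elements for the activations of the previous layer l-1
  neu.sig();	   //creates the elements that hold the sig of the activations of the previous layer l-1
*/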

#include <random>
#include <algorithm>

#include <string>
#include <iostream>
#include <fstream>
#include <vector>
using namespace std;

vector<KATA> makerandom(int howmany);

struct TEST_RESLT {
	int x;
	int y;
};

class network :
	public neurons,mnist
{
public:
	char _strEpochresult[200];

	vector<neurons> _neurons;

	network();
	~network();

	void Network(vector<int> nets);
	//net.SGD(training_data, 3, 10, 3.0, test_data = test_data)
	void SGD(vector<MNIST_compact> training_data,	//training data
			 int epochs,	//number of epochs (training passes)
			 int mini_batch_size,	//mini-batch size
			 double eta,	//learning rate
			 vector<MNIST_compact> test_data);//test data
	
	//Update the network's weights and biases from one mini-batch
	Delta update_mini_batch(vector<MNIST_compact> mini_batch, double eta);

	//Input x=mini.images, Output y=mini.labels
	Delta backprop(MNIST_compact mini);
	
	//Update delta_c for the specified layer L
	void matrix_Layer_update_delta_c(int Layer, Delta delta, vector<double> *delta_c, vector<double> sigmoidprime2);

	//Compute delta (the output-layer error)
	vector<double> cost_derivative(vector<double> out_activation, vector<double> y, vector<double> sigmoidprime);

	//Set the bias gradients of layer L
	void matrix_Layer_update_biases(int Layer, Delta *delta, vector<double> delta_c);

	//Set the weight gradients of layer L
	void matrix_Layer_update_weights(int Layer, Delta *delta, vector<double> delta_c, vector<double> atctivation_front);
	
	//Update the network's biases and weights
	void matrix_update_wb(Delta delta, double eta, int mini_batch_len);

	//Evaluate the test data
	int evaluate(vector<MNIST_compact> test_data);

	//Evaluate a single test sample (feedforward pass)
	TEST_RESLT feedforward(MNIST_compact com);

	//Matrix transpose
	Delta matrix_transpose_weight(int Layer, Delta delta);

	//Return the activations of the specified layer
	vector<double> matrix_Layer_Atctivation(int Layer, vector<vector<double>> activations);

	//Return z of the specified layer
	vector<double> matrix_Layer_z(int Layer, vector<vector<double>> z);

	//Return the entry for the specified layer from a vector<vector<double>> (used to pick from activations and zs)
	vector<double> matrix_Layer_Cut(int Layer, vector<vector<double>> activations);

	//Convert a training label into a one-hot vector<double>
	vector<double> convert_to_arry(int label);

	//Update the activation
	void matrix_sigmoid(vector<double> *activation, vector<double> z);

	//sigmoid' of each z, used in the backward pass when computing delta_c
	vector<double> matrix_sigmoid_prime( vector<double> zs);

	//Sigmoid function
	double sigmoid(double z);
	
	//Derivative of the sigmoid function
	double sigmoid_prime(double z);

	//Set every element of delta to 0.
	void matrix_zeros(Delta *delta);

	//Add delta_nabla to the biases and weights of nabla.
	void matrix_plus(Delta *nabla, Delta delta_nabla);

	//Compute z_j = sum_k(w_jk * a_k) + b_j for layer L
	void matrix_dot(int L, vector<double> *z, vector<double> *activation, neurons neuron);

	//Remove the selected test samples from the training data
	void delete_test_images(vector<MNIST_compact> *traingdata, vector<MNIST_compact> test_data);

	//Merge the training images and labels into a single struct
	vector<MNIST_compact> marge(MNIST_dataset* training_data);

	//Print the weights
	void print_weight(vector<WEIGHT> wei);

	//Print the biases
	void print_biase(vector<vector<KATA>> biase);

	//Print the biases of the specified layer L
	void print_biase(int Layer, Delta biase);

	//Resolve the index of the current layer
	int pLayer(int Layer);

//	void print_weight();
//	void print_biase();

};

neurons.cpp

#include "stdafx.h"
#include "neurons.h"


neurons::neurons()
{
}


neurons::~neurons()
{
}

void neurons::Network(char** argv)
{


}

neurons.h

#pragma once
//#include "mystr.h"
/*
 Elements that make up the neural network
*/

#include <string>
#include <iostream>

#include <vector>


using namespace std;

#define KATA double

struct WEIGHT
{
	vector<vector<KATA>> J;
};

struct Delta
{
	vector<vector<KATA>> _biases;	//per layer, from the 2nd layer on: [array[3],array[8],array[10]]
	vector<WEIGHT> _weights;		//per layer, from the 2nd layer on: [array[3]x784,array[8]x3,array[10]x8]
};


class neurons
{
public:
	/*
	net = network.Network([784, 3, 4, 10])
	Biases are created with 3, 4, and 10 elements.
	print ("sizes={0}".format( sizes ))
	sizes=[784, 3, 4, 10]
	print ("init len(sizes)={0}".format(len(sizes)))
	init len(sizes)=4
	for y in sizes[1:]:print ("sizes={0}".format(y))
	sizes=3
	sizes=4
	sizes=10


	print (self.biases) #debug add
	[array([[ 1.60680811],
	[-0.91776427],
	[ 2.39356009]]),
	array([[-0.36326466],
	[-0.77759203],
	[ 0.80462692],
	[ 0.55610158]]),
	array([[-2.89399282],
	[ 1.12189747],
	[-0.09063474],
	[ 0.19409676],
	[-0.52739619],
	[-1.41494714],
	[ 0.83719107],
	[ 0.29296509],
	[-0.9796348 ],
	[-0.9919744 ]])]

	net = network.Network([784, 3, 4, 10])
	For the weights: 784x3, 3x4, 4x10 (printed below as arrays of shape 3x784, 4x3, 10x4)
	print (self.weights)
	[array([
	[-0.51721912, -1.32663246,  0.61860472, ..., -0.14808348,  0.47365758, -1.67055342],
	[-1.82324755,  2.45671693,  -0.4074048 , ..., -1.22299543, -0.60028638, -0.72343723],
	[ 0.13751237, -2.20392184, -0.5635813  , ...,  1.47505309, -1.08143278,  0.55418631]]),
	array([[ 1.01398068,  0.71675172, -0.98611525],
	[-0.91658198, -0.89796351, -0.51228011],
	[ 0.53529015,  0.04383624, -0.41019621],
	[-0.64914817,  1.29586205, -0.95359815]]),
	array([[ 1.49524657, -0.11535072, -0.29128393,  0.60778149],
	[ 0.22230999,  0.63995748, -0.43956315, -1.01877885],
	[-0.37976193,  0.18921609, -1.20175081, -0.42824686],
	[-0.27321251,  0.86174003,  0.51706354, -0.99385186],
	[-0.5942666 , -0.72616386,  0.10107404,  1.81730394],
	[ 1.72687002, -2.42612697,  1.57296988, -0.50415091],
	[-0.36752011, -0.00892462, -1.78011645, -0.98530298],
	[-0.26814078, -1.0660397 , -2.05035737, -0.7745915 ],
	[ 0.34035524,  0.2669593 , -1.22003159, -0.70460095],
	[-2.27451737,  0.94988465, -0.15863418, -2.20939092]])]
	*/
	/*
	1      2  ....   L

	1     W(J,L)k
	2
	:
	:
	J

	The k in the weight W(j,l)k is the trailing (last) index
	*/
	/*
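	Builds the neural network.
	network net;
	net.Network({784, 3, 8, 10});
	creates Input 784, hidden layers 3 and 8, Output 10.

	Neurons are created from the second element onward.
	neurons neu;
	neu._num_layers  //number of layers  4
	neu._sizes       //layer sizes       784, 3, 8, 10
	neu._biases      //biases            3,   8,   10
	neu._weights     //weights           784x3, 3x8, 8x10

	neu.zs();		   //creates the elements for the activations of the previous layer l-1
	neu.sig();	   //creates the elements that hold the sig of the activations of the previous layer l-1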
	*/

public:
	//For Network[784, 3, 8, 10]
	int _num_layers;				//number of layers: 4
	vector<int> _sizes;				//layer sizes: 784, 3, 8, 10
	vector<vector<KATA>> _biases;	//per layer, from the 2nd layer on: [array[3],array[8],array[10]]
	vector<WEIGHT> _weights;		//per layer, from the 2nd layer on: [array[3]x784,array[8]x3,array[10]x8]

	neurons();
	~neurons();
	void Network(char** argv);
	
};

Training data loader class

mnist.cpp mnist.h

These files are used unchanged from the code in the previous article,

so please refer to that article for the listing.

Code walkthrough

1. Building the network

void network::Network(vector<int> nets)

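This function creates the containers for the biases and weights that make up the network.

The initial values are random numbers between -1 and 1, generated with the Mersenne Twister:

std::mt19937 mt(rnddev());

The Python code draws its initial values from a Gaussian distribution instead (np.random.randn in network.py), so the two initializations are not identical.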
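
To make the difference between the two initializations concrete, here is a minimal sketch, assuming one wants to mirror the Gaussian initialization of the Python code. The helpers gaussian_matrix and uniform_matrix are hypothetical names introduced only for this illustration; they are not part of the network class shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not from the article's code): build a rows x cols
// matrix whose entries are drawn from a normal distribution N(0, 1),
// similar in spirit to np.random.randn(rows, cols).
std::vector<std::vector<double>> gaussian_matrix(std::size_t rows, std::size_t cols, std::mt19937& mt)
{
    assert(rows > 0 && cols > 0);
    std::normal_distribution<double> dist(0.0, 1.0); // mean 0, standard deviation 1
    std::vector<std::vector<double>> m(rows, std::vector<double>(cols));
    for (auto& row : m)
        for (auto& v : row)
            v = dist(mt);
    return m;
}

// Hypothetical counterpart for the uniform initialization described in the
// article: entries drawn uniformly between -1 and 1.
std::vector<std::vector<double>> uniform_matrix(std::size_t rows, std::size_t cols, std::mt19937& mt)
{
    assert(rows > 0 && cols > 0);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    std::vector<std::vector<double>> m(rows, std::vector<double>(cols));
    for (auto& row : m)
        for (auto& v : row)
            v = dist(mt);
    return m;
}
```

For a 784-3 layer pair, gaussian_matrix(3, 784, mt) would give one row of 784 weights per neuron, which matches the array[3]x784 layout noted in neurons.h.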

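2. Preparing the training data

Once the containers exist, the training data is loaded.

The mnist class reads the 60,000 training samples, and

vector<MNIST_compact> trainingdata_ = net.marge(&training_data);

bundles the image data, label, and index of each sample into a MNIST_compact struct so that they are easier to handle.

The training data is then shuffled:

std::shuffle(trainingdata_.begin(), trainingdata_.end(), std::mt19937());

and the specified number of samples is taken from the front:

for (int i = 0; testdata_n > i; i++) trainingdata.push_back(trainingdata_[i]);

Using all 60,000 samples makes processing extremely heavy, so working with a smaller subset

makes it much quicker to iterate while debugging. ( ﾟДﾟ)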
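
One detail worth knowing: std::mt19937() is default-constructed here, so it always starts from the same default seed and the shuffle produces the same order on every run with the same standard library. That can be convenient for debugging. If a different order per run is wanted, something along the following lines could be used. This is only a sketch; take_random_subset is a hypothetical helper, not part of the article's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: shuffle a copy of `data` with the given engine and
// return its first n elements.
template <typename T>
std::vector<T> take_random_subset(std::vector<T> data, std::size_t n, std::mt19937& mt)
{
    assert(n <= data.size());
    std::shuffle(data.begin(), data.end(), mt);
    // Drop everything after the first n elements.
    data.erase(data.begin() + static_cast<std::ptrdiff_t>(n), data.end());
    return data;
}
```

Seeding the engine once with std::random_device, for example std::mt19937 mt(std::random_device{}());, and passing it in gives a different subset per run, while a fixed seed such as std::mt19937 mt(123); keeps runs reproducible.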

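3. Preparing the test data

test_data = mnist.test_images_rnd(trainingdata, n_test);

This picks n_test samples at random from the training data and uses them as the test data.

net.delete_test_images(&trainingdata, test_data);

The samples that were picked as test data are then removed from the training data.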
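
delete_test_images searches the training vector once per test sample and erases the match, which gets slow as both sets grow. A possible alternative, assuming the training data has already been shuffled, is to cut the test set off the end in one step. The Sample struct and split_off_test_data below are hypothetical stand-ins for illustration; the article's code uses MNIST_compact and the two calls shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for MNIST_compact; only the index field is kept.
struct Sample {
    int index;
};

// Move the last n_test elements out of `training` and return them as the
// test set. Assumes `training` was shuffled beforehand, so the tail is
// already a random selection.
std::vector<Sample> split_off_test_data(std::vector<Sample>& training, std::size_t n_test)
{
    assert(n_test <= training.size());
    auto first = training.end() - static_cast<std::ptrdiff_t>(n_test);
    std::vector<Sample> test(std::make_move_iterator(first),
                             std::make_move_iterator(training.end()));
    training.erase(first, training.end()); // the two sets no longer overlap
    return test;
}
```

With this approach the two sets are disjoint by construction, so no per-sample search is needed.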

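4. Running the neural network

net.SGD(trainingdata, 30, 10, 3.0, test_data);

The arguments are 30 epochs, a mini-batch size of 10, and a learning rate of 3.0. This corresponds to the following call in the Python code:

net.SGD(training_data, 30, 10, 3.0, test_data = test_data)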

5. Inside the neural network processing

void network::SGD(
    vector<MNIST_compact> training_data, //training data
    int epochs, //number of epochs (training passes)
    int mini_batch_size, //mini-batch size
    double eta, //learning rate
    vector<MNIST_compact> test_data) //test data
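
In network.py, SGD shuffles the training data at the start of each epoch, slices it into mini-batches of mini_batch_size samples, calls update_mini_batch on each one, and then evaluates the test data. As a rough sketch of just the slicing step, the hypothetical helper below (mini_batch_ranges is not a function from the article) computes the index range of each mini-batch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split n samples into consecutive [begin, end) index
// ranges of at most batch_size samples each. The last range may be shorter.
std::vector<std::pair<std::size_t, std::size_t>> mini_batch_ranges(std::size_t n, std::size_t batch_size)
{
    assert(batch_size > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::size_t begin = 0; begin < n; begin += batch_size)
        ranges.emplace_back(begin, std::min(begin + batch_size, n));
    return ranges;
}
```

Each range could then be copied into a vector<MNIST_compact> and handed to update_mini_batch together with eta.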

This article has grown long, so the rest continues in the next part.

Reference URL

Neural Networks and Deep Learning (ニューラルネットワークと深層学習)

That's all.