13. Wine production model using a dictionary (2)
x = {}
for j in Blends:
    x[j] = model.addVar(vtype="C", name="x[%s]"%j)   # continuous production volume of blend j
model.update()
model.setObjective(quicksum(Profit[j]*x[j] for j in Blends),
                   GRB.MAXIMIZE)
for i in Grapes:
    model.addConstr(quicksum(Use[i,j]*x[j] for j in Blends)
                    <= Inventory[i], name="use[%s]"%i)   # inventory limit for grape i
model.optimize()
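The snippet above assumes that the Model object and the data dictionaries Blends, Grapes, Profit, Use and Inventory are already defined. A minimal setup sketch with made-up numbers (illustrative only, not the data used in the talk):

from gurobipy import *
model = Model("wine")
# Illustrative data: two blends, two grape varieties.
Blends = ["A", "B"]
Grapes = ["Merlot", "Cabernet"]
Profit = {"A": 3.0, "B": 4.0}                     # profit per unit of each blend
Use = {("Merlot", "A"): 2, ("Merlot", "B"): 1,    # grape units needed per unit of blend
       ("Cabernet", "A"): 1, ("Cabernet", "B"): 3}
Inventory = {"Merlot": 60, "Cabernet": 40}        # grape units in stock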
17. Implementation in Python (2)
Adding the variable objects
I = range(n)   # "B" means a 0-1 integer (binary) variable
J = range(n)   # (GRB.BINARY works as well)
for j in J:
    y[j] = model.addVar(vtype="B", name="y[%s]"%j)
    for i in I:
        x[i,j] = model.addVar(vtype="B", name="x[%s,%s]"%(i,j))
model.update()
Setting the objective function
model.setObjective(quicksum(c[i,j]*x[i,j] for i in I for j in J))
18. Implementation in Python (3)
for i in I:
    model.addConstr(quicksum(x[i,j] for j in J) == 1, "Assign[%s]"%i)
    for j in J:
        model.addConstr(x[i,j] <= y[j], "Strong[%s,%s]"%(i,j))
model.addConstr(quicksum(y[j] for j in J) == k, "k_median")
19. Implementation in Python (4)
…
model.optimize()
print "Opt.value=", model.ObjVal
edge = []
for (i,j) in x:
    if x[i,j].X == 1:
        edge.append((i,j))
return edge
20. Implementation in Python (5)
import networkx as NX            # networkX module
import matplotlib.pyplot as P    # prepare drawing
P.ion()
G = NX.Graph()                   # graph object
G.add_nodes_from(range(n))       # add nodes
for (i,j) in edge:               # add edges
    G.add_edge(i,j)
NX.draw(G)
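For reference, a minimal end-to-end sketch of how the pieces on slides 17-20 fit together, using random points in the unit square as illustrative data; the function name solve_kmedian and the data below are my own, not from the slides:

import math, random
from gurobipy import *

def solve_kmedian(n, k, c):
    # k-median: open exactly k facilities (y) and assign each customer i to one open facility j (x).
    model = Model("k-median")
    I, J = range(n), range(n)
    x, y = {}, {}
    for j in J:
        y[j] = model.addVar(vtype="B", name="y[%s]" % j)
        for i in I:
            x[i,j] = model.addVar(vtype="B", name="x[%s,%s]" % (i,j))
    model.update()
    model.setObjective(quicksum(c[i,j]*x[i,j] for i in I for j in J))
    for i in I:
        model.addConstr(quicksum(x[i,j] for j in J) == 1, "Assign[%s]" % i)
        for j in J:
            model.addConstr(x[i,j] <= y[j], "Strong[%s,%s]" % (i,j))
    model.addConstr(quicksum(y[j] for j in J) == k, "k_median")
    model.optimize()
    return [(i,j) for (i,j) in x if x[i,j].X > 0.5]   # tolerance-safe test instead of == 1

n, k = 50, 5
points = [(random.random(), random.random()) for _ in range(n)]
c = {(i,j): math.hypot(points[i][0]-points[j][0], points[i][1]-points[j][1])
     for i in range(n) for j in range(n)}
edge = solve_kmedian(n, k, c)   # the edge list can then be drawn as on slide 20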
21. Results with the weak formulation
n=200, k=20
Optimize a model with 401 Rows, 40200 Columns and 80400 NonZeros
(omitted)
Explored 1445 nodes (63581 simplex iterations) in 67.08 seconds
Thread count was 2 (of 2 available processors)
Optimal solution found (tolerance 1.00e-04)
Best objective 1.0180195861e+01, best bound 1.0179189780e+01, gap 0.0099%
Opt.value= 10.1801958607
23. Results with the strong formulation
Optimize a model with 40401 Rows, 40200 Columns and 160400 NonZeros
(omitted)
Explored 0 nodes (1697 simplex iterations) in 3.33 seconds
(finished without any branching!)
Thread count was 2 (of 2 available processors)
Optimal solution found (tolerance 1.00e-04)
Best objective 1.0180195861e+01, best bound 1.0180195861e+01, gap 0.0%
Opt.value= 10.1801958607
24. Lessons learned
• A strong formulation that avoids Big M is preferable.
• For a model of this size there is no need for tricks such as adding only the required constraints as cutting planes.
  (However, since the number of constraints grows and the LP becomes degenerate, this presupposes a solver that can handle large LPs quickly.)
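To make the comparison on slides 21-24 concrete, the two runs differ only in how x is linked to y (I, J, x, y, n and model as on slides 17-18). The aggregated "weak" constraint below is the usual textbook Big-M-style form, reconstructed by me, not copied from the slides:

# Strong (disaggregated) linking: n*n rows, but a tight LP relaxation
# -> 0 branch-and-bound nodes, 3.33 s (slide 23).
for i in I:
    for j in J:
        model.addConstr(x[i,j] <= y[j], "Strong[%s,%s]" % (i,j))

# Weak (aggregated) linking: only n rows, but a loose LP relaxation
# -> 1445 nodes, 67 s (slide 21).
for j in J:
    model.addConstr(quicksum(x[i,j] for i in I) <= n*y[j], "Weak[%s]" % j)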
34. Implementation in AMPL (1)
param n >= 0;
set V := 1..n;        # set of nodes
set V0 := 2..n;       # nodes other than the starting point 1
set A := V cross V;   # set of arcs = Cartesian product of the node set (ordered pairs)
param c {A} >= 0;     # arc distances
var x {A} binary;     # 0-1 variable: 1 if the arc is used, 0 otherwise
var u {V0} >= 1, <= n-1;   # node potentials
minimize total_cost:
    sum {(i,j) in A} c[i,j] * x[i,j];
35. Implementation in AMPL (2)
Degree1 {i in V}:
    sum {(i,j) in A} x[i,j] = 1;   # out-degree constraints
Degree2 {i in V}:
    sum {(j,i) in A} x[j,i] = 1;   # in-degree constraints
MTZ {(i,j) in A: i != j and j != 1 and i != 1}:
    u[i] + 1 - (n-1)*(1-x[i,j]) + (n-3)*x[j,i] <= u[j];   # lifted MTZ constraints
LiftedLB {i in V0}:
    1 + (1-x[1,i]) + (n-3)*x[i,1] <= u[i];   # lifted lower-bound constraints
LiftedUB {i in V0}:
    u[i] <= (n-1) - (1-x[i,1]) - (n-3)*x[1,i];   # lifted upper-bound constraints
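Since the other slides use the Gurobi Python interface, a sketch of the same lifted constraints written with gurobipy may help; it assumes dictionaries x (arc variables) and u (potentials) indexed by nodes 1..n with node 1 as the depot, and is my translation of the AMPL above, not code from the talk:

# Lifted Miller-Tucker-Zemlin subtour-elimination constraints.
for i in range(2, n+1):
    for j in range(2, n+1):
        if i != j:
            model.addConstr(u[i] + 1 - (n-1)*(1 - x[i,j]) + (n-3)*x[j,i] <= u[j],
                            "MTZ[%s,%s]" % (i,j))
for i in range(2, n+1):
    model.addConstr(1 + (1 - x[1,i]) + (n-3)*x[i,1] <= u[i], "LiftedLB[%s]" % i)
    model.addConstr(u[i] <= (n-1) - (1 - x[i,1]) - (n-3)*x[1,i], "LiftedUB[%s]" % i)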
36. Evolution of the upper and lower bounds
[Plot: objective function value vs. CPU time for an 80-node Euclidean TSP. Annotation: without the strengthened constraints, a run of a full day ends with Out of Memory!]
37. Results
Optimize a model with 6480 Rows, 6400 Columns and 37762 NonZeros
(omitted)
Cutting planes:
  Gomory: 62
  Implied bound: 470
  MIR: 299
  Zero half: 34
Explored 125799 nodes (2799697 simplex iterations) in 359.01 seconds
Optimal solution found (tolerance 1.00e-04)
Best objective 7.4532855108e+00, best bound 7.4525704995e+00, gap 0.0096%
Opt.value= 7.45328551084
44. Evolution of the upper and lower bounds (standard formulation)
[Plot: objective function value (roughly 112,000-132,000) vs. CPU time (0-2000 s).]
Optimal solution after 1800 seconds; anything larger than this instance is out of reach!
45. Evolution of the upper and lower bounds (facility location formulation)
[Plot: objective function value (roughly 113,600-114,050) vs. CPU time (0-45 s).]
Optimal solution after 40 seconds; even T=100 is no problem!
49. Implementation in Python (1)
from gurobipy import *
model = Model("gcp")
x = {}
y = {}
for i in range(n):
    for k in range(K):
        x[i,k] = model.addVar(obj=0, vtype="B", name="x"+str(i)+str(k))
for k in range(K):
    y[k] = model.addVar(obj=1, vtype="B", name="y"+str(k))
model.update()
50. Implementation in Python (2)
for i in range(n):
    L = LinExpr()
    for k in range(K):
        L.addTerms(1, x[i,k])
    model.addConstr(lhs=L, sense="=", rhs=1, name="const"+str(i))
(omitted)
model.optimize()
print "Opt.value=", model.ObjVal
for v in model.getVars():
    if v.X > 0.001:
        print v.VarName, v.X
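The constraints shown above only enforce that every node receives exactly one color; the omitted part presumably handles the adjacency and color-usage conditions. A minimal sketch of their standard textbook form, assuming an edge list E of the graph (this is my illustration, not the code from the slides):

# Adjacent nodes may not share a color, and color k may be used only if y[k] = 1.
for (i, j) in E:
    for k in range(K):
        model.addConstr(x[i,k] + x[j,k] <= y[k], "adj[%s,%s,%s]" % (i, j, k))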
51. Evolution of the upper and lower bounds (original formulation)
[Plot: objective function value (0-12) vs. CPU time (0-1400 s); number of nodes n=40, upper bound on the number of colors Kmax=10.]
Optimize a model with 3820 Rows, 410 Columns and 11740 NonZeros
Explored 17149 nodes (3425130 simplex iterations) in 1321.63 seconds