Programming, Linear and Nonlinear
The need for extremal values (maxima or minima) of a function whose variables might have to satisfy certain constraints leads to a problem of optimization. In its most general form, an optimization problem is ordinarily expressed as maximize F(x) subject to x ∊ X, the symbol ∊ standing for “is an element of.” Posing the problem in terms of maximization is not restrictive since min x∊X F(x) = –max x∊X {–F(x)}.
Such extremal problems are classified according to the mathematical properties of the objective function F and the feasible region X. The optimization problem is said to be feasible if and only if its feasible region is nonempty. This entry will restrict attention to the case where X is a subset of a finite-dimensional vector space, say Euclidean n-space, En, so that x = (x1, x2, …, xn) denotes a vector of n variables. Problems of this sort are also called mathematical programs, and their study is called mathematical programming. When X = En, the problem is said to be unconstrained. Finding relative minima and maxima in elementary differential calculus ordinarily deals with unconstrained optimization problems on the real line. In advanced calculus, one encounters multivariate unconstrained optimization. A related problem class is that in which X is the solution set of a system of equations gi(x) = 0, i = 1, …, m. Symbolically, X = {x: gi(x) = 0, i = 1, …, m}. When suitable differentiability and linear independence conditions hold, optimization problems of this form can be handled by the method of Lagrange multipliers, an approach dating back to the eighteenth century (long before the term programming was used).
LINEAR PROGRAMMING
Among constrained optimization problems, the linear programming problem is the leading representative. Linear programming (LP) can be described as the problem of maximizing a linear function F over a polyhedron X. This means that the objective function has the form F (x ) = c 1 x 1 + … + c n x n, where c 1, …, cn are given constants, and the polyhedron X is the set of solutions of a system of linear inequalities, say:
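ai1x1 + ai2x2 + … + ainxn ≤ bi,   i = 1, …, m.   (1)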
In many practical applications, the variables x j are required to satisfy xj ≥ 0. Such a nonnegativity constraint is equivalent to –xj ≤ 0, a special case of (1).
Economics is the foremost social science field of application for LP (and nonlinear programming as well). The optimal allocation of scarce resources is a typical example. For instance, suppose a firm manufactures n products from m resources. Each unit of product j consumes aij units of resource i, of which there are bi units available. It is assumed that the consumption of the resources by product j is proportional to the level xj of the corresponding production activity; it is assumed further that the consumption effects are additive. In that case, the total amount of resource i consumed by the n production levels x1, x2, …, xn is ai1x1 + ai2x2 + … + ainxn; each of these sums must not exceed the amount available, namely bi. The unknown production levels xj are naturally nonnegative. Finally, it is assumed that the aim is to maximize the profit resulting from the production and that it is of the form p1x1 + p2x2 + … + pnxn. Thus, the optimization problem is:
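maximize p1x1 + p2x2 + … + pnxn
subject to ai1x1 + ai2x2 + … + ainxn ≤ bi,   i = 1, …, m,
xj ≥ 0,   j = 1, …, n.   (2)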
In some instances, it is appropriate to introduce other linear constraints in (2). These could be output requirements ai1x1 + ai2x2 + … + ainxn ≥ bi, i ∊ I2, or material balance equations ai1x1 + ai2x2 + … + ainxn = bi, i ∊ I3. What results is still a linear programming problem.
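For illustration, a small instance of (2) with hypothetical data (two resources, three products) can be solved with an off-the-shelf LP solver; the sketch below uses SciPy's linprog routine, which minimizes, so the profit vector is negated.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data for an instance of (2): three products, two resources.
p = np.array([5.0, 4.0, 3.0])           # unit profits p_j
A = np.array([[2.0, 3.0, 1.0],          # a_ij: units of resource i used per unit of product j
              [4.0, 1.0, 2.0]])
b = np.array([5.0, 11.0])               # available amounts b_i of each resource

# linprog minimizes, so maximize p@x by minimizing -p@x; the bounds give x >= 0.
res = linprog(-p, A_ub=A, b_ub=b, bounds=[(0, None)] * 3, method="highs")
print("optimal production levels:", res.x)
print("maximum profit:", -res.fun)
```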
Two historically important cost-minimization problems illustrate further applications of linear programming in economics and planning. The first of these is called the diet problem, which calls for a plan whereby a given set of nutritional requirements can be met from a given set of ingredients at minimum cost. In this case, one has nutritional requirements r1, r2, …, rm for vitamins, minerals, calories, and so on. These requirements are to be met by combining appropriate amounts of n ingredients j = 1, …, n having unit costs c1, …, cn, respectively. Each unit of ingredient j contributes aij units of nutrient i. The proportionality and additivity of these contributions are assumed. Thereby, the problem can be stated as:
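minimize c1x1 + c2x2 + … + cnxn
subject to ai1x1 + ai2x2 + … + ainxn ≥ ri,   i = 1, …, m,
xj ≥ 0,   j = 1, …, n.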
The formulation can be modified by including constraints such as upper bounds on the amounts of individual ingredients. This model is widely applied in making animal feed.
The second of these applications is called the transportation problem:
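minimize c11x11 + c12x12 + … + cmnxmn
subject to xi1 + xi2 + … + xin ≤ ai,   i = 1, …, m,
x1j + x2j + … + xmj ≥ dj,   j = 1, …, n,
xij ≥ 0 for all i and j,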
which arises in the context of shipping but, in fact, finds use in other areas as well. In this case, a single commodity is to be “shipped” from m “origins” to n “destinations.” At origin i = 1, …, m, there is a supply of a i units of the commodity. At destination j = 1,…, n, there is a demand for d j units of the commodity. To meet the demands from the supplies it is necessary that a 1 + … + a m ≥ d 1 + … + d n . The cost of shipping a unit of the commodity from i to j is denoted c ij. Under the assumption of linearity, the cost of shipping x ij units of the commodity from origin i to destination j will be c ij x ij. The total cost is c 11 x 11 + … + cmn x mn. The assignment problem is a special case of the transportation problem in which m = n and a i = d j = 1 for all i and j. Feasibility considerations then force the “≤” inequalities to hold as equations. The assignment problem arises in both minimization and maximization situations.
In addition to the aforementioned applications, one finds the computation of optimal mixed strategies in finite, two-person, zero-sum (matrix) games, problems of optimal blending, electric power dispatch, manpower planning, investment planning, and many more. Such topics are covered in George B. Dantzig’s Linear Programming and Extensions (1963).
Standard Form For computational purposes, the constraints of a linear program are usually expressed as a system of linear equations in nonnegative variables. This is called standard form. Any legitimate linear program can easily be brought to standard form. Thus ai1x1 + … + ainxn ≤ bi becomes ai1x1 + … + ainxn + si = bi, with si ≥ 0, and ai1x1 + … + ainxn ≥ bi becomes ai1x1 + … + ainxn – si = bi, with si ≥ 0. Any variable xj that is not sign-restricted can be replaced by the difference of two nonnegative variables: xj = x′j – x″j, with x′j ≥ 0 and x″j ≥ 0. A variable xj constrained to be nonpositive can be replaced by x̄j = –xj.
Duality To every linear programming problem there corresponds a dual problem constructed from the same data used in a different way. The dual problem is again a linear program whose form depends on that of the given problem, which in this relationship is called the primal problem. Thus, if the primal problem is (2), the dual problem is:
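minimize b1y1 + b2y2 + … + bmym
subject to a1jy1 + a2jy2 + … + amjym ≥ pj,   j = 1, …, n,
yi ≥ 0,   i = 1, …, m.   (3)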
The dual of a linear program in standard form is like (3) except for the fact that its variables y i are free (not sign-restricted).
These (and other) pairs of primal and dual linear programs are intimately linked in the following way.
Duality Theorem of Linear Programming Let X and Y denote the feasible regions of (2) and (3), respectively. Then
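p1x1 + p2x2 + … + pnxn ≤ b1y1 + b2y2 + … + bmym   (4)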
for any x ∊ X and any y ∊ Y. If equality holds in (4), then x and y are optimal solutions of their respective programs. Moreover, the primal problem has an optimal solution x̄ if and only if the dual problem has an optimal solution ȳ ; when this is the case, equality holds in (4). If the feasible regions X and Y are both nonempty, then both problems have optimal solutions. If only one of the two problems is feasible, then the objective function of that problem is unbounded in the direction of extremization.
Simplex Method The general formulation of the linear programming problem and the simplex method for its solution were advanced by the American mathematician George B. Dantzig (1914-2005) in 1947. This method continues to be the main algorithm for solving problems of this class.
The simplex method works with linear programs in standard form and takes advantage of the fact that in order to find an optimal solution, it suffices to confine attention to the vertices of the polyhedral feasible region, X. Starting from a known vertex of X, the method generates a path along the edges of X as it moves from one vertex to an adjacent vertex. As long as “cycling” does not occur, the process is guaranteed to terminate after visiting finitely many vertices. When the algorithm finds an optimal solution of the primal problem, it reveals an optimal solution of the dual problem as well. However, it may happen that a vertex of X is not known in advance and will have to be found. Techniques are available for finding a vertex of X (if one exists) and for preventing cycling.
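The sketch below is a minimal tableau implementation of the method for a problem already in the form (2) with b ≥ 0, so that the slack variables supply a starting vertex; Bland's rule is used to prevent cycling. It is intended only to illustrate the vertex-to-vertex mechanics, not the sophisticated implementations used in practice.

```python
import numpy as np

def simplex(p, A, b):
    """Maximize p@x subject to A@x <= b and x >= 0, assuming b >= 0.
    A minimal tableau simplex using Bland's rule; returns (x, optimal value)."""
    m, n = A.shape
    # Standard form: A x + s = b with slack variables s >= 0.
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n] = A
    T[:m, n:n + m] = np.eye(m)
    T[:m, -1] = b
    T[-1, :n] = -p                     # bottom row holds the reduced costs
    basis = list(range(n, n + m))      # start at the vertex x = 0 (all slacks basic)
    while True:
        # Entering variable: smallest index with a negative reduced cost (Bland's rule).
        entering = [j for j in range(n + m) if T[-1, j] < -1e-9]
        if not entering:
            break                      # no improving edge leaves this vertex: optimal
        j = min(entering)
        # Ratio test for the leaving variable; ties broken by smallest basic index.
        ratios = [(T[i, -1] / T[i, j], basis[i], i) for i in range(m) if T[i, j] > 1e-9]
        if not ratios:
            raise ValueError("objective unbounded along an edge of the feasible region")
        _, _, r = min(ratios)
        # Pivot: move to the adjacent vertex.
        T[r, :] /= T[r, j]
        for i in range(m + 1):
            if i != r:
                T[i, :] -= T[i, j] * T[r, :]
        basis[r] = j
    x = np.zeros(n + m)
    x[basis] = T[:m, -1]
    return x[:n], T[-1, -1]

# Example: the two-resource, three-product instance used earlier.
x_opt, z_opt = simplex(np.array([5.0, 4.0, 3.0]),
                       np.array([[2.0, 3.0, 1.0], [4.0, 1.0, 2.0]]),
                       np.array([5.0, 11.0]))
```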
Sensitivity Analysis The optimal value of the objective function of an LP, say (2), depends on the input-output coefficients a ij , the right-hand side constants b i , and the objective function coefficients p j . The study of how changes in these parameters affect the optimal value is called sensitivity analysis or postoptimality analysis. The most frequently used fact emerging from this kind of analysis is that (in the nondegenerate case) the rate of change of the optimal value z with respect to a change in the i th right-hand side parameter b i equals the dual variable y i corresponding to the i th constraint; in symbols ∂z/∂bi = yi.
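A quick numerical way to see this relationship, sketched below with the same hypothetical data and SciPy's linprog, is to perturb one right-hand side bi slightly and observe the change in the optimal value; modern solvers also report the dual values yi directly.

```python
import numpy as np
from scipy.optimize import linprog

p = np.array([5.0, 4.0, 3.0])           # hypothetical unit profits
A = np.array([[2.0, 3.0, 1.0],
              [4.0, 1.0, 2.0]])
b = np.array([5.0, 11.0])               # right-hand sides b_i

def optimal_value(rhs):
    """Optimal profit z(b) of the LP (2) as a function of the right-hand side."""
    res = linprog(-p, A_ub=A, b_ub=rhs, method="highs")
    return -res.fun

z = optimal_value(b)
delta = 1e-4
for i in range(len(b)):
    perturbed = b.copy()
    perturbed[i] += delta
    rate = (optimal_value(perturbed) - z) / delta
    print(f"estimated dz/db_{i+1} = {rate:.4f}")   # equals the dual variable y_i (nondegenerate case)
```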
Large-scale Linear Programming Many practical applications engender very large-scale models, often running to tens or hundreds of thousands of equations and a million or more variables. Solving large-scale problems requires powerful computers and specialized algorithms that often exploit the structure of the coefficient matrix corresponding to the constraints. The decomposition principle of Dantzig and Philip Wolfe (1960) is of this type. It is intended for problems with block-angular structure, in which there are many organizational units: each unit's variables appear in a relatively small number of coupling constraints representing the central administration and otherwise only in constraints that pertain to that unit.
NONLINEAR PROGRAMMING
An optimization model with an objective function or at least one constraint function that is not linear is called a nonlinear programming problem (NLP).
Nonlinearity can arise in many ways, thereby yielding diverse types of NLPs. The oldest and easiest to comprehend are quadratic programming problems, wherein the constraints are just like those of an LP but the objective function is quadratic (rather than linear). The first to identify (and name) this problem class was the economist Robert Dorfman (1916–2002) in his monograph Application of Linear Programming to the Theory of the Firm (1951). In analyzing production scheduling for monopolized products, the assumption of constant marginal production costs may not hold. As the monopolist’s production increases, it may begin to saturate the market, thereby lowering the price at which products can be sold. Alternatively, the monopolist can restrict production in order to uphold the price of the product. Dorfman’s model covers those cases where the firm produces m monopolized products jointly and where it faces linearly decreasing marginal revenues. When the prices at which the products can be sold are decreasing linear functions of the outputs yi, such as pi(yi) = fi – gi yi, where fi and gi are positive constants, the revenue from the sale of the output vector y = (y1, …, ym) is R(y) = (f1 – g1y1)y1 + … + (fm – gmym)ym.
The inputs to the n production processes are assumed to be x1, …, xn, measured in monetary units. The input-output relations are assumed to be given by the linear equations
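yi = ai1x1 + ai2x2 + … + ainxn,   i = 1, …, m.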
The sum x1 + x2 + … + xn of the xj represents the cost of the production, so that R(y) – (x1 + x2 + … + xn) is the profit resulting from an expenditure of x1 + x2 + … + xn on the production. The equations permit one to substitute for the yi, thereby obtaining the objective function in terms of the xj alone. The xj might need to satisfy linear constraints, such as (2). The result is then a linearly constrained maximization problem with a quadratic objective function, that is, a quadratic program.
Sometimes optimization problems arise in purely technical settings. For instance, it is often necessary to minimize the distance from a point x̄ to a set X. When the measure of distance is the Euclidean metric and the set in question is defined through a system of linear inequalities, the resulting problem is a quadratic program with objective function:
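F(x) = (x1 – x̄1)² + (x2 – x̄2)² + … + (xn – x̄n)²,   (5)
the square of the Euclidean distance from x to x̄.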
Convexity and Convex Programming One of the most important concepts in linear and nonlinear programming is that of convexity. A set X is said to be convex if it contains the entire line segment between any two of its points. The line segment between the points u and v in a real vector space is {x : x = λu + (1 – λ)v, 0 ≤ λ ≤ 1}. All polyhedral sets (1) are easily seen to be convex. A function f defined on a convex set X is said to be convex there if the inequality f(λu + (1 – λ)v) ≤ λf(u) + (1 – λ)f(v) holds for all vectors u, v ∊ X and all scalars λ satisfying 0 ≤ λ ≤ 1.
All linear functions are convex. A notable example of a nonlinear convex function is the quadratic function given in (5). The nonnegatively weighted sum of a finite number of convex functions is again convex. A concave function is one whose negative is convex. In minimization problems, it is desirable to have a convex minimand defined on a convex set, in which case the problem is called a convex program. (In the case of maximization problems, one would analogously prefer a concave maximand defined on a convex set. Such problems are called concave programs.) The main reason for this is that local minima (maxima) in convex (concave) programs are global minima (maxima). This property need not hold in so-called nonconvex programs.
FIRST-ORDER OPTIMALITY CONDITIONS
In all optimization problems, it is essential to have a clear, testable definition of what is meant by the (local) optimality of a candidate solution to the problem. Since LPs and NLPs normally have infinitely many feasible solutions, it is important to know what extra conditions must be satisfied by those that are (locally) optimal. In the case of linear programming—where the simplex method is used—one essentially has information that reveals whether passage from a particular vertex to any adjacent vertex will improve the objective function value. If not, the vertex yields an optimal solution. This information is obtained by checking the signs of certain numbers in the algebraic representation of the problem.
In nonlinear programming, the matter is far less simple. A major tool used for finding a local minimum is the following result.
Theorem (Karush-Kuhn-Tucker Theorem) Suppose x̄ = (x̄1, …, x̄ n ) is a local maximum for the nonlinear program
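maximize F(x)
subject to gi(x) ≥ 0,   i = 1, …, m,   (6)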
and that the functions F, g1, …, gm are differentiable at x̄. If the gradients ∇gi(x̄) of the constraint functions at x̄ satisfy a suitable regularity condition (one such being linear independence of the gradients of the active constraints), then there exist multipliers y1, …, ym ≥ 0 such that
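∇F(x̄) + y1∇g1(x̄) + y2∇g2(x̄) + … + ym∇gm(x̄) = 0   (7)
y1g1(x̄) + y2g2(x̄) + … + ymgm(x̄) = 0.   (8)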
Collectively, (7) and (8) are called the Karush-Kuhn-Tucker (KKT) conditions. Because gi(x̄) ≥ 0 and yi ≥ 0 for each i, condition (8) implies that, for every i, at least one of yi and gi(x̄) must be zero. Accordingly, these are called complementary slackness conditions. The stationarity condition (7) then says that, at a local maximum, the gradient of the objective function is a linear combination of the gradients of the constraints for which gi(x̄) = 0; the corresponding yi are called generalized Lagrange multipliers.
It bears repeating that when the assumptions of the KKT theorem hold, conditions (7) and (8) necessarily hold. Since these assumptions do hold in most instances, the KKT theorem provides a mechanism for seeking candidate local maxima. In the absence of further hypotheses, it is not guaranteed that a solution of the KKT conditions will be a local maximum of the NLP. However, when the objective function and the constraint functions are all concave, a solution of the KKT conditions must be a global maximum of the optimization problem. In this instance, the KKT conditions are necessary and sufficient for the global optimality of x̄. Beyond the first-order optimality conditions given above, there exist second-order optimality conditions, but the discussion of such matters exceeds the scope of this entry.
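As a sketch of how the KKT conditions can be checked numerically, the example below (a hypothetical two-variable concave program, so the conditions are both necessary and sufficient) solves the problem with SciPy's SLSQP routine and then verifies (7) and (8) at the computed point.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical concave program of the form (6):
#   maximize  F(x) = -(x1 - 2)^2 - (x2 - 2)^2
#   subject to g(x) = 2 - x1 - x2 >= 0.
F = lambda x: -(x[0] - 2) ** 2 - (x[1] - 2) ** 2
g = lambda x: 2 - x[0] - x[1]

# SciPy minimizes, so minimize -F; its "ineq" convention is fun(x) >= 0, matching g.
res = minimize(lambda x: -F(x), x0=[0.0, 0.0], method="SLSQP",
               constraints=[{"type": "ineq", "fun": g}])
x_bar = res.x                                                     # roughly (1, 1)

grad_F = np.array([-2 * (x_bar[0] - 2), -2 * (x_bar[1] - 2)])     # gradient of F at x_bar
grad_g = np.array([-1.0, -1.0])                                   # gradient of g
y = -(grad_F @ grad_g) / (grad_g @ grad_g)                        # multiplier from (7), about 2
print("multiplier y >= 0:", y)
print("stationarity residual (7):", grad_F + y * grad_g)          # approximately (0, 0)
print("complementary slackness (8):", y * g(x_bar))               # approximately 0
```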
Lagrangian Duality Relative to (6) with feasible region X, there is the corresponding Lagrangian function
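L(x, y) = F(x) + y1g1(x) + y2g2(x) + … + ymgm(x),   y = (y1, …, ym) ≥ 0.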
From the Lagrangian function, one defines the Lagrangian dual problem, which calls for minimizing the function φ(y) = sup x∊X L(x, y) over y ≥ 0. An immediate consequence of this definition is sup x∊X F(x) ≤ inf y≥0 φ(y). If F(x̄) = φ(ȳ) where x̄ ∊ X and ȳ ≥ 0, then x̄ and ȳ are optimal for their respective programs. R. Tyrrell Rockafellar’s Convex Analysis (1970) gives a comprehensive theory of duality for convex programming.
Nonlinear Programming Algorithms Nonlinear programs can differ greatly in the mathematical properties possessed (or lacked) by the functions through which they are defined. Examples of such properties include convexity and differentiability. NLPs whose constraint functions are affine, that is, of the form a i 1 x 1 + … + a in x n – b i for i = 1, …, m, constitute a special class of which the quadratic programming problem is a notable member. Another property is that of problem structure: Some are very large and “sparse”; others are small and “dense.” A dense problem is one in which each of the functions actually depends on a high percentage of the total set of variables. Some nonlinear programs include (linear or nonlinear) equations among the constraints. The methods used to solve NLPs tend to take advantage of problem characteristics, and for this reason, differ greatly from one another. Even so, they share some common properties.
A common feature of mathematical programming algorithms is their iterative nature. Starting from a trial solution x 0, a sequence of trial solutions x 1, x 2, … is generated until one of them, say x k satisfies a specified criterion indicating that it is an (approximately) optimal solution or that some other reason for halting the process holds. The steps of the algorithm constitute the process by which one passes from a given iterate to its successor. In some cases, this may entail the solution of one or more optimization subproblems. For linear programming and certain instances of quadratic programming, there are algorithms that generate a finite sequence of trial solutions ending with an optimal solution or evidence that none exists. Algorithms for nonlinear programming are more often convergent than finite. When the KKT theorem is applicable, the matter of finding a solution boils down to solving a system of equations (for which numerical methods are available). But before this can be done, it is necessary to try to identify exactly which equations need to be solved. The optimality conditions, and especially the information provided by Lagrange multipliers, play a large part in algorithmic strategies.
SEE ALSO Hamilton’s Rule; Linear Systems; Manifolds; Matrix Algebra; Maximin Principle; Maximization; Minimization; Nonlinear Systems; Objective Function; Optimizing Behavior
BIBLIOGRAPHY
Dantzig, George B. 1963. Linear Programming and Extensions. Princeton, NJ: Princeton University Press.
Dantzig, George B., and Mukund N. Thapa. 1997. Linear Programming 1: Introduction. New York: Springer.
Dantzig, George B., and Philip Wolfe. 1960. Decomposition Principle for Linear Programs. Operations Research 8: 101–111.
Dorfman, Robert. 1951. Application of Linear Programming to the Theory of the Firm. Berkeley: University of California Press.
Fiacco, Anthony V. 2001. Nonlinear Programming. In Encyclopedia of Operations Research and Management Science, 2nd ed., eds. Saul I. Gass and Carl M. Harris, 572–580. Boston: Kluwer.
Gill, Philip E., Walter Murray, and Margaret H. Wright. 1981. Practical Optimization. London: Academic Press.
Hillier, Frederick S., and Gerald J. Lieberman. 2005. Introduction to Operations Research. 8th ed. Boston: McGraw-Hill.
Nemhauser, George L., Alexander Rinnooy Kan, and Michael J. Todd, eds. 1989. Optimization. Amsterdam: North-Holland.
Nocedal, Jorge, and Stephen J. Wright. 1999. Numerical Optimization. New York: Springer.
Rockafellar, R. Tyrrell. 1970. Convex Analysis. Princeton, NJ: Princeton University Press.
Richard W. Cottle