class LogisticRegression Found at: sklearn.linear_model._logisticclass LogisticRegression(BaseEstimator, LinearClassifierMixin, SparseCoefMixin): """ Logistic Regression (aka logit, MaxEnt) classifier. In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme if the 'multi_class' option is set to 'ovr', and uses the cross-entropy loss if the 'multi_class' option is set to 'multinomial'. (Currently the 'multinomial' option is supported only by the 'lbfgs', 'sag', 'saga' and 'newton-cg' solvers.) This class implements regularized logistic regression using the 'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers. **Note that regularization is applied by default**. It can handle both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied). The 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization with primal formulation, or no regularization. The 'liblinear' solver supports both L1 and L2 regularization, with a dual formulation only for the L2 penalty. The Elastic-Net regularization is only supported by the 'saga' solver. Read more in the :ref:`User Guide <logistic_regression>`. Parameters ---------- penalty : {'l1', 'l2', 'elasticnet', 'none'}, default='l2' Used to specify the norm used in the penalization. The 'newton-cg', 'sag' and 'lbfgs' solvers support only l2 penalties. 'elasticnet' is only supported by the 'saga' solver. If 'none' (not supported by the liblinear solver), no regularization is applied. .. versionadded:: 0.19 l1 penalty with SAGA solver (allowing 'multinomial' + L1) dual : bool, default=False Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features. tol : float, default=1e-4 Tolerance for stopping criteria. C : float, default=1.0 Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization. fit_intercept : bool, default=True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function. intercept_scaling : float, default=1 Useful only when the solver 'liblinear' is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector.The intercept becomes ``intercept_scaling * synthetic_feature_weight``. Note! the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased. class_weight : dict or 'balanced', default=None Weights associated with classes in the form ``{class_label: weight}``. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))``. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified. .. versionadded:: 0.17 *class_weight='balanced'* random_state : int, RandomState instance, default=None Used when ``solver`` == 'sag', 'saga' or 'liblinear' to shuffle the data. See :term:`Glossary <random_state>` for details. solver : {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, \ default='lbfgs' Algorithm to use in the optimization problem. - For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones. - For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss; 'liblinear' is limited to one-versus-rest schemes. - 'newton-cg', 'lbfgs', 'sag' and 'saga' handle L2 or no penalty - 'liblinear' and 'saga' also handle L1 penalty - 'saga' also supports 'elasticnet' penalty - 'liblinear' does not support setting ``penalty='none'`` Note that 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing. .. versionadded:: 0.17 Stochastic Average Gradient descent solver. .. versionadded:: 0.19 SAGA solver. .. versionchanged:: 0.22 The default solver changed from 'liblinear' to 'lbfgs' in 0.22. max_iter : int, default=100 Maximum number of iterations taken for the solvers to converge. multi_class : {'auto', 'ovr', 'multinomial'}, default='auto' If the option chosen is 'ovr', then a binary problem is fit for each label. For 'multinomial' the loss minimised is the multinomial loss fit across the entire probability distribution, *even when the data is binary*. 'multinomial' is unavailable when solver='liblinear'. 'auto' selects 'ovr' if the data is binary, or if solver='liblinear', and otherwise selects 'multinomial'. .. versionadded:: 0.18 Stochastic Average Gradient descent solver for 'multinomial' case. .. versionchanged:: 0.22 Default changed from 'ovr' to 'auto' in 0.22. verbose : int, default=0 For the liblinear and lbfgs solvers set verbose to any positive number for verbosity. warm_start : bool, default=False When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. See :term:`the Glossary <warm_start>`. .. versionadded:: 0.17 *warm_start* to support *lbfgs*, *newton-cg*, *sag*, *saga* solvers. n_jobs : int, default=None Number of CPU cores used when parallelizing over classes if multi_class='ovr'". This parameter is ignored when the ``solver`` is set to 'liblinear' regardless of whether 'multi_class' is specified or not. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. See :term:`Glossary <n_jobs>` for more details. l1_ratio : float, default=None The Elastic-Net mixing parameter, with ``0 <= l1_ratio <= 1``. Only used if ``penalty='elasticnet'``. Setting ``l1_ratio=0`` is equivalent to using ``penalty='l2'``, while setting ``l1_ratio=1`` is equivalent to using ``penalty='l1'``. For ``0 < l1_ratio <1``, the penalty is a combination of L1 and L2. Attributes ---------- classes_ : ndarray of shape (n_classes, ) A list of class labels known to the classifier. coef_ : ndarray of shape (1, n_features) or (n_classes, n_features) Coefficient of the features in the decision function. `coef_` is of shape (1, n_features) when the given problem is binary. In particular, when `multi_class='multinomial'`, `coef_` corresponds to outcome 1 (True) and `-coef_` corresponds to outcome 0 (False). intercept_ : ndarray of shape (1,) or (n_classes,) Intercept (a.k.a. bias) added to the decision function. If `fit_intercept` is set to False, the intercept is set to zero. `intercept_` is of shape (1,) when the given problem is binary. In particular, when `multi_class='multinomial'`, `intercept_` corresponds to outcome 1 (True) and `-intercept_` corresponds to outcome 0 (False). n_iter_ : ndarray of shape (n_classes,) or (1, ) Actual number of iterations for all classes. If binary or multinomial, it returns only 1 element. For liblinear solver, only the maximum number of iteration across all classes is given. .. versionchanged:: 0.20 In SciPy <= 1.0.0 the number of lbfgs iterations may exceed ``max_iter``. ``n_iter_`` will now report at most ``max_iter``. See Also -------- SGDClassifier : Incrementally trained logistic regression (when given the parameter ``loss="log"``). LogisticRegressionCV : Logistic regression with built-in cross validation. Notes ----- The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon, to have slightly different results for the same input data. If that happens, try with a smaller tol parameter. Predict output may not match that of standalone liblinear in certain cases. See :ref:`differences from liblinear <liblinear_differences>` in the narrative documentation. References ---------- L-BFGS-B -- Software for Large-scale Bound-constrained Optimization Ciyou Zhu, Richard Byrd, Jorge Nocedal and Jose Luis Morales. http://users.iems.northwestern.edu/~nocedal/lbfgsb.html LIBLINEAR -- A Library for Large Linear Classification https://www.csie.ntu.edu.tw/~cjlin/liblinear/ SAG -- Mark Schmidt, Nicolas Le Roux, and Francis Bach Minimizing Finite Sums with the Stochastic Average Gradient https://hal.inria.fr/hal-00860051/document SAGA -- Defazio, A., Bach F. & Lacoste-Julien S. (2014). SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives https://arxiv.org/abs/1407.0202 Hsiang-Fu Yu, Fang-Lan Huang, Chih-Jen Lin (2011). Dual coordinate descent methods for logistic regression and maximum entropy models. Machine Learning 85(1-2):41-75. https://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf
|
|
def __init__(self, penalty='l2', *, dual=False, tol=1e-4, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None): self.penalty = penalty self.dual = dual self.tol = tol self.C = C self.fit_intercept = fit_intercept self.intercept_scaling = intercept_scaling self.class_weight = class_weight self.random_state = random_state self.solver = solver self.max_iter = max_iter self.multi_class = multi_class self.verbose = verbose self.warm_start = warm_start self.n_jobs = n_jobs self.l1_ratio = l1_ratio def fit(self, X, y, sample_weight=None): """ Fit the model according to the given training data.
Parameters ---------- X : {array-like, sparse matrix} of shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.
y : array-like of shape (n_samples,) Target vector relative to X.
sample_weight : array-like of shape (n_samples,) default=None Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
.. versionadded:: 0.17 *sample_weight* support to LogisticRegression.
Returns ------- self Fitted estimator.
Notes ----- The SAGA solver supports both float64 and float32 bit arrays. """ solver = _check_solver(self.solver, self.penalty, self.dual) if not isinstance(self.C, numbers.Number) or self.C < 0: raise ValueError( "Penalty term must be positive; got (C=%r)" % self.C) if self.penalty == 'elasticnet': if (not isinstance(self.l1_ratio, numbers.Number) or self.l1_ratio < 0 or self.l1_ratio > 1): raise ValueError( "l1_ratio must be between 0 and 1;" " got (l1_ratio=%r)" % self.l1_ratio) elif self.l1_ratio is not None: warnings.warn("l1_ratio parameter is only used when penalty is " "'elasticnet'. Got " "(penalty={})". format(self.penalty)) if self.penalty == 'none': if self.C != 1.0: # default values warnings.warn("Setting penalty='none' will ignore the C and l1_ratio " "parameters") # Note that check for l1_ratio is done right above C_ = np.inf penalty = 'l2' else: C_ = self.C penalty = self.penalty if not isinstance(self.max_iter, numbers.Number) or self.max_iter < 0: raise ValueError("Maximum number of iteration must be positive;" " got (max_iter=%r)" % self.max_iter) if not isinstance(self.tol, numbers.Number) or self.tol < 0: raise ValueError("Tolerance for stopping criteria must be " "positive; got (tol=%r)" % self.tol) if solver == 'lbfgs': _dtype = np.float64 else: _dtype = [np.float64, np.float32] X, y = self._validate_data(X, y, accept_sparse='csr', dtype=_dtype, order="C", accept_large_sparse=solver != 'liblinear') check_classification_targets(y) self.classes_ = np.unique(y) multi_class = _check_multi_class(self.multi_class, solver, len(self.classes_)) if solver == 'liblinear': if effective_n_jobs(self.n_jobs) != 1: warnings.warn("'n_jobs' > 1 does not have any effect when" " 'solver' is set to 'liblinear'. Got 'n_jobs'" " = {}.". format(effective_n_jobs(self.n_jobs))) self.coef_, self.intercept_, n_iter_ = _fit_liblinear(X, y, self.C, self. fit_intercept, self.intercept_scaling, self.class_weight, self.penalty, self. dual, self.verbose, self.max_iter, self.tol, self.random_state, sample_weight=sample_weight) self.n_iter_ = np.array([n_iter_]) return self if solver in ['sag', 'saga']: max_squared_sum = row_norms(X, squared=True).max() else: max_squared_sum = None n_classes = len(self.classes_) classes_ = self.classes_ if n_classes < 2: raise ValueError( "This solver needs samples of at least 2 classes" " in the data, but the data contains only one" " class: %r" % classes_[0]) if len(self.classes_) == 2: n_classes = 1 classes_ = classes_[1:] if self.warm_start: warm_start_coef = getattr(self, 'coef_', None) else: warm_start_coef = None if warm_start_coef is not None and self.fit_intercept: warm_start_coef = np.append(warm_start_coef, self.intercept_[:np.newaxis], axis=1) self.coef_ = list() self.intercept_ = np.zeros(n_classes) # Hack so that we iterate only once for the multinomial case. if multi_class == 'multinomial': classes_ = [None] warm_start_coef = [warm_start_coef] if warm_start_coef is None: warm_start_coef = [None] * n_classes path_func = delayed(_logistic_regression_path) # The SAG solver releases the GIL so it's more efficient to use # threads for this solver. if solver in ['sag', 'saga']: prefer = 'threads' else: prefer = 'processes' fold_coefs_ = Parallel(n_jobs=self.n_jobs, verbose=self.verbose, ** _joblib_parallel_args(prefer=prefer))( path_func(X, y, pos_class=class_, Cs=[C_], l1_ratio=self.l1_ratio, fit_intercept=self.fit_intercept, tol=self.tol, verbose=self.verbose, solver=solver, multi_class=multi_class, max_iter=self.max_iter, class_weight=self.class_weight, check_input=False, random_state=self.random_state, coef=warm_start_coef_, penalty=penalty, max_squared_sum=max_squared_sum, sample_weight=sample_weight) for (class_, warm_start_coef_) in zip(classes_, warm_start_coef)) fold_coefs_, _, n_iter_ = zip(*fold_coefs_) self.n_iter_ = np.asarray(n_iter_, dtype=np.int32)[:0] n_features = X.shape[1] if multi_class == 'multinomial': self.coef_ = fold_coefs_[0][0] else: self.coef_ = np.asarray(fold_coefs_) self.coef_ = self.coef_.reshape(n_classes, n_features + int(self.fit_intercept)) if self.fit_intercept: self.intercept_ = self.coef_[:-1] self.coef_ = self.coef_[::-1] return self def predict_proba(self, X): """ Probability estimates.
The returned estimates for all classes are ordered by the label of classes. For a multi_class problem, if multi_class is set to be "multinomial" the softmax function is used to find the predicted probability of each class. Else use a one-vs-rest approach, i.e calculate the probability of each class assuming it to be positive using the logistic function. and normalize these values across all the classes.
Parameters ---------- X : array-like of shape (n_samples, n_features) Vector to be scored, where `n_samples` is the number of samples and `n_features` is the number of features.
Returns ------- T : array-like of shape (n_samples, n_classes) Returns the probability of the sample for each class in the model, where classes are ordered as they are in ``self.classes_``. """ check_is_fitted(self) ovr = self.multi_class in ["ovr", "warn"] or (self.multi_class == 'auto' and (self.classes_.size <= 2 or self.solver == 'liblinear')) if ovr: return super()._predict_proba_lr(X) else: decision = self.decision_function(X) if decision.ndim == 1: # Workaround for multi_class="multinomial" and binary outcomes # which requires softmax prediction with only a 1D decision. decision_2d = np.c_[-decisiondecision] else: decision_2d = decision return softmax(decision_2d, copy=False) def predict_log_proba(self, X): """ Predict logarithm of probability estimates.
The returned estimates for all classes are ordered by the label of classes. Parameters ---------- X : array-like of shape (n_samples, n_features) Vector to be scored, where `n_samples` is the number of samples and `n_features` is the number of features.
Returns ------- T : array-like of shape (n_samples, n_classes) Returns the log-probability of the sample for each class in the model, where classes are ordered as they are in ``self.classes_``. """ return np.log(self.predict_proba(X))
|
|