IMPROVING THE ACCURACY OF VARIABLE SELECTION USING THE WHOLE SOLUTION PATH

Author: Yang Liu

Publisher:

Published: 2015

Total Pages: 100

ISBN-13:

The performance of penalized least squares approaches depends heavily on the choice of the tuning parameter; however, statisticians have not reached a consensus on the criterion for choosing it. Moreover, penalized least squares estimation based on a single value of the tuning parameter suffers from several drawbacks. The tuning parameter selected by traditional criteria such as AIC, BIC, and CV tends to pick too many variables, which results in an over-fitted model. In contrast, many other criteria, such as the extended BIC, favor an over-sparse model and may run the risk of dropping some relevant variables. In this dissertation, a novel approach to feature selection based on the whole solution path is proposed, which significantly improves the selection accuracy. The key idea is to partition the variables into the relevant set and the irrelevant set at each tuning parameter, and then select the variables which have been classified as relevant for at least one tuning parameter. The approach is named Selection by Partitioning the Solution Paths (SPSP). Compared with other existing feature selection approaches, the proposed SPSP algorithm allows feature selection using a wide class of penalty functions, including Lasso, ridge and other strictly convex penalties. Based on the proposed SPSP procedure, a new type of score is presented to rank the importance of the variables in the model. The scores, denoted Area-out-of-zero-region Importance Scores (AIS), are defined by the areas between the solution paths and the boundary of the partitions over the whole solution paths. By applying the proposed scores in stepwise selection, the false positive error of the selection is remarkably reduced. The asymptotic properties of the proposed SPSP estimator are established.
It is shown that the SPSP estimator is selection consistent when the original estimator is either estimation consistent or selection consistent. Specifically, the SPSP approach applied to the Lasso is proved to be consistent over the whole solution path under the irrepresentable condition. Additionally, a number of simulation studies have been conducted to illustrate the performance of the proposed approaches. Comparisons between the SPSP algorithm and the existing selection criteria on the Lasso, the adaptive Lasso, the SCAD and the MCP are provided. The results show that the proposed method generally outperforms the existing variable selection methods. Finally, two real data examples of identifying the informative variables in the Boston housing data and the glioblastoma gene expression data are given. Compared with the models selected by other existing approaches, the models selected by the SPSP procedure are much simpler with relatively smaller model errors.
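The key idea of the abstract (partition the variables at each tuning parameter, then keep every variable that is ever classified as relevant) can be illustrated with a minimal sketch. The abstract does not state how the partition is made, so the rule used here, cutting at the largest gap between sorted absolute coefficients, is an assumption for illustration only and should not be read as the dissertation's actual procedure. The names `partition_relevant`, `spsp_select` and the coefficient values in `toy_path` are hypothetical.

```python
def partition_relevant(coefs, tol=1e-12):
    """Split variables at ONE tuning parameter into relevant / irrelevant.

    Assumed rule (not taken from the source): sort |coef| ascending with a
    zero baseline, find the largest gap between adjacent values, and call
    everything above that gap "relevant".
    """
    mags = sorted((abs(c), j) for j, c in enumerate(coefs))
    values = [0.0] + [m for m, _ in mags]
    gaps = [values[i + 1] - values[i] for i in range(len(mags))]
    largest = max(gaps)
    if largest <= tol:
        # All coefficients are (numerically) zero: nothing is relevant.
        return set()
    cut = gaps.index(largest)  # position of the first largest gap
    return {j for _, j in mags[cut:]}


def spsp_select(path):
    """Union of the relevant sets over the whole solution path.

    `path` is a list of coefficient vectors, one per tuning parameter.
    A variable is selected if it is relevant for at least one of them.
    """
    selected = set()
    for coefs in path:
        selected |= partition_relevant(coefs)
    return sorted(selected)


# Hypothetical solution path: 3 tuning parameters, 4 variables.
toy_path = [
    [0.0, 0.0, 0.00, 0.00],  # strong penalty: everything shrunk to zero
    [0.9, 0.0, 0.05, 0.00],
    [1.2, 0.9, 0.10, 0.02],  # weak penalty: small noise coefficients appear
]

print(spsp_select(toy_path))
```

In this toy path the small coefficients of variables 2 and 3 never end up above the largest gap, so they are not selected even though they are nonzero at the weakest penalty. This is the behaviour the abstract contrasts with criteria that pick a single tuning parameter.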


Variable Ranking by Solution-path Algorithms

Author: Bo Wang

Publisher:

Published: 2011

Total Pages: 40

ISBN-13:

Variable selection has always been a very important problem in statistics. We often meet situations where a huge data set is given and we want to find out the relationship between the response and the corresponding variables. With a huge number of variables, we often end up with a big model even if we delete those that are insignificant. There are two reasons why a final model with too many variables is unsatisfactory. The first reason is prediction accuracy. Though the prediction bias might be small under a big model, the variance is usually very high. The second reason is interpretation. With a large number of variables in the model, it is hard to determine a clear relationship and explain the effects of the variables we are interested in. Many variable selection methods have been proposed. However, one disadvantage of variable selection is that different model sizes require different tuning parameters in the analysis, which are hard for non-statisticians to choose. Xin and Zhu advocate variable ranking instead of variable selection. Once variables are ranked properly, we can make the selection by adopting a threshold rule. In this thesis, we try to rank the variables using Least Angle Regression (LARS). Some shrinkage methods like Lasso and LARS can shrink the coefficients to zero. The advantage of this kind of method is that it gives a solution path which describes the order in which variables enter the model. This provides an intuitive way to rank variables based on the path. However, Lasso can sometimes be difficult to apply to variable ranking directly. This is because in a Lasso solution path, variables might enter the model and then get dropped. This dropping issue makes it hard to rank based on the order of entrance. However, LARS, which is closely related to Lasso, does not have this problem. We make use of this property and rank variables using the LARS solution path.
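Ranking by order of entrance, and the dropping issue that complicates it, can be sketched on a toy coefficient path. This is only an illustration under stated assumptions: it does not run LARS or Lasso, the function names `rank_by_entry` and `dropped_variables` are hypothetical, and the numbers in `lasso_like_path` are invented to show a variable that enters and then leaves.

```python
def rank_by_entry(path, tol=1e-12):
    """Rank variables by the first path step at which they become nonzero.

    `path` is a list of coefficient vectors ordered from the strongest
    penalty (empty model) to the weakest. Variables that never enter are
    placed last, in index order.
    """
    n_vars = len(path[0])
    first_entry = {}
    for step, coefs in enumerate(path):
        for j, c in enumerate(coefs):
            if abs(c) > tol and j not in first_entry:
                first_entry[j] = step
    entered = sorted(first_entry, key=lambda j: (first_entry[j], j))
    never = [j for j in range(n_vars) if j not in first_entry]
    return entered + never


def dropped_variables(path, tol=1e-12):
    """Variables that enter the model and later return to zero."""
    active = set()
    dropped = set()
    for coefs in path:
        for j, c in enumerate(coefs):
            if abs(c) > tol:
                active.add(j)
            elif j in active:
                # Was nonzero at an earlier step, is zero now.
                dropped.add(j)
    return sorted(dropped)


# Hypothetical path with 3 variables; variable 1 enters at step 2
# and is dropped again at step 3.
lasso_like_path = [
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.8, 0.3, 0.0],
    [1.0, 0.0, 0.4],
]

print(rank_by_entry(lasso_like_path))
print(dropped_variables(lasso_like_path))
```

Here the entry order ranks variable 1 ahead of variable 2, although variable 1 has left the model by the last step. When `dropped_variables` returns an empty list, as is expected for a path in which variables are never dropped, the entry order gives an unambiguous ranking.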


Pattern Recognition and Image Analysis

Author: Aythami Morales

Publisher: Springer Nature

Published: 2019-09-21

Total Pages: 657

ISBN-13: 3030313328

This 2-volume set constitutes the refereed proceedings of the 9th Iberian Conference on Pattern Recognition and Image Analysis, IbPRIA 2019, held in Madrid, Spain, in July 2019. The 99 papers in these volumes were carefully reviewed and selected from 137 submissions. They are organized in topical sections named: Part I: best ranked papers; machine learning; pattern recognition; image processing and representation. Part II: biometrics; handwriting and document analysis; other applications.


Data Science Live Book

Author: Pablo Casas

Publisher:

Published: 2018-03-16

Total Pages:

ISBN-13: 9789874273666

This book is a practical guide to problems that commonly arise when developing a machine learning project. The book's topics are: exploratory data analysis, data preparation, selecting best variables, and assessing model performance. More information on predictive modeling will be included soon. This book tries to demonstrate what it says with short and well-explained examples. This is valid for both theoretical and practical aspects (through comments in the code). This book, like the development of a data project, is not linear. The chapters are interrelated. For example, the missing values chapter can lead to the cardinality reduction in categorical variables. Or you can read the data type chapter and then change the way you deal with missing values. You'll find references to other websites so you can expand your study; this book is just another step in the learning journey. It's open-source and can be found at http://livebook.datascienceheroes.com


Fuzzy Systems and Data Mining IV

Author: A.J. Tallón-Ballesteros

Publisher: IOS Press

Published: 2018-11-06

Total Pages: 990

ISBN-13: 1614999279

Big Data Analytics has been on the rise in the last years of the current decade. Data are overwhelming the computation capacity of high performance servers. Cloud, grid, edge and fog computing are a few examples of the current hype. Computational Intelligence offers two approaches to the development of models: on the one hand, the crisp approach, which considers an exact value for every variable and, on the other hand, the fuzzy approach, which copes with values between two boundaries. This book presents 114 papers from the 4th International Conference on Fuzzy Systems and Data Mining (FSDM 2018), held in Bangkok, Thailand, from 16 to 19 November 2018. All papers were carefully reviewed by program committee members, who took into consideration the breadth and depth of the research topics that fall within the scope of FSDM. The acceptance rate was 32.85%. Offering a state-of-the-art overview of fuzzy systems and data mining, the publication will be of interest to all those whose work involves data science.


Handbook of Graphs and Networks

Author: Stefan Bornholdt

Publisher: John Wiley & Sons

Published: 2006-03-06

Total Pages: 417

ISBN-13: 3527606335

Complex interacting networks are observed in systems from such diverse areas as physics, biology, economics, ecology, and computer science. For example, economic or social interactions often organize themselves in complex network structures. Similar phenomena are observed in traffic flow and in communication networks such as the internet. In current problems of the biosciences, prominent examples are protein networks in the living cell, as well as molecular networks in the genome. On larger scales one finds networks of cells, as in neural networks, up to the scale of organisms in ecological food webs. This book defines the field of complex interacting networks in its infancy and presents the dynamics of networks and their structure as a key concept across disciplines. The contributions present common underlying principles of network dynamics and their theoretical description and are of interest to specialists as well as to the non-specialized reader looking for an introduction to this exciting new field. Theoretical concepts include modeling networks as dynamical systems with numerical methods and new graph theoretical methods, but also focus on networks that change their topology, as in morphogenesis and self-organization. The authors offer concepts to model network structures and dynamics, focussing on approaches applicable across disciplines.


Encyclopedia of Biopharmaceutical Statistics - Four Volume Set

Author: Shein-Chung Chow

Publisher: CRC Press

Published: 2018-09-03

Total Pages: 2434

ISBN-13: 1351110268

Since the publication of the first edition in 2000, there has been an explosive growth of literature in biopharmaceutical research and development of new medicines. This encyclopedia (1) provides a comprehensive and unified presentation of designs and analyses used at different stages of the drug development process, (2) gives a well-balanced summary of current regulatory requirements, and (3) describes recently developed statistical methods in the pharmaceutical sciences. Features of the Fourth Edition: 1. 78 new and revised entries have been added for a total of 308 chapters and a fourth volume has been added to encompass the increased number of chapters. 2. Revised and updated entries reflect changes and recent developments in regulatory requirements for the drug review/approval process and statistical designs and methodologies. 3. Additional topics include multiple-stage adaptive trial design in clinical research, translational medicine, design and analysis of biosimilar drug development, big data analytics, and real world evidence for clinical research and development. 4. A table of contents organized by stages of biopharmaceutical development provides easy access to relevant topics. About the Editor: Shein-Chung Chow, Ph.D. is currently an Associate Director, Office of Biostatistics, U.S. Food and Drug Administration (FDA). Dr. Chow is an Adjunct Professor at Duke University School of Medicine, as well as Adjunct Professor at Duke-NUS, Singapore and North Carolina State University. Dr. Chow is the Editor-in-Chief of the Journal of Biopharmaceutical Statistics and the Chapman & Hall/CRC Biostatistics Book Series and the author of 28 books and over 300 methodology papers. He was elected Fellow of the American Statistical Association in 1995.


Artificial Intelligence for Intrusion Detection Systems

Author: Mayank Swarnkar

Publisher: CRC Press

Published: 2023-10-11

Total Pages: 241

ISBN-13: 1000967581

This book addresses cybersecurity issues and provides a wide view of novel cyber attacks and defense mechanisms, especially AI-based Intrusion Detection Systems (IDS). Features: • A systematic overview of the state-of-the-art IDS • Proper explanation of novel cyber attacks which are much different from classical cyber attacks • Proper and in-depth discussion of AI in the field of cybersecurity • Introduction to design and architecture of novel AI-based IDS with a transparent view of real-time implementations • Covers a wide variety of AI-based cyber defense mechanisms, especially in the field of network-based attacks, IoT-based attacks, multimedia attacks, and blockchain attacks. This book serves as a reference book for scientific investigators who need to analyze IDS, as well as researchers developing methodologies in this field. It may also be used as a textbook for a graduate-level course on information security.


Logic-Based Program Synthesis and Transformation

Author: Maribel Fernández

Publisher: Springer Nature

Published: 2021-02-12

Total Pages: 345

ISBN-13: 3030684466

This book constitutes the refereed proceedings of the 30th International Conference on Logic-Based Program Synthesis and Transformation, LOPSTR 2020, which was held during September 7-9, 2020. The 15 papers presented in this volume were carefully reviewed and selected from a total of 31 submissions. The book also contains two invited talks in full paper length. The contributions were organized in topical sections named: rewriting; unification; types; verification; model checking and probabilistic programming; program analysis and testing; and logics.


Encyclopedia of Information Science and Technology, Fifth Edition

Author: Khosrow-Pour D.B.A., Mehdi

Publisher: IGI Global

Published: 2020-07-24

Total Pages: 1966

ISBN-13: 1799834808

The rise of intelligence and computation within technology has created an eruption of potential applications in numerous professional industries. Techniques such as data analysis, cloud computing, machine learning, and others have altered the traditional processes of various disciplines including healthcare, economics, transportation, and politics. Information technology in today’s world is beginning to uncover opportunities for experts in these fields that they are not yet aware of. The exposure of specific instances in which these devices are being implemented will assist other specialists in how to successfully utilize these transformative tools with the appropriate amount of discretion, safety, and awareness. Considering the level of diverse uses and practices throughout the globe, the fifth edition of the Encyclopedia of Information Science and Technology series continues the enduring legacy set forth by its predecessors as a premier reference that contributes the most cutting-edge concepts and methodologies to the research community. The Encyclopedia of Information Science and Technology, Fifth Edition is a three-volume set that includes 136 original and previously unpublished research chapters that present multidisciplinary research and expert insights into new methods and processes for understanding modern technological tools and their applications as well as emerging theories and ethical controversies surrounding the field of information science. Highlighting a wide range of topics such as natural language processing, decision support systems, and electronic government, this book offers strategies for implementing smart devices and analytics into various professional disciplines. 
The techniques discussed in this publication are ideal for IT professionals, developers, computer scientists, practitioners, managers, policymakers, engineers, data analysts, and programmers seeking to understand the latest developments within this field and who are looking to apply new tools and policies in their practice. Additionally, academicians, researchers, and students in fields that include but are not limited to software engineering, cybersecurity, information technology, media and communications, urban planning, computer science, healthcare, economics, environmental science, data management, and political science will benefit from the extensive knowledge compiled within this publication.