In this book, you will learn how to use OpenCV, NumPy, and other libraries to perform signal processing, image processing, object detection, and feature extraction with Python GUI (PyQt). You will learn how to filter signals, detect edges and segments, and denoise images with PyQt. You will also learn how to detect objects (face, eye, and mouth) using Haar Cascades and how to detect features on images using Harris Corner Detection, Shi-Tomasi Corner Detector, Scale-Invariant Feature Transform (SIFT), and Features from Accelerated Segment Test (FAST). In Chapter 1, you will learn: Tutorial Steps To Create A Simple GUI Application, Tutorial Steps to Use Radio Button, Tutorial Steps to Group Radio Buttons, Tutorial Steps to Use CheckBox Widget, Tutorial Steps to Use Two CheckBox Groups, Tutorial Steps to Understand Signals and Slots, Tutorial Steps to Convert Data Types, Tutorial Steps to Use Spin Box Widget, Tutorial Steps to Use ScrollBar and Slider, Tutorial Steps to Use List Widget, Tutorial Steps to Select Multiple List Items in One List Widget and Display It in Another List Widget, Tutorial Steps to Insert Item into List Widget, Tutorial Steps to Use Operations on List Widget, Tutorial Steps to Use Combo Box, Tutorial Steps to Use Calendar Widget and Date Edit, and Tutorial Steps to Use Table Widget. In Chapter 2, you will learn: Tutorial Steps To Create A Simple Line Graph, Tutorial Steps To Create A Simple Line Graph in Python GUI, Tutorial Steps To Create A Simple Line Graph in Python GUI: Part 2, Tutorial Steps To Create Two or More Graphs in the Same Axis, Tutorial Steps To Create Two Axes in One Canvas, Tutorial Steps To Use Two Widgets, Tutorial Steps To Use Two Widgets, Each of Which Has Two Axes, Tutorial Steps To Use Axes With Certain Opacity Levels, Tutorial Steps To Choose Line Color From Combo Box, Tutorial Steps To Calculate Fast Fourier Transform, Tutorial Steps To Create GUI For FFT, Tutorial Steps To Create GUI For FFT With Some Other Input Signals, Tutorial Steps To Create GUI For Noisy Signal, Tutorial Steps To Create GUI For Noisy Signal Filtering, and Tutorial Steps To Create GUI For Wav Signal Filtering. In Chapter 3, you will learn: Tutorial Steps To Convert RGB Image Into Grayscale, Tutorial Steps To Convert RGB Image Into YUV Image, Tutorial Steps To Convert RGB Image Into HSV Image, Tutorial Steps To Filter Image, Tutorial Steps To Display Image Histogram, Tutorial Steps To Display Filtered Image Histogram, Tutorial Steps To Filter Image With CheckBoxes, Tutorial Steps To Implement Image Thresholding, and Tutorial Steps To Implement Adaptive Image Thresholding. In Chapter 4, you will learn: Tutorial Steps To Generate And Display Noisy Image, Tutorial Steps To Implement Edge Detection On Image, Tutorial Steps To Implement Image Segmentation Using Multiple Thresholding and K-Means Algorithm, and Tutorial Steps To Implement Image Denoising. In Chapter 5, you will learn: Tutorial Steps To Detect Face, Eye, and Mouth Using Haar Cascades, Tutorial Steps To Detect Face Using Haar Cascades with PyQt, Tutorial Steps To Detect Eye and Mouth Using Haar Cascades with PyQt, and Tutorial Steps To Extract Detected Objects. In Chapter 6, you will learn: Tutorial Steps To Detect Image Features Using Harris Corner Detection, Tutorial Steps To Detect Image Features Using Shi-Tomasi Corner Detection, Tutorial Steps To Detect Features Using Scale-Invariant Feature Transform (SIFT), and Tutorial Steps To Detect Features Using Features from Accelerated Segment Test (FAST).
You can download the XML files from https://viviansiahaan.blogspot.com/2023/06/learn-from-scratch-signal-and-image.html.
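To give a concrete flavor of the object detection covered in Chapter 5, here is a minimal sketch of Haar Cascade face and eye detection with OpenCV; it is not the book's exact code, and the image path "photo.jpg" is only a placeholder.

# Minimal sketch: face and eye detection with OpenCV Haar Cascades.
# Assumes opencv-python is installed; "photo.jpg" is a placeholder path.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

img = cv2.imread("photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces first, then search for eyes inside each detected face region
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    roi_gray = gray[y:y + h, x:x + w]
    roi_color = img[y:y + h, x:x + w]
    for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(roi_gray):
        cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)

cv2.imwrite("detected.jpg", img)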
In this book, you will learn how to build from scratch a criminal records management database system using Java/PostgreSQL. All Java code for cryptography and digital image processing in this book is native Java; it deliberately avoids external libraries so that readers understand in detail how digital image features are extracted from scratch in Java. Only three external libraries are used in this book: Connector/J to facilitate Java-to-PostgreSQL connections, JCalendar to display calendar controls, and JFreeChart to display graphics. The digital image techniques used in this book to extract image features are grayscaling, sharpening, inverting, blurring, dilation, erosion, closing, opening, vertical Prewitt, horizontal Prewitt, Laplacian, horizontal Sobel, and vertical Sobel. Readers can extend the system to store other advanced image features based on descriptors such as SIFT for descriptor-based matching. In the first chapter, you will learn: how to install NetBeans, JDK 11, and the PostgreSQL connector; how to integrate external libraries into projects; how the basic PostgreSQL commands are used; and how query statements are used to create databases, create tables, fill tables, and manipulate table contents. In the second chapter, you will learn how to query data from PostgreSQL using JDBC, including establishing a database connection, creating a statement object, executing the query, processing the ResultSet object, querying data using a statement that returns multiple rows, querying data using a statement that has parameters, inserting data into a table using JDBC, updating data in a PostgreSQL database using JDBC, calling a PostgreSQL stored function using JDBC, deleting data from a PostgreSQL table using JDBC, and PostgreSQL JDBC transactions. Also in the second chapter, you will learn the basics of cryptography using Java. Here, you will learn how to write a Java program to compute a hash, compute a MAC (Message Authentication Code), store keys in a KeyStore, generate a PrivateKey and PublicKey, encrypt/decrypt data, and generate and verify digital signatures. In the third chapter, you will learn how to create and store salted passwords and verify them. You will create a Login table. In this case, you will see how to create a Java GUI using NetBeans to implement it. In addition to the Login table, in this chapter you will also create a Client table. In the case of the Client table, you will learn how to generate and save public and private keys into a database. You will also learn how to encrypt/decrypt data and save the results into a database. In the fourth chapter, you will create an Account table. This Account table has the following ten fields: account_id (primary key), client_id (primary key), account_number, account_date, account_type, plain_balance, cipher_balance, decipher_balance, digital_signature, and signature_verification. In this case, you will learn how to generate and verify digital signatures and store the results into a database. In the fifth chapter, you will create a table named Account, which has ten columns: account_id (primary key), client_id (primary key), account_number, account_date, account_type, plain_balance, cipher_balance, decipher_balance, digital_signature, and signature_verification. In the sixth chapter, you will create a Client_Data table, which has the following seven fields: client_data_id (primary key), account_id (primary key), birth_date, address, mother_name, telephone, and photo_path.
In the seventh chapter, you will be taught how to create the Crime database and its tables. In the eighth chapter, you will be taught how to extract image features, utilizing the BufferedImage class, in a Java GUI. In the ninth chapter, you will be taught to create a Java GUI to view, edit, insert, and delete Suspect table data. This table has eleven columns: suspect_id (primary key), suspect_name, birth_date, case_date, report_date, suspect_status, arrest_date, mother_name, address, telephone, and photo. In the tenth chapter, you will be taught to create a Java GUI to view, edit, insert, and delete Feature_Extraction table data. This table has eight columns: feature_id (primary key), suspect_id (foreign key), feature1, feature2, feature3, feature4, feature5, and feature6. In the eleventh chapter, you will add two tables: Police_Station and Investigator. These two tables will later be joined to the Suspect table through another table, File_Case, which will be built in the twelfth chapter. The Police_Station table has six columns: police_station_id (primary key), location, city, province, telephone, and photo. The Investigator table has eight columns: investigator_id (primary key), investigator_name, rank, birth_date, gender, address, telephone, and photo. Here, you will design a Java GUI to display, edit, fill, and delete data in both tables. In the twelfth chapter, you will add two tables: Victim and File_Case. The File_Case table will connect four other tables: Suspect, Police_Station, Investigator, and Victim. The Victim table has nine columns: victim_id (primary key), victim_name, crime_type, birth_date, crime_date, gender, address, telephone, and photo. The File_Case table has seven columns: file_case_id (primary key), suspect_id (foreign key), police_station_id (foreign key), investigator_id (foreign key), victim_id (foreign key), status, and description. Here, you will also design a Java GUI to display, edit, fill, and delete data in both tables. Finally, we hope this book is useful for you.
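The book implements all of these operators in native Java with the BufferedImage class; purely for orientation, and not as the book's code, the following minimal Python/NumPy sketch shows the idea behind the Prewitt feature extraction described above. The function name and the choice of the mean edge response as a scalar feature are illustrative assumptions.

# Orientation sketch only: the book does this in native Java with BufferedImage.
# Prewitt filtering of a grayscale image using NumPy and SciPy.
import numpy as np
from scipy.ndimage import convolve

def prewitt_features(gray):
    """gray: 2-D NumPy array of pixel intensities."""
    kx = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)  # Prewitt kernel for changes along x
    ky = kx.T                                  # Prewitt kernel for changes along y
    gx = convolve(gray.astype(float), kx)
    gy = convolve(gray.astype(float), ky)
    magnitude = np.hypot(gx, gy)
    # Simple scalar features per operator, e.g. the mean edge response
    return gx.mean(), gy.mean(), magnitude.mean()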
"Start from Scratch: Digital Image Processing with Tkinter" is a beginner-friendly guide that delves into the basics of digital image processing using Python and Tkinter, a popular GUI library. The project is divided into distinct modules, each focusing on a specific aspect of image manipulation. The journey begins with an exploration of Image Color Space. Here, readers encounter the Main Form, which serves as the entry point to the application. It provides a user-friendly interface for loading images, selecting color spaces, and visualizing various color channels. The Fundamental Utilities play a crucial role by providing core functionalities like loading images, converting color spaces, and manipulating pixel data. The project also includes forms dedicated to displaying individual color channels and offering insights into the current color space through histograms. The Plotting Utilities module facilitates the creation of visual representations such as plots and graphs, enhancing the user's understanding of color spaces. Moving on, the Image Transformation section introduces readers to techniques like the Fast Fourier Transform (FFT). The Fast Fourier Transform Utilities module enables the implementation of FFT algorithms for converting images from spatial to frequency domains. A corresponding form allows users to view images in the frequency domain, with additional adjustments made to the plotting utilities for effective visualization. In the context of Discrete Cosine Transform (DCT), readers gain insights into algorithms and functions for transforming images. The Form for Discrete Cosine Transform aids in visualizing images in the DCT domain, while the plotting utilities are modified to accommodate these transformed images. The Discrete Sine Transform (DST) section introduces readers to DST algorithms and their role in image transformation. A dedicated form for visualizing images in the DST domain is provided, and the plotting utilities are further extended to handle these transformations effectively. Moving Average Smoothing is another critical aspect covered in the project. The Filter2D Utilities facilitate the application of moving average smoothing techniques. Additionally, metrics utilities enable the assessment of the smoothing process, with forms available for displaying both metrics and the smoothed images. Next, the project addresses Exponential Moving Average techniques, modifying the existing utilities to accommodate this specific approach. Similarly, forms for visualizing results and metrics are provided. Readers are then introduced to techniques like Median Filtering, Savitzky-Golay Filtering, and Wiener Filtering. The Filter2D Utilities are adapted to facilitate these filtering methods, and metrics utilities are employed to assess the effectiveness of each technique. Forms dedicated to each filtering method provide a platform for visualizing the results. The final section of the project explores techniques such as Total Variation Denoising, Non-Local Means Denoising, and PCA Denoising. The Filter2D Utilities are once again modified to support these denoising techniques. Metrics utilities are employed to evaluate the denoising process, and dedicated forms offer visualization capabilities. By breaking down the project into these modules, readers can systematically grasp the fundamentals of digital image processing, gradually building their skills from one concept to the next. 
Each section provides hands-on experience and practical knowledge, making it an ideal starting point for beginners in image processing.
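As a minimal sketch of the kind of utility the project builds (the names here are illustrative, not the project's actual module or form names), the following loads an image chosen in a Tkinter file dialog, converts it to grayscale with Pillow, and displays its FFT magnitude spectrum with Matplotlib.

# Minimal sketch: pick an image in a Tkinter dialog, then show its
# FFT magnitude spectrum. Names and layout are illustrative only.
import tkinter as tk
from tkinter import filedialog
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

root = tk.Tk()
root.withdraw()  # no main window is needed for this sketch
path = filedialog.askopenfilename(title="Select an image")

gray = np.asarray(Image.open(path).convert("L"), dtype=float)
spectrum = np.fft.fftshift(np.fft.fft2(gray))
magnitude = np.log1p(np.abs(spectrum))  # log scale for visibility

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].imshow(gray, cmap="gray")
axes[0].set_title("Spatial domain")
axes[1].imshow(magnitude, cmap="gray")
axes[1].set_title("Frequency domain (FFT)")
for ax in axes:
    ax.axis("off")
plt.tight_layout()
plt.show()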
You will learn to create GUI applications using the Qt toolkit. The Qt toolkit, also popularly known as Qt, is a cross-platform application and UI framework originally developed by Trolltech, which is used to develop GUI applications. You will extend an existing GUI by adding several Line Edit widgets to read input, which are used to set the range and step of the graph (signal). Next, you will use a widget for each graph. Add another Widget from Containers in gui_graphics.ui using Qt Designer. Then, you will use two Widgets, each of which has two canvases; in each Widget, the two canvases are arranged with a QVBoxLayout. Finally, you will apply those Widgets to display the results of signal and image processing techniques.
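A minimal sketch of that layout pattern, assuming PyQt5 and Matplotlib are installed; the widget here is created in code rather than loaded from gui_graphics.ui, so the class name and the demo sine/cosine plots are illustrative only.

# Minimal sketch: a PyQt5 Widget holding two Matplotlib canvases in a QVBoxLayout.
import sys
import numpy as np
from PyQt5.QtWidgets import QApplication, QWidget, QVBoxLayout
from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas
from matplotlib.figure import Figure

class GraphWidget(QWidget):
    def __init__(self):
        super().__init__()
        layout = QVBoxLayout(self)          # one vertical layout per Widget
        self.canvas1 = FigureCanvas(Figure(figsize=(4, 2)))
        self.canvas2 = FigureCanvas(Figure(figsize=(4, 2)))
        layout.addWidget(self.canvas1)      # two canvases stacked in the layout
        layout.addWidget(self.canvas2)
        t = np.linspace(0, 1, 500)
        self.canvas1.figure.add_subplot(111).plot(t, np.sin(2 * np.pi * 5 * t))
        self.canvas2.figure.add_subplot(111).plot(t, np.cos(2 * np.pi * 5 * t))

if __name__ == "__main__":
    app = QApplication(sys.argv)
    w = GraphWidget()
    w.show()
    sys.exit(app.exec_())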
PROJECT 1: FULL SOURCE CODE: SQL SERVER FOR STUDENTS AND DATA SCIENTISTS WITH PYTHON GUI In this project, we provide you with the SQL Server version of the SQLite sample database named chinook. The chinook sample database is a good database for practicing with SQL. The detailed description of the database can be found on: https://www.sqlitetutorial.net/sqlite-sample-database/. The sample database consists of 11 tables: the employee table stores employee data such as employee id, last name, first name, etc., and also has a field named ReportsTo to specify who reports to whom; the customers table stores customer data; the invoices & invoice_items tables store invoice data, where the invoices table stores invoice header data and the invoice_items table stores the invoice line item data; the artist table stores artist data and is a simple table that contains only the artist id and name; the album table stores data about a list of tracks, where each album belongs to one artist but one artist may have multiple albums; the media_type table stores media types such as MPEG audio and AAC audio files; the genre table stores music types such as rock, jazz, metal, etc.; the track table stores the data of songs, where each track belongs to one album; the playlist & playlist_track tables: the playlist table stores data about playlists, each playlist contains a list of tracks, and each track may belong to multiple playlists, so the relationship between the playlist table and the track table is many-to-many and the playlist_track table is used to reflect this relationship. In this project, you will write Python scripts to create every table and insert rows of data into each of them. You will develop a GUI with PyQt5 for each table in the database. You will also create GUIs to plot: the case distribution of order date by year, quarter, month, week, and day; the distribution of amount by year, quarter, month, week, day, and hour; the bottom/top 10 sales by employee, by customer, by artist, by genre, by playlist, and by customer city; the payment amount by month with mean and EWM; the average payment amount for every month; and the payment amount across all years. PROJECT 2: FULL SOURCE CODE: SQL SERVER FOR DATA ANALYTICS AND VISUALIZATION WITH PYTHON GUI This book uses the SQL Server version of the MySQL-based Sakila sample database. It is a fictitious database designed to represent a DVD rental store. The tables of the database include film, film_category, actor, customer, rental, payment, and inventory, among others. The Sakila sample database is intended to provide a standard schema that can be used for examples in books, tutorials, articles, samples, and so forth. Detailed information about the database can be found on the website: https://dev.mysql.com/doc/index-other.html.
In this project, you will develop a GUI using PyQt5 to: read the SQL Server database and every table in it; read every actor in the actor table and every film in the film table; plot the case distribution of film release year, film rating, rental duration, and categorized film length; plot the rating variable against the rental_duration variable in stacked bar plots; plot the length variable against the rental_duration variable in stacked bar plots; read the payment table; plot the case distribution of Year, Day, Month, Week, and Quarter of payment; plot which year, month, week, day of week, and quarter have the most payment amount; read the film list by joining five tables: category, film_category, film_actor, film, and actor; plot the case distribution of the top 10 and bottom 10 actors; plot which film titles have the least and most sales; plot which actors have the least and most sales; plot which film categories have the least and most sales; plot the case distribution of the top 10 and bottom 10 overdue customers; plot which customers have the least and most overdue days; plot which store has the most sales; plot the average payment amount by month with mean and EWM; and plot the payment amount over June 2005. PROJECT 3: ZERO TO MASTERY: THE COMPLETE GUIDE TO LEARNING SQL SERVER AND DATA SCIENCE WITH PYTHON GUI In this project, we provide you with a SQL Server version of an Oracle sample database named OT, which is based on a global fictitious company that sells computer hardware including storage, motherboards, RAM, video cards, and CPUs. The company maintains product information such as name, description, standard cost, list price, and product line. It also tracks inventory information for all products, including the warehouses where products are available. Because the company operates globally, it has warehouses in various locations around the world. The company records all customer information including name, address, and website. Each customer has at least one contact person with detailed information including name, email, and phone. The company also places a credit limit on each customer to limit the amount that the customer can owe. Whenever a customer issues a purchase order, a sales order is created in the database with the pending status. When the company ships the order, the order status becomes shipped. In case the customer cancels an order, the order status becomes canceled. In addition to the sales information, employee data is recorded with some basic information such as name, email, phone, job title, manager, and hire date. In this project, you will write Python scripts to create every table and insert rows of data into each of them. You will develop a GUI with PyQt5 for each table in the database. You will also create GUIs to plot: the case distribution of order date by year, quarter, month, week, and day; the distribution of amount by year, quarter, month, week, day, and hour; the bottom 10 and top 10 sales by product, by customer, by category, by status, by customer city, and by customer state; the average amount by month with mean and EWM; the average amount for every month; the amount feature over June 2016; the amount feature over 2017; and the payment amount across all years.
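A minimal sketch of the read-and-plot pattern used throughout these three projects, assuming pyodbc, pandas, and matplotlib are installed; the connection string, table, and column names below are placeholders rather than the projects' exact schema.

# Minimal sketch: query a SQL Server table with pyodbc/pandas and plot yearly totals.
# Connection details and table/column names are placeholders.
import pyodbc
import pandas as pd
import matplotlib.pyplot as plt

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=chinook;Trusted_Connection=yes;")

df = pd.read_sql("SELECT InvoiceDate, Total FROM invoices", conn)
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])

# Distribution of amount by year, as described above
yearly = df.groupby(df["InvoiceDate"].dt.year)["Total"].sum()
yearly.plot(kind="bar", title="Total amount by year")
plt.tight_layout()
plt.show()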
In this book, you will create two desktop applications using Python GUI and MariaDB. This MariaDB-based Python programming book is intentionally designed for various levels of interest and ability of learners, so it is suitable for students, engineers, and even researchers in a variety of disciplines. No advanced programming experience is needed, and only basic school-level programming skills are required. In the first chapter, you will learn to use several widgets in PyQt5: display a welcome message; use the Radio Button widget; group radio buttons; display options in the form of check boxes; and display two groups of check boxes. In chapter two, you will learn the following topics: using the Signal/Slot Editor; copying and placing text from one Line Edit widget to another; converting data types and making a simple calculator; using the Spin Box widget; using scrollbars and sliders; using the List Widget; selecting a number of list items from one List Widget and displaying them in another List Widget; adding items to the List Widget; performing operations on the List Widget; using the Combo Box widget; displaying data selected by the user from the Calendar Widget; creating a hotel reservation application; and displaying tabular data using the Table Widget. In the third chapter, you will learn: how to create the initial three tables in the School database project: the Teacher, Class, and Subject tables; how to create database configuration files; how to create a Python GUI for inserting and editing tables; and how to create a Python GUI to join and query the three tables. In the fourth chapter, you will learn how to: create a main form to connect all forms; create a project that adds three more tables to the School database: the Student, Parent, and Tuition tables; create a Python GUI for inserting and editing tables; and create a Python GUI to join and query over the three tables. In chapter five, you will join the six tables, Teacher, TClass, Subject, Student, Parent, and Tuition, and make queries over those tables. In chapter six, you will create and configure a database. In this chapter, you will create the Suspect table in the Crime database. This table has eleven columns: suspect_id (primary key), suspect_name, birth_date, case_date, report_date, suspect_status, arrest_date, mother_name, address, telephone, and photo. You will also create a GUI to display, edit, insert, and delete data for this table. In chapter seven, you will create a table with the name Feature_Extraction, which has eight columns: feature_id (primary key), suspect_id (foreign key), feature1, feature2, feature3, feature4, feature5, and feature6. The six fields (except the keys) will have a VARCHAR(200) data type. You will also create a GUI to display, edit, insert, and delete data for this table. In chapter eight, you will create two tables, Police and Investigator. The Police table has six columns: police_id (primary key), province, city, address, telephone, and photo. The Investigator table has eight columns: investigator_id (primary key), investigator_name, rank, birth_date, gender, address, telephone, and photo. You will also create a GUI to display, edit, insert, and delete data for both tables. In chapter nine, you will create two tables, Victim and Case_File. The Victim table has nine columns: victim_id (primary key), victim_name, crime_type, birth_date, crime_date, gender, address, telephone, and photo.
The Case_File table has seven columns: case_file_id (primary key), suspect_id (foreign key), police_id (foreign key), investigator_id (foreign key), victim_id (foreign key), status, and description. You will create a GUI to display, edit, insert, and delete data for both tables as well.
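As a minimal sketch of the table-creation step described above (connection credentials, column sizes, and the AUTO_INCREMENT key are assumptions, and the PyQt5 GUI layer is omitted), the Suspect table could be created and filled from Python with the MariaDB connector like this.

# Minimal sketch: create the Suspect table in MariaDB and insert one row.
# Credentials, column sizes, and AUTO_INCREMENT are illustrative assumptions.
import mariadb

conn = mariadb.connect(user="root", password="secret",
                       host="localhost", database="crime")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS Suspect (
        suspect_id INT PRIMARY KEY AUTO_INCREMENT,
        suspect_name VARCHAR(100), birth_date DATE, case_date DATE,
        report_date DATE, suspect_status VARCHAR(50), arrest_date DATE,
        mother_name VARCHAR(100), address VARCHAR(200),
        telephone VARCHAR(20), photo BLOB)
""")

cur.execute(
    "INSERT INTO Suspect (suspect_name, birth_date, suspect_status) "
    "VALUES (?, ?, ?)",
    ("John Doe", "1990-05-01", "under investigation"))
conn.commit()
conn.close()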
In this book, you will learn how to use Scikit-Learn, TensorFlow, Keras, NumPy, Pandas, Seaborn, and other libraries to implement brain tumor classification and detection with machine learning using the Brain Tumor dataset provided by Kaggle. This dataset contains five first-order features: Mean (the contribution of individual pixel intensity for the entire image), Variance (used to find how each pixel varies from the neighboring pixels), Standard Deviation (the deviation of the measured values from their mean), Skewness (a measure of symmetry), and Kurtosis (which describes the peakedness of, e.g., a frequency distribution). It also contains eight second-order features: Contrast, Energy, ASM (Angular Second Moment), Entropy, Homogeneity, Dissimilarity, Correlation, and Coarseness. In this project, various methods and functionalities related to machine learning and deep learning are covered. Here is a summary of the process: Data Preprocessing: loaded and preprocessed the dataset using various techniques such as feature scaling, encoding categorical variables, and splitting the dataset into training and testing sets; Feature Selection: implemented feature selection techniques such as SelectKBest, Recursive Feature Elimination, and Principal Component Analysis to select the most relevant features for the model; Model Training and Evaluation: trained and evaluated multiple machine learning models such as Random Forest, AdaBoost, Gradient Boosting, Logistic Regression, and Support Vector Machines using cross-validation and hyperparameter tuning; implemented ensemble methods like the Voting Classifier and Stacking Classifier to combine the predictions of multiple models; calculated evaluation metrics such as accuracy, precision, recall, F1-score, and mean squared error for each model; visualized the predictions and confusion matrix for the models using plotting techniques; Deep Learning Model Building and Training: built deep learning models using architectures such as MobileNet and ResNet50 for image classification tasks; compiled and trained the models using appropriate loss functions, optimizers, and metrics; saved the trained models and their training history for future use; Visualization and Interaction: implemented methods to plot the training loss and accuracy curves during model training; created interactive widgets for displaying prediction results and confusion matrices; linked the selection of prediction options in combo boxes to trigger the corresponding prediction and visualization functions. Throughout the process, various libraries and frameworks such as scikit-learn, TensorFlow, and Keras are used to perform the tasks efficiently. The overall goal was to train models, evaluate their performance, visualize the results, and provide an interactive experience for the user to explore different prediction options.
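A minimal sketch of the tabular part of this pipeline, assuming the Kaggle CSV is saved as brain_tumor.csv with the feature columns plus a Class label (both names are assumptions): scaling, SelectKBest feature selection, and Random Forest evaluation with cross-validation.

# Minimal sketch of the tabular pipeline: scaling, SelectKBest, Random Forest.
# "brain_tumor.csv" and the "Class" column name are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report

df = pd.read_csv("brain_tumor.csv")
X, y = df.drop(columns=["Class"]), df["Class"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),
    ("model", RandomForestClassifier(n_estimators=200, random_state=42)),
])
print("CV accuracy:", cross_val_score(pipe, X_train, y_train, cv=5).mean())
pipe.fit(X_train, y_train)
print(classification_report(y_test, pipe.predict(X_test)))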
Google, officially known as Alphabet Inc., is an American multinational technology company. It was founded in September 1998 by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University. Initially, it started as a research project to develop a search engine, but it rapidly grew into one of the largest and most influential technology companies in the world. Google is primarily known for its internet-related services and products, with its search engine being its most well-known offering. It revolutionized the way people access information by providing a fast and efficient search engine that delivers highly relevant results. Over the years, Google expanded its portfolio to include a wide range of products and services, including Google Maps, Google Drive, Gmail, Google Docs, Google Photos, Google Chrome, YouTube, and many more. In addition to its internet services, Google ventured into hardware with products like the Google Pixel smartphones, Google Home smart speakers, and Google Nest smart home devices. It also developed its own operating system called Android, which has become the most widely used mobile operating system globally. Google's success can be attributed to its ability to monetize its services through online advertising. The company introduced Google AdWords, a highly successful online advertising program that enables businesses to display ads on Google's search engine, and it extended advertising to other websites through its AdSense program. Advertising contributes significantly to Google's revenue, along with other sources such as cloud services, app sales, and licensing fees. The dataset used in this project starts from 19-Aug-2004 and is updated until 11-Oct-2021. It contains 4317 rows and 7 columns. The columns in the dataset are Date, Open, High, Low, Close, Adj Close, and Volume. You can download the dataset from https://viviansiahaan.blogspot.com/2023/06/google-stock-price-time-series-analysis.html. In this project, you will work with technical indicators such as daily returns, Moving Average Convergence-Divergence (MACD), Relative Strength Index (RSI), Simple Moving Average (SMA), lower and upper bands, and standard deviation. In this book, you will learn how to perform regression-based forecasting of the Adj Close price of Google stock using: Linear Regression, Random Forest regression, Decision Tree regression, Support Vector Machine regression, Naïve Bayes regression, K-Nearest Neighbor regression, Adaboost regression, Gradient Boosting regression, Extreme Gradient Boosting regression, Light Gradient Boosting regression, Catboost regression, MLP regression, Lasso regression, and Ridge regression. The machine learning models used to predict Google daily returns as the target variable are the K-Nearest Neighbor classifier, Random Forest classifier, Naive Bayes classifier, Logistic Regression classifier, Decision Tree classifier, Support Vector Machine classifier, LGBM classifier, Gradient Boosting classifier, XGB classifier, MLP classifier, and Extra Trees classifier. Finally, you will develop a GUI to plot decision boundaries, the distribution of features, feature importance, predicted values versus true values, the confusion matrix, the learning curve, the performance of the model, and the scalability of the model.
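A minimal sketch of how those technical indicators can be computed with pandas from the Adj Close column; the file name GOOG.csv is a placeholder and the window lengths are common defaults rather than the book's exact choices.

# Minimal sketch: daily returns, SMA, bands, MACD, and RSI from Adj Close.
# "GOOG.csv" is a placeholder; window lengths are common defaults.
import pandas as pd

df = pd.read_csv("GOOG.csv", parse_dates=["Date"], index_col="Date")
close = df["Adj Close"]

df["Daily Return"] = close.pct_change()
df["SMA20"] = close.rolling(20).mean()
std20 = close.rolling(20).std()
df["Upper Band"] = df["SMA20"] + 2 * std20
df["Lower Band"] = df["SMA20"] - 2 * std20

# MACD: difference of 12- and 26-period EMAs, with a 9-period signal line
ema12 = close.ewm(span=12, adjust=False).mean()
ema26 = close.ewm(span=26, adjust=False).mean()
df["MACD"] = ema12 - ema26
df["Signal"] = df["MACD"].ewm(span=9, adjust=False).mean()

# RSI over 14 periods: 100 - 100 / (1 + average gain / average loss)
delta = close.diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta.clip(upper=0)).rolling(14).mean()
df["RSI"] = 100 - 100 / (1 + gain / loss)

print(df[["Daily Return", "SMA20", "MACD", "RSI"]].tail())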
Book 1: BANK LOAN STATUS CLASSIFICATION AND PREDICTION USING MACHINE LEARNING WITH PYTHON GUI The dataset used in this project consists of more than 100,000 customers mentioning their loan status, current loan amount, monthly debt, etc. There are 19 features in the dataset. The dataset attributes are as follows: Loan ID, Customer ID, Loan Status, Current Loan Amount, Term, Credit Score, Annual Income, Years in current job, Home Ownership, Purpose, Monthly Debt, Years of Credit History, Months since last delinquent, Number of Open Accounts, Number of Credit Problems, Current Credit Balance, Maximum Open Credit, Bankruptcies, and Tax Liens. The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, Adaboost, LGBM classifier, Gradient Boosting, and XGB classifier. The three feature scaling methods used in machine learning are raw, MinMax scaler, and standard scaler. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy. Book 2: OPINION MINING AND PREDICTION USING MACHINE LEARNING AND DEEP LEARNING WITH PYTHON GUI Opinion mining (sometimes known as sentiment analysis or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. This dataset was created for the paper 'From Group to Individual Labels using Deep Features', Kotzias et al., KDD 2015. It contains sentences labelled with a positive or negative sentiment. The score is either 1 (for positive) or 0 (for negative). The sentences come from three different websites/fields: imdb.com, amazon.com, and yelp.com. For each website, there are 500 positive and 500 negative sentences. These were selected randomly from larger datasets of reviews. Amazon: contains reviews and scores for products sold on amazon.com in the cell phones and accessories category, and is part of the dataset collected by McAuley and Leskovec. Scores are on an integer scale from 1 to 5. Reviews with a score of 4 or 5 are considered positive, and those with a score of 1 or 2 negative. The data is randomly partitioned into two halves of 50%, one for training and one for testing, with 35,000 documents in each set. IMDb: refers to the IMDb movie review sentiment dataset originally introduced by Maas et al. as a benchmark for sentiment analysis. This dataset contains a total of 100,000 movie reviews posted on imdb.com. There are 50,000 unlabeled reviews, and the remaining 50,000 are divided into a set of 25,000 reviews for training and 25,000 reviews for testing. Each of the labeled reviews has a binary sentiment label, either positive or negative. Yelp: refers to the dataset from the Yelp dataset challenge, from which the restaurant reviews were extracted. Scores are on an integer scale from 1 to 5. Reviews with a score of 4 or 5 are considered positive, and those with a score of 1 or 2 negative. The data is randomly split 50-50 into training and testing sets, which leads to approximately 300,000 documents for each set. Sentences: for each of the datasets above, 1000 sentences from the test set are manually labeled, with 50% positive sentiment and 50% negative sentiment.
These sentences are only used to evaluate the instance-level classifier for each dataset. They are not used for model training, to maintain consistency with the overall goal of learning at a group level and predicting at the instance level. The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, Adaboost, LGBM classifier, Gradient Boosting, and XGB classifier. The three feature scaling methods used in machine learning are raw, MinMax scaler, and standard scaler. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy. Book 3: EMOTION PREDICTION FROM TEXT USING MACHINE LEARNING AND DEEP LEARNING WITH PYTHON GUI In the dataset used in this project, there are two columns, Text and Emotion. Quite self-explanatory. The Emotion column has various categories ranging from happiness to sadness to love and fear. You will build and implement machine learning and deep learning models which can identify what words denote what emotion. The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, Adaboost, LGBM classifier, Gradient Boosting, and XGB classifier. The three feature scaling methods used in machine learning are raw, MinMax scaler, and standard scaler. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy. Book 4: HATE SPEECH DETECTION AND SENTIMENT ANALYSIS USING MACHINE LEARNING AND DEEP LEARNING WITH PYTHON GUI The objective of this task is to detect hate speech in tweets. For the sake of simplicity, a tweet is said to contain hate speech if it has a racist or sexist sentiment associated with it. So, the task is to classify racist or sexist tweets apart from other tweets. Formally, given a training sample of tweets and labels, where label '1' denotes that the tweet is racist/sexist and label '0' denotes that the tweet is not racist/sexist, the objective is to predict the labels on the test dataset. The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, Adaboost, LGBM classifier, Gradient Boosting, XGB classifier, LSTM, and CNN. The three feature scaling methods used in machine learning are raw, MinMax scaler, and standard scaler. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy. Book 5: TRAVEL REVIEW RATING CLASSIFICATION AND PREDICTION USING MACHINE LEARNING WITH PYTHON GUI The dataset used in this project has been sourced from the Machine Learning Repository of the University of California, Irvine (UC Irvine): Travel Review Ratings Data Set. This dataset is populated by capturing user ratings from Google reviews. Reviews of attractions from 24 categories across Europe are considered. Google user ratings range from 1 to 5, and the average user rating per category is calculated.
The attributes in the dataset are as follows: Attribute 1: Unique user id; Attribute 2: Average ratings on churches; Attribute 3: Average ratings on resorts; Attribute 4: Average ratings on beaches; Attribute 5: Average ratings on parks; Attribute 6: Average ratings on theatres; Attribute 7: Average ratings on museums; Attribute 8: Average ratings on malls; Attribute 9: Average ratings on zoos; Attribute 10: Average ratings on restaurants; Attribute 11: Average ratings on pubs/bars; Attribute 12: Average ratings on local services; Attribute 13: Average ratings on burger/pizza shops; Attribute 14: Average ratings on hotels/other lodgings; Attribute 15: Average ratings on juice bars; Attribute 16: Average ratings on art galleries; Attribute 17: Average ratings on dance clubs; Attribute 18: Average ratings on swimming pools; Attribute 19: Average ratings on gyms; Attribute 20: Average ratings on bakeries; Attribute 21: Average ratings on beauty & spas; Attribute 22: Average ratings on cafes; Attribute 23: Average ratings on viewpoints; Attribute 24: Average ratings on monuments; and Attribute 25: Average ratings on gardens. The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, Adaboost, LGBM classifier, Gradient Boosting, XGB classifier, and MLP classifier. The three feature scaling methods used in machine learning are raw, MinMax scaler, and standard scaler. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy. Book 6: ONLINE RETAIL CLUSTERING AND PREDICTION USING MACHINE LEARNING WITH PYTHON GUI The dataset used in this project is a transnational dataset which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers. You will use this online retail dataset to build an RFM-based clustering and choose the best set of customers which the company should target. In this project, you will perform Cohort analysis and RFM analysis. You will also perform clustering using K-Means to get 5 clusters. The machine learning models used in this project to predict the clusters as the target variable are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, LGBM, Gradient Boosting, XGB, and MLP. Finally, you will plot decision boundaries, distribution of features, feature importance, cross validation score, predicted values versus true values, confusion matrix, learning curve, performance of the model, scalability of the model, training loss, and training accuracy.
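Because every book in this bundle evaluates its classifiers under raw, MinMax-scaled, and standardized features, here is a minimal sketch of that comparison loop; the data is synthetic (19 features, echoing Book 1), and the two models shown are only representatives of the longer model lists above.

# Minimal sketch: compare classifiers under raw, MinMax, and Standard scaling
# via cross-validation. The data here is synthetic, purely for illustration.
from sklearn.datasets import make_classification
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=19, random_state=42)

scalers = {"raw": None, "minmax": MinMaxScaler(), "standard": StandardScaler()}
models = {"Logistic Regression": LogisticRegression(max_iter=1000),
          "Random Forest": RandomForestClassifier(random_state=42)}

for s_name, scaler in scalers.items():
    for m_name, model in models.items():
        steps = ([("scale", scaler)] if scaler is not None else []) + [("clf", model)]
        score = cross_val_score(Pipeline(steps), X, y, cv=5).mean()
        print(f"{m_name:20s} ({s_name:8s}): {score:.3f}")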
The data used in this project is the data published by Anurag Sharma about hotel reviews that were given by customers. The data is given in two files, a train set and a test set. The train.csv file is the training data, containing a unique User_ID for each entry with the review entered by a customer and the browser and device used. The target variable is Is_Response, a variable that states whether the customer was happy or not happy while staying in the hotel. This type of target variable makes the project a classification problem. The test.csv file is the testing data; it contains similar columns to the train data, but without the target variable. The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, Adaboost, LGBM classifier, Gradient Boosting, XGB classifier, and LSTM. The three vectorizers used in machine learning are Hashing Vectorizer, Count Vectorizer, and TF-IDF Vectorizer. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, performance of the model, scalability of the model, training loss, and training accuracy.
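A minimal sketch of the three vectorizers named above applied to the review text; the column names Description and Is_Response are assumptions about train.csv, and Logistic Regression stands in for the longer model list.

# Minimal sketch: compare Hashing, Count, and TF-IDF vectorizers with Logistic Regression.
# Column names "Description" and "Is_Response" are assumptions about train.csv.
import pandas as pd
from sklearn.feature_extraction.text import HashingVectorizer, CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

df = pd.read_csv("train.csv")
X, y = df["Description"], df["Is_Response"]

vectorizers = {"hashing": HashingVectorizer(n_features=2**16),
               "count": CountVectorizer(),
               "tfidf": TfidfVectorizer()}

for name, vec in vectorizers.items():
    pipe = Pipeline([("vec", vec), ("clf", LogisticRegression(max_iter=1000))])
    print(name, cross_val_score(pipe, X, y, cv=3).mean())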