KOBV Portal

UID: kobvindex_INTEBC1910044

Format: 1 online resource (415 pages)

Edition: 1st ed.

ISBN: 9780124173071

Note: Front Cover -- Sharing Data and Models in Software Engineering -- Copyright -- Why this book? -- Foreword -- Contents -- List of Figures
Chapter 1: Introduction -- 1.1 Why Read This Book? -- 1.2 What Do We Mean by "Sharing"? -- 1.2.1 Sharing Insights -- 1.2.2 Sharing Models -- 1.2.3 Sharing Data -- 1.2.4 Sharing Analysis Methods -- 1.2.5 Types of Sharing -- 1.2.6 Challenges with Sharing -- 1.2.7 How to Share -- 1.3 What? (Our Executive Summary) -- 1.3.1 An Overview -- 1.3.2 More Details -- 1.4 How to Read This Book -- 1.4.1 Data Analysis Patterns -- 1.5 But What About ...? (What Is Not in This Book) -- 1.5.1 What About "Big Data"? -- 1.5.2 What About Related Work? -- 1.5.3 Why All the Defect Prediction and Effort Estimation? -- 1.6 Who? (About the Authors) -- 1.7 Who Else? (Acknowledgments)
Part I: Data Mining for Managers
Chapter 2: Rules for Managers -- 2.1 The Inductive Engineering Manifesto -- 2.2 More Rules
Chapter 3: Rule #1: Talk to the Users -- 3.1 Users Biases -- 3.2 Data Mining Biases -- 3.3 Can We Avoid Bias? -- 3.4 Managing Biases -- 3.5 Summary
Chapter 4: Rule #2: Know the Domain -- 4.1 Cautionary Tale #1: "Discovering" Random Noise -- 4.2 Cautionary Tale #2: Jumping at Shadows -- 4.3 Cautionary Tale #3: It Pays to Ask -- 4.4 Summary
Chapter 5: Rule #3: Suspect Your Data -- 5.1 Controlling Data Collection -- 5.2 Problems with Controlled Data Collection -- 5.3 Rinse (and Prune) Before Use -- 5.3.1 Row Pruning -- 5.3.2 Column Pruning -- 5.4 On the Value of Pruning -- 5.5 Summary
Chapter 6: Rule #4: Data Science Is Cyclic -- 6.1 The Knowledge Discovery Cycle -- 6.2 Evolving Cyclic Development -- 6.2.1 Scouting -- 6.2.2 Surveying -- 6.2.3 Building -- 6.2.4 Effort -- 6.3 Summary
Part II: Data Mining: A Technical Tutorial
Chapter 7: Data Mining and SE -- 7.1 Some Definitions -- 7.2 Some Application Areas
Chapter 8: Defect Prediction -- 8.1 Defect Detection Economics -- 8.2 Static Code Defect Prediction -- 8.2.1 Easy to Use -- 8.2.2 Widely Used -- 8.2.3 Useful
Chapter 9: Effort Estimation -- 9.1 The Estimation Problem -- 9.2 How to Make Estimates -- 9.2.1 Expert-Based Estimation -- 9.2.2 Model-Based Estimation -- 9.2.3 Hybrid Methods
Chapter 10: Data Mining (Under the Hood) -- 10.1 Data Carving -- 10.2 About the Data -- 10.3 Cohen Pruning -- 10.4 Discretization -- 10.4.1 Other Discretization Methods -- 10.5 Column Pruning -- 10.6 Row Pruning -- 10.7 Cluster Pruning -- 10.7.1 Advantages of Prototypes -- 10.7.2 Advantages of Clustering -- 10.8 Contrast Pruning -- 10.9 Goal Pruning -- 10.10 Extensions for Continuous Classes -- 10.10.1 How RTs Work -- 10.10.2 Creating Splits for Categorical Input Features -- 10.10.3 Splits on Numeric Input Features -- 10.10.4 Termination Condition and Predictions -- 10.10.5 Potential Advantages of RTs for Software Effort Estimation -- 10.10.6 Predictions for Multiple Numeric Goals
Part III: Sharing Data
Chapter 11: Sharing Data: Challenges and Methods -- 11.1 Houston, We Have a Problem -- 11.2 Good News, Everyone
Chapter 12: Learning Contexts -- 12.1 Background -- 12.2 Manual Methods for Contextualization -- 12.3 Automatic Methods -- 12.4 Other Motivation to Find Contexts -- 12.4.1 Variance Reduction -- 12.4.2 Anomaly Detection -- 12.4.3 Certification Envelopes -- 12.4.4 Incremental Learning -- 12.4.5 Compression -- 12.4.6 Optimization -- 12.5 How to Find Local Regions -- 12.5.1 License -- 12.5.2 Installing CHUNK -- 12.5.3 Testing Your Installation -- 12.5.4 Applying CHUNK to Other Models -- 12.6 Inside CHUNK -- 12.6.1 Roadmap to Functions -- 12.6.2 Distance Calculations -- 12.6.2.1 Normalize -- 12.6.2.2 SquaredDifference -- 12.6.3 Dividing the Data -- 12.6.3.1 FastDiv -- 12.6.3.2 TwoDistantPoints -- 12.6.3.3 Settings -- 12.6.3.4 Chunk (main function) -- 12.6.4 Support Utilities -- 12.6.4.1 Some standard tricks -- 12.6.4.2 Tree iterators -- 12.6.4.3 Pretty printing -- 12.7 Putting It all Together -- 12.7.1 _nasa93 -- 12.8 Using CHUNK -- 12.9 Closing Remarks
Chapter 13: Cross-Company Learning: Handling the Data Drought -- 13.1 Motivation -- 13.2 Setting the Ground for Analyses -- 13.2.1 Wait ... Is This Really CC Data? -- 13.2.2 Mining the Data -- 13.2.3 Magic Trick: NN Relevancy Filtering -- 13.3 Analysis #1: Can CC Data be Useful for an Organization? -- 13.3.1 Design -- 13.3.2 Results from Analysis #1 -- 13.3.3 Checking the Analysis #1 Results -- 13.3.4 Discussion of Analysis #1 -- 13.4 Analysis #2: How to Cleanup CC Data for Local Tuning? -- 13.4.1 Design -- 13.4.2 Results -- 13.4.3 Discussions -- 13.5 Analysis #3: How Much Local Data Does an Organization Need for a Local Model? -- 13.5.1 Design -- 13.5.2 Results from Analysis #3 -- 13.5.3 Checking the Analysis #3 Results -- 13.5.4 Discussion of Analysis #3 -- 13.6 How Trustworthy Are These Results? -- 13.7 Are These Useful in Practice or Just Number Crunching? -- 13.8 What's New on Cross-Learning? -- 13.8.1 Discussion -- 13.9 What's the Takeaway?
Chapter 14: Building Smarter Transfer Learners -- 14.1 What Is Actually the Problem? -- 14.2 What Do We Know So Far? -- 14.2.1 Transfer Learning -- 14.2.2 Transfer Learning and SE -- 14.2.3 Data Set Shift -- 14.3 An Example Technology: TEAK -- 14.4 The Details of the Experiments -- 14.4.1 Performance Comparison -- 14.4.2 Performance Measures -- 14.4.3 Retrieval Tendency -- 14.5 Results -- 14.5.1 Performance Comparison -- 14.5.2 Inspecting Selection Tendencies -- 14.6 Discussion -- 14.7 What Are the Takeaways?
Chapter 15: Sharing Less Data (Is a Good Thing) -- 15.1 Can We Share Less Data? -- 15.2 Using Less Data -- 15.3 Why Share Less Data? -- 15.3.1 Less Data Is More Reliable -- 15.3.2 Less Data Is Faster to Discuss -- 15.3.3 Less Data Is Easier to Process -- 15.4 How to Find Less Data -- 15.4.1 Input -- 15.4.2 Comparisons to Other Learners -- 15.4.3 Reporting the Results -- 15.4.4 Discussion of Results -- 15.5 What's Next?
Chapter 16: How to Keep Your Data Private -- 16.1 Motivation -- 16.2 What Is PPDP and Why Is It Important? -- 16.3 What Is Considered a Breach of Privacy? -- 16.4 How to Avoid Privacy Breaches? -- 16.4.1 Generalization and Suppression -- 16.4.2 Anatomization and Permutation -- 16.4.3 Perturbation -- 16.4.4 Output Perturbation -- 16.5 How Are Privacy-Preserving Algorithms Evaluated? -- 16.5.1 Privacy Metrics -- 16.5.2 Modeling the Background Knowledge of an Attacker -- 16.6 Case Study: Privacy and Cross-Company Defect Prediction -- 16.6.1 Results and Contributions -- 16.6.2 Privacy and CCDP -- 16.6.3 CLIFF -- 16.6.4 MORPH -- 16.6.5 Example of CLIFF&MORPH -- 16.6.6 Evaluation Metrics -- 16.6.7 Evaluating Utility via Classification -- 16.6.8 Evaluating Privatization -- 16.6.8.1 Defining privacy -- 16.6.9 Experiments -- 16.6.9.1 Data -- 16.6.10 Design -- 16.6.11 Defect Predictors -- 16.6.12 Query Generator -- 16.6.13 Benchmark Privacy Algorithms -- 16.6.14 Experimental Evaluation -- 16.6.15 Discussion -- 16.6.16 Related Work: Privacy in SE -- 16.6.17 Summary
Chapter 17: Compensating for Missing Data -- 17.1 Background Notes on SEE and Instance Selection -- 17.1.1 Software Effort Estimation -- 17.1.2 Instance Selection in SEE -- 17.2 Data Sets and Performance Measures -- 17.2.1 Data Sets -- 17.2.2 Error Measures -- 17.3 Experimental Conditions -- 17.3.1 The Algorithms Adopted -- 17.3.2 Proposed Method: POP1 -- 17.3.3 Experiments -- 17.4 Results -- 17.4.1 Results Without Instance Selection -- 17.4.2 Results with Instance Selection -- 17.5 Summary
Chapter 18: Active Learning: Learning More with Less -- 18.1 How Does the QUICK Algorithm Work? -- 18.1.1 Getting Rid of Similar Features: Synonym Pruning -- 18.1.2 Getting Rid of Dissimilar Instances: Outlier Pruning -- 18.2 Notes on Active Learning -- 18.3 The Application and Implementation Details of QUICK -- 18.3.1 Phase 1: Synonym Pruning -- 18.3.2 Phase 2: Outlier Removal and Estimation -- 18.3.3 Seeing QUICK in Action with a Toy Example -- 18.3.3.1 Phase 1: Synonym pruning -- 18.3.3.2 Phase 2: Outlier removal and estimation -- 18.4 How the Experiments Are Designed -- 18.5 Results -- 18.5.1 Performance -- 18.5.2 Reduction via Synonym and Outlier Pruning -- 18.5.3 Comparison of QUICK vs. CART -- 18.5.4 Detailed Look at the Statistical Analysis -- 18.5.5 Early Results on Defect Data Sets -- 18.6 Summary
Part IV: Sharing Models
Chapter 19: Sharing Models: Challenges and Methods
Chapter 20: Ensembles of Learning Machines -- 20.1 When and Why Ensembles Work -- 20.1.1 Intuition -- 20.1.2 Theoretical Foundation -- 20.2 Bootstrap Aggregating (Bagging) -- 20.2.1 How Bagging Works -- 20.2.2 When and Why Bagging Works -- 20.2.3 Potential Advantages of Bagging for SEE -- 20.3 Regression Trees (RTs) for Bagging -- 20.4 Evaluation Framework -- 20.4.1 Choice of Data Sets and Preprocessing Techniques -- 20.4.1.1 PROMISE data -- 20.4.1.2 ISBSG data -- 20.4.2 Choice of Learning Machines -- 20.4.3 Choice of Evaluation Methods -- 20.4.4 Choice of Parameters -- 20.5 Evaluation of Bagging+RTs in SEE -- 20.5.1 Friedman Ranking -- 20.5.2 Approaches Most Often Ranked First or Second in Terms of MAE, MMRE and PRED(25) -- 20.5.3 Magnitude of Performance Against the Best -- 20.5.4 Discussion -- 20.6 Further Understanding of Bagging+RTs in SEE -- 20.7 Summary
Chapter 21: How to Adapt Models in a Dynamic World -- 21.1 Cross-Company Data and Questions Tackled -- 21.2 Related Work

Additional Edition: Print version: Menzies, Tim. Sharing Data and Models in Software Engineering. San Diego: Elsevier Science & Technology, c2014. ISBN 9780124172951

Language: English

Keywords: Electronic books

URL: Full text (OIS credentials required)

Kooperativer Bibliotheksverbund Berlin-Brandenburg