  • 1
    UID: almafu_BV044474245
    Format: 1 online resource (XXVI, 197 pages): illustrations, diagrams.
    ISBN: 978-3-319-66299-2
    Series Statement: Lecture Notes in Computer Science 10452
    Additional Edition: Also published in print, ISBN 978-3-319-66298-5
    Language: English
    Subjects: Computer Science
    Keywords: Software Engineering ; Optimization problem ; Evolutionary algorithm ; Software testing ; Software metrics ; Conference proceedings
    URL: Full text (URL of the original publisher)
  • 2
    UID: almahu_9947420911002882
    Format: 1 online resource (410 pages): illustrations (some color), photographs, graphs, tables
    Edition: 1st edition
    ISBN: 0-12-804261-3 , 0-12-804206-0
    Content: Perspectives on Data Science for Software Engineering presents the best practices of seasoned data miners in software engineering. The idea for this book was conceived during the 2014 conference at Dagstuhl, an invitation-only gathering of leading computer scientists who meet to identify and discuss cutting-edge informatics topics. At the 2014 conference, the question of how to transfer the knowledge of seasoned software engineers and data scientists to newcomers in the field shaped many of the discussions. While there are many books covering data mining and software engineering basics, they present only the fundamentals and lack the perspective that comes from real-world experience. This book offers unique insights into the wisdom of the community’s leaders gathered to share hard-won lessons from the trenches. Ideas are presented in digestible chapters designed to be applicable across many domains. Topics covered include data collection, data sharing, data mining, and how to utilize these techniques in successful software projects. Newcomers to software engineering data science will learn the tips and tricks of the trade, while more experienced data scientists will benefit from war stories that show what traps to avoid. Presents the wisdom of community experts, derived from a summit on software analytics. Provides contributed chapters that share discrete ideas and techniques from the trenches. Covers top areas of concern, including mining security and social data, data visualization, and cloud-based data. Presented in clear chapters designed to be applicable across many domains.
    Note: Front Cover -- Perspectives on Data Science for Software Engineering -- Copyright -- Contents -- Contributors -- Acknowledgments -- Introduction -- Perspectives on data science for software engineering -- Why This Book? -- About This Book -- The Future -- References -- Software analytics and its application in practice -- Six Perspectives of Software Analytics -- Experiences in Putting Software Analytics into Practice -- References -- Seven principles of inductive software engineering: What we do is different -- Different and Important -- Principle #1: Humans Before Algorithms -- Principle #2: Plan for Scale -- Principle #3: Get Early Feedback -- Principle #4: Be Open Minded -- Principle #5: Be smart with your learning -- Principle #6: Live With the Data You Have -- Principle #7: Develop a Broad Skill Set That Uses a Big Toolkit -- References -- The need for data analysis patterns (in software engineering) -- The Remedy Metaphor -- Software Engineering Data -- Needs of Data Analysis Patterns -- Building Remedies for Data Analysis in Software Engineering Research -- References -- From software data to software theory: The path less traveled -- Pathways of Software Repository Research -- From Observation, to Theory, to Practice -- References -- Why theory matters -- Introduction -- How to Use Theory -- How to build theory -- Constructs -- Propositions -- Explanation -- Scope -- In Summary: Find a Theory or Build One Yourself -- Further Reading -- Success stories/applications -- Mining apps for anomalies -- The Million-Dollar Question -- App Mining -- Detecting Abnormal Behavior -- A Treasure Trove of Data -- ... But Also Obstacles -- Executive Summary -- Further Reading -- Embrace dynamic artifacts -- Can We Minimize the USB Driver Test Suite? -- Yes, Let's Observe Interactions -- Why Did Our Solution Work? -- Still Not Convinced? Here's More. , Dynamic Artifacts Are Here to Stay -- Acknowledgments -- References -- Mobile app store analytics -- Introduction -- Understanding End Users -- Conclusion -- References -- The naturalness of software* -- Introduction -- Transforming Software Practice -- Porting and Translation -- The "Natural Linguistics" of Code -- Analysis and Tools -- Assistive Technologies -- Conclusion -- References -- Advances in release readiness -- Predictive Test Metrics -- Universal Release Criteria Model -- Best Estimation Technique -- Resource/Schedule/Content Model -- Using Models in Release Management -- Research to Implementation: A Difficult (but Rewarding) Journey -- How to tame your online services -- Background -- Service Analysis Studio -- Success Story -- References -- Measuring individual productivity -- No Single and Simple Best Metric for Success/Productivity -- Measure the Process, Not Just the Outcome -- Allow for Measures to Evolve -- Goodhart's Law and the Effect of Measuring -- How to Measure Individual Productivity? -- References -- Stack traces reveal attack surfaces -- Another Use of Stack Traces? -- Attack Surface Approximation -- References -- Visual analytics for software engineering data -- References -- Gameplay data plays nicer when divided into cohorts -- Cohort Analysis as a Tool for Gameplay Data -- Play to Lose -- Forming Cohorts -- Case Studies of Gameplay Data -- Challenges of using cohorts -- Summary -- References -- A success story in applying data science in practice -- Overview -- Analytics Process -- Data Collection -- Exploratory Data Analysis -- Model Selection -- Performance Measures and Benefit Analysis -- Communication Process-Best Practices -- Problem Selection -- Managerial Support -- Project Management -- Trusted Relationship -- Summary -- References -- There's never enough time to do all the testing you want. , The Impact of Short Release Cycles (There's Not Enough Time) -- Testing Is More Than Functional Correctness (All the Testing You Want) -- Learn From Your Test Execution History -- Test Effectiveness -- Test Reliability/Not Every Test Failure Points to a Defect -- The Art of Testing Less -- Without Sacrificing Code Quality -- Tests Evolve Over Time -- In Summary -- References -- The perils of energy mining: measure a bunch, compare just once -- A Tale of TWO HTTPs -- Let's energise your software energy experiments -- Environment -- N-Versions -- Energy or Power -- Repeat! -- Granularity -- Idle Measurement -- Statistical Analysis -- Exceptions -- Summary -- References -- Identifying fault-prone files in large industrial software systems -- Acknowledgment -- References -- A tailored suit: The big opportunity in personalizing issue tracking -- Many Choices, Nothing Great -- The Need for Personalization -- Developer Dashboards or "A Tailored Suit" -- Room for Improvement -- References -- What counts is decisions, not numbers-Toward an analytics design sheet -- Decisions Everywhere -- The Decision-Making Process -- The Analytics Design Sheet -- Example: App Store Release Analysis -- References -- A large ecosystem study to understand the effect of programming languages on code quality -- Comparing Languages -- Study Design and Analysis -- Results -- Summary -- References -- Code reviews are not for finding defects-Even established tools need occasional evaluation -- Results -- Effects -- Conclusions -- References -- Techniques -- Interviews -- Why Interview? -- The Interview Guide -- Selecting Interviewees -- Recruitment -- Collecting Background Data -- Conducting the Interview -- Post-Interview Discussion and Notes -- Transcription -- Analysis -- Reporting -- Now Go Interview! -- References -- Look for state transitions in temporal data. , Bikeshedding in Software Engineering -- Summarizing Temporal Data -- Recommendations -- Reference -- Card-sorting: From text to themes -- Preparation Phase -- Execution Phase -- Analysis Phase -- References -- Tools! Tools! We need tools! -- Tools in Science -- The Tools We Need -- Recommendations for Tool Building -- References -- Evidence-based software engineering -- Introduction -- The Aim and Methodology of EBSE -- Contextualizing Evidence -- Strength of Evidence -- Evidence and Theory -- References -- Which machine learning method do you need? -- Learning Styles -- Do additional Data Arrive Over Time? -- Are Changes Likely to Happen Over Time? -- If You Have a Prediction Problem, What Do You Really Need to Predict? -- Do You Have a Prediction Problem Where Unlabeled Data are Abundant and Labeled Data are Expensive? -- Are Your Data Imbalanced? -- Do You Need to Use Data From Different Sources? -- Do You Have Big Data? -- Do You Have Little Data? -- In Summary ... -- References -- Structure your unstructured data first! -- Unstructured Data in Software Engineering -- Summarizing Unstructured Software Data -- As Simple as Possible... But not Simpler! -- You Need Structure! -- Conclusion -- References -- Parse that data! Practical tips for preparing your raw data for analysis -- Use Assertions Everywhere -- Print Information About Broken Records -- Use Sets or Counters to Store Occurrences of Categorical Variables -- Restart Parsing in the Middle of the Data Set -- Test on a Small Subset of Your Data -- Redirect Stdout and Stderr to Log Files -- Store Raw Data Alongside Cleaned Data -- Finally, Write a Verifier Program to Check the Integrity of Your Cleaned Data -- Natural language processing is no free lunch -- Natural Language Data in Software Projects -- Natural Language Processing -- How to Apply NLP to Software Projects -- Do Stemming First. , Check the Level of Abstraction -- Don't Expect Magic -- Don't Discard Manual Analysis of Textual Data -- Summary -- References -- Aggregating empirical evidence for more trustworthy decisions -- What's Evidence? -- What Does Data From Empirical Studies Look Like? -- The Evidence-Based Paradigm and Systematic Reviews -- How Far Can We Use the Outcomes From Systematic Review to Make Decisions? -- References -- If it is software engineering, it is (probably) a Bayesian factor -- Causing the Future With Bayesian Networks -- The Need for a Hybrid Approach in Software Analytics -- Use the Methodology, Not the Model -- References -- Becoming Goldilocks: Privacy and data sharing in "just right" conditions -- The "Data Drought" -- Change is Good -- Don't Share Everything -- Share Your Leaders -- Summary -- Acknowledgments -- References -- The wisdom of the crowds in predictive modeling for software engineering -- The Wisdom of the Crowds -- So... How is That Related to Predictive Modeling for Software Engineering? -- Examples of Ensembles and Factors Affecting Their Accuracy -- Crowds for transferring knowledge and dealing with changes -- Crowds for Multiple Goals -- A Crowd of Insights -- Ensembles as Versatile Tools -- References -- Combining quantitative and qualitative methods (when mining software data) -- Prologue: We Have Solid Empirical Evidence! -- Correlation is Not Causation and, Even If We Can Claim Causation... -- Collect your data: People and artifacts -- Source 1: Dig Into Software Artifacts and Data -- ...but be careful about noise and incompleteness! -- Source 2: Getting Feedback From Developers -- ...and don't be afraid if you collect very little data! -- How Much to Analyze, and How? -- Build a theory upon your data -- Conclusion: The Truth is Out There! -- Suggested Readings -- References. , A process for surviving survey design and sailing through survey deployment.
    Language: English
  • 3
    Online Resource
    Waltham, Massachusetts: Morgan Kaufmann
    UID: almahu_9948026285202882
    Format: 1 online resource (415 pages): illustrations (some color), graphs
    Edition: First edition.
    ISBN: 0-12-417307-1
    Content: Data Science for Software Engineering: Sharing Data and Models presents guidance and procedures for reusing data and models between projects to produce results that are useful and relevant. Starting with a background section of practical lessons and warnings for beginner data scientists in software engineering, this edited volume proceeds to identify critical questions of contemporary software engineering related to data and models. Learn how to adapt data from other organizations to local problems, mine privatized data, prune spurious information, simplify complex results, update models for new platforms, and more. Chapters share broadly applicable experimental results, discussed with a blend of practitioner-focused domain expertise and commentary that highlights the methods that are most useful and applicable to the widest range of projects. Each chapter is written by a prominent expert and offers a state-of-the-art solution to an identified problem facing data scientists in software engineering. Throughout, the editors share best practices collected from their experience training software engineering students and practitioners to master data science, and highlight the methods that are most useful and applicable to the widest range of projects. Shares the specific experience of leading researchers and the techniques they developed to handle data problems in the realm of software engineering. Explains how to start a data science for software engineering project and how to identify and avoid likely pitfalls. Provides a wide range of useful qualitative and quantitative principles, ranging from very simple to cutting-edge research. Addresses current challenges with software engineering data, such as lack of local data, access issues due to data privacy, and increasing data quality by cleaning spurious chunks from data.
    Note: Bibliographic Level Mode of Issuance: Monograph , Front Cover -- Sharing Data and Models in Software Engineering -- Copyright -- Why this book? -- Foreword -- Contents -- List of Figures -- Chapter 1: Introduction -- 1.1 Why Read This Book? -- 1.2 What Do We Mean by "Sharing"? -- 1.2.1 Sharing Insights -- 1.2.2 Sharing Models -- 1.2.3 Sharing Data -- 1.2.4 Sharing Analysis Methods -- 1.2.5 Types of Sharing -- 1.2.6 Challenges with Sharing -- 1.2.7 How to Share -- 1.3 What? (Our Executive Summary) -- 1.3.1 An Overview -- 1.3.2 More Details -- 1.4 How to Read This Book -- 1.4.1 Data Analysis Patterns -- 1.5 But What About …? (What Is Not in This Book) -- 1.5.1 What About "Big Data"? -- 1.5.2 What About Related Work? -- 1.5.3 Why All the Defect Prediction and Effort Estimation? -- 1.6 Who? (About the Authors) -- 1.7 Who Else? (Acknowledgments) -- Part I: Data Mining for Managers -- Chapter 2: Rules for Managers -- 2.1 The Inductive Engineering Manifesto -- 2.2 More Rules -- Chapter 3: Rule #1: Talk to the Users -- 3.1 Users Biases -- 3.2 Data Mining Biases -- 3.3 Can We Avoid Bias? -- 3.4 Managing Biases -- 3.5 Summary -- Chapter 4: Rule #2: Know the Domain -- 4.1 Cautionary Tale #1: "Discovering" Random Noise -- 4.2 Cautionary Tale #2: Jumping at Shadows -- 4.3 Cautionary Tale #3: It Pays to Ask -- 4.4 Summary -- Chapter 5: Rule #3: Suspect Your Data -- 5.1 Controlling Data Collection -- 5.2 Problems with Controlled Data Collection -- 5.3 Rinse (and Prune) Before Use -- 5.3.1 Row Pruning -- 5.3.2 Column Pruning -- 5.4 On the Value of Pruning -- 5.5 Summary -- Chapter 6: Rule #4: Data Science Is Cyclic -- 6.1 The Knowledge Discovery Cycle -- 6.2 Evolving Cyclic Development -- 6.2.1 Scouting -- 6.2.2 Surveying -- 6.2.3 Building -- 6.2.4 Effort -- 6.3 Summary -- Part II: Data Mining: A Technical Tutorial -- Chapter 7: Data Mining and SE -- 7.1 Some Definitions -- 7.2 Some Application Areas. , Chapter 8: Defect Prediction -- 8.1 Defect Detection Economics -- 8.2 Static Code Defect Prediction -- 8.2.1 Easy to Use -- 8.2.2 Widely Used -- 8.2.3 Useful -- Chapter 9: Effort Estimation -- 9.1 The Estimation Problem -- 9.2 How to Make Estimates -- 9.2.1 Expert-Based Estimation -- 9.2.2 Model-Based Estimation -- 9.2.3 Hybrid Methods -- Chapter 10: Data Mining (Under the Hood) -- 10.1 Data Carving -- 10.2 About the Data -- 10.3 Cohen Pruning -- 10.4 Discretization -- 10.4.1 Other Discretization Methods -- 10.5 Column Pruning -- 10.6 Row Pruning -- 10.7 Cluster Pruning -- 10.7.1 Advantages of Prototypes -- 10.7.2 Advantages of Clustering -- 10.8 Contrast Pruning -- 10.9 Goal Pruning -- 10.10 Extensions for Continuous Classes -- 10.10.1 How RTs Work -- 10.10.2 Creating Splits for Categorical Input Features -- 10.10.3 Splits on Numeric Input Features -- 10.10.4 Termination Condition and Predictions -- 10.10.5 Potential Advantages of RTs for Software Effort Estimation -- 10.10.6 Predictions for Multiple Numeric Goals -- Part III: Sharing Data -- Chapter 11: Sharing Data: Challenges and Methods -- 11.1 Houston, We Have a Problem -- 11.2 Good News, Everyone -- Chapter 12: Learning Contexts -- 12.1 Background -- 12.2 Manual Methods for Contextualization -- 12.3 Automatic Methods -- 12.4 Other Motivation to Find Contexts -- 12.4.1 Variance Reduction -- 12.4.2 Anomaly Detection -- 12.4.3 Certification Envelopes -- 12.4.4 Incremental Learning -- 12.4.5 Compression -- 12.4.6 Optimization -- 12.5 How to Find Local Regions -- 12.5.1 License -- 12.5.2 Installing CHUNK -- 12.5.3 Testing Your Installation -- 12.5.4 Applying CHUNK to Other Models -- 12.6 Inside CHUNK -- 12.6.1 Roadmap to Functions -- 12.6.2 Distance Calculations -- 12.6.2.1 Normalize -- 12.6.2.2 SquaredDifference -- 12.6.3 Dividing the Data -- 12.6.3.1 FastDiv -- 12.6.3.2 TwoDistantPoints. , 12.6.3.3 Settings -- 12.6.3.4 Chunk (main function) -- 12.6.4 Support Utilities -- 12.6.4.1 Some standard tricks -- 12.6.4.2 Tree iterators -- 12.6.4.3 Pretty printing -- 12.7 Putting It all Together -- 12.7.1 _nasa93 -- 12.8 Using CHUNK -- 12.9 Closing Remarks -- Chapter 13: Cross-Company Learning: Handling the Data Drought -- 13.1 Motivation -- 13.2 Setting the Ground for Analyses -- 13.2.1 Wait … Is This Really CC Data? -- 13.2.2 Mining the Data -- 13.2.3 Magic Trick: NN Relevancy Filtering -- 13.3 Analysis #1: Can CC Data be Useful for an Organization? -- 13.3.1 Design -- 13.3.2 Results from Analysis #1 -- 13.3.3 Checking the Analysis #1 Results -- 13.3.4 Discussion of Analysis #1 -- 13.4 Analysis #2: How to Cleanup CC Data for Local Tuning? -- 13.4.1 Design -- 13.4.2 Results -- 13.4.3 Discussions -- 13.5 Analysis #3: How Much Local Data Does an Organization Need for a Local Model? -- 13.5.1 Design -- 13.5.2 Results from Analysis #3 -- 13.5.3 Checking the Analysis #3 Results -- 13.5.4 Discussion of Analysis #3 -- 13.6 How Trustworthy Are These Results? -- 13.7 Are These Useful in Practice or Just Number Crunching? -- 13.8 What's New on Cross-Learning? -- 13.8.1 Discussion -- 13.9 What's the Takeaway? -- Chapter 14: Building Smarter Transfer Learners -- 14.1 What Is Actually the Problem? -- 14.2 What Do We Know So Far? -- 14.2.1 Transfer Learning -- 14.2.2 Transfer Learning and SE -- 14.2.3 Data Set Shift -- 14.3 An Example Technology: TEAK -- 14.4 The Details of the Experiments -- 14.4.1 Performance Comparison -- 14.4.2 Performance Measures -- 14.4.3 Retrieval Tendency -- 14.5 Results -- 14.5.1 Performance Comparison -- 14.5.2 Inspecting Selection Tendencies -- 14.6 Discussion -- 14.7 What Are the Takeaways? -- Chapter 15: Sharing Less Data (Is a Good Thing) -- 15.1 Can We Share Less Data? -- 15.2 Using Less Data -- 15.3 Why Share Less Data? , 15.3.1 Less Data Is More Reliable -- 15.3.2 Less Data Is Faster to Discuss -- 15.3.3 Less Data Is Easier to Process -- 15.4 How to Find Less Data -- 15.4.1 Input -- 15.4.2 Comparisons to Other Learners -- 15.4.3 Reporting the Results -- 15.4.4 Discussion of Results -- 15.5 What's Next? -- Chapter 16: How to Keep Your Data Private -- 16.1 Motivation -- 16.2 What Is PPDP and Why Is It Important? -- 16.3 What Is Considered a Breach of Privacy? -- 16.4 How to Avoid Privacy Breaches? -- 16.4.1 Generalization and Suppression -- 16.4.2 Anatomization and Permutation -- 16.4.3 Perturbation -- 16.4.4 Output Perturbation -- 16.5 How Are Privacy-Preserving Algorithms Evaluated? -- 16.5.1 Privacy Metrics -- 16.5.2 Modeling the Background Knowledge of an Attacker -- 16.6 Case Study: Privacy and Cross-Company Defect Prediction -- 16.6.1 Results and Contributions -- 16.6.2 Privacy and CCDP -- 16.6.3 CLIFF -- 16.6.4 MORPH -- 16.6.5 Example of CLIFF&MORPH -- 16.6.6 Evaluation Metrics -- 16.6.7 Evaluating Utility via Classification -- 16.6.8 Evaluating Privatization -- 16.6.8.1 Defining privacy -- 16.6.9 Experiments -- 16.6.9.1 Data -- 16.6.10 Design -- 16.6.11 Defect Predictors -- 16.6.12 Query Generator -- 16.6.13 Benchmark Privacy Algorithms -- 16.6.14 Experimental Evaluation -- 16.6.15 Discussion -- 16.6.16 Related Work: Privacy in SE -- 16.6.17 Summary -- Chapter 17: Compensating for Missing Data -- 17.1 Background Notes on SEE and Instance Selection -- 17.1.1 Software Effort Estimation -- 17.1.2 Instance Selection in SEE -- 17.2 Data Sets and Performance Measures -- 17.2.1 Data Sets -- 17.2.2 Error Measures -- 17.3 Experimental Conditions -- 17.3.1 The Algorithms Adopted -- 17.3.2 Proposed Method: POP1 -- 17.3.3 Experiments -- 17.4 Results -- 17.4.1 Results Without Instance Selection -- 17.4.2 Results with Instance Selection -- 17.5 Summary. , Chapter 18: Active Learning: Learning More with Less -- 18.1 How Does the QUICK Algorithm Work? -- 18.1.1 Getting Rid of Similar Features: Synonym Pruning -- 18.1.2 Getting Rid of Dissimilar Instances: Outlier Pruning -- 18.2 Notes on Active Learning -- 18.3 The Application and Implementation Details of QUICK -- 18.3.1 Phase 1: Synonym Pruning -- 18.3.2 Phase 2: Outlier Removal and Estimation -- 18.3.3 Seeing QUICK in Action with a Toy Example -- 18.3.3.1 Phase 1: Synonym pruning -- 18.3.3.2 Phase 2: Outlier removal and estimation -- 18.4 How the Experiments Are Designed -- 18.5 Results -- 18.5.1 Performance -- 18.5.2 Reduction via Synonym and Outlier Pruning -- 18.5.3 Comparison of QUICK vs. CART -- 18.5.4 Detailed Look at the Statistical Analysis -- 18.5.5 Early Results on Defect Data Sets -- 18.6 Summary -- Part IV: Sharing Models -- Chapter 19: Sharing Models: Challenges and Methods -- Chapter 20: Ensembles of Learning Machines -- 20.1 When and Why Ensembles Work -- 20.1.1 Intuition -- 20.1.2 Theoretical Foundation -- 20.2 Bootstrap Aggregating (Bagging) -- 20.2.1 How Bagging Works -- 20.2.2 When and Why Bagging Works -- 20.2.3 Potential Advantages of Bagging for SEE -- 20.3 Regression Trees (RTs) for Bagging -- 20.4 Evaluation Framework -- 20.4.1 Choice of Data Sets and Preprocessing Techniques -- 20.4.1.1 PROMISE data -- 20.4.1.2 ISBSG data -- 20.4.2 Choice of Learning Machines -- 20.4.3 Choice of Evaluation Methods -- 20.4.4 Choice of Parameters -- 20.5 Evaluation of Bagging+RTs in SEE -- 20.5.1 Friedman Ranking -- 20.5.2 Approaches Most Often Ranked First or Second in Terms of MAE, MMRE and PRED(25) -- 20.5.3 Magnitude of Performance Against the Best -- 20.5.4 Discussion -- 20.6 Further Understanding of Bagging+RTs in SEE -- 20.7 Summary -- Chapter 21: How to Adapt Models in a Dynamic World -- 21.1 Cross-Company Data and Questions Tackled. , 21.2 Related Work.
    Additional Edition: ISBN 1-336-00889-X
    Additional Edition: ISBN 0-12-417295-4
    Language: English
  • 6
    UID:
    edoccha_9958145839502883
    Format: 1 online resource (410 pages) : , illustrations (some color), photographs, graphs, tables
    Edition: 1st edition
    ISBN: 0-12-804261-3 , 0-12-804206-0
    Content: Perspectives on Data Science for Software Engineering presents the best practices of seasoned data miners in software engineering. The idea for this book was created during the 2014 conference at Dagstuhl, an invitation-only gathering of leading computer scientists who meet to identify and discuss cutting-edge informatics topics. At the 2014 conference, the concept of how to transfer the knowledge of experts from seasoned software engineers and data scientists to newcomers in the field highlighted many discussions. While there are many books covering data mining and software engineering basics, they present only the fundamentals and lack the perspective that comes from real-world experience. This book offers unique insights into the wisdom of the community’s leaders gathered to share hard-won lessons from the trenches. Ideas are presented in digestible chapters designed to be applicable across many domains. Topics included cover data collection, data sharing, data mining, and how to utilize these techniques in successful software projects. Newcomers to software engineering data science will learn the tips and tricks of the trade, while more experienced data scientists will benefit from war stories that show what traps to avoid. Presents the wisdom of community experts, derived from a summit on software analytics Provides contributed chapters that share discrete ideas and technique from the trenches Covers top areas of concern, including mining security and social data, data visualization, and cloud-based data Presented in clear chapters designed to be applicable across many domains
    Note: Front Cover -- Perspectives on Data Science for Software Engineering -- Copyright -- Contents -- Contributors -- Acknowledgments -- Introduction -- Perspectives on data science for software engineering -- Why This Book? -- About This Book -- The Future -- References -- Software analytics and its application in practice -- Six Perspectives of Software Analytics -- Experiences in Putting Software Analytics into Practice -- References -- Seven principles of inductive software engineering: What we do is different -- Different and Important -- Principle #1: Humans Before Algorithms -- Principle #2: Plan for Scale -- Principle #3: Get Early Feedback -- Principle #4: Be Open Minded -- Principle #5: Be smart with your learning -- Principle #6: Live With the Data You Have -- Principle #7: Develop a Broad Skill Set That Uses a Big Toolkit -- References -- The need for data analysis patterns (in software engineering) -- The Remedy Metaphor -- Software Engineering Data -- Needs of Data Analysis Patterns -- Building Remedies for Data Analysis in Software Engineering Research -- References -- From software data to software theory: The path less traveled -- Pathways of Software Repository Research -- From Observation, to Theory, to Practice -- References -- Why theory matters -- Introduction -- How to Use Theory -- How to build theory -- Constructs -- Propositions -- Explanation -- Scope -- In Summary: Find a Theory or Build One Yourself -- Further Reading -- Success stories/applications -- Mining apps for anomalies -- The Million-Dollar Question -- App Mining -- Detecting Abnormal Behavior -- A Treasure Trove of Data -- ... But Also Obstacles -- Executive Summary -- Further Reading -- Embrace dynamic artifacts -- Can We Minimize the USB Driver Test Suite? -- Yes, Lets Observe Interactions -- Why Did Our Solution Work? -- Still Not Convinced? Heres More. , Dynamic Artifacts Are Here to Stay -- Acknowledgments -- References -- Mobile app store analytics -- Introduction -- Understanding End Users -- Conclusion -- References -- The naturalness of software* -- Introduction -- Transforming Software Practice -- Porting and Translation -- The ``Natural Linguistics´´ of Code -- Analysis and Tools -- Assistive Technologies -- Conclusion -- References -- Advances in release readiness -- Predictive Test Metrics -- Universal Release Criteria Model -- Best Estimation Technique -- Resource/Schedule/Content Model -- Using Models in Release Management -- Research to Implementation: A Difficult (but Rewarding) Journey -- How to tame your online services -- Background -- Service Analysis Studio -- Success Story -- References -- Measuring individual productivity -- No Single and Simple Best Metric for Success/Productivity -- Measure the Process, Not Just the Outcome -- Allow for Measures to Evolve -- Goodharts Law and the Effect of Measuring -- How to Measure Individual Productivity? -- References -- Stack traces reveal attack surfaces -- Another Use of Stack Traces? 
-- Attack Surface Approximation -- References -- Visual analytics for software engineering data -- References -- Gameplay data plays nicer when divided into cohorts -- Cohort Analysis as a Tool for Gameplay Data -- Play to Lose -- Forming Cohorts -- Case Studies of Gameplay Data -- Challenges of using cohorts -- Summary -- References -- A success story in applying data science in practice -- Overview -- Analytics Process -- Data Collection -- Exploratory Data Analysis -- Model Selection -- Performance Measures and Benefit Analysis -- Communication Process-Best Practices -- Problem Selection -- Managerial Support -- Project Management -- Trusted Relationship -- Summary -- References -- There's never enough time to do all the testing you want. , The Impact of Short Release Cycles (There's Not Enough Time) -- Testing Is More Than Functional Correctness (All the Testing You Want) -- Learn From Your Test Execution History -- Test Effectiveness -- Test Reliability/Not Every Test Failure Points to a Defect -- The Art of Testing Less -- Without Sacrificing Code Quality -- Tests Evolve Over Time -- In Summary -- References -- The perils of energy mining: measure a bunch, compare just once -- A Tale of TWO HTTPs -- Let's energise your software energy experiments -- Environment -- N-Versions -- Energy or Power -- Repeat! -- Granularity -- Idle Measurement -- Statistical Analysis -- Exceptions -- Summary -- References -- Identifying fault-prone files in large industrial software systems -- Acknowledgment -- References -- A tailored suit: The big opportunity in personalizing issue tracking -- Many Choices, Nothing Great -- The Need for Personalization -- Developer Dashboards or ``A Tailored Suit´´ -- Room for Improvement -- References -- What counts is decisions, not numbers-Toward an analytics design sheet -- Decisions Everywhere -- The Decision-Making Process -- The Analytics Design Sheet -- Example: App Store Release Analysis -- References -- A large ecosystem study to understand the effect of programming languages on code quality -- Comparing Languages -- Study Design and Analysis -- Results -- Summary -- References -- Code reviews are not for finding defects-Even established tools need occasional evaluation -- Results -- Effects -- Conclusions -- References -- Techniques -- Interviews -- Why Interview? -- The Interview Guide -- Selecting Interviewees -- Recruitment -- Collecting Background Data -- Conducting the Interview -- Post-Interview Discussion and Notes -- Transcription -- Analysis -- Reporting -- Now Go Interview! -- References -- Look for state transitions in temporal data. , Bikeshedding in Software Engineering -- Summarizing Temporal Data -- Recommendations -- Reference -- Card-sorting: From text to themes -- Preparation Phase -- Execution Phase -- Analysis Phase -- References -- Tools! Tools! We need tools! -- Tools in Science -- The Tools We Need -- Recommendations for Tool Building -- References -- Evidence-based software engineering -- Introduction -- The Aim and Methodology of EBSE -- Contextualizing Evidence -- Strength of Evidence -- Evidence and Theory -- References -- Which machine learning method do you need? -- Learning Styles -- Do additional Data Arrive Over Time? -- Are Changes Likely to Happen Over Time? -- If You Have a Prediction Problem, What Do You Really Need to Predict? -- Do You Have a Prediction Problem Where Unlabeled Data are Abundant and Labeled Data are Expensive? -- Are Your Data Imbalanced? -- Do You Need to Use Data From Different Sources? 
-- Do You Have Big Data? -- Do You Have Little Data? -- In Summary ... -- References -- Structure your unstructured data first! -- Unstructured Data in Software Engineering -- Summarizing Unstructured Software Data -- As Simple as Possible... But not Simpler! -- You Need Structure! -- Conclusion -- References -- Parse that data! Practical tips for preparing your raw data for analysis -- Use Assertions Everywhere -- Print Information About Broken Records -- Use Sets or Counters to Store Occurrences of Categorical Variables -- Restart Parsing in the Middle of the Data Set -- Test on a Small Subset of Your Data -- Redirect Stdout and Stderr to Log Files -- Store Raw Data Alongside Cleaned Data -- Finally, Write a Verifier Program to Check the Integrity of Your Cleaned Data -- Natural language processing is no free lunch -- Natural Language Data in Software Projects -- Natural Language Processing -- How to Apply NLP to Software Projects -- Do Stemming First. , Check the Level of Abstraction -- Dont Expect Magic -- Dont Discard Manual Analysis of Textual Data -- Summary -- References -- Aggregating empirical evidence for more trustworthy decisions -- What's Evidence? -- What Does Data From Empirical Studies Look Like? -- The Evidence-Based Paradigm and Systematic Reviews -- How Far Can We Use the Outcomes From Systematic Review to Make Decisions? -- References -- If it is software engineering, it is (probably) a Bayesian factor -- Causing the Future With Bayesian Networks -- The Need for a Hybrid Approach in Software Analytics -- Use the Methodology, Not the Model -- References -- Becoming Goldilocks: Privacy and data sharing in ``just right´´ conditions -- The ``Data Drought´´ -- Change is Good -- Dont Share Everything -- Share Your Leaders -- Summary -- Acknowledgments -- References -- The wisdom of the crowds in predictive modeling for software engineering -- The Wisdom of the Crowds -- So... How is That Related to Predictive Modeling for Software Engineering? -- Examples of Ensembles and Factors Affecting Their Accuracy -- Crowds for transferring knowledge and dealing with changes -- Crowds for Multiple Goals -- A Crowd of Insights -- Ensembles as Versatile Tools -- References -- Combining quantitative and qualitative methods (when mining software data) -- Prologue: We Have Solid Empirical Evidence! -- Correlation is Not Causation and, Even If We Can Claim Causation... -- Collect your data: People and artifacts -- Source 1: Dig Into Software Artifacts and Data -- ...but be careful about noise and incompleteness! -- Source 2: Getting Feedback From Developers -- ...and dont be afraid if you collect very little data! -- How Much to Analyze, and How? -- Build a theory upon your data -- Conclusion: The Truth is Out There! -- Suggested Readings -- References. , A process for surviving survey design and sailing through survey deployment.
    Language: English
  • 7
    UID:
    edocfu_9958145839502883
    Format: 1 online resource (410 pages) : illustrations (some color), photographs, graphs, tables
    Edition: 1st edition
    ISBN: 0-12-804261-3 , 0-12-804206-0
    Language: English
  • 8
    Online Resource
    Waltham, MA : Morgan Kaufmann
    UID:
    b3kat_BV043020347
    Format: 1 online resource (xxiii, 660 pages) , illustrations, diagrams
    Edition: First edition
    ISBN: 9780124115439 , 0124115438
    Note: Description based on online resource; title from cover page (Safari, viewed September 18, 2015)
    Additional Edition: Also available as a print edition, ISBN 978-0-12-411519-4
    Language: English
    Subjects: Computer Science
    RVK:
    Keywords: Data analysis ; Data Mining ; Software development
  • 9
    UID:
    gbv_772457255
    Format: 1 online resource (145 pages)
    Edition: Association for Computing Machinery-Digital Library
    ISBN: 9781450307093
    Series Statement: ACM Digital Library
    Note: Title from The ACM Digital Library
    Language: English
    Keywords: Conference proceedings
  • 10
    Book
    New York, NY [etc.] : IEEE Computer Society
    UID:
    b3kat_BV041589044
    Format: 88 pages , illustrations, diagrams
    Series Statement: IEEE software 30,5
    Language: English