Languages interaction and possible
   effects: an exploratory study




       Antonio Vetrò - Federico Tomassetti
       Marco Torchiano - Maurizio Morisio
No one writes in a single language anymore. Even trivial applications
have a general-purpose language, SQL, JavaScript, CSS, and dozens of
frameworks, each of which includes an external DSL
                                          Wampler 2010
How do those languages interact?

Is that interaction problematic?
Research questions
RQ1 How much interaction is there between the
    languages used in a project?

RQ2 Which language pairs interact more?

RQ3 Are Cross Language Modules more defect-prone than
    Intra Language Modules?
Plan
• Define a measure for the level of interaction
  among languages
• Investigate interaction vs. defect proneness

• Perform a case study
The Case Study
Apache Hadoop, a software framework for distributed
data storage and processing.
Used in many real-world deployments (e.g., at Yahoo and Facebook).
Commit types
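Figure: files of Language A (.extA) and Language B (.extB). A Cross-Language
Commit (CLC) changes files of more than one language; an Intra-Language
Commit (ILC) changes files of a single language.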
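A minimal sketch of the commit classification, assuming (as the editor's note on slide 6 says) that a file's language is identified by its extension. The helper names and file paths are hypothetical, and a file without an extension is simply treated as one more "language".

```python
from pathlib import PurePosixPath

def extension(path):
    """Language of a file, approximated by its lower-case extension (e.g. 'java')."""
    return PurePosixPath(path).suffix.lstrip(".").lower()

def classify_commit(files):
    """Return 'CLC' if the commit touches more than one extension, else 'ILC'."""
    extensions = {extension(f) for f in files}
    return "CLC" if len(extensions) > 1 else "ILC"

print(classify_commit(["src/Job.java", "conf/core-site.xml"]))  # CLC
print(classify_commit(["src/Job.java", "src/Task.java"]))       # ILC
```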
RQ1 How much interaction is there between
    the languages present in a project?
Metric: proportion of Cross-Language Commits

• All types of commits (RQ1.1)
• Commits divided by activity type (e.g., improvement,
  bug fixing, new feature) (RQ1.2)

All (RQ1.1)  Bug   Improvement  New Feature  Sub-task  Task  Test
0.53         0.12  0.26         0.30         0.45      0.26  0.05
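The RQ1 metric is a simple proportion: cross-language commits divided by all commits, overall and within each activity type. The sketch below assumes each commit is already labelled with an activity type; the types in the table look like issue-tracker categories, but how commits were linked to them is not stated, so the commit list is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def is_cross_language(files):
    """A commit is cross-language if its files span more than one extension."""
    return len({PurePosixPath(f).suffix.lower() for f in files}) > 1

def clc_proportions(commits):
    """Return (overall CLC proportion, {activity type: CLC proportion})."""
    totals, crosses = defaultdict(int), defaultdict(int)
    for activity, files in commits:
        totals[activity] += 1
        crosses[activity] += is_cross_language(files)  # bool counts as 0/1
    overall = sum(crosses.values()) / sum(totals.values())
    by_type = {a: crosses[a] / totals[a] for a in totals}
    return overall, by_type

# Hypothetical commits: (activity type, files touched)
commits = [
    ("Bug", ["Job.java"]),
    ("Bug", ["Job.java", "job.xml"]),
    ("Improvement", ["Task.java", "Util.java"]),
    ("Test", ["TestJob.java"]),
]
overall, by_type = clc_proportions(commits)
print(overall)   # 0.25
print(by_type)   # {'Bug': 0.5, 'Improvement': 0.0, 'Test': 0.0}
```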
Cross Language Ratio
3 out of 4 commits involving module m are Cross-Language, so the
Cross Language Ratio of m is CLR_m = 3/4 = 0.75.
Figure: module m and files of Language A (.extA), Language B (.extB),
and Language C (.extC).
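The per-module ratio can be sketched as the share of commits touching m that are cross-language, i.e. #CLC_m / (#CLC_m + #ILC_m). This assumes commits are given as lists of file paths and languages are identified by extension; the history below is hypothetical and mirrors the slide's 3-out-of-4 example.

```python
from pathlib import PurePosixPath

def ext(path):
    """Extension (lower-case, with dot) used as the language of a file."""
    return PurePosixPath(path).suffix.lower()

def clr_module(module, commits):
    """CLR_m: share of commits touching `module` that are Cross-Language Commits."""
    touching = [files for files in commits if module in files]
    if not touching:
        return None  # undefined for a module that was never committed
    cross = sum(1 for files in touching if len({ext(f) for f in files}) > 1)
    return cross / len(touching)

# Hypothetical history: 3 of the 4 commits on m are cross-language
commits = [
    ["m.extB", "a.extA"],
    ["m.extB", "b.extA"],
    ["m.extB", "c.extC"],
    ["m.extB", "d.extB"],
    ["a.extA", "b.extA"],  # does not involve m, so it is ignored
]
print(clr_module("m.extB", commits))  # 0.75
```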
Interaction level of a language
• Cross language ratio of an extension (language)
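The editor's note for slide 11 defines CLR_ext as an average over the modules with that extension. The note writes the summed term as CLR_{m,ext}; this sketch reads it as each module's overall ratio CLR_m, which is an assumption. Commit data and helper names are hypothetical.

```python
from pathlib import PurePosixPath
from statistics import mean

def ext(path):
    """Extension (lower-case, with dot) used as the language of a file."""
    return PurePosixPath(path).suffix.lower()

def clr_module(module, commits):
    """CLR_m: share of the commits touching `module` that span more than one extension."""
    touching = [files for files in commits if module in files]
    cross = sum(1 for files in touching if len({ext(f) for f in files}) > 1)
    return cross / len(touching)

def clr_extension(extension, commits):
    """CLR_ext (assumed reading): mean CLR_m over all committed modules with `extension`."""
    modules = {f for files in commits for f in files if ext(f) == extension.lower()}
    return mean(clr_module(m, commits) for m in modules)

# Hypothetical history
commits = [
    ["Job.java", "job.xml"],
    ["Job.java"],
    ["Task.java"],
]
print(clr_extension(".java", commits))  # 0.25
print(clr_extension(".xml", commits))   # 1.0
```

Averaging per module (rather than pooling commits) gives every file the same weight, so a few heavily edited files cannot dominate an extension's score.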
RQ2 Which extensions interact more?
Metric: CLR_ext

Considering one extension versus all the other
extensions (RQ2.1)

CLR_ext     Files        Extension
0.96        49           c
0.87        114          sh
0.72        75           properties
0.71        320          xml
0.59        4328         java
Focusing on extension pairs
2 out of 3 commits involving m together with extA are Cross-Language, so
the Cross Language Ratio of module m w.r.t. extA is CLR_{m,extA} = 2/3 ≈ 0.67.
Figure: module m and files of Language A (.extA), Language B (.extB),
and Language C (.extC).
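A sketch of the pairwise ratio from the editor's note, CLR_{m,ext} = #CLC_{m,ext} / (#CLC_{m,ext} + #ILC_{m,ext}). The note does not spell out which commits are counted, so this assumes CLC_{m,ext} are commits touching m and at least one file of `ext`, and ILC are commits touching only m's own extension. With the hypothetical history below this reproduces the slide's 2/3.

```python
from pathlib import PurePosixPath

def ext(path):
    """Extension (lower-case, with dot) used as the language of a file."""
    return PurePosixPath(path).suffix.lower()

def clr_module_wrt(module, other_ext, commits):
    """CLR_{m,ext} under the assumed reading described above; None if undefined."""
    own, other_ext = ext(module), other_ext.lower()
    clc = ilc = 0
    for files in commits:
        if module not in files:
            continue
        exts = {ext(f) for f in files}
        if exts == {own}:
            ilc += 1          # intra-language commit on m
        elif other_ext in exts:
            clc += 1          # m changed together with a file of other_ext
        # cross-language commits with other extensions are ignored
    return clc / (clc + ilc) if clc + ilc else None

commits = [
    ["m.extB", "a.extA"],
    ["m.extB", "b.extA"],
    ["m.extB", "c.extC"],  # cross-language, but not with extA
    ["m.extB", "d.extB"],
]
print(round(clr_module_wrt("m.extB", ".extA", commits), 2))  # 0.67
```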
Interaction level of a pair
• Cross language ratio of an extension w.r.t. another extension
  – Asymmetric measure: CLR_{extA,extB} generally differs from CLR_{extB,extA}
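Following the editor's note, CLR_{extA,extB} averages CLR_{m,extB} over the modules m with extension extA. The sketch below (hypothetical data; same assumed reading of CLC/ILC as for the per-module ratio, with modules lacking relevant commits counted as 0) shows why the measure is asymmetric: a configuration file that always changes with Java scores 1.0 towards Java, while the many Java files that change alone pull the reverse direction down.

```python
from pathlib import PurePosixPath

def ext(path):
    """Extension (lower-case, with dot) used as the language of a file."""
    return PurePosixPath(path).suffix.lower()

def clr_module_wrt(module, other_ext, commits):
    """CLR_{m,ext}: commits with `other_ext` / (those + intra-language commits on m); 0.0 if none."""
    own, other_ext = ext(module), other_ext.lower()
    clc = ilc = 0
    for files in commits:
        if module in files:
            exts = {ext(f) for f in files}
            if exts == {own}:
                ilc += 1
            elif other_ext in exts:
                clc += 1
    return clc / (clc + ilc) if clc + ilc else 0.0

def clr_pair(ext_a, ext_b, commits):
    """CLR_{extA,extB}: mean of CLR_{m,extB} over all modules m with extension extA."""
    ext_a = ext_a.lower()
    modules = {f for files in commits for f in files if ext(f) == ext_a}
    return sum(clr_module_wrt(m, ext_b, commits) for m in modules) / len(modules)

# Hypothetical history: one XML config always changes with Java, most Java changes alone
commits = [
    ["A.java", "conf.xml"],
    ["A.java"],
    ["B.java"],
    ["C.java"],
]
print(clr_pair(".xml", ".java", commits))            # 1.0
print(round(clr_pair(".java", ".xml", commits), 2))  # 0.17
```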
RQ2 Which extensions interact more?
Metric: CLR_{extA,extB}

Considering the most interacting ordered pairs of
extensions (RQ2.2).

extA/extB      C       Java   Properties Sh

C              -       0.51   0.10      0.50
Java           0.01    -      0.28      0.04
Properties     0       0.54   -         0.36
Sh             0.09    0.22   0.24      -
Xml            0.04    0.52   0.43      0.24
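To make the table easier to read, the values can be ranked and compared in both directions. The sketch below transcribes the RQ2.2 table (row = extA, column = extB) and computes |CLR_{A,B} - CLR_{B,A}| for the pairs measured both ways; the asymmetry score is an illustrative addition, not a metric from the study.

```python
# CLR_{extA,extB} transcribed from the RQ2.2 table (row = extA, column = extB)
clr = {
    ("C", "Java"): 0.51, ("C", "Properties"): 0.10, ("C", "Sh"): 0.50,
    ("Java", "C"): 0.01, ("Java", "Properties"): 0.28, ("Java", "Sh"): 0.04,
    ("Properties", "C"): 0.00, ("Properties", "Java"): 0.54, ("Properties", "Sh"): 0.36,
    ("Sh", "C"): 0.09, ("Sh", "Java"): 0.22, ("Sh", "Properties"): 0.24,
    ("Xml", "C"): 0.04, ("Xml", "Java"): 0.52, ("Xml", "Properties"): 0.43, ("Xml", "Sh"): 0.24,
}

# Most interacting ordered pairs first
ranked = sorted(clr.items(), key=lambda kv: kv[1], reverse=True)
for (a, b), value in ranked[:4]:
    print(f"{a} -> {b}: {value:.2f}")

# Asymmetry for unordered pairs that appear in both directions (a < b avoids duplicates)
asymmetry = {
    (a, b): round(abs(v - clr[(b, a)]), 2)
    for (a, b), v in clr.items()
    if (b, a) in clr and a < b
}
print(max(asymmetry, key=asymmetry.get))  # ('C', 'Java')
```

The top of the ranking (Properties -> Java 0.54, Xml -> Java 0.52, C -> Java 0.51) is dominated by Java as the target, while C -> Java vs. Java -> C (0.51 vs. 0.01) is the most lopsided pair.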
Cross vs. Intra Lang Modules

Cross Language Module (CLM): CLR ≥ t

Intra Language Module (ILM): CLR < t

                 t = 50% (i.e., a CLR of 0.5)
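The CLM/ILM split is a threshold on a module's CLR. A minimal sketch with the slide's t = 0.5; the module names and CLR values are hypothetical.

```python
def module_kind(clr, t=0.5):
    """Label a module CLM when its Cross Language Ratio reaches threshold t, else ILM."""
    return "CLM" if clr >= t else "ILM"

clrs = {"Job.java": 0.75, "Task.java": 0.5, "Util.java": 0.2}  # hypothetical CLR values
print({m: module_kind(v) for m, v in clrs.items()})
# {'Job.java': 'CLM', 'Task.java': 'CLM', 'Util.java': 'ILM'}
```

Because the comparison is ≥, a module with exactly half of its commits cross-language counts as a CLM.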
RQ3 Are Cross Language Modules more defect-prone?
Metric: odds ratio (OR) of defects in CLMs vs. ILMs (CLMs with/without
        defects vs. ILMs with/without defects); OR > 1 means CLMs have
        higher odds of being defective than ILMs
- all modules regardless of extension (RQ3.1)
- by extension (RQ3.2)

Extension    ILM no def.  ILM def.  CLM no def.  CLM def.  p-value  OR
all          1891         225       2875         89        <0.001   0.26
c            2            0         46           1         1.000    Inf
java         1692         201       2239         25        <0.001   0.09
properties   19           1         45           7         0.429    2.92
sh           10           5         64           13        0.162    0.41
xml          96           11        184          24        0.851    1.14
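Each row of the table is a 2x2 contingency table (CLM/ILM by defective/not). The sketch below computes the sample odds ratio and a two-sided Fisher's exact test p-value from those counts using only the standard library. Which test the authors used is not stated on the slides; Fisher's test is an assumption, although it reproduces, for example, the 0.429 reported for properties and the 1.000 reported for c.

```python
from math import comb, inf

def odds_ratio(clm_def, clm_nodef, ilm_def, ilm_nodef):
    """Sample odds ratio: odds of a defect among CLMs divided by odds among ILMs."""
    num, den = clm_def * ilm_nodef, clm_nodef * ilm_def
    if den == 0:
        return inf if num else float("nan")
    return num / den

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one. Exact integer weights are
    used, so ties are compared without rounding error.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):  # proportional to P(top-left cell == x)
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    tail = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return tail / comb(n, row1)

# Rows of the RQ3 table as (CLM def., CLM no def., ILM def., ILM no def.)
rows = {
    "all": (89, 2875, 225, 1891),
    "java": (25, 2239, 201, 1692),
    "properties": (7, 45, 1, 19),
}
for name, (cd, cn, idf, inn) in rows.items():
    p = fisher_exact_two_sided(cd, cn, idf, inn)
    print(f"{name}: OR={odds_ratio(cd, cn, idf, inn):.2f} p={p:.3g}")
```

For properties this yields a sample OR of 2.96 against 2.92 on the slide; the slide's values may come from a different estimator (for instance, the conditional maximum-likelihood estimate that R's `fisher.test` reports), so small differences are expected.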
RQ3 Are Cross Language Modules more defect-prone?
Metric: odds ratio (OR) of defects in CLMs vs. ILMs (CLMs with/without
        defects vs. ILMs with/without defects)
Considering interaction between specific ordered
pairs of extensions (RQ3.3).

             C      Java   Properties  Sh      Xml
C            -      Inf    0           0       Inf
Java         2.79   -      0.32        0.43    0.96
Properties   Inf    1      -           12.08   0.94
Sh           3.55   4.45   17.17       -       7.44
Xml          3.83   0.95   3.22        4.73    -

Statistically significant values are shown in bold.
Threats
•   Confounding factors: age and size of modules
•   Use of commits as a proxy for interaction between artifacts
•   Representativeness of Apache Hadoop
•   Renaming of modules
Conclusions
Language interaction depends on the type of activity.
Frequent interactions are generally not symmetric;
many of them involve XML.
In general, language interaction is not associated with
higher defect proneness (see Java).
However, for several language pairs CLMs are
significantly more defect-prone than ILMs (see C).
Questions?



Language Interaction and Quality Issues: An Exploratory Study


Editor's Notes

  • #6: NB: file type = file extension.
  • #11:
    CLR_{m,ext} = \frac{\#CLC_{m,ext}}{\#CLC_{m,ext} + \#ILC_{m,ext}}
    CLR_{ext} = \frac{\sum_{m \in ext} CLR_{m,ext}}{\#\{m \in ext\}}
    CLR_{extA,extB} = \frac{\sum_{m \in extA} CLR_{m,extB}}{\#\{m \in extA\}}