
D-VRE: From a Jupyter-enabled Private Research Environment to Decentralized Collaborative Research Ecosystem (2405.15392v2)

Published 24 May 2024 in cs.DC

Abstract: Today, scientific research is increasingly data-centric and compute-intensive, relying on data and models across distributed sources. However, it still faces challenges in the traditional cooperation mode due to high storage and computing costs, geo-location barriers, and local confidentiality regulations. The Jupyter environment has recently emerged and evolved as a vital virtual research environment for scientific computing, which researchers can use to scale computational analyses up to larger datasets and high-performance computing resources. Nevertheless, existing approaches lack robust support for a decentralized cooperation mode that would unlock the full potential of decentralized collaborative scientific research, e.g., seamless and secure data sharing. In this work, we change the basic structure and legacy norms of current research environments via the seamless integration of Jupyter with Ethereum blockchain capabilities. The result is a Decentralized Virtual Research Environment (D-VRE) that extends private computational notebooks into a decentralized collaborative research ecosystem. We propose a novel architecture for the D-VRE and prototype some essential D-VRE elements for enabling secure data sharing with decentralized identity, user-centric agreement-making, membership, and research asset management. To validate our method, we conducted an experimental study to test all functionalities of the D-VRE smart contracts and their gas consumption. In addition, we deployed the D-VRE prototype on a testnet of the Ethereum blockchain for demonstration. The feedback from the studies showcases the current prototype's usability, ease of use, and potential, and suggests further improvements.

Authors (4)
  1. Yuandou Wang (11 papers)
  2. Siamak Farshidi (7 papers)
  3. Sheejan Tripathi (1 paper)
  4. Zhiming Zhao (16 papers)
