Mapping the network structure of science parks: An exploratory study of cross-sectoral interactions reflected on the web (1301.4933v1)

Published 21 Jan 2013 in cs.DL and cs.CY

Abstract: This study introduces a method based on link analysis to investigate the structure of the R&D support infrastructure associated with science parks in order to determine whether this webometric approach gives plausible results. Three science parks from Yorkshire and the Humber in the UK were analysed with webometric and social network analysis techniques. Interlinking networks were generated through the combination of two different data sets extracted from three sources (Yahoo!, Bing, SocSciBot). These networks suggest that institutional sectors, representing business, universities and public bodies, are primarily tied together by a core formed by research institutions, support structure organisations and business developers. The comparison of the findings with traditional indicators suggests that the web-based networks reflect the offline conditions and policy measures adopted in the region, giving some evidence that the webometric approach is plausible for investigating science park networks. This is the first study that applies a web-based approach to investigate to what extent the science parks facilitate a closer interaction between the heterogeneous organisations that converge in R&D networks. This indicates that link analysis may help to get a first insight into the organisation of the R&D support infrastructure provided by science parks.

Citations (25)


Summary

The paper demonstrates that combining multiple data sources and both link dimensions (inlinks and outlinks) produces more robust webometric networks of science park ecosystems.
It employs social network analysis to uncover distinct structural patterns, with academia acting as a central broker between sectors.
The study supports the plausibility of the method by qualitatively comparing the web-based network structures with offline R&D reports and surveys, and draws implications for innovation policy.

This paper introduces and evaluates a webometric methodology using hyperlink analysis to map the network structure of organizations associated with science parks (SPs) and explore the cross-sectoral interactions (Academia, Industry, Government) occurring within them (1301.4933). The primary goal was to determine if this web-based approach yields plausible insights into the real-world R&D support infrastructure facilitated by SPs.

Methodology

Case Selection: Three UK Science Parks from the Yorkshire and the Humber region were chosen: Advanced Manufacturing Park (AMP), Leeds Innovation Centre (LIC), and York Science Park (YSP). These parks were selected because they appear on the UKSPA (United Kingdom Science Park Association) list, have websites that list their tenants with URLs, are regionally important in terms of R&D infrastructure investment, and host a heterogeneous mix of tenant types and sizes.
Initial Data Collection: The websites of the three SPs were crawled using the SocSciBot web crawler in May 2010 to identify site outlinks. These linked websites (potential tenants, partners, support organizations) formed the initial set of nodes for the network analysis. 215 outlinks were found, manually checked, and reduced to 183 unique organizations (domains/sub-domains) after classification by sector (Industry, Academia, Government) and type (e.g., Tenant, Support, Partnership). The three SP websites were also included, making a total of 186 websites analyzed.
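The reduction from raw outlinks to unique organizations can be sketched as collapsing each URL to its host name, so that every domain or sub-domain counts once. This is a minimal sketch, assuming absolute URLs and treating a leading `www.` as the same site as the bare host; `site_of` and `unique_sites` are hypothetical helpers, and the manual checking and sector/type classification described above are not automated here.

```python
from urllib.parse import urlsplit

def site_of(url: str) -> str:
    """Reduce a URL to its host (domain or sub-domain), lower-cased.

    Assumption: 'www.example.com' and 'example.com' are the same site.
    Returns '' when the string has no parseable host.
    """
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def unique_sites(outlinks):
    """Collapse raw outlinks to the sorted list of distinct sites they point to."""
    return sorted({site_of(u) for u in outlinks} - {""})

raw = [
    "http://www.example-uni.ac.uk/research/",
    "https://example-uni.ac.uk",
    "http://lab.example-uni.ac.uk/about",   # sub-domains stay separate sites
    "http://www.firm.example.com/#top",
]
print(unique_sites(raw))
# ['example-uni.ac.uk', 'firm.example.com', 'lab.example-uni.ac.uk']
```

Keeping sub-domains distinct mirrors the domain/sub-domain level used in the study. Whether a sub-domain should instead be merged into its parent organization is a judgment the manual check would have to make.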
Link Data Acquisition: To build the interaction network among these 186 organizations, link data were gathered from multiple sources to improve coverage and reliability:
- Inlinks: Collected using Yahoo! Search via the LexiURL Searcher software. This tool helped manage query limits, retrieving up to 19,619 inlinks per query, totaling 337,911 raw inlinks (reduced to 183,006 unique domain/sub-domain level inlinks). This formed the IN-data set.
- Outlinks: Collected using a combination of:
  - SocSciBot: Crawling each site to a depth of two levels (collected 6,597 links).
  - Bing Search: Using LexiURL Searcher (collected 104,890 links). Bing complemented SocSciBot's limited crawl depth but couldn't retrieve outlinks from sub-domains.
- The combined outlink sources yielded 111,487 raw outlinks (reduced to 80,588 unique domain/sub-domain level outlinks). This formed the OUT-data set. The overlap between SocSciBot and Bing outlinks was low (average 4%).
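Merging outlink sources and measuring how far they agree can be illustrated with plain set operations on site-level links. The summary reports an average overlap of 4% but does not give the formula, so this sketch assumes Jaccard overlap (shared links divided by all distinct links). `combine_sources` and `overlap` are hypothetical names.

```python
def combine_sources(*sources):
    """Union of site-level links gathered by different tools."""
    combined = set()
    for links in sources:
        combined |= set(links)
    return combined

def overlap(a, b):
    """Jaccard overlap: shared links / all distinct links (0.0 if both are empty).

    Assumption: the paper's exact overlap formula is not stated in the summary.
    """
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

crawler = {"x.example", "y.example", "z.example"}   # e.g. links found by a crawler
search = {"z.example", "w.example"}                 # e.g. links found via a search engine
print(len(combine_sources(crawler, search)))  # 4
print(overlap(crawler, search))               # 0.25
```

A low overlap means each source contributes links the other misses, which is the argument for taking the union rather than relying on one tool.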
Network Construction:
- Two initial adjacency matrices were built: one using the IN-data set (identifying links between the 186 sample websites found within the collected inlinks) and another using the OUT-data set (identifying links between the sample websites found within the collected outlinks).
- Link frequencies were dichotomized (presence/absence) to focus on the existence of connections rather than their intensity. Self-links were removed.
- A final, combined network was created by merging the links found in both the IN-data set and OUT-data set networks.
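The construction steps above can be sketched in plain Python, assuming each collected link has already been reduced to a `(source_site, target_site)` pair. Under this assumption, an inlink to site B found on site A and an outlink from A to B both yield the pair `(A, B)`. The helpers `adjacency` and `merge` are hypothetical.

```python
def adjacency(nodes, links):
    """Dichotomized adjacency matrix: m[i][j] = 1 if any link i -> j was observed.

    Pairs involving sites outside `nodes` are ignored and self-links are dropped.
    """
    index = {site: k for k, site in enumerate(nodes)}
    m = [[0] * len(nodes) for _ in nodes]
    for src, dst in links:
        if src in index and dst in index and src != dst:
            m[index[src]][index[dst]] = 1  # presence only: repeated links stay 1
    return m

def merge(m_in, m_out):
    """Combined network: a tie exists if it appears in either matrix (element-wise OR)."""
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(m_in, m_out)]

nodes = ["sp.example", "uni.example", "firm.example"]
in_links = [("uni.example", "sp.example"), ("uni.example", "sp.example"),  # duplicate
            ("sp.example", "sp.example"),                                  # self-link
            ("blog.example", "sp.example")]                                # outside sample
out_links = [("sp.example", "firm.example"), ("uni.example", "sp.example")]

m_in, m_out = adjacency(nodes, in_links), adjacency(nodes, out_links)
print(m_in)                # [[0, 0, 0], [1, 0, 0], [0, 0, 0]]
print(merge(m_in, m_out))  # [[0, 0, 1], [1, 0, 0], [0, 0, 0]]
```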
Analysis: Social network analysis (SNA) was performed with UCINET, and the networks were visualized with NetDraw.
- Structural Analysis: Cohesion measures (Inclusiveness, Connectivity Gap, Density, Reciprocity) were used to compare the networks derived from the IN and OUT datasets. Pearson correlations compared the in-degree and out-degree centrality distributions between the two networks. Gini coefficients measured the inequality in link distribution.
- Local Analysis: Centrality measures (In-degree, Out-degree, Betweenness) were calculated for the combined network to identify key organizations. The interactions between different sectors and organization types were examined visually and quantitatively.
- Validation: The key features and patterns observed in the web-based networks were qualitatively compared with findings from official UK R&D reports and surveys (e.g., from HEFCE, BIS) concerning the Yorkshire and the Humber region.
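The structural measures named above have standard textbook forms, sketched here in plain Python on a set of directed `(source, target)` ties. These are assumptions, not UCINET's exact routines:

- Density excludes self-loops.
- Reciprocity is arc-based; a dyad-based variant also exists.
- Inclusiveness is the share of non-isolated nodes.
- The Gini coefficient uses the sorted-values formula.

The connectivity gap is omitted because its definition is not given here.

```python
import math

def density(edges, n):
    """Share of the n*(n-1) possible directed ties (self-loops excluded) that are present."""
    return len(edges) / (n * (n - 1))

def reciprocity(edges):
    """Arc-based reciprocity: share of ties whose reverse tie also exists."""
    return sum((b, a) in edges for a, b in edges) / len(edges) if edges else 0.0

def inclusiveness(edges, nodes):
    """Share of nodes with at least one tie (i.e. not isolates)."""
    touched = {v for tie in edges for v in tie}
    return len(touched & set(nodes)) / len(nodes)

def gini(values):
    """Gini coefficient of a degree distribution: 0 means perfectly equal."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n

def pearson(xs, ys):
    """Pearson correlation, e.g. of node degrees in the IN- vs OUT-derived network.

    Undefined (division by zero) if either input is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

nodes = ["A", "B", "C", "D"]
edges = {("A", "B"), ("B", "A"), ("A", "C")}
in_deg = [sum(1 for _, t in edges if t == v) for v in nodes]  # [1, 1, 1, 0]
print(density(edges, len(nodes)))   # 0.25
print(reciprocity(edges))           # 2 of 3 ties reciprocated -> 0.666...
print(inclusiveness(edges, nodes))  # 0.75 (D is an isolate)
print(gini(in_deg))                 # 0.25
```

To compare the IN- and OUT-derived networks as described, one would compute each node's degree in both networks, in the same node order, and pass the two lists to `pearson`.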

Key Findings

Methodological Viability: The paper found that combining multiple data sources (search engines such as Yahoo! and Bing, and crawlers such as SocSciBot) and both link dimensions (inlinks and outlinks) is crucial for constructing more robust and reliable webometric networks, because the sources overlap little and each tool and link dimension has its own biases and limitations. Although the raw IN-data set was larger, the OUT-data set network often showed higher cohesion (e.g., better connectivity and inclusiveness). Combining both data sets substantially increased the number of identified links compared with using either alone.
Network Structure: The web-based networks revealed distinct structural patterns:
- Organizations tended to cluster by sector (Academia, Industry, Government).
- A core-periphery structure was common, with Academia (universities, associated research centers) and Support Structure Organizations (regional development agencies, public business support) forming the central core.
- Universities often acted as key brokers, linking different sectors and facilitating knowledge transfer, a role reflected in their high centrality scores.
- Industry firms, particularly non-R&D intensive ones, tended to be on the periphery with fewer links to the core or each other. University spin-offs and R&D consultants showed better integration.
- Hybrid organizations (e.g., public-private partnerships, business developers) played important intermediary roles.
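One simple way to quantify sector clustering of the kind described above is to tally ties by (source sector, target sector) pair and compute the share that stay within a sector. This is a hypothetical sketch, not the paper's procedure. The site names and sector labels in the usage example are illustrative.

```python
from collections import Counter

def sector_ties(edges, sector):
    """Count directed ties by (source sector, target sector) pair."""
    return Counter((sector[a], sector[b]) for a, b in edges)

def within_share(counts):
    """Share of ties that stay inside one sector."""
    total = sum(counts.values())
    same = sum(c for (s, t), c in counts.items() if s == t)
    return same / total if total else 0.0

sector = {"uni1": "Academia", "uni2": "Academia",
          "agency": "Government", "firm": "Industry"}   # illustrative labels
edges = [("uni1", "uni2"), ("uni2", "uni1"), ("uni1", "agency"), ("firm", "uni1")]
counts = sector_ties(edges, sector)
print(counts[("Academia", "Academia")])  # 2
print(within_share(counts))              # 0.5
```

A raw within-sector share is easier to interpret when it is compared against the share expected if ties ignored sector membership. The sector-pair table also shows which cross-sector ties, such as Industry to Academia, carry the links.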
Offline Correlation: The web-based network structures generally aligned with known offline characteristics of the R&D landscape in the region. For instance:
- The central role of universities (Leeds, York) and specific research centers (e.g., AMRC at AMP) matched their real-world importance.
- The prominence of public support organizations (Yorkshire Forward, Business Link) reflected regional development policies and high public investment.
- Differences in the structure around LIC (stronger ties to a private IP firm, focused spin-offs) and YSP (broader range of firms, larger public support infrastructure via Science City York) reflected their different, documented knowledge transfer strategies.
- The relative isolation of firms mirrored findings from traditional surveys about low inter-firm collaboration in innovation.

Practical Implications and Implementation

Tooling: The methodology relies on:
- Web crawlers (e.g., SocSciBot, or modern equivalents like Scrapy in Python).
- Search engine APIs or query tools (e.g., LexiURL Searcher), or custom scripts using Python libraries such as requests and BeautifulSoup; note that search engine API access is now more restricted and often paid.
- SNA software (e.g., UCINET, Gephi, NetworkX library in Python).
Process:

1. Define the scope (e.g., SP tenants, cluster members, regional actors).
2. Identify seed URLs (e.g., SP website, member directory).
3. Crawl seed URLs to get an initial list of organizations and their websites.
4. For each organization website, collect both inlinks and outlinks using multiple available sources (APIs, crawlers).

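```python
# Conceptual sketch: the three collector functions are hypothetical
# placeholders; replace them with your own crawler / search-API clients.
import networkx as nx  # real SNA library; matplotlib is needed only for drawing

def crawl_for_linked_orgs(seed_urls):
    """Hypothetical: return site URLs of organizations linked from the seeds."""
    return []

def get_inlinks(url):
    """Hypothetical: union of inlinking sites collected from several sources."""
    return set()

def get_outlinks(url):
    """Hypothetical: union of outlinked sites collected from several sources."""
    return set()

seed_urls = ["http://sp1.com", "http://sp2.com"]
all_org_urls = seed_urls + crawl_for_linked_orgs(seed_urls)

# Store links: {'org_url': {'inlinks': set(), 'outlinks': set()}}
# Assumes every URL is normalized to the same site-level form.
link_data = {url: {"inlinks": get_inlinks(url), "outlinks": get_outlinks(url)}
             for url in all_org_urls}

# Keep only links *between* sample orgs; a DiGraph stores each tie once
# (dichotomized), and self-links are skipped.
sample = set(all_org_urls)
G = nx.DiGraph()
G.add_nodes_from(sample)
for url, links in link_data.items():
    G.add_edges_from((url, t) for t in links["outlinks"] & sample if t != url)
    G.add_edges_from((s, url) for s in links["inlinks"] & sample if s != url)

# Analyze with standard SNA measures, then visualize
centrality = {"in_degree": nx.in_degree_centrality(G),
              "out_degree": nx.out_degree_centrality(G),
              "betweenness": nx.betweenness_centrality(G)}
nx.draw_networkx(G)  # call matplotlib.pyplot.show() to display outside notebooks
```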

5. Clean and process link data (normalize URLs, remove duplicates).
6. Construct the interlinking network matrix (adjacency matrix).
7. Analyze the network using SNA metrics and visualization.
8. Interpret results in the context of the specific innovation ecosystem.
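The URL-cleaning step (step 5) can be sketched with the standard library's `urllib.parse`. This minimal canonicalization is an assumption rather than a prescribed rule set. It lower-cases scheme and host, drops default ports, user info and fragments, and strips trailing slashes. `normalize_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_url(url):
    """Canonicalize a URL so trivially different spellings deduplicate."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()   # hostname drops user info; IPv6 not handled
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path.rstrip("/") or "/"    # treat '/a' and '/a/' as the same page
    return urlunsplit((scheme, netloc, path, parts.query, ""))  # fragment dropped

raw = ["HTTP://Example.COM:80/Research/#team", "http://example.com/Research",
       "https://example.com", "https://example.com/"]
print(sorted({normalize_url(u) for u in raw}))
# ['http://example.com/Research', 'https://example.com/']
```

Path case is deliberately preserved, because many servers treat paths as case-sensitive. The trailing-slash rule is the most debatable choice and can be dropped if it merges pages that should stay distinct.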

Applications: This approach can be used by policymakers, economic development agencies, SP managers, or researchers for:
- Rapidly assessing the overall structure and connectivity of an innovation ecosystem.
- Identifying key players, influential organizations, and potential network brokers.
- Detecting potentially isolated groups or sectors.
- Monitoring changes in network structure over time.
- Generating hypotheses about offline collaborations and knowledge flows for further qualitative or quantitative investigation.

Limitations

The analysis is limited to organizations initially linked by the SPs and the links between them, potentially missing important external ties.
Direct hyperlinks may not capture all forms of offline relationships, particularly tacit knowledge transfer or confidential commercial ties.
Web presence and linking behavior can be influenced by factors other than R&D collaboration (e.g., marketing, website size, technical implementation), potentially biasing results towards larger or more web-savvy organizations (often public/academic).
The simple Industry-Academia-Government classification might obscure the roles of increasingly important hybrid organizations.

The paper concludes that webometrics, specifically interlink analysis using combined data sources, offers a viable exploratory tool for gaining initial insights into the complex network structures of science parks and similar R&D environments, complementing traditional analysis methods.
