Data Analysis
The inspection paradox occurs when a sampling method systematically oversamples larger instances, biasing perceptions in domains such as class sizes, flight occupancy, and social networks. Real-world examples and data analysis show how observers often experience distributions that differ significantly from the actual statistics. Awareness of this paradox is crucial for accurate data interpretation and experimental design.
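The size bias can be reproduced in a few lines. The sketch below uses hypothetical class sizes (not data from the article) and compares the average taken per class with the average a randomly chosen student would report; the function names are illustrative.

```python
from statistics import mean

# Hypothetical class sizes, chosen only for illustration.
class_sizes = [10, 10, 10, 10, 100]


def average_per_class(sizes):
    """Mean size when each class is counted once (the registrar's view)."""
    return mean(sizes)


def average_per_student(sizes):
    """Mean size when each student reports the size of their own class.

    A class of size s is reported by s students, so it carries weight s.
    """
    return sum(s * s for s in sizes) / sum(sizes)


print(average_per_class(class_sizes))    # 28
print(average_per_student(class_sizes))  # about 74.3
```

Most students sit in the one large class, so the student-reported average is far above the per-class average even though nobody is misreporting.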
Satellogic operates a constellation of Earth-observation microsatellites, targeting a 300-satellite deployment that would provide global imagery with revisit times as short as 5 minutes, and recently launched an open satellite feed program called Satellogic EarthView.
An analysis of French culinary networks using LeFooding.com reviews reveals over 5,000 connections between restaurants and staff, mapped through advanced language models and data visualization techniques. The project demonstrates how LLMs can extract structured information from restaurant reviews to create an interactive network visualization, highlighting professional relationships in the French culinary scene.
Merlion is a comprehensive Python library for time series intelligence, offering end-to-end machine learning capabilities for forecasting, anomaly detection, and change point detection. The library features standardized data loading, diverse models, AutoML capabilities, and practical post-processing rules, while supporting both univariate and multivariate analysis with distributed computation via PySpark.
An analysis of 1,884 Oscar acceptance speeches reveals that contrary to popular belief, Harvey Weinstein was not thanked more than God, with God receiving thanks in 4.3% of speeches compared to Weinstein's 1.5%. Steven Spielberg emerged as the most-thanked living person, surpassing both God and Weinstein during specific decades.
An innovative spreadsheet application combines traditional spreadsheet functionality with Python data analysis and AI capabilities, using the OpenAI API and Pyodide for runtime execution. Built with Next.js 14 and TypeScript, it offers interactive data visualization through ECharts and intelligent suggestions through an AI-powered chat interface.
Telescope is a web application for exploring log data stored in ClickHouse databases, offering intuitive filtering, searching, and analysis capabilities. The platform provides management of multiple connections, customizable visualizations, and GitHub-based authentication with permission controls. Currently in development, Telescope plans to add features such as custom SQL queries, live log tailing, and expanded authentication methods.
Google's Co-Scientist AI tool, powered by the Gemini LLM, made headlines for supposedly solving a superbug problem in 48 hours, but it was later revealed that the solution was derived from previously published research. Similar patterns of overstated achievements were found in Google's other AI research claims, including drug discovery and materials synthesis.
An analytical study investigates the correlation between kebab restaurant quality and proximity to train stations in Paris using the Google Places API and geospatial analysis. Despite data collected on 400 establishments and a complex spatial analysis, the results showed only a weak correlation (Pearson coefficient of 0.091), leaving the hypothesis largely unconfirmed.
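For scale, a Pearson coefficient of 0.091 gives r² ≈ 0.008, so under a linear model proximity would account for less than 1% of the variance in ratings. The sketch below shows how such a coefficient is computed; the function name and the sample values are illustrative assumptions, not the study's code or data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: distance to the nearest station (m) and review rating.
distance_m = [50, 120, 300, 450, 800, 1000]
rating = [4.1, 3.8, 4.4, 3.9, 4.3, 4.0]

r = pearson_r(distance_m, rating)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```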
A developer reverse-engineered League of Legends' replay system to extract high-fidelity gameplay data by decrypting game packets and emulating game engine functions, achieving better performance than existing approaches. The work demonstrates methods for accessing detailed match data including precise player positions, ability usage, and damage calculations that are not available through official APIs.
A new spreadsheet concept called 'Ambsheets' introduces 'amb values' that allow cells to hold multiple values simultaneously, enabling easier scenario exploration and comparison. The innovation improves upon traditional spreadsheet limitations and Excel's What-If Analysis by automatically computing all possible combinations while maintaining a seamless user interface.
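The idea of computing every combination of multi-valued cells can be sketched with a Cartesian product. This is not the Ambsheets implementation; `evaluate_scenarios`, the cell names, and the values are hypothetical and only illustrate the concept.

```python
from itertools import product


def evaluate_scenarios(cells, formula):
    """Evaluate `formula` for every combination of the alternatives in `cells`.

    `cells` maps a cell name to a list of candidate values (a stand-in for
    an amb value). Returns a list of (assignment, result) pairs.
    """
    names = list(cells)
    scenarios = []
    for combo in product(*(cells[name] for name in names)):
        assignment = dict(zip(names, combo))
        scenarios.append((assignment, formula(**assignment)))
    return scenarios


# Two cells with several candidate values each: 2 x 3 = 6 scenarios.
cells = {"price": [10, 12], "units": [100, 150, 200]}
scenarios = evaluate_scenarios(cells, lambda price, units: price * units)

for assignment, revenue in scenarios:
    print(assignment, "->", revenue)
```

The number of scenarios grows multiplicatively with each multi-valued cell, which is the cost of exploring all combinations automatically.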