Data analysis
Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in business, science, and social science domains. In today's business world, data analysis plays an important role in making decisions more scientific and helping businesses operate more effectively.[1]
Data mining is a particular data analysis technique that focuses on modeling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information.[2]
Types of data analysis
- Descriptive analysis: aims to summarize the data without having to explain it. An example is the analysis of a country's census data, where the analysis offers no more than a summary of what the census questionnaire covers, such as sex, age, and address.
- Exploratory analysis: tries to find relationships, discoveries, correlations, and trends in measurements of several variables in order to generate ideas and hypotheses. An example is the work of a group of amateurs who analyzed a large volume of space data collected by the Kepler telescope and found a system of four planets by analyzing the properties of light.
- Inferential analysis: one of the most common kinds of data analysis in scientific research. It goes beyond exploratory analysis to test whether the patterns discovered hold beyond the data sets at hand. An example is examining the relationship between environmental pollution and life expectancy at the state level in the United States. This analysis quantifies and estimates the relationships among the available measurements.
- Predictive analysis: whereas the previous type quantifies relationships and estimates their values, predictive analysis forecasts certain measurements from existing ones. An example is polling organizations predicting an election result by analyzing the behavior observed in surveys.
- Causal analysis: estimates how certain measures change when other measures are changed, for example the effect of a particular medical practice on reducing the incidence of a given disease.
- Mechanistic analysis: the causal analysis above finds a relationship that holds with some probability, often on the basis of very large data sets. For example, decades of data indicate that smoking leads to cancer, but the outcome is not certain: a smoker may never die of cancer. Mechanistic analysis aims instead to find a definite, deterministic relationship between two measurements.
Objectives
Data analysis aims to produce what is called the system's data model. This is one of the main activities of the analysis phase, and data is usually modeled with graphical models, that is, diagrams and charts that somewhat resemble data flow diagrams.[3]
The data analysis process
Defining data requirements
This is the first step in data analysis. It means defining the type, quantity, and other important properties required of the data to be analyzed. For example: is the required data numbers, text, or images? Will the data be computed for a single person or for everyone in a given place? And so on.
Data collection
In this step, data is gathered from different sources so that it meets the requirements set in the first step. It may be collected by people or obtained through modern technologies such as satellites, traffic signals, and the Internet.
Data organization
After the data is collected, it is arranged into tables with rows and columns, as in Excel files.
Data inspection
The data must be checked so that the resulting information is free of errors. This is done by reviewing the data and removing or correcting faulty entries. Faulty data may include incorrect numbers, duplicate records, or salary figures that contain alphabetic characters. Faulty data can be dealt with by removing duplicates and recalculating figures, and during data entry one should make sure that all values entered in a column have the same type.
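The inspection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`name`, `salary`) and the function name `clean_rows` are hypothetical, and a real pipeline would likely correct faulty values rather than just set them aside.

```python
def clean_rows(rows):
    """Drop exact duplicates and set aside rows whose salary is not numeric.

    The 'name' and 'salary' fields are a hypothetical layout for illustration.
    """
    seen = set()
    cleaned, rejected = [], []
    for row in rows:
        key = (row["name"], row["salary"])
        if key in seen:              # duplicate record: keep only the first copy
            continue
        seen.add(key)
        try:
            salary = float(row["salary"])   # enforce one type per column
        except ValueError:
            rejected.append(row)            # e.g. letters inside a salary field
            continue
        cleaned.append({"name": row["name"], "salary": salary})
    return cleaned, rejected


rows = [
    {"name": "Amal", "salary": "4200"},
    {"name": "Amal", "salary": "4200"},    # duplicate
    {"name": "Omar", "salary": "38x0"},    # letters in a numeric column
    {"name": "Sara", "salary": "5100.5"},
]
cleaned, rejected = clean_rows(rows)
```

Rejected rows are returned rather than silently dropped so that they can be reviewed and corrected by hand.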
Building the conceptual data model
This step is also called system data modeling. It builds the model that reflects the main subjects (entities) of the data and the relationships among them. Analysis at this level is called content or semantic analysis.
Relationship analysis
Here the conceptual model is refined by redesigning the entities in a way that reduces redundancy and turns them into simplified relations that can be handled flexibly and easily. This process is also called data normalization and building the relational data model.
Database design
This step converts the relational model into a specification of the system's database.
Communication
Once data is analyzed, it may be reported in many formats to the users of the analysis to support their requirements.[5] The users may have feedback, which results in additional analysis.
When determining how to communicate the results, the analyst may consider a variety of data visualization techniques to help convey the message more clearly and efficiently to the audience. Data visualization uses information displays (graphics such as tables and charts) to help communicate key messages contained in the data. Tables are valuable because they let a user query and focus on specific numbers, while charts (e.g., bar charts or line charts) may help explain the quantitative messages contained in the data.[6]
Quantitative messages
Stephen Few described eight types of quantitative messages that users may attempt to communicate from a set of data, including the associated graphs.[7][8]
- Time-series: A single variable is captured over a period of time, such as the unemployment rate over a 10-year period. A line chart may be used to demonstrate the trend.
- Ranking: Categorical subdivisions are ranked in ascending or descending order, such as a ranking of sales performance (the measure) by salespersons (the category, with each salesperson a categorical subdivision) during a single period. A bar chart may be used to show the comparison across the salespersons.[9]
- Part-to-whole: Categorical subdivisions are measured as a ratio to the whole (i.e., a percentage out of 100%). A pie chart or bar chart can show the comparison of ratios, such as the market share represented by competitors in a market.[10]
- Deviation: Categorical subdivisions are compared against a reference, such as a comparison of actual vs. budget expenses for several departments of a business for a given time period. A bar chart can show the comparison of the actual versus the reference amount.[11]
- Frequency distribution: Shows the number of observations of a particular variable for a given interval, such as the number of years in which the stock market return is between intervals such as 0–10%, 11–20%, etc. A histogram, a type of bar chart, may be used for this analysis.
- Correlation: Comparison between observations represented by two variables (X,Y) to determine if they tend to move in the same or opposite directions. For example, plotting unemployment (X) and inflation (Y) for a sample of months. A scatter plot is typically used for this message.[12]
- Nominal comparison: Comparing categorical subdivisions in no particular order, such as the sales volume by product code. A bar chart may be used for this comparison.[13]
- Geographic or geo-spatial: Comparison of a variable across a map or layout, such as the unemployment rate by state or the number of persons on the various floors of a building. A cartogram is typically used.[7]
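Two of the messages above, the frequency distribution and the correlation, reduce to short computations before anything is plotted. The following is a minimal sketch using only the Python standard library; the function names and the sample return figures are invented for illustration, and intervals are taken as half-open `[k*width, (k+1)*width)`, which differs slightly from the 0–10%, 11–20% wording above.

```python
import math
from collections import Counter


def frequency_distribution(values, width):
    """Count observations per interval [k*width, (k+1)*width)."""
    counts = Counter(math.floor(v / width) * width for v in values)
    return dict(sorted(counts.items()))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical yearly returns in percent, binned into 10-point intervals.
returns = [3.0, 7.5, 12.0, 18.0, 4.2, 25.0, 11.1]
hist = frequency_distribution(returns, 10)
```

The dictionary `hist` maps each interval's lower bound to its count and is what a histogram would draw; a value of `pearson_r` near +1 or -1 corresponds to a scatter plot whose points fall close to a rising or falling line.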
Techniques for analyzing quantitative data
- Check raw data for anomalies prior to performing your analysis;
- Re-perform important calculations, such as verifying columns of data that are formula driven;
- Confirm main totals are the sum of subtotals;
- Check relationships between numbers that should be related in a predictable way, such as ratios over time;
- Normalize numbers to make comparisons easier, such as analyzing amounts per person or relative to GDP or as an index value relative to a base year;
- Break problems into component parts by analyzing factors that led to the results, such as DuPont analysis of return on equity.[14]
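Several of the checks listed above are simple arithmetic and can be automated. The sketch below is illustrative only: the function names and the rounding tolerance are assumptions, and the DuPont decomposition is shown in its common three-factor form (net margin × asset turnover × equity multiplier).

```python
def totals_consistent(total, subtotals, tol=0.01):
    """Confirm that a reported total equals the sum of its subtotals."""
    return abs(total - sum(subtotals)) <= tol


def index_to_base(series, base_year):
    """Normalize a {year: value} series so that the base year equals 100."""
    base = series[base_year]
    return {year: 100 * value / base for year, value in series.items()}


def dupont_roe(net_income, revenue, assets, equity):
    """Return on equity split into margin, turnover, and leverage."""
    margin = net_income / revenue
    turnover = revenue / assets
    leverage = assets / equity
    return margin * turnover * leverage
```

For example, `index_to_base({2020: 50.0, 2021: 55.0}, 2020)` expresses the second year as 110 relative to a base of 100, which makes series of different magnitudes comparable.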
Analytical activities of data users
# | Task | General description | Pro forma abstract | Examples |
---|---|---|---|---|
1 | Retrieve Value | Given a set of specific cases, find attributes of those cases. | What are the values of attributes {X, Y, Z, ...} in the data cases {A, B, C, ...}? | - What is the mileage per gallon of the Ford Mondeo? - How long is the movie Gone with the Wind? |
2 | Filter | Given some concrete conditions on attribute values, find data cases satisfying those conditions. | Which data cases satisfy conditions {A, B, C...}? | - What Kellogg's cereals have high fiber? - What comedies have won awards? - Which funds underperformed the S&P 500? |
3 | Compute Derived Value | Given a set of data cases, compute an aggregate numeric representation of those data cases. | What is the value of aggregation function F over a given set S of data cases? | - What is the average calorie content of Post cereals? - What is the gross income of all stores combined? - How many manufacturers of cars are there? |
4 | Find Extremum | Find data cases possessing an extreme value of an attribute over its range within the data set. | What are the top/bottom N data cases with respect to attribute A? | - What is the car with the highest MPG? - What director/film has won the most awards? - What Marvel Studios film has the most recent release date? |
5 | Sort | Given a set of data cases, rank them according to some ordinal metric. | What is the sorted order of a set S of data cases according to their value of attribute A? | - Order the cars by weight. - Rank the cereals by calories. |
6 | Determine Range | Given a set of data cases and an attribute of interest, find the span of values within the set. | What is the range of values of attribute A in a set S of data cases? | - What is the range of film lengths? - What is the range of car horsepowers? - What actresses are in the data set? |
7 | Characterize Distribution | Given a set of data cases and a quantitative attribute of interest, characterize the distribution of that attribute's values over the set. | What is the distribution of values of attribute A in a set S of data cases? | - What is the distribution of carbohydrates in cereals? - What is the age distribution of shoppers? |
8 | Find Anomalies | Identify any anomalies within a given set of data cases with respect to a given relationship or expectation, e.g. statistical outliers. | Which data cases in a set S of data cases have unexpected/exceptional values? | - Are there exceptions to the relationship between horsepower and acceleration? - Are there any outliers in protein? |
9 | Cluster | Given a set of data cases, find clusters of similar attribute values. | Which data cases in a set S of data cases are similar in value for attributes {X, Y, Z, ...}? | - Are there groups of cereals with similar fat/calories/sugar? - Is there a cluster of typical film lengths? |
10 | Correlate | Given a set of data cases and two attributes, determine useful relationships between the values of those attributes. | What is the correlation between attributes X and Y over a given set S of data cases? | - Is there a correlation between carbohydrates and fat? - Is there a correlation between country of origin and MPG? - Do different genders have a preferred payment method? - Is there a trend of increasing film length over the years? |
11 | Contextualization[15] | Given a set of data cases, find contextual relevancy of the data to the users. | Which data cases in a set S of data cases are relevant to the current users' context? | - Are there groups of restaurants that have foods based on my current caloric intake? |
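The first six tasks in the table map directly onto basic operations over a collection of records. The following sketch uses a small, made-up list of cars (the models and figures are invented for illustration) to show one plausible reading of each task in plain Python.

```python
# Hypothetical data cases; the attributes follow the car examples in the table.
cars = [
    {"model": "A", "mpg": 30, "weight": 1200},
    {"model": "B", "mpg": 22, "weight": 1600},
    {"model": "C", "mpg": 41, "weight": 1000},
]

# 1. Retrieve value: an attribute of one specific case.
mpg_of_b = next(c["mpg"] for c in cars if c["model"] == "B")

# 2. Filter: cases that satisfy a condition.
efficient = [c["model"] for c in cars if c["mpg"] > 25]

# 3. Compute derived value: an aggregate over the cases.
mean_mpg = sum(c["mpg"] for c in cars) / len(cars)

# 4. Find extremum: the case with the highest value of an attribute.
best = max(cars, key=lambda c: c["mpg"])["model"]

# 5. Sort: order the cases by an attribute.
by_weight = [c["model"] for c in sorted(cars, key=lambda c: c["weight"])]

# 6. Determine range: the span of values of an attribute.
mpg_range = (min(c["mpg"] for c in cars), max(c["mpg"] for c in cars))
```

The remaining tasks (distribution, anomalies, clusters, correlation, contextualization) usually need statistical or domain-specific methods rather than a single expression.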
Barriers to effective analysis
Barriers to effective analysis may exist among the analysts performing the data analysis or among the audience. Distinguishing fact from opinion, cognitive biases, and innumeracy are all challenges to sound data analysis.[16]
Confusing fact and opinion
Effective analysis requires obtaining relevant facts to answer questions, support a conclusion or formal opinion, or test hypotheses.[17] Facts by definition are irrefutable, meaning that any person involved in the analysis should be able to agree upon them. The auditor of a public company must arrive at a formal opinion on whether financial statements of publicly traded corporations are "fairly stated, in all material respects".[18] This requires extensive analysis of factual data and evidence to support their opinion.
Cognitive biases
There are a variety of cognitive biases that can adversely affect analysis. For example, confirmation bias is the tendency to search for or interpret information in a way that confirms one's preconceptions.[19] In addition, individuals may discredit information that does not support their views.[20]
Analysts may be trained specifically to be aware of these biases and how to overcome them.[21] In his book Psychology of Intelligence Analysis, retired CIA analyst Richards Heuer wrote that analysts should clearly delineate their assumptions and chains of inference and specify the degree and source of the uncertainty involved in the conclusions.[22] He emphasized procedures to help surface and debate alternative points of view.[23]
Innumeracy
Effective analysts are generally adept with a variety of numerical techniques. However, audiences may not have such literacy with numbers or numeracy; they are said to be innumerate.[24] Persons communicating the data may also be attempting to mislead or misinform, deliberately using bad numerical techniques.[25]
For example, whether a number is rising or falling may not be the key factor. More important may be the number relative to another number, such as the size of government revenue or spending relative to the size of the economy (GDP) or the amount of cost relative to revenue in corporate financial statements.[26] This numerical technique is referred to as normalization[14] or common-sizing. There are many such techniques employed by analysts, whether adjusting for inflation (i.e., comparing real vs. nominal data) or considering population increases, demographics, etc.[27]
Analysts may also analyze data under different assumptions or scenarios. For example, when analysts perform financial statement analysis, they will often recast the financial statements under different assumptions to help arrive at an estimate of future cash flow, which they then discount to present value based on some interest rate, to determine the valuation of the company or its stock.[28] Similarly, the CBO analyzes the effects of various policy options on the government's revenue, outlays and deficits, creating alternative future scenarios for key measures.[29]
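Two of the numerical techniques mentioned in this section, adjusting for inflation and discounting future cash flows to present value, can be written down compactly. This is a minimal sketch: the function names are hypothetical, the price index is assumed to equal 100 in the base period, and cash flows are assumed to arrive at the end of each year.

```python
def real_value(nominal, price_index, base_index=100.0):
    """Convert a nominal amount to base-period prices using a price index."""
    return nominal * base_index / price_index


def present_value(cash_flows, rate):
    """Discount cash flows received at the end of years 1, 2, ... at `rate`."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))
```

Recasting a forecast under different assumptions then amounts to calling `present_value` with alternative cash-flow lists or discount rates and comparing the results.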
Other topics
Analytics and business intelligence
Analytics is the "extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions." It is a subset of business intelligence, which is a set of technologies and processes that uses data to understand and analyze business performance to drive decision-making.[30]
Education
In education, most educators have access to a data system for the purpose of analyzing student data.[31] These data systems present data to educators in an over-the-counter data format (embedding labels, supplemental documentation, and a help system and making key package/display and content decisions) to improve the accuracy of educators' data analyses.[32]
Practitioner notes
This section contains rather technical explanations that may assist practitioners but are beyond the typical scope of a Wikipedia article.[33]
Initial data analysis
The most important distinction between the initial data analysis phase and the main analysis phase is that during initial data analysis one refrains from any analysis that is aimed at answering the original research question. The initial data analysis phase is guided by the following four questions:[34]
Quality of data
The quality of the data should be checked as early as possible. Data quality can be assessed in several ways, using different types of analysis: frequency counts, descriptive statistics (mean, standard deviation, median), normality (skewness, kurtosis, frequency histograms), and whether imputation is needed.[35]
- Analysis of extreme observations: outlying observations in the data are analyzed to see if they seem to disturb the distribution.[36]
- Comparison and correction of differences in coding schemes: variables are compared with coding schemes of variables external to the data set, and possibly corrected if coding schemes are not comparable.[37]
- Test for common-method variance.
The choice of analyses to assess the data quality during the initial data analysis phase depends on the analyses that will be conducted in the main analysis phase.[38]
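The descriptive statistics and normality indicators named above can be computed with the Python standard library. The sketch below is one possible formulation: it uses population (not sample) moments for skewness and kurtosis, and the function name `describe` is an assumption made for this example.

```python
import statistics


def describe(values):
    """Summary statistics used in a first data-quality pass."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)           # population standard deviation
    z = [(v - mean) / sd for v in values]    # standardized scores
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "skewness": sum(t ** 3 for t in z) / n,             # 0 for symmetric data
        "excess_kurtosis": sum(t ** 4 for t in z) / n - 3,  # 0 for a normal curve
    }
```

Skewness and excess kurtosis far from zero suggest that the distribution departs from normality.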
Quality of measurements
The quality of the measurement instruments should only be checked during the initial data analysis phase when this is not the focus or research question of the study.[39] One should check whether the structure of the measurement instruments corresponds to the structure reported in the literature.
There are two ways to assess measurement quality:
- Confirmatory factor analysis
- Analysis of homogeneity (internal consistency), which gives an indication of the reliability of a measurement instrument.[40] During this analysis, one inspects the variances of the items and the scales, the Cronbach's alpha of the scales, and the change in Cronbach's alpha if an item were deleted from a scale.[41]
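Cronbach's alpha and the "alpha if item deleted" check can be sketched as follows, using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The data layout (one list of respondent scores per item) and the function names are assumptions made for this example, and sample variances are used throughout.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of columns, one per item, each holding the scores of
    the same respondents in the same order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # scale score per respondent
    item_variance = sum(statistics.variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance / statistics.variance(totals))


def alpha_if_deleted(items):
    """Alpha recomputed with each item left out in turn (needs 3+ items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]
```

An item whose removal raises alpha noticeably is a candidate for not fitting the scale.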
Initial transformations
After assessing the quality of the data and of the measurements, one might decide to impute missing data, or to perform initial transformations of one or more variables, although this can also be done during the main analysis phase.[42]
Possible transformations of variables are:[43]
- Square root transformation (if the distribution differs moderately from normal)
- Log-transformation (if the distribution differs substantially from normal)
- Inverse transformation (if the distribution differs severely from normal)
- Make categorical (ordinal / dichotomous) (if the distribution differs severely from normal, and no transformations help)
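The three numeric transformations in the list above are one-liners. The sketch below is only illustrative: it assumes strictly positive values (the logarithm and the inverse are undefined at zero), and the names `TRANSFORMS` and `transform` are hypothetical.

```python
import math

# Ordered from the mildest to the strongest correction.
TRANSFORMS = {
    "sqrt": math.sqrt,
    "log": math.log,
    "inverse": lambda x: 1.0 / x,
}


def transform(values, kind):
    """Apply one of the transformations to every value."""
    if min(values) <= 0:
        raise ValueError("this sketch assumes strictly positive values")
    return [TRANSFORMS[kind](v) for v in values]
```

In practice one would recompute skewness and kurtosis after each candidate transformation and keep the mildest one that brings the distribution acceptably close to normal.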
Did the implementation of the study fulfill the intentions of the research design?
One should check the success of the randomization procedure, for instance by checking whether background and substantive variables are equally distributed within and across groups. If the study did not need or use a randomization procedure, one should check the success of the non-random sampling, for instance by checking whether all subgroups of the population of interest are represented in the sample.[44]
Other possible data distortions that should be checked are:
- Dropout (this should be identified during the initial data analysis phase)
- Item non-response (whether this is random or not should be assessed during the initial data analysis phase)
- Treatment quality (using manipulation checks).[45]
Characteristics of data sample
In any report or article, the structure of the sample must be accurately described. It is especially important to exactly determine the size of the subgroup when subgroup analyses will be performed during the main analysis phase.[46]
The characteristics of the data sample can be assessed by looking at:
- Basic statistics of important variables
- Scatter plots
- Correlations and associations
- Cross-tabulations[47]
Final stage of the initial data analysis
During the final stage, the findings of the initial data analysis are documented, and necessary, preferable, and possible corrective actions are taken. Also, the original plan for the main data analyses can and should be specified in more detail or rewritten. In order to do this, several decisions about the main data analyses can and should be made:
- In the case of non-normal distributions: should one transform variables; make variables categorical (ordinal/dichotomous); adapt the analysis method?
- In the case of missing data: should one neglect or impute the missing data; which imputation technique should be used?
- In the case of outliers: should one use robust analysis techniques?
- In case items do not fit the scale: should one adapt the measurement instrument by omitting items, or rather ensure comparability with other (uses of the) measurement instrument(s)?
- In the case of (too) small subgroups: should one drop the hypothesis about inter-group differences, or use small sample techniques, like exact tests or bootstrapping?
- In case the randomization procedure seems to be defective: can and should one calculate propensity scores and include them as covariates in the main analyses?[48]
Analysis
Several analyses can be used during the initial data analysis phase:[49]
- Univariate statistics (single variable)
- Bivariate associations (correlations)
- Graphical techniques (scatter plots)
It is important to take the measurement levels of the variables into account for the analyses, as special statistical techniques are available for each level:[50]
- Nominal and ordinal variables
  - Frequency counts (numbers and percentages)
  - Associations
    - Cross-tabulations
    - Hierarchical loglinear analysis (restricted to a maximum of 8 variables)
    - Loglinear analysis (to identify relevant/important variables and possible confounders)
  - Exact tests or bootstrapping (in case subgroups are small)
  - Computation of new variables
- Continuous variables
  - Distribution
    - Statistics (M, SD, variance, skewness, kurtosis)
    - Stem-and-leaf displays
    - Box plots
Nonlinear analysis
Nonlinear analysis is often necessary when the data is recorded from a nonlinear system. Nonlinear systems can exhibit complex dynamic effects including bifurcations, chaos, harmonics and subharmonics that cannot be analyzed using simple linear methods. Nonlinear data analysis is closely related to nonlinear system identification.[51]
Main data analysis
In the main analysis phase, analyses aimed at answering the research question are performed as well as any other relevant analysis needed to write the first draft of the research report.[52]
Exploratory and confirmatory approaches
In the main analysis phase, either an exploratory or confirmatory approach can be adopted. Usually the approach is decided before data is collected.[53] In an exploratory analysis no clear hypothesis is stated before analysing the data, and the data is searched for models that describe the data well.[54] In a confirmatory analysis, clear hypotheses about the data are tested.[55]
Exploratory data analysis should be interpreted carefully. When testing multiple models at once there is a high chance of finding at least one of them to be significant, but this can be due to a type 1 error. It is important to always adjust the significance level when testing multiple models with, for example, a Bonferroni correction.[56] Also, one should not follow up an exploratory analysis with a confirmatory analysis in the same dataset.[57] An exploratory analysis is used to find ideas for a theory, but not to test that theory as well.[57] When a model is found through exploration of a dataset, following up with a confirmatory analysis in the same dataset could simply mean that the results of the confirmatory analysis are due to the same type 1 error that produced the exploratory model in the first place.[57] The confirmatory analysis therefore will not be more informative than the original exploratory analysis.[58]
Stability of results
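As a concrete illustration of the Bonferroni correction mentioned above: with m tests and an overall significance level α, each individual test is compared against α/m. The function name and the p-values below are invented for this sketch.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag which tests stay significant at the corrected threshold alpha/m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Three hypothetical p-values from models tested on the same data.
flags = bonferroni([0.001, 0.02, 0.04])
```

All three p-values are below the uncorrected 0.05, but only the first is below the corrected threshold of 0.05/3.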
It is important to obtain some indication about how generalizable the results are.[59] While this is often difficult to check, one can look at the stability of the results. Are the results reliable and reproducible? There are two main ways of doing that.
- Cross-validation. By splitting the data into multiple parts, we can check if an analysis (like a fitted model) based on one part of the data generalizes to another part of the data as well.[60] Cross-validation is generally inappropriate, though, if there are correlations within the data, e.g. with panel data.[61] Hence other methods of validation sometimes need to be used. For more on this topic, see statistical model validation.[62]
- Sensitivity analysis. A procedure to study the behavior of a system or model when global parameters are (systematically) varied. One way to do that is via bootstrapping.[63]
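Both stability checks can be sketched with the standard library. The example below is a minimal illustration, not a full validation framework: `k_fold_indices` shuffles observations at random, which assumes they are independent (so, as noted above, it is not suitable for panel data), and `bootstrap_ci` uses the simple percentile method. All names and the sample data are hypothetical.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k (train, test) pairs for cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k roughly equal parts
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]


def bootstrap_ci(values, stat=statistics.fmean, n_resamples=2000,
                 level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    stats = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    low = stats[int((1 - level) / 2 * n_resamples)]
    high = stats[int((1 + level) / 2 * n_resamples) - 1]
    return low, high


splits = k_fold_indices(10, 3)
sample = [2, 4, 4, 5, 7, 9, 10, 12]
low, high = bootstrap_ci(sample)
```

A model would be fitted on each `train` part and evaluated on the matching `test` part; a wide bootstrap interval signals that the estimate is sensitive to which observations happen to be in the sample.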
Free software for data analysis
- DevInfo – a database system endorsed by the United Nations Development Group for monitoring and analyzing human development.
- ELKI – data mining framework in Java with data mining oriented visualization functions.
- KNIME – the Konstanz Information Miner, a user friendly and comprehensive data analytics framework.
- Orange – a visual programming tool featuring interactive data visualization and methods for statistical data analysis, data mining, and machine learning.
- Pandas – Python library for data analysis
- PAW – FORTRAN/C data analysis framework developed at CERN
- R – a programming language and software environment for statistical computing and graphics.
- ROOT – C++ data analysis framework developed at CERN
- SciPy – Python library for data analysis
International data analysis contests
See also
- Actuarial science
- Analytics
- Big data
- Business intelligence
- Censoring (statistics)
- Computational physics
- Data acquisition
- Data blending
- Data governance
- Data mining
- Data Presentation Architecture
- Data science
- Digital signal processing
- Dimension reduction
- Early case assessment
- Exploratory data analysis
- Fourier analysis
- Machine learning
- Multilinear PCA
- Multilinear subspace learning
- Multiway data analysis
- Nearest neighbor search
- Nonlinear system identification
- Predictive analytics
- Principal component analysis
- Qualitative research
- Scientific computing
- Structured data analysis (statistics)
- System identification
- Test method
- Text analytics
- Unstructured data
- Wavelet
References
Footnotes
- ^ Xia, B. S., & Gong, P. (2015). Review of business intelligence through data analysis. Benchmarking, 21(2), 300-311. doi:10.1108/BIJ-08-2012-0050
- ^ Exploring Data Analysis
- ^ Adèr, 2008, p. 334-335.
- ^ Grandjean, Martin (2014). "La connaissance est un réseau" (PDF). Les Cahiers du Numérique. 10 (3): 37–54. doi:10.3166/lcn.10.3.37-54. Archived (PDF) from the original on 2015-09-27. Retrieved 2015-05-05.
- ^ Data requirements for semiconductor die. Exchange data formats and data dictionary, BSI British Standards, doi:10.3403/02271298, retrieved on 2021-05-31
- ^ Visualizing Data About UK Museums: Bar Charts, Line Charts and Heat Maps. 2021. doi:10.4135/9781529768749. ISBN 9781529768749. S2CID 240967380.
- ^ a b "Stephen Few-Perceptual Edge-Selecting the Right Graph for Your Message-2004" (PDF). Archived (PDF) from the original on 2014-10-05. Retrieved 2014-10-29.
- ^ "Stephen Few-Perceptual Edge-Graph Selection Matrix" (PDF). Archived (PDF) from the original on 2014-10-05. Retrieved 2014-10-29.
- ^ Swamidass, P. M. (2000). "X-Bar Chart". Encyclopedia of Production and Manufacturing Management. p. 841. doi:10.1007/1-4020-0612-8_1063. ISBN 978-0-7923-8630-8.
- ^ "Chart C5.3. Percentage of 15-19 year-olds not in education, by labour market status (2012)". doi:10.1787/888933119055. Retrieved 2021-06-03.
- ^ "Chart 7: Households: final consumption expenditure versus actual individual consumption". doi:10.1787/665527077310. Retrieved 2021-06-03.
- ^ Garnier, Elodie M.; Fouret, Nastasia; Descoins, Médéric (3 February 2020). "Table 2: Graph comparison between Scatter plot, Violin + Scatter plot, Heatmap and ViSiElse graph". PeerJ. 8: e8341. doi:10.7717/peerj.8341/table-2.
- ^ "Product comparison chart: Wearables". PsycEXTRA Dataset. 2009. doi:10.1037/e539162010-006. Retrieved 2021-06-03.
- ^ a b Cite error: invalid <ref> tag; no text was provided for the reference named Koomey1
- ^ Cite error: invalid <ref> tag; no text was provided for the reference named ConTaaS
- ^ "Connectivity tool transfers data among database and statistical products". Computational Statistics & Data Analysis. 8 (2): 224. July 1989. doi:10.1016/0167-9473(89)90021-2. ISSN 0167-9473.
- ^ Information relevant to your job, Routledge, 2007-07-11, pp. 48–54, doi:10.4324/9780080544304-16, ISBN 978-0-08-054430-4, retrieved on 2021-06-03
- ^ Gordon, Roger (March 1990). "Do Publicly Traded Corporations Act in the Public Interest?". National Bureau of Economic Research Working Papers. Cambridge, MA. doi:10.3386/w3303.
- ^ Rivard, Jillian R (2014). Confirmation bias in witness interviewing: Can interviewers ignore their preconceptions? (Thesis). Florida International University. doi:10.25148/etd.fi14071109.
- ^ Papineau, David (1988), Does the Sociology of Science Discredit Science?, Dordrecht: Springer Netherlands, pp. 37–57, doi:10.1007/978-94-009-2877-0_2, ISBN 978-94-010-7795-8, retrieved on 2021-06-03
- ^ Bromme, Rainer; Hesse, Friedrich W.; Spada, Hans, eds. (2005). Barriers and Biases in Computer-Mediated Knowledge Communication. doi:10.1007/b105100. ISBN 978-0-387-24317-7.
- ^ Heuer, Richards (2019-06-10). Heuer, Richards J (ed.). Quantitative Approaches to Political Intelligence. doi:10.4324/9780429303647. ISBN 9780429303647. S2CID 145675822.
- ^ "Introduction" (PDF). Central Intelligence Agency. Archived (PDF) from the original on 2021-10-25. Retrieved 2021-10-25.
- ^ "Figure 6.7. Differences in literacy scores across OECD countries generally mirror those in numeracy". doi:10.1787/888934081549. Retrieved 2021-06-03.
- ^ Ritholz, Barry. "Bad Math that Passes for Insight". Bloomberg View. Archived from the original on 2014-10-29. Retrieved 2014-10-29.
- ^ Gusnaini, Nuriska; Andesto, Rony; Ermawati (2020-12-15). "The Effect of Regional Government Size, Legislative Size, Number of Population, and Intergovernmental Revenue on The Financial Statements Disclosure". European Journal of Business and Management Research. 5 (6). doi:10.24018/ejbmr.2020.5.6.651. ISSN 2507-1076. S2CID 231675715.
- ^ Taura, Toshiharu; Nagai, Yukari (2011). "Comparing Nominal Groups to Real Teams". Design Creativity 2010. London: Springer-Verlag London. pp. 165–171. ISBN 978-0-85729-223-0.
- ^ Gross, William H. (July 1979). "Coupon Valuation and Interest Rate Cycles". Financial Analysts Journal. 35 (4): 68–71. doi:10.2469/faj.v35.n4.68. ISSN 0015-198X.
- ^ "25. General government total outlays". doi:10.1787/888932348795. Retrieved 2021-06-03.
- ^ Davenport, Thomas; Harris, Jeanne (2007). Competing on Analytics. O'Reilly. ISBN 978-1-4221-0332-6.
- ^ Aarons, D. (2009). Report finds states on course to build pupil-data systems. Education Week, 29(13), 6.
- ^ Rankin, J. (2013, March 28). How data Systems & reports can either fight or propagate the data analysis error epidemic, and how educator leaders can help. Archived 2019-03-26 at the Wayback Machine Presentation conducted from Technology Information Center for Administrative Leadership (TICAL) School Leadership Summit.
- ^ Brödermann, Eckart J. (2018), Article 2.2.1 (Scope of the Section), Nomos Verlagsgesellschaft mbH & Co. KG, p. 525, doi:10.5771/9783845276564-525, ISBN 978-3-8452-7656-4, retrieved on 2021-06-03
- ^ Adèr 2008a, p. 337.
- ^ Kjell, Oscar N. E.; Thompson, Sam (19 December 2013). "Descriptive statistics indicating the mean, standard deviation and frequency of missing values for each condition (N = number of participants), and for the dependent variables (DV)". PeerJ. 1: e231. doi:10.7717/peerj.231/table-1.
- ^ Practice for Dealing With Outlying Observations, ASTM International, doi:10.1520/e0178-16a, retrieved on 2021-06-03
- ^ Alternative Coding Schemes for Dummy Variables, Newbury Park, CA: SAGE Publications, Inc., 1993, pp. 64–75, doi:10.4135/9781412985628.n5, ISBN 978-0-8039-5128-0, retrieved on 2021-06-03
- ^ Adèr 2008a, pp. 338-341.
- ^ Newman, Isadore (1998). Qualitative-quantitative research methodology : exploring the interactive continuum. Southern Illinois University Press. ISBN 0-585-17889-5. OCLC 44962443.
- ^ Terwilliger, James S.; Lele, Kaustubh (June 1979). "Some Relationships Among Internal Consistency, Reproducibility, and Homogeneity". Journal of Educational Measurement. 16 (2): 101–108. doi:10.1111/j.1745-3984.1979.tb00091.x. ISSN 0022-0655.
- ^ Adèr 2008a, pp. 341–342.
- ^ Adèr 2008a, p. 344.
- ^ Tabachnick & Fidell 2007, pp. 87–88.
- ^ Random sampling and randomization procedures. BSI British Standards. doi:10.3403/30137438. Retrieved 2021-06-03.
- ^ Adèr 2008a, pp. 344–345.
- ^ Foth, Christian; Hedrick, Brandon P.; Ezcurra, Martin D. (18 January 2016). "Figure 4: Centroid size regression analyses for the main sample". PeerJ. 4: e1589. doi:10.7717/peerj.1589/fig-4.
- ^ Adèr 2008a, p. 345.
- ^ Adèr 2008a, pp. 345–346.
- ^ Adèr 2008a, pp. 346–347.
- ^ Adèr 2008a, pp. 349–353.
- ^ Billings, S.A. (2013). Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains. Wiley.
- ^ Adèr 2008b, p. 363.
- ^ "Exploratory Data Analysis". Hoboken, NJ, USA: John Wiley & Sons, Inc. 2017-10-13. pp. 119–138. doi:10.1002/9781119126805.ch4. ISBN 978-1-119-12680-5. Retrieved 2021-06-03.
- ^ "Engaging in Exploratory Data Analysis, Visualization, and Hypothesis Testing – Exploratory Data Analysis, Geovisualization, and Data". CRC Press. 2015-07-28. pp. 106–139. doi:10.1201/b18808-8. ISBN 978-0-429-06936-9. Retrieved 2021-06-03.
- ^ "Hypotheses About Categories". London: SAGE Publications Ltd. 2010. pp. 138–151. doi:10.4135/9781446287873.n14. ISBN 978-1-84920-098-1. Retrieved 2021-06-03.
- ^ Liquet, Benoit; Riou, Jérémie (2013-06-08). "Correction of the significance level when attempting multiple transformations of an explanatory variable in generalized linear models". BMC Medical Research Methodology. 13 (1): 75. doi:10.1186/1471-2288-13-75. ISSN 1471-2288. PMC 3699399. PMID 23758852.
- ^ a b c Mcardle, John J. (2008). "Some ethical issues in confirmatory versus exploratory analysis". PsycEXTRA Dataset. doi:10.1037/e503312008-001. Retrieved 2021-06-03.
- ^ Adèr 2008b, pp. 361–362.
- ^ Adèr 2008b, pp. 361–371.
- ^ Benson, Noah C; Winawer, Jonathan (December 2018). "Bayesian analysis of retinotopic maps". eLife. 7. doi:10.7554/elife.40224. PMC 6340702. PMID 30520736. Supplementary file 1. Cross-validation schema. DOI:10.7554/elife.40224.014
- ^ Hsiao, Cheng (2014). "Cross-Sectionally Dependent Panel Data". Cambridge: Cambridge University Press. pp. 327–368. doi:10.1017/cbo9781139839327.012. ISBN 978-1-139-83932-7. Retrieved 2021-06-03.
- ^ Hjorth, J.S. Urban (2017-10-19). "Cross validation". Chapman and Hall/CRC. pp. 24–56. doi:10.1201/9781315140056-3. ISBN 978-1-315-14005-6. Retrieved 2021-06-03.
- ^ Sheikholeslami, Razi; Razavi, Saman; Haghnegahdar, Amin (2019-10-10). "What should we do when a model crashes? Recommendations for global sensitivity analysis of Earth and environmental systems models". Geoscientific Model Development. 12 (10): 4275–4296. Bibcode:2019GMD....12.4275S. doi:10.5194/gmd-12-4275-2019. ISSN 1991-9603. S2CID 204900339.
- ^ "The machine learning community takes on the Higgs". Symmetry Magazine. July 15, 2014. Retrieved 14 January 2015.
- ^ Nehme, Jean (September 29, 2016). "LTPP International Data Analysis Contest". Federal Highway Administration. Retrieved October 22, 2017.
- ^ "Data.Gov:Long-Term Pavement Performance (LTPP)". May 26, 2016. Retrieved November 10, 2017.
References
- Adèr, Herman J. (2008a). "Chapter 14: Phases and initial steps in data analysis". In Adèr, Herman J.; Mellenbergh, Gideon J.; Hand, David J (eds.). Advising on research methods : a consultant's companion. Huizen, Netherlands: Johannes van Kessel Pub. pp. 333–356. ISBN 9789079418015. OCLC 905799857.
- Adèr, Herman J. (2008b). "Chapter 15: The main analysis phase". In Adèr, Herman J.; Mellenbergh, Gideon J.; Hand, David J (eds.). Advising on research methods : a consultant's companion. Huizen, Netherlands: Johannes van Kessel Pub. pp. 357–386. ISBN 9789079418015. OCLC 905799857.
- Tabachnick, B.G. & Fidell, L.S. (2007). Chapter 4: Cleaning up your act. Screening data prior to analysis. In B.G. Tabachnick & L.S. Fidell (Eds.), Using Multivariate Statistics, Fifth Edition (pp. 60–116). Boston: Pearson Education, Inc. / Allyn and Bacon.
Further reading
- Adèr, H.J. & Mellenbergh, G.J. (with contributions by D.J. Hand) (2008). Advising on Research Methods: A Consultant's Companion. Huizen, the Netherlands: Johannes van Kessel Publishing.
- Chambers, John M.; Cleveland, William S.; Kleiner, Beat; Tukey, Paul A. (1983). Graphical Methods for Data Analysis, Wadsworth/Duxbury Press. ISBN 0-534-98052-X
- Fandango, Armando (2017). Python Data Analysis, 2nd Edition. Packt Publishers.
- Juran, Joseph M.; Godfrey, A. Blanton (1999). Juran's Quality Handbook, 5th Edition. New York: McGraw Hill. ISBN 0-07-034003-X
- Lewis-Beck, Michael S. (1995). Data Analysis: an Introduction, Sage Publications Inc, ISBN 0-8039-5772-6
- NIST/SEMATECH (2008). Handbook of Statistical Methods.
- Pyzdek, T. (2003). Quality Engineering Handbook. ISBN 0-8247-4614-7
- Richard Veryard (1984). Pragmatic Data Analysis. Oxford: Blackwell Scientific Publications. ISBN 0-632-01311-7
- Tabachnick, B.G.; Fidell, L.S. (2007). Using Multivariate Statistics, 5th Edition. Boston: Pearson Education, Inc. / Allyn and Bacon, ISBN 978-0-205-45938-4